Sunday, June 5, 2016

Machine Learning

Last week I started the famous Coursera course Machine Learning by Andrew Ng. Over the weekend I completed the first two weeks of material and exercises. I think this is a first-year course with very few prerequisites, so if you already have Linear Algebra and Calculus knowledge it is going to feel a little slow-paced and superficial. But Andrew Ng explains everything extremely well, and if you don't have much background in these topics you will find his explanations thorough and easy to follow.
I already covered just about all of this material during my EE pre- and post-grad studies, but since that was about 15 years ago I still watched all the content. I did fall asleep a few times... but it was worth it to refresh all the concepts.

The course has an Honour Code that you must follow, which includes not providing code answers to the problems. Since I'm not sure whether that extends to blogs outside Coursera, I will refrain just the same from posting any code related to the questions posed in the course. However, I will summarize some of the content and discuss some of the concepts here.

Week 1:
Introduction

There are two types of Machine Learning: Supervised and Unsupervised.
Supervised learning is where you are given past datasets and want to make predictions based on the past results. There are two types, Regression and Classification. Regression is where the outcomes/predictions are real numbers (or a range of values), while Classification is where the outcome/prediction is a discrete label (in the simplest case a binary yes/no).
Unsupervised learning is where you are given unlabelled data and you try to find structure in the data which can be used to group the data into different clusters.

Linear Regression

Linear regression is where you try to fit a linear function (the hypothesis \(h(\overrightarrow{x})\)) to represent the data points provided. In two dimensions, if you have an x input producing a y output, this would be the straight line that passes closest to all of the data points. In the general (multi-variable) case the hypothesis (or prediction) function is given by the following equation for each new input dataset, where the input variables are \(x_1..x_n\):

\[ \hat{y}=h_{\theta_0,\theta_1,\theta_2,...,\theta_n}(x_1,x_2,...,x_n)=\theta_0\cdot1 + \theta_1x_1+\theta_2x_2+...+\theta_nx_n \]


Written with \(\theta\) and \(x\) as vectors:
\[ \hat{y}=h_{\overline{ \theta}}({\overline{x}}) = \sum_{i=0}^n\theta_ix_i \begin{cases}x_0 = 1\end{cases} \]
\[ \hat{y}=h_{\overline{ \theta}}({\overline{x}}) =\overline{x}^{T}\overline\theta \]
If we let \( X \) be the matrix whose columns hold the measurements of each feature \(x_j\) (with the first column \(x_0=1\)), so that each row represents one sample, we can write it as:

\[\hat{\overline y}=h_{\overline{ \theta}}({X}) =X\cdot\overline\theta\begin{cases}\hat{\overline y} = m \times 1 \space vector\\ X = m\times (n+1)  \space matrix \\\overline\theta = (n+1)\times 1 \space vector \\ with \space m \space samples \\ and \space n \space features\end{cases}\]

There are many benefits to writing the equation as a linear algebra vector/matrix operation: it makes the notation much more compact, and using a linear algebra package in a programming language makes the implementation more compact and usually more efficient as well.
Instead of writing multiple nested for loops to compute this function for each value of each input variable, we can compute it with a single matrix multiplication.
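As a rough illustration of the difference (a generic NumPy sketch with made-up numbers, not the course's Octave code), the nested loops and the matrix product give the same result:

import numpy as np

# Hypothetical data: m = 4 samples, n = 2 features.
X = np.array([[1.0, 2.0, 3.0],   # first column is x0 = 1
              [1.0, 1.5, 0.5],
              [1.0, 3.0, 2.0],
              [1.0, 0.5, 1.0]])
theta = np.array([0.5, 1.0, -2.0])   # the (n+1) parameters

# Loop version: one prediction per sample, one term per feature.
y_hat_loop = np.zeros(X.shape[0])
for i in range(X.shape[0]):
    for j in range(X.shape[1]):
        y_hat_loop[i] += theta[j] * X[i, j]

# Vectorized version: a single matrix-vector product.
y_hat = X @ theta

print(np.allclose(y_hat_loop, y_hat))  # True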

Now that we have defined the hypothesis or prediction function for linear regression we can define the cost function.
We will choose the cost function to be the sum of the squared differences between the predicted and measured results, divided by \(2m\). This can be thought of as the squared distance between the vector of predicted values and the vector of measured values, normalized by the number of samples:
\[
\begin{aligned}
CostFunction=J(\theta_0,\theta_1,\theta_2,...,\theta_n)&=\frac{1}{2m}\sum_{i=1}^m\big(y^{(i)}-\hat{y}^{(i)}\big)^2\\
&=\frac{1}{2m}\sum_{i=1}^m\big(y^{(i)} -h_{\theta_0,\theta_1,...,\theta_n}(x^{(i)}_1,x^{(i)}_2,...,x^{(i)}_n)\big)^2\\
&=\frac{1}{2m}\sum_{i=1}^m\big(y^{(i)}-(\theta_0\cdot1 +\theta_1x^{(i)}_1+\theta_2x^{(i)}_2+...+\theta_nx^{(i)}_n)\big)^2\\
&=\frac{1}{2m}\sum_{i=1}^m\Big(y^{(i)}-\sum_{j=0}^n\theta_jx_j^{(i)}\Big)^2 \begin{cases}x_0 = 1\end{cases}\\
J(\overline\theta)&=\frac{1}{2m}\sum_{i=1}^m\big(y^{(i)}-\overline{x^{(i)}}^{T}\overline\theta\big)^2\\
J(\overline\theta)&=\frac{1}{2m}\,\mathrm{sum}\big((\overline y - \hat{\overline y})^2\big)\\
J(\overline\theta)&=\frac{1}{2m}\,\mathrm{sum}\big((\overline y-X\overline\theta)^2\big)
\begin{cases}\hat{\overline y}, \overline y = m \times 1 \space vectors\\ X = m\times (n+1) \space matrix \\\overline\theta = (n+1)\times 1 \space vector \\ with \space m \space samples \\ and \space n \space features\end{cases}
\end{aligned}
\]
where \(\mathrm{sum}\) denotes the sum over the elements of the vector and the square is applied element-wise.


Again we have reached a matrix multiplication for calculating the cost efficiently.
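For completeness, here is what that vectorized cost looks like as a small NumPy sketch (a generic illustration, not the Octave assignment solution):

def compute_cost(X, y, theta):
    # J(theta) = 1/(2m) * sum((X @ theta - y)^2)
    # X is the (m, n+1) design matrix with a leading column of ones (NumPy array),
    # y the (m,) measurements and theta the (n+1,) parameter vector.
    m = len(y)
    errors = X @ theta - y
    return (errors @ errors) / (2 * m)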

With Gradient Descent we want to choose different \(\theta\) values, progressively closer to the best \(\theta\) values until we reach the optimal \(\theta\) values where the cost \(J(\theta)\) is at a minimum.

The way we will do this is to calculate the gradient of the cost function \(J(\theta)\) and adjust the \(\theta\) values in the direction opposite to this gradient (the direction of steepest descent). To calculate the gradient we must take the partial derivatives of \(J(\theta)\).

\[
\begin{aligned}
\frac{\partial }{\partial \theta_k}J(\theta_0,\theta_1,...,\theta_n)&=\frac{\partial }{\partial \theta_k}\frac{1}{2m}\sum_{i=1}^m\Big(y^{(i)}-\sum_{j=0}^n\theta_jx_j^{(i)}\Big)^2\\
&=\frac{1}{2m}\sum_{i=1}^m\frac{\partial }{\partial \theta_k}\Big(y^{(i)}-\sum_{j=0}^n\theta_jx_j^{(i)}\Big)^2\\
&=\frac{1}{m}\sum_{i=1}^m\Big(y^{(i)}-\sum_{j=0}^n\theta_jx_j^{(i)}\Big)\frac{\partial }{\partial \theta_k}\Big(y^{(i)}-\sum_{j=0}^n\theta_jx_j^{(i)}\Big)\\
&=\frac{1}{m}\sum_{i=1}^m\Big(y^{(i)}-\sum_{j=0}^n\theta_jx_j^{(i)}\Big)\Big(-\sum_{j=0}^n\frac{\partial }{\partial \theta_k}\theta_jx_j^{(i)}\Big)\\
&=\frac{1}{m}\sum_{i=1}^m\Big(y^{(i)}-\sum_{j=0}^n\theta_jx_j^{(i)}\Big)\Big(-\frac{\partial }{\partial \theta_k}\theta_kx_k^{(i)}\Big)\\
&=\frac{1}{m}\sum_{i=1}^m\Big(y^{(i)}-\sum_{j=0}^n\theta_jx_j^{(i)}\Big)\big(-x_k^{(i)}\big)\\
&=\frac{1}{m}\sum_{i=1}^m x_k^{(i)}\Big(\sum_{j=0}^n\theta_jx_j^{(i)}-y^{(i)}\Big)\\
\frac{\partial }{\partial \theta_k}J(\overline\theta)&=\frac{1}{m}\sum_{i=1}^m x_k^{(i)}\big(\hat{y}^{(i)}-y^{(i)}\big)\\
\frac{\partial }{\partial \overline\theta}J(\overline\theta)&=\frac{1}{m}X^{T}(X\overline\theta-\overline y)
\end{aligned}
\qquad\begin{cases}x_0 = 1\end{cases}
\]

We get another very compact and efficient matrix operation for the partial derivatives of the cost function.

Now, to implement gradient descent we will iterate until convergence updating our \(\theta\) values every iteration in the following way:

\[\overline\theta_{i+1}=\overline\theta_i -\alpha\frac{\partial }{\partial \overline\theta}J(\overline\theta)\]

OR

\[\theta := \theta - \alpha \nabla J(\theta)\]

Where \(\nabla J(\theta)\) is:

\[\nabla J(\theta) = \begin{bmatrix}

\frac{\partial J(\theta)}{\partial \theta_0} \newline

\frac{\partial J(\theta)}{\partial \theta_1} \newline

\vdots \newline

\frac{\partial J(\theta)}{\partial \theta_n}

\end{bmatrix}\]

where \(\alpha\) is a constant, called the learning rate, which must be chosen appropriately. If the learning rate is too large, the descent will overshoot and might not converge, while if it is too small it will converge very slowly.
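Putting the update rule together with the gradient derived above, a plain batch gradient descent loop is only a few lines. Again this is a hedged, generic NumPy sketch (the function name and defaults are mine), not the course's Octave code:

def gradient_descent(X, y, theta, alpha=0.01, iterations=400):
    # X: (m, n+1) design matrix with a leading column of ones,
    # y: (m,) measured values, theta: (n+1,) initial parameters.
    m = len(y)
    for _ in range(iterations):
        gradient = X.T @ (X @ theta - y) / m   # (1/m) * X^T (X theta - y)
        theta = theta - alpha * gradient       # step against the gradient
    return theta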

One final point about "linear" regression is that it can be used for just about any type of hypothesis function by defining a feature \(x_k\) as a function of the actual feature you would like to use. For instance, one could set \(x_2=x_1^2\) and \(x_3=\sin(x_1)\) and then apply exactly the same linear regression gradient descent as explained above.
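In other words, the feature mapping is just a matter of building extra columns of the design matrix before running the same code. A small NumPy sketch with made-up values:

import numpy as np

x1 = np.array([0.5, 1.0, 2.0, 3.0])            # the actual measured feature
X = np.column_stack([np.ones_like(x1),          # x0 = 1
                     x1,                         # x1
                     x1 ** 2,                    # x2 = x1^2
                     np.sin(x1)])                # x3 = sin(x1)
# X can now be fed to exactly the same gradient descent as before.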

Finally the Normal Equation was presented, which is a direct method of solving for the optimal values of \(\theta\). The equation is given by:

$$
\theta = (X^T X)^{-1}X^T y
$$
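In NumPy this is essentially a one-liner (a sketch; pinv is used here so that it still works when \(X^TX\) is not invertible):

import numpy as np

def normal_equation(X, y):
    # theta = (X^T X)^-1 X^T y
    return np.linalg.pinv(X.T @ X) @ X.T @ y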


For the second week's assignment you basically had to implement the cost function, gradient descent and the normal equation. With the matrix representations of the calculations it is very easy to implement these algorithms in Octave.
After completing the exercises I continued to try to optimize the algorithm.
To me it seems silly that one must manually choose a good \(\alpha\), so I created a version of gradient descent with an adaptive \(\alpha\) which automatically adjusts \(\alpha\) to a good value on each iteration.

Basically my technique is as follows:
  Calculate \(deltaTheta = \nabla J(\theta)\)
  Calculate \(cost = J(\theta - \alpha \cdot deltaTheta)\)
  Calculate \(newCost = J(\theta - changeFactor \cdot \alpha \cdot deltaTheta)\)
  If newCost > cost:
    changeFactor = 1/changeFactor
    Calculate \(newCost = J(\theta - changeFactor \cdot \alpha \cdot deltaTheta)\)
  While newCost < cost:
    Update \(\alpha\) to \(changeFactor \cdot \alpha\)
    Set cost to newCost
    Calculate \(newCost = J(\theta - changeFactor \cdot \alpha \cdot deltaTheta)\)

In this way I either increase or decrease \(\alpha\) by my changeFactor (a value of 2 works well) iteratively, until the change no longer results in a reduction in cost.
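Translated into a rough NumPy sketch of the idea (the function and variable names here are mine):

def cost(X, y, theta):
    m = len(y)
    e = X @ theta - y
    return (e @ e) / (2 * m)

def adaptive_gradient_descent(X, y, theta, alpha=0.01, change_factor=2.0, iterations=100):
    m = len(y)
    for _ in range(iterations):
        delta_theta = X.T @ (X @ theta - y) / m              # gradient of J
        current = cost(X, y, theta - alpha * delta_theta)
        factor = change_factor
        new = cost(X, y, theta - factor * alpha * delta_theta)
        if new > current:                    # growing alpha made things worse,
            factor = 1.0 / factor            # so try shrinking it instead
            new = cost(X, y, theta - factor * alpha * delta_theta)
        while new < current:                 # keep scaling alpha while the cost keeps dropping
            alpha = factor * alpha
            current = new
            new = cost(X, y, theta - factor * alpha * delta_theta)
        theta = theta - alpha * delta_theta  # take the step with the tuned alpha
    return theta, alpha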

This had a profound improvement on the test dataset provided:
Static Alpha = 0.01 (plot): the static alpha converges slowly, after about 400 iterations.
Adaptive Alpha starting at 0.01 (plot): the adaptive algorithm converges much faster.
Note that in the adaptive plot I am also counting each evaluation of the cost function (during the optimization of alpha) as an iteration. In reality the number of Gradient Descent iterations is much lower, and the cost also decreases monotonically.

In the case where an optimal alpha is chosen to start with, normal Gradient Descent should be a bit more efficient than the Adaptive Alpha method, but in general the Adaptive method will perform much better.

Saturday, May 28, 2016

Intro to AI: Problem Solving - Graph Search

I am continuing with the Intro to AI course.
The following is my summary of Unit 2: Problem Solving.

Formal Definition of a problem:

Initial state of the Agent: In route finding it would be the initial position of the Agent
Available Actions for that State: A function that takes a State as input and returns all available Actions that the Agent can execute when in that State. In route finding you would input the agent position and it would return all possible neighboring destinations.
Result: A function taking a State and an Action and returning the new State after applying that Action.
GoalTest: A function taking a State and returns true or false depending on whether the goal is reached.
Path: Sequence of State+Action transitions
PathCostFunction: Takes a Path and returns a Cost of that path. Usually the PathCostFunction is additive, so the Cost of the total path is the sum of the Cost of each step in the Path.
StepCostFunction: A function that takes a State, an Action and the resulting State as input and returns the cost of that action. In route finding the cost could be the distance travelled between the positions or the time it takes to travel.

Problem Summary:
State InitialState
Action[] AvailableActions(State currentState)
State Result(State currentState, Action action)
bool GoalTest(State currentState)
(State,Action)[] Path
int StepCost(State, Action)
int PathCost(Path) = SUM(StepCost for each (State, Action) in Path)
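The summary maps quite directly onto a small Python interface; a rough sketch (the names are mine, not from the course):

class Problem(object):
    def initial_state(self): ...
    def available_actions(self, state): ...      # -> list of Actions
    def result(self, state, action): ...         # -> the State reached by applying action
    def goal_test(self, state): ...              # -> True / False
    def step_cost(self, state, action): ...      # -> cost of taking action in state

def path_cost(problem, path):
    # path is a sequence of (state, action) transitions; the cost is assumed additive
    return sum(problem.step_cost(state, action) for state, action in path)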

To solve the problem you start from the InitialState and get all available Actions for that state. You apply those actions using the Result function to get a set of new possible States, and check whether one of these States is the Goal using the GoalTest. If it is, you have solved the problem; if not, you continue by expanding these States in the same way, one by one. Once you have found the Goal you can calculate the cost of reaching it using the PathCost of the Path you found to the Goal. If there are multiple paths to the Goal you might want to optimize by choosing the Path with the lowest PathCost.

Summarized in pseudocode:
currentState = InitialState
while(!GoalTest(currentState))
  actions = AvailableActions(currentState)
  nextAction = SelectNextAction(actions)
  currentState = Result(currentState, nextAction)
  
This is the general idea, but the above pseudocode is not complete or even correct.
I have introduced a SelectNextAction function that decides which part of the path to go down. The algorithm above would search down a single path, ignoring all the actions along the way that were not selected by the SelectNextAction function. So there is something missing.

These two points are the crux of the topic being discussed in this unit.
  1. How to make sure you search all relevant paths so that you always find the optimal path to the goal
  2. How to optimally choose the order in which you search through these paths.

One thing I did not find clearly explained in the course material is the difference between TreeSearch and GraphSearch. What I did not clearly understand initially is that TreeSearch is meant only for tree-structured environments (a structure where each node has children and you search from the root node down (or up, depending on how you draw the tree!) until you reach the leaves). You would not apply TreeSearch to a Graph structure. When your environment is a Graph structure - a set of many points (vertices) with edges connecting each point to any number of other points - your TreeSearch algorithm will quickly start looping back to points already traversed and will start going in circles.

To make sure we search all relevant paths without looping back we will introduce two concepts - the Frontier and the Explored set.
Instead of just selecting one possible next action from the current state's available actions, we will save all of the States produced by the AvailableActions of the currently selected state. But instead of just saving each State we will store the Path (the State together with all the States and Actions that led to it). We will add these Paths to a list called the Frontier. Initially the Frontier contains only the InitialPath; after the first iteration the InitialPath is removed and all of the InitialState's neighbours are added to the Frontier. Then we select which path to explore next from all of the Paths in the Frontier. In this way we will explore all possible paths.
To avoid back-tracking to states already explored, we save to the Explored list all the Paths that we remove from the Frontier.

Updated Pseudocode:
InitialPath = Path(InitialState)
Frontier = [InitialPath]
Explored = []
selectedPath = InitialPath
while(!GoalTest(selectedPath.State))
   Frontier.Remove(selectedPath)
   Explored.Add(selectedPath)
   actions = AvailableActions(selectedPath.State)
   foreach action in actions:
      Frontier.Add(Result(selectedPath, action))   // extend the path with the resulting State
   selectedPath = SelectNextPath(Frontier)

This is the basic common algorithm. The way that the SelectNextPath decides which of the Paths that are in the current Frontier should be expanded/explored leads to different implementations of the search algorithm.

Breadth First
This is the most basic option - we always explore all of the items in the current Frontier before starting to explore new items added to the Frontier. Basically the Frontier is a first-in-first-out queue. In this way the search expands in a circle around the InitialState with each iteration. This is not a very efficient way to search; it is the equivalent of walking in ever-expanding circles (ignoring the terrain, wading through marshes and scaling mountains in perfect circles) until you happen upon your destination.
eg:
Path SelectNextPath(Frontier)
  return Frontier.Pop()  // Frontier is a First-In-First-Out queue

Depth First
In this case you always choose the most recently added path from the Frontier, in effect choosing a direction and exploring it to its end before going back, choosing a different direction and again exploring it to its end, until you find your destination. If you are lucky you could find the destination quickly, but if you are unlucky, and especially if the paths are long, this could turn out very badly. As an extreme example, if a path has no end (it is infinitely long) your search will run forever and never complete the first iteration.
eg:
Path SelectNextPath(Frontier)
  return Frontier.Pop()  // Frontier is a Last-In-First-Out queue (stack)
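In Python both of these can share the same frontier container, for example a collections.deque, with the only difference being which end you pop from (a small sketch, not the course's code):

from collections import deque

frontier = deque()   # paths waiting to be expanded

def select_next_path_breadth_first(frontier):
    return frontier.popleft()   # FIFO: oldest path first

def select_next_path_depth_first(frontier):
    return frontier.pop()       # LIFO: most recently added path first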

Uniform Cost
Uniform Cost is basically the same as Breadth First except that it takes into consideration the PathCost of reaching each Path in the Frontier. It always chooses the Path in the Frontier with the lowest PathCost to explore next.
eg:
Path SelectNextPath(Frontier)
   minPathCost = None
   foreach path in Frontier:
      if (minPathCost == None) or (PathCost(path) < minPathCost)
          minPathCost = PathCost(path)
          selectedPath = path
   return selectedPath

This is very similar to Breadth First search, except that the search expands in concentric "contour lines" of equal PathCost. It will be more efficient than Breadth First search, especially in cases where certain areas of the state map are much more difficult to reach than others, but it is still not very "goal oriented".
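In practice you would not scan the whole Frontier each time; a priority queue does the same job. A minimal sketch using Python's heapq (the tie-breaking counter is my own detail, added so heapq never has to compare the path objects directly):

import heapq
import itertools

frontier = []                 # entries are (path_cost, count, path) tuples
tie_breaker = itertools.count()

def add_path(path, path_cost):
    heapq.heappush(frontier, (path_cost, next(tie_breaker), path))

def select_next_path():
    return heapq.heappop(frontier)[2]   # always the cheapest path so far

Greedy Best-First and A* below use exactly the same structure, only with a different key (the heuristic distance, or PathCost plus heuristic).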

Greedy Best-First
In Greedy Best-First you must first define a distance measure between a state and the Destination. Then you always choose the Path in the Frontier whose State is "closest" to the Destination to expand first.
eg:
Path SelectNextPath(Frontier)
   minDistance = None
   foreach path in Frontier:
      if (minDistance == None) or (Distance(path.State, Destination) < minDistance)
          minDistance = Distance(path.State, Destination)
          selectedPath = path
   return selectedPath

This would be the most efficient algorithm, expanding a single path in the direction of the Goal until it reaches the destination. The problem is that in some cases it will not find the best path. It is like being lost in the woods, taking out your compass and walking in the direction of your Goal. If you reach a cliff face you cannot pass, you will keep walking around it in the direction closest to the direction of the Goal - even if that means you walk ten times further than if you had gone the other way around.

A*
This is the "best of both worlds" algorithm that takes Uniform Cost and mixes in a bit of Greedy Best-First for a more efficient version that is still guaranteed to find the optimal result.
Again you must define a "distance" metric (heuristic), but now you will use the PathCost + Distance to determine which is the 'best' Frontier node to expand.
e.g.:
Path SelectNextPath(Frontier)
   minF = None
   foreach path in Frontier:
      f = PathCost(path) + Distance(path.State, Destination)
      if (minF == None) or (f < minF)
          minF = f
          selectedPath = path
   return selectedPath


After finishing the unit I implemented an A* algorithm.
One thing I found was not explained in the course material is what to do when a newly generated Path leads to a State that is already in the Frontier or the Explored set, but the new path has a lower cost. In the case of the Frontier you clearly want to replace the existing path with the new, better one. I did the same thing for paths in the Explored set: I removed the old path from the Explored set and added the new one to the Frontier.

The complete code can be found on my GitHub; the main parts I will show below:

I defined a PathNode that stores its parent, the Cost, the current State and the total A* cost function (FFunc):
class PathNode(object):
    def __init__(self, parent, state, cost, ffunc):
        self.Parent = parent
        self.State = state
        self.Cost = cost
        self.FFunc = ffunc

    # Compare PathNodes by their total A* cost; the "<" checks and the
    # priority queue below rely on this.
    def __lt__(self, other):
        return self.FFunc(self) < other.FFunc(other)
Then I defined the State to be just an (x, y) coordinate:

class State(object):
    # States are compared with == and used as dict keys below, so they also
    # need __eq__ and __hash__ on (x, y) (omitted here).
    def __init__(self, x, y):
        self.x = x
        self.y = y
And finally a class that implements A* search
class AStarGraph(object):
    def __init__(self, costMap, startState, endState):
        self.CostMap = costMap
        self.dimX = len(self.CostMap)
        self.dimY = len(self.CostMap[0])
        self.PathStart = PathNode(None, startState, 0, self.f)
        self.End = endState

        self.Frontier = PriorityQueueSet([self.PathStart])
        self.Explored = dict()

    def f(self, path):
        return self.g(path) + self.h(path)

    def g(self, path):
        return path.Cost

    # The sum of the horizontal and vertical distances (the Manhattan distance) should be the perfect candidate for the heuristic
    def h(self, path):
        distanceX = abs(path.State.x - self.End.x)
        distanceY = abs(path.State.y - self.End.y)
        return distanceX + distanceY

    def takeNextNode(self):
        return self.Frontier.pop_smallest()

    def expandPathNode(self, path):
        # Success!
        if path.State == self.End:
            return path

        self.Explored[path.State] = path

        # Add all the options
        for option in self.getOptions(path.State):
            optionPath = PathNode(path, option, path.Cost+self.CostMap[option.x][option.y], self.f)

            if self.Frontier.has_item(option):
                if optionPath < self.Frontier.get_item(option):
                    self.Frontier.replace(optionPath)
                continue

            if (option in self.Explored):
                if optionPath < self.Explored[option]:
                    del self.Explored[option]
                    self.Frontier.add(optionPath)
                continue
            self.Frontier.add(optionPath)

    def getOptions(self, state):
        options = list()
        if (state.x+1 < self.dimX) and (self.CostMap[state.x+1][state.y] != 0):
            options.append(State(state.x+1, state.y))
        if (state.x-1 >= 0) and (self.CostMap[state.x-1][state.y] != 0):
            options.append(State(state.x-1, state.y))
        if (state.y+1 < self.dimY) and (self.CostMap[state.x][state.y+1] != 0):
            options.append(State(state.x, state.y+1))
        if (state.y-1 >= 0) and (self.CostMap[state.x][state.y-1] != 0):
            options.append(State(state.x, state.y-1))
        return options

    def iterate(self):
        return self.expandPathNode(self.takeNextNode())
This code is by no means elegant or optimized. It was just a quick and dirty implementation trying to stay close to the definitions provided in the unit.
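A hypothetical usage sketch (the actual driver, the PriorityQueueSet helper and the full State class live in the github repo), assuming a cost map where 0 marks an impassable cell and any positive number is the cost of stepping onto that cell:

costMap = [
    [1, 1, 1, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
]

search = AStarGraph(costMap, State(0, 0), State(2, 3))

result = None
while result is None:          # iterate() returns the goal PathNode once it is reached
    result = search.iterate()

path = []                      # walk back up the Parent links to recover the route
node = result
while node is not None:
    path.append((node.State.x, node.State.y))
    node = node.Parent
print(list(reversed(path)))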

While looking for more information on the course forum I came across an excellent site that explains all of these search algorithms extremely well. It has much better code and animations of the different algorithms. It is more geared towards gaming but the concepts are the same.

Thursday, May 26, 2016

Intro to AI - Terminology

I started the Intro to Artificial Intelligence course. I have done the first two units and found them very easy to follow, bordering on basic. In the past I've had similar experiences with both online and regular in-class courses - often they start slow and then somewhere during the course there is a jump in complexity. Let's see if this course is the same or if it will gradually ramp up.

The first unit is very introductory and mainly explains some terminology. My summary of the unit would be:

Unit 1: Welcome to AI

Some terminology:
Intelligent Agent = AI program
Interfaces with the Environment through sensors and actuators
Perception-Action Cycle: perceive the environment by processing sensor information and decide how to act using the actuators

Attributes of Environments:
Fully or Partially Observable:
          Fully Observable - the entire environment state is visible all the time. For example chess or checkers.
          Partially Observable - part of the environment state is not visible at any instant. For example Poker or Dominoes.
Deterministic or Stochastic:
          Deterministic - no randomness. For example chess.
          Stochastic - randomness. Coin tosses, dice-based games.
Discrete or Continuous:
          Discrete - a finite number of states. For example board games, card games, coin flips.
          Continuous - an infinite number of states. For example darts, a self-driving car, speech recognition.
Benign or Adversarial:
          Benign - the environment does not react to your actions with an objective that contradicts yours. For example the weather, roulette, solitaire.
          Adversarial - the environment has an objective that contradicts yours and actively observes your actions and reacts to impede your objective. For example chess.

AI can also be thought of as Uncertainty Management

Planning

I tend to easily get distracted. This is probably because I'm interested in a wide range of topics and easily get side-tracked and sucked down a rabbit hole.

As a consequence I tend to be a generalist that rarely gets anything done unless there is a hard deadline involved (and the Panic Monster kicks into action).

I am going to take two measures to try to counteract these tendencies.
Firstly, I am forcing myself to write a blog entry at least every weekend, which will focus my attention and hopefully result in some measurable progress.
Secondly, I will enroll in some online courses that have assignments with deadlines. The blog will probably mostly be about the assignments for these courses.

To start off I will allow myself one day of researching courses, books, tutorials etc. before deciding on a "syllabus" for myself.

I searched the web for a bit and found that the best recommendations on course material and other online resources came from a few Quora questions.

Based on these answers I have decided to start with a general machine learning course (Coursera's Machine Learning by Andrew Ng) and an introductory AI course (Intro to Artificial Intelligence), and possibly others later.

These courses will force me to do some weekly assignments which should keep me on track. I plan on posting my thoughts on each assignment including my answers and how I got to them. Perhaps it might help others.

It seems that recent advances in AI and ML have started to produce some popular techniques and theories, such as Deep Learning, Model-Free Methods, Artificial Intuition etc., which some people consider to be such a paradigm shift from previous technologies that it makes them obsolete. As I already mentioned, I am a bit of a generalist and usually need to have very in-depth knowledge about a topic to really be able to think clearly about it, so I will first focus on building background knowledge by studying the older techniques and then continue to the state-of-the-art techniques. However, I plan to do some reading on Deep Learning at the same time so that I have a good idea of the end goal. It seems like the site http://deeplearning4j.org/deeplearningforbeginners.html has many good resources and links that I will follow. (more at http://deeplearning4j.org/deeplearningpapers.html)



Wednesday, May 25, 2016

Hello World!

The first of my hopefully-at-least-weekly blog posts about my wanderings and experiments in some of the fields I'm interested in.

Many years ago I studied electronic engineering. Upon completing my studies I was excited about all the things that could be accomplished with my newly found knowledge!
Present day... I realize that in the years since, I have used almost none of the things I spent so much time and effort learning. I could probably have done a one-year basic programming course after high school and been just as capable at my job.


Today I have decided that I will start learning again. I will force myself to experiment, re-learn and think more deeply about topics such as AI, machine learning, statistical analysis, programming languages, mobile application development and whatever else I've started playing around with in the past but always gotten distracted midway through.

Creating this blog will force me to research in a more goal oriented way and document what I do.

Who knows, I might even help somebody else along the way!