I mainly program in C and Python. Python is a beautiful interpreted language with excellent design. (For example, it’s one of the three official languages at Google.) Of the modern interpreted languages, it happens to be the one that engineers and scientists have picked up on as their language of choice, which means that good scientific libraries are available. You can even do your symbolic calculations in Python if you wish. For those who would like to learn more, please try these lectures.
When I need speed, I code in Python first to give me a feel for the problem and a way to test results, and then rewrite in C using the outstanding GNU Scientific Library. (Great job guys, love your work.) I’m also looking forward to future development of Julia.
(Why don’t I use MATLAB like most other economists? MATLAB is a very useful language, but it’s more limited than Python, and the design is starting to show its age. As the marketing people at MathWorks used to say, MATLAB is Fortran for the 1990s.)
Another handy open source programming language is R, a descendant of the S language, which was developed back in the early days at Bell Labs, along with C, the UNIX operating system, information theory, the transistor, the laser, and a few other minor contributions to world science and technology. I have some computer labs introducing R if you’d like to learn more.
I do all my text editing using vim. I used to use Emacs but I was young and needed the money. I have made a FUNDAMENTAL ADVANCE in the field of vim text editing: map “jj” to the escape key. One day it will make me famous.
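For the curious, the advance fits in one line of your vimrc (shown here for insert mode, where it earns its keep):

```vim
" Hit jj in insert mode to return to normal mode,
" sparing the long reach for the Escape key.
inoremap jj <Esc>
```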
I’m interested in machine learning. The machine learning problem is one of associating an output with a given set of inputs. One component of the problem is selection of a hypothesis space, which is a suitable collection of candidate mappings from the input space to the output space. The second component is a learning algorithm, which takes training data as an input and returns a mapping from the hypothesis space that represents a proposed functional relationship between inputs and outputs. The task of the system designer is to implement this search through the hypothesis space with the objective of finding a mapping which optimizes the ability of the machine to generalize, i.e., to give the “right” output even for inputs not seen in the training data.
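The two components above can be sketched in a few lines of Python. This is a toy illustration with made-up data, not anything from the literature: the hypothesis space is the set of linear maps x → a·x + b, indexed by the pair (a, b), and the “learning algorithm” is ordinary least squares, which searches that space for the pair minimizing squared error on the training data.

```python
def fit_linear(xs, ys):
    # Learning algorithm: ordinary least squares over the hypothesis
    # space of linear maps x -> a * x + b.  Returns the chosen (a, b).
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    var = sum((x - x_bar) ** 2 for x in xs)
    a = cov / var
    b = y_bar - a * x_bar
    return a, b

# Training data generated (noiselessly) by y = 2x + 1, so the
# learning algorithm should recover that mapping exactly.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
a, b = fit_linear(xs, ys)
print(a, b)  # 2.0 1.0
```

With noisy data the fitted pair would only approximate the true mapping, and how well it generalizes to unseen inputs is exactly the question statistical learning theory addresses.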
Statistical learning theory provides a principled approach to optimization of the generalization ability. The Ayatollah of statistical learning theory is V.N. Vapnik. The Ayatollah of all inductive science is the great Karl Popper.
Here’s a nice quote from Vapnik:
I believe that something drastic has happened in computer science and machine learning. Until recently, philosophy was based on the very simple idea that the world is simple. In machine learning, for the first time, we have examples where the world is not simple. For example, when we solve the “forest” problem (which is a low-dimensional problem) and use data of size 15,000 we get 85%–87% accuracy. However, when we use 500,000 training examples we achieve 98% correct answers. This means that a good decision rule is not a simple one; it cannot be described by a very few parameters. This is actually a crucial point in the approach to empirical inference.
This point was very well described by Einstein, who said “when the solution is simple, God is answering”. That is, if a law is simple we can find it. He also said “when the number of factors coming into play is too large, scientific methods in most cases fail”. In machine learning we are dealing with a large number of factors. So the question is: what is the real world? Is it simple or complex? Machine learning shows that there are examples of complex worlds. We should approach complex worlds from a completely different position than simple worlds. For example, in a complex world one should give up explainability (the main goal in classical science) to gain better predictability.