Aug 10, 2012

Aug 6, 2012

Hilbert Disco

The Hilbert curve has the interesting property that points local on the plane are likely to be local on the Hilbert curve as well.

The image below is a colored square. Each square is colored in a sequence along the Hilbert curve. This gives the image the nice property that it has many patterns in it, but still sufficiently complex that it isn't just a pattern. It looks a bit random.


May 19, 2012

GeometricVinyl Clings



Random Stuff with Unit Origami

Translucent paper used to make basic shapes from Tomoko Fuse´s Unit Origami



Apr 30, 2012

Stock Visualization

I've never really liked stock visualizations. Most visualizations don't seem to compare the stocks against the market. Even those that do are often too short term to see the global behavior of the stock.

The images below are Log-Log plots of individual stocks against the Dow Jones industrial average over the course of each stocks lifespan. The log value of the Dow Jones is on the bottom axis and the value of the stock is on the log value of the stock is the on the vertical axis. The red dot is the current value of the stock.

Stocks do well if the line trends to the top of the chart. The market in general is doing well if it trends to the right. Over time all these graphs trend to the right and the top since I've selected for successful companies.

Maybe the most interesting stocks to look at this way are the older ones.
GE
This is GE. GE has been stably rising alongside the index for much of its mature life except recently. This seems to be true of most stocks. The graphs somewhat look like flowers. The stem of the flower ends around 1997 and the flower begins, signalling that the stock became much less correlated with the Dow Jones.
Walmart
This plot is for Walmart which shows essentially the same structure as GE above.
IBM
IBM is another older stock. You can see the flower at the top left, but it looks like the stock might have recently left the flower head for good.

Microsoft

Microsoft pretty clearly shows the flower pattern, but it not clear that it is leaving the flower region at the top of the stalk.
Apple
Apple is a great counter example. The stock was doing pretty poorly most of life until recently. There is no flower pattern in this stock. Things are must less clear in newer companies which have no stalk.
Google
Google's growth has been mostly vertical because it was grown while the Dow Jones hasn't. It's still possible to see though that the value of Google is effected by the Dow Jones, though weakly in how its flower slants.
Amazon

Compare that to Amazon though. The leaves in its flower are completely horizontal. The changes in the Dow Jones don't seem to affect its value and it climbs up slowly despite them.

So what do these visualizations mean?

Nothing. They're just another stupid stock visualization.

Apr 25, 2012

Good-bye Sweet Silence


Comparing the dystopias of Orwell and Huxley, Huxley probably better anticipated the problems of our future civilization. Orwell however was more insightful about the dangers of ubiquitous communication.  Any communication device that works everywhere is just a monitoring device of the mob-ocracy.

Smart phones are the dead end of both personal ownership and communal property. They provide no hope for a democracy of creativity on the internet or privacy. They are only a poor fix for our need for friendship and communication.

I've inconvienced my friends and family for a long time with my refusal to get a modern cell phone. Most have been kind of enough to put up with it, but I can't go on. It's impossible to be really free.

Politeness and Machines



An automatic checkout machine just asked me:

¨Have you swiped your member card yet?¨

This of course caused me to pause and consider what the machine was telling me. The sentence is actually really complicated, but it's just a polite way of saying:

"Swipe your damn card."


The polite version just doesn't work for me though. When anyone says something like it, they are relying on plausible deniability. They might not be sure whether you have swiped it or not. So they can say this without seeming demanding. They can plausibly deny having any knowledge of whether you have done this or not.

The machine however has no plausible deniability. It knows whether you have or not. The fact that it tries to get away with the plausible denial seems to make it unnaturally rude. Instead, it's very clear that it knows you have not swiped it and comes off as very insincere.

Anyway, I got yelled at in the automatic checkout lane today for thinking too much about the linguistics of politeness and how it affects machines. The last chapter of Pinker's "The Stuff of Thought" has some good material on the subject.



Apr 9, 2012

Why I don't like the verb "to teach"

So
是以
Sages manage the work of detached actions
聖人處無爲之事
Conduct the teaching of no words
行不言之敎
They work with myriad things but do not control
萬物作焉而不辭
They create but do not possess
生而不有


"Teaching" is an extraordinary word. Few words can twist the meaning of a sentence like it does. Consider the sentence:


 "Teachers teach students." 


This sentence does more than just convey that education is happening. It frames the meaning of it. The teacher actively gives the education. The student passively receives it. Education is done to the students.


And for the most part, that is how we teach and think about teachers. It is so ingrained that you can see this giving-and-receiving metaphor in the design of classrooms. The metaphor is so hard to escape because the design of our environment affects our behavior.


But consider the word "learning" instead. Students can learn from a teacher, just as they could learn from a book. "Learning" frames education in a way that values effort and participation from students. The best teachers I have had didn't teach. They created environments for learning and simply let it happen.

Mar 20, 2012

Finite Precision and Statistical Models


If you aren't already familiar with some of the common errors in statistical modeling, I highly recommend Peter Kennedy's A Guide to Econometics.  This post is about a modeling issue that I haven't read in any book yet, but seen a couple of times in person.

The issue is caused by precision. Maybe the reason I haven't read this in a book is because precision issues are applied instead of academic concerns. Most of the analysis of statistical model fitting is done with the real number system, which leaves out precision from consideration. Still, finite precision becomes a problem for models around their steep points. Consider this S shaped curve that gives the relationship between two variables:



It is steep in some parts and very shallow in others.  Any uncertainty in the value of one variable implies an uncertainty in the value of the other variable. The uncertainty grows when the derivative of the curve is greater than 1 and shrinks when it is less than one. In other words the same amount of uncertanty in the value of the independent variable (x-value) produces different amounts of uncertanty in the value of the dependent variable (y-value). We are more uncertan about the value of the dependent variable when the curve is steep (derivative greater than 1) and more certain about its value when it is shallow. You can see this in the graph below:

Though the red and green lines span the same region on the independent variable's axis (x-axis), red spans a larger section of the dependent variable's axis (y-axis). Any uncertanty around the red region is magnified considerably. This S shape curve will be problematic because of this.

Consider a set of data that is taken from this curve plus some constant random noise. I've basically defined a homoskedastic data set which is great. But if there is any uncertanty in the independent values as we measured it, then the noise added to the dependent values effectively becomes larger where the model is steep, even if the uncertainty in the value of x is constant. Here is what the set of data looks like for 500 points if there was a Gaussian noise on the dependent values and some smaller Gaussian noise on the independent values.


The curve is so much thicker in the middle. It's heteroskedastic. If instead of Gaussian noise in the independent values we simply rounded the independent values,  we would get the same problem.


Thicker in the middle. Compare this to no noise our rounding in the independent data.


This is homoskedastic. It's an example that you would see in a textbook and is completely unrealistic compared to the first two data sets.

If we try to fit anything to the first two sets of data, the elements in the middle will have a stronger influence on the result than the rest of the data because of the increased variance in the middle of the S shape. That might not look so bad for this data set, but what about replacing our S curve with 1/x or Log(x)? With these, values close to 0 will have an incredible weight in determining the value of a least squares regression.

How many people are really even thinking about these kinds of things when they use statistical models?

Here's some Mathematica code used to generate the three sets from the plots above in order. The S curve is 10*ArcTan(x).

xsamples = RandomReal[{-5, 5}, 500]; 
firstSet =
 Transpose@{xsamples + RandomVariate[NormalDistribution[0, 0.2], 500],
    10 Tanh[xsamples] +
    RandomVariate[NormalDistribution[0, 0.2], 500]}; 
secondSet =
 Transpose@{Round[xsamples, 0.5],
   10 Tanh[xsamples] + RandomVariate[NormalDistribution[0, 0.2], 500]}; 
thirdSet =
  Transpose@{xsamples,
    10 Tanh[xsamples] +
     RandomVariate[NormalDistribution[0, 0.2], 500]};

Mar 17, 2012

The Creative Class

I keep on seeing articles and blogs about the importance of being a content creator. This is actually one of Johnathan Zittrain's big principals for the internet -  we need tech which allow us to create content and not tech which only allows us to consume it.

But more recently, curation as an alternative to content creation has become popular. Pinterest and Tumblr are both examples of this category. Hell so are search engines. Content curation is possible with many more kinds of devices than creation. It can be done using mobile devices and takes advantage passive interaction (Read Wu-Wei)  such as page views.

The shift to curation is in part a response to a saturation of information on the internet. Search engines are data curation. I am hardly qualified to create new content that is worth much except in a few small areas. This blog for example is fairly worthless.

A bit more on the dark side, maybe we should flip how we think about curation from collecting good content to destroying bad content. We should be talking about content destruction. People are reluctant to see that more content is often destructive. Curation is only useful because it filters out worthless data.

Worthless data we keep on ourselves can also come back to hurt us later through data mining.