Monday, December 22, 2008

On Governorship and Violent Crime

On a somewhat-recent episode of the Daily Show, an interesting claim was made:
You have a higher chance of going to prison after serving as governor of Illinois than if you were to murder someone.

The statistics cited are:
  • 4 of the last 8 governors (50%) have faced prison sentences after holding office (counting Blago as a near certainty)
  • Only 48% of murderers actually serve time for their crime
Now, what about these figures?  First off, without actually checking into the history of Illinois, it seems like the first number was probably cherry-picked; that is, governors being convicted of crimes is likely a more recent phenomenon, so if we go back a few more governors we'll probably find mostly non-convicts, which will drop the percentage.
Okay, so that's just an assumption.  Let's actually check Wikipedia.
Well, checking all the way back to Henry Horner back in the 1930s, there aren't any additional convictions other than the four already mentioned (that I found, anyway).  So while that statistic is true, it is indeed a bit cherry-picked.

So now the crime statistics.  This was a bit more difficult to find, and I still haven't found anything backing up the claim made on the Daily Show.  Best I found was this, which reports that the FBI's homicide case closure rate is about 61%, but this is a bit different from the proportion of individuals serving time for the crime: the FBI considers a case cleared when an arrest is made for the crime, not by whether the suspect is convicted or not (it's a tad more complicated than that, apparently, but that's the basic idea).  There's a few other issues with this, as well, but I think they would likely have less of an impact; for example, what about those who murder several individuals but are only charged with some of the murders?  For the sake of this blog post, though, let's just assume the 48% figure is correct.

From the world of statistics comes the idea of a hypothesis test, which allows us to decide with a certain degree of accuracy whether or not a claim about a an attribute of a population.  In this case, we're dealing with the proportion of governors of Illinois that eventually become convicted felons.  With p^ (pronounced "pee hat") representing the given sample proportion of felonious governors, and p representing the population proportion (which is unknown), the claim made by the Daily Show is essentially that p > m, where m is the proportion of murderers who serve a sentence for their crime.  From the above, we know p^ = .5 and m = .48.  Using a significance level of .05, we get a critical value of zα/2 = 1.645 and thus find the test statistic to be z = .080064.  Now, the null hypothesis in this case is p = m (which I should have mentioned earlier), and clearly the test statistic falls outside the rejection region.  Thus there is not sufficient evidence to reject the claim that p = .48, that is, 48% of Illinois governors either have been or will be convicted of a crime.  Basically, this means, no, we can't say with a good degree of certainty that p will be greater than m.

In conclusion, the claim made by the Daily Show, while interesting, has little evidence to back it up.  To remedy this, I suggest we either just let it go, or dig some dirt up on long-dead former governors, as dead people are easier targets.  Also, I have too much time on my hands.

Disclaimer:  most of the above is probably wrong, and numerous egregious statistical errors were made in the writing of this post.

Thursday, December 18, 2008

oh god how did this get here i am not good with array languages

You know, learning new programming languages can be very beneficial for yourself, regardless of how you might use them. You'll be exposed to new, interesting concepts and idioms you've never seen before.
So, when can this go wrong? When you try learning a language radically different from any others you've seen before at three in the morning.

Introducing J, an array programming language. What is an array programming language, you ask? Well, first off, they're confusing as hell and trying to read code written in one is like trying to translate hieroglyphs while simultaneously being whacked with cactuses. Angry cactuses.
Here's a good example of some APL (the original array language) code, which J is closely related to:




(implements a version of Conway's Game of Life, though I'm sure you already figure that out)

Yeah, it doesn't even stick to ASCII. J does, though it still manages to be not much more readable:
+/ i.@x:&.(p:^:_1) N

The above returns the sum of all primes less than N.

As cryptic as it might seem from the above, it's actually a pretty cool language. I wouldn't ever use it for general programming purposes before trying Python first, but if I'm working on a Project Euler problem, sometimes it might be quicker to fire up the J interpreter and write a single line of code.

Of course, I've still got no idea as to what the hell I'm doing.

Saturday, December 13, 2008

Genetic programming and stuff

So a few days ago, I was shown this page. It takes an image as input. Basic, but probably somewhat flawed explanation: the output starts out as a number of invisible polygons, and at each step of the algorithm a random attribute of a random polygon is randomly changed. If this looks more like the source image than does the last mutant image, this result is kept and more changes are made to it. This goes on and on and on. The results can be pretty amazing if you let it run for an hour or two.

Now, I suck at Python, and this implementation was written in Javascript, which I have little understanding of, so I can't learn all that much from the actual code. In a book I have on Ruby, Ruby by Example: Concepts and Code, though, I recalled a section on a somewhat similar evolutionary program that is meant to operate on strings. The concept is similar: a target string is given, and a candidate string of the same length is randomly generated.
In my implementation (which I'm currently working on), a mutation is made to a random character in the candidate string, and if that mutant's fitness (mathematically, 1 - (current difference between target and candidate) / (maximum difference)) the target is higher than that of the current candidate, we make that mutant the candidate. Eventually, depending on the length of the string and how my random number generator's feeling today, it will reach the target string. For observation's sake, we'll output the current candidate every X number of steps.

Now, I am not a computer science major nor have I read that much on it, so I'm not that up on terminology and concepts. This is an interesting concept to me, though. It's a decent analog of biological evolution, but it's a bit oversimplified. For one thing, in biological systems detrimental mutations do sometimes propogate. This model instead throws out every mutation that is not beneficial. Another thing is that this model forces mutations to occur at every step, whereas they're far less common in nature.
Also, this scheme only simulates asexual reproduction, in which only one parent is needed for reproduction. Most animals we're familiar with, though, use sexual reproduction, which involves two parents. In the former, you only get one set of DNA to mess around with. With the latter, though, genetic diversity (and thus chances of a better match) increases as you have two separate sets of genes being mixed and matched. This could be implemented in a way similar to that of the first string-evolution program, except we would start with two distinct random strings, and make mutations to each of those. At each stage of reproduction, however, instead of making changes, we force the two candidates to "breed": take the DNA of each, and randomly assign a weight of 1 to an attribute (in this case, letter) of one parrent and a weight of 0 to the corresponding attribute of the other parent. X number of offspring are produced, and a random mutation is made to each one. The two best mutant offspring are then mated and we continue the process.

Now, my idea of simulating sexual reproduction in this way might be a bit faulty, but if it is I'll find that out soon on my own. Also, it's a bit gross (and, if were we dealing with biological organisms with some recessive detrimental mutations, error-prone): since the offspring that mate are from the same parents, it's pretty much cyber-inbreeding.

Anyways, those are some of my ideas. Feel free to think they're stupid or to not understand them, but in the meantime, I'll try and finish up my implementation of the asexual string evolution.

Toodles for now.

Thursday, December 11, 2008

Oh, also

The previous post embeds the code which I've hosted on Google Code or whatever the kids are calling it nowadays. Thus, whenever I update the code, that magic box will automagically contain the updated file.
The project page can be found here.

I suck at svn, so I'm likely to break something soon. Never used it before, so it took me a little to figure out how to even get the gosh damn thing up.

pybrain4.py - First release

Well, here it is. It can be imported into an interactive python session, or run from the command line with something like:
python pybrain4.py file.b

Additional documentation can be found by calling it with the '-h' or '--help' flags.
Among other things, I'll work on trying to speed it up a bit.

Also, this has only been tested on OS X Leopard with Python 2.6. Compatibility will almost surely be broken with 3.0, but should probably work in 2.5.
Let me know if you have any problems.



Some Brainfuck programs to try out:

Took a while, but...

Pybrain v. 4, my poorly written Brainfuck interpreter, is a success!

Yes, Pybrain v. 1 - 3 did not work. Version 3, the lengthiest, had 177 lines of code. Version 4 currently has 61. It's not complete yet, as I still have to add support for running it as a standalone script and executing a given file instead of just strings from the Python interpreter. Once I polish it up a bit, I suppose I'll post it here.

Version 3 was a complete rewrite (as was the later version 4), and was very object-oriented. Given that resounding failure, I'm not sure that was the best way to go. Perhaps choice of programming paradigm should not be based on one's ability to work bad, stupid puns into the program (e.g. naming the cell class BrainCells).

Tuesday, December 9, 2008

Yup, another infinite loop

This one isn't as obvious as that last one. Still looking.

In the meantime, Foreman and Thirteen totally just made out. Saw it coming from a mile away.
Or like five seconds. I dunno.

I am a complete idiot and I suck at not being a complete idiot

Turns out the problem causing the infinite loop in the Brainfuck interpreter was, well, an infinite loop. I know, I know, it must've been hard for me to catch. Please hold the applause.

I had forgotten to implement the somewhat important part where, you know, the while loop actually works. I had forgotten to add a check to see if the current cell is 0, thus making the loop somewhat useless.

Now I'll go try some other code and probably find new bugs.
Oh boy.

About that interpreter...

Unsurprisingly, it's not working on its maiden voyage. Not unlike the Titanic, except that killed fewer people.

It worked on a small snippet of code, but not the 'hello world' script posted earlier. Rather than doing what it's supposed to, it's not. I expected about that much, though. At least it's going out in an infinite loop instead of some sissy syntax error.

More as this catastrophe develops further.

Hullo, I am Guff and I am bored and I have a Python installation.

And goddamnit, I'm going to use it.

Brainfuck is a very silly programming language. But I suppose that's to be expected when you set out with the intention of making a language that's as frustrating as possible to use. There are far more difficult esoteric languages out there (Esolang is a good resource), and Brainfuck is very easy to understand. The difficulty comes from actually getting it to do anything. For example, here's a 'hello world' program written in Brainfuck (from Wikipedia):
++++++++++[>+++++++>++++++++++>+++>+<<<<-]>++.>+.+++++++
..+++.>++.<<+++++++++++++++.>.+++.------.--------.>+.>.
(the line break is not significant and should be ignored by most interpreters)

Simple, right?

Now, how does this all work (I'd recommend skipping this section if you're already familiar with Brainfuck)? Well, first off, the standard implementation only recognizes eight characters as Brainfuck operations: <>[]+-.,

Secondly, think of the Brainfuck environment as a contiguous series of cells. The size of cells can vary based on the implementation, but the most common would seem to be a single byte (meaning the maximum value a cell can hold is 255, assuming unsigned integers). The number of such cells also varies; the original implementation, I believe, used 30,000 cells. Now think of there as being a cursor that points to one of these cells. When your program starts, it will be pointing at the first cell.

Now, back to the operators. The <> operators move the cursor to the previous cell and the next cell, repsectively.
[ signifies the start of a loop, and ] marks the end. When the end of a loop is reached, if the current cell is not equal to 0, the program goes back to the corresponding [ operator.
+ and - increment and decrement the current cell, respectively.
Finally, . prints out the ASCII character corresponding to the value of the current cell, and , takes a character as input and sets the current cell's value to the ASCII value of the input (e.g. 'a' -> 97).

Okay, so now why am I rambling on about this? I dunno. I'm kind of bored and I've been learning Python for some time now and yet I still kind of suck at it. It's either this or a Lifetime movie about an evil baby or something. Also, I find Brainfuck's simplicity fascinating. It's a very interesting language, even if it's almost entirely useless and a pain to write in.
So my goal is to write an interpreter in Python. I'm currently just in the planning stage, but here's what I've got so far:

The basics:
  • Take a file or string as input
  • Get rid of every character that is not a Brainfuck instruction, to make things easier
  • Check the string to make sure the code is well-formed, i.e. the braces are paired up correctly
Now, that's not that hard. And actually implementing the instructions properly shouldn't be either. The problem is that Brainfuck is not a rigorously specified language. How should I implement it?
  • What should the cell size be? Well, I think I'll allow for user-definable cell sizes. By default, cells will store Python's int type, which would also automatically switch over to long when the value becomes too large for int to hold (Python 3.0 doesn't make such a distinction; ints can be arbitrarily large). If this is the case, negative values would also be allowed. At least, I can't think of a reason not to. If a maximum cell size is specified, however, overflow will reset the cell to zero, and decrementing a cell of value zero will set it to its maximum value.
  • How many cells? User-definable, or unlimited? The latter would automatically create new cells when the need arises, and I'm thinking this is the most sensible solution.

I think that's about it for now. I'll post again after I make some actual progress.