Sunday, December 15, 2013

Scientific computing: Sharing is caring

You're currently sitting in front of a high-tech, superpowered computational machine. It's capable of running calculations in a second that would take the human brain hours or even days. What are you using it for? Sure, you write code and play high-end games, but do you really do that all the time? Chances are there are days where all you do is check email, stare blankly at Facebook, and browse /r/cats before getting up to make yourself lunch, leaving your browser open to a picture of a stranger's cat.

The best use of time

Doesn't that feel like a waste? It's like what Google employees complain about: your computer is capable of much more than you're actually telling it to do. It's overqualified for its cat-picture-displaying job. Wouldn't you rather use your laptop to its fullest extent? And maybe help science?

There are tons of projects where you can donate your unused CPU power to help scientists run demanding computations for their research. This is the basis of distributed computing, where computational power is divided between a network of computers; this gives the power of a supercomputer without the monetary or space requirements. Distributed systems are especially useful for scientific computing (also known as computational science), which is centered around constructing mathematical models of scientific problems. Scientific problems that could use your help.

Pictured: helping
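If you're curious how the work actually gets divvied up, here's a toy sketch (mine, not any real project's code) of the core idea: carve a big calculation into independent work units and hand them to whichever workers are free. Volunteer-computing projects do exactly this across thousands of strangers' machines; the sketch cheats and just uses the cores in one machine.

# Toy "distributed" computation: split a big job into independent work
# units and hand them to whatever workers are free. Volunteer-computing
# projects do the same thing across thousands of machines instead of
# one machine's cores.
from multiprocessing import Pool

def work_unit(bounds):
    """Stand-in for an expensive scientific calculation."""
    lo, hi = bounds
    return sum(x * x for x in range(lo, hi))

if __name__ == "__main__":
    step = 1_000_000
    chunks = [(i, i + step) for i in range(0, 10_000_000, step)]
    with Pool() as pool:                      # one worker per CPU core
        partials = pool.map(work_unit, chunks)
    print(sum(partials))                      # combine the partial results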

So now I've convinced you. You're all set to donate some spare PC power to a worthy scientific cause. Where do you get started? Well, for starters, Wikipedia has a whole list of distributed computing projects, which should give you enough of a jumping-off point. For those of you too lazy to change pages, here are some cool ones:

Electric Sheep: alright, this isn't exactly science. But it's pretty cool anyway. While your computer sleeps, it talks to other computers on the Electric Sheep network. Together, your computers share the work of rendering crazy, complex abstract shapes and fractals.

Cosmology@home: if you want a loftier goal for your computing, the people at Cosmology@home are working to find the model that best fits the Universe. You know. No big deal.

Einstein@home: keeping with the space theme, Einstein@home uses your computer's idle time to find neutron stars -- they've already found over three dozen and are supported by the American Physical Society.


Saturday, December 14, 2013

Computer graphics: Cast your rays

I'd be willing to bet money that you've played (or at least seen) some sort of generic dungeon-crawler game with high walls, repetitive textures, and a very basic 3D engine.

Looking at you, Dungeon Master

These games tend to end up with a cult following, but people generally aren't too impressed with the game itself; in a world where true 3D immersion is just baby steps away, blocky pseudo-3D dungeons aren't that cool. Not anymore, anyway. Go back about twenty years, though, and this was pretty high-level stuff. 

Remember the Wolfenstein games? You, the blocky hero, have to sneak through and out of a castle (or bunker), shoot Nazis, and save the day. The first two games, Castle Wolfenstein and Beyond Castle Wolfenstein, looked like this: 

I wasn't kidding when I said you were blocky
Then suddenly, in 1992, Wolfenstein 3D (the grandfather of 3D shooters) came out looking like this: 

Such depth!
It feels a whole lot like 3D (albeit unrealistic 3D), but it still ran on a 2D platform. All semblance of 3D in Wolfenstein 3D was faked using a technique called ray casting. Ray casting divides the map into a grid. Each entry on the grid can either be "wall" or "not wall." Then, for every vertical line in the player's field of vision, a ray is sent from the player out through that line until it hits a wall. 

Rays extend from the player (the green dot) to the wall (the blue squares)
The distance from the player to that wall is calculated, and that determines the displayed height of the wall (far-away things look little, etc.). The same principle applies to enemies -- the longer the ray from you to Herr Nazi, the smaller his sprite is scaled. There are a ton of subtleties to this (how do you determine how often to check for a ray-wall intersection? what about the field of vision?), and lodev has a very detailed, clearly written introduction to ray casting here that can clear up some uncertainty (that's also where I got my grid image). 
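To show how little code the core loop takes, here's a bare-bones sketch (mine, definitely not id Software's) that casts one ray per screen column through a grid map and turns the hit distance into a wall height. It marches each ray in small fixed steps instead of using the smarter DDA stepping that lodev describes, and it skips the fisheye correction, purely to stay readable.

import math

# 1 = wall, 0 = empty space
MAP = [
    [1, 1, 1, 1, 1, 1],
    [1, 0, 0, 0, 0, 1],
    [1, 0, 0, 1, 0, 1],
    [1, 0, 0, 0, 0, 1],
    [1, 1, 1, 1, 1, 1],
]
SCREEN_W, SCREEN_H = 80, 24
FOV = math.radians(60)

def cast_column(px, py, angle):
    """March a ray from (px, py) until it hits a wall; return the distance."""
    step = 0.02
    dx, dy = math.cos(angle) * step, math.sin(angle) * step
    x, y, dist = px, py, 0.0
    while MAP[int(y)][int(x)] == 0:
        x += dx
        y += dy
        dist += step
    return dist

def render(px, py, facing):
    heights = []
    for col in range(SCREEN_W):
        # One ray per vertical screen column, spread across the field of view.
        ray_angle = facing - FOV / 2 + FOV * col / SCREEN_W
        dist = cast_column(px, py, ray_angle)
        # Far-away walls look short, nearby walls fill the screen.
        heights.append(min(SCREEN_H, int(SCREEN_H / (dist + 0.0001))))
    return heights

print(render(px=1.5, py=1.5, facing=math.radians(30)))

Swap that final print for actual vertical strips of pixels and you have the skeleton of a Wolfenstein-style renderer.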

With ray casting, faking 3D on a 2D platform became easy: the implementation is straightforward, and the calculations are light enough that real-time ray casting could be done on a 286. Unfortunately, giantbomb.com is a buzzkill and points out all the things you can't do with ray casting:
  • Walls can be of only one height
  • All cells are of the same width and height
  • Walls must be one color per face
  • The camera is unable to rotate on the x-axis (look up or down)
  • There can be no cells above cells (e.g. the first and second floor of a building)
  • Walls must be either perfectly horizontal or perfectly vertical
Those dungeon games are making sense now, right? Walls of one height? Check. One color? Grey goes with everything. Can't look up or down? Don't need to in a dungeon; it'll just be ceiling and floor. Perfectly vertical walls? No reason to make your dungeon fancier than that. 

If you're thinking of making a simple game, dungeon crawler or otherwise, ray casting is a good technique to know. It's a simple implementation, so you can spend more time thinking about a plot that doesn't suck. You can use the info at lodev to get started, or there are video tutorials from Unity3DStudent and Ravi Ramamoorthi (check that one out just for his voice).

Or if you're really lazy, I think The Raycasting Game Maker does most of the work for you. 

Saturday, November 30, 2013

Communications security: Watch out for middlemen

Imagine a middle-school classroom. I know we all want to forget, but this will just take a moment. So little Suzie has a crush on little Johnny. She's shy because, you know, middle school, so she passes a note along the row of desks. It's a folded up note with "Johnny" written on the outside; the inside professes her adolescent love. Halfway through class, a note makes it back to her that says "Suzie" on the outside and "Ew gross you're a big booger brain don't talk to me again" on the inside (because, you know, middle school). Not looking so good for Suzie and Johnny, right?

Meanwhile, little Lucy is laughing to herself. Her desk lies between Suzie and Johnny, so the notes were passed through her. Without either of them seeing, she replaced the notes with ones she had written: Johnny got a note that said "You're yucky and have boy cooties so don't talk to me ever," and when he wrote back, heartbroken, his letter was altered to be about booger brains.

Lucy has basically just pulled off an analog man-in-the-middle attack. In cyber security, this is a type of eavesdropping where the Lucys of the Internet insert themselves into a communication between two parties. The attacker intercepts data from both sides, reading information never meant for them and inserting their own messages into the conversation.

The man in the middle breaks the normal flow of communication (source: veracode.com)

The attacker splits the TCP connection between the parties into two new connections: client-to-attacker and attacker-to-server. As you might have guessed, this is bad. What if you're trying to communicate with a bank server so you can transfer money (to pay for that privacy visor you've always wanted)? You send $24.99 for your purchase. If someone has man-in-the-middle'd you, they can not only grab all the passwords and account numbers you're using but also change the destination of your transfer to their own account and change the amount to $2,499.00.

Unfortunately, this kind of attack is really difficult to prevent. Any time you're trusting someone who's not you with confidential information, you run the risk of having that information misused. Using HTTPS or a VPN can greatly reduce the chance that someone will insert themselves into your conversation: HTTPS uses your browser's SSL (Secure Sockets Layer) support and verifies the identity of the servers you're connecting to, while a VPN tunnels your traffic through an encrypted connection to a trusted access point. It is, however, possible for men in the middle to intercept HTTPS connections. In this case, the user's browser generally gives a warning.

You've probably seen one of these before

And the user often ignores that warning because they don't know better. Admit it: you've clicked "proceed anyway," haven't you? I hope you didn't give your credit card information...
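For the programmers in the room, that "proceed anyway" button has a code equivalent. Here's a minimal sketch using Python's third-party requests library (assumed installed: pip install requests); certificate verification is exactly the part of HTTPS that catches a man in the middle waving a forged certificate at you.

import requests

url = "https://example.com/"   # stand-in for your bank, obviously

try:
    # verify=True is the default: requests checks that the certificate
    # chains to a trusted authority and matches the hostname.
    requests.get(url, verify=True, timeout=10)
    print("Certificate checked out; probably talking to the real server.")
except requests.exceptions.SSLError:
    # This is the programmatic version of the browser's scary red page.
    print("Certificate didn't verify -- possible man in the middle. Stop here.")

# requests.get(url, verify=False)  # <- the moral equivalent of clicking
                                   #    "proceed anyway". Don't.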

But besides the fact that we should know about Bad Things People Can Do (tm), why do we care? As long as we stay away from sketchy sites and use secure connections, we're probably going to be okay, right? Well let me ask you something: do you use an iPhone? A lot of people do. There's a slight problem, though: last month, Skycure announced to the RSA Europe conference that many iOS apps are vulnerable to a man-in-the-middle type of attack that "lets the attacker take dynamic control over the app." Again, when you think of financial data being sent and received through an app, this is bad news.

And it's not just people hacking your apps. Since Apple technically has the encryption information for its iMessage protocol and stores the public and private keys for each user, they can basically have a man-in-the-middle eavesdropping party any time they (or a friendly government agency) want to.

And speaking of friendly government agencies, they're pretty well-versed at being middlemen. The NSA's Quantum servers are placed strategically enough to run man-in-the-middle interceptions of traffic to Google services.

The Snowden / NSA / communications security blowup has really brought this sort of eavesdropping attack to the forefront of everyone's mind. Hopefully this will mean more development in actually-secure communications, but I wouldn't hold your breath. If it can be built, it can be broken.

Also, since mentioning the NSA has probably put me on some sort of list, I'd like to say hello to whatever bored government agent has been told to read this over and make sure I'm not a terrorist. You should check out my other posts; you don't guest-star in any of them, but they might be a nice change from having to read every Facebook post that calls someone a terrorist.

Thursday, November 28, 2013

Artificial intelligence: Coming soon to a stage near you

Some things are naturally polarizing; people tend to have very strong opinions about things like jazz music, operating systems, and the cats-vs-dogs debate. I know I can't do much to convince you that cats are simply better, but I think I found something that will appeal to both sides of the jazz debate: GenJam.

GenJam is a very special jazz musician; it's a really clever computer program that improvises jazz music in real time. Its name is short for Genetic Jammer: it jams with RIT professor Al Biles using musical phrases developed with a genetic algorithm. It's probably my favorite use of AI because 1) it's impressive that it works at all, 2) it's not a pretentious project, and 3) I don't really know many uses of AI, so my selection pool is pretty small.

AI mostly just makes me think of robots, but everyone knows about robots so that's not as interesting

GenJam was developed by Biles in 1993 and has been consistently improved and advanced ever since. It is now capable of improvising over more than 300 songs (including a bunch of Christmas music, in case you need to book an act for an upcoming Christmas party). If you want to see the kinds of things it's capable of, check out this video of Biles and GenJam having an improv session. Those solos weren't preset in GenJam's memory; it had to learn how to create a good solo and use those techniques on the fly, the same as any human musician. 

Before talking about how GenJam learned to play jazz, we should talk about genetic algorithms, since that's how GenJam does its learning and it's probably not something you've seen before. AI-Junkie has a fantastic introduction to genetic algorithms if you're interested in the details, but chances are if you're reading this it's for a grade and not because you actually care about what I have to say. So I'll paraphrase the important parts. All credit for the information goes to AI-Junkie, though. 

Unless you slept through seventh grade, you probably know how genes work in nature: mommy and daddy have separate DNA with slightly different chromosomes; these chromosomes can cross over and create new chromosomes for their offspring that are combinations of genes from both parents. The offspring then mate and mix up their own DNA and novel populations are created. 

Chromosomes cross over and mix DNA. This looks like a bad MS Paint drawing but it's apparently an official McGraw-Hill image. I should apply to their graphics department.

Mutations occasionally happen, changing one or two G's to C's and so on. Sometimes this causes a noticeable change in the organism; sometimes it doesn't. Some of the changes that occur (both through mutations and mating) give the organism a slight advantage: they can run faster, jump farther, and make cuter faces at humans. All of these help them survive and give them a better chance of passing on their genes to another generation, so over time the most helpful genes tend to survive while the less helpful ones become much less common. 

This exact process is modeled in genetic algorithms. "Chromosomes" are the possible solutions to a problem. They're encoded as strings of several "genes," each representing a part of the solution (for example, trying to find all English words that can be made with five letters? Your "chromosomes" will be five-letter combinations and each "gene" will be a letter). These chromosomes can cross over and swap genes. You start with a population of n chromosomes made up of random genes. For each chromosome, you see if it's any good at solving your problem and give it a "fitness score" assessing its usefulness (sticking with the five-letter-word example, any chromosome / word with only consonants or only vowels isn't likely to be a word and will have a super negative fitness score. Things with impossible consonant combinations will have a slightly less negative score. Actual words will have the highest possible positive score). You breed your chromosomes so that ones with higher fitness scores are more likely to reproduce and ones with lower scores are more likely to die off (so ZARPB will breed since it's sort of word-like, but ZXSDF should get axed pretty early). You keep doing this until you decide you're done, at which point you should have found a bunch of solutions to whatever problem you're working on (in our example, we could stop once we've covered all 26^5 possible chromosomes and keep the ones with maxed-out fitness). 
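If you'd rather see that as code than as a paragraph, here's a minimal sketch of the five-letter-word toy problem: a deliberately dumb fitness function, a tiny stand-in dictionary, and the score / select / cross over / mutate loop. It won't rediscover the Oxford English Dictionary, but the loop is the whole point.

import random
import string

WORDS = {"house", "mouse", "crane", "zebra", "pizza"}  # toy stand-in dictionary
VOWELS = set("aeiou")
POP_SIZE, GENERATIONS, MUTATION_RATE = 200, 300, 0.05

def fitness(chromo):
    """Score word-likeness: real words max out, all-vowel/all-consonant strings tank."""
    if chromo in WORDS:
        return 100
    n_vowels = sum(c in VOWELS for c in chromo)
    if n_vowels == 0 or n_vowels == 5:
        return 1               # ZXSDF and AEIOU territory
    return 10 + 5 * n_vowels   # vaguely pronounceable gets a middling score

def crossover(mom, dad):
    """Single-point crossover: splice the front of one parent onto the back of the other."""
    point = random.randint(1, 4)
    return mom[:point] + dad[point:]

def mutate(chromo):
    """Occasionally flip a letter-gene to something random."""
    return "".join(random.choice(string.ascii_lowercase)
                   if random.random() < MUTATION_RATE else c
                   for c in chromo)

def select(population):
    """Tournament selection: fitter chromosomes are more likely to breed."""
    a, b = random.sample(population, 2)
    return a if fitness(a) >= fitness(b) else b

population = ["".join(random.choices(string.ascii_lowercase, k=5))
              for _ in range(POP_SIZE)]

for _ in range(GENERATIONS):
    population = [mutate(crossover(select(population), select(population)))
                  for _ in range(POP_SIZE)]

best = max(population, key=fitness)
print(best, fitness(best))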

Really, though, just read the intro from AI-junkie. It'll make more sense than I ever could. 

For those of you who like pretty pictures, genetic algorithms basically go like this: 

They seem much easier in flowchart form...
Still with me? Maybe? Okay, cool. That's the general idea behind genetic algorithms; here's how GenJam applies it. (Note: all of the following information is from the original 1994 paper announcing GenJam to the scientific community. More on that in a bit.) GenJam has two separate populations to build up: one of individual measures and one of four-measure phrases. It's not trying to select one perfect solo; instead, it's trying to get a feel for what sorts of measures and phrases work well for the song GenJam is learning. Fitness scores for measures and phrases are determined when GenJam plays them to a "mentor," who gives feedback by pressing "g" for good and "b" for bad. Any given phrase or measure can be "upvoted" or "downvoted" more than once, so particularly bad measures can be given a score of "bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb" and hopefully never be played again. There are several ways of mutating measures and phrases, including sorting the notes and reversing their order. With the mentor's feedback, GenJam learns how to create good solos for each song in its repertoire. 
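To make the mentor idea concrete, here's a toy sketch of feedback-driven fitness (emphatically not Biles's actual code): play each candidate measure, collect "g" and "b" keypresses, and nudge its score accordingly.

import random

# Toy interactive fitness, loosely inspired by GenJam's mentor feedback.
# A "measure" here is just a list of eight scale-degree numbers.
population = [[random.randint(0, 14) for _ in range(8)] for _ in range(10)]
scores = [0] * len(population)

def play(measure):
    # Stand-in for actually sounding the notes over MIDI.
    print("playing:", measure)

for i, measure in enumerate(population):
    play(measure)
    feedback = input("g = good, b = bad (e.g. 'ggb'): ")
    # Every 'g' nudges the measure's fitness up, every 'b' nudges it down,
    # so a wall of b's buries a measure and it's unlikely to breed again.
    scores[i] += feedback.count("g") - feedback.count("b")

# One GenJam-style mutation: replace the worst measure with a sorted copy
# of a randomly chosen survivor.
worst = scores.index(min(scores))
population[worst] = sorted(random.choice(population))

print(list(zip(scores, population)))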


The basic layout of how GenJam works

To actually play, GenJam is first given a file that tells it the tempo of the piece, the style (straight or swing), the chord progression, and what's happening in the song [PDF warning] (for example, "four bars intro, Biles solos for eight bars, GenJam solos for eight bars," etc. -- check out slide 14 from that link to see what I'm talking about). It also gets the MIDI tracks for any background music (drums, piano, bass, etc.). During a performance, GenJam and Biles listen to each other and improvise on the fly, like two human musicians would do at a jam session.

Biles and GenJam jamming

All the information about GenJam's specific implementation (and the source of the information in the last few paragraphs) is here in the paper Biles wrote for the 1994 International Computer Music Conference. It really is worth checking out: it's extremely accessible, especially as far as scientific papers go, and is an honestly interesting and engaging read. 

Biles did a TEDx talk last year about GenJam, if you'd rather watch and listen instead of read:



This, I think, is one of the coolest uses of artificial intelligence. Not because it's super technologically relevant or useful (I'm firmly in the "the world doesn't need more jazz music" camp) but because music is something we consider uniquely human. And a computer is learning it. It's a different kind of scary than we're used to, and it makes you think. Which, really, is the whole point.

__

Credit where credit is due: I would never have known about this project if one of my professors wasn't really into computer music and always excited to share cool links with the class, so stop by Computer Music Blog and check out Professor Merz's stuff.

Saturday, November 23, 2013

Computer science: the prequel

So I've skimmed a few blogs that have already been written and it's weird to see how many of them are just about C. I mean, yeah it's been incredibly influential and is the precursor of at least a dozen languages, but it's hardly a starting point. It's like writing a biography about Lincoln when you've been asked to summarize the history of the United States. I'm going to take you back a little farther, to the great-great grandparents of computer science.

Hopefully you've heard of Ada Lovelace and Charles Babbage; if not, I'm honestly disappointed in you. It's like being a music major and never having heard of Bach. These two invented computer science back when the lightbulb was still a new and exciting development in technology.

Wow. Such light. So patented.

Charles Babbage is known today as "the father of computing," and for good reason. Babbage, like any other mathematician of his time, relied on mathematical tables for his computations. These tables included the values of logarithms, exponents, and other calculations that we now rely on calculators to solve.

"Table of common logarithms." Instead of running a calculation through a calculator, your great-great-granddad looked it up in a book like this. 

Every calculation in one of these books (and you can see from the picture that there are thousands of them) was done by a computer. Not a computer like the one you're using, but a human computer. That's the origin of the word -- a "computer" in the nineteenth century was a person who did computations for a living. If that sounds like a massive pain to you, you're right: it took a group of three people six months, working nearly nonstop, to do the calculations to predict the return of Halley's comet. Babbage thought there had to be a better way. In 1821 he conceived of a machine to autonomously compile mathematical tables for polynomial functions; he called this the Difference Engine and quickly sought government funding. Though he initially received some money to work on the project, insufficient funds and personal conflicts with his engineer led to the abandonment of the Difference Engine in 1834. About fifteen years later, he designed a revamped engine (creatively named the Difference Engine No. 2) that would use fewer parts and could calculate seventh-order polynomials with numbers up to thirty-one digits long. Babbage made no effort to actually construct this engine, but the London Science Museum has since built two working Difference Engines, the first of them completed in 1991. You can see a video about how they work here, or you could take fifteen minutes in a car and go see one yourself, since the Computer History Museum (yes, there is one; it's in Mountain View) has one of those two engines.

And it's absolutely gorgeous.
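If you're wondering what the engine actually mechanized, it's the method of finite differences: seed a polynomial's starting value and its differences, and every later table entry falls out of pure addition -- exactly the kind of thing you can build out of gears. A quick sketch with a made-up quadratic:

# Tabulate f(x) = 2x^2 + 3x + 1 the Difference Engine way: seed the
# initial value and its differences, then generate every later row
# using nothing but additions -- no multiplication anywhere.
def f(x):
    return 2 * x * x + 3 * x + 1

# Seed values: f(0), the first difference f(1)-f(0), and the second
# difference (constant for a quadratic). A human computed these once.
value = f(0)                         # 1
d1 = f(1) - f(0)                     # 5
d2 = (f(2) - f(1)) - (f(1) - f(0))   # 4, stays constant

for x in range(10):
    print(x, value)
    value += d1    # each table entry is the previous one plus a difference
    d1 += d2       # and the difference itself is updated the same way

# prints the same numbers as [f(x) for x in range(10)]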

But wait, there's more! Despite everything your algebra II teacher may have told you, there's more to life than polynomials. Babbage figured that out somewhere between Difference Engines one and two, and began conceptualizing an all-purpose programmable computing machine. This machine was called the Analytical Engine (creative naming wasn't really a thing in the 1800s) and is what really earned Babbage his title as the first computer pioneer:

It was programmable using punched cards, an idea borrowed from the Jacquard loom used for weaving complex patterns in textiles. The Engine had a 'Store' [a cache!!] where numbers and intermediate results could be held, and a separate 'Mill' [a CPU!!] where the arithmetic processing was performed. It had an internal repertoire of the four arithmetical functions and could perform direct multiplication and division. It was also capable of functions for which we have modern names: conditional branching, looping (iteration), microprogramming, parallel processing, iteration, latching, polling, and pulse-shaping, amongst others, though Babbage nowhere used these terms. It had a variety of outputs including hardcopy printout, punched cards, graph plotting and the automatic production of stereotypes - trays of soft material into which results were impressed that could be used as molds for making printing plates. [text source here, but the link and notes are mine]

But hardware's no good without software, right? That's where Augusta Ada King, Countess of Lovelace -- more commonly known as Ada Lovelace -- comes in (you've probably read her dad's poetry). Lovelace was a longtime friend of Babbage and took a great interest in his work. When the Italian mathematician Luigi Federico Menabrea published a paper in French (not Italian; it's weird) about Babbage's Analytical Engine, Babbage asked Lovelace to translate the paper into English and to add her own notes. The two collaborated for a year on a series of notes that ended up longer than the paper itself. In these, Lovelace included step-by-step sequences for solving several mathematical problems. These were the world's first programs written for a general-purpose computer. And they were written in 1843. Seriously. The first computer programs were written while the Oregon Trail was the new big thing.

Lovelace fully understood the impact the Analytical Engine could have on the world. She toyed with the idea of an engine that could be fed any series of characters, not just numbers, and have rules to manipulate those characters. Everything, Lovelace recognized, could ultimately be expressed by numbers: Numbers could represent "[letters] of the alphabet. They could represent musical notes. They could represent positions on a chess board" [source]. Electronic instruments hadn't even been invented yet, and Lovelace was imagining computer-generated music. She's been called the "prophet of the computer age," and that title is well-deserved. 

Computer history long predates computers. Thinking that all the major developments came with the C language and its syntax completely ignores the beauty of a programmable mechanical computer and the incredible foresight that predicted developments over a century before they were ultimately realized. 

If you got this far, here's a link to a webcomic that I found about Lovelace and Babbage, which I haven't read yet but seems to not suck. 

Tuesday, November 12, 2013

File sharing and the music industry

The idea of sharing has completely evolved in the last few decades. For thousands of years, sharing something with someone meant you had to physically give them something of yours so your friend (or sibling) could enjoy it, too. With computers, though, everything is different: I can share my music without ever giving up my own copy. It's a fantastic system that's allowed my friends and me to show each other obscure bands we would never have known about otherwise. The issue, though, is that this is technically illegal. Music is, after all, copyrighted, and I don't have the legal right to copy an album to a flash drive to show a friend.

Ever since Napster showed up in 1999, peer-to-peer file sharing has skyrocketed. This is great for regular people like you and me, since we now have this weird endless world of free media at our digital fingertips -- it's estimated that 1.2 billion songs were illegally downloaded in 2010.

And the US is number one for downloading. USA! USA!
Naturally, the people who could be making money from all this music aren't happy. The Recording Industry Association of America (RIAA) insists that illegal music downloads have caused $12.5 billion in losses to the American economy and the loss of 70,000 jobs. Sounds pretty bad, right? Legal-minded people certainly seem to think so, too, since they've developed a track record of severe punishment for what seems like a minor offense. When LimeWire was taken to court, the legal battle lasted five years and ended in a $105 million settlement. People caught illegally downloading music have been forced to pay up to $22,500 per song when taken to court. (Interesting note about that one: the jury decided on a $675,000 total fine for the defendant -- that's the $22,500-per-song part -- and the presiding judge thought a fine that enormous might be unconstitutional, presumably because it's pretty cruel and unusual. He suggested it be lowered to $67,500 -- $2,250 per song, so much more reasonable -- but the First Circuit Court of Appeals reinstated the original fine.) More recently, a major supplier of YouTube videos is being sued because people are uploading unprofessional covers of popular songs.

Better watch out, Pentatonix

This might seem a little much, but pirates are ruining our economy, right?

Aren't they?

Turns out they might not be.

While there are conflicting reports on the effect that illegal music downloads have had, one thing is certain: music sales are just not how artists make money any more. A lot of this is probably because it's really hard to make money selling music in a digital age (and digital music sales are here to stay, piracy or no piracy). Money mostly comes from merchandise and tour tickets which, hopefully unsurprisingly, pirates pay for. This study [PDF warning] notes that filesharing has greatly increased consumption of recorded music and has made music more widely accessible -- definitely a social advantage for the artist, as it makes it much easier to create a fanbase if more people are familiar with your work. There is also evidence that, while sales of recorded music have fallen greatly, concert ticket sales and other sources of revenue have grown in the last several years.

As record sales fell, concert sales grew rapidly

But what's the bottom line? Is piracy ruining America or creating new fans who will be more than willing to pay $40 for a t-shirt at a show? Here's my take: if you were never going to pay for a song in the first place, but someone emailed it to you or you downloaded it for free on a whim, you haven't hurt anyone. The RIAA would never have gotten your money anyway. But maybe now you'll go out and buy the album, or a poster, or an expensive concert ticket. Maybe you won't. They're either exactly where they would have been or better off. The band benefits regardless, since they either get money or exposure (and exposure is how you get people to like you enough that they'll give you money).

Anecdotally, I can say that piracy can definitely lead to an increase in revenue. I heard about a band from a casual mention in the newspaper once. Not wanting to commit money to something I might hate, I downloaded their albums illegally. It turned out that I really liked them and have since seen them in concert at least four times and bought a fair amount of merchandise. This probably adds up to at least $300, when buying their music on iTunes would have cost $60, probably only $25 of which would have made it to the band.

This isn't to say I only ever get music illegally. I actually pay for music. I just don't pay full price. There are plenty of places (many of them shady and Russian) that host legal mp3 and movie downloads for about a dollar per album (maybe four for a huge collection) and about five dollars for a movie, and I'm more than happy to pay that. If media wasn't so expensive, fewer people would resort to illegal means to get it. It's like this song that a friend gave me in high school: "you've overcharged us for music for years, and now we're just trying to find a fair balance." Basically, the RIAA and MPAA have done this to themselves.

Sunday, November 3, 2013

Data structures: They're pretty important

I don't know why, but there seems to be this trend where people sleep through CS 146. I've noticed it myself and I've heard stories from other people, so this is definitely a thing. First of all, I'm not sure how it's even possible to sleep when Dr. Taylor is running around being excited about Dijkstra. Secondly, though, this is just bad -- that class is probably the most important one in the CS department. Algorithms and data structures are what people want you to know. While I'd love to discuss the intricacies of shortest-path algorithms with you, the topic for today is data structures. I'm not going to pretend that decision is because I find them more interesting than algorithms; it's more because I don't pick the topics for this blog. I don't have that kind of power.

So. Data structures. If you're one of those people who slept through Dr. Taylor's class (you can keep it a secret, I swear), let's review. Data structures are, probably unsurprisingly, different ways to store data -- the structure that your data has, if you will. The key thing here is to make sure you're storing data efficiently; the point of a data structure is optimization. Some data structures are better for databases, for example, while others are great for looking up information. This efficiency is a big deal: "efficient data structures are a key to designing efficient algorithms. Some formal design methods and programming languages emphasize data structures, rather than algorithms, as the key organizing factor in software design" [thanks, Wikipedia]. Entire programming languages are structured around data structures. And you slept through that class. Way to go, hero.

Having the right data structure can make a huge difference. For example, suppose you had a bazillion items that you needed to search through. What if you stored all that information in an array? Arrays are simple, space-wise, and easy to use. But then when you have to search for your thing, it's going to take O(n) time to find it. And that's not just the worst case; that's the expected runtime. That's bad, especially when you're working with numbers like a bazillion. Instead, what if you got all fancy and used a B-tree? Clearly, you passed that homework assignment in 146. Instead of some multiple of a bazillion, your search is going to take O(log n), and the log of a bazillion is a lot smaller than a bazillion.
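Here's a rough way to feel that difference on your own machine: scan an unsorted list for an item, then keep the data sorted and binary-search it instead (the same logarithmic behavior you get from tree-backed structures). The timings are whatever your laptop says; the gap is the point.

import bisect
import random
import time

N = 1_000_000
haystack = [random.randrange(N * 10) for _ in range(N)]
needle = haystack[-1]

# O(n): scan the unsorted list until we stumble over the item.
start = time.perf_counter()
found_linear = needle in haystack
linear_time = time.perf_counter() - start

# O(log n): keep the data sorted and binary-search it instead.
sorted_haystack = sorted(haystack)   # one-time cost, paid up front
start = time.perf_counter()
i = bisect.bisect_left(sorted_haystack, needle)
found_binary = i < N and sorted_haystack[i] == needle
binary_time = time.perf_counter() - start

print(f"linear scan: {linear_time:.6f}s, binary search: {binary_time:.6f}s")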

If you need a good cheat sheet for data structure runtimes (expected and worst-case) under various situations, there's a website helpfully called Big-O Cheat Sheet.

You might want to bookmark this

If you take nothing else away from this blog (and let's face it, chances are pretty good that if you're reading this you don't actually care what I have to say about data structures), at least remember how important it is to have the right data structure for your programs. It can make or break your performance. This post on Forbes.com does a great job of showing that, even though you passed the class, you don't properly know your data structures. To help with that, Coursera has an algorithms / data structures class and Berkeley has posted lectures online.

You have no excuse now. Go learn things. Since you'll be working so hard to get interviews, you might as well make sure you have a chance at passing them.

Saturday, October 26, 2013

Hacker: You keep using that word...

So. Tell me about hackers. If you're like most people, you think of them as enemies who have malicious intent to get into a computer system (possibly yours, probably that of someone richer and more powerful than you). Maybe you think they're greasy-haired, goth louts who are spending too much time in front of a computer screen (such poetry!). These hackers want to exploit your system for personal gain, and you will be the victim.
Basically this guy
Then there are people who say that "hacking" is getting into someone's Facebook account (because they forgot to log out) and posting a status that says "I am so gay lol lol lol lol ;) " If you think this is hacking, you're probably twelve. I'm sorry; I thought you knew. 
This is you. Sorry.
It's not exactly surprising that most people think of hacking as something dangerous and hackers as something to be feared; there is no shortage of evidence that malicious infiltration of a system can cause extensive damage. There's James Jeffery ("Pablo Escobar"), who stole patient data from the British Pregnancy Advisory Service. Then we have the Melissa virus, created by David L. Smith ("Kwyjibo"), which crashed email systems in the late 90s. The 414s illegally accessed information from high-security banks and laboratories. And let's not forget LulzSec's 2011 attack on Sony, which compromised confidential user information. The list goes on and on.

These are clearly Bad Things. What would your mother say if she knew you were breaking into other people's property and stealing their stuff? It wouldn't be good. So why can't we just say that hackers are bad and be done with it?

Because that's not the end of the story.

The hackers described are what's generally known as "black hat" hackers, or hackers that "violat[e] computer or Internet security maliciously or for illegal personal gain" [source]. Where there are black hats, though, there must be white hats. White hat hackers are security experts hired to find flaws in the security systems in place around websites or computer networks. The word "hired" here is important: you know this hacker will be getting into your stuff and you've explicitly given them permission to do so. Otherwise it's technically still illegal. White-hat hacking has been such a valuable tool to security systems that you can now become a Certified Ethical Hacker and make hacking your profession. 
See how professional Philippe looks?
White-hat hackers have been in the news a fair amount recently. A few months ago, Charlie Miller and Chris Valasek showed Toyota and Ford ways that their cars could be hijacked with a laptop:
... they devised ways to force a Toyota Prius to brake suddenly at 80 miles an hour, jerk its steering wheel, or accelerate the engine. They also ... can disable the brakes of a Ford Escape traveling at very slow speeds, so that the car keeps moving no matter how hard the driver presses the pedal.
The team actually released the code [article, not the code itself, sorry] for this project at this year's DefCon, an annual hacking conference held in Las Vegas. 

More recently, a different team has helped expose vulnerabilities on Wall Street that could allow someone to force a market to crash. 

So now we have good hackers and bad hackers. Dichotomy makes things easy, right? Except there's a whole spectrum between. Grey hat hackers, as you might expect, fall somewhere in between black and white hats. A good example of a grey hat hacker is someone who casually tries to break into websites (without permission), but then contacts the owners saying, "Hey, you have a security vulnerability here and here. If you want, I could fix it for you..." and probably adding in, "...for a price." This is weird on an ethical scale. On the one hand, they still broke in illegally and are basically ransoming knowledge; on the other, they chose not to exploit your website and are offering to help. 

I'd also like to propose that there are shades of grey; not all grey hats are the same color (make a Fifty Shades reference here and I swear I'll block you from this blog). That guy up there? Probably a fairly light shade of grey. A little lighter if he fixes things for free, a little darker if he charges a fee (unintentional rhymes happen). 

Then you have groups like Anonymous.
You've seen them before.
Anonymous, a group of internet ne'er-do-wells with very loose organization and no real leadership, spans the entire range of the grey spectrum. They don't do things legally. Ever. (At least not online. Peaceful protests in person are perfectly safe. Unintentional alliteration happens, too.) And a lot of what they do -- like bringing down or defacing websites -- falls solidly in the Bad Things category of pastimes. Still, though, I can't bring myself to label them as black hats. Throughout everything they've done, and they've done a lot, they seem to have their weird, twisted, lulz-soaked hearts in the right place. They've helped expose Chinese hackers who worked against the US. They brought down child pornography websites and made user information publicly available. Anonymous is a group of weird people who do bad things for (generally) good reasons. 

So where does that leave us? Is a hacker someone who crashes your website and steals your information? A professional you hire to protect yourself? An internet junkie with ambition and too much time on their hands? 

Yes.

A hacker is all of these things. The word can mean just about anything, and that makes it completely useless. We wouldn't need terms like "black hat," "white hat," and "hacktivist" (which is a stupid word; please please please don't actually say it) if the term "hacker" had any meaning on its own. So next time someone starts getting all smart and starts talking about "hackers" as if they were all the same entity, go ahead and tell them 
Inigo Montoya you keep using that word

Sunday, October 13, 2013

The free software movement: E-hippies who want to save the world

Let's say you go to a garage sale and buy a chair.


What a nice chair.
It's a nice chair and everything, but after a while you realize you have too many chairs. But you have this massive dog who likes his food bowl up high. So you do the natural thing and turn the chair into a dog bowl holder.

You're welcome, Rover.
But then you're moving to a new city and you really can't be bothered to bring the chair-slash-bowl-holder. So you sell it at your own garage sale. The person who buys it doesn't have a dog but likes to garden. So they tweak your design a bit and now have a nifty flower pot holder.

Bear with me: yes, you're still reading a CS blog.
And everyone's happy, right? You go about your merry way and everyone who sees the chair tells you how clever you are and it feels pretty great.

Now let's say you buy some fancy program. It does almost everything you want it to, but it needs one or two more features to be perfect. No matter; you're a CS superhero and those features really wouldn't be that hard to implement, so you do your hacky magic and have the ideal software. You send it to some friends who all say you're very clever, and you feel all warm and fuzzy inside.

Until the cops show up at your door asking you about copyright infringement.

This is where owning software is different from owning anything else. Even if you buy it and have it and it's "yours," you don't own it. You don't have complete freedom. And that seems a little messed up: we live in a culture where "once you buy something, you own it" is pretty much policy (stereotypical American, right? "YOU CAIN' TELL ME WHAT TO DO WITH MAH PROPERTAY").

That's where the free software movement comes in.

The free software movement (which was officially founded by Richard Stallman in the eighties with the launch of the GNU project) isn't about making sure you never have to pay for software ever again (though I feel like we're definitely tending that way as a society, and as a broke college student I'm totally okay with this). Instead, the movement promotes four "essential" freedoms for users:
(0) freedom to run the program,
(1) freedom to study and change the program in source code form,
(2) freedom to redistribute exact copies, and
(3) freedom to distribute modified versions.
You'll notice that none of that says "users should be free from having to spend money for a collection of ones and zeroes"; when you see "free software," it's free as in speech, not necessarily free as in beer. Sorry.

Stallman himself has terrible-quality webcam videos explaining the philosophy behind free software: (CS celebrity alert!)



At the heart of the free software movement is the idea of copyleft, a "general method for making a program (or other work) free, and requiring all modified and extended versions of the program to be free as well."
To copyleft a program, we first state that it is copyrighted; then we add distribution terms, which are a legal instrument that gives everyone the rights to use, modify, and redistribute the program's code, or any program derived from it, but only if the distribution terms are unchanged. Thus, the code and the freedoms become legally inseparable.
Proprietary software developers use copyright to take away the users' freedom; we use copyright to guarantee their freedom. That's why we reverse the name, changing “copyright” into “copyleft.”
Copyleft is a way of using the copyright on the program. It doesn't mean abandoning the copyright; in fact, doing so would make copyleft impossible. The “left” in “copyleft” is not a reference to the verb “to leave”—only to the direction which is the inverse of “right”.
You can (and should!!) read about copyleft here and here, and there's also a whole series of pages about the underlying philosophy of the free software movement.
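In practice, copyleft shows up as a license notice at the top of each source file plus a copy of the license shipped alongside the code. The header below is a paraphrased sketch of the kind of notice the GPL's own instructions suggest -- if you actually release something, copy the exact wording from the license text itself rather than from a blog.

# frobnicator.py -- part of a hypothetical copylefted project.
#
# Copyright (C) 2013  Your Name
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# It is distributed in the hope that it will be useful, but WITHOUT ANY
# WARRANTY; see the GNU General Public License for details. You should
# have received a copy of the license along with this program; if not,
# see <https://www.gnu.org/licenses/>.

def frobnicate(x):
    # The notice above is what keeps this function -- and any modified
    # version someone redistributes -- free for the next user.
    return x + 1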

The origins of the free software movement and copyleft are described in the documentary Revolution OS, which is much more than "a documentary about Linux." And it's freely distributed (because, really, it would be weird if it wasn't), so you have no excuse not to watch it if you haven't already. You can torrent it here or watch the whole thing on YouTube here.

Really, this should be required viewing to pass 46A
Free software has done a lot for us. Apache's web server is community-developed and free to use, and it serves more than 60% of all websites whose server software we know. OpenOffice and LibreOffice are free replacements for Microsoft's Office suite. Adblock Plus makes the Internet a less annoying place. I use Notepad++ and GCC to write and compile my C code. I've used Anki to study. I think you get the idea. The free software movement has produced some fantastic code, and since projects have an entire community behind them, they continue to evolve and create a more diverse and robust collection of software.

So stop hogging all your code to yourself. You're not helping anyone. You jerk.

Saturday, October 5, 2013

Agile: All the cool kids are doing it

I have a friend, and you probably do too, who, once they decide they like something, is convinced it's absolutely critical that I like it too.

"This is really great!"

"No, really, you should check it out!"

"I really think you'd like it."

Okay. Yes, I probably would like Game of Thrones. It sounds like the kind of thing I'd enjoy. But since you tell me this every time I see you, I have lost all interest. Done. Over. Don't bring it up again. Please.

This is basically how I feel about Agile development. It really does sound great: for the uninitiated, Agile is a type of software development strategy that focuses on iterative development and client feedback. But those are buzzwords, and buzzwords don't mean anything. In English: the idea of Agile development is that you do things one piece at a time. Pick one specific part to work on during this period (called a sprint) and set a deadline for it. Make sure it works, make sure the customer likes it. Tackle another piece of the puzzle. The important thing here is that you have a working piece of software at the end of each sprint; it just gets fancier and gains more features as time goes on [source]. Clients help shape the development, so if your client suddenly changes their mind about a feature, they'll be letting you know during development and not when you hand them the finished product (no one wants to hear "thanks for that thing you made me, but I actually needed it to do this").

Agile in pictures. Thanks, Wikipedia


There are many development methodologies that fall under the Agile umbrella, but the most popular is the Scrum framework. (It's a stupid word, I know. It sounds disgusting. It sounds like scum. Let's be adults about this.) Scrum (not scum) focuses on daily face-to-face meetings with all team members, discussing plans and progress. Tasks are kept in a prioritized list based on each task's importance, difficulty, and required time. They're typically formatted as "user stories": As a <user type> I want to <do some action> so that <desired result> [Thanks Wikipedia]. No one gets left in the dark about progress, and things get done. (That entire paragraph, really, is thanks to the Wikipedia Scrum page. Check it out. It's honestly the clearest description you'll find of Scrum techniques.)

The other thing with Agile development is that everyone uses it. Everyone. In 2000, about 1% of companies used Agile. It really wasn't a thing that people did. Now? 60-80% of software developers use Agile tactics. Huge names, too: IBM has a page about Agile development and Cisco has been Agile since at least 2011 [PDF]. And it's not just software: universities, militaries, and even car manufacturers are integrating scrum techniques into their workplaces. Even NASA was Agile-ish for their Ensemble project.

So what's my issue with Agile? Really, it's just that everyone is talking about Agile. Everyone loves Agile. Agile this, Agile that, Scrum Scrum Scrum. And it gets old. It sounds like buzzwords and marketing and company Kool-Aid. And it's really off-putting to be told by everyone ever how great Agile is and how much I'll love it. I'm having Game of Thrones Syndrome with Agile. Keep telling me how much I'll love it and I swear to you I'll never take it seriously.

Watch, first job at a tech company and I end up a diehard Agile fan. Don't you dare tell me you told me so. 

Friday, September 20, 2013

LinkedIn and branding: unless you don't want a job

It's hard to get a job these days. Everyone is looking but no one is hiring. It's the kind of job market where you think "maybe I'll hold onto that retail job after graduation, just in case." 

Or maybe you should update your LinkedIn profile.


You've heard of it before, I swear


Okay, I know. People always say "make sure your LinkedIn profile is up to date! This is really important! If you don't, you'll never get a job and you'll starve on the streets!" and it seems a little over-dramatic. I mean, chances are pretty good you won't starve (you have that retail job to fall back on, remember? You love working retail!). But it turns out they're onto something with that advice: companies care about LinkedIn a little more than you might think.

Like a lot more.

Like they probably won't hire you if you don't have one.

Ed Nathanson, the director of talent acquisition (read: the guy in charge of getting you hired) at software security company Rapid7, talked to Wired magazine about the importance of LinkedIn in hiring for an April article (which you can check out in full here).
Nathanson says that Rapid7 now uses LinkedIn Recruiter for all of its recruiting purposes, and that the company’s recruiters spend anywhere from four to five hours on LinkedIn each day. He and his team have used LinkedIn to more than double the size of Rapid7 in the last year and a half. 
In other words, Nathanson finds the vast majority of future employees on LinkedIn. And if you aren’t on LinkedIn? He’ll probably never find you. And even if he did, he probably wouldn’t hire you. “I’m always amazed at people who aren’t there now,” Nathanson says. “When I talk to candidates and they aren’t on there that’s a big red flag for me.”
Let's emphasize that last part. It's not just that having a LinkedIn profile helps. Not having one can ruin your chance of getting a job, regardless of how qualified you are. And this isn't just Rapid7. There are tons of companies that have, instead of hiring their own talent-seeking team of professional recruiters, been switching to LinkedIn as their method of searching for and screening potential employees.

So maybe it's a good idea to have a LinkedIn profile that you keep updated. Polish it up every now and then. Reassess your skills. Make yourself look employable. Because if you don't...well, I hope you really like retail.

Friday, September 13, 2013

QR codes: More clever than they seem (promise)

QR Codes have been popping up everywhere lately. They're on flyers, posters, company advertisements, business cards, my sidebar...


...bananas...
...cookies...

...dead people...
...butts....

everywhere. (For the curious, those were all from wtfqrcodes.com). It turns out the Internet has a lot of hate for QR codes.

Not pictured: any love for QR codes

And I get that. It's often way easier to make a link to a website than to create a multi-step process symbolized by what looks like a bar code that's been accidentally run through the wash. According to this poll [warning: PDF], 79% of people have never heard of a QR code (though 81% have seen one before), and 57% of people who have scanned one didn't actually do anything with the information they got from scanning.

So if QR codes are so terrible, why am I making a point to tell you about them?

Because they're really really cool. Not "cool" like "hip," like marketing people want them to be, but "cool" like "wait that weird blob does what?"  Ever wonder why people can cover up half of their QR code with a logo without affecting its ability to take you to the right place?


This should break it, right?

It turns out that QR codes have built-in error correcting mechanisms to help deal with unexpected dirt (or Octocats) that may distort your code. The codes use Reed-Solomon error correction, which uses polynomial algebra to append t check symbols to the code (and can therefore correct up to t / 2 misread symbols). I won't go in-depth into the actual algorithm; polynomial algebra isn't usually how I like to spend my Friday mornings. For the interested, this site gives a very detailed explanation of how the algorithm is applied. For the slightly less interested, it's from the same family of cyclic codes as the CRC-32 checksum that checks Ethernet frames for errors. Now you have something to talk about at parties. 
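If you want to poke at Reed-Solomon without touching the polynomial algebra yourself, here's a hedged sketch using the third-party reedsolo package (assumed installed via pip; the exact return shape of decode() has shifted between versions, hence the defensive check). Ten check symbols means up to five corrupted bytes can be repaired -- the same principle that lets an Octocat squat on a QR code.

from reedsolo import RSCodec

rsc = RSCodec(10)                      # append 10 Reed-Solomon check symbols
message = b"meet me at the party"
encoded = bytearray(rsc.encode(message))

# Smudge a few bytes, like a logo (or dirt) pasted over part of the code.
encoded[2] = 0
encoded[7] = 0
encoded[11] = 0

result = rsc.decode(encoded)
# Older reedsolo versions return just the message; newer ones return a
# (message, message_with_ecc, errata_positions) tuple.
decoded = result[0] if isinstance(result, tuple) else result
print(bytes(decoded) == message)       # True: the damage was repaired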

Not all QR codes are created equal (or equally repairable). There are four different levels of QR error correction, each of which is able to repair a different amount of damage done to the code:
  • Level L: corrects up to 7% damage
  • Level M: corrects up to 15% damage
  • Level Q: corrects up to 25% damage
  • Level H: corrects up to 30% damage

The level is indicated on the QR code itself by the format blocks immediately to the right of the bottom-left position square (the big one -- technically a finder pattern, not an alignment square).

You never would have thought to look, would you?

Pop quiz: based on this information, what error correction level was that Octocat code? I'll wait. Level H, you say? You're so clever. Good for you. So that's why the logo didn't ruin it -- that QR code has the highest possible level of error correction [sources 1 and 2]. The main downside to QR codes of level Q or H is their complexity; a small print size or long distance (think business card or billboard) will make your code very hard to read. As such, levels L and M are usually sufficient, unless you plan on keeping your QR code on a surface that might get dirty or change shape


like a butt