Sunday, November 3, 2013

Data structures: They're pretty important

I don't know why, but there seems to be this trend where people sleep through CS 146. I've noticed it myself and I've heard stories from other people that this is definitely a thing. First of all, I'm not sure how it's even possible to sleep when Dr. Taylor is running around being excited about Dijkstra. Secondly, though, this is just bad -- that class is probably the most important one in the CS department. Algorithms and data structures are what people want you to know. While I'd love to discuss intricacies of shortest-path algorithms with you, though, the topic for today is data structures. I'm not going to pretend like this decision is because I find them more interesting than algorithms; it's more because I don't pick the topics for this blog. I don't have that kind of power.

So. Data structures. If you're one of those people who slept though Dr. Taylor's class (you can keep it a secret, I swear), let's review. Data structures are, probably unsurprisingly, different ways to store data -- the structure that your data has, if you will. The key thing here is to make sure you're storing data efficiently; the point of a data structure is optimization. Some data structures are better for databases, for example, while others are great for looking up information. This efficiency is a big deal: "efficient data structures are a key to designing efficient algorithms. Some formal design methods and programming languages emphasize data structures, rather than algorithms, as the key organizing factor in software design" [thanks, Wikipedia]. Entire programming languages are structured around data structures. And you slept through that class. Way to go, hero.

Having the right data structure can make a huge difference. For example, suppose you had a bazillion items that you needed to search through. What if you stored all that information in an array? Arrays are simple, space-wise, and easy to use. But then when you have to search for your thing, it's going to take O(n) to find it. And that's not just the worst case; that's the expected runtime. That's bad, especially when you're working with numbers like a bazillion. Instead, what if you got all fancy and used a B-Tree? Clearly, you passed that homework assignment in 146. Instead of some multiple of a bazillion, your search is going to take O(ln n), and log of a bazillion is a lot smaller than a bazillion.

If you need a good cheat sheet for data structure runtimes (expected and worst-case) under various situations, there's a website helpfully called Big-O Cheat Sheet.

big-o cheat sheet algorithms data structures
You might want to bookmark this

If you take nothing else away from this blog (and let's face it, chances are pretty good that if you're reading this you don't actually care what I have to say about data structures) at least remember how important it is to have the right data structure for your programs. It can make or break your performance. This post on Forbes.com will do a great job describing to you that, even though you passed the class, you don't properly know your data structures. To help with that, Coursera has an algorithms / data structures class and Berkeley has posted lectures online.

You have no excuse now. Go learn things. Since you'll be working so hard to get interviews, you might as well make sure you have a chance at passing them.

2 comments:

  1. Hello Katharine,
    Your blog on Data Structures is very informative and engaging. It is one of the finest blog I have ever read. Starting with CS 146’s rumors to why efficiency matters, you have covered each and every possible thing about data structures. Your hyperlinks are extraordinary. One of the hyperlinks I really like is the one that takes you to the U.C. Berkley’s lectures. I also like the image that you used on your post. I will follow your advice and book mark it on my chrome browser. Just like all of your other posts, this post was very informative and educating. I sincerely hope, all the readers will follow your advice on data structure and consider data structures as a great tool to optimize their program. Overall, this post was a great read.
    Keep Blogging!

    ReplyDelete
  2. thanks for the pictures of tigers. i wanted to facebook one in my sons timeline but haven't figured out how to get it...sad right? you have soo much knowledge and then there's me

    ReplyDelete