Void’s Vault

Knowledge source for efficiency.

Testing MapReduce With MRUnit

I’ve recently started a big data project with Mathieu Dumoulin. We are using Mahout with Hadoop to do machine learning with MapReduce so that we can handle big data properly. We’ve found a way to test our MapReduce code, and that’s what I present in this post.

Things to Avoid When Using Design Patterns

While doing code reviews in a class of students at Laval University, I heard that some people think design quality is a very subjective matter. They believed that as long as they used design patterns, the code they developed was clean. In this post, I will explain why everyone should be careful when using design patterns. While there are a lot of reasons to use them (and we should!), some patterns have drawbacks and flaws that should be considered.

MapReduce File Manipulation Using Pig Scripts

In this article, I will introduce Pig, a scripting language used with Hadoop. I had to learn Pig for a project I worked on with Mathieu Dumoulin. Pig’s syntax is very similar to SQL, but it lets you manipulate big data in MapReduce mode quite easily.