Sudeep's Blog

Disorganized Thoughts in Organized Manner

This post is about online tools for wireframing a new website when starting a new business. Most of the references come from an excellent Coursera class on Startup Engineering; I have listed them here for quick reference. Most design principles today rest on responsive design: a webpage that flows and re-adjusts itself based on the device and screen size. Writing pixel-perfect, custom CSS for such a design is a time-consuming task.
  • Twitter Bootstrap is the go-to framework for such designs. Here is a link
  • There is a web-based, template-based WYSIWYG tool for quickly building Bootstrap-based websites: Jetstrap
  • Other notable resources are: (1) ThemeForest, a set of paid webpage themes, (2) 99designs, which is more expensive still, and (3) Dribbble, which is more like a StackOverflow for design. You can also hire a designer on a contract.
Finally, reading design books might also be a viable option for non-designers. The Non-Designer's Design Book by Robin Williams is an excellent starting point.
DataKind is a unique organization, connecting NGOs in need of data analytics with data scientists who want to change the world. According to them, it's a once-in-a-lifetime opportunity to work on a real-life problem and make an impact: "Experience on a novel “real world” data science project. Forget building another spam filter, you’re going to use machine learning to battle human trafficking." They are accepting applications from software developers, data scientists and project managers for new projects in the New York area. Hope some interesting projects come out of it!
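For a quick update of a Python dict, a try/except block is a better alternative than if/else, especially when most keys are already present.

    for k in some_list:
        try:
            dct[k] = dct[k] + 1
        except KeyError:
            dct[k] = 1

is more efficient than...

    for k in some_list:
        if k in dct.keys():
            dct[k] = dct[k] + 1
        else:
            dct[k] = 1

although the latter might be more readable. (In Python 2, dct.keys() builds a new list on every iteration, so the membership test is a linear scan; a plain "k in dct" avoids that cost.) An obvious trick, and a reminder never to forget it. (See this StackOverflow link.)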
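As a rough sketch of the same counting idiom, here is the try/except version next to a few standard-library alternatives (dict.get, collections.defaultdict and collections.Counter). The function names and the sample list are made up for illustration; this shows only that the variants agree and is not a benchmark.

```python
from collections import Counter, defaultdict


def count_try_except(items):
    """Count items by catching KeyError the first time a key is seen."""
    counts = {}
    for k in items:
        try:
            counts[k] += 1
        except KeyError:
            counts[k] = 1
    return counts


def count_get(items):
    """Count items with dict.get and a default of 0."""
    counts = {}
    for k in items:
        counts[k] = counts.get(k, 0) + 1
    return counts


def count_defaultdict(items):
    """Count items with a defaultdict that starts every key at 0."""
    counts = defaultdict(int)
    for k in items:
        counts[k] += 1
    return dict(counts)


# Hypothetical sample data, only for illustration.
words = ["a", "b", "a", "c", "a", "b"]
result = count_try_except(words)

# All variants should agree with collections.Counter.
assert result == count_get(words) == count_defaultdict(words) == dict(Counter(words))
print(result)
```

Which variant is fastest depends on how often keys repeat, so it is worth timing them on real data before committing to one.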
Today marks the end of Kaggle's Marinexplore Whale Detection challenge. The challenge, simply stated, is this: you are given a set of 2-second .aiff sound files, some containing sound from one species of whale, others containing other ambient noises in the sea (possibly including sounds from different species of whale). The dataset consists of 0/1-labelled training data (30000 samples) and unlabelled test data (54503 samples). The challenge was to predict the presence of the relevant species of whale in the test set. Like many, my initial approach was to read the aiff files and directly use sound frequencies from the file as features. This approach helps 'break into' the 0.90 AUC (Area Under the Curve) range. Some of the most successful submissions, however, treat the problem as an image-processing problem, using the audio spectrogram as the relevant feature. Check this forum for more information on these approaches. Using this approach, I have been able to obtain an AUC of 0.96016, good for a respectable 56th place out of 249 participants. This gives me a (sorta) coveted Top 25% badge on Kaggle. Click here to check out my code on Github.
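For readers new to the metric, here is a minimal sketch of what an AUC score measures: the fraction of (positive, negative) pairs in which the positive sample gets the higher predicted score, with ties counted as half. The function name auc_score and the toy labels and scores are assumptions for illustration only. This is not the competition's scoring code or my submission, and the pairwise loop is quadratic, so it only suits small examples.

```python
def auc_score(labels, scores):
    """Pairwise AUC: chance that a random positive outranks a random negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: 1 = whale call present, 0 = absent.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.2, 0.6, 0.7, 0.1]
print(auc_score(labels, scores))
```

In this toy data, five of the six positive/negative pairs are ordered correctly, so the score is 5/6. A perfect ranking gives 1.0 and random guessing hovers around 0.5.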
I've recently participated in a basic Kaggle competition arranged by the folks at Scikit. Here is a link to the competition. My biggest take-away from it is the IPython Notebook, a cool tool (like an R notebook) for running and documenting your data analysis in the browser. Here is my first IPython notebook:
I have been playing with Python's machine learning/big data packages and must say that they give R quite a run for its money! For now, I can offer a step-by-step guide for installing these packages on Mac OS X. Click here. Finally, here is the main page for sklearn and the amazing things it can do. Go like!
Wed Jan 09 2013
Nothing much to report, except for an exciting new course announced on Coursera: Startup Engineering.
Sun Dec 02 2012
I remember using pthread_mutex in C++ in the ancient past (well, during my undergraduate years). That was my entire exposure to multithreading. Well, that and a little of Java's Thread. That was until last week. Like everything else, C++ multithreading has been given a boost (pun, eh) by the Boost library. Here, in a nutshell, is how it works:
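  • Every boost::thread requires a callable object, i.e. a function or an object with operator() overloaded.
  • The object passed to a thread must also be copyable, because the thread copies it into internal storage. If it is not, it's best to pass a reference via boost::ref or to hand the thread a shared_ptr to the object (for example through boost::bind).
  • The boost::thread object itself is not copyable, but it can be placed in a move-aware container. I found it best to create a vector of shared_ptr's to thread objects and pass them to a boost::thread_group using its add_thread method. One caveat: add_thread takes a raw pointer and the group takes ownership of it, so make sure the same thread is not also deleted by the shared_ptr.
  • Waiting for threads to finish is achieved by simply calling join on a thread or join_all on a thread_group.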
Check this for more on threads.
Fri Nov 23 2012
Survived a big storm and a bout of cough/cold after that. In the meantime, learned a thing or two about the factory pattern. The best resource I can give right now is this StackOverflow post.
Mon Oct 29 2012