Posts

Showing posts from 2016

High Performance Computing, yet another brief installation tutorial

Today’s mid- and high-end computers come with a tremendous hardware, mostly used in video games and other media software, that can be exploited for advanced computation, that is: High Performance Computing (HPC). This is a hot topic in Deep Learning as modern graphic cards come with huge streaming process power and large and quick memory. The most successful example is in Nvidia’s CUDA platform. In summary, CUDA significantly speeds up the fitting of large neural nets (for instance: from several hours to just a few minutes!). However, the drawbacks come when setting up the scenario: it is non-trivial to install the requirements and set it running, and personally I had a little trouble the first time as many packages need to be manually compiled and installed in a specific order. The purpose of this entry is to reflect what I did for setting up Theano and Keras with HPC using an Nvidia’s graphic card (in my case a GT730) using GNU/Linux. To do so, I will start assuming a clean Debi

Beyond state-of-the-art accuracy by fostering ensemble generalization

Image
Sometimes practitioners are forced to go beyond the standard methods in order to gain more accuracy with their models. If one analyzes the problem of rocketing accuracy, ensembling is a good starting point. However, the trick lies in getting enough generalization from feature space.  In this regard, ensemble generalization--do not confuse with classic or "standard" ensemble methods such as Random Forest or Gradient Boosting-- is the right path to follow, however complex. The idea is to combine predictions from "base learners" to train a second stage regressor, using these predictions as metafeatures. The trick is to use a J-fold cross-validation scheme and use always the same data partitions and seed. This kind of ensemble is often called stacking --as we "stack" layers of classifiers. Let’s do an example: suppose that we have three base learners: GBM, ET, and RF. Then assume we have a LM as level 2 learner. First we divide the training data into