Some thoughts and notebooks on philosophy, statistics and programming, and some projects I've been working on (R/Python/C++).

  1. Defining custom Greta distributions
    Greta comes with various distributions for statistical modelling. In case you need one that Greta does not support, you can find a short tutorial on how to define it here.
  2. Dirichlet process mixture models
    If you are interested in mixtures and nonparametric Bayes, check out my notebook on Dirichlet Process mixture models.
  3. Causality
    Some notes on causality.
  4. Deep drama
    Deep drama implements a long short-term memory network for creating Greek drama. It uses dramas by Euripides, Sophocles, Aristophanes and Aeschylus from Project Gutenberg to train a recurrent neural network and then uses the trained model to write drama. In that sense it acts like other sequence models, such as HMMs.
  5. Analysing heterogeneous RNAi screens
    perturbatr is an R package for the analysis of large-scale RNA interference (RNAi) screens for pan-pathogenic datasets. It uses hierarchical models for inference and network diffusion to account for possible false negatives.
  6. Markov random fields
    Check out some notes on Markov random fields here.
  7. Proper data structures for R
    datastructures is an R package that uses Rcpp modules to expose Boost and STL data structures, such as heaps or maps, from C++ to R.
  8. The EM-algorithm
    If you've always struggled with the EM-algorithm and wondered why it is such an amazing tool, check out a short tutorial that I've compiled for a course on statistical models.
  9. Philosophy of Science
    According to Richard Feynman, philosophy of science is as useful to scientists as ornithology is to birds. What a worrying stance for future generations of researchers! For that reason, I summarized some important works on philosophy of science and philosophy of statistics that are worth reading.
  10. Essential R
    I compiled a document about my most frequently used tools and libraries when writing R packages. You can find it here.
  11. Interpreting R
    R-- is a work-in-progress R interpreter written in C++. I mainly implemented it to understand syntax trees, lexers, parsers, etc.
  12. Personal GitHub language statistics
    If you have ever wondered which languages you use the most in your GitHub projects, follow this link.
  13. Pipe-able dataframes for Python
    dataframe is (yet another) Python data structure for tabular data. It uses a piping operator for chaining common data frame operations together elegantly.
  14. Gaussian Processes
    Bayesian non-parametrics such as Gaussian Processes are a wonderful approach to machine learning. Check out my notebooks on regression and classification.
  15. Network-regularized regression
    netReg is an R/C++ implementation of a network-regularized linear regression model. It incorporates prior knowledge in the form of graphs into the model’s likelihood and thereby allows better estimation of regression coefficients. The main routines for estimating the coefficients and shrinkage parameters are implemented in C++11. You can also find it on Bioconductor.
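
To give a rough idea of what network regularization can look like, here is a minimal Python sketch. It is not netReg's code or API. It assumes a Gaussian linear model with a graph-Laplacian penalty on the coefficients, solved in closed form. The function names (`graph_laplacian`, `fit_network_regularized`), the tiny `ridge` term and the toy data are all made up for illustration.

```python
import numpy as np


def graph_laplacian(adjacency):
    """Unnormalized graph Laplacian L = D - A of a symmetric adjacency matrix."""
    degree = np.diag(adjacency.sum(axis=1))
    return degree - adjacency


def fit_network_regularized(X, y, adjacency, lam=1.0, ridge=1e-8):
    """Minimize ||y - X b||^2 + lam * b' L b + ridge * ||b||^2 (hypothetical helper).

    The penalty b' L b sums (b_i - b_j)^2 over the edges of the graph, so
    coefficients of connected covariates are pulled towards each other.
    The small ridge term only keeps the linear system well conditioned.
    """
    laplacian = graph_laplacian(adjacency)
    p = X.shape[1]
    lhs = X.T @ X + lam * laplacian + ridge * np.eye(p)
    return np.linalg.solve(lhs, X.T @ y)


# Toy data: covariates 0 and 1 are connected in the prior graph, 2 is isolated.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
true_beta = np.array([1.0, 2.0, -2.0])
y = X @ true_beta + 0.1 * rng.normal(size=50)

adjacency = np.array([[0.0, 1.0, 0.0],
                      [1.0, 0.0, 0.0],
                      [0.0, 0.0, 0.0]])

beta_plain = fit_network_regularized(X, y, adjacency, lam=0.0)
beta_graph = fit_network_regularized(X, y, adjacency, lam=100.0)

print("without graph penalty:", beta_plain)
print("with graph penalty:   ", beta_graph)
```

With a large `lam`, the estimates for the two connected covariates move closer together, while the isolated covariate is only affected indirectly. In practice the shrinkage parameter would be chosen by something like cross-validation rather than fixed by hand.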