Notablog
Some thoughts and notebooks on philosophy, statistics and programming, and some projects I've been working on (R/Python/C++).

Defining custom Greta distributions
Greta comes with various distributions for statistical modelling. In case you need one that Greta does not support, you can find a short tutorial on how to define it here.

Deep drama
Deep drama implements a long short-term memory (LSTM) network for creating Greek drama. It trains a
recurrent neural network on plays by Euripides, Sophocles, Aristophanes and Aischylos from Project Gutenberg
and then uses the trained model to write new drama. In that sense it works like other sequence models,
such as hidden Markov models (HMMs).
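To make the "long short-term memory" part concrete, here is a minimal NumPy sketch of a single LSTM step using the standard gate equations. It is followed by a softmax layer that turns the hidden state into a next-character distribution. The vocabulary size, hidden size, random weights and input sequence are hypothetical. This is not Deep drama's trained model, which would learn these weights from the Gutenberg texts.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step; W: (4H, D), U: (4H, H), b: (4H,), gates stacked as [input, forget, output, candidate]."""
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[:H])           # input gate: how much new information to write
    f = sigmoid(z[H:2 * H])      # forget gate: how much of the old cell state to keep
    o = sigmoid(z[2 * H:3 * H])  # output gate: how much of the cell state to expose
    g = np.tanh(z[3 * H:])       # candidate values for the cell state
    c = f * c_prev + i * g       # long-term memory (cell state)
    h = o * np.tanh(c)           # short-term memory (hidden state)
    return h, c

# Hypothetical sizes: 5-character vocabulary (one-hot inputs), 8 hidden units.
rng = np.random.default_rng(0)
D, H = 5, 8
W = rng.normal(scale=0.1, size=(4 * H, D))
U = rng.normal(scale=0.1, size=(4 * H, H))
b = np.zeros(4 * H)
V = rng.normal(scale=0.1, size=(D, H))  # hypothetical output layer

h, c = np.zeros(H), np.zeros(H)
for char_index in [0, 3, 1]:            # a made-up input sequence of character ids
    h, c = lstm_step(np.eye(D)[char_index], h, c, W, U, b)

logits = V @ h
probs = np.exp(logits - logits.max())
probs /= probs.sum()                    # distribution over the next character
```

Such a model generates text by sampling a character from `probs`, feeding it back in as the next input, and repeating.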

Analysing heterogeneous RNAi screens
perturbatr is an R package for the analysis of large-scale RNA interference (RNAi) screens
on pan-pathogenic datasets. It uses hierarchical models for inference and network diffusion to account for possible false negatives.
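Network diffusion can be illustrated with a random walk with restart. Scores from the screen are repeatedly spread to neighbouring genes, so a gene that looks like a non-hit but sits next to strong hits still receives weight. The following NumPy sketch assumes a toy four-gene path graph, made-up hit scores and a restart probability of 0.5. It is a generic formulation, not necessarily the diffusion model perturbatr implements.

```python
import numpy as np

def random_walk_with_restart(adjacency, p0, restart=0.5, tol=1e-10, max_iter=1000):
    """Diffuse scores p0 over a graph; assumes every node has at least one edge."""
    W = adjacency / adjacency.sum(axis=0, keepdims=True)  # column-stochastic transition matrix
    p0 = p0 / p0.sum()                                     # restart distribution
    p = p0.copy()
    for _ in range(max_iter):
        # Walk one step along the edges, or jump back to the screen hits with probability `restart`.
        p_next = (1 - restart) * W @ p + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p

# Hypothetical 4-gene path graph 0 - 1 - 2 - 3 with made-up screen scores.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
hits = np.array([1.0, 0.0, 0.0, 0.5])
scores = random_walk_with_restart(A, hits)  # approx. [0.393, 0.237, 0.163, 0.207]
```

Gene 1 scored zero in the toy screen but ends up with about 0.24 after diffusion, because it neighbours the strongest hit.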

Proper data structures for R
datastructures is an R package that uses Rcpp modules to expose Boost and STL data structures, such as heaps and maps, from C++ to R.

The EM algorithm
If you have always struggled with the EM (expectation-maximization) algorithm and wondered why it is
such an amazing tool, check out a short tutorial that I compiled for a course on statistical models.
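EM alternates two steps. The E-step computes each point's posterior component membership under the current parameters. The M-step re-estimates the parameters by responsibility-weighted maximum likelihood. Below is a minimal NumPy sketch for a two-component univariate Gaussian mixture. The initialisation, the fixed number of iterations and the simulated data are illustrative assumptions, and this is not the tutorial's own code.

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def em_two_gaussians(x, n_iter=200):
    """EM for a two-component univariate Gaussian mixture; returns (weight of component 1, means, sds)."""
    pi = 0.5                                  # mixing weight of component 1
    mu = np.array([x.min(), x.max()])         # crude initialisation at the data extremes
    sigma = np.array([x.std(), x.std()])
    for _ in range(n_iter):
        # E-step: responsibility r_i = P(component 1 | x_i) under the current parameters.
        d0 = (1 - pi) * normal_pdf(x, mu[0], sigma[0])
        d1 = pi * normal_pdf(x, mu[1], sigma[1])
        r = d1 / (d0 + d1)
        # M-step: responsibility-weighted maximum-likelihood estimates.
        w = np.stack([1 - r, r])              # shape (2, n): weights per component
        pi = r.mean()
        mu = (w @ x) / w.sum(axis=1)
        sigma = np.sqrt((w * (x - mu[:, None]) ** 2).sum(axis=1) / w.sum(axis=1))
    return pi, mu, sigma

# Simulated data from a known mixture (made up for the example).
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(3.0, 0.5, 200)])
pi, mu, sigma = em_two_gaussians(x)
```

Each iteration cannot decrease the observed-data likelihood, which is what makes EM attractive when the component memberships are unobserved.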

Philosophy of Science
According to Richard Feynman,
philosophy of science is as useful to scientists as ornithology
is to birds. What a worrying stance for future generations of researchers!
For that reason I summarized some important works on
philosophy of science and philosophy of statistics that
are worth reading.

Essential R
I compiled a document about my most frequently used tools and libraries when writing R packages. You can find it here.

Interpreting R
R is a work-in-progress R interpreter written in C++.
I mainly implemented it to learn how lexers, parsers and syntax trees work.
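The following small Python sketch (not the project's C++ code) shows how these pieces fit together for a tiny R-like subset: numbers, identifiers, `+ - * /`, parentheses and the `<-` assignment arrow. A regex-based lexer produces tokens, a recursive-descent parser builds a syntax tree of nested tuples, and a tree-walking evaluator runs it. The grammar and all names are hypothetical simplifications; real R syntax is far richer.

```python
import operator
import re

# Numbers, identifiers, and a handful of operators; "<-" is R's assignment arrow.
TOKEN_RE = re.compile(r"\s*(?:(\d+(?:\.\d+)?)|([A-Za-z][A-Za-z0-9_.]*)|(<-|[-+*/()]))")

def tokenize(src):
    """Lexer: turn source text into a list of (kind, value) tokens."""
    tokens, pos, src = [], 0, src.strip()
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        number, ident, op = m.groups()
        if number is not None:
            tokens.append(("num", float(number)))
        elif ident is not None:
            tokens.append(("ident", ident))
        else:
            tokens.append(("op", op))
        pos = m.end()
    return tokens

class Parser:
    """Recursive-descent parser producing a syntax tree of nested tuples."""

    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self, offset=0):
        i = self.pos + offset
        return self.tokens[i] if i < len(self.tokens) else ("eof", None)

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def assignment(self):
        # assignment := IDENT "<-" assignment | expr   (right-associative, as in R)
        if self.peek()[0] == "ident" and self.peek(1) == ("op", "<-"):
            name = self.advance()[1]
            self.advance()  # consume "<-"
            return ("<-", name, self.assignment())
        return self.expr()

    def expr(self):
        # expr := term (("+" | "-") term)*
        node = self.term()
        while self.peek() in (("op", "+"), ("op", "-")):
            node = (self.advance()[1], node, self.term())
        return node

    def term(self):
        # term := factor (("*" | "/") factor)*
        node = self.factor()
        while self.peek() in (("op", "*"), ("op", "/")):
            node = (self.advance()[1], node, self.factor())
        return node

    def factor(self):
        # factor := NUMBER | IDENT | "(" expr ")"
        kind, value = self.advance()
        if kind == "num":
            return value
        if kind == "ident":
            return ("var", value)
        if (kind, value) == ("op", "("):
            node = self.expr()
            if self.advance() != ("op", ")"):
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {value!r}")

def parse(src):
    parser = Parser(tokenize(src))
    tree = parser.assignment()
    if parser.peek()[0] != "eof":
        raise SyntaxError("trailing tokens")
    return tree

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}

def evaluate(node, env):
    """Tree-walking evaluator; env maps variable names to values."""
    if isinstance(node, float):
        return node
    if node[0] == "var":
        return env[node[1]]
    if node[0] == "<-":
        env[node[1]] = evaluate(node[2], env)
        return env[node[1]]
    return OPS[node[0]](evaluate(node[1], env), evaluate(node[2], env))
```

For example, `parse("x <- 1 + 2 * 3")` yields `("<-", "x", ("+", 1.0, ("*", 2.0, 3.0)))`, with operator precedence encoded purely by which grammar rule calls which.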

Personal GitHub language statistics
If you have ever wondered which languages you use most in your GitHub projects, follow this link.

Pipeable dataframes for Python
dataframe is (yet another) Python data structure for tabular data. It uses a piping operator to
chain common data frame operations together elegantly.
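A piping operator can be emulated in Python by overloading `>>`. The sketch below is hypothetical. The verb names (`where`, `select`, `mutate`, `group_sum`), the choice of `>>` and the list-of-dicts representation are assumptions for illustration and may not match the dataframe package's actual API.

```python
class Verb:
    """A table operation applied with `table >> verb`."""

    def __init__(self, fn):
        self.fn = fn

    def __rrshift__(self, table):
        # Python calls this for `table >> verb` because list does not define >>.
        return self.fn(table)

def where(predicate):
    """Keep rows for which predicate(row) is true."""
    return Verb(lambda rows: [r for r in rows if predicate(r)])

def select(*columns):
    """Keep only the named columns."""
    return Verb(lambda rows: [{c: r[c] for c in columns} for r in rows])

def mutate(**new_columns):
    """Add columns computed from each row."""
    return Verb(lambda rows: [{**r, **{k: f(r) for k, f in new_columns.items()}} for r in rows])

def group_sum(key, column):
    """Sum `column` within each value of `key`."""
    def run(rows):
        totals = {}
        for r in rows:
            totals[r[key]] = totals.get(r[key], 0) + r[column]
        return [{key: k, column: v} for k, v in totals.items()]
    return Verb(run)

# Toy table represented as a list of dicts (an assumption of this sketch).
rows = [{"group": "a", "x": 1}, {"group": "b", "x": 2},
        {"group": "a", "x": 3}, {"group": "b", "x": -1}]
result = (rows
          >> where(lambda r: r["x"] > 0)
          >> mutate(x2=lambda r: r["x"] * 2)
          >> group_sum("group", "x2"))
# result == [{"group": "a", "x2": 8}, {"group": "b", "x2": 4}]
```

Each verb builds new rows instead of modifying its input, so a chain reads left to right and leaves `rows` unchanged.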

Gaussian Processes
Bayesian nonparametric models such as Gaussian processes are a wonderful approach to machine
learning. Check out my notebooks on regression and classification.
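For the regression case, the posterior of a zero-mean Gaussian process with Gaussian noise has a closed form. Below is a minimal NumPy sketch of the standard Cholesky-based computation. The squared-exponential kernel, the fixed length scale of 1, the noise variance of 0.01 and the sine-curve training data are assumptions chosen for illustration; no hyperparameters are fitted. Classification needs an approximation to a non-Gaussian likelihood and is not shown.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance between 1-D input arrays a and b."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / length_scale ** 2)

def gp_posterior(x_train, y_train, x_test, noise=1e-2):
    """Posterior mean and covariance of a zero-mean GP with Gaussian observation noise (variance `noise`)."""
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_test)
    K_ss = rbf_kernel(x_test, x_test)
    L = np.linalg.cholesky(K)                                  # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))  # alpha = K^{-1} y
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    cov = K_ss - v.T @ v                                       # K_ss - K_s^T K^{-1} K_s
    return mean, cov

# Made-up training data: noise-free samples of sin(x) on [0, 5].
x_train = np.linspace(0.0, 5.0, 20)
y_train = np.sin(x_train)
x_test = np.array([1.0, 2.5])
mean, cov = gp_posterior(x_train, y_train, x_test)
```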

Network-regularized regression
netReg is an R/C++ implementation of a network-regularized linear regression model. It
incorporates prior knowledge in the form of graphs into the model's likelihood and
thereby allows better estimation of the regression coefficients. The main routines for
estimating coefficients and shrinkage parameters are implemented in C++11. You can
also find it on Bioconductor.
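The idea can be sketched in a simplified form that assumes a single graph over the predictors and no sparsity penalty. Adding λ βᵀLβ to the least-squares loss, with L the graph Laplacian, pulls the coefficients of connected predictors towards each other. It also gives the closed form β = (XᵀX + λL)⁻¹Xᵀy. This is not netReg's estimator, which is more general and fitted by its own C++ routines. The data, graph and λ below are made up.

```python
import numpy as np

def graph_laplacian(adjacency):
    """L = D - A for a symmetric, non-negative adjacency matrix."""
    return np.diag(adjacency.sum(axis=1)) - adjacency

def network_regularized_ls(X, y, adjacency, lam=1.0):
    """Minimise ||y - X b||^2 + lam * b^T L b; b^T L b = sum over edges w_ij (b_i - b_j)^2."""
    L = graph_laplacian(adjacency)
    return np.linalg.solve(X.T @ X + lam * L, X.T @ y)

# Made-up data: predictors 0 and 1 are linked in the prior graph, predictor 2 is isolated.
rng = np.random.default_rng(2)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, 1.0, -2.0]) + rng.normal(scale=0.5, size=50)
A = np.array([[0, 1, 0],
              [1, 0, 0],
              [0, 0, 0]], dtype=float)

beta_ols = network_regularized_ls(X, y, A, lam=0.0)    # ordinary least squares
beta_net = network_regularized_ls(X, y, A, lam=100.0)  # coefficients 0 and 1 pulled together
```

Setting λ = 0 recovers ordinary least squares. Larger λ trades a slightly worse fit for coefficients that agree with the prior graph.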