Some thoughts and notebooks on philosophy, statistics and programming, and some projects I've been working on (R/Python/C++).
Defining custom Greta distributions
Greta comes with various distributions for statistical modelling. In case you need one that Greta does not support, there is a short tutorial on how to define your own.
long short-term memory network for creating Greek drama. It uses dramas by Euripides,
Sophocles, Aristophanes and Aeschylus from Project Gutenberg to train a
recurrent neural network and then uses the trained model to write drama. In that sense it acts
like other sequence models, such as HMMs.
Analysing heterogeneous RNAi screens
is an R package for the analysis of large-scale RNA interference screens
for pan-pathogenic datasets. It uses hierarchical models for inference and network diffusion to account for possible false negatives.
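To give a feel for how network diffusion can recover possible false negatives, here is a minimal sketch in plain Python. It assumes a random walk with restart, which is one common diffusion scheme and not necessarily the one the package uses; the function name, the toy graph and the `restart` parameter are all hypothetical.

```python
def random_walk_with_restart(adjacency, start, restart=0.5, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - restart) * W p + restart * start until convergence.

    W is the column-normalised adjacency matrix, so every step moves
    probability mass from a node to its neighbours.
    """
    n = len(adjacency)
    col_sums = [sum(adjacency[i][j] for i in range(n)) for j in range(n)]
    W = [
        [adjacency[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]
    p = list(start)
    for _ in range(max_iter):
        nxt = [
            (1 - restart) * sum(W[i][j] * p[j] for j in range(n)) + restart * start[i]
            for i in range(n)
        ]
        # Stop once the scores no longer change noticeably.
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Hypothetical chain graph 0 - 1 - 2; only gene 0 was identified as a hit.
graph = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
scores = random_walk_with_restart(graph, start=[1.0, 0.0, 0.0])
print(scores)
```

Genes 1 and 2 start with a score of zero but receive some mass through their connection to gene 0, which is the sense in which diffusion can flag hits that the screen itself missed.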
Proper data structures for R
is an R package that uses Rcpp modules to expose Boost and STL data structures, such as heaps or maps, from C++ to R.
If you've always struggled with the
and wondered why it is such an amazing tool, check
out a short tutorial that I've compiled for a course on statistical models.
Philosophy of Science
According to Richard Feynman, philosophy of science is as useful to scientists as ornithology
is to birds. What a worrying stance for future generations of researchers!
For that reason I summarized some important works on philosophy of science and philosophy of statistics
that are worth reading.
I compiled a document about my most frequently used tools, libraries and
packages. You can find it here.
is a work-in-progress R interpreter written in C++.
I mainly implemented it to understand syntax trees, lexers, parsers, etc.
Personal GitHub language statistics
If you have ever wondered which languages you use the most on your GitHub
Pipe-able dataframes for Python
is (yet another) Python data structure for tabular data. It uses a piping operator for
chaining common data frame operations together elegantly.
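As a rough illustration of the piping idea, the sketch below overloads `>>` on a toy table class. The class `Frame`, the verbs `select` and `where`, and the choice of `>>` as the operator are all assumptions made for this example, not the actual API of the package.

```python
class Frame:
    """Toy column-oriented table: a dict mapping column names to lists."""

    def __init__(self, columns):
        self.columns = {name: list(values) for name, values in columns.items()}

    def __rshift__(self, verb):
        # `frame >> verb` simply applies the verb to the frame.
        return verb(self)

    def rows(self):
        names = list(self.columns)
        return [dict(zip(names, values)) for values in zip(*self.columns.values())]


def select(*names):
    """Verb that keeps only the named columns."""
    def verb(frame):
        return Frame({name: frame.columns[name] for name in names})
    return verb


def where(predicate):
    """Verb that keeps only the rows for which predicate(row) is true."""
    def verb(frame):
        kept = [row for row in frame.rows() if predicate(row)]
        return Frame({name: [row[name] for row in kept] for name in frame.columns})
    return verb


frame = Frame({"name": ["a", "b", "c"], "score": [1, 5, 3]})
result = frame >> where(lambda row: row["score"] > 2) >> select("name")
print(result.columns)
```

Because every verb takes a frame and returns a new one, the operations read left to right in the order in which they are applied.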
Bayesian non-parametrics such as Gaussian Processes are a wonderful approach to machine
learning. Check out my notebooks on
is an R/C++ implementation of a network-regularized linear regression model. It
incorporates prior knowledge in the form of graphs into the model’s likelihood and
thereby allows better estimation of regression coefficients. The main routines for
estimation of coefficients and shrinkage parameters are implemented in C++11. You can
also find it on Bioconductor.
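One way to picture how a graph can enter a regression objective is a graph Laplacian penalty, sketched below in plain Python. That the model combines an L1 term with a Laplacian term is an assumption for illustration; the function names and the parameters `lam` and `psi` are hypothetical and do not mirror the package's interface.

```python
def graph_laplacian(adjacency):
    """Return L = D - A for a symmetric adjacency matrix given as nested lists."""
    n = len(adjacency)
    return [
        [(sum(adjacency[i]) if i == j else 0.0) - adjacency[i][j] for j in range(n)]
        for i in range(n)
    ]


def smoothness_penalty(beta, adjacency):
    """Compute beta^T L beta, which is small when connected coefficients are similar."""
    lap = graph_laplacian(adjacency)
    n = len(beta)
    return sum(beta[i] * lap[i][j] * beta[j] for i in range(n) for j in range(n))


def penalized_loss(X, y, beta, adjacency, lam, psi):
    """Squared error plus an L1 term (weight lam) and the graph term (weight psi)."""
    residuals = [yi - sum(x * b for x, b in zip(row, beta)) for row, yi in zip(X, y)]
    rss = sum(r * r for r in residuals) / (2 * len(y))
    l1 = lam * sum(abs(b) for b in beta)
    return rss + l1 + 0.5 * psi * smoothness_penalty(beta, adjacency)


# Hypothetical chain graph 0 - 1 - 2 over three covariates.
A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
smooth = smoothness_penalty([1.0, 1.0, 1.0], A)
rough = smoothness_penalty([1.0, -1.0, 1.0], A)
print(smooth, rough)
```

Coefficients that agree along the edges of the graph incur no graph penalty, while coefficients that differ between neighbours are penalized, so the prior network pulls connected covariates towards similar estimates.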