If you’ve ever implemented a decent generative model, you’ve probably been in a situation where sampling with the standard random.choice isn’t going to cut it. You want to generate numbers from a certain distribution, or you just want a certain letter from your alphabet to appear in the results a bit more often to better simulate the real data.
Then you probably ended up implementing something like this:
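The original snippet isn't reproduced here, but a typical hand-rolled weighted sampler looks roughly like the sketch below. It is an illustration, not the author's code; `weighted_choice` is a hypothetical helper that builds running totals of the weights and uses binary search (`bisect`) to find which bucket a uniform random point falls into.

```python
import bisect
import itertools
import random


def weighted_choice(items, weights):
    """Hypothetical helper: pick one element of `items` with probability
    proportional to the matching entry in `weights`."""
    # Running totals, e.g. weights [1, 3, 6] -> [1, 4, 10]
    cumulative = list(itertools.accumulate(weights))
    total = cumulative[-1]
    # Uniform point in [0, total); the first running total strictly above it
    # marks the chosen bucket (zero-weight items are skipped automatically).
    r = random.random() * total
    idx = bisect.bisect_right(cumulative, r)
    # Guard against floating-point rounding pushing r up to `total`
    return items[min(idx, len(items) - 1)]


# Usage: a "GC-rich" alphabet where G and C are four times as likely as A and T
letters = ["A", "C", "G", "T"]
sample = [weighted_choice(letters, [0.1, 0.4, 0.4, 0.1]) for _ in range(10)]
print("".join(sample))
```

Since Python 3.6 the standard library also offers `random.choices(population, weights=..., k=...)`, which does the same job for sampling with replacement.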
Or you just stumbled upon this post from 2010, which has some heated discussions of this problem in Python and lots of different implementations of this function.
So did I, for my machine learning project last week, until the glorious internet helped me find this (relatively) new NumPy function, numpy.random.choice, which effortlessly and elegantly does exactly what I want.
Thank you, NumPy crew 🙂
This generic sampling function will generate samples from a given array. The samples can be with or without replacement, and with uniform or given non-uniform probabilities.
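A short sketch of the options described above, using the documented parameters of `numpy.random.choice(a, size=None, replace=True, p=None)`. The DNA alphabet and the probabilities are only illustrative choices.

```python
import numpy as np

np.random.seed(42)  # make the draws reproducible

alphabet = np.array(list("ACGT"))

# Uniform sampling with replacement (the default behaviour)
uniform = np.random.choice(alphabet, size=10)

# Non-uniform sampling: G and C are four times as likely as A and T.
# The probabilities in p must sum to 1.
gc_rich = np.random.choice(alphabet, size=10, p=[0.1, 0.4, 0.4, 0.1])

# Without replacement: 3 distinct letters
distinct = np.random.choice(alphabet, size=3, replace=False)

# Passing an int n samples from np.arange(n)
ints = np.random.choice(5, size=4, p=[0.5, 0.2, 0.1, 0.1, 0.1])

print(uniform, gc_rich, distinct, ints)
```

In newer NumPy versions the same parameters (`a`, `size`, `replace`, `p`) are available on the Generator API as `np.random.default_rng().choice(...)`, which is the recommended interface for new code.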
Using this nice wrapper, we can, for example, generate a custom random protein sequence. In this code snippet I’m using a Dirichlet distribution with a uniform prior to sample from. But the prior can also be estimated from a given set of proteins, giving the generator a better idea of what the dataset of interest should look like.
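The original snippet isn't shown here, so the following is only a sketch of the idea under a few assumptions: the alphabet is the 20 standard amino acids, and the helper names `random_protein` and `prior_from_proteins` are hypothetical. A residue composition is first drawn from a Dirichlet distribution (all-ones concentrations, i.e. a uniform prior, by default), and residues are then sampled with those probabilities. Estimating the prior from residue counts plus a pseudocount is one simple option, not necessarily the author's method.

```python
import numpy as np

# The 20 standard amino acids (assumed alphabet)
AMINO_ACIDS = list("ACDEFGHIKLMNPQRSTVWY")


def random_protein(length, alpha=None, rng=None):
    """Hypothetical helper: draw residue probabilities from Dirichlet(alpha),
    then sample `length` residues with those probabilities."""
    rng = np.random.default_rng() if rng is None else rng
    if alpha is None:
        # Uniform prior: every residue composition is equally likely
        alpha = np.ones(len(AMINO_ACIDS))
    probs = rng.dirichlet(alpha)
    return "".join(rng.choice(AMINO_ACIDS, size=length, p=probs))


def prior_from_proteins(proteins, pseudocount=1.0):
    """Hypothetical helper: estimate Dirichlet concentrations from residue
    counts in example sequences, starting every residue at `pseudocount`."""
    counts = np.full(len(AMINO_ACIDS), pseudocount)
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    for seq in proteins:
        for aa in seq:
            if aa in index:  # skip non-standard letters such as X or B
                counts[index[aa]] += 1
    return counts


rng = np.random.default_rng(0)
print(random_protein(30, rng=rng))  # composition drawn from the uniform prior

alpha = prior_from_proteins(["MKTAYIAKQR", "MLLAVLYCLA"])
print(random_protein(30, alpha=alpha, rng=rng))  # biased toward observed residues
```

Because the concentrations are raw counts, a larger reference set makes the Dirichlet draws cluster more tightly around the observed residue frequencies, while the all-ones prior lets compositions vary freely.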