Weighted random choice using NumPy

If you’ve ever implemented a decent generative model, you have probably been in a situation where sampling with the standard random.choice isn’t going to cut it. You want to generate numbers from a certain distribution, or you just want a certain letter from your alphabet to appear in the results a bit more often to better simulate the real data.

Then you probably ended up implementing a weighted choice function yourself, or you stumbled upon a post from 2010 with heated discussions about this problem in Python and lots of different implementations of such a function.
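A typical hand-rolled version looks something like the sketch below: it builds running totals of the weights and uses `bisect` to find where a uniform random point lands. The name `weighted_choice` is my own, and this is just one common approach among the many floating around.

```python
import bisect
import itertools
import random


def weighted_choice(items, weights):
    """Pick one element of items with probability proportional to its weight."""
    if not items or len(items) != len(weights):
        raise ValueError("items must be non-empty and match weights in length")
    # Running totals, e.g. weights [1, 3, 6] -> [1, 4, 10].
    cumulative = list(itertools.accumulate(weights))
    total = cumulative[-1]
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    # A uniform point in [0, total) falls into the bucket of exactly one item;
    # bisect_right finds the first running total greater than that point.
    # hi=len(items) - 1 guards against floating-point rounding pushing r up to total.
    r = random.random() * total
    return items[bisect.bisect_right(cumulative, r, 0, len(items) - 1)]


if __name__ == "__main__":
    counts = {"a": 0, "b": 0, "c": 0}
    for _ in range(10_000):
        counts[weighted_choice(["a", "b", "c"], [1, 3, 6])] += 1
    print(counts)  # roughly in the ratio 1 : 3 : 6
```

It works, but it is easy to get the edge cases (zero weights, rounding at the upper end) subtly wrong, which is exactly why a library function is nicer.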

So did I, for my machine learning project last week, until the glorious internet pointed me to this (relatively) new function in NumPy that effortlessly and elegantly does exactly what I want.

Thank you, NumPy crew 🙂

numpy.random.choice
This generic sampling function will generate samples from a given array. The samples can be with or without replacement, and with uniform or given non-uniform probabilities.
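A minimal sketch of the call, with a made-up four-letter alphabet and weights. `p` gives one probability per element of the input array and must sum to 1 (NumPy raises a `ValueError` otherwise), and `replace=False` draws each element at most once.

```python
import numpy as np

letters = np.array(["a", "b", "c", "d"])
weights = np.array([0.1, 0.2, 0.3, 0.4])  # non-negative, one per letter, sums to 1

# 10 letters drawn with replacement; "d" should come up about four times as often as "a".
with_replacement = np.random.choice(letters, size=10, p=weights)

# 3 distinct letters: without replacement, each letter can appear at most once.
without_replacement = np.random.choice(letters, size=3, replace=False, p=weights)

# Passing an int n samples from np.arange(n); omitting p gives uniform probabilities.
uniform_ints = np.random.choice(5, size=8)

print(with_replacement)
print(without_replacement)
print(uniform_ints)
```

In NumPy 1.17 and later, `numpy.random.default_rng().choice` accepts the same `size`, `replace` and `p` arguments and is the recommended interface for new code.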

Using this nice wrapper we can, for example, generate a custom random protein sequence. I sample the amino-acid probabilities from a Dirichlet distribution with a uniform prior and then draw the letters with numpy.random.choice. A prior can also be estimated from a given set of proteins to give the generator a better idea of what the dataset of interest should look like.
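A minimal sketch of such a generator, assuming the 20 standard one-letter amino-acid codes as the alphabet. `random_protein` and `estimate_alpha` are hypothetical names, and turning residue counts plus a pseudocount into Dirichlet concentration parameters is just one simple way to estimate the prior, not necessarily what my original snippet did.

```python
import numpy as np

# Assumed alphabet: the 20 standard amino acids as one-letter codes.
AMINO_ACIDS = np.array(list("ACDEFGHIKLMNPQRSTVWY"))


def random_protein(length, alpha=None, rng=None):
    """Generate a random protein sequence of the given length.

    Letter probabilities are drawn from a Dirichlet distribution; with
    alpha=None every concentration parameter is 1 (a uniform prior).
    """
    rng = np.random.default_rng() if rng is None else rng
    if alpha is None:
        alpha = np.ones(len(AMINO_ACIDS))
    probs = rng.dirichlet(alpha)  # non-negative, sums to 1
    return "".join(rng.choice(AMINO_ACIDS, size=length, p=probs))


def estimate_alpha(sequences, pseudocount=1.0):
    """Hypothetical helper: turn amino-acid counts from known proteins into a prior."""
    joined = "".join(sequences)
    counts = np.array([joined.count(aa) for aa in AMINO_ACIDS], dtype=float)
    return counts + pseudocount  # the pseudocount keeps every alpha > 0


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    print(random_protein(30, rng=rng))
    toy_set = ["MKTAYIAKQR", "MKKLLPTAAA"]  # made-up toy strings, not real data
    print(random_protein(30, alpha=estimate_alpha(toy_set), rng=rng))
```

Large counts make the Dirichlet concentrate tightly around the observed residue frequencies, so the estimated prior pulls generated sequences toward the composition of the reference set.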
