Collaborative filtering for product recommendations tends to focus on user-item recommendations – given a user, find the products they might like based on the products they have already interacted with. While personalisation like this is often preferable, sometimes we would just like to know what products to recommend when a user lands on a given product page. There is far less written about this in the literature, so I would like to share some thoughts (and maths).
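As a minimal sketch of the item-to-item flavour of the problem (the matrix, function names and toy data below are my own illustrations, not anything from a particular library): given a binary user-item interaction matrix, we can score products to show on a product page by the cosine similarity between item interaction vectors.

```python
import numpy as np

# Hypothetical user-item interaction matrix: rows = users, columns = products,
# 1 = the user interacted with the product.
interactions = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 1, 1],
    [0, 0, 1, 1],
])

def item_similarities(X):
    """Cosine similarity between item (column) interaction vectors."""
    norms = np.linalg.norm(X, axis=0)
    sim = (X.T @ X) / np.outer(norms, norms)
    np.fill_diagonal(sim, 0.0)  # never recommend the product itself
    return sim

sim = item_similarities(interactions)
# Products to show on product page 0, most similar first.
recommended = np.argsort(sim[0])[::-1]
```

In practice one would work with sparse matrices and weight interactions (views vs purchases), but the shape of the computation is the same.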
Disclaimer: I am a white boy, straight outta Cambridge, UK. I do not condone the nastier, sometimes misogynistic language in this post, but in the interest of keeping it real, I have not made any effort to omit it either :b
Mobile users: you probably want to turn your screen sideways.
The following was completely generated, unedited, by a rapbot (minus the title).
House Full of Bricks - D-Prime pump it up , just throw yo hands in the air this goes out to all my niggaz that don't dare i got a whole lot of diamonds in yo veins about havin thangs gold rings , and my platinum chain that's why all these bitch - ass niggaz get smoked they don't give a fuck , that shit ain't no joke nigga this goes out to all them crooked ass cops since back in the days when i was getting popped always had a good time for me to be fine i've lost my mind , and i'm still gon shine but i don't wanna see a nigga on the grind we ain't trying to be ready for war or crime check this out , a lot of people are scarred and it's my time when it comes to the stars cause i was born with a house full of bricks yeah , we can see it all in the mix but now it's hard for me to beg and feed i gotta wake up , so take away my seed
Ever since I listened to Illmatic as a youngster, I’ve loved hip hop, and the amazing creativity of (some) rap lyrics. I also love machine learning. Over a year ago, I decided I would write a machine learning algorithm that could ingest rap lyric text, then generate lyrics of its own automatically. This was hard, and I gave up. Twice. After many, many iterations, I eventually came up with a model that could produce the above. What follows is a brief description of how this was achieved; the full gory technical details, as well as code, will be written up in a later post.
but right now i'm just trying to make it nice this is my life , you can pay the price i ain't gotta wait til it's time to take flight have a party all night , everything's gonna be alright so now do you really wanna get with me tonight it ain't no need to talk about what i write
This is a follow-on from the previous post about the birthday problem. Now we will look at the more general case where we want more than two people to share a birthday. Incidentally, the reason for my interest in this problem is that we used to use it as a coding/probability exercise for data science interviews at Qubit (not anymore!), and there seems to be surprisingly little written about it online.
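Before doing any maths, it is easy to sanity-check the general case by simulation. This is a minimal sketch (the function name and defaults are my own), estimating the probability that at least k of n people share a birthday:

```python
import random
from collections import Counter

def prob_k_share(n, k, trials=20000, days=365, seed=0):
    """Monte Carlo estimate of P(at least k of n people share a birthday)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Draw n birthdays uniformly and count how often each day occurs.
        counts = Counter(rng.randrange(days) for _ in range(n))
        if max(counts.values()) >= k:
            hits += 1
    return hits / trials
```

For k = 2 and n = 23 this lands close to the classic answer of about 0.507, which is a useful check before tackling the k > 2 case analytically.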
I have been doing a bunch of side projects recently and not writing them up. This one, I think, may be of some interest to other people, since TensorFlow is so in vogue right now. I have been interested in trying sequence-to-sequence learning for some time, so I came up with a toy problem to solve. I actually took some effort to make the notebook readable, so it would probably be easiest to just read that for the problem and code description: see the notebook here. It includes this picture (just to make this post look more interesting):
Please note this is a work in progress; I will probably write up the problem/solution itself at a later date (but at my current rate of write-ups, maybe not!)
Just here for code? Git repo.
This is a follow-up to my last post about enhancing images using a generative adversarial autoencoder structure. This post is about how it was done, and I provide code to hopefully let the reader replicate the results.
This project was done in Theano, and closely follows the code given for the DCGAN paper of Alec Radford et al. I refactored some things to make it easier to change things around, and I had to change the architecture a bit. I originally tried porting the code over to Lasagne, a library built on top of Theano, but decided this was only slowing me down. After this project, I have sadly come to think that for small experimental projects using novel techniques, working with small, simple modules on top of Theano is quicker than trying to twist your code to fit a given library.
(edit1 : this got to the top of r/machinelearning, check out the comments for some discussion)
Recently a very impressive paper came out which produced some extremely life-like images generated from a neural network. Since I wrote a post about convolutional autoencoders about 9 months ago, I have been thinking about the problem of how one could upscale or ‘enhance’ an image, CSI-style, by using a neural network to fill in the missing pixels. I was therefore very interested while reading this paper, as one of its predecessors was attempting to do just that (albeit in a much more extreme way, generating images from essentially 4 pixels).
Using this as inspiration, I built a neural network with the DCGAN structure in Theano and trained it on a large set of images of celebrities. Here is a random sample of outputs: the original images are on the left, the grainy images fed into the neural network in the middle, and the outputs on the right.
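To give a feel for where the "grainy" middle column comes from, here is a minimal sketch (my own illustration, not the project's actual preprocessing) of how one can build (grainy, original) training pairs by block-averaging an image and then upsampling it back with nearest-neighbour:

```python
import numpy as np

def make_grainy(img, factor=4):
    """Downsample by block-averaging, then upsample with nearest-neighbour,
    producing the blocky 'grainy' input the network learns to enhance."""
    h, w, c = img.shape
    small = img.reshape(h // factor, factor, w // factor, factor, c).mean(axis=(1, 3))
    return np.repeat(np.repeat(small, factor, axis=0), factor, axis=1)

# Training pairs: (grainy, original); the network learns the inverse mapping.
original = np.random.rand(64, 64, 3).astype(np.float32)
grainy = make_grainy(original)
```

The network then only ever sees the grainy version at input time, with the original as the target, so at test time it can be fed genuinely low-resolution images.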
N.B. this image is large; you should open it and zoom in to really see the detail (or lack of detail) produced by the DCGAN. I certainly do not claim the DCGAN did phenomenally well on pixel-level detail, although occasionally I’d say it did a pretty impressive job, particularly with things like hair.
Back in March 2015, I wrote a post where I visualised a large amount of reddit comments, by looking for buzzwords over time and plotting them as a wordcloud. Now that 2015 is over, I decided to plot the remainder of that year.
I made a few small tweaks to the original algorithm. For example, when measuring how ‘surprising’ a given word is for a given month, the algorithm now compares against the trailing 12 months (rather than the 12 months of the calendar year, which is less chronological in a way). For aesthetic reasons, I also made sure that each word was only emphasised in one month (otherwise ‘Trump’ got ever more buzzword-y over time, and showed up in a lot of the different months).
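The trailing-window idea can be sketched in a few lines. This is my own toy illustration of the scoring, not the post's actual code; the counts and the +1 smoothing constant are assumptions:

```python
import numpy as np

# Hypothetical monthly counts for one word: steady chatter, then a spike.
counts = np.array([10] * 12 + [50] + [10] * 11)

def surprise(counts, month, window=12):
    """Ratio of this month's count to its trailing-window average.
    The +1 smoothing stops rarely-seen words from dominating."""
    history = counts[max(0, month - window):month]
    return counts[month] / (history.mean() + 1.0)

spike = surprise(counts, 12)   # the month the word pops
quiet = surprise(counts, 13)   # back to baseline
```

Because the window slides rather than resetting each January, a word that spiked last month immediately raises its own baseline, which is exactly what damps repeat appearances like ‘Trump’.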