Sunday, March 4, 2012

2/27 - 3/2 CS373 Blog Post

Good evening all,

Spring Break is one week away!  Unfortunately, a long and hard road stands between me and it: I have a project and four midterms to get through first!

This week was mainly about the Netflix project.  I wrote quite an extensive wiki on it (viewed here) detailing all of the algorithms I tried, along with the problem itself.  The basic idea of the Netflix project was to achieve a root mean squared error (RMSE) below Netflix's score, which was around .9474.  The RMSE measures how far a set of predicted ratings (which I would generate based on data) falls from the actual ratings that users gave movies; the lower it is, the better the predictions.  Sifting through roughly 1.5 GB of data, I compiled some useful caches to be used in my program.  The two main caches contained an average rating for each movie and an average rating for each user.  With these two caches, I was able to predict a movie rating for each entry in the probe file (this file contained the list of movies and users whose ratings needed to be predicted).

This project was not very hard to implement; the hardest part was tweaking the program and seeing what gave a better RMSE.  I stuck with the average movie rating and the average user rating and worked on how to combine the two into a semi-decent prediction.  I ultimately decided to make another cache: one that held figures computed over all movie ratings combined and all user ratings combined.  I then calculated some offsets from the average and multiplied some of the numbers by a weight in order to arrive at a decent rating.  All of this is explained in heavy detail in the wiki, linked above!
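For anyone curious what that looks like in code, here is a minimal sketch of the two pieces described above: an RMSE function and a prediction built from the movie and user averages.  It is not my actual project code.  The names `rmse` and `predict`, the `overall_avg` parameter, the two weights, and the clamping to the 1 to 5 rating range are all assumptions made for illustration; the real offsets and weights are in the wiki.

```python
import math


def rmse(predictions, actuals):
    """Root mean squared error between two equal-length sequences of ratings."""
    if len(predictions) != len(actuals) or len(predictions) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predictions, actuals)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def predict(movie_avg, user_avg, overall_avg, movie_weight=1.0, user_weight=1.0):
    """Predict a rating from cached averages.

    Starts from the overall average and adds a weighted offset for the movie
    and for the user. The weights are hypothetical tuning knobs.
    """
    movie_offset = movie_avg - overall_avg  # is this movie rated above or below average?
    user_offset = user_avg - overall_avg    # does this user rate above or below average?
    rating = overall_avg + movie_weight * movie_offset + user_weight * user_offset
    # Keep the prediction inside the valid 1-to-5 star range.
    return min(5.0, max(1.0, rating))
```

Tweaking then amounts to trying different values for `movie_weight` and `user_weight`, running `predict` over every entry in the probe file, and comparing the resulting `rmse` against the target.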

We also talked about some more interesting things in Python, notably how many types share a lot of the same ideas.  Strings, tuples, and lists can all be constructed from a string, and the resulting objects can then be worked on in the same way: indexing, slicing, taking the length, and looping all behave alike.  It's a really interesting idea and one that makes sense.  I like how everything in Python is very succinct for the most part.  Now, though, I often catch myself writing code in my other classes and forgetting to add semicolons, brackets, and the like.  Honestly, this class is making me look down on Java and look up to Python more and more, every day.  And I don't mind that!
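A small sketch of what I mean, using only built-in types.  The helper `first_and_last` is a made-up name for illustration; the point is that one function works unchanged on all three types.

```python
text = "abc"

# Each constructor accepts the same string.
as_str = str(text)      # a string of three characters
as_tuple = tuple(text)  # a tuple of three one-character strings
as_list = list(text)    # a list of three one-character strings


def first_and_last(sequence):
    """Return the first and last items of any indexable sequence."""
    return sequence[0], sequence[-1]


# The same indexing, length, and iteration work on all three.
results = [first_and_last(s) for s in (as_str, as_tuple, as_list)]
lengths = [len(s) for s in (as_str, as_tuple, as_list)]
```

The one difference worth remembering is that only the list is mutable; strings and tuples cannot be changed in place.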

I'm very curious as to how this new project is going to unfold.  I'm excited to see everything we need to do to make a nice-looking website, and it'll be fun working with a big group.  It makes me wonder how we'll use Google Code and Python to build an aggregation-based website.

Well, I have lots to do right now, but next week's blog post should be much more extensive.

Until then,

Corey
