Sunday, February 26, 2012

2/20 - 2/24 CS373 Blog Post

Good evening everyone,

Another week has passed, and we are getting closer and closer to a glorious Spring break! Anyone have any plans?

In class this week, we had a great presentation by some employees of Mutual Mobile, a mobile software company.  They discussed what they do, how they do it, and why they do it.  As a curious mobile application developer, I've always found it interesting to create apps for mobile technology.  As they said in their presentation, mobile development is becoming far more prominent with the rise of mobile technology; the influx of smartphone owners has fueled the growth of mobile development.  It's truly an exciting area of development.  Even when I'm busy with projects in multiple classes, I find myself wanting to develop for my iPhone because it's a lot of fun.  I'm so used to developing things on the command line; it's nice seeing some GUI output!  It was also very cool to see the technologies they use.  I had never heard of Jenkins or Jira until they talked about them; I need to take a look and see if I can incorporate these tools into any of my personal projects.

We also started discussing the Netflix project, which is due this coming Wednesday.  I first heard about the "Netflix question" about two years ago in my 336 class, but that was more about the logical aspect of the question, not so much about finding an algorithm that could beat theirs.  It's been an interesting problem to say the least, and there are a lot of things to take into consideration when trying to produce a better RMSE (Root Mean Squared Error).  For instance, there's a plethora of different caches one could implement given the movie data.  We discussed two in class: the average rating for a given movie, and the average rating for a given user.  I'm currently in the process of creating these caches to test in my program (the average rating for a given user does seem to take a while to build, given that we have 1.5GB of files to look at!).  Other caches may be trickier to create.  For instance, finding users with similar tastes in movies would take a while, unless we developed another meta-cache that tallied up all the different movies a user has seen and then compared from there.  Another cache could hold average ratings for movies released in a certain decade; for instance, we would have averages for the 1950s, the 1960s, and so on.  We could go further still and build a cache of averages based on the year a user reviewed a movie.  So many options to explore; the only downside is actually parsing all 1.5GB of data into a usable cache.  The rest is fairly easy.
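To make the two caches we discussed concrete, here is a minimal sketch of a movie-average cache and the RMSE it's measured by. The names and the (movie, user, rating) tuple format are my own assumptions for illustration, not the actual project's parsing format:

```python
from collections import defaultdict
from math import sqrt

def movie_average_cache(ratings):
    """Build a cache mapping movie id -> average rating.
    `ratings` is an iterable of (movie_id, user_id, rating) tuples
    (a hypothetical format; the real data is parsed from 1.5GB of files)."""
    totals = defaultdict(lambda: [0.0, 0])  # movie_id -> [sum, count]
    for movie, user, rating in ratings:
        totals[movie][0] += rating
        totals[movie][1] += 1
    return {movie: s / n for movie, (s, n) in totals.items()}

def rmse(predictions, actuals):
    """Root Mean Squared Error between two equal-length sequences."""
    return sqrt(sum((p - a) ** 2 for p, a in zip(predictions, actuals)) / len(actuals))

# Tiny example: predict each rating as that movie's average.
ratings = [(1, 10, 4.0), (1, 11, 2.0), (2, 10, 5.0)]
cache = movie_average_cache(ratings)
predictions = [cache[movie] for movie, user, actual in ratings]
```

A per-user cache is the same idea keyed on `user_id` instead of `movie_id`.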

This class has been a great experience; it's probably one of my favorite classes.  I have learned so much simply through practice.  It wasn't until this class that I became devoted to writing multiple unit tests to assert that my programs work correctly, and it's kind of a crazy thought.  I've looked back at some older programs and projects I've worked on, and it's absolutely insane that I didn't use unit testing on them!  Unit testing in Java was touched on a little in 337 (Theory in Programming Practice), but we were never told what exactly to test.  We would test methods for some corner cases, but not so much whether they produce valid output all the time, whether we throw and catch appropriately given the potential input we could face, and so on.  It's a little ridiculous.  Unit testing, I've learned, keeps me pretty sane about my programs!

This is all for now, I've got to get working on multiple projects.  More on the Netflix project will come next week!

Until then,

Corey

Saturday, February 18, 2012

2/13-2/17 CS373 Blog Post

Good evening everyone!

It's been a busy week!  The PFD project was due and we had our first exam in the course.  A little stressful, but I think we all survived!

This week, the primary focus was on the Project File Dependencies (PFD) project.  The basic concept of this project was to take an input (essentially a graph of nodes, each with a list of dependencies) and output an ordering of the node numbers such that no node appears before one of its dependencies.  This is definitely a project where several different implementations could get the job done.  In collaboration with my partner, we ended up choosing the easy route: a list that, through iterations, keeps itself sorted (with as few calls to sort as possible).  Our program starts by reading the first line of input, which tells us how many more lines to read.  The first number designates how many vertices (or nodes, if you prefer) are in our graph, while the second represents how many dependency rules follow.  We initialized a list of empty lists, one empty list per vertex.  We then iterated through the rest of the input to read the rules.  Each line contains a node number, followed by its number of dependencies and then the dependency node numbers themselves.  Building our graph amounts to appending that data at the graph index corresponding to the rule's node.  So, for instance, a rule of "5 3 1 2 3" would create a list [1, 2, 3] stored at graph[4].

Once we filled up our graph, we walked down the list and collected the indices whose dependency lists were empty, appending them to an intermediate container (in our case, a sorted list).  Since we walk down the graph in order every time, the list starts out sorted.  For every index we found to be empty, we replaced the empty list with a -1 sentinel to signify that it had been "removed" from our graph.  Then, for each node in our intermediate list, we searched through all the remaining dependency lists and removed that node from them, again to signify its removal.  After each pass, we add the node to a result list.  We then search again for any newly empty lists; if there are any, we add them to the intermediate container and sort.  Rinse, repeat, profit.
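The steps above can be sketched roughly as follows. This is my own reconstruction of our approach with hypothetical names (the real program reads from stdin rather than taking a dict), but the sorted intermediate list and the -1 sentinel work the same way:

```python
def pfd_solve(num_verts, rules):
    """Topologically order nodes 1..num_verts.
    `rules` maps a node to the list of nodes it depends on."""
    # graph[v - 1] holds node v's outstanding dependencies
    graph = [list(rules.get(v, [])) for v in range(1, num_verts + 1)]
    result = []
    # Collect the initially dependency-free nodes, in numeric order.
    free = sorted(v for v in range(1, num_verts + 1) if not graph[v - 1])
    for v in free:
        graph[v - 1] = -1  # sentinel: "removed" from the graph
    while free:
        node = free.pop(0)
        result.append(node)
        newly_free = []
        for i, deps in enumerate(graph):
            if deps == -1:
                continue  # already removed
            if node in deps:
                deps.remove(node)  # node is resolved; drop the dependency
            if not deps:
                graph[i] = -1
                newly_free.append(i + 1)
        free = sorted(free + newly_free)  # keep the intermediate list sorted
    return result
```

For example, with 5 vertices and rules like "3 depends on 1 and 5", this produces an ordering such as [1, 5, 3, 2, 4].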

In hindsight, I can see that we could have done a better job optimizing our code.  Every time we add new nodes to our intermediate list, we sort, which can take a while depending on how big the graph is and how many nodes become free in one removal.  I did learn through research, though, that Python's sort() is an adaptive merge sort (Timsort), which runs a lot faster than a traditional merge sort on partially ordered data.  Regardless, had we used a different kind of container, like a linked list or a binary heap, the code could have been more efficient.  Maybe I'll experiment with that later.
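As a sketch of that experiment: the standard library's heapq module turns a plain list into a binary min-heap, so pushing a newly freed node costs O(log n) instead of an O(n log n) re-sort. The names here are hypothetical, and I've added a reverse "dependents" map so we don't rescan the whole graph on each removal:

```python
import heapq

def pfd_solve_heap(num_verts, rules):
    """Same topological ordering as the sorted-list version,
    but with a min-heap as the intermediate container."""
    deps = {v: set(rules.get(v, ())) for v in range(1, num_verts + 1)}
    # Reverse map: node -> nodes that depend on it, so removal is O(degree).
    dependents = {v: [] for v in range(1, num_verts + 1)}
    for v, ds in deps.items():
        for d in ds:
            dependents[d].append(v)
    heap = [v for v in deps if not deps[v]]
    heapq.heapify(heap)  # smallest node number always pops first
    result = []
    while heap:
        node = heapq.heappop(heap)
        result.append(node)
        for succ in dependents[node]:
            deps[succ].discard(node)
            if not deps[succ]:
                heapq.heappush(heap, succ)  # O(log n), no full sort
    return result
```

Popping the minimum still guarantees the numerically smallest free node is emitted first, matching the sorted-list behavior.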

In class, we discussed iteration, parallel assignment, keyword arguments, default arguments, unpacking iterable items, and creating tuple and dictionary containers in functions (phew, so many topics!).  I guess it never really hit me, but it makes sense that within a for-each loop, the loop variable is simply bound to each item in turn (a copy of the reference, not of the object).  It abides by the iterability (not really a word, but bear with me) standard.  Most of our conversation this week revolved around some interesting quirks of Python.  Parallel assignment was interesting in the sense that one could assign from any iterable object of the right length, and it would just work.  I guess that's one amazing but potentially dangerous thing about Python: you can pass in just about anything, but it has to make sense.  A list, tuple, set, dictionary, or xrange could all be used in the example slide with the same result.  Pretty ridiculous.  Passing arguments as keywords was also a really cool thing to learn about; as Professor Downing mentioned, this is incredibly useful for documenting exactly what your variables are in method calls.  Someone looking at fresh code could see a function call like this:

custom_hash(24, 189, data, "x381jfs")

and would probably have no idea what it meant, other than it uses some kind of data to create some hash.  If instead, it could be re-written as such:

custom_hash(data_begin_index=24, data_end_index=189, list_of_data=data, salt="x381jfs")

one could then decipher that this is a hash function that operates on just a range of a list and then salts it.  Of course, it is fragile: it breaks easily if someone decides the parameter names are too ambiguous, too long, unnecessary, etc.  Still, it's a useful technique.  You can also pass keyword arguments in any order and still get the right answer, so long as any positional arguments come first and land in the right place.  Furthermore, you can provide default values for parameters in case they aren't passed.  Again, pretty useful if a function is called several times with a consistent, fixed value.
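Putting keyword and default arguments together, here's a toy version of that call. The function body is entirely made up (the real custom_hash is hypothetical to begin with); the point is how the call sites read:

```python
def custom_hash(list_of_data, data_begin_index=0, data_end_index=None, salt=""):
    """Hash a slice of a list with a salt (toy implementation).
    Defaults let callers omit whatever they don't care about."""
    end = len(list_of_data) if data_end_index is None else data_end_index
    return hash(tuple(list_of_data[data_begin_index:end]) + (salt,))

data = [3, 1, 4, 1, 5, 9, 2, 6]

# Keyword arguments can appear in any order...
a = custom_hash(list_of_data=data, salt="x381jfs",
                data_begin_index=2, data_end_index=5)
# ...and are equivalent to the same call made positionally.
b = custom_hash(data, 2, 5, "x381jfs")

# Defaults kick in for anything omitted: whole list, empty salt.
c = custom_hash(data)
```

Here `a == b` holds, and `c` silently used all three default values.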

We also discussed unpacking items, as well as creating tuples and dictionaries straight from the parameter line.  Unpacking was kind of cool.  You can take just about any iterable item of a given length (like the parallel assignment discussed above) and, based on the parameters, unpack the data inside to whatever names you desire.  That is, it strips the data out of a container so it can be used for other purposes, which removes the need to iterate through the data and add it to some other container.  Alongside unpacking, we can do the opposite: collect arguments into tuples and dictionaries.  Again, these are useful features that eliminate extra lines of code, because programmers can be awfully lazy sometimes.
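Both directions can be sketched in a few lines; the function names here are just for illustration:

```python
def show(*args, **kwargs):
    """*args collects extra positional arguments into a tuple;
    **kwargs collects extra keyword arguments into a dict."""
    return args, kwargs

# Collecting: 1 and 2 land in a tuple, key="value" lands in a dict.
collected_args, collected_kwargs = show(1, 2, key="value")

def add3(a, b, c):
    return a + b + c

# Unpacking: * spreads an iterable across the parameters,
# so add3(*triple) is equivalent to add3(1, 2, 3).
triple = [1, 2, 3]
total = add3(*triple)

# ** does the same for keyword arguments from a dict.
options = {"a": 10, "b": 20, "c": 30}
total_kw = add3(**options)
```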

Lastly, we had a test last Friday.  It didn't turn out as badly as I had expected!  As Professor Downing said, he geared the test to make the cheat sheet virtually useless, but preparing it was still a very helpful review of all the material we've covered so far.  I stumbled a bit on the Haskell problem (I hadn't really looked into Haskell that much; I had bet it would be a multiple-choice question!), but I had a rough idea of the syntax.  Hopefully not many points will be deducted on that problem.  All I know is that we're all anticipating the final scores!

Until next time,

Corey

Sunday, February 12, 2012

2/6 - 2/10 CS373 Weekly Blog

Good morning, everybody!

Another week has passed in the realm of college; amazing how time can fly so fast.

This week in software engineering, we discussed several topics: memory allocation on the stack versus the heap; how variables are allocated and cached; conditional statements; whether or not arguments are mutable; and different ways of summing a container of elements using either an index or an iterator (and whether a given container is indexable or iterable).

The discussion about stacks and heaps was a review for me, considering I hadn't really thought about them in a long time (maybe two or three years, even!).  I knew there was a limit to how many records a stack can hold, but I didn't know a specific number.  I can't really think of a time a program would need so many layers of stack records to perform a specific operation (unless, of course, it is an incredibly expensive, deeply recursive function) that it would use more than 1000 stack frames or cause an overflow.  Maybe it's just because I haven't experienced the real world, with software that actual people use, but the idea baffles me.  With the heap, I can understand.  Many programs can use a 'heap' of memory (get it?) and run into a lack-of-memory issue.  Again, I guess it'll be more significant later on.  I know this problem is a significant issue when dealing with smaller devices, such as mobile phones (iPhone, Android, Windows Phone 7, etc.).  Since memory there is not as plentiful as on modern-day computers, it must be used wisely.  Memory must be released when variables are no longer needed (or garbage collected); otherwise, memory leaks occur, and those intensify the lack-of-memory issue.  This is critical!
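That 1000-frame figure is actually observable in Python: CPython's default interpreter recursion limit is 1000 frames, and you can measure how deep you really get before it trips. A small sketch (the measured depth comes in a bit under the limit because some frames are already on the stack before the recursion starts):

```python
import sys

# CPython caps the interpreter call stack; the default limit is 1000 frames.
limit = sys.getrecursionlimit()

def depth(n=0):
    """Recurse until Python refuses to add another stack frame,
    then report how many frames we managed to stack up."""
    try:
        return depth(n + 1)
    except RecursionError:
        return n

measured = depth()
assert measured < limit  # always a little shy of the configured limit
```

The limit is adjustable with sys.setrecursionlimit(), but raising it only moves the ceiling; a genuinely unbounded recursion will still exhaust the real C stack eventually.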

I never knew that small integer values were cached, at least within a certain range.  It's kind of peculiar what ranges Java and Python have specified (Java: [-128, 127] and Python: [-5, 256]).  I can kind of understand Java's range in binary terms (127 is 01111111, and -128 is 10000000 in two's complement).  I guess Python chose its range based on how frequently those numbers get used?  Who knows.  Before this class, I also didn't know (or at least didn't realize) that arrays are mutable and are effectively passed by reference.  In the past, many professors would just say that everything in Java is passed by value (that is, values are duplicated and sent to methods).  That's still technically true: for arrays, it's the reference that gets copied, so a method can mutate the array's contents even though the reference itself was passed by value.
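Both behaviors are easy to poke at in Python. CPython pre-allocates the ints in [-5, 256], so any value in that range is the same object, which `is` (identity, not equality) reveals; and a list passed to a function is the same object inside it:

```python
# Small ints in [-5, 256] are cached: every 256 is the *same* object.
a = int("256")
b = int("256")
print(a is b)   # True: both names point at the cached 256

# Outside the cache, identity is an implementation detail --
# rely on == for value comparison, never on `is`, beyond that range.
x = int("1000")
y = int("1000")
print(x == y)   # True (whether x is y is unspecified)

# Reference semantics: the callee sees (and can mutate) the same list.
def mutate(lst):
    lst.append(42)

nums = [1, 2, 3]
mutate(nums)
print(nums)     # [1, 2, 3, 42]
```

The `int("...")` calls are there to dodge compile-time constant sharing, so the identity check really exercises the runtime cache.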

We also had a great review and discussion of indexability and iterability (those aren't words in real life, just in the computer science world).  As a refresher, indexability refers to containers whose items can be indexed; that is, they can be found through some subscript/get method in constant time.  Some containers can do this in linear time, but those aren't truly indexable, because we are merely walking down the list to find our item.  Iterability refers to containers that can be walked through but (usually) don't have, or don't care to have, an index.  Examples include Maps and Sets.  These containers only care about what a value is rather than where the value is.  Other containers, such as arrays and ArrayLists, care about where a value is rather than what it is.  We noted that certain containers (ArrayLists, Lists, LinkedLists, etc.) can also be iterated over, but some are better used as indexable containers (ArrayLists, Lists) and others as iterable ones (LinkedLists, Maps, Sets).
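The same "where vs. what" split shows up in Python's own containers: a list is indexable and iterable, while a set is iterable only. A quick sketch:

```python
# A list is indexable: constant-time lookup by position ("where").
nums = [10, 20, 30]
second = nums[1]        # 20

# A set is iterable but not indexable: it only knows "what" it holds.
unique = {10, 20, 30}
has_twenty = 20 in unique   # constant-time membership test, no positions

seen = []
for value in unique:        # iteration works, though order isn't meaningful
    seen.append(value)

# Asking a set for a position is a type error:
try:
    unique[1]
except TypeError:
    indexable = False
```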

The PFD project has been an interesting exercise in using Python.  I still fall into the habit of using braces to denote functions or loops; it's weird how everything is strictly bound by whitespace.  I also have to remember that C-style for loops (with an initializer, condition, and increment) do not exist in Python; its for loop walks an iterable instead.  It's sort of inconvenient at first, but the same things can be accomplished, often with more elegant code once you refactor.  More on the project will come next week!
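For the record, the idioms that replace a C-style for look like this:

```python
words = ["alpha", "beta", "gamma"]

# for-each over the values themselves (the common case)
lengths = []
for word in words:
    lengths.append(len(word))

# range() when only an index is needed, enumerate() for both at once
indexed = []
for i, word in enumerate(words):
    indexed.append((i, word))
```

Between `range()`, `enumerate()`, and `zip()`, I haven't yet found a counting loop that couldn't be rewritten this way.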

Until then,

Corey

Sunday, February 5, 2012

1/30 - 2/3 CS373 Blog Post

Good afternoon, everybody!

Another week, another blog post.  Like always, the lectures have been rigorous but very interesting.  It's seriously amazing how much you can learn just by looking at small snippets of code and dissecting the meaning of every part of the syntax.  It's almost as if, when beginning to learn the art of programming, you take some things for granted.  That, or they were just never talked about in great detail in the introductory classes.  It makes me want to re-evaluate everything I've learned and assure myself that I understand every part of a given programming language.

It's been interesting taking a look at Haskell code.  The format is so different from other languages I've encountered!  It is always cool to look at new programming languages and to see how they are similar and different.  In the end though, they all have the ability to perform the most common tasks, and that in itself is pretty cool.

The Collatz project was actually a lot of fun to program.  I had an enjoyable time optimizing it as much as I could to get the runtime down to a minimum.  It was good that I finished the program early so I could tweak it as much as possible.  I also unintentionally wrote an awfully lengthy wiki on it.  I just wanted to keep my information as detailed as possible so that someday, if I ever look back at it, I can understand everything I did.  A mishap did occur when submitting the project, though.  It seems that when I zipped up my files, I forgot to do a recursive zip, so my HTML files didn't get submitted!  After starting early and finishing early, it was definitely a crucial hit to my grade (already 20 points off).  I'm the only one to blame, though: I checked all of my files after submitting, but I assumed the files in my subdirectories had made it into the zip.  Lesson learned: check everything, for my own sanity.

Next week's project seems really interesting.  Determining dependencies and listing them out seems like it'll be a pretty thoughtful project.  I'll need to brush up on my knowledge of priority queues and binary heaps; it seems like that will be the data structure used to solve this problem.  Honestly, I can't recall ever learning about priority queues or binary heaps in CS315 (CS314 now, I suppose), but I have used priority queues in the past.  Anyone else learn about those data structures elsewhere?

Until next time,

Corey