Disclaimer: this blog post shares some impressions and details of the Udacity CS101 "Building a Search Engine" online course. If you are currently taking it or plan to do so in the near future, this post could be a spoiler, even though I'm trying to keep it as generic as possible and not spoil the important things.
After quite a delay, I've finished Units 5 and 6. I'm in a big rush now, since the exam week has already started and I haven't completed Unit 7 yet. Fortunately, Unit 7 is not a technical one, but rather general computer science education that helps tie together all the knowledge gained over the seven weeks.
I would say that these two units are where I started to feel some complexity. In Unit 5 we focused on making things faster, basically by introducing more advanced data structures for the same job. We went from a list-based index implementation to a self-implemented hash table and then switched to the Python dictionary type. Again, abstracting away the many simple things is what a good developer should always do, but I was surprised how much I had forgotten about the main properties of hash functions and hash tables. We also did some very basic algorithm analysis.
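To make the progression a bit more concrete without giving away the course's own solutions, here is a minimal sketch of the idea as I understand it. All names (`new_table`, `bucket_for`, `add_to_index`, `lookup`), the toy hash function and the example URLs are my own assumptions for illustration, not the course code.

```python
def new_table(nbuckets):
    """Create a hash table as a list of nbuckets empty buckets."""
    return [[] for _ in range(nbuckets)]


def bucket_for(keyword, nbuckets):
    """Toy hash function (an assumption): sum of character codes modulo nbuckets."""
    total = 0
    for ch in keyword:
        total = (total + ord(ch)) % nbuckets
    return total


def add_to_index(table, keyword, url):
    """Record that url contains keyword, scanning only one bucket."""
    bucket = table[bucket_for(keyword, len(table))]
    for entry in bucket:
        if entry[0] == keyword:
            if url not in entry[1]:
                entry[1].append(url)
            return
    bucket.append([keyword, [url]])


def lookup(table, keyword):
    """Return the list of URLs for keyword, or None if it is not indexed."""
    bucket = table[bucket_for(keyword, len(table))]
    for entry in bucket:
        if entry[0] == keyword:
            return entry[1]
    return None


table = new_table(8)
add_to_index(table, "python", "http://example.com/a")
add_to_index(table, "python", "http://example.com/b")
add_to_index(table, "search", "http://example.com/a")

# The built-in dictionary does the same job with far less code.
index = {}
index.setdefault("python", []).append("http://example.com/a")
index.setdefault("python", []).append("http://example.com/b")
```

The point of the buckets is that a lookup only scans the entries that hash to the same slot instead of the whole list, and the dictionary version hides all of that behind one line.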
Unit 6 is real computer science. Besides playing with recursive algorithms, we did more advanced things such as graph theory. All of that was the foundation for implementing a page ranking mechanism. We used Google's (Larry Page's) famous algorithm that everybody has heard of: PageRank. This is where my brain started to heat up. I'll be honest with you, I'm still missing some parts of it, so it will take some time to get a clear picture.
So, the crawler is starting to have real search engine features. Just extracting links and indexing keywords is definitely not enough for a search engine. It now also builds the link graph and computes page ranks, which are then used in the lookup functions to return the best match for a search keyword. It's a very simplified but working model of something that Google has (or probably something that Google might have had back in 1998).
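For the curious, the general idea of ranking pages on a link graph and using the ranks at lookup time can be sketched roughly like this. It is a simplified PageRank-style iteration, not the course's solution and certainly not what Google runs. The function names, the damping factor of 0.8, the 10 iterations and the tiny example graph are all assumptions I picked for illustration.

```python
def compute_ranks(graph, damping=0.8, iterations=10):
    """Simplified PageRank-style ranking.

    graph maps each page to the list of pages it links to.
    damping and iterations are illustrative values, not tuned ones.
    """
    npages = len(graph)
    ranks = {page: 1.0 / npages for page in graph}
    for _ in range(iterations):
        new_ranks = {}
        for page in graph:
            # Every page gets a small base rank...
            rank = (1 - damping) / npages
            # ...plus a share of the rank of each page that links to it.
            for other, outlinks in graph.items():
                if page in outlinks:
                    rank += damping * ranks[other] / len(outlinks)
            new_ranks[page] = rank
        ranks = new_ranks
    return ranks


def best_page(index, ranks, keyword):
    """Return the highest-ranked URL for keyword, or None if it is unknown."""
    urls = index.get(keyword)
    if not urls:
        return None
    return max(urls, key=lambda url: ranks.get(url, 0.0))


# Hypothetical three-page web: a links to b and c, b links to c, c links to a.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = compute_ranks(graph)
index = {"crawler": ["b", "c"]}
print(best_page(index, ranks, "crawler"))
```

Here page c collects rank from both a and b, while b only gets half of a's, so c wins the lookup for a keyword that appears on both.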
Python. I like the language more and more and am starting to feel some confidence. At the same time, there are several things that I dislike. Nothing serious, almost cosmetic, but they bug me a little.
Anyway, I have only a few days now to submit my exam work. I've already glanced at the exam tasks and they don't appear too complex, so I have a good chance of finishing in time. Wish me luck! I'll update you as soon as I get any results!