CS101 Building a Search Engine: Week 3

Disclaimer: this blog post shares some impressions and details of the Udacity CS101 “Building a Search Engine” online course. If you are currently taking it or plan to do so in the near future, this post could be a spoiler, even though I’m trying to keep it as generic as possible and not spoil the important things.

Yesterday I finished Unit 3 of the CS101 “Building a Search Engine” class. I fell a little behind, since I took a short vacation at the beginning of last week and only got back to the class on Thursday. So I still have one homework task on my to-do list.

It’s been an interesting unit, though still a very basic one. I’m a little more confident with Python now that I know about collections, indexes, etc. Again, I’m really pleased with the simplicity of the language. Here are just a few code snippets I like:

        # create an empty list
        some_list = []

        # add something to it (elements can have mixed types)
        some_list.append(1)
        some_list.append('z')
        some_list.append([3, 2, 1])

        # iterate with a for loop
        for e in some_list:
            pass

        # get the index of an element (raises ValueError if it is missing)
        index = some_list.index(1)

        # or iterate with while; note that pop() empties the list
        while some_list:
            e = some_list.pop()

I also started to familiarize myself with the functional style of Python programming. You can find some good material here. Everything looks very interesting so far.

This week we moved further with the “real” implementation of the web crawler. Instead of going through the set of quizzes, I went my own way and wrote my own implementation of a simple crawler. Currently it starts from a ‘seed’ page and collects all the links it can find on that page and on the pages it links to. I went a little further than required, since I made it run on real web requests instead of the test data this unit assumes. If you are interested, the code can be found here.
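
To give a feeling for the idea, here is a minimal sketch of such a crawler. It is not the code from my repository: it runs on in-memory test data instead of real web requests, and the names `FAKE_WEB`, `get_page`, `get_all_links` and `crawl_web` are assumptions made up for this illustration. Link extraction uses plain string searches and assumes links look exactly like `<a href="...">`.

```python
# Hypothetical test data: URL -> page content (stands in for real web requests).
FAKE_WEB = {
    "http://a.test": '<a href="http://b.test">b</a> <a href="http://c.test">c</a>',
    "http://b.test": '<a href="http://a.test">a</a>',
    "http://c.test": '<a href="http://d.test">d</a>',
    "http://d.test": "no links here",
}


def get_page(url):
    """Return the page content, or an empty string for an unknown URL."""
    return FAKE_WEB.get(url, "")


def get_all_links(page):
    """Collect every double-quoted href target using plain string searches."""
    links = []
    marker = '<a href="'
    pos = page.find(marker)
    while pos != -1:
        start = pos + len(marker)
        end = page.find('"', start)
        if end == -1:
            break  # unterminated attribute, stop scanning this page
        links.append(page[start:end])
        pos = page.find(marker, end)
    return links


def crawl_web(seed):
    """Visit every page reachable from the seed and return the visited URLs."""
    tocrawl = [seed]
    crawled = []
    while tocrawl:
        url = tocrawl.pop()
        if url not in crawled:
            crawled.append(url)
            for link in get_all_links(get_page(url)):
                if link not in crawled:
                    tocrawl.append(link)
    return crawled
```

Calling `crawl_web("http://a.test")` visits all four test pages exactly once, even though the second page links back to the seed.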

Still, I act as a CS101 student and try to apply only the knowledge I’ve gained over the last few weeks. I believe it’s a great exercise, since it shows some gaps in my education and in my understanding of concepts.

The homework was interesting as well. Anna Patterson was the guest star of the homework session. Together with Anna we tried to improve the crawler with some real-life requirements, like max_pages and max_depth parameters that prevent the crawler from running in an infinite loop. Anna is a great expert in this field, so for each homework task I highly recommend checking the answer; there are a lot of interesting details there.
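
Without spoiling the homework answers, the two limits could look roughly like the sketch below. This is my own assumption of how such parameters might work, not the course solution: `LINKS` is a hypothetical link graph used instead of real pages, and the crawl goes breadth-first so that depth means the number of clicks away from the seed.

```python
from collections import deque

# Hypothetical link graph: URL -> links found on that page.
LINKS = {
    "seed": ["a", "b"],
    "a": ["c"],
    "b": ["seed"],  # cycle back to the seed
    "c": ["d"],
    "d": [],
}


def crawl_web(seed, max_pages, max_depth):
    """Crawl breadth-first, stopping after max_pages pages and never
    following links from pages that are max_depth clicks from the seed."""
    crawled = []
    queue = deque([(seed, 0)])  # pairs of (url, depth)
    while queue and len(crawled) < max_pages:
        url, depth = queue.popleft()
        if url in crawled:
            continue  # already visited, this is what breaks cycles
        crawled.append(url)
        if depth < max_depth:
            for link in LINKS.get(url, []):
                queue.append((link, depth + 1))
    return crawled
```

With this test graph, `crawl_web("seed", 10, 1)` stops at the pages one click from the seed, while `crawl_web("seed", 2, 5)` stops as soon as two pages have been collected.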

Posted by Alexander Beletsky. If you liked this material, please consider following my Twitter account for further updates. If you have comments or questions, do not hesitate to contact me by email or raise an issue on GitHub.

