
In this world, everything is finite. But there is one exception: the Web. The Web seems to have no limit. Countless new web pages are generated every hour, but old ones do not die off at the same rate. This is somewhat like our body, where new red and white blood cells keep forming while older cells keep breaking down.

It feels awkward when you try to access a web page that was available some time back but no longer is. Or maybe the content of that page has changed. This led to the development of web archiving and caching. Most popular search engines now let you access a cached version of a web page. But you cannot choose the point in time you want to view; they show you only their latest cached version of the page.

Archive.org is the best place I have found for accessing old and archived web pages. Its "Wayback Machine" is part of a digital library of Internet sites and other cultural artifacts in digital form. It is a non-profit project, open to everyone in the world for accessing archived content.

The Wayback Machine holds snapshots of web pages going back to 1996. They say the archive covers over 150 billion web pages and takes up 1.5 PB (petabytes) of storage, distributed across 900 computers. However, you may not find the most recently changed content or newly added websites there.

The best thing about this service is that it lets you choose the point in time at which you want to see a page. After you give it the website link, it shows you a calendar marked with crawl dates. You can view the content as it was on a given date by picking one of those dates.
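
If you would rather jump to a date without clicking through the calendar, the idea can be sketched in a few lines of Python. This is only a sketch under assumptions not covered in this post: it relies on the public address form `https://web.archive.org/web/<timestamp>/<url>`, where the timestamp is 14 digits (`YYYYMMDDhhmmss`), and the function name `wayback_url` is made up for illustration. As far as I know, the Wayback Machine then takes you to the capture nearest to the requested time, which may not be the exact date you asked for.

```python
from datetime import datetime

# Assumed base address of the Wayback Machine's replay pages.
WAYBACK_PREFIX = "https://web.archive.org/web"


def wayback_url(page_url: str, when: datetime) -> str:
    """Build a Wayback Machine address for `page_url` around the time `when`.

    Hypothetical helper: it only formats a string and makes no network request.
    """
    # The timestamp is written as YYYYMMDDhhmmss (14 digits).
    timestamp = when.strftime("%Y%m%d%H%M%S")
    return f"{WAYBACK_PREFIX}/{timestamp}/{page_url}"


if __name__ == "__main__":
    # Example: the IBM home page around the middle of the year 2000.
    print(wayback_url("http://www.ibm.com/", datetime(2000, 6, 15)))
```

Pasting the printed address into a browser should open the archived page, provided a capture exists for that site.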

Below is the screenshot from when I searched for the IBM website for the 2000 calendar year. The circled dates are the days on which the Wayback Machine crawled the site.

Post tagged in: Internet