Sunday, January 28, 2007

Controlling the crawlers

robots.txt is a standard file that any webmaster can place in the root of his/her web directory. It contains instructions telling web crawlers which pages (or sections of the site) they should or should not crawl.
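To make this concrete, here is a minimal sketch of how a well-behaved crawler might check these rules, using Python's standard urllib.robotparser module. The robots.txt content, the example.com URLs and the bot name "SomeOtherBot" are all made up for illustration; they are not taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for this example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: Googlebot
Disallow: /drafts/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A crawler asks before fetching each URL.
for agent, url in [
    ("Googlebot", "http://example.com/drafts/post.html"),
    ("SomeOtherBot", "http://example.com/private/notes.html"),
    ("SomeOtherBot", "http://example.com/index.html"),
]:
    verdict = "allowed" if parser.can_fetch(agent, url) else "blocked"
    print(f"{agent} -> {url}: {verdict}")
```

Note that in this sketch a crawler with its own User-agent section follows only that section, so the /private/ rule under "*" does not apply to Googlebot here. Also, the file is only a request: nothing forces a crawler to honour it.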

Google has started an interesting series of posts on how to use robots.txt and control the Googlebot itself.

This is a comprehensive list of all the web robots out there. What's even more interesting to note is that the list contains almost 300 web crawlers which are crawling our sites every day.

Wednesday, October 18, 2006

Googlife – The good life

I am a 24-year-old Software Engineer working in Silicon Valley. Lately, I have realized how my life has gradually started revolving around Google and how quickly I have become a Google aficionado.

My day starts in the morning when I check my Gmail (even before trying to brush my teeth) to see all the emails that came in late at night or, from the other side of the world, overnight. I also get my daily agenda emailed to me by Google Calendar to remind me of the tasks for the day (not to mention the reminder SMSes I receive from Google off and on during the day for meetings or discussions at work).

It's now more of a habit to log onto my personalized Google News page every day to check what's happening around the world. (Along with, of course, those puzzles I like to solve, delivered every day to my desktop by Google feeds!)
As the day goes on, Google (well, the search engine itself) comes in handy many times, whether for something to do with my code or for general information about the technology I am working on. Google Notebook is my all-time solution for writing down all those small things I need to remember (not to mention the anywhere-access which makes it as great as it is).

When I am back home, I am either busy reading technology blogs (through Google Reader) or sometimes busy finding friends from 10 years ago on Google Orkut.

(No, I don’t use Google Wi-Fi because I am in Sunnyvale and not Mountain View.)

When it’s evening time, my cousins and friends are usually online on Google Talk. We have a good time text/voice chatting for some time while I am listening to online music (keeping myself updated about the latest from Google Music Trends) before it’s time for dinner and for me to find a new recipe on Google again (trying to make it look as close as possible to the pic on the website :-)).

Apart from that, I have recently tried out the new Google Groups (Beta) to start a group for all my old school mates to interact with each other more often and presto! I am surprised by the response.

If I am not reading white papers on Information Integration (after picking them up from Google Scholar) you can definitely find me enjoying the interesting documentaries on Google Videos. Moreover, if there is any place I need to go to on weekends, Google Maps is incontestably the first choice for finding routes.

While I am occupied doing all this, Google Desktop is keeping track of everything on my machine, so that if I want to search the text of a chat I had 2 years ago with my friend, debating the feasibility of the Semantic Web, I can retrieve it at the click of a button.

No wonder, while searching for my car keys in the morning, I wish I could ‘google’ them.

Update: Google is now a dictionary word! Check it out

Some other ways to search..

Ms Dewey – A search engine that talks to you! Ms Dewey comments on every search you make on this website. Sometimes she is funny and witty too (with a sarcastic tinge in some of her jokes). Not a very good search engine, but try out a few keywords and listen to what she has to say.

Chacha – Chacha is a search engine where the results are given by humans. For any search, you can use a guide who’ll find the best possible matches and return them to you. But if somebody else has already searched for the same keyword earlier, it will return the matches to you instantaneously. In the bigger picture, the idea is to make every search a human-directed search (instead of a bot-directed one). Considering the increasing scale and dynamism of the web and, of course, the possibility of human error, its feasibility is definitely questionable.

Tuesday, October 10, 2006

Ajax - asynchronous?

I have always been curious about the asynchronous behaviour of Ajax. Is it really asynchronous? If it is, is it really being used that way?

This post by Peter-Paul Koch clearly expresses my curiosity.

Basically, the idea of the asynchronous model is that the user is freed from the client-server request-response cycle and can perform other tasks 'while' waiting for a server response. I could appreciate the idea at first, but later I began to wonder how often I will have (or want) to do 'other things' while I am waiting for the server to respond.
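As a rough illustration of that model, here is a small sketch of "doing other things while waiting". It is written in Python with asyncio rather than in browser JavaScript, so it is only an analogy for what an Ajax page does. The function names, the 0.05 second delay and the "typing" loop are all assumptions made up for the example; the server is simulated with a sleep.

```python
import asyncio
from typing import List


async def fetch_from_server(delay: float) -> str:
    """Stand-in for a network request; the sleep simulates server latency."""
    await asyncio.sleep(delay)
    return "server response"


async def user_session() -> List[str]:
    events: List[str] = []

    # Start the request without blocking on it.
    request = asyncio.create_task(fetch_from_server(0.05))
    events.append("request sent")

    # The "user" keeps working while the request is in flight.
    for keystroke in "hi":
        events.append(f"typed {keystroke}")
        await asyncio.sleep(0)  # yield control, much as a UI event loop would

    # Only now do we wait for the answer and use it.
    events.append(await request)
    return events


EVENTS = asyncio.run(user_session())
print(EVENTS)
```

In a synchronous version the typing could only begin after the response had arrived. Here the response is simply the last event to land, which is the whole point of the 'A' in Ajax.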

Some of the comments on this post point me to some good examples, like:

1. Google Maps - Maps for nearby regions are downloaded while I am looking at the current one. (pointed out by Dan Knapp)
2. Chat Applications - You can receive other messages and type at the same time. (pointed out by Day Barr)

Some other examples that I could think of:

1. Gmail Attachments - They start getting uploaded even before I send the email.
2. Google Groups - I tried out the beta version that was released recently. I can play with the options and tools at the top while a new page is being fetched at the bottom.

But one thing still keeps nagging at me. Why is the asynchronous part made so prominent in the name? Doesn't the real power of Ajax lie in making HTTP requests from within the page and being able to update small parts of it without reloading the entire page?


Saturday, September 30, 2006

Notepad for the web

I like the concept of Google Notebook.

Some suggestions/ideas:

I think the user would need a strong motivation to move from the standard Windows Notepad to Google Notebook. Let's take up some of the motivating features:

Web availability - Of course. The 'anywhere' access of Google Notebook is definitely a plus. But in that case, we would want to keep the performance (loading speed, for example) as close as possible to that of local machine access. (Right now, it's pretty decent.)

Simplicity - Why do people still use Notepad when we have advanced word processors available everywhere? Simplicity is the key. Most of the time people just want to note down simple text, without worrying about formatting, design etc. Also, they don't really want cluttered interfaces with lots of options. Google Notebook does a pretty good job at that right now (as long as it keeps it simple).

Snapshots - I have tried saving some HTML pages and have seen that they do not render properly in the notebook. One, Google Notebook can always try to keep the note as close as possible to the real page (by tweaking the HTML a little bit so that it fits the same way in the Notebook). Two, optionally, we may also give the user a choice of saving a 'snapshot' of the page. The user may just want to read the information sometime and not copy it from the notebook - in those cases, snapshots will be really helpful, as they will store the page 'as-is'.
(Also, storing the webpages as clips in the notebook would be a definite plus over the standard Notepad.)

Keyboard Shortcuts - Right-clicking and selecting 'Note this' looks like a lot of work sometimes (if we are saving lots of notes). How about giving the user a shortcut for that (like the keyboard shortcuts we have in Gmail)? For example, pressing Ctrl-C would copy the data to the Windows clipboard, but pressing Ctrl-C-C would send it to Google Notebook!

Tagging - I like the way notes are organized in Google Notebook. How about adding tags (titles) to them so that the user can identify what a note is about? (It need not be a mandatory requirement.) Also, if the user doesn't specify tags, all that the notebook shows is the first few lines (which might not be very helpful in the case of HTML pages). How about popping up a small window at the top right showing the contents of the note? (I have noticed the arrows on the right for collapsing/expanding, but I guess hovering-and-popping will be more convenient for a user. But if it affects performance, that's probably the last thing we want to do.)

Others - There can be many other fancy features which could be added, like converting to text files, adding formatting etc., but I guess the top word here is again 'Performance'. We need to balance performance against each feature that's added, or the entire idea of a 'notebook' might get lost.

- Anupam