HOW EXACTLY DO SEARCH ENGINES INDEX


DESCRIPTION

In this in-depth article, Peter P. Thiruselvam takes you through the ins and outs of how search engines process web pages. This approach gives you the background you need to understand how they index and work with the content on your website.


Search engines have three major components:

1. The first is a software program, known by creepy names such as spiders, ants and bots, which crawls around the web and visits web pages. These spiders then take that information back to their home base, which is the second component: the database or index.

2. This index, which is the second component, is something like a catalog and, in principle, contains a copy of every page that the spider brings back. When a spider returns to your website and brings your changes back, the catalog is updated. In reality, however, many search engines do not index all the pages that the spider brings back. Therefore, if your tracking logs show that a spider has visited and requested certain pages, that does not guarantee those pages have been indexed. It is best to search the search engine for your URLs to be sure that you are indexed.

3. The third component is the ranking mechanism. This is the software that matches the web surfer's keywords against the pages in the database and comes up with matches. These matches are then displayed on the surfer's screen. The higher you are displayed in the surfer's window, the better your chances are of being found. As stated earlier, 93% of Internet users do not look past the first three pages of search results.
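To make the three components concrete, here is a minimal sketch in Python. It is only a toy under stated assumptions: the "web" is a hypothetical in-memory dictionary called WEB, the URLs are made up, and the ranking simply counts keyword occurrences, which is far simpler than what any real search engine does.

```python
import re
from collections import defaultdict

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
WEB = {
    "http://example.com/": ("Welcome to our garden shop", ["http://example.com/tools"]),
    "http://example.com/tools": ("Garden tools and garden gloves", []),
}

def crawl(start_url):
    """Component 1, the spider: follow links and bring page text back."""
    seen, queue, fetched = set(), [start_url], {}
    while queue:
        url = queue.pop(0)
        if url in seen or url not in WEB:
            continue
        seen.add(url)
        text, links = WEB[url]
        fetched[url] = text
        queue.extend(links)
    return fetched

def build_index(fetched):
    """Component 2, the index: map each word to the pages that contain it."""
    index = defaultdict(lambda: defaultdict(int))
    for url, text in fetched.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word][url] += 1
    return index

def rank(index, query):
    """Component 3, ranking: score pages by how often the query words occur."""
    scores = defaultdict(int)
    for word in re.findall(r"[a-z0-9]+", query.lower()):
        for url, count in index.get(word, {}).items():
            scores[url] += count
    # Highest score first; ties are broken alphabetically by URL.
    return sorted(scores, key=lambda u: (-scores[u], u))

index = build_index(crawl("http://example.com/"))
results = rank(index, "garden tools")
print(results)
```

In this toy, the tools page is listed first for the query "garden tools" because it contains more of the query words than the home page does.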

These are the common denominators among the different major search engines. As the search engines are owned by different companies, their spiders are tweaked differently. Here is how they are tweaked:

The following information about the behavior of spiders covers:
AltaVista, Excite (which also owns WebCrawler and Magellan), FAST Search, Go (owned by Infoseek), Google, Lycos, Northern Light, and Inktomi (which influences the search results of HotBot, AOL Search, and MSN Search).

1. THE SPIDERS:

All the spiders "deep crawl", which means that they crawl several layers into your website. The one exception among the search engines listed above is Go's spider.

Only AltaVista "instant indexes", which means that the information is put into its database of websites within days. All other search engines take longer.

Excite, Inktomi, Go and Lycos can neither crawl nor index framed pages. AltaVista, FAST and Northern Light can. For the search engines that have problems with framed pages, you can work around this by using the <NOFRAMES> tag.
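As a rough illustration of why <NOFRAMES> helps, the sketch below imitates a spider that ignores the frame sources and reads only what sits inside the <NOFRAMES> block. The page, the file names and the read_noframes helper are all hypothetical, and the regular expressions are a simplification rather than a real HTML parser.

```python
import re

# Hypothetical framed page with a <noframes> fallback for spiders.
FRAMESET_PAGE = """
<html><head><title>Garden Shop</title></head>
<frameset cols="20%,80%">
  <frame src="menu.html">
  <frame src="main.html">
  <noframes>
    <body>
      <p>Garden Shop sells tools and seeds.</p>
      <a href="main.html">Main page</a>
    </body>
  </noframes>
</frameset>
</html>
"""

def read_noframes(html):
    """Return (visible text, links) found inside the <noframes> block, if any."""
    match = re.search(r"<noframes[^>]*>(.*?)</noframes>", html,
                      re.IGNORECASE | re.DOTALL)
    if not match:
        return "", []
    inner = match.group(1)
    # Links the spider could follow from the fallback content.
    links = re.findall(r'<a\s[^>]*href="([^"]+)"', inner, re.IGNORECASE)
    # Strip the tags and collapse whitespace to get the indexable text.
    text = " ".join(re.sub(r"<[^>]+>", " ", inner).split())
    return text, links

text, links = read_noframes(FRAMESET_PAGE)
print(text)
print(links)
```

Without the <NOFRAMES> block, this imitation spider would come back with no text and no links at all, which is the problem described above.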

If one of your main pages uses an image map for links to other areas of your website, the spiders of Excite, FAST, Google, Inktomi and Lycos will have problems indexing your pages, because they cannot follow the links in an image map. However, the spiders of AltaVista, Go and Northern Light can read and follow links in an image map.

All robots will read your robots.txt file.
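For readers who want to see what "reading robots.txt" means in practice, here is a small sketch using Python's standard urllib.robotparser module. The robots.txt content, the robot names and the URLs are invented for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: everyone is kept out of /private/,
# and a robot calling itself "ExampleBot" is kept out entirely.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: ExampleBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def allowed(agent, url):
    """True if the named robot may fetch the URL under these rules."""
    return parser.can_fetch(agent, url)

print(allowed("AnySpider", "http://example.com/index.html"))
print(allowed("AnySpider", "http://example.com/private/a.html"))
print(allowed("ExampleBot", "http://example.com/index.html"))
```

A well-behaved spider performs a check like this before requesting each page.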

If you have Meta Robots tags in your code, every spider except Excite's can read them. Google may not support them.
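The sketch below shows one way a spider might read a Meta Robots tag, using Python's standard html.parser module. The sample page and the robots_policy helper are assumptions made for illustration; how each search engine actually interprets the tag is not documented here.

```python
from html.parser import HTMLParser

class MetaRobotsParser(HTMLParser):
    """Collects the directives of any <meta name="robots"> tag on a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            self.directives.update(
                d.strip().lower() for d in content.split(",") if d.strip()
            )

def robots_policy(html):
    """Return whether the page may be indexed and its links followed."""
    parser = MetaRobotsParser()
    parser.feed(html)
    return {
        "index": "noindex" not in parser.directives,
        "follow": "nofollow" not in parser.directives,
    }

# Hypothetical page that asks not to be indexed but allows link following.
PAGE = '<html><head><meta name="ROBOTS" content="NOINDEX, FOLLOW"></head><body></body></html>'
policy = robots_policy(PAGE)
print(policy)
```

A page with no Meta Robots tag comes out as indexable and followable in this sketch, which matches the usual default.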

Work hard (and smart) to get as many well-regarded websites as possible to link to yours; this is known as Link Popularity. Link Popularity is extremely important because all search engines can determine how many links are going to and from your page. Some decide whether "to index or not" based on this attribute of your website. If you build it up, you will be rewarded by the Inktomi and Lycos spiders deep crawling your website and therefore indexing more of your pages. The powerful Inktomi, as mentioned earlier, is the database behind AOL Search, HotBot and MSN Search. Additionally, AltaVista, Excite, FAST, Google (very important), Go, Inktomi, and Northern Light take Link Popularity into account, on the rationale that if many links point to a page, then it must be of more importance. This is becoming a popular tweak in the programming of many spiders.
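Link Popularity can be pictured as a simple count of inbound links. The sketch below assumes a hypothetical link graph called LINKS and counts how many different pages link to each target. Real search engines weigh links in more elaborate ways, so treat this only as the basic idea.

```python
from collections import Counter

# Hypothetical link graph: each page maps to the pages it links to.
LINKS = {
    "a.example/": ["b.example/", "c.example/"],
    "b.example/": ["c.example/"],
    "c.example/": ["a.example/"],
    "d.example/": ["c.example/", "c.example/"],
}

def link_popularity(links):
    """Count the distinct pages linking to each target (self-links ignored)."""
    inbound = Counter()
    for source, targets in links.items():
        # set() so that repeated links from one page count only once.
        for target in set(targets):
            if target != source:
                inbound[target] += 1
    return inbound

popularity = link_popularity(LINKS)
for page, count in popularity.most_common():
    print(page, count)
```

Here c.example/ has the most inbound links, so a spider tuned for Link Popularity would treat it as the most important page of the four.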

2. THE DATABASE AT THE HOME BASE:

All of the major search engines' databases will index the full body text. Certain words, called "stop" words, may be missing. Stop words are words that many spiders skip in order to move faster. The spiders that skip stop words belong to AltaVista, Excite, Inktomi, Lycos and Google. The spiders of FAST, Go and Northern Light do not.
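Here is a short sketch of stop-word skipping. The STOP_WORDS list is purely illustrative; each search engine uses its own list, and those lists are not given in this article.

```python
import re

# Illustrative stop-word list (an assumption, not any engine's real list).
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is"}

def index_terms(text, skip_stop_words=True):
    """Split body text into lowercase words, optionally dropping stop words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if skip_stop_words:
        return [w for w in words if w not in STOP_WORDS]
    return words

body = "The history of the garden is a story to tell"
print(index_terms(body))                         # stop words skipped
print(index_terms(body, skip_stop_words=False))  # full body text
```

With stop words skipped, only four of the ten words in the sample sentence reach the index, which is why skipping them makes a spider faster.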

Concerning your meta tags, all search engines index Meta Keywords except Excite, FAST, Google, Lycos and Northern Light. Additionally, all spiders will index Meta Descriptions, with the exception of FAST, Google, Lycos and Northern Light.

If the images on your indexed pages have alt text (the ALT attribute), AltaVista, Go, Google and Lycos will read and index it, but Excite, FAST, Inktomi and Northern Light will not.

Any comment tags that you have in your web page code can only be read by the spider of the northern California baby and powerhouse, Inktomi.
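To show where Meta Keywords, Meta Descriptions, alt text and comments live in a page, the sketch below pulls all four out of a hypothetical page with Python's standard html.parser module. Which of these fields a given search engine actually stores is as described above; the code only shows how they can be read.

```python
from html.parser import HTMLParser

class IndexableFields(HTMLParser):
    """Collects meta keywords, meta description, image alt text and comments."""

    def __init__(self):
        super().__init__()
        self.keywords = []
        self.description = ""
        self.alt_texts = []
        self.comments = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta":
            name = (attributes.get("name") or "").lower()
            content = attributes.get("content") or ""
            if name == "keywords":
                self.keywords = [k.strip() for k in content.split(",") if k.strip()]
            elif name == "description":
                self.description = content.strip()
        elif tag == "img" and attributes.get("alt"):
            self.alt_texts.append(attributes["alt"])

    def handle_comment(self, data):
        self.comments.append(data.strip())

# Hypothetical page used only for this illustration.
PAGE = """
<html><head>
<meta name="keywords" content="garden tools, seeds, gloves">
<meta name="description" content="Tools and seeds for home gardeners.">
</head><body>
<!-- garden shop home page -->
<img src="spade.gif" alt="Steel garden spade">
<img src="spacer.gif">
</body></html>
"""

fields = IndexableFields()
fields.feed(PAGE)
print(fields.keywords)
print(fields.description)
print(fields.alt_texts)
print(fields.comments)
```

Note that the second image has no alt text, so it contributes nothing that a spider could index.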

3. RANKING:

If you're hoping that your meta tags will boost your ranking in all search engines, this is not the case. Although many of the major search engines' spiders read and index meta tags (see the exceptions above), only Go and Inktomi give a ranking boost for them. AltaVista, Excite, FAST, Google, Lycos and Northern Light do not.

The important shift has been toward Link Popularity. As stated earlier, this matters to several search engines as far as indexing is concerned. It is also important in ranking. Getting as many good sites as possible to link to yours counts with AltaVista, Excite, FAST, Google (very important), Go, Inktomi, and Northern Light.

Choose descriptive keywords that reflect your content. If you have great content and a really good TITLE and Meta Description that entice people to click on your link when it comes up in a search, you win points with both HotBot and Lycos: you will be boosted in the rankings of those search engines. The size of the boost depends on the number of people who click on your link.
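The click-based boost can be pictured with a toy re-ranking function. Everything in this sketch is an assumption: the base scores, the click counts, the weight of 0.1 and the additive formula are invented for illustration, and the actual method used by HotBot or Lycos is not described in this article.

```python
def rerank_by_clicks(results, clicks, weight=0.1):
    """Add a click-based bonus to each base score and sort best first.

    results: list of (url, base_score) pairs.
    clicks:  dict mapping url -> number of recorded clicks.
    weight:  hypothetical bonus per click.
    """
    boosted = [(url, score + weight * clicks.get(url, 0)) for url, score in results]
    return sorted(boosted, key=lambda item: -item[1])

# Hypothetical search results and click counts.
results = [("a.example/", 1.0), ("b.example/", 0.8)]
clicks = {"b.example/": 5}

reranked = rerank_by_clicks(results, clicks)
print([url for url, _ in reranked])
```

In this example b.example/ starts out below a.example/ but moves to the top once its clicks are counted, which is the effect an enticing TITLE and description are meant to produce.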

One point to remember is that search engines, like most entities, change. This information was current as of this summer (Northern Hemisphere), but changes in both climate and search engine tweaking are in the air.

I hope that you have enjoyed this article.

Peter P. Thiruselvam
pete@ientry.com




Thursday 8th January 2009  © COPYRIGHT 2009 - VISUALSOFT