Friday, February 17, 2012

HOW DO SEARCH ENGINES WORK?

SEARCH ENGINE CLASSIFICATION AND THEIR USES FOR STUDENTS
Search engines can be broadly classified into the following types according to the requirements of students-
1) Visual based search engines
2) Crawler or keyword based search engines
3) Index based search engines
4) Meta/multi search engines
5) Category based search engines

Among these, crawler based and directory based search engines are the most widely used by students. The main difference between the two lies in how they perform the search and how they list sites. We shall discuss the different types of search engines in detail.
Reference- www.slideshare.net , 2/1/2011

KEYWORD BASED OR CRAWLER BASED SEARCH ENGINES
Google and Yahoo are the best examples of crawler based search engines and are very popular among students pursuing higher studies. These search engines compile their listings automatically: they crawl the entire web for the required information and present it in well-organized lists so that users can access what they need. The presented listings are drawn from the search engine's index or catalogue, which comprises every site and page of information the engine has gathered; in effect, it is a massive electronic medium for data storage. Crawling is a systematic, continuous process, and whenever changes are made to a site, the listing of that site on the web changes as well, though it may take some time for the modified information to be added to the index before it can be presented to users. To increase student traffic to a particular site, companies nowadays make use of a technique referred to as SEO, or Search Engine Optimization, which works through specific keywords and organic results. Organic results are the ordinary or normal search results, as opposed to paid or sponsored results.



INDEX BASED SEARCH ENGINES
These types of search engines are also referred to as directory based search engines and are powered manually: they depend on people rather than software for data collection, compilation, and ranking. Index based search engines have their own merits and demerits. The process begins with webmasters, who submit data in the form of a title, name, address, and perhaps comments on the website; expert editors then review the submission and take the final decision on whether the page should be included in the index. For paid programs, users need to register in order to continue the service. One disadvantage of this type of search engine is the delay that takes place in reviewing and indexing an education site; even after the review, there is no guarantee that a particular site will be included and indexed in the search engine's database. The great advantage is that sites which are indexed and included in the directory enjoy the trust and loyalty of students. Once a particular website is indexed and included in the database, changing its listing later is difficult and time consuming, and any changes need to be material. Therefore, companies are advised to be very clear about the contents of their site before they submit it to this type of search engine for the reference of students.


PAID INCLUSION PROGRAMS
Paid inclusion is vital to both webmasters and searchers, and today's global market has made paid inclusion commonplace among search engines. To define a paid inclusion program: in return for a payment of money, a particular search engine assures that it will list the pages of a website. However, the ranking of a particular page still depends on the vital parameters of that search engine, so it can safely be said that paid inclusion programs do not assure a top ranking on the search sites; inclusion finally depends on how well the pages are built and how much you can pay for the program. Education sites are included only after they are carefully reviewed by the editors; if any changes need to be made, they are conveyed to the concerned companies, and the sites are then considered for indexing and inclusion in the directory.
http://searchenginewatch.com/article/2067575/The-Evolution-Of-Paid-Inclusion



PAY PER CLICK PROGRAM
Educational companies pay the search engines on the basis of the amount of student traffic that visits their websites. For this purpose, the search engine companies keep some space on their main page, and whenever a student clicks an ad, the search engine company earns some revenue. Companies like Google and Yahoo give room for educational companies to place ads on their sites so that students can make use of them. A site's ranking and popularity finally decide how much the company pays the search engine in a pay per click program.
Reference- www.redcarperweb.com 2/1/2011


ADVERTISING THROUGH PAY PER CLICK PROGRAMS
As the name suggests, pay per click is an internet advertising technique by which advertisers pay their host whenever their adverts are clicked by anyone on the web. Today, the most popular and influential names in the pay per click business are Google, Yahoo, and MSN.
Due to their large databases and quick responsiveness, some search engines have become very popular among students pursuing higher studies. PPC, or pay per click, is the main way these search engines earn money, and it is the most widely used way of getting one's website included in popular search engines. By following proper SEO and PPC practices, one can easily benefit to the maximum. The demerit of such programs is their expense: advertisers may sometimes lose money on them. Pay per click programs are the main source of revenue for many search engines, and better SEO and PPC practices enable companies and websites to earn a great amount of revenue and profit.
Reference- http://www.selfseo.com/story-19713.php
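
To make the pay per click billing model concrete, here is a minimal Python sketch of how such a bill might be computed. The function name, click counts, prices, and budget cap are all made-up assumptions for illustration, not any search engine's actual rates.

# Toy pay-per-click cost model (all names and numbers are illustrative).
def ppc_cost(clicks, cost_per_click, daily_budget):
    """The advertiser pays for each click, with spend capped at the daily budget."""
    return min(clicks * cost_per_click, daily_budget)

# 350 clicks at $0.40 each would cost $140, but spend stops at the $100 cap.
print(ppc_cost(clicks=350, cost_per_click=0.40, daily_budget=100.0))  # 100.0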


WORKING OF THE SEARCH ENGINES
The internet is probably the most widely used and most effective way of exchanging information anywhere across the world. Almost any information can be found on the internet within moments and at no cost, and these unique advantages make it the most popular and preferred medium for accessing, exchanging, and sharing information. The World Wide Web holds an immense quantity of stored information, waiting for the recipient to access and explore it. Web pages are published by countless different authors operating through countless servers, and since there is some kind of information on nearly every page, students can easily get confused while carrying out a search. Internet search engines were developed to mitigate this problem, and they are an effective tool since they collect and present only the relevant pieces of data or information to the user. Search engines are designed so that the user can search the entire web in a matter of seconds and access the required piece of information. They are particularly popular among students pursuing higher studies and are very useful and helpful to them.
Reference- Curt Franklin 2008.

FUNCTIONS OF INTERNET SEARCH ENGINES

There are many different types of search engines available for students, but their working style and functionality remain the same. The primary tasks of a search engine are to-
1) Conduct a search on the given keywords as per the user's preference.
2) Store the keywords, addresses, and documents the user has searched, so that the stored data can be reused in the future.
3) Assist students or users in conducting searches using words, visuals, or even sounds.
During the initial stage of search engine development, search engines held limited data, perhaps a few hundred thousand web pages, that students could make use of. Today the situation has changed drastically: many millions of web pages can be found on the internet, and it is a matter of great convenience for students or any other user to get whatever they need from the internet world.
Reference- Curt Franklin 2008.

WEB CRAWLING
When we speak about internet search engines, we mean World Wide Web search engines that are far more powerful at finding any piece of information you are searching for around the world. Even before the web arrived on the scene, there were search engines people could use to find information: programs like Archie and Gopher kept indexes of files stored on servers connected to the internet. To find any particular piece of information among the many millions of web pages that exist, a search engine uses special software robots called spiders to create a list of the words found on the web. The process by which the spider creates these lists is referred to as web crawling. In order to build and maintain a useful list of words, the spider has to look through a great many pages, and it usually begins with heavily used servers and the most popular web pages: it starts from a popular, relevant site, indexes the words on its pages, and follows every link that exists within the website. Google's early system ran multiple spiders, usually three at a time, and each spider could keep about 300 connections to web pages open at once. At its peak performance, using four spiders, the Google system could crawl more than 100 pages per second, generating around 600 KB of data each second. When the Google spider checked a web page, it took note of the following information-
1) The words within the page, and the location of the words, that is, where on the page they were found.
2) Words appearing in meta-tags, titles, and subtitles, which were given special consideration during a search.
The Google spider was designed to index every significant word on a page, excluding articles like a, an, and the. Other spiders have their own styles of working and implement their own approaches; these different approaches enhance the spiders' abilities and provide users with a more convenient and efficient means of conducting a search and exploring the web.
Reference- http://computer.howstuffworks.com/internet/basics/search-engine1.htm
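
As an illustration of the crawling process just described, here is a minimal spider sketch in Python. The starting URL, the same-host restriction, the ten-page cap, and the function names are assumptions made for the example; a production spider is vastly more robust and polite.

# A minimal web spider: breadth-first crawl, collecting words and links.
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen

class LinkAndWordParser(HTMLParser):
    """Collects href links and visible words from one HTML page."""
    def __init__(self):
        super().__init__()
        self.links = []
        self.words = []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)
    def handle_data(self, data):
        self.words.extend(data.split())

def crawl(start_url, max_pages=10):
    """Crawl from start_url, staying on the same host, up to max_pages pages."""
    host = urlparse(start_url).netloc
    seen, queue = {start_url}, deque([start_url])
    page_words = {}  # url -> list of words found on that page
    while queue and len(page_words) < max_pages:
        url = queue.popleft()
        try:
            html = urlopen(url, timeout=5).read().decode("utf-8", "replace")
        except Exception:
            continue  # skip unreachable pages; a real spider would log and retry
        parser = LinkAndWordParser()
        parser.feed(html)
        page_words[url] = parser.words
        for link in parser.links:
            absolute = urljoin(url, link)
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return page_words

# pages = crawl("https://example.com")  # hypothetical starting point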

META-TAGS AND THEIR IMPORTANCE
Meta-tags allow a page owner to specify the keywords and concepts under which the page should be indexed. They help search engines index a page in the correct category by way of proper keywords, and they matter because they let a search engine give an appropriate meaning to the content with respect to the student's or user's search context. This makes the search more precise and avoids unnecessary dual meanings of keywords while performing a search. However, one cannot trust the meta-tag factor blindly, as there is the possibility that a careless page owner has added meta-tags fitting a very famous topic that has nothing to do with the actual page content, simply because the owner wants the page to turn up in more searches. There are also instances where site owners do not want any spiders to access their websites at all, fearing that their content could be harmed by unnecessary intervention. To handle these situations, the robots exclusion protocol was developed: through it, spiders are asked to leave particular web pages alone before they ever read the content present on the page, thus avoiding any kind of tricky situation.
http://computer.howstuffworks.com/internet/basics/search-engine2.htm
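
The two mechanisms above can be sketched in a few lines of Python: reading a page's meta keywords tag, and asking a site's robots.txt whether a spider may fetch a page at all. The HTML snippet, the URLs, and the spider name "MySpider" are illustrative assumptions.

# Sketch: extracting a meta keywords tag, then honoring robots exclusion.
from html.parser import HTMLParser
from urllib import robotparser

class MetaKeywordParser(HTMLParser):
    """Collects the content of a <meta name="keywords"> tag."""
    def __init__(self):
        super().__init__()
        self.keywords = []
    def handle_starttag(self, tag, attrs):
        d = dict(attrs)
        if tag == "meta" and d.get("name") == "keywords":
            self.keywords = [k.strip() for k in d.get("content", "").split(",")]

parser = MetaKeywordParser()
parser.feed('<meta name="keywords" content="search engines, students, web">')
print(parser.keywords)  # ['search engines', 'students', 'web']

# Robots exclusion: check the site's robots.txt before crawling a page.
rp = robotparser.RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()  # fetches and parses robots.txt over the network
print(rp.can_fetch("MySpider", "https://example.com/private/page.html"))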

MAKING THE INDEX
The information gathered by the robots and spiders must be well aligned and properly organized to be useful. Two key components make the gathered data accessible to users-
1. The information stored along with the collected data.
2. The indexing method applied.
A search engine could store only the word and the URL where it was found, but that would make the index of limited use: it could not reveal whether the word was used in an important way or a trivial one, or whether it was used once, twice, or many times. In other words, there would be no systematic ranking to show the most useful pages at the top of the search results. Every commercial search engine therefore applies its own formula for assigning weight to the words in its index; that is how different results, with pages in different orders, are obtained when the same keywords are used on different search engines. The data obtained during indexing is also encoded in order to save storage space.
Indexing has only one objective, and that is quick access to the required information. A hash table is the most widely used way to build an index. Here, a formula is used to assign a numeric value to every word so that the entries are spread evenly. This numeric distribution is different from the distribution of words across the alphabet, which is the main reason for the hash table's effectiveness. In a dictionary, there are far more words that start with the letter M, S, or N than with X, Z, or Q, so it would take longer to search among the letters that are heavy with words than among those with fewer. Hashing evens out this difference, and the combination of efficient indexing and effective storage is what makes the hash table such a quick way of accessing information, even when the search parameters are complicated.
http://computer.howstuffworks.com/internet/basics/search-engine3.htm
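
A Python dict is itself a hash table, so the weighting idea above can be sketched directly. The toy index below maps each word to the pages containing it and the number of occurrences, excluding articles as described; the page texts are invented examples.

# Toy inverted index on a hash table (a Python dict): word -> {url: count}.
from collections import defaultdict

STOP_WORDS = {"a", "an", "the"}  # articles excluded, as described above

def build_index(pages):
    """pages maps url -> page text; returns word -> {url: occurrence count}."""
    index = defaultdict(lambda: defaultdict(int))
    for url, text in pages.items():
        for word in text.lower().split():
            if word not in STOP_WORDS:
                index[word][url] += 1  # frequency is one weight a formula can use
    return index

pages = {
    "site-a.example": "search engines crawl the web",
    "site-b.example": "students search the web with search engines",
}
index = build_index(pages)
print(dict(index["search"]))  # {'site-a.example': 1, 'site-b.example': 2}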


ASPECTS OF SEARCH
A proper query is required in order to carry out a search of the indexed data on the web. Even a single-word query can be performed on a search engine, but Boolean operators are useful for carrying out complex searches, helping students build more refined queries. Some popular Boolean operators are listed below, with a sketch of how an engine evaluates them after the list-
1. AND- used to conjoin two terms, so that only pages containing both are returned; some search engines use "+" as a replacement for AND.
2. OR- returns pages containing at least one of the terms, and is supported by many search engines.
3. NOT- excludes pages containing a term; it may or may not be offered by a given search engine, and "-" is the usual substitute for the word.
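
As a hedged sketch of how an engine might evaluate these operators, the snippet below treats the index as a mapping from each word to the set of pages containing it; the page names are invented.

# Boolean query evaluation over a toy index: word -> set of pages.
index = {
    "search":   {"site-a", "site-b", "site-c"},
    "engines":  {"site-a", "site-c"},
    "students": {"site-b"},
}

print(index["search"] & index["engines"])   # search AND engines: site-a and site-c
print(index["search"] | index["students"])  # search OR students: all three sites
print(index["search"] - index["students"])  # search NOT students: site-a and site-c
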
FUTURE SEARCH
Boolean operators help search engines perform exact searches. However, when a certain word or phrase carries two or more meanings, problems can arise. For instance, the word "kind" can indicate either the quality of a person or the type of a thing, and whenever such a word is used it is up to the student or user to pick the relevant results from the page and omit the rest. Take another example, the word "remote": it can mean the electronic device we use to switch gadgets on and off, or it can describe a far-off distance. Again, it finally depends on the user to pick the right results from those shown on the page. Concept-based search is one response to this, in which statistical analysis of the words is done to achieve a refined search result. Web developers are trying their best to improve the performance of search engines so that users can find whatever they need more efficiently and accurately. Natural-language queries are yet another development in web search, still in its initial stage, in which the inquirer can ask questions and get answers just as if interacting with a college professor, friend, or colleague; this kind of search removes the need for Boolean operators in framing the query. Popular sites offering such search include ask.com and bing.com.
Given the above facts about current advancements in search engines, we can safely say that the day is not far when students will find that search engines are not only an aid to their studies, helping them access, gather, and study information, but also a companion when carrying out projects, extracting valuable information from the internet, and learning in their own style through articles suited to their preferences.
Search engines are thus not just a tool for finding information; they can also be used to communicate, to transfer or exchange information, and to do many other things that enhance one's knowledge.
Reference- Curt Franklin 2008.
LIST OF WIDELY USED SEARCH ENGINE SITES USEFUL TO STUDENTS WORLDWIDE
BING.COM- This search engine was designed by Microsoft, primarily to win over Google search users. Until 2009, Bing operated as the MSN search engine. Bing is trying its best to beat the popularity of the Google search engine; however, Google, as one of the first and largest search engines in the world, is unlikely to step back. Bing uses secondary sources to help students by providing the required information from its database center.

YAHOO- It is not just a search engine but offers a lot more to its users. Yahoo is a data collection source and offers games for children, information facilities for youngsters, chat services, news on the latest trends, astrology services, a travel directory for travelers, email services, search options, shopping facilities, and much more.




ASK.COM- This search engine offers information to students pursuing higher studies, who can access the required information and also download data from the site. With unique features like a super-clean interface and better search choices for users, this search engine is a genuine rival to the others. Its combination of results makes it a strong competitor to leading search engines like Google, Yahoo, and Bing.

GOOGLE.COM- This search engine is the answer to any question about search engines and is undoubtedly the king of them all. It offers its users much more than a user could think of and can provide almost any kind of information relating to any subject on this planet, apart from facilities like email, chat, videos, maps, images, books on almost any topic, geographic location through satellite, and much more. It can therefore safely be said that Google is the largest, most convenient, most user friendly, and fastest search engine available on earth. It is particularly useful to students, as it offers a great quantity of information from many available resources.

Reference- http://netforbeginners.about.com/od/navigatingthenet/tp/top_10_search_engines_for_beginners_.html
WEBOPEDIA- This search engine is an asset for people or students with a technical background, offering everything about technical subjects. One can find any type of information here relating to computers, electronics, and other technical matters, including meanings, definitions, and their explanations. Students with a computer or information technology background can access all sorts of information through this site and benefit from it.





MAHALO-

This search engine is relatively new compared to the others. It searches articles and information that users submit and vote on. On this site one can access almost any kind of information: videos, music, travel, health, education, jobs, sports, languages, games, books, shopping, authors, movies, and much more. One can expect very straight and accurate answers to one's questions.
INTERNET ARCHIVE- This is a service that allows users to visit archived versions of web pages. Visitors just need to type in the URL and enter a date range, and they can then start surfing the archived versions of the page. This site helps students with tons of data and information relevant to their studies and helps them complete their education easily. The information this service provides is of good quality, authentic, and accurate.
http://searchengineland.com/mahalo-launches-with-human-crafted-search-results-11341

CUIL-

This search engine was founded by a couple, Tom Costello and Anna Patterson; Anna was formerly a technical lead at Google, and both studied at Stanford University. The engine ranks search results on the basis of content analysis rather than the usual parameters of relevance or popularity. Whereas the Google search engine keeps a record of users and their searches to improve the search experience, Cuil believes there is a difference between search and surveillance. The word Cuil is pronounced like "cool" or "quill". As far as performance is concerned, this search engine is up to the mark, looks good, and is responsive. One good feature is its privacy policy, which never asks users for any personally identifiable information. This search engine is an asset to students pursuing higher studies.
http://www.searchengineoptimizationportland.com/blog/2010/09/cuil-farewell/

YIPPY –


Yippy is yet another search engine, and it conducts searches in a peculiar way: it simply searches using the contents of other search engines. Yippy is particularly useful because conventional search is not very helpful for the deep web, whose pages are tougher to locate, and that is where Yippy becomes important to its users. Yippy can provide any type of information when it comes to news, special-interest blogs, government information, academic search, and so on.
Reference- http://netforbeginners.about.com/od/navigatingthenet/tp/top_10_search_engines_for_beginners.htm

DUCKDUCKGO-

This is yet another search engine, with unique features that make it stand out from the competition. It does not collect personal information about its users, and so it has nothing to share. It also does not store information about the sites or links a user visits, and it filters out advertising sites. Its best feature is that it favors links carrying accurate answers to the questions asked, which makes it very useful to students in their studies. It also has a translation feature, enabling its use globally across many languages.
Reference- http://www.bizjournals.com/philadelphia/print-edition/2011/10/28/duckduckgo-search-engine-takes-on-google.html?page=all
SEARCH ENGINE CONSULTANT- In today's competitive market, firms are making every effort to remain at the top of the list, and internet listings are no exception. Companies work hard to achieve higher rankings and attract more traffic and clicks from visitors. For this, it is important that a website be updated from time to time so that the search engines accept it in their rankings. The internet and telecommunications are ever-changing worlds. Site rankings are based on different variables, with content being the most important factor of all; some variables are controlled by the site's promoters, while others are taken care of by the search engines. Changes in technology and in the search engines' variables alter site rankings, sometimes as frequently as daily, so it is essential that firms keep their sites updated on a day-to-day basis or risk losing out in the competition. Merely changing or jumbling words does not bring positive search results, contrary to what is commonly believed. It can safely be said that anyone with up-to-date technical knowledge need not worry about collecting accurate and current information from a search engine. Search engine optimization certainly brings very positive improvements in rankings once proper methods and practices are followed, and it also helps firms stay current in the ever-changing technological world.
HOW SEARCH ENGINES WORK
Search engines are an effective tool for finding any type of information on the World Wide Web; without them, it would be virtually impossible to locate a piece of information in the vast world of the internet unless you already knew its specific URL. What people commonly mean by a "search" is in fact a search through a database of HTML documents collected by a robot.
Search engines are broadly classified into three types: crawler based (robot or spider based) search engines; human powered search engines; and hybrids that combine the two. Crawler based search engines use automated software that gathers information by visiting a site, reading its meta-tags, and following its links to index related websites. The crawler also monitors changes to the site's content by revisiting it regularly, at a frequency set by the search engine administrators. Human powered search engines depend on human beings both for the input of data and for its indexing. Whenever a search query is placed, the search engine goes through the index it has created first rather than directly crawling the web. Because the search results are based on the index, it must be updated regularly; otherwise the search engine will keep displaying results for a site even after it becomes outdated or goes out of existence, and the site will remain in the database until the index is updated.
One may wonder why different search engines produce different results. The answer is that their indices differ, since an index depends both on what the spiders or crawlers find and on what information is submitted by humans. In addition, every search engine has a different algorithm for making a search, and the algorithm is what a search engine uses to determine the relevance of the information found; it takes into account factors such as keyword frequency and keyword location, which decide the relevancy of each result. This is how a normal search engine performs its task.
Reference- http://www.webopedia.com/DidYouKnow/Internet/2003/HowWebSearchEnginesWork.asp
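
As a hedged illustration of such an algorithm, the sketch below scores a page by keyword frequency and adds bonuses for keyword location, namely a title match and an appearance at the very start of the page. The weights and page data are invented, and real engines use far more signals than this.

# Toy relevancy formula: frequency plus location bonuses (weights are made up).
def relevance(keyword, words, title_words=()):
    keyword = keyword.lower()
    score = sum(1 for w in words if w.lower() == keyword)  # keyword frequency
    if keyword in (w.lower() for w in title_words):
        score += 5  # title/meta matches are weighted more heavily
    if words and words[0].lower() == keyword:
        score += 2  # keyword located at the very start of the page
    return score

docs = {  # url -> (page words, title words); contents are invented examples
    "page-1": ("how search engines work".split(), "search engines".split()),
    "page-2": ("students use the web".split(), "student guide".split()),
}
ranked = sorted(docs, key=lambda u: relevance("search", *docs[u]), reverse=True)
print(ranked)  # ['page-1', 'page-2']: page-1 mentions "search" in its title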








BETTER WAYS TO USE THE SEARCH ENGINES
Whenever a search needs to be performed, the user simply inputs the required keyword, and a number of results in the form of sites are shown on the screen. If a search does not produce the accurate information that was needed, the user has no option but to jump to another search engine and search again, and this continues until the required information is found on some search engine. From then on, the user becomes loyal to that search engine and never returns to the one that did not serve the purpose. This is highly undesirable, and no company wants to face such a loss of a potential customer. To prevent it, a company has to keep a proper check and update its presence in the search engines from time to time so that they display only the needed and appropriate information for the user. Keyword density, relevance of keywords, and content quality are the factors that help companies increase their potential for attracting new customers and retaining old ones. Some professional firms are expert at these tasks and can certainly help companies update their sites, thereby helping them make a profit.
ROLE OF SEARCH ENGINES FOR STUDENTS
The internet is very vast, and its information never ceases to grow; it is effectively infinite and keeps multiplying over time. Students often get perplexed when searching for a piece of information, spending hours on the internet without result. The real trick is in finding the accurate information that is useful and can serve the purpose. As competition in education increases, so do the importance and use of the internet.
For students and users to find appropriate data on the internet and avoid wasting time on useless and irrelevant information, it is important that they know how search engines operate and perform a search. The spiders crawl the entire web continuously and keep the information they find in a well-aligned, orderly index, so that whenever a keyword is punched in and the enter key is pressed, the engine can show results in a matter of moments. Companies need to be conscious of the information their site contains so that the search engines can easily find the site and present the most accurate information for the user's needs; if this is followed, the site will automatically get a good ranking in the search engine listings. Keywords are the heart of a search, and when correct keywords are used, the user can expect very accurate and unambiguous information. ref- http://dept.sccd.ctc.edu/tlc/resources/teach.html
Students make use of search engines depending on the experiences they have had with them. For example, a tech-savvy person who wants to gather hardcore technical information may keep hunting from one search engine to another until he comes to one that offers the latest trends and facts about the things he is interested in. Despite repeated efforts, searching for relevant information on hardcore technical issues, he may find nothing worthy until he visits a site that deals exclusively with his subject, say webopedia.com, where he gets what he wants. Such a site can give him all the relevant information, so that he does not have to hunt site after site through the same stale or not-yet-updated search results. Students therefore need to know the exact parameters of a search, from the site to the keywords, in order to ease their task of learning.
IMPORTANT COMMON ELEMENTS FOLLOWED BY THE SEARCH ENGINES-
1. Each search engine uses tools such as robots, crawlers, or spiders to search across the web, reading pages to find the required information.
2. Particular programs are designed so that the spider reads the web for the needed information and that information is properly indexed.
3. Secondary programs enable the search engine to match the stored information with the keywords used by the user: the engine compares the keywords and retrieves the matching data that already exists in its database. This makes the job easier, since the gathered information is properly indexed before being shown in the results (see the sketch below).
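
Here is a minimal sketch of that matching step, reusing the word -> {url: count} index shape from the indexing example earlier; the index contents and URLs are invented.

# Match query keywords against a pre-built index; rank by total frequency.
def search(index, query):
    """Return URLs containing every query word, highest total count first."""
    words = [w.lower() for w in query.split()]
    if not words or any(w not in index for w in words):
        return []
    urls = set(index[words[0]])
    for w in words[1:]:
        urls &= set(index[w])  # AND semantics: a page must contain all words
    return sorted(urls, key=lambda u: sum(index[w][u] for w in words), reverse=True)

index = {
    "search":  {"site-a": 1, "site-b": 2},
    "engines": {"site-a": 1, "site-b": 1},
}
print(search(index, "search engines"))  # ['site-b', 'site-a']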


PLANNING FOR THE SEARCH ENGINES
The search is carried out by the spider across the web; however, an individual query is answered from within the search engine's index, not by searching the whole web afresh. The search engine's database decides how many sites the spider will crawl in order to gather the desired information. The information held in the indexed pages is not usable forever, so it becomes necessary to update it from time to time. The size of search engines is ever expanding, with more and more information uploaded every day. The usefulness and value of a web page depend on its contents and relevancy, and the capacity of every search engine is not the same: the way the search is carried out, the way information is collected, and the way the collected information is shown all differ from one search engine to another. How often users make a particular search, and how often the spiders crawl the web, determine a search engine's relevancy, freshness, efficiency, and effectiveness. Each time the spider crawls the web, the information it gathers is properly indexed and aligned so that the next searcher can readily and easily access the same piece of information. Useless data must be deleted regularly and updated data loaded onto the search engines to keep them efficient and useful. The Boolean operators and the clarity of the query also make a major difference in how well a search engine functions.
Many things need to be in place for a search engine to perform an accurate search. Factors such as keyword density, which means how often a particular keyword appears relative to the total number of words on a page, and the Boolean operators and relevancy of the query words are taken into account before a webpage or document is assigned a ranking. The search engines also check the HTML and meta-tags the page authors have used. Keyword density is a very important factor, but its exact weight depends on the particular search engine.
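
Keyword density, as defined here, can be computed in a couple of lines; the sample page text below is an invented example.

# Keyword density: occurrences of the keyword divided by total words on the page.
def keyword_density(text, keyword):
    words = text.lower().split()
    return words.count(keyword.lower()) / len(words) if words else 0.0

page = "search engines index the web so students can search quickly"
print(f"{keyword_density(page, 'search'):.1%}")  # 2 of 10 words -> 20.0%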

Nowadays, some sites use fraudulent techniques, known collectively as SPAMDEXING, to climb the ranking ladder by making deceptive or illegal changes to their documents and content. These companies design and change the content of their site in such a way that the spider will easily notice it and always place it in its preferred spot in the indexed listings; when this is done, the content shown during a search does not really match the keyword and is still presented to the user. The companies design their site content so that it is easily traceable by the spiders crawling the web for a search. But often, programs that can catch such fraudulent activity will delist these sites or place them at the bottom of the ranking table. Ref- http://www.focus-im.com/2011/12/SEO-What-is-Spamdexing/