In its simplest form, a search engine is a mechanism that provides an answer (search results) to a question (a search query).
However, before a search engine can deliver results for your query, it must first build a master list of every possible answer, including answers to questions it hasn’t been asked yet. The process of gathering the material for this master list is called crawling.
Crawling a website
You may have heard that search engines “crawl” your site. A crawler (also called a spider or bot) is an automated script that visits a page on your website, scans everything on it, including the text, images, videos, and other media, and creates a cache of that page. A modern search engine crawler can also see your site much as you, as an internet user, would see it in your web browser (with a few minor exceptions). Additionally, it will typically follow links throughout your site to visit your other pages, and even other websites that you’ve linked to. In fact, it’s possible that the crawler discovered your website through a link from another website.
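The visit-cache-follow behavior described above can be sketched in a few lines of Python. This is only an illustration, not how any real search engine is implemented: the `PAGES` dictionary is a made-up, in-memory stand-in for a website (a real crawler would fetch pages over HTTP), and `LinkExtractor` and `crawl` are hypothetical names chosen for this example.

```python
from collections import deque
from html.parser import HTMLParser

# Hypothetical in-memory "website": URL -> HTML.
# A real crawler would download each page over the network instead.
PAGES = {
    "/": '<a href="/menu">Menu</a> <a href="/about">About</a>',
    "/menu": '<a href="/">Home</a> <a href="/tacos">Tacos</a>',
    "/about": '<a href="/">Home</a>',
    "/tacos": "<p>Best vegetarian tacos</p>",
}


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start, pages):
    """Breadth-first crawl: visit a page, cache it, then queue its links."""
    cache = {}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        if url in cache or url not in pages:
            continue  # already visited, or the link points nowhere
        html = pages[url]
        cache[url] = html  # keep a copy of the page
        parser = LinkExtractor()
        parser.feed(html)
        queue.extend(parser.links)  # follow links found on this page
    return cache


visited = list(crawl("/", PAGES))
print(visited)
```

Note that the crawler only ever reaches `/tacos` because `/menu` links to it, which is the same way a real crawler might discover your site through a link elsewhere.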
While it’s possible to block search engines from crawling your site, it’s typically not advised, except when you’re still working on a brand-new website or want certain pages to stop being indexed and searchable. Blocking can backfire: crawlers may give up and stop attempting to crawl your site, or may return so infrequently that the content in the index is no longer current.
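One common way to ask crawlers to stay away from part of a site is a robots.txt file. As a rough sketch, Python’s standard `urllib.robotparser` module can show how a well-behaved crawler would read such a file. The rules, the `ExampleBot` user agent, and the example.com URLs below are all made up for illustration, and a robots.txt file is a request that crawlers choose to honor, not an enforcement mechanism.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that hides a work-in-progress section
# from every crawler while leaving the rest of the site open.
ROBOTS_TXT = """\
User-agent: *
Disallow: /staging/
"""

rules = RobotFileParser()
rules.parse(ROBOTS_TXT.splitlines())

# "ExampleBot" is an invented crawler name for this sketch.
staging_ok = rules.can_fetch("ExampleBot", "https://example.com/staging/new-page")
menu_ok = rules.can_fetch("ExampleBot", "https://example.com/menu")

print(staging_ok)  # the staging page is off limits
print(menu_ok)     # the menu page may be crawled
```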
Search indexes
Once a crawler visits a site and downloads all of the relevant information that it can, it stores that site’s data in a search index. An index is a specialized database (or, more accurately, multiple linked databases for different parts and types of data from your site). These databases contain information from every site that a search engine crawler could conceivably access.
According to Google’s most recent claims, its index alone covers “hundreds of billions of webpages” and is well over 100 million gigabytes in size, while the company has knowledge of around 130 trillion individual pages.
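A classic textbook structure for this kind of database is an inverted index, which maps each word to the pages that contain it. The sketch below assumes that structure purely for illustration; the article does not say how any particular search engine stores its data, and the sample pages and function names are invented.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map every word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in docs.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def lookup(index, query):
    """Return the URLs that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical crawled pages: URL -> extracted text.
DOCS = {
    "/tacos": "Best vegetarian tacos in town",
    "/burgers": "Best beef burgers",
    "/recipes": "Vegetarian taco recipe",
}

index = build_index(DOCS)
print(lookup(index, "vegetarian tacos"))
print(lookup(index, "best"))
```

Because the index is built ahead of time, answering a query becomes a matter of looking up a few words rather than rereading every page.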
Search algorithms, ranking, and Search Engine Results Pages (SERPs)
When you type something into a search engine, within a few seconds you’ll be taken to a page called a search engine results page (SERP, for short). The real magic happens in those few seconds between clicking “Search” and seeing the results.
A search engine will take what you’ve typed in, called a query, and dissect it into a number of permutations. For example, the search query “best vegetarian tacos” contains a number of potential phrases that could each be matched against pages in the index. To the best of its ability, the search engine will determine the intent behind your query, rank a set of pages (out of the hundreds of billions indexed), and provide a list of the top results.
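To make the idea of dissecting a query concrete, here is a minimal sketch that lists every contiguous phrase inside a query, longest first. This is one simple interpretation of “permutations”, assumed for this example; real search engines do far more (spelling correction, synonyms, and so on), and `query_phrases` is a hypothetical helper.

```python
def query_phrases(query):
    """List every contiguous word sequence in the query, longest first."""
    words = query.lower().split()
    phrases = []
    for size in range(len(words), 0, -1):
        for start in range(len(words) - size + 1):
            phrases.append(" ".join(words[start:start + size]))
    return phrases


phrases = query_phrases("best vegetarian tacos")
for phrase in phrases:
    print(phrase)
```

For the three-word example query this yields six candidate phrases, from the full query down to the single words “best”, “vegetarian”, and “tacos”.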
Based on your query and its determined intent (along with over 200 other factors relating to the sites’ pages themselves), the results delivered to your browser are ranked. Ranking is the process of ordering the pages shown in the search results by the search algorithm’s perceived “importance” of each page to the query.
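Ranking can be pictured as scoring each candidate page and sorting by that score. The sketch below uses a simple weighted sum over three signals. The signals, their weights, and the sample pages are entirely invented for illustration; actual ranking factors and their weights are not public, and this is not how any real search engine computes its results.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    query_match: float    # 0..1, how well the text matches the query
    inbound_links: float  # 0..1, normalized count of links from other sites
    freshness: float      # 0..1, how recently the page was updated


# Hypothetical weights; real engines reportedly weigh hundreds of factors.
WEIGHTS = {"query_match": 0.6, "inbound_links": 0.3, "freshness": 0.1}


def score(page):
    """Weighted sum of the page's signals."""
    return sum(weight * getattr(page, name) for name, weight in WEIGHTS.items())


def rank(pages):
    """Order pages from highest to lowest score."""
    return sorted(pages, key=score, reverse=True)


candidates = [
    Page("/a", query_match=0.9, inbound_links=0.2, freshness=0.5),
    Page("/b", query_match=0.6, inbound_links=0.9, freshness=0.8),
    Page("/c", query_match=0.3, inbound_links=0.4, freshness=0.9),
]

ranked_urls = [page.url for page in rank(candidates)]
print(ranked_urls)
```

In this made-up data, `/b` outranks `/a` even though `/a` matches the query text more closely, because its other signals are stronger.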
In our “best vegetarian tacos” search, we’ll pretend for a moment that you’ve made this search from your mobile device while on your mobile provider’s network. A search engine may reason that since you’re searching from a mobile device, and not on Wi-Fi, you may be looking for a restaurant. It determines that your intent is to see nearby restaurants. Your query asked for vegetarian tacos, so the search algorithm will attempt to limit its results to restaurants that list vegetarian tacos in their online menu and have been described with the word “best” on other sites (such as ones with large numbers of online restaurant reviews).
If you live in a larger city, even these limitations may still leave a large number of possible results. Since you’re on a mobile device, even if you don’t have GPS activated, your rough location can be determined from the cell tower you’re currently connected to, so the search algorithm will further limit those results to places within a certain radius of you, while typically highlighting those locations on a map-based search result.
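The radius filter described above can be sketched with the haversine formula, a standard way to estimate the distance between two latitude/longitude points on a sphere. The restaurant names, coordinates, and 5 km radius are made up for this example, and the article does not say which distance calculation a search engine actually uses.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points (haversine formula)."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (
        sin((lat2 - lat1) / 2) ** 2
        + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def within_radius(user, places, radius_km):
    """Names of the places no farther than radius_km from the user."""
    return [
        name
        for name, (lat, lon) in places.items()
        if distance_km(*user, lat, lon) <= radius_km
    ]


# Rough user location, e.g. estimated from a cell tower (hypothetical).
user_location = (40.7128, -74.0060)

# Invented restaurants with invented coordinates.
restaurants = {
    "Taco Uno": (40.7200, -74.0000),
    "Taco Dos": (40.7306, -73.9866),
    "Taco Lejos": (40.9000, -74.2000),
}

nearby = within_radius(user_location, restaurants, radius_km=5)
print(nearby)
```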
The end product that you see is the SERP, or search engine results page. In keeping with our example, you’re likely to see a map result showing several restaurants that hopefully serve the style of tacos you’ve asked for. You’re also likely to see links to many of those restaurants’ profiles on review sites, and possibly links to the restaurants’ own websites. On rare occasions, when the intent is less clear, it’s entirely possible to see a wildcard result. In this example, a recipe for a vegetarian taco may be included. If you immediately clicked on that wildcard result, the search engine could use that knowledge to alter its results for an identical or very similar query in the future.
Featured snippets
You may have heard about (or at least have seen) featured snippets in Google search results. In a somewhat recent development, Google decided that for certain types of search queries, it will show relevant text from the best result directly on the search results page. Typically, this is just text, but it can be video, audio, interactive objects (like calculators), or more. These “answer boxes” are extracted from the content on your site.
While featured snippets are great for searchers, many webmasters loathe them, as they “steal” traffic from sites: a searcher no longer needs to visit your site to get an answer. It’s possible to block Google from displaying snippets of your pages (for example, with the nosnippet robots meta directive), but just like blocking search bots in general, it’s not advised.