Knowing the Query
The range of Google search:
On average, a Google search query travels 750 miles each way, to and from the data center.
16 to 20% of queries that get asked every day have never been asked before.
Google has answered 450 billion unique queries since 2003.
Last year, Google made around 500 changes to search.
The query is sent to Google over the Internet. Google has data centers all over the world, but, on average, your query travels about 1,500 miles round trip.
After that, all the work we’ve done setting up the index comes into play. We parse the query, understand what your intent was, [may] personalize the results, and then send it off to our giant index and get back the top results for your query.
In addition to the results themselves, we need to create the presentation of those results, the titles and what we call the “snippets,” the two lines of text that you see. In order to do this, we look at the copy of the Web that we keep and find the most relevant parts of every page, bring up those two most relevant lines for your query and show those to you for each result.
This is also an enormous amount of computation. We go from the few words that you typed in, to the result pages that we found, to the places on those pages where the text is most relevant to your query.
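As a toy illustration of the snippet step described above, the sketch below picks the two consecutive sentences of a page that mention the most query terms. The function name `best_snippet`, the sentence-splitting rule, and the term-count score are all assumptions made for illustration; the interview does not say how relevance within a page is actually measured.

```python
import re


def best_snippet(page_text: str, query: str, num_sentences: int = 2) -> str:
    """Return the run of consecutive sentences that mentions the most query terms.

    Hypothetical scoring: one point per word in the window that is a query term.
    """
    terms = set(re.findall(r"\w+", query.lower()))
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", page_text) if s.strip()]
    if not sentences:
        return ""

    best_start, best_score = 0, -1
    last_start = max(1, len(sentences) - num_sentences + 1)
    for start in range(last_start):
        window = " ".join(sentences[start:start + num_sentences])
        words = re.findall(r"\w+", window.lower())
        score = sum(1 for w in words if w in terms)
        if score > best_score:  # the earliest window wins ties
            best_start, best_score = start, score
    return " ".join(sentences[best_start:best_start + num_sentences])


page = (
    "Cats are small pets. Jumbo jets have four engines. "
    "Engines can be replaced on the ground. Dogs bark."
)
print(best_snippet(page, "jumbo jet engines"))
```

Even this simplified version has to scan every candidate window on every result page, which hints at why the step is expensive at scale.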
In the case of Instant, we’re doing that as you’re typing, so this whole complex process of ranking is happening in the middle of your typing. If we did this naïvely, we’d be ranking almost 20 sets of results for every query you type, but we are more sophisticated about it. We do a lot of caching and so on.
Then, at the end of that, you get back this beautifully presented result page.
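The caching mentioned above for as-you-type ranking can be pictured with a minimal sketch: results are memoized per partial query, so a prefix that has already been ranked is never ranked again. The class `CachedRanker`, the helper `type_query`, and the prefix-matching `rank_fn` are hypothetical stand-ins; the interview gives no detail on how the real caching works.

```python
class CachedRanker:
    """Memoize ranked results per normalized partial query (illustrative only)."""

    def __init__(self, rank_fn):
        self._rank_fn = rank_fn  # the expensive ranking step, supplied by the caller
        self._cache = {}
        self.rank_calls = 0      # how many times ranking actually ran

    def results_for(self, partial_query):
        key = partial_query.strip().lower()
        if key not in self._cache:
            self.rank_calls += 1
            self._cache[key] = self._rank_fn(key)
        return self._cache[key]


def type_query(ranker, text):
    """Simulate typing `text` one character at a time; return the final results."""
    results = []
    for end in range(1, len(text) + 1):
        results = ranker.results_for(text[:end])
    return results


docs = ["jumbo jet", "jumbo shrimp", "jet engine"]
ranker = CachedRanker(lambda q: [d for d in docs if d.startswith(q)])

print(type_query(ranker, "jumbo"), ranker.rank_calls)  # first user: every prefix is ranked
print(type_query(ranker, "jumbo"), ranker.rank_calls)  # second user: served from the cache
```

In this sketch the second person typing the same query triggers no ranking at all, which is the kind of saving that makes ranking on every keystroke affordable.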
How do searches get customized for the user?
It actually happens at each stage of the process. When you start typing your query, if you’re signed in, the auto-completions will prefer queries that you’ve typed in before. If you’re in a given local area, we will prefer queries that make sense to you in that metro area.
The second level it happens at is when we process your query: we also take into account your Web history and so on in order to guess at your intent. During ranking, the process of actually looking at the documents, we also take into account personal signals that make sense for you, and when we search for your personal content in Search plus Your World, we take into account your personal signals there.
Finally, when we have the full set of results created, we then customize them for you.
So personalization of your results is deeply embedded throughout the search process. Some of that is giving you the right context for things like date and place, and some of it is personalization based on your previous queries and so on.
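The first stage described above, auto-completion that prefers queries you have typed before, could be sketched roughly as follows. The function `autocomplete`, its parameters, and the simple "personal matches first" ordering are assumptions for illustration only; the actual signals and their weighting are not described in the interview.

```python
def autocomplete(prefix, popular_queries, user_history=(), limit=3):
    """Suggest completions for `prefix`, listing the user's own past queries first."""
    prefix = prefix.lower()
    personal = [q for q in user_history if q.lower().startswith(prefix)]
    general = [q for q in popular_queries if q.lower().startswith(prefix)]

    # Personal matches come first; duplicates are dropped while keeping order.
    seen = set()
    ordered = []
    for q in personal + general:
        if q not in seen:
            seen.add(q)
            ordered.append(q)
    return ordered[:limit]


popular = ["python tutorial", "python snake", "pythagoras theorem"]
print(autocomplete("pyt", popular))                                 # signed out
print(autocomplete("pyt", popular, user_history=["python snake"]))  # signed in
```

A signed-out user sees the popular ordering unchanged, while the signed-in user's earlier query moves to the top of the list.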
How do you change a system this complex?
“If you look at search like a big jumbo jet, this is like changing the engines in flight.”
BG: Search is a really complex system, so when we make changes to it, we go through a rigorous process of testing it. We do what are called precision evaluations, where we make some change to ranking, and then ask human raters to evaluate whether it’s a good change or not. We do something like 50,000 of those a year. Some of these things will turn into live experiments. We do something like 10,000 side-by-sides, where we look at the full set of results for algorithm A and algorithm B and ask which is better, and we ask human beings to [determine] that.
For more feature-oriented experiments where we’re changing the interface in some ways, we do another 10,000 or so live user tests a year. Out of this, we launch about 500 changes to search a year, more than a change a day. So if you look at search like a complicated machine, like a giant jumbo jet – although it’s probably, in some ways, more complex than that – this is sort of like changing the engines in flight before you land.
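To make the side-by-side idea concrete, here is a minimal sketch of how rater votes between algorithm A and algorithm B might be tallied into a verdict. The function `side_by_side_verdict`, the three vote labels, and the 10% margin threshold are hypothetical choices; the interview only says that human raters judge which result set is better.

```python
from collections import Counter


def side_by_side_verdict(votes, min_margin=0.1):
    """Return "A", "B", or "inconclusive" from rater votes of "A", "B", or "same".

    Hypothetical rule: one side wins only if its lead over the other is at
    least `min_margin` of all votes cast.
    """
    counts = Counter(votes)
    total = counts["A"] + counts["B"] + counts["same"]
    if total == 0:
        return "inconclusive"
    margin = (counts["B"] - counts["A"]) / total
    if margin >= min_margin:
        return "B"
    if margin <= -min_margin:
        return "A"
    return "inconclusive"


votes = ["B"] * 6 + ["A"] * 3 + ["same"]
print(side_by_side_verdict(votes))
```

Requiring a margin rather than a bare majority is one simple way to avoid launching a change on the strength of a near tie.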
The Changing Web
The nature of the Web is changing. It’s more dynamic and application-focused and less like a bunch of pages. How is Google adjusting to that change?
BG: There’s a large class of information that remains as content on the Web. It may be embedded in a Flash application in some cases and so on, and we work very hard to extract information from Flash pages and PDFs and things like that. Some things are actually full-fledged applications, and it doesn’t make sense to be indexing them in the traditional sense.
For instance, it doesn’t make sense to index the random pages that we may find about flights. They’re out of date. They’re not particularly relevant. In those cases, we’re going and understanding the data more fundamentally. We can get the equivalent data as feeds from those sites, and then we can index that data side by side.