How do I find it on the Web?

You're reading this because you just know it's out there on the web, and you can't bloody well find it! Here's a simple approach, in three easy steps. It's so easy, you're bound to say something like "That's facile. Give me a more convoluted approach"!

Three steps

  1. Decide what you want
  2. Don't use the Web!
  3. Okay, now find it

This tutorial not unreasonably assumes you have a connection to the Internet, that you have the ability to open up a browser such as Firefox, and that you can access a search engine like Google.

1. What you want

Most people can't find what they want, because they cannot express what they want. It's that simple! Often, the worst thing you can do is to rush off to Google and simply start typing. If you want to find something non-trivial, it's worthwhile taking a piece of paper (yes, paper) and trying to write down exactly what it is you want. The most important things are:
  1. Express what you want in a single English sentence. State your desire. We'll call this single expression of your desire:

    your query sentence

  2. Make a list of "magic words" which are associated with what you want (More of this later)
  3. Decide whether you want (i) a single datum; (ii) an overview of a topic; or (iii) all of the web pages which are highly relevant to your topic!

2. Not the Web!

Now that you know what you want, think of strategies you might use to find the information without using the World Wide Web, and jot them down, especially if you have time.

The benefit of the above approach is that it forces you to think 'around' the topic, and this enhances the quality of your web search. In addition, you will have already planned out other search strategies, should the information you desire not actually be on the Web.

3. Just find it

There is an infinite number of ways you can find information on the Web, but here are two. In both cases, we'll use Google, as it's currently by far the best search engine, but remember --- there are others. None covers the whole Internet.

3.1 The obvious approach

The obvious search is just that. Now that you've defined what you want (our very first point above), just type your query sentence into Google, and hit the search button. Google is quite kind, and will tell you which words are common and can usually be avoided.

So if your sentence is:

I want to find out about the haemoglobin oxygen dissociation curve

... you'll soon work out that words like "I", "want", "to", "out" and "the" are pretty useless in most searches.
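If you like to see the idea spelled out, here is a toy Python sketch of that weeding-out step. COMMON_WORDS is a small list we picked by hand for this one sentence; it is an assumption for illustration, not Google's actual list of common words.

```python
# A hand-picked set of common words, assumed for illustration only.
COMMON_WORDS = {"i", "want", "to", "find", "out", "about", "the", "a", "an", "of"}


def strip_common_words(sentence: str) -> str:
    """Turn a query sentence into search terms by dropping common words."""
    kept = [word for word in sentence.split() if word.lower() not in COMMON_WORDS]
    return " ".join(kept)


if __name__ == "__main__":
    query_sentence = "I want to find out about the haemoglobin oxygen dissociation curve"
    print(strip_common_words(query_sentence))
    # prints: haemoglobin oxygen dissociation curve
```

What survives is the part of your query sentence that actually carries the meaning.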

There are some non-obvious things about the 'obvious' search. Most important are the following points:

  1. Spelling differs. You'll miss a vital portion of the Internet literature if you only use American spelling, and often you'll miss almost everything if you use pure Oxford English;
  2. Many words have synonyms. Limiting your search to one of many synonyms is silly. Hyphenation and concatenation (joining of words/letters to make one term) can also make all the difference;
  3. You learn as you search.

The last point is the most important. If your search gives you ten million hits, don't clutch your head in despair. Look down the first ten on the list, and learn from them. You can learn better words to search for, you can learn important associated words, but most important of all, you can learn what not to search for! To exclude a search term in Google, put a minus sign (-) immediately in front of the word, with no space between them, but often it's better to change your strategy to avoid particular search words! (When we wrote this, prefixing a word with a plus sign (+) encouraged Google to use precisely that word, and prefixing it with a tilde (~) included its synonyms. Google has since retired both operators; putting a single word in double quotes now does the job of the plus sign.)
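If you ever assemble queries in a script rather than typing them, the same rules apply. Here is a minimal Python sketch; build_query and search_url are hypothetical helpers of our own, and the https://www.google.com/search?q=... URL form is an assumption rather than something this tutorial documents. Plain words are joined with spaces (Google ANDs them), phrases are wrapped in double quotes, and each excluded word gets a minus sign with no space.

```python
from urllib.parse import urlencode


def build_query(terms, phrases=(), exclude=()):
    """Assemble a Google-style query string (hypothetical helper).

    terms   -- plain words, separated by spaces (Google ANDs them implicitly)
    phrases -- multi-word phrases, each wrapped in double quotes
    exclude -- words to leave out, each with a minus sign and no space
    """
    parts = list(terms)
    parts += [f'"{phrase}"' for phrase in phrases]
    parts += [f"-{word}" for word in exclude]
    return " ".join(parts)


def search_url(query):
    """Percent-encode a query into a search URL (the endpoint is an assumption)."""
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    query = build_query(["stone", "tools"], exclude=["garden"])  # hypothetical exclusion
    print(query)  # stone tools -garden
    print(search_url(query))  # https://www.google.com/search?q=stone+tools+-garden
```

Note that urlencode turns spaces into + and quotes into %22, but leaves the minus sign alone.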

The simple approach is best for straightforward questions which many people ask. Google is particularly well-tuned for such questions. Even here, a little bit of insight goes a long way. Let's say you want to google: molecular weight lead. Try that search, and then contrast it with: molecular weight Pb. Interesting, isn't it? Then simply search for periodic table, and see what you get! Which was most useful?

Okay, let's assume that you have faithfully adhered to all of the above, and you still can't find the mysterious thing you're searching for. You need to try a more devious approach.

3.2 The devious search - a hunt for content

Most people think of their search in terms of headlines --- I want [insert your query sentence here]. Nothing wrong with this, but there is another way. The key is to try to think of magic words which simply must be present within any reasonable discussion of your topic.

For example, let's say that you are interested in making stone tools. Typing these three words as a Google query may well result in three million hits. If you get smart, and include the search term in quotes, thus:

"making stone tools"

... then you'll cut the number down to about three thousand, but remember that you've now probably excluded many interesting and useful pages which contain the three words, but not in that precise order. In addition, authors may use words like manufacturing or manufacture, mightn't they? There must be a better way!

Go back to the first ten of the three million hits you found with your first search. Look through them. You'll soon spot a rather interesting word: flintknapping

Any reasonable discussion of stone tool manufacture will likely include this magic word. Google this single word, and see what you come up with. You'll probably get about twenty thousand hits, well down from three million, and you can be pretty certain that at least these hits are relevant! (See how you missed at least seventeen thousand hits with the simplistic strategy of "making stone tools"!) Even better, the first few Google hits will take you to sites like flintknapping.com, which will give you a general introduction to the field, and point you to other sites.

Within these sites, you'll recover a whole lot of new magic words that describe what the flintknapper does and what he uses --- hafting, knapping, flint, chert, knappable, flakable, obsidian and so on. Really magic words.

Magic words

Often, no single unusual magic word will produce the results on its own. With most topics, however, you will quickly find a combination of words which makes a search both sensitive (finding nearly all of the relevant sites) and specific (excluding most of the irrelevant sites). Let's try a few examples.

  1. Country codes

    Let's say you want a list of two-letter country codes. Searching for two letter country codes gives you perhaps 3.7 million hits, but Google will have worked its usual magic, and the first few entries will provide what you want.

    Just for fun, let's pursue some alternative strategies. Glance down the list, and you'll see the acronym "ISO". Try ISO country codes and your hit count is down to 640 000, but even more interesting is the new term "ISO 3166" which pops up. Next, search for:

    "ISO 3166"

    ... and you're down to 300 000 hits, and if you try:

    "ISO 3166" country codes

    you're down to 70 000 hits. This is an improvement, but now let's think content. What about the following search?

    Andorra bv Kiribati qa

    You can be pretty sure that each of the 110 000 hits is a web page that contains 'unusual' countries and codes which must be in any comprehensive list! Our previous strategy made no such guarantee. Now combine the two:

    Andorra bv Kiribati qa "ISO 3166"

    ... and you're down to just 5000 fairly authoritative and content-full pages. You can refine things further, if you wish.

  2. Haemoglobin/oxygen dissociation curve

    Let's try a biomedical search. Assume you're interested in the fine details of how oxygen binds to the blood pigment haemoglobin. Googling hemoglobin oxygen dissociation curve yields just 17 000 hits. Glancing through some of these is very fruitful in suggesting additional terms (pH, phosphate, DPG, temperature), but then ... oops ... we notice that we used the American spelling of haemoglobin. Using the British spelling gives us just 6500 hits. First, let's combine the two:

    oxygen dissociation curve (haemoglobin OR hemoglobin)

    Nearly 19 000 hits. See how, although Google usually just combines search terms with an implied AND between terms, we can use a little trickery to OR things together (the OR must be typed in capitals). Now, try the following:

    relaxed tense (haemoglobin OR hemoglobin)

    Not only is the hit count down to about a thousand, but we can be pretty sure that each of these web pages contains detailed information about oxygen binding to haemoglobin. The key that allowed us to refine our search strategy was the knowledge that haemoglobin exists in two different states --- 'relaxed' and 'tense'. When searching, there is no substitute for a detailed knowledge of the subject! Adding the search term "dissociation curve" to our most recent strategy gives us just 74 hits, and we're away!
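The OR trick from the haemoglobin example works for any set of spelling, hyphenation or concatenation variants. Here is a minimal Python sketch, assuming a hypothetical helper or_group of our own; the flintknapping variants are our illustration, not searches we ran.

```python
def or_group(alternatives):
    """Join alternative forms of a word into one (a OR b OR ...) group.

    OR is written in capitals; multi-word alternatives are quoted so they
    stay together as a phrase.
    """
    forms = [f'"{alt}"' if " " in alt else alt for alt in alternatives]
    return "(" + " OR ".join(forms) + ")"


if __name__ == "__main__":
    # Spelling variants, from the haemoglobin example.
    print("relaxed tense " + or_group(["haemoglobin", "hemoglobin"]))
    # -> relaxed tense (haemoglobin OR hemoglobin)

    # Hyphenation and concatenation variants (our own illustration).
    print(or_group(["flintknapping", "flint-knapping", "flint knapping"]))
    # -> (flintknapping OR flint-knapping OR "flint knapping")
```

Quoting the spaced variant matters: without the quotes, "flint knapping" would become two separate terms rather than one alternative.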

The above examples are by no means perfect, but they do illustrate a new approach, and how to join this approach with your basic search strategy. You should now have enough information to allow you, with practice, to search more effectively. Finally, let's put our new-found skills into action.


An example: Searching about searching

What better topic to demonstrate our powers on than searching itself? Let's work out a strategy.

  1. What do we want? It would seem reasonable to sit down and think about our objectives. Taking a piece of paper and scribbling a bit (perhaps helped with a quick initial googling!), we might come up with something along the lines of:

    [Figure: a paper sketch of our search objectives]

    Now let's search a bit more diligently...

  2. A basic search about searching. Our first attempt using search strategy as our Google term provides about 23 million hits. If we look at the first ten, we encounter a few good pages, but nothing spectacular. Putting "search strategy" in quotes limits us to 600K (600 thousand) hits, but adds not a lot. We also see that search strategies might be a good search term, but after a little tinkering decide that it's no better and may be of less use!

  3. Next, let's add the word advanced. Hey, wait a bit! When we try this, we get 100K hits about searching databases, and other pages of little relevance to internet searches. Why not simply add in the word Google? Jackpot! (Well, almost.) This search does illustrate one point that we neglected: Google has, of course, its own 'advanced' page where you can refine your search. Our tutorial, however, is not about such specific refinements. Use the page, by all means, but we believe there are more generic strategies which often help in tricky searches. And of course, you should repeatedly check Google's refined search page, which describes a whole lot of Google goodies. Did you know that you can search in a number range by using two periods with no spaces? Thus: $100..500

  4. Okay, let's back off a bit, and think about Google. What if we type in google hacking? Lots of good stuff! We also learn about sneaky Google refinements you might use (without using the advanced search page). A few of them are site:, filetype:, link:, cache:, intitle: and inurl:, each typed directly in front of the word it modifies.

    We also discover the interesting magic word googledork! You may wish to play with it. "Johnny Long" also seems to be a good search term. (Unfortunately we know of no Google method for searching by file size, which would be invaluable.)

Okay, using the above, let's search for the obvious:

+site filetype +link cache intitle inurl

Our first hit is, of course, Google's own page on advanced operators, which we might have found using less devious methods, but wottehell. We encounter useful operators such as intext: allintitle: allintext: allinurl: allinanchor: and so on. Putting intext: before a word limits the search to actual displayed text of the document, rather than other parts. Using allintext: is the same as putting intext: before each and every word. The allinanchor: modifier limits the search to text that is contained within references to other pages. This is powerful, as often such text emphasises the content of the page.
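Operators like these are just names glued to a value with a colon. As a minimal sketch, a query mixing plain words and operators could be put together as follows. with_operators is a hypothetical Python helper of our own, and it does not check that an operator name is one Google actually supports.

```python
def with_operators(words=(), **operators):
    """Build a query from plain words plus operator:value refinements.

    Each keyword name becomes an operator, e.g. site="google.com" gives
    site:google.com, with no space after the colon. Operator names are not
    checked, and a value containing spaces would need its own quotes, which
    this hypothetical helper does not add.
    """
    parts = list(words)
    parts += [f"{name}:{value}" for name, value in operators.items()]
    return " ".join(parts)


if __name__ == "__main__":
    print(with_operators(["search", "strategy"], site="google.com", intitle="advanced"))
    # -> search strategy site:google.com intitle:advanced
```

Keyword arguments keep the order you write them in (Python 3.7 and later), so the operators appear in the query in the same order.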

We'll leave you to combine the above search with the terms search strategy or even "search strategy". By now, you're nearing the ten-word limit that Google imposes, but you're doing so creatively. And did you know that in Google you can use a star as a wildcard in phrases like "agony * * ecstasy", where each star matches any single word? So you can look for phrases like "agony and the ecstasy", and Google won't count the stars as words! You can effectively increase your word limit.
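To make the rule "each star matches exactly one word" concrete, here is a small Python sketch that emulates it locally with a regular expression. wildcard_pattern is a hypothetical helper of ours, and its idea of a "word" (a run of letters, digits or underscores) is an assumption; it says nothing about how Google itself matches.

```python
import re


def wildcard_pattern(phrase):
    """Compile a phrase like 'agony * * ecstasy' into a regular expression.

    Each * stands for exactly one word. A "word" here is a run of letters,
    digits or underscores, so this is a local emulation only.
    """
    pieces = [r"\w+" if token == "*" else re.escape(token) for token in phrase.split()]
    return re.compile(r"\b" + r"\s+".join(pieces) + r"\b", re.IGNORECASE)


if __name__ == "__main__":
    pattern = wildcard_pattern("agony * * ecstasy")
    print(bool(pattern.search("The Agony and the Ecstasy")))  # True
    print(bool(pattern.search("agony and ecstasy")))  # False: only one word between
```

Two stars demand exactly two words in between, so "agony and ecstasy" needs the one-star phrase "agony * ecstasy" instead.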


A caution

There are many things that are not on the Web, and no matter how good your search, looking for them there will be in vain! This is why it's so important to get your goals and especially your head right before you start the search. Plan your strategy, execute it flawlessly, and you may succeed. If you don't, and you've searched well, you have not failed; you've simply started on a voyage of discovery that didn't end on the Internet.

You may even be able to show to your own satisfaction that the object of your desire isn't indexed on Google or other search engines. If the truly magical words don't yield results, you can be fairly certain that either (a) it isn't there or (b) you haven't thought enough about the subject.


References

By now you can find your own, but you might wish to check out one or two of the following URLs:

  1. http://powerreporting.com/altavista.html Older document on AltaVista, but OK

  2. http://www.lib.monash.edu.au/vl/google/googprint.htm Google search

  3. http://www.informit.com/articles/article.asp?p=170880 Google hacking

  4. http://www.neowin.net/forum/lofiversion/index.php/t243326.html More on Google hacking

  5. http://www.google.co.nz/help/faq_filetypes.html Google file types

  6. http://faculty.valencia.cc.fl.us/infolit/Google/help.htm Good advanced Google searching

  7. http://www.netconcepts.com/learn/google-research.pdf Good article on advanced googling (PDF)

  8. http://www.google.com/help/cheatsheet.html Google cheat sheet. Invaluable.

Afterword

The alert reader will have noticed how, for our various search examples, the number of hits on Google has multiplied since we first wrote the page! The principles however still remain the same.