The web has become a primary source of information about illnesses and treatments [1], with exponential growth in both the volume and the number of entries available [2]. Information retrieval systems, more commonly known as search engines, are an important resource for locating medical information online. A December 2009 poll found that 66% of web users have searched for medical information online [3]. This class of search activities, which goes beyond simple fact retrieval, is referred to as exploratory health search [4], [5]. It can be carried out by both expert and non-expert medical users.
A typical example of an expert medical user is a clinician. Diagnostic health search can also be seen as a coarse form of hypothetico-deductive reasoning [1], in which web search engines guide an iterative cycle: hypotheses about a disease are formulated from evidence, and those hypotheses then guide the collection of additional discriminating evidence. According to recent studies, an increasing number of clinicians use web search engines to assist them in solving difficult medical cases, for instance when confronted with rare (or orphan) diseases [6]. The exact definition of a rare disease, in terms of prevalence threshold and severity requirements, varies across the globe, but a disease is generally said to be rare if it affects fewer than approximately one in two thousand individuals. A study1 conducted by the European Organisation for Rare Diseases (EURORDIS) showed that 40% of rare disease patients were wrongly diagnosed before receiving the correct diagnosis, and that 25% of patients experienced diagnostic delays of between 5 and 30 years.
The current popularity of web search engines (primarily Google) and medical databases (primarily PubMed) for aiding diagnosis may appear somewhat surprising, as these tools are not optimised for the task. For example: (a) a diagnostic query may be quite long, whereas web search engines are typically optimised for very short queries (2–3 terms). (b) Queries consist of lists of patient symptoms, often expressed as multi-word units; however, search engines often make term independence assumptions in order to increase efficiency. For instance, a web search engine may not distinguish between “sleep deficiency, increased sexual appetite” and “sexual deficiency, increased sleep appetite”, hence returning non-relevant results. (c) Some symptoms listed in the clinician's query may not apply to the correct disease, and conversely, some pertinent symptoms of the correct disease may be missing from the query because they are masked by other conditions; yet search engines are designed to maximise the match between all the query terms and the returned documents.
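The term independence problem in (b) can be illustrated with a toy bag-of-words scorer. This is a deliberate simplification (production engines use weighted variants such as TF-IDF or BM25, and the queries below are illustrative), but it shares the unigram assumption: two queries containing the same words in a different arrangement receive identical scores.

```python
from collections import Counter

def bag_of_words(text):
    """Tokenise into a multiset of unigrams, discarding word order."""
    return Counter(text.lower().replace(",", "").split())

def overlap_score(query, document):
    """Toy bag-of-words score: number of shared term occurrences."""
    q, d = bag_of_words(query), bag_of_words(document)
    return sum(min(q[t], d[t]) for t in q)

doc = "patient reports sleep deficiency and increased sexual appetite"
q1 = "sleep deficiency, increased sexual appetite"
q2 = "sexual deficiency, increased sleep appetite"

# Under term independence both queries get the same score against the
# document, although only q1 matches the intended multi-word symptoms.
print(overlap_score(q1, doc), overlap_score(q2, doc))  # prints: 5 5
```

A phrase-aware or proximity-aware model would separate the two queries; a pure unigram model cannot.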
In short, clinicians’ queries on rare diseases are likely to be more feature-rich, but also harder for a search engine, than ordinary web search queries, and should ideally be processed as such. Furthermore, the popularity-based metrics derived from hyperlinking (PageRank), user visit rates, or other forms of user recommendation that are commonly used by search engines are unlikely to benefit the retrieval of rare disease information. These practices tend to favour webpages with many in-links (backlinks) or results frequently viewed by users, whereas information on rare diseases is generally sparse and less hyperlinked than other medical content. Finally, efficiency concerns often lead to brute-force index pruning for web search, e.g. removing from the index terms of low frequency or terms that are unusually long, such as “hydrochlorofluorocarbons” ([7], Chapter 5). Such practices may be particularly damaging for rare disease search, as the medical terminology involved may be exceptionally rare or formed by heavy term compounding. It is probably fair to conclude that ease of use, compared to traditional information search and diagnostic support systems (reviewed below), is the main factor behind the current popularity of web search engines for this task.
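A minimal sketch of such brute-force pruning makes the risk concrete. The thresholds and the inverted index below are hypothetical, not those of any real engine, but they show how a frequency or length cut-off silently discards exactly the kind of rare, heavily compounded medical term a diagnostic query depends on.

```python
def prune_index(inverted_index, min_df=2, max_term_len=20):
    """Brute-force pruning: drop terms that occur in too few documents
    or are unusually long. Thresholds are illustrative only."""
    return {
        term: postings
        for term, postings in inverted_index.items()
        if len(postings) >= min_df and len(term) <= max_term_len
    }

# Hypothetical inverted index: term -> list of document ids.
index = {
    "headache": [1, 2, 3, 7],
    "fever": [1, 4, 5],
    "pseudohypoparathyroidism": [9],  # rare, long compound term
}

pruned = prune_index(index)
print("pseudohypoparathyroidism" in pruned)  # prints: False
```

Common symptom terms survive, but the rare disease term is removed from the index entirely, so no query can ever retrieve document 9 through it.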
Motivated by these observations, we asked to what degree web search engines can actually be used for diagnosis, and what the main factors are that determine success and failure. Answering these questions requires a number of steps. First, an evaluation approach has to be set up, consisting of cases of varying degrees of difficulty and of retrieval performance measures that allow quantitative comparisons between methods. Furthermore, the web search engine algorithms are not public, so one can change settings, and thus decipher why a query returns a given set of results, only to a limited degree. Google offers a search engine customisation product called Google Custom Search Engine,2 which provides a few customisation options that can be used to emphasise particular resources and thus to determine how the choice of information source (the index) influences performance. If emphasising resources known to be authoritative in the rare disease domain improves performance, then one can conclude that the huge index used by Google Web Search introduces noise. However, this gives no information along the “algorithm dimension”. We therefore built FindZebra, a search engine specifically designed to retrieve rare disease information for clinicians. It uses a specially curated dataset of rare disease information, crawled from freely available, authoritative online resources. This means that FindZebra searches for rare disease information in a repository of “clean”, specialised resources, unlike web search engines, which search the whole web and are hence likely to return spurious, commercial and less relevant results. The same index is used for the customised versions of Google, allowing us to gain insight into the adequacy of the Google Search algorithm for rare disease diagnosis.
The rest of this article is organised as follows: Section 2 discusses background work on collecting and retrieving medical information automatically, with a focus on rare disease data. Section 3 presents the evaluation approach. Section 4 presents our search engine, FindZebra, and the information resources used for its index. Section 5 describes the evaluation, benchmarking FindZebra, different versions of Google Search, and PubMed against each other. Section 6 discusses the results, and Section 7 summarises the findings of the paper.