The automation of doctors and machines: A classification for AI in medicine (ADAM framework)
ABSTRACT
The advances in artificial intelligence (AI) provide an opportunity to expand the frontier of medicine to improve diagnosis, efficiency and management. By extension of being able to perform any task that a human could, a machine that meets the requirements of artificial general intelligence (‘strong’ AI; AGI) possesses the basic necessities to perform as, or at least qualify to become, a doctor. In this emerging field, this article explores the distinctions between doctors and AGI, and the prerequisites for AGI performing as clinicians. In doing so, it establishes the need for a classification of medical AI and prepares for the development of AGI. With the arrival of AGI imminent, it is beneficial to create a framework from which leading institutions can define specific criteria for AGI.
Introduction
The advances in artificial intelligence (AI) provide a great opportunity to expand the frontier of medicine to improve diagnosis, efficiency and management. The term ‘artificial intelligence’ was coined by John McCarthy in 1956, and describes the ability of machines to ‘have beliefs’ that make them capable of problem-solving performance.1 Fundamentally, AI is the ability of non-organic systems to perform tasks that require human-level intelligence, but as these tasks evolve then so does AI's potential. In Tesler's theorem, this is known as the ‘AI Effect’: an understanding that AI is ‘whatever hasn't been done yet’.2
When contextualising this to medicine, it is useful to classify technologies into ‘augmented’, ‘narrow’ and ‘general’. Augmented AI is an adjunct to human intelligence and does not replace it; it is a tool that assists humans towards an end. Narrow AI (also referred to as ‘weak’ AI) is focused towards a specific task within a limited range of pre-defined functions; it does not exhibit the flexibility to work outside of these parameters. Artificial general intelligence (AGI; also known as ‘strong’ or ‘true’ AI) refers to a machine with the ability to perform any intellectual task that a human could. When John Searle coined this term in 1980, it included attributes of consciousness, sentience and mind; however, the moral legitimacy of machines mimicking characteristics ordinarily attributed to ‘personhood’ remains an active debate.3,4
Yet, by extension of being able to perform any task that a human could, a machine that meets the requirements of AGI possesses the basic necessities to perform as, or at least to qualify to become, a doctor. In this fast-paced, evolving field, I explore the distinctions between doctors and AGI, and the emerging prerequisites for AGI performing as clinicians. In doing so, this work establishes the need for a framework of criteria to classify the involvement of AI in medicine and prepare for the advancement of AGI.
Achieving AGI in medicine
The original test for an operational definition of AGI was the ‘imitation game’ proposed by Alan Turing in 1950 (colloquially known as the ‘Turing Test’).5 This test assesses the ability of a machine to exhibit sufficiently intelligent behaviour that it is indistinguishable from a human in a written conversation to a third party. Since this first proposition, alternate proposals have been put forward. Steve Wozniak, co-founder of Apple, has suggested the ‘coffee test’ for AGI.6 This is a functional test with the requirement that a machine enters a home, finds coffee equipment and uses it to make a cup of coffee. Goertzel proposes a more academic challenge and suggests that to achieve AGI, machines need to attend a university and pass exams.7
These tests exhibit some key characteristics of doctors: the ability to learn, make decisions and communicate with people. Technology is already complementing clinical practitioners through machine learning in augmented and narrow intelligence applications that enhance diagnosis, understanding and treatment.8,9 One prominent example is AI-enhanced retinal examination. This was originally designed to encourage general and emergency clinicians to not forgo diagnostic retinal examination nor rely on secondary referrals to ophthalmologists. A deep-learning system used fundus photographs to identify optic disks with papilloedema, normal disks, and disks with non-papilloedema abnormalities to an acceptably high sensitivity.10 These innovations expand the capability of the standalone physician. Governments are recognising and rewarding this potential; for example, the UK government has committed over £200 million in funding to AI in the field of prevention, early diagnosis and treatment of chronic diseases.11
This transformative effect of AI on medicine is reflected in the 20,000 publications on this topic in the last 10 years.12 Interestingly, over half of these were produced by teams in the USA (based on the institutional affiliation of the lead author); yet, the UK did not feature in the top 20 countries for medical AI outputs in that literature review.12 When stratifying these publications, the most common research themes were ‘cancer’ (˜23%), ‘heart disease’ (˜7%), ‘vision’ (˜6.5%), ‘stroke’ (<5%), ‘Alzheimer's’ (<5%), depression (<5%), kidney disease (<5%) and diabetes (<5%).12 The authors noted that there is a scarcity of publications regarding the ethics of AI (0.7%) despite its potential widespread use and implications. This supports the notion that a robust framework that can act as a normative regulation and decision-making tool may be required.
The potential applications of AI are immense and include tools that are broadly classified as for prognosis, diagnosis, treatment, clinical workflow and expertise.8 More specifically, tools have been designed to improve online consultations, optimise medications in chronic diseases, match tumour genomics to clinical therapy trials and design new drugs.9 When looking to the data driving these changes, the majority has been from patient records, imaging, genetics and electrodiagnosis (eg electromyography).13 Notable outputs have included applications of AI in diagnosing skin lesions, predicting suicide risk and detecting sepsis.14–16 Extraordinarily, tools are also being designed to help clinicians with clinical processes; one narrow AI tool provides physicians with real-time guidance for performing cardiac ultrasounds to optimise views and images.9 As innovation continues to grow, the conversation is shifting to the safety of these interventions. As automated clinical algorithms become easier to develop, the arrival of a reproducibly safe and accurate automated tool or clinician appears inevitable.
Defining doctorhood
A doctor is more than just a diagnostic tool. In 2017, a Chinese robot (named Xiaoyi) passed a written medical licensing exam comfortably; had it also passed an objective structured clinical examination (OSCE) then this machine would have met the minimum expectation of graduating medical students worldwide.17 In a Goertzel-inspired manner, this would provide the simplest test of qualification: having the knowledge and minimum social interaction required to pass written medical examinations and OSCEs.
Despite this, a recent survey of general practitioners (GPs) in the UK found that they believe that communication and empathy are exclusively human competencies, and that the scope of AI in primary care was limited to efficiency savings.18 The technology giant Intel completed a study in the USA that showed that doctors believe that AI will not replace them; rather, it will increase their availability for patient care.19 However, they raised concerns about the ambiguity in the algorithm (known as the ‘black box’ dilemma) and the potential for fatal errors. The ‘black box’ issue, namely that clinicians will be unable to understand or rationalise why an AI reaches a decision, has been discussed extensively in the literature as a fundamental weakness of AI. Despite its potential, AI also brings new problems; the greatest barriers to AGI include algorithm bias, reward hacking, insensitivity and automation bias (Table 1).20–23
An additional difficulty, first outlined in 1966 by Austin Bradford Hill regarding his novel protocol for randomised controlled trials (RCTs), was that determining treatments based on averages does not ‘answer the practising doctor's question: what is the most likely outcome when this particular drug is given to this particular patient?’.24 This incongruence between single-outcome algorithms and heterogeneous treatment effects has been explored in great detail with regard to the applicability of RCT results.25 However, when considering complex decision-making, competing diagnoses and balancing clinical risk, the literature is surprisingly lacking; perhaps this is due to the assumption that algorithms will eventually learn best outcomes and achieve superiority, as they have demonstrated with risk-stratification and management of patients with upper gastrointestinal bleeds (outpatient vs inpatient intervention).26 Decisions to operate are archetypical examples of medical deliberations that require hypothetico-deductive reasoning, individual judgment and heuristics. Innovators have recognised that while AI may not provide a definitive answer, it can be utilised to optimise conditions that favour a specific outcome (eg to operate) through the reduction of risk factors.27 In a study on complex breast cancer patients, the AI decision-making tool was compared with oncologists with the long-term aim of incorporating it in a fashion analogous to a multidisciplinary team (MDT) meeting to aid lone physicians.28 An alternative option for selecting between therapies in complex patients would be utilising historical patient decision making, particularly in the context of patients who lack capacity.29,30 While patient-led decisions have been explored previously, particularly in the context of medical versus surgical management of prostate cancer, this approach risks perpetuating established inequalities in health outcomes that arise from decision-making prejudiced by ethnicity and socio-economic status.31
These drawbacks need to be acknowledged in the design and implementation of a safe medical AI framework through an iterative learning process that recognises their influence on decision making and aims to mitigate their effect. In this sense, it is similar to unconscious bias training in that working towards widespread recognition and training on the issue helps reduce its influence. However, AGI has a powerful value proposition to medical services through a non-fatigable decision- and action-making tool that can be deployed anywhere with a maintainable computer infrastructure.
Humanity and humility
Human interaction is a fundamental philosophy of current medical practice. The character and conduct of the clinician influence how patients interact with their diagnosis and can even improve outcomes. A Lancet review reported that a consistent finding for improving outcomes was a warm, friendly and reassuring clinician.32 This ‘doctor effect’ has been reinforced in the primary care setting, where the typically human skills of empathy and reassurance are shown to improve patient outlook and promote change in behaviour.33 However, if patients cannot discern between machines and humans, can this effect still be achieved?
Narrow AI can act as an adjunct to these interactions through providing information, recommendations or results to a clinician that maintains the human relationship with the patient. However, AGI would automate the whole process, including the patient-facing interaction. Early work has shown promise in the use of chatbots as a conversational narrative, but wider (patient-led) validation and factor identification is still required to fully understand what aspect of human interaction contributes to the emotional care of patients and how this affects outcomes.34,35
Furthermore, doctors are bound by a series of duties that are proclaimed during their initiation into the role of becoming a physician, a process herein described as ‘doctorhood’. To doctors, a medical career is a transformational experience that involves empathetic engagement, intimacy and detachment, centred around the formation of human relationships. With over a third of patient symptoms existing without medical diagnosis, doctors play a role in their patients’ lives in the absence of curative options through listening, managing dignity and comfort, acknowledgement and appreciation. This facet of the patient–doctor relationship has been shown to have a small but significant effect on improving patient outcomes.36 It has been termed as ‘emotional care’ and ‘cognitive care’, and refers to the process of exchanging emotions and information between patients and doctors.32 However, for patients with medically unexplained symptoms, AI provides a potentially invaluable source through learning algorithms that mine patient information to see correlations that clinicians may not be able to, so providing subsets of this population with a possible treatable condition or valuable diagnosis.37 The cancer detection tool ‘C the Signs’ has aided thousands of GPs in England to consider and diagnose cancer earlier in patients that present with seemingly disparate symptoms.38,39
Although world-leading experts believe AGI may be achieved in the next 10 years, there still exists a large deficit in the literature pertaining to patient attitudes on the use of AI in medical encounters. It is not known in what capacity a patient would be willing to interact with AGI, or even the cultural or demographic permeability of such an approach. Often, patients may attend appointments with a medical problem that is not their primary concern and are reliant on the doctor to elucidate the true reason for their presence. The Academy of Medical Royal Colleges have expressed their concerns regarding the ‘gaming’ of coding that may prioritise diagnosis over patient care.40 However, the technology for detection of mental health disorders, early brain degenerative disorders and subtle micro-expressions is in development and is likely to improve in the near future; this may mitigate these issues and expand the holistic diagnostic ability of these tools but this is yet to be proven.41–43
A test of doctorhood
With the potential imminent arrival of AGI and its foreseeable use in medicine, it is beneficial to classify different forms of AI within medicine and create a framework from which leading institutions can define specific criteria for AGI. While other groups have looked at ways to classify specific applications of AI in medicine (such as for radiology, data learning algorithms and clinical decision-making), there is no unifying or central framework from which regulatory or specialist groups can develop their policies.13,44,45 When considering long-term goals of AI within medicine, this could lead to significant heterogeneity in standards across specialties and locations. In deriving a normative framework, one could contemplate the topics discussed thus far and, specifically, the characteristics that tools require in order to achieve AGI in the healthcare field.
In the progression from augmented intelligence to AGI (worthy of doctorhood), this article has explored how a system needs to achieve minimum levels of competency in knowledge, safety and emotional care. Additional supplementary skills can be determined by specialty bodies; these may include practical ability for surgery, administrative abilities for interns/junior doctors or higher empathetic/conversational thresholds for palliative care and psychiatry. However, this proposal does not discriminate between ‘good’ and ‘bad’ forms of doctorhood, which will only be reflective of the quality of the input of the system.
This Automation of Doctors and Machines (ADAM) framework is set out in Fig 1. As the level progresses, so do the minimum requirements in the core competencies.
Knowledge: sufficient levels of information and decision making to carry out the task.
Safety: a permissible level of accuracy and independence in function.
Emotion: provision of holistic and emotional care for the patient.
Independence: indistinguishable from a human doctor, or patients are aware that the tool is not human and this is permissible.
Thus, the simplest forms of AI in medicine are the machine learning tools that can be applied to sets of data to aid but not determine care (level 1 artificial intelligence doctor (AID) tool). These will typically demonstrate sufficient levels of knowledge, with no requirement for emotion. They will have commercial levels of safety but necessitate a doctor to ensure they are medically safe for application. Examples include reference tools, such as risk stratification scores that produce severity scores (eg for common presentations like pancreatitis or appendicitis). These are widely used already, and could be further ‘automated’ through the automatic generation of scores for patients at presentation.
When the tool develops an independent function outside of the human doctor's care, it becomes a level 2 AID tool. This is where the majority of current AI applications in medicine are aiming. They demonstrate a clear and narrow function that is both accurate and safe, usually a diagnostic tool (with sufficient knowledge and safety), but require human instrumentation in order to prove effective. A triaging service is an illustration of this stage, as it provides a complete and narrow medical expertise service independent of the clinician, but one that ultimately will be reviewed by a clinician. This application is already being explored in triage-heavy services such as emergency medicine and radiology. Models have been successfully designed to triage patients with common presentations (eg abdominal pain), to predict likelihood of hospitalisation, to allocate resources (personnel) in remote environments and to grade severity of (radiology) diagnosis.46–49 The cancer diagnostic tool (C the Signs) discussed earlier would fall into this category.
To develop further, a tool must demonstrate some element of emotional and cognitive intelligence to provide care specific for the patient. These tools are still ‘limited’ as they have a precise output but are independent in this end. For example, a tool that can diagnose and prescribe accurately while demonstrating consideration for the patient's life through personalised selection (eg dosing regimen) would fulfil this criterion for level 3. In contrast to polypharmacy rationalisation tools, items in this category would place greater focus on patient preference. For example, they may help a patient diagnosed with depression pick the best form of therapy or medication after consideration of their priorities, character traits, employment/availability, aims and likelihood of engagement.
The distinction between these tools and level 4 is the ‘personable’ factor that studies attribute to improved health results. While subject to generational influence, this criterion would involve a complete doctor–patient interaction conducted by AI. Due to medical ethics of transparency and honesty, a Turing-like test is not permissible as patients should be informed of the specifics of their service provider. However, at level 4, the use of AI is totally acceptable to the patient. With regard to safety, a system similar to that for specialty trainees is adopted, whereby a senior doctor verifies and monitors the output and practice of this stage. This has the added benefit of enabling parallel streaming of patients in a conventionally safe manner.
Finally, level 5 is achieved when the AGI functions as a totally independent and autonomous practitioner with sufficient knowledge and ability to complete the patient–doctor interaction without validation. The pre-validation of these tools will happen in their development. The potential for level 5 AID tools is enormous, and it is difficult to predict whether these will be single-specialty based or demonstrate cross-specialty autonomy. This level also presents its own complexities with uncertainty regarding the ultimate responsibility and liability for the care provided.50–54
The benefit of a graded system, such as the ADAM framework, is the enhanced ability of unified monitoring, evaluation and learning. Its stepwise progression lends itself to a layered approach that would allow external regulatory and specialty bodies to define complexity, safety and intervention in a graduated manner.
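The stepwise progression described above can be read as a short decision ladder. The following Python sketch is one possible reading of the prose, not a reproduction of Fig 1: the level names, the `ToolProfile` fields and the `classify` function are all hypothetical labels introduced here for illustration.

```python
from dataclasses import dataclass
from enum import IntEnum


class AdamLevel(IntEnum):
    """ADAM levels 1-5. Member names are illustrative, not from the article."""
    ASSISTIVE = 1      # aids but does not determine care
    NARROW = 2         # independent narrow function, reviewed by a clinician
    PERSONALISED = 3   # adds patient-specific emotional/cognitive care
    SUPERVISED = 4     # full interaction, monitored by a senior doctor
    AUTONOMOUS = 5     # full interaction without validation


@dataclass(frozen=True)
class ToolProfile:
    """Hypothetical yes/no summary of a tool, one flag per step in the prose."""
    independent_function: bool          # works outside the doctor's direct care
    patient_specific_care: bool         # tailors care to patient preference
    full_interaction_accepted: bool     # runs the whole encounter, patient informed and accepting
    requires_clinician_oversight: bool  # output is verified by a doctor


def classify(tool: ToolProfile) -> AdamLevel:
    """Return the highest level whose description the tool satisfies."""
    if not tool.independent_function:
        return AdamLevel.ASSISTIVE
    if not tool.patient_specific_care:
        return AdamLevel.NARROW
    if not tool.full_interaction_accepted:
        return AdamLevel.PERSONALISED
    if tool.requires_clinician_oversight:
        return AdamLevel.SUPERVISED
    return AdamLevel.AUTONOMOUS


# Two examples taken from the text: a risk stratification score and a triage service.
risk_score = ToolProfile(False, False, False, True)
triage_service = ToolProfile(True, False, False, True)
print(classify(risk_score).name, classify(triage_service).name)
```

Each check corresponds to one step up the ladder, so a tool is placed at the first level whose additional requirement it fails to meet. A real assessment would need graded rather than yes/no criteria, which specialty bodies would have to define.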
Implementing the ADAM framework in practice
The ADAM framework is analogous to the classification of driving automation proposed by the Society of Automotive Engineers (SAE; Fig 2).55 It provides a simple tiered framework with clear designations of the characteristics and distinction of human versus automated involvement at each level. This framework is unique in that it directly calls for the emotive aspect of patient care and doctorhood to be a part of the expansion of AI, and provides a way to quickly and easily compare multiple new technologies. It would act as a centralised framework from which specialist bodies (eg royal colleges) and regulators could classify new and emerging technologies. In some developed form, it may act as a way to assess automated tools in order to make them suitable for medical environments.
In implementing AI in medicine, it will be important to determine the level of standards and regulation for different tools. With so many groups working on innovations, it will soon become challenging to set individual standards for each form of technology. Specialist bodies and regulators will be able to define their own requirements for the four domains at each level of the ADAM framework. In turn, this will provide a simple way to implement broad rules that hospitals and innovators can work towards in order to gain some form of accreditation from the standard-setting groups.
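As a minimal sketch of how a standard-setting body might encode its own requirements, the Python example below stores a minimum score per domain for each level and reports the highest level a tool reaches. The 0.0 to 1.0 scoring scale, every threshold value and the names `EXAMPLE_MINIMUMS` and `highest_level_met` are assumptions made purely for illustration; the article does not specify any numerical criteria.

```python
from typing import Dict

# The four domains named in the framework.
DOMAINS = ("knowledge", "safety", "emotion", "independence")

# Hypothetical minimum scores (0.0-1.0) that a regulator or specialty body
# might publish for each level. These numbers are placeholders, not real standards.
EXAMPLE_MINIMUMS: Dict[int, Dict[str, float]] = {
    1: {"knowledge": 0.60, "safety": 0.50, "emotion": 0.0, "independence": 0.0},
    2: {"knowledge": 0.70, "safety": 0.70, "emotion": 0.0, "independence": 0.3},
    3: {"knowledge": 0.80, "safety": 0.80, "emotion": 0.5, "independence": 0.5},
    4: {"knowledge": 0.90, "safety": 0.90, "emotion": 0.8, "independence": 0.7},
    5: {"knowledge": 0.95, "safety": 0.99, "emotion": 0.9, "independence": 1.0},
}


def highest_level_met(scores: Dict[str, float],
                      minimums: Dict[int, Dict[str, float]]) -> int:
    """Return the highest consecutive level whose minimums are all met (0 if none)."""
    achieved = 0
    for level in sorted(minimums):
        required = minimums[level]
        if all(scores[domain] >= required[domain] for domain in DOMAINS):
            achieved = level
        else:
            break  # levels are cumulative, so stop at the first unmet level
    return achieved


# A tool with strong knowledge and safety but little emotional capability.
tool_scores = {"knowledge": 0.85, "safety": 0.80, "emotion": 0.2, "independence": 0.4}
print(highest_level_met(tool_scores, EXAMPLE_MINIMUMS))
```

Keeping the thresholds in a separate table means that different bodies could publish their own minimums for the same four domains while tools are still compared on a shared scale of levels.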
In practical terms, this would enable quick comparison of the function of multiple technologies and aid healthcare organisations to differentiate between their capabilities. It is highly likely that further work on this framework would be needed, particularly to subcategorise technologies within each level. This is because, at current progression, each level will be achieved in a time-dependent manner, with level 5 occurring furthest in the future. As such, time urgency may necessitate further requirements within each domain at the most imminent level (at this point, level 2). However, the hope is that the introduction of uniform considerations for new technologies will help to align the thoughts of the groups that are attempting to rationalise this ever expanding and complicated field. In making the framework simpler in its first iteration, it aims to also aid the clarity and understanding for patients who will be able to concisely see what form of technology they are engaging with.
Conclusion
The development of AI brings with it an exciting era of modern medicine. In order to fully enhance, expand and regulate this field, the ADAM framework provides a tool to classify its use in medicine. By categorising forms of medical AI, it allows clinicians, patients and regulators to delineate different forms of AI, and creates a foundation from which governing bodies can set and standardise levels of care.
- © Royal College of Physicians 2021. All rights reserved.
References
1. McCarthy J
2. Maloof M
3. [entry not recovered]
4. Hildt E
5. Turing AM
6. Fast Company
7. Goertzel B
8. [entry not recovered]
9. Meskó B
10. [entry not recovered]
11. Department for Business, Energy and Industrial Strategy
12. Tran B
13. Jiang F
14. [entry not recovered]
15. Walsh CG
16. [entry not recovered]
17. Galeon D
18. [entry not recovered]
19. Intel
20. Weber C
21. Keskinbora KH
22. Challen R
23. [entry not recovered]
24. Hill AB
25. Kent DM
26. [entry not recovered]
27. Loftus TJ
28. Xu F
29. Lamanna C
30. [entry not recovered]
31. [entry not recovered]
32. [entry not recovered]
33. [entry not recovered]
34. Greene A
35. Inkster B
36. [entry not recovered]
37. [entry not recovered]
38. Bakshi B
39. NHS England
40. Academy of Medical Royal Colleges
41. IBM Research
42. Luxton DD
43. Khoury N
44. [entry not recovered]
45. [entry not recovered]
46. Farahmand S
47. [entry not recovered]
48. Kim D
49. Weisberg EM
50. [entry not recovered]
51. Anderson M
52. Sullivan HR
53. Keskinbora KH
54. [entry not recovered]
55. Shuttleworth J