Smartphone Apps Meet Evidence Based Medicine

“The future of medicine is in your smartphone,” proclaimed an eminent medical researcher in a 2015 Wall Street Journal essay. In a sense, the future is already here, judging from the proliferation of apps and medical devices that are connected to smartphones. One industry study in 2015 identified more than 165,000 health-related “apps” for smartphones on Google Play and the Apple iTunes store. But how much does this technology lead to improved patient outcomes? That question is one of evidence based medicine, to be answered by clinical trials and systematic reviews by medical experts.

An important partial answer to this question came in mid-December 2016, when the Cochrane Collaboration (which conducts highly regarded systematic reviews of medical interventions of all sorts) released a major report on the use of automated telephone communications systems (ATCS) for preventing disease and managing chronic conditions [1]. This massive 533-page report evaluated 132 clinical trials with over 4 million participants, done according to the rigorous process that Cochrane uses for its assessments of other interventions.

The authors defined ATCS broadly as “a technology platform through which health professionals can collect relevant information or deliver decision support, goal setting, coaching, reminders or health-related knowledge to consumers via smartphones, tablets, landlines, or mobile phones, using either telephones’ touch-tone keypad or voice recognition software.” Thus, ATCS encompasses a range of services, from simple telephone reminders of appointments to two-way communication between patients and providers that may include transmission of patient data to healthcare providers. A related rubric is mHealth, which the World Health Organization defines as “medical and public health practice supported by mobile devices, such as mobile phones, patient monitoring devices, personal digital assistants, and other wireless devices.” The review covered studies published from 1980 (when, presumably, some patients were still using rotary dial phones) through June 2015. However, a large fraction (perhaps most) of the papers examined in the Cochrane review involved use of smartphones and can be considered forms of mHealth.

The Cochrane report had positive but also mixed conclusions: “Our results show that ATCS may improve health-related outcomes in some long-term health conditions,” remarked lead author Josip Car from the Lee Kong Chian School of Medicine, Nanyang Technological University, Singapore. But many studies reviewed by the project failed to show improvements in patient outcomes from use of ATCS technology. Where improvements were observed they were often modest in size (Table 1). Much of the evidence for effectiveness, the report concluded, was “moderate” or “poor” in quality due to limitations in the clinical studies.

Table 1. Capsule Summary of Cochrane Review of ATCS (Dec. 2016)

Outcomes of apps for preventative care

Probably does have a positive effect on: Probably little effect on:
  • increase in immunization uptake in children
  • slight increase in cervical cancer screening
  • improved medication adherence
  • increase in uptake of screening for breast cancer and colorectal cancer
  • uptake of breast cancer screening
  • clinical outcomes related to medication adherence (blood pressure control, blood lipids, asthma control, therapeutic coverage)


Outcomes for condition-specific apps

Positive outcomes in: Little or no effect on outcomes related to: Insufficient evidence related to:
  • decreases in cancer pain
  • decreases in chronic pain
  • decreases in depression
  • heart failure
  • hypertension
  • mental health
  • smoking cessation
  • preventing alcohol/substance misuse or managing illicit drug addiction
  • asthma
  • chronic obstructive pulmonary disease
  • hypercholesterolaemia
  • obstructive sleep apnoea
  • spinal cord dysfunction
  • psychological stress in carers [of patients]

To some extent, deficiencies in the apps themselves may have led to their ineffectiveness. In late December 2016, we searched the ISI Web of Science using terms “smartphone and app and health”. Our search uncovered 304 papers, most of which appeared in the past two years (Figure 1). Roughly 20 of these papers were comprehensive reviews of smartphone apps in particular medical fields.

Figure 1: Papers published on mHealth apps by year.

Most of these reviews expressed optimism about the potential usefulness of apps and of mHealth in general, but most also noted many limitations in the apps (Table 2). Indeed, the weakness of the medical component of many healthcare-related apps is painfully obvious. Many of the apps were insufficiently informed by best practices developed over the years by experts in health interventions and did not follow guidelines of evidence based medicine. Some even lacked obvious measures to ensure validity: in their 2014 review of 107 apps for self-management of hypertension, Kumar et al. found that “none of these apps employed the use of a [blood pressure] cuff or had any documentation of validation against a gold standard.” One author considered mHealth to be “a strategic field without a solid scientific soul.”

Table 2. Seven comprehensive reviews of smartphone medical apps that appeared in 2016.

Except as noted, all of these apps are intended for use by consumers.

Category: Stress management (Coulon 2016)

92 apps screened and evaluated, of which 60 met inclusion criteria

Conclusion: “32 apps included both evidence-based content and exhibited no problems with usability or functionality” “these apps have the potential to effectively supplement medical care”

Category: Weight loss apps for Arabic speakers (Alnasser 2016)

298 apps screened and evaluated, of which 65 met inclusion criteria

Conclusion: “The median number of evidence-informed practices was 1, with no apps having more than six and only nine apps including four to six.. These findings identify serious weaknesses in the currently available Arabic weight-loss apps.”

Category: Mental health (Radovic 2016)

208 apps reviewed (chiefly symptom relief and general mental health education)

Conclusion: “Most app descriptions did not include information to substantiate stated effectiveness of the application and had no mention of privacy or security. Due to uncertainty of the helpfulness of readily available mental health applications, clinicians working with mental health patients should inquire about and provide guidance on application use, and patients should have access to ways to assess the potential utility of these applications.”

Category: Inflammatory bowel disease self-management (Con 2016)

238 apps screened and evaluated, of which 26 met inclusion criteria

Conclusion: “Apps may provide a useful adjunct to the management of [inflammatory bowel disease] patients. However, a majority of current apps suffer from a lack of professional medical involvement and limited coverage of international consensus guidelines.”

Category: Suicide prevention (Larsen 2016)

123 apps screened and evaluated, of which 49 met inclusion criteria

Conclusion: “All reviewed apps contained at least one strategy that was broadly consistent with the evidence base or best-practice guidelines. … Potentially harmful content, such as listing lethal access to means or encouraging risky behaviour in a crisis, was also identified.” “Clinicians should be wary in recommending apps, especially as potentially harmful content can be presented as helpful.”

Category: Weight management (Bardus 2016)

Review of 23 apps

Conclusion: “overall moderate quality … more attention to information quality and evidence-based content are warranted”

Category: Nutritional tracking for diabetics (Darby et al 2016)

11,000 apps screened and evaluated, of which 42 met inclusion criteria

Conclusion: “The apps considered in this review provide great potential for improving outcomes in patients with diabetes but also do include some limitations… In the future, actual testing of a subset of apps in patients with diabetes should be considered”

Even worse, some of the apps are potentially dangerous: in their 2016 review of suicide prevention apps, Larsen et al. complained that some apps provided “potentially harmful content, such as listing lethal access to means [of suicide].” Many apps raised obvious privacy concerns for patients and can be suspected of selling patient data.

Detecting Atrial Fibrillation

But clearly, mHealth offers unprecedented possibilities with far-reaching consequences by enabling long-term monitoring of patients for medical conditions, analyzing health data on smartphones held by the patient, and transmitting data to healthcare providers.

Screening for “silent” atrial fibrillation (AF) is a case in point. Unlike ventricular fibrillation, which is lethal within minutes, AF is not immediately life threatening, but over time it increases the risk of stroke and other events. Persistent AF is easy to diagnose with an electrocardiogram, but sporadic bouts of AF may be asymptomatic and remain undetected. Some experts estimate that about one quarter of AF cases are “silent” and presently undiagnosed.

mHealth for Screening for AF

The prospective benefits of screening—for any medical condition—both to the patient and to the population in terms of reduced burden of disease, depend on many factors. Two recent studies suggest scenarios for screening for AF that could have a substantial health impact.

The first is a 2016 study by Belgian investigator Lien Desteghe and colleagues, who compared the performance of two handheld ECG devices, Kardia Mobile (AliveCor) and MyDiagnostik (Applied Biomedical Systems, Maastricht, the Netherlands), against a standard 12-lead ECG examination. The study included 469 patients in cardiology and geriatric wards of two Belgian hospitals. For the geriatric patients, the AliveCor device correctly identified approximately 80% of the individuals with AF, while confirming the absence of AF in virtually all of the patients who did not have that arrhythmia as determined by the “gold standard” ECG test. (The performance of the MyDiagnostik device was similar.)
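The two figures quoted above correspond to what diagnostic-test studies usually call sensitivity and specificity. A minimal Python sketch of how these are computed from a comparison against a reference test follows. The counts are hypothetical placeholders, chosen only to give roughly 80% sensitivity; they are not data from the Desteghe study.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of people with the condition that the device flags."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of people without the condition that the device clears."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts for illustration only (NOT taken from the study).
tp, fn = 16, 4   # reference ECG shows AF: device positive / device negative
tn, fp = 98, 2   # reference ECG shows no AF: device negative / device positive

print(f"sensitivity = {sensitivity(tp, fn):.2f}")
print(f"specificity = {specificity(tn, fp):.2f}")
```

A device can score well on one measure and poorly on the other, which is why studies of this kind report both.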

While these investigators found the devices to be “suboptimal” in performance compared to the “gold standard” 12-lead ECG, they nevertheless found important benefits of the technology. Screening patients for AF with such devices is much cheaper than having a cardiologist read a standard 12-lead ECG, and more reliable than having a nurse check the patients’ pulses for abnormal rhythms. They estimated that if a “structured screening strategy” were used (limiting screening to patients with no known AF who had no implanted cardiac devices), new AF cases could be identified and strokes prevented in this population at comparatively low cost. Whether similar encouraging results would be found with other screening populations remains to be seen.

In a very different setting, last year a team of investigators led by McManus (University of Massachusetts) reported a study that screened residents of four villages in rural India with the AliveCor device. The investigators identified AF in 5% of the screened individuals, a prevalence similar to that in Western countries. This figure is much higher than previously accepted for India; evidently most AF cases escape diagnosis in the Indian healthcare system. “Mobile technologies may help overcome resource limitations for atrial fibrillation screening in underserved and low-resource settings,” the authors conclude. The devices are inexpensive and can be used reliably by village healthcare workers with little formal medical training, both important benefits.
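How informative a positive screening result is depends on how common AF is in the screened group as well as on the device itself. The sketch below applies Bayes' rule to estimate the share of positive results that are true AF cases. The 5% figure and the roughly 80% sensitivity echo numbers quoted in this article, though they come from different studies and populations; the 98% specificity is an assumed stand-in for "virtually all". The output is therefore illustrative only.

```python
def positive_predictive_value(prevalence: float,
                              sensitivity: float,
                              specificity: float) -> float:
    """Share of positive screening results that are true cases (Bayes' rule)."""
    true_pos = prevalence * sensitivity
    false_pos = (1.0 - prevalence) * (1.0 - specificity)
    return true_pos / (true_pos + false_pos)


# Assumed inputs: specificity of 0.98 is a guess, not a reported value.
for prevalence in (0.05, 0.01):
    ppv = positive_predictive_value(prevalence, sensitivity=0.80, specificity=0.98)
    print(f"prevalence {prevalence:.0%}: PPV = {ppv:.2f}")
```

Under these assumptions about two in three positive results would be true cases at 5% prevalence, and fewer than one in three at 1%, which is one reason screening low-risk groups is harder to justify than screening high-risk ones.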

In 2013, David McManus and colleagues from the University of Massachusetts and Worcester Polytechnic Institute, Worcester, Mass., showed that a smartphone, with its camera placed against the skin, can measure pulse irregularities and reliably detect AF. Since then, six clinical studies, albeit small ones, have shown that smartphones can reliably detect AF. The apps measure pulse rate, either using the phone’s camera to detect pulsatile changes in blood flow (photoplethysmography, the approach taken by McManus) or by recording single-channel ECGs via electrodes attached to the smartphone, and use sophisticated algorithms to identify AF.
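To give a feel for what "pulse irregularity" means numerically, here is a minimal sketch of one simple irregularity statistic: the root mean square of successive differences (RMSSD) between beat-to-beat intervals, divided by the mean interval. It is an illustration only, not the algorithm used by McManus or by any commercial app. The function name and interval values are made up for the example, and no diagnostic threshold is implied.

```python
import math


def normalized_rmssd(intervals_ms):
    """RMSSD of successive beat-to-beat intervals, divided by the mean interval."""
    if len(intervals_ms) < 2:
        raise ValueError("need at least two beat-to-beat intervals")
    diffs = [b - a for a, b in zip(intervals_ms, intervals_ms[1:])]
    rmssd = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    mean_interval = sum(intervals_ms) / len(intervals_ms)
    return rmssd / mean_interval


# Made-up interval series in milliseconds (hypothetical, for illustration).
regular = [800, 810, 795, 805, 800, 798]      # steady rhythm
irregular = [620, 940, 710, 1100, 560, 880]   # erratic rhythm

print(f"regular:   {normalized_rmssd(regular):.3f}")
print(f"irregular: {normalized_rmssd(irregular):.3f}")
```

The erratic series yields a much larger value than the steady one. A real detector would need to combine such statistics with validated thresholds and safeguards against motion artifacts, none of which is attempted here.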

An example of the second approach is Kardia Mobile by AliveCor (Mountain View, California). This US$100 device is connected to the phone by a wireless link and is normally mounted on the back of the phone. When the user touches a pair of electrodes on the device, it generates a one-lead ECG signal that can be stored in the phone, viewed by the patient (after authorization by a physician) or transmitted to healthcare providers. The companion app provides an “instant analysis” that tells the patient if AF is present. In March 2016 AliveCor announced a wristband for the Apple Watch that has embedded conductors and is wirelessly linked to the watch. By touching the outer surface of the band, the user can capture one-lead ECGs and have the device indicate the presence of AF or normal heart rhythm.

Other apps attempt to detect AF using only the phone, with its camera placed against the skin. These have had mixed success at best. Users reviewing one app (Photo AFib Detector, available on both iTunes and the Google Play Store) complain that it failed to detect their AF. One reviewer complained that the app’s only means of sharing the results is by posting them on Facebook, which raises obvious privacy concerns. CCapp, the Hong Kong company that developed the app, provides no information on how the app was tested or on its accuracy. The company has no apparent medical competence; its main product is an app that uses artificial intelligence to identify optimal locations of a house or office according to the traditional Chinese principles of Feng Shui.

Apart from smartphone apps, many new mHealth devices for cardiology are coming onto the market. Zio Patch (iRhythm Technologies) is a stand-alone device that attaches to the body and records a single-channel ECG for up to 14 days. The patient then mails the device to the company, which analyzes the data and prepares a report for the patient’s physician.

Using such devices to diagnose AF under physicians’ orders would pose few problems (assuming they work reliably). Doctors have been using Holter monitors to collect 1 or 2 days of ECG data for diagnosis of arrhythmias since the early 1970s. The new elements are sale of ECG screening devices directly to the public, bypassing traditional channels of distribution of monitoring devices, and the push to use such devices to screen asymptomatic individuals for “silent” or asymptomatic AF.

Selling such devices directly to the public will create new pressures on the healthcare system. On its website (and in many ads on the Internet), AliveCor says that a user can “relay [heart activity data] to your doctor to inform your diagnosis and treatment plan.” But a busy healthcare facility such as the Hospital of the University of Pennsylvania has thousands of cardiac care patients, and is ill equipped to handle streams of data of variable quality sent to it by thousands of patients with wireless-enabled monitoring devices. Mechanisms must be established to compensate healthcare facilities for the additional staff time needed to manage and interpret these data.

Despite occasional reports of incorrect detection of AF by the AliveCor device, it appears to be generally reliable. Such wearable devices can improve patients’ awareness of heart health and facilitate treatment by their doctors. However, the benefits of screening asymptomatic individuals for AF, which is facilitated by mHealth devices, remain unclear, particularly for individuals suffering rare bouts of AF.

“How much atrial fibrillation constitutes a mandate for therapy?” asks one 2016 set of evidence based guidelines for management of AF. Diagnosis of AF will lead to anticoagulant therapy, which has its own set of risks to the patient, and it will increase the patient’s utilization of medical services with potentially significant economic consequences.

Insurance giant Aetna currently covers long-term monitoring of patients with the iRhythm Technologies device and other devices to diagnose patients with unexplained symptoms suggesting cardiac arrhythmias, as it has long done with Holter monitoring. Aetna presently does not cover use of AliveCor and several other devices “because their clinical value has not been established.” Nor does it cover the costs of screening asymptomatic individuals for AF in the absence of clinical proof that such screening actually improves patient outcomes. Needless to say, obtaining such proof is far harder than simply showing that the devices can reliably detect AF.

To assess the effectiveness of new mHealth technologies for early detection of AF, a large randomized clinical trial, the mHealth Screening To Prevent Stroke Trial (mSToPS), is currently in progress. The trial, sponsored by Scripps Translational Science Institute, will be carried out in collaboration with Aetna. The study will match 2100 patients at high risk but not previously diagnosed with AF against 4000 controls. The monitored subjects will be provided with a wearable ECG recording patch (the Zio Patch) or with a wristband device developed by Amiigo (North Salt Lake, Utah) combined with a proprietary app that detects AF. The investigators will follow patients for 3 years to compare rates of AF diagnosis, incidence of stroke and other consequences of AF, as well as differences in healthcare use and costs. The comparison population will be unscreened individuals identified from Aetna claims data. The study began in November 2015 and is expected to conclude in September 2019.

What Is To Be Done?

The reviews we examined frequently called for more regulation of medical apps. The U.S. Food and Drug Administration (FDA) considers an app to be a medical device if it meets the statutory definition of medical device (broadly, something that is used to diagnose or treat a disease), and such an app is hence subject to FDA premarket approval requirements. The AliveCor device and the iRhythm Zio Patch both have FDA clearances as Class 2 devices, on the grounds that they are functionally equivalent to other established products (electrocardiographs and data recording devices, respectively).

But many smartphone medical apps are not subject to FDA regulation since they do not meet its definition of medical devices. This includes many apps intended to help patients manage chronic illnesses such as those covered in the recent Cochrane review. Some of these apps may pose significant risks to patients, raise privacy concerns, be poorly designed or be difficult for patients to use effectively. With the new Trump administration, the FDA is likely to have little appetite for increasing its regulation of this technology.

App rating systems, for example the recently proposed MARS system, can help raise the quality of apps by enforcing standards of usability and information quality. However, this does not address the major question: how much do the apps improve patient outcomes?

For their part, healthcare providers and insurance companies can publicize high quality apps. For example, insurance giant Humana lists on its website “5 great healthcare apps”. But even recommended apps may still lack substantial clinical evidence for effectiveness. That kind of evidence is simply too expensive to obtain. One would have to sell many $5 apps to be able to support even simple clinical trials, let alone conduct the much more extensive trials needed to establish improvements in patient outcome.

In important ways smartphone technology for healthcare remains at what one firm (Gartner, Stamford, Connecticut) calls the “hype” stage of innovation, with excessive optimism by many people about the wonderful things a new technology can accomplish. With time, Gartner explains, the initial enthusiasm gives way to disillusionment as the limits of the technology become apparent. That is clearly happening now, with glowing promises about the use of smartphones in healthcare (as in the opening quote in this article) being followed by numerous less-than-stellar evaluations of its medical effectiveness. Finally, as people gain more experience with the technology, they develop more realistic expectations about what it can accomplish and durable success stories can emerge. This may well happen with mHealth technology to screen for “silent” AF, but we will have to wait for the conclusion of a lengthy and expensive clinical trial to know for sure.


  1. P. Posadzki et al., “Automated telephone communication systems for preventive healthcare and management of long-term conditions,” Cochrane Review, Dec. 2016.