All posts by Leah Whitehouse

The Cambridge Women in Data Science Conference

I was recently browsing the website of Harvard’s Institute for Applied Computational Science and saw there was a Women in Data Science conference. I was so excited to attend that I set a reminder on my phone. As soon as the conference went live, I forwarded a link to all of my colleagues. Not much later, I started hearing back that the conference was sold out! It was really thrilling to see that there was so much interest. The conference was a great opportunity to hear how some women in data science are leveraging machine learning to transform healthcare and advocating for open science to foster public debate about the big data algorithms that influence society. Here are some highlights:

When Regina Barzilay, MIT Professor of Electrical Engineering and Computer Science, was a breast cancer patient at MGH, she saw how machine learning could uncover insights in the vast collection of patient information, including mammogram scans, pathology reports, and family history. Today, she’s in remission and collaborates with MGH to train models that detect high-risk lesions sooner than ever imagined and estimate their likelihood of being cancerous, reducing the number of unnecessary surgeries.

Heather Bell, who leads a digital and analytics department in biopharma, gave a big-picture talk on how various companies are using artificial intelligence to streamline the otherwise long and expensive R&D pipeline. One challenge is that it can take several months to recruit participants for clinical trials. In one example she shared, Clinithink developed an NLP platform that converts written doctor notes into structured data, which can then be used to rapidly identify participants who meet the trial criteria. The platform was shown to recruit 2.5 times more participants in 5% of the time. In another example Heather provided, wearables and web applications are proving effective at monitoring health between doctor visits. In one study, lung cancer patients responded to a brief questionnaire once a week about various health metrics like appetite and weight. The device algorithm, developed by SIVAN Innovation, alerted the patients’ doctors when there was a concerning change. Of the intervention cohort, 50% more were alive 7 months longer than the regular follow-up cohort. The trial was stopped early because the effect was so large.

Francesca Dominici, HSPH Professor of Biostatistics and Co-Director of Harvard’s Data Science Initiative, shared her powerful longitudinal study demonstrating an association between exposure to air pollution and mortality risk among all Medicare beneficiaries (~67 million per year). Because the study sparked media headlines and supports more stringent environmental policy at a time when such policy is hotly debated, Francesca espouses principled data science and an open science framework in which data are publicly available and results are reproducible. While privacy is an inevitable concern in an open science framework, it’s worth considering Cynthia Dwork’s invention, differential privacy, an effective tool that goes beyond anonymization to protect individuals’ identities in research databases. Coincidentally, Cynthia was also a speaker at WiDS, where she discussed her latest endeavor: developing a metric for an algorithm that classifies people as fairly as possible.

Cynthia discussed how subjective fairness is. In that sense, the metric must be culturally aware, which is another rationale for open science.
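For readers curious about what differential privacy looks like in practice, here is a minimal sketch of one standard building block, the Laplace mechanism, applied to a simple counting query. This is my own illustration and not something presented at the conference; the function names, the toy `patients` records, and the choice of `epsilon` are all assumptions made for the example.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two independent exponentials with mean `scale`
    # follows a Laplace distribution centered at 0 with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng=None):
    """Return a noisy count of the records that satisfy `predicate`."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    true_count = sum(1 for record in records if predicate(record))
    # Adding or removing one person changes a count by at most 1
    # (sensitivity 1), so Laplace noise with scale 1/epsilon is enough
    # for epsilon-differential privacy on this single query.
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical toy data, invented purely for illustration.
patients = [
    {"age": 71, "smoker": True},
    {"age": 66, "smoker": False},
    {"age": 80, "smoker": True},
]

noisy = private_count(patients, lambda p: p["smoker"], epsilon=0.5)
print(f"Noisy number of smokers: {noisy:.2f}")
```

A smaller `epsilon` adds more noise and gives stronger privacy, while a larger one gives a more accurate answer. The sketch covers a single query only; answering many queries about the same people would require tracking the total privacy budget.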

Rounding out an exciting day of data science, Tamara Broderick, MIT Assistant Professor of Computer Science, discussed achieving accurate Bayesian inferences with optimization. I encourage you to watch her talk here, as well as some of the other talks I’ve highlighted. It was inspirational to hear these accomplished women in data science present some of their impactful research. I am really looking forward to next year’s conference and I hope you are too.

To stay up to date on Women in Data Science (WiDS), go to https://www.widscambridge.org.