Part of Abebe's presentation.

Cornell researcher talks about data gaps in research for people of color

By Nathan Stiff

Even in today’s data-driven world, we lack information about some of the most vulnerable populations, leading to ineffective policymaking, according to Cornell University researcher Rediet Abebe.

Abebe gave a talk March 5 at the Computer Science Instructional Center on what she calls the “data gap,” and how she has confronted it in her research. She is a computer science researcher born in Ethiopia and a co-founder of both the Mechanism Design for Social Good research initiative and Black in AI, a group aiming to increase the presence of black researchers in the AI field.

Abebe spoke about the lack of health data for marginalized populations, which she said can severely impact the quality of care they receive. The United States’ high-tech health system uses massive amounts of data to determine how to invest.

“There’s this sort of nice feedback loop that’s happening where the availability of data is enabling better health policy, and better health policy is making it easier to collect higher quality data,” Abebe said.

Abebe said there is a lack of public health research and solutions in developing nations, where detailed health data is often unavailable. She discussed her research at Microsoft, which used Bing search data as an alternative source for researching information needs and misconceptions about HIV, malaria, and tuberculosis in Africa. She focused on HIV in her talk.

The study used topic modeling, a machine learning tool for organizing large amounts of text. The tool groups words into clusters, or “topics,” based on how likely they are to appear in the same search query. She highlighted six of these topics in her talk, which included searches related to HIV symptoms, the stigma against HIV and so-called “natural cures” for HIV. The study, currently available as a preprint, could be used to determine the types of HIV information Africans search for, and what to prioritize in public health research and education in Africa as a result.

She also talked about the limitations of this research: people with internet access, and Bing users in particular, are not necessarily a representative sample of Africa as a whole. The study also analyzed only English-language searches. Despite these issues, Abebe said these methods could be useful for filling in the gaps in health research.

Abebe said the data gap is a problem in the U.S. as well, where a lack of data on underrepresented groups such as black and Latinx women leads to gaps in researchers’ understanding.

Patrick Kamongi, a computer science post-doc and NIST researcher from Rwanda, found Abebe’s research very interesting.

“There’s many good [things] that can happen, but from a technical point of view this is a very good area for AI, for computer science, for data science, for informatics, for medicine, bridging the gap of the computation aspect of that,” Kamongi said.

Abebe’s talk was part of the Department of Computer Science’s Inclusion Speaker Series, which aims to foster inclusion by featuring speakers from groups currently underrepresented in computer science. Brandi Adams and Freddie Salley founded the series last semester.

“A lot of people say that one of the reasons why we don’t have more women, and more people of color, and more women who are people of color who are computer science professors, is that they don’t have people to model themselves after,” Adams said. “Our idea was to come up with a series in which we have these very people give technical talks.”

According to University of Maryland data, more than 80 percent of students who graduated in 2018 with a bachelor’s degree in computer science were male, and fewer than 12 percent were considered underrepresented minorities.

Nimo Hired, a junior computer science major, said this was her first Inclusion Speaker Series talk.

“I thought it was very inspiring also that it was taught by a black woman,” Hired said. “I’m also Somali, so I’m from an African country, and I thought the whole idea of using search to figure out disparities and health and the combination of using tech for public good and public health was just inspiring in itself.”
