Knight Foundation: Can Data Work for the People Without Selling Out?
More and more, data gets used to define our lives. But where does that leave the layperson who might not know what to make of this information era? And what can be done with data gleaned from the public beyond helping companies turn a profit? Is there a way to democratize the digital realm?
Last week at Civic Hall in New York, the Knight News Challenge on Data divvied up grants from a $3.2 million pot for 17 winning teams to help them develop technology that tackles matters as far-ranging as public-privacy issues and questionable conduct among police officers. The ideas could help improve and inform journalism around the world, but they also speak to broader issues that extend beyond the needs of newsrooms.
For example, Security Force Monitor, developed by the Human Rights Institute at Columbia University, compiles unstructured data from a host of sources to create a platform that maps out reports of human rights violations committed by police and security forces. Meanwhile, research institute Data & Society is working on a website that could help people discover hidden biases in data that might be used to discriminate against them.
The competition, conducted by the Knight Foundation, started off with 1,065 applicants. Eight teams among the finalists each got funding that ranged from more than $237,000 to $470,000 to help make their technology ready for full release. Nine early-stage ideas each received $35,000 so their teams could develop demos within the next six months. The purpose of the competition is to support ideas that can make data work on behalf of communities and individuals.
Some prior recipients of funding from Knight—Emily May of Hollaback; Seth Flaxman, of Democracy Works; and Nancy Lublin of Crisis Text Line—came out to offer up advice for the new winners and to discuss ways data can help improve society.
Hollaback, said May, addresses harassment both online and in the streets. It can be hard to get some folks to believe such problems are real, she said, which necessitates having proof. “What we’re trying to build and use the data for is to effect change long-term,” May said.
That includes research that shows how sharing stories can reduce the trauma associated with harassment. “That’s the crux of our work—looking at things like mapping and content analysis, and partnerships with research institutions,” she said.
Democracy Works started a website called TurboVote in 2010, Flaxman said, to provide info to users on how to vote and stay registered. The project quickly became very data-intensive, he said, as it involved things such as collecting thousands of addresses related to where to send forms for voting by mail. “We had to build out the capacity for data quality assurance before we could even build services on top of that,” he said.
Crisis Text Line is an emotional support hotline, available day and night by text, said Lublin. From the outset, she said, the plan was to use natural language processing to auto-tag the texts in real time. “One of the first people I had to speak with was danah boyd,” Lublin said. “We sat at The Coffee Shop at Union Square for two hours on the weekend. I realized this had to be built around the data.”
A principal researcher at Microsoft Research, boyd is also the founder of Data & Society, which tries to understand how data-driven technology intersects with the world on a social level.
Before Crisis Text Line launched, Lublin said, the team prepared to deal with a very large mental-health dataset. In just over two years, more than 13 million messages have been exchanged via Crisis Text Line. Through machine learning, the system recognizes words, even new terms, that can indicate the severity of a person’s message, and it assigns the appropriate priority. “When you go to the emergency room, the gunshot wound is taken before the funny rash,” Lublin said. “A hotline should be the same way.”
So if someone texted Crisis Text Line that they intended to harm themselves, they would be pushed to the top of the queue. The system has also learned to recognize shorthand, such as “KMS” (kill myself). Though the service is built for English, Lublin said, the algorithm has started to pick up certain phrases from other languages, and it does so faster than the humans who staff the hotline.
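For readers curious how that kind of triage might look in code, here is a minimal sketch in Python. It is not Crisis Text Line's system: Lublin described machine learning that picks up new terms on its own, while this sketch swaps in a fixed keyword table as a stand-in. The names `SEVERITY_TERMS`, `severity`, and `TriageQueue`, along with the weights themselves, are hypothetical and chosen only for illustration.

```python
import heapq
import itertools
import re

# Hypothetical severity weights (an assumption for this sketch).
# A real service would learn such signals from data rather than hard-code them.
SEVERITY_TERMS = {
    "kms": 3,
    "kill myself": 3,
    "hurt myself": 2,
    "lonely": 1,
}


def severity(message: str) -> int:
    """Return the highest weight of any known term found in the message."""
    text = message.lower()
    score = 0
    for term, weight in SEVERITY_TERMS.items():
        # Word boundaries keep short tokens from matching inside longer words.
        if re.search(r"\b" + re.escape(term) + r"\b", text):
            score = max(score, weight)
    return score


class TriageQueue:
    """Hands out the most severe message first; ties go to the earliest arrival."""

    def __init__(self) -> None:
        self._heap = []
        self._arrival = itertools.count()

    def push(self, message: str) -> None:
        # heapq is a min-heap, so severity is negated to pop the highest first.
        entry = (-severity(message), next(self._arrival), message)
        heapq.heappush(self._heap, entry)

    def pop(self) -> str:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


if __name__ == "__main__":
    queue = TriageQueue()
    queue.push("I have a funny rash")
    queue.push("i feel lonely tonight")
    queue.push("honestly i want to kms")
    while len(queue):
        print(queue.pop())
```

The arrival counter matters as much as the severity score: without it, two messages of equal severity would be ordered by their text, and an earlier texter could end up waiting behind a later one.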
Speedy use of data is prized in many circles, but for civic purposes there are concerns that haste can do more harm than good. “There is an ethos of ‘move fast and break things’ in technology,” Flaxman said. “In civic technology, the things that might break are someone’s ability to vote or someone’s mental health.”
Lublin said the new grant recipients should be aware that there are no standards in the still-changing data frontier. That raises questions such as how long an organization like Crisis Text Line should retain personally identifiable information that it has collected. “It’s noncommercial use; it’s all gated, it’s all encrypted but how long should we hold it for? Five years? Seven years?” she asked. “There ought to be a set of data ethics and standards.”
Meanwhile, Facebook, Twitter, and other companies have set the bar low, May said, on such matters from the public’s perspective. “Everyone is so freaked out that our little nonprofit is going to sell their data,” she said, “and their deepest, darkest secrets are going to be everywhere.”
Pressure to sell personal data can come from many avenues, Lublin said, such as the media, foundations, and donors who see it as a way for nonprofits such as hers to be sustainable. “Pharmaceutical companies, healthcare companies, and even a couple really dickish hedge fund guys who have said, ‘I’ll pay for the data because it’ll give us an edge on things happening in the market,’” she said. “I’m sure insurance companies would love to redline based on it.”
There is an ongoing societal need, however, to support causes where there is no money to be made, Lublin said—such as afterschool programs and homeless shelters. This is why Crisis Text Line may open up its information for noncommercial use to make lives better, but refuses to sell it. “We’re not going to monetize the data,” she said.