Diffbot’s A.I. Engine Draws Global Map of Machine Learning Expertise

Xconomy San Francisco — 

A year ago, the leading Chinese Internet company Tencent Holdings pegged the global number of artificial intelligence researchers and professionals at 300,000 or fewer, just as the unmet demand for such experts was pushing salary offers to as much as $1 million. In February, the Canadian firm Element AI estimated that talent pool at no more than about 90,000, Bloomberg reported.

Now, Silicon Valley company Diffbot has used its A.I.-powered fact-mining engine to comb the global Web and make its own census of skilled people in the field.

“We found that there’s much more,” Diffbot co-founder and CEO Mike Tung says. Rather than looking for all A.I. talent, Mountain View, CA-based Diffbot searched only for people with expertise in machine learning, a much-valued specialization within A.I. In a report released this week, the company says it found 720,325 professionals with machine learning expertise, and 221,592 of them are in the United States alone. Diffbot says it’s the single largest survey of machine learning skills ever compiled.

The report demonstrates the growing potential of A.I. software to quickly amass data and analyze it, compared with traditional methods such as time-consuming surveys of limited population samples, and the extrapolation of those results to estimate the size of entire groups. Diffbot’s study may also contribute more granular evidence showing which countries are leading the way in A.I. technology development, and whether the field’s much-lamented talent shortage is as deep as hiring companies fear.

The fastest-growing job category in a 2017 study by LinkedIn was machine learning engineer. These experts design advanced software systems that can change their behavior based on “insights” from the results of their earlier actions.

Tung says Diffbot’s search method, which employs software that uses A.I. technologies such as machine learning, computer vision, and natural language processing, allowed the company to turn the Web into a sensor that used A.I. to detect professionals with advanced A.I. skills.

“Like Cerebro in ‘X-Men,’ you can find all the mutants in the world,” he says.

By sharing summary data about the thousands of experts it found, Diffbot is revealing what Tung calls a competitive advantage that his company has long held in its own hiring of A.I. experts. Diffbot has deployed its automated Web-scouring engine to find the types of job candidates who are key to its progress in the development of that core product itself.

“We’ve been using it for this reason for many years,” Tung (pictured in center above) says.

Now, by releasing an analysis of the database of machine learning talent it compiled, Diffbot is hoping to demonstrate the kind of information that clients can derive by the same means—about A.I. workers and many other topics.

Diffbot’s customers can query its “Knowledge Graph” of more than a trillion facts gleaned about 10 billion “entities,” which include people and products.

Diffbot has extended the reach of Web data capture by scanning types of items not usually tracked by search engines, such as advertisements, images, and the reader comments posted below articles. The system finds connections among the facts scooped up from these public sources—like linking a product’s description to all the prices for it found in current ad displays. Diffbot structures the facts within its searchable Knowledge Graph, which is continually updated.
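The linking idea described above can be pictured with a toy example. This is only a minimal sketch under stated assumptions, not Diffbot's implementation: the `Fact` and `ToyKnowledgeGraph` names, the entity key `widget-x`, and the sample sources and prices are all hypothetical, made up for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """One fact about an entity (all field names are hypothetical)."""
    entity: str      # identifier of the thing the fact is about
    attribute: str   # e.g. "description" or "price"
    value: str
    source: str      # where the fact was found


class ToyKnowledgeGraph:
    """Groups facts by entity so facts from different sources can be linked."""

    def __init__(self) -> None:
        self._facts = defaultdict(list)

    def add(self, fact: Fact) -> None:
        self._facts[fact.entity].append(fact)

    def query(self, entity: str, attribute: str) -> list:
        # .get avoids creating an empty entry for unknown entities
        return [f.value for f in self._facts.get(entity, [])
                if f.attribute == attribute]


graph = ToyKnowledgeGraph()
# Invented sample data: one product page and two ad displays.
graph.add(Fact("widget-x", "description", "A pocket-sized widget", "product page"))
graph.add(Fact("widget-x", "price", "$19.99", "ad display A"))
graph.add(Fact("widget-x", "price", "$17.49", "ad display B"))

prices = graph.query("widget-x", "price")
print(prices)
```

Keying every fact on a shared entity identifier is what lets a description from one page sit alongside prices found in separate ads. The harder problem of deciding that different pages refer to the same entity is not addressed by this sketch.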

Diffbot, founded in 2008, attracted customers including Cisco (NASDAQ: CSCO), Salesforce (NYSE: CRM), and Crunchbase while operating in beta mode until August, when it opened up its “knowledge-as-a-service” tool to the general public.

The August announcement sparked interest among new customers, Tung says, and Diffbot’s staff of about 30 are working to help these potential clients integrate the company’s services into their existing systems.

The company’s machine learning expertise report is Diffbot’s first major release of a study based on data in its Knowledge Graph, Tung says. Diffbot may produce more such reports if people find them interesting and useful, he says.

The global machine learning expertise map

Tung says Diffbot found a significantly greater number of A.I. experts than Element AI or Tencent (which is one of Diffbot’s investors) because its Web-crawling engine casts a much wider net, and works in multiple languages. Montreal-based Element AI had relied on LinkedIn profiles to estimate the number of A.I. professionals, according to Bloomberg.

To find people who identified themselves as skilled in machine learning techniques, Diffbot’s algorithms scanned an array of document types, including publicly posted résumés, curricula vitae, personal Web pages, company staff biographies, university faculty directories, news articles, scholarly publications, papers found through searches of Google Scholar, and professional sites such as GitHub.

Tung says top academics and machine learning experts at companies are more likely to be found through these sources than through LinkedIn, where new graduates and jobseekers commonly create profiles.

Diffbot’s count includes a wide variety of professionals, including those without PhDs. It encompasses both top engineers who can build entire machine learning systems and practitioners who can write code. Diffbot’s Web-crawling engine also picks up on terms other than “machine learning” that indicate a person is involved in the field, including “neural networks” and “TensorFlow.”

On the other hand, it would not award a place on the experts’ list to the drummer for a band dubbed “Machine Learning,” Tung says. Because of the search engine’s natural language processing capabilities, it can identify the sense in which a term is being used, he says.

Companies using the database as a recruiting resource would, of course, have to verify the expertise of a candidate through traditional means such as interviews and testing, Tung says.

Once the machine learning experts were compiled as a cohort, Diffbot’s group data could be sorted by various factors, such as national origin, place of current employment, gender, education, and professional background, Tung says.

For example, Diffbot found that women make up a bit more than 24 percent of U.S. machine learning experts—a gender diversity score that was lower than that in China and four other countries. More than 51,000 women are employed as machine learning professionals in the United States, Diffbot found. Tung says this identified talent pool could be a hiring resource for companies trying to correct a gender imbalance caused by institutional bias.

With its 221,592 experts, the United States employs 30.8 percent of the global talent pool in machine learning, followed distantly by India, where 59,980 are employed, Diffbot found. Ranked next are the United Kingdom, Canada, China, and France. If California were a country, it would rank above India: the state employs 74,791 machine learning professionals, more than New York, Texas, and Massachusetts combined.
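The share figure above follows directly from the counts quoted in this article. As a quick check, here is a minimal sketch of that arithmetic; the constant and function names are illustrative choices, not anything taken from Diffbot's report.

```python
# Counts as quoted in the article from Diffbot's report.
GLOBAL_TOTAL = 720_325
US_TOTAL = 221_592
INDIA_TOTAL = 59_980
CALIFORNIA_TOTAL = 74_791


def share_percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole, rounded to one decimal place."""
    return round(100 * part / whole, 1)


us_share = share_percent(US_TOTAL, GLOBAL_TOTAL)
california_outranks_india = CALIFORNIA_TOTAL > INDIA_TOTAL

print(f"U.S. share of the global pool: {us_share}%")  # prints 30.8%
print(f"California count exceeds India's: {california_outranks_india}")
```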

In U.S. hiring, Google and Microsoft led the pack, with more than 4,000 machine learning experts each. That’s roughly four times Apple’s count of 1,064, Diffbot reported. Tung says that financial analysts who track the market for technologies involving machine learning might use Diffbot’s engine in a search for correlations between expert staff strength and the performance of new products.

The Diffbot data may also provide some insights to add to the public discussion of a possible “A.I. war” between the United States and Asia, Tung says. The company’s report found that five of the top ten universities producing global talent in machine learning are in China. But the employment pattern suggests a “brain drain” from China. Among graduates from those Chinese university programs, more than 62 percent work in the United States, Diffbot found.

“In the study, most of U.S. A.I. research is being carried out by Chinese or Indian nationals,” Tung says. “A.I. development in the U.S. is very co-dependent on Asia.”

Tung raises a caveat to Diffbot’s findings about staff strength in China, however, because A.I. experts in China are less likely than U.S. professionals to create online materials that Diffbot can scan, such as their own Web pages.

Diffbot’s report doesn’t resolve a burning question about the A.I. job market: How big is the shortfall between the number of trained professionals and the number of open jobs? That widely held perception of a significant candidate shortage has helped drive up compensation for A.I. experts.

Diffbot doesn’t yet capture structured data about jobs and job postings, but it plans to, Tung says.

“It’s on our roadmap,” Tung says.

Photo courtesy of Diffbot