Big data: GDELT (Global Database of Events, Language, and Tone)

sabre

Data: Querying, Analyzing and Downloading: The GDELT Project

The GDELT Project

www.gdeltproject.org

A disclaimer first: I only came across this big data research project this morning.
My first impression is good.

Querying, Analyzing and Downloading

The entire GDELT database is 100% free and open and you can
download the raw datafiles, visualize it using the
GDELT Analysis Service, or analyze it at limitless scale with Google BigQuery.

The GDELT Project is the largest, most comprehensive, and highest resolution open database of human society ever created. Just the 2015 data alone records nearly three quarters of a trillion emotional snapshots and more than 1.5 billion location references, while its total archives span more than 215 years, making it one of the largest open-access spatio-temporal datasets in existence and pushing the boundaries of "big data" study of global human society. Its Global Knowledge Graph connects the world's people, organizations, locations, themes, counts, images and emotions into a single holistic network over the entire planet. How can you query, explore, model, visualize, interact, and even forecast this vast archive of human society?

Born Out Of The 2014 Ebola Epidemic, GDELT's Mass Translation Infrastructure Helped BlueDot Identify 2019's Coronavirus

February 5, 2020

On March 13, 2014, GDELT's global monitoring infrastructure detected the first local reports of what would go on to become the Ebola epidemic of 2014-2016. Unfortunately, they were in French and GDELT's English-only processing at the time never sent an alert. At the time, GDELT monitored global news media in more than 100 languages (which has grown today to more than 150), but the immense computational demands of high quality robust machine translation at the scale of even a fraction of the totality of global news output each day was beyond tractability of the day. Experimental work supported by Google Translate at the time reinforced just how much of global events and narratives were missing from the world's English language press and that to truly understand the world, GDELT must find a way to machine translate everything it monitored around the globe in as many languages as possible.

The end result later that year was GDELT Translingual, a pioneering infrastructure that first introduced the world to the concept of at-scale mass machine translation of the news, a model which GDELT has helped bring to countless industries in the years since. Powered by Translingual's global infrastructure, GDELT today translates absolutely everything it monitors globally in 65 languages, allowing it to surface the most nuanced narratives and subtle indicators about the least expected events.

Fast forward to this past December when the Chinese Coronavirus first emerged and this vision of mass machine translation of the planet made it possible in December 2019 for BlueDot Global to use its machine learning algorithms to identify the earliest reports of the Chinese Coronavirus from GDELT's feeds when it was still just a handful of cases of "viral pneumonia … of unknown cause", with BlueDot sending out an alert nearly a full week before the CDC's and WHO's official warnings.

GDELT's mass machine translation initiative was born out of its inability in 2014 of its event and knowledge graph systems to identify the first French-language domestic reports of the Ebola outbreak that it had monitored due to its inability to process content beyond English.

Fast forward to this past December and the ability of GDELT's immense translation infrastructure that resulted from that outbreak allowed BlueDot Global's machine learning algorithms to flag and send out alerts of the Chinese Coronavirus outbreak almost a week before official sources.

That's a pretty incredible outcome.

sabre

An AI Epidemiologist Sent the First Alerts of the Coronavirus

The BlueDot algorithm scours news reports and airline ticketing data to predict the spread of diseases like those linked to the flu outbreak in China.

www.wired.com

An AI Epidemiologist Sent the First Warnings of the Wuhan Virus
The BlueDot algorithm scours news reports and airline ticketing data to predict the spread of diseases like those linked to the flu outbreak in China.

On January 9, the World Health Organization notified the public of a flu-like outbreak in China: a cluster of pneumonia cases had been reported in Wuhan, possibly from vendors’ exposure to live animals at the Huanan Seafood Market. The US Centers for Disease Control and Prevention had gotten the word out a few days earlier, on January 6. But a Canadian health monitoring platform had beaten them both to the punch, sending word of the outbreak to its customers on December 31.

BlueDot uses an AI-driven algorithm that scours foreign-language news reports, animal and plant disease networks, and official proclamations to give its clients advance warning to avoid danger zones like Wuhan.

Speed matters during an outbreak, and tight-lipped Chinese officials do not have a good track record of sharing information about diseases, air pollution, or natural disasters. But public health officials at WHO and the CDC have to rely on these very same health officials for their own disease monitoring. So maybe an AI can get there faster. “We know that governments may not be relied upon to provide information in a timely fashion,” says Kamran Khan, BlueDot’s founder and CEO. “We can pick up news of possible outbreaks, little murmurs or forums or blogs of indications of some kind of unusual events going on.”

卡城西北 · Last edited: 2020-02-26

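How much more will people ten thousand years from now know about us than we know about society ten thousand years ago?
The answer seems obvious, but it may not be.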

sabre

I learned about this project from somewhere else.
A lot of people use its data for analysis and prediction.

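One interesting analysis looks at how positively or negatively each country's media cover domestic and foreign events. British media score China very highly. American media score both China and the United States very low. Chinese media score China high and every other country low. A couple of days ago I saw an article saying that Italy and South Korea had collapsed, which bears this out. So this big data is fairly interesting after all.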
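A minimal sketch of that kind of comparison, assuming each record is a (source country, target country, tone) triple. The records below are invented for illustration and are not GDELT data, and `mean_tone_by_pair` is a hypothetical helper name, not part of any GDELT tool.

```python
from collections import defaultdict

# Hypothetical records: (source_country, target_country, tone).
# The values are made up for illustration and are not GDELT data.
records = [
    ("UK", "China", 2.0),
    ("UK", "China", 4.0),
    ("US", "China", -3.0),
    ("US", "US", -1.0),
    ("US", "US", -2.0),
]


def mean_tone_by_pair(rows):
    """Average the tone score for each (source, target) country pair."""
    totals = defaultdict(lambda: [0.0, 0])  # pair -> [sum of tone, count]
    for source, target, tone in rows:
        entry = totals[(source, target)]
        entry[0] += tone
        entry[1] += 1
    return {pair: total / count for pair, (total, count) in totals.items()}


averages = mean_tone_by_pair(records)
for (source, target), tone in sorted(averages.items()):
    print(f"{source} media on {target}: {tone:+.2f}")
```

With real data the same grouping would run over far more rows, and a simple mean hides how much the scores vary within each pair, so treat it only as a starting point.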

卡城西北 · Last edited: 2020-02-26

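Even today we can read far too many texts in which the pre-Qin masters (诸子) disparage one another, so many that they have become our main source for some of those masters' ideas. That is a form of interference, and it damages our picture of what really happened.
Technology today is far more advanced than it was 10,000 years ago, and images and sound can be preserved. Set aside the fact that digital storage media also fail, and assume that people 10,000 years from now can read all of it. The problem is that 99% of that mass of information is meaningless: internet jokes, funny videos, netizens' nonsense, and so on. Will that flood of meaningless information overload the information-processing capacity of people 10,000 years from now?
Also, we can still obtain DNA from people of the Spring and Autumn period, because everyone was buried back then. With a growing share of people now being cremated, archaeologists 10,000 years from now probably will not find our DNA.
Problems like these make me think that, apart from our voices and appearances, people 10,000 years from now will not know more about us than we know about people 10,000 years ago.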

sabre

Civilizations are like viruses: the ones with vitality survive.
Some people call this social Darwinism.
