Abstract
With the explosion of Internet bandwidth, more and more social media sites (e.g., Flickr, YouTube, Facebook, and Google News) let people capture and share social media data online. As a result, a popular event happening around us or elsewhere in the world can spread very fast, and the Internet now holds a substantial number of events described in multiple modalities (e.g., images, videos, and text). Most of these social events, reported by different news media, are related to specific topics, and identifying or clustering them manually is time-consuming. Cross-collection social event analysis can discover collective and subjective information from the vast number of cross-collection sources in social news media, and the mining results can help many applications, such as social event detection, social event tracking, and social event prediction.
Dataset
To facilitate more research on social event analysis, we introduce a cross-collection social event dataset created by the Institute of Automation, Chinese Academy of Sciences. The evaluation dataset is constructed from online social news media sources. These websites are all in English and cover a long period, with rich textual metadata and image metadata about the hot social event “Arab Spring”. We collected information on eleven countries: Algeria, Bahrain, Egypt, Iraq, Jordan, Lebanon, Libya, Saudi Arabia, Syria, Tunisia, and Yemen. Note that the data are provided as MySQL data tables.
Data Collection
To capture hot-topic information from newspaper documents, we crawled news published on the websites of the New York Times, Sputnik, and Hurriyet Daily News, which are important news outlets in the U.S., Russia, and Turkey, respectively. In total, we collected 40,532 news documents from March 2011 to December 2015. The rich textual metadata and image metadata were then captured via the sites' APIs.
The basic statistics of our dataset are presented in the table below:
Source | Algeria | Bahrain | Egypt | Iraq | Jordan | Lebanon | Libya | Saudi | Syria | Tunisia | Yemen |
---|---|---|---|---|---|---|---|---|---|---|---|
New York Times | 127 | 324 | 2080 | 1696 | 381 | 265 | 1515 | 592 | 3557 | 342 | 766 |
Sputnik | 82 | 181 | 1506 | 2167 | 253 | 272 | 1380 | 1505 | 9330 | 224 | 1195 |
Hurriyet Daily News | 78 | 143 | 1629 | 1925 | 157 | 277 | 744 | 485 | 5738 | 278 | 415 |
Total | 287 | 648 | 5215 | 4831 | 791 | 694 | 3639 | 2582 | 18625 | 844 | 2376 |
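To get a quick sense of how the documents are distributed across countries, the per-country totals can be turned into shares of the whole collection. The following is a minimal sketch in Python, not part of the dataset release: the numbers are copied from the Total row of the table, and the names `totals` and `country_shares` are illustrative assumptions.

```python
# Per-country document counts, copied from the "Total" row of the table.
totals = {
    "Algeria": 287, "Bahrain": 648, "Egypt": 5215, "Iraq": 4831,
    "Jordan": 791, "Lebanon": 694, "Libya": 3639, "Saudi": 2582,
    "Syria": 18625, "Tunisia": 844, "Yemen": 2376,
}


def country_shares(counts):
    """Return each country's fraction of all documents in `counts`."""
    grand_total = sum(counts.values())
    return {country: n / grand_total for country, n in counts.items()}


grand_total = sum(totals.values())
shares = country_shares(totals)

# List countries from the largest share to the smallest.
for country, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{country:8s} {totals[country]:6d} {share:6.1%}")
print("all documents:", grand_total)
```

The per-country totals add up to the 40,532 documents reported for the collection, and Syria alone accounts for a little under half of them, so analyses that pool all countries will be dominated by Syria-related news unless the counts are balanced or weighted.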
Downloads
News media documents on the hot social event “Arab Spring” in eleven countries
Citation
Multi-modal Multi-view Topic-opinion Mining for Social Event Analysis