In today's social media landscape, massive amounts of web data are distributed across different OSNs (Online Social Networks). These data from different sources share a common base of overlapped users, i.e., individuals who simultaneously participate in different OSNs, both generating and consuming data. Analyzing cross-OSN data through overlapped users provides an important way to connect and exploit otherwise isolated social media data islands. To advance research on this topic, CrossOSN-U is released with overlapped users' behavioral and social relation data on different OSNs (e.g., Google+, YouTube, Twitter, Flickr, Instagram, Tumblr).
The CrossOSN-U dataset is constructed in two steps: (1) Obtain the userIDs of the same individual (overlapped user) on different OSNs. Third-party social media aggregation tools such as About.me, and social network sites such as Google+, encourage users to disclose their userIDs on other OSNs; we collect the overlapped users' cross-OSN userIDs from these disclosures. (2) Use the APIs of the corresponding OSNs to crawl the data available for each userID. The current CrossOSN-U consists of several sub-datasets, enabling the exploration of overlapped users' cross-OSN data from different perspectives and for different applications.
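Step (1) amounts to grouping the userIDs that one individual disclosed on an aggregator profile and keeping only individuals with accounts on at least two OSNs. The sketch below illustrates that linking step under a hypothetical input format (`{profile_id: {osn_name: user_id}}`); the function names and OSN keys are illustrative, and the crawling in step (2) is omitted because it depends on each OSN's own API.

```python
from collections import defaultdict
from typing import Dict

# Hypothetical input format: each aggregator profile (e.g., an About.me page)
# maps OSN names to the userID the individual disclosed there.
Profiles = Dict[str, Dict[str, str]]


def link_overlapped_users(profiles: Profiles, min_osns: int = 2) -> Profiles:
    """Keep individuals who disclosed accounts on at least `min_osns` OSNs."""
    linked: Profiles = {}
    for profile_id, accounts in profiles.items():
        # Ignore empty entries so only actually disclosed accounts are counted.
        disclosed = {osn: uid for osn, uid in accounts.items() if uid}
        if len(disclosed) >= min_osns:
            linked[profile_id] = disclosed
    return linked


def users_per_osn(linked: Profiles) -> Dict[str, int]:
    """Count how many linked individuals have an account on each OSN."""
    counts: Dict[str, int] = defaultdict(int)
    for accounts in linked.values():
        for osn in accounts:
            counts[osn] += 1
    return dict(counts)


if __name__ == "__main__":
    profiles = {
        "p1": {"twitter": "alice_t", "youtube": "aliceYT"},
        "p2": {"twitter": "bob_t"},
        "p3": {"youtube": "carolYT", "googleplus": "1234", "twitter": ""},
    }
    linked = link_overlapped_users(profiles)
    print(sorted(linked))
    print(users_per_osn(linked))
```

Here `p2` is dropped because only one account was disclosed, and the empty Twitter entry of `p3` is not counted as an account.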
This sub-dataset consists of users' heterogeneous behavioral data (i.e., interactions with objects of different modalities) on Twitter and YouTube. Specifically, it contains user profiles and historical video behaviors on YouTube, and user profiles, social relations, and historical tweeting data on Twitter. Metadata for all involved YouTube videos are also included.
Overlapped users' heterogeneous data on Twitter and YouTube.
Note that the original tweets are not released due to Twitter's data policy. Instead, the tweeting data are provided as users' topic distributions (modeled by LDA over the 39,659 Twitter users). The basic statistics of the dataset are presented in the table below:
#YouTube users | #Twitter users | #Overlapped users | #Videos | #Average videos per YouTube user | #Average friends per Twitter user |
---|---|---|---|---|---|
38,377 | 39,659 | 11,687 | 2,280,129 | 93.60 | 891.1 |
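Since tweets are released only as per-user topic distributions, a sketch of how such a distribution can be estimated may help when interpreting the released vectors. It assumes each user's tweets were pooled into a single LDA document and fitted with collapsed Gibbs sampling, where the standard estimate is theta_k = (n_k + alpha) / (n + K * alpha). The number of topics, the prior `alpha`, and the inference method actually used for CrossOSN-U are not stated here; check readme.pdf before relying on any of them.

```python
from collections import Counter
from typing import List, Sequence


def user_topic_distribution(
    topic_assignments: Sequence[int], num_topics: int, alpha: float = 0.1
) -> List[float]:
    """Estimate theta for one user whose tweets are pooled into one LDA document.

    Assumption: topic_assignments holds the sampled topic index of every token
    in the user's pooled tweets (as produced by collapsed Gibbs sampling).
    The estimate theta_k = (n_k + alpha) / (n + K * alpha) always sums to 1.
    """
    if num_topics <= 0 or alpha <= 0:
        raise ValueError("num_topics and alpha must be positive")
    if any(not 0 <= t < num_topics for t in topic_assignments):
        raise ValueError("topic index out of range")
    counts = Counter(topic_assignments)  # n_k: tokens assigned to topic k
    denom = len(topic_assignments) + num_topics * alpha  # n + K * alpha
    return [(counts[k] + alpha) / denom for k in range(num_topics)]


if __name__ == "__main__":
    theta = user_topic_distribution([0, 0, 1, 2, 0], num_topics=3)
    print([round(p, 3) for p in theta])
```

The smoothing prior keeps every topic's probability above zero, so two users' distributions can be compared with divergence measures without division-by-zero issues.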
Detailed description of the data format is available at: readme.pdf
Please cite the following papers if this dataset helps your research:
In addition to cross-OSN heterogeneous behaviors, there also exist cross-OSN homogeneous behaviors, in which the interacted objects share the same modality. Even when they involve objects of the same modality, homogeneous behaviors on different OSNs can carry significantly different meanings; this is one important difference between cross-OSN computing and cross-media computing. This sub-dataset consists of overlapped users' homogeneous behaviors regarding videos on YouTube and Google+.
Overlapped users' homogeneous behaviors regarding videos on Google+ and YouTube.
In particular, we collect the overlapped users' video-related behaviors on YouTube (uploading, adding to playlists, favoriting, rating, and commenting) and on Google+ (sharing and commenting). All videos come from a single YouTube video pool. The basic statistics of the dataset are presented in the table below:
#YouTube users | #Google+ users | #Overlapped users | #Videos |
---|---|---|---|
9,560 | 9,728 | 8,492 | 1,620,404 |
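Because the videos behind both OSNs come from one YouTube video pool, a user's YouTube and Google+ behaviors can be compared directly by video ID. The sketch below shows one simple comparison, assuming a hypothetical `(user_id, osn, behavior, video_id)` record layout; Jaccard overlap is an illustrative choice and not necessarily the measure used in the associated papers.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical record layout: (user_id, osn, behavior, video_id), e.g.
# ("u1", "youtube", "favorite", "v42") or ("u1", "googleplus", "share", "v42").
Record = Tuple[str, str, str, str]
Grouped = Dict[str, Dict[str, Set[str]]]


def videos_by_user_and_osn(records: Iterable[Record]) -> Grouped:
    """Group the videos each user interacted with, per OSN, ignoring behavior type."""
    grouped: Grouped = defaultdict(lambda: defaultdict(set))
    for user, osn, _behavior, video in records:
        grouped[user][osn].add(video)
    return grouped


def cross_osn_jaccard(
    grouped: Grouped, user: str, osn_a: str = "youtube", osn_b: str = "googleplus"
) -> float:
    """Jaccard overlap of one user's video sets on two OSNs (0.0 if both are empty)."""
    a = grouped.get(user, {}).get(osn_a, set())
    b = grouped.get(user, {}).get(osn_b, set())
    union = a | b
    return len(a & b) / len(union) if union else 0.0


if __name__ == "__main__":
    records = [
        ("u1", "youtube", "upload", "v1"),
        ("u1", "youtube", "favorite", "v2"),
        ("u1", "googleplus", "share", "v2"),
        ("u1", "googleplus", "comment", "v3"),
    ]
    grouped = videos_by_user_and_osn(records)
    print(cross_osn_jaccard(grouped, "u1"))
```

A low overlap for a user who is active on both OSNs is one way the differing meanings of homogeneous behaviors can surface in the data.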
The following paper provides example research based on this sub-dataset: quantifying the significance of cross-OSN homogeneous behaviors in reflecting user interests.
#Overlapped users | #Average followers per Twitter user | #Average friends per Twitter user | #Average friends per Flickr user |
---|---|---|---|
7,118 | 1,808 | 1,032 | 101 |
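The table distinguishes followers from friends. In Twitter's terminology, a user's friends are the accounts they follow and their followers are the accounts following them. A minimal sketch of how such averages can be computed from a directed edge list follows; the `(follower, followee)` edge format is an assumption, and treating Flickr friends the same way (as out-edges) is also an assumption to verify against readme.pdf.

```python
from typing import Iterable, Sequence, Tuple


def average_followers_and_friends(
    edges: Iterable[Tuple[str, str]], users: Sequence[str]
) -> Tuple[float, float]:
    """Return (average followers, average friends) over `users`.

    Assumption: edges are directed (follower, followee) pairs. Friends are
    out-edges (accounts the user follows); followers are in-edges.
    Duplicate edges are counted once; edges to non-listed accounts still count.
    """
    if not users:
        raise ValueError("users must be non-empty")
    targets = set(users)
    followers = {u: 0 for u in targets}
    friends = {u: 0 for u in targets}
    for src, dst in set(edges):
        if src in targets:
            friends[src] += 1
        if dst in targets:
            followers[dst] += 1
    n = len(targets)
    return sum(followers.values()) / n, sum(friends.values()) / n


if __name__ == "__main__":
    edges = [("a", "b"), ("c", "a"), ("d", "a"), ("b", "c"), ("c", "a")]
    print(average_followers_and_friends(edges, ["a", "b"]))
```

Because follower counts are in-degrees, they are not bounded by the number of overlapped users, which is why the average followers can far exceed the average friends.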
Detailed description of the data format is available at: readme.pdf
More details about the dataset and relevant analysis can be found at:
This sub-dataset is constructed from overlapped users' behaviors around common events on Twitter and YouTube.
Twenty Google trending events from 2012, each with wide coverage on both Twitter and YouTube, are selected. The events are listed as follows:
We identified the overlapped users who were involved in at least one of the selected events. For each event, the numbers of users involved in one or both OSNs are summarized below:
Overlapped users involved in different events on Twitter and YouTube.
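The per-event summary separates users involved on only one OSN from those involved on both. A minimal sketch of that count follows, assuming a hypothetical list of `(user_id, osn, event)` involvement triples; how an individual tweet or video behavior is matched to an event is not specified here and is left outside the sketch.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def summarize_event_involvement(
    involvements: Iterable[Tuple[str, str, str]],
    osn_a: str = "twitter",
    osn_b: str = "youtube",
) -> Dict[str, Dict[str, int]]:
    """Per event, count users involved only on osn_a, only on osn_b, or on both."""
    # event -> user -> set of OSNs on which the user was involved in that event
    seen: Dict[str, Dict[str, Set[str]]] = defaultdict(lambda: defaultdict(set))
    for user, osn, event in involvements:
        seen[event][user].add(osn)

    summary: Dict[str, Dict[str, int]] = {}
    for event, users in seen.items():
        counts = {"only_" + osn_a: 0, "only_" + osn_b: 0, "both": 0}
        for osns in users.values():
            in_a, in_b = osn_a in osns, osn_b in osns
            if in_a and in_b:
                counts["both"] += 1
            elif in_a:
                counts["only_" + osn_a] += 1
            elif in_b:
                counts["only_" + osn_b] += 1
        summary[event] = counts
    return summary


if __name__ == "__main__":
    data = [
        ("u1", "twitter", "e1"),
        ("u1", "youtube", "e1"),
        ("u2", "twitter", "e1"),
        ("u3", "youtube", "e2"),
        ("u1", "twitter", "e1"),
    ]
    for event, counts in sorted(summarize_event_involvement(data).items()):
        print(event, counts)
```

Repeated involvements of the same user in the same event are collapsed into one set entry, so each user is counted at most once per event.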
In total, 8,540 overlapped users between Twitter and YouTube are examined. This sub-dataset contains these users' historical video behaviors on YouTube and their historical tweeting behaviors on Twitter. Textual metadata for the involved YouTube videos are also included. The basic statistics of the dataset are presented in the table below:
#Events | #Overlapped users | #Average video behaviors per YouTube user | #Average tweets per Twitter user |
---|---|---|---|
20 | 8,540 | 82.7 | 998 |
Detailed description of the data format is available at: readme.pdf
The following papers provide example research based on this sub-dataset: examining overlapped users' responses to the same events in different OSNs.