Mining Twitter data with resource constraints

George Valkanas, Ioannis Katakis, Dimitrios Gunopulos, Antony Stefanidis

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Social media analysis is a scientific field that is rapidly gaining ground due to its numerous research challenges and practical applications, as well as the unprecedented availability of data in real time. Several of these applications, in areas such as journalism, crisis management, and advertising, have significant social and economic impact. However, two issues regarding these applications have to be confronted. The first is the financial cost. Despite the abundance of information, it typically comes at a premium price, and only a fraction is provided free of charge. For example, Twitter, a predominant online social media service, grants researchers and practitioners free access to only a small proportion (1%) of its publicly available stream. The second issue is the computational cost. Even when the full stream is available, off-the-shelf approaches are unable to operate in such settings due to the real-time computational demands. Consequently, real-world applications as well as research efforts that exploit such information are limited to utilizing only a subset of the available data. In this paper, we are interested in evaluating the extent to which analytical processes are affected by this limitation. In particular, we apply a wide range of analysis processes to two subsets of Twitter public data, obtained through the service's sampling APIs. The first is the default 1% sample, whereas the second is the Gardenhose sample that our research group has access to, which returns 10% of all public data. We extensively evaluate their relative performance in numerous scenarios.

Original language: English
Title of host publication: Proceedings - 2014 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology - Workshops, WI-IAT 2014
Editors: Dominik Slezak, Hung Son Nguyen, Eugene Santos, Marek Reformat
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 103-109
Number of pages: 7
Volume: 1
ISBN (Electronic): 9781479941438
Publication status: Published - 16 Oct 2014
Externally published: Yes
Event: 2014 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology - Workshops, WI-IAT 2014 - Warsaw, Poland
Duration: 11 Aug 2014 → 14 Aug 2014

Conference

Conference: 2014 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology - Workshops, WI-IAT 2014
Country/Territory: Poland
City: Warsaw
Period: 11/08/14 → 14/08/14
