2017 guideline to get access to the data

2017 Data and Topics release for Content Analysis and Microblog Search

by Eric SanJuan

The MC2@CLEF2017 lab has released a collection of 70 000 000 microblogs over 18 months dealing with cultural events.
Microblogs are in all languages.


This collection is used to test multilingual content analysis and microblog search.

For content analysis, topics are in any language and results are expected in four languages: English, Spanish, French and Portuguese.

For microblog search, topics are in four languages (Arabic, English, French and Spanish) and results are expected in any language.

Due to legal restrictions, registration at CLEF 2017 is mandatory to access the topics, data and APIs powered by Indri.

Up to five personal logins are provided to each team registered for at least one of the MC2 tasks.

About the data:

The MC2 CLEF 2017 lab deals with how the cultural context of a microblog affects its social impact at large. This involves microblog search, classification, filtering, language recognition, localization, entity extraction, linking open data and summarization. Regular Lab participants have access to the private massive multilingual microblog stream of The festival galleries project. Festivals have a large presence on social media. The resulting microblog stream and related URLs are well suited to experimenting with advanced social media search and mining methods.

About Content analysis:

Given a stream of microblogs, the task consists in:

  • filtering microblogs dealing with festivals;
  • language(s) identification;
  • event localization;
  • author categorization (official account, participant, follower or scam);
  • Wikipedia entity recognition and translation in four target languages: English, Spanish, Portuguese and French;
  • automatic summarization of linked Wikipedia pages in the four target languages.

Each item will be evaluated independently. However, language identification could impact Wikipedia linking and the resulting summaries.

About Microblog Search:

Given a cultural query about festivals in Arabic, English, French or Spanish, search for the 64 most relevant microblogs in a collection covering 18 months of news about festivals in all languages.

About evaluation:

Extensive textual references will be provided by organizers.
Runs will be ranked based on Discounted Cumulative Gain and Informativeness, following Patrice Bellot, Véronique Moriceau, Josiane Mothe, Eric SanJuan, Xavier Tannier: INEX Tweet Contextualization task: Evaluation, results and lesson learned. Inf. Process. Manage. 52(5): 801-819 (2016).
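
For orientation, Discounted Cumulative Gain over a ranked list can be sketched as below. This is only an illustration under stated assumptions: it uses the common log2 rank discount and a cutoff of 64 (matching the number of microblogs requested per query), and the function name `dcg_at_k` and the graded gain values are hypothetical. The exact variant and relevance scale applied by the organizers may differ.

```python
import math


def dcg_at_k(gains, k=64):
    """Sum graded relevance gains over the top-k ranks with a log2 discount.

    gains: relevance gains of the returned microblogs, in ranked order
           (hypothetical values; the lab's relevance scale is not assumed).
    k:     rank cutoff; 64 is assumed here from the search task description.
    """
    return sum(
        gain / math.log2(rank + 1)
        for rank, gain in enumerate(gains[:k], start=1)
    )


if __name__ == "__main__":
    # A run whose 1st and 3rd results are relevant (gain 1), 2nd is not.
    print(dcg_at_k([1, 0, 1]))
```

With this discount, a relevant microblog at rank 1 contributes its full gain, while the same microblog at rank 3 contributes half of it, so runs that place relevant results early score higher.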

About CLEF 2017 Lab Schedule:

  • Registration closes: 21 April 2017
  • End Evaluation Cycle: 5 May 2017
  • Submission of Participant Papers [CEUR-WS]: 26 May 2017
  • Camera Ready Copy of Participant Papers and Extended Lab Overview [CEUR-WS] due: 3 July 2017
  • Conference: 11-14 September 2017

Contact for MC2@CLEF2017 Lab: admin@talne.eu