2 - MicroBlog Search

Organizers: University of Avignon, Derby and London Universities


Given a cultural query about festivals in Arabic, English, French or Spanish, search for the 64 most relevant microblogs in a collection covering 18 months of news about festivals in all languages.


Arabic and English queries are extracted from the Arab Spring Microblog corpus:
Features Extraction To Improve Comparable Tweet Corpora Building by Malek Hajjem and Chiraz Latiri (JADT 2016).

French queries are extracted from the VodKaster Micro Film Reviews:
Contextualisation de messages courts : l’importance des métadonnées by Jean-Valère Cossu, Julien Gaillard, Juan-Manuel Torres-Moreno and Marc El Bèze.

Spanish queries are sentences from the Mexican newspaper La Jornada.


A login is required to access the data. Once registered on CLEF, each registered team can obtain up to 4 extra individual logins by writing to admin@talne.eu.


Each individual participant can submit at most three runs, that is, up to 15 runs for a team with five logins. Submissions will be uploaded to a MySQL server through a web interface.

The expected format for each language is one table per run with five fields:

  1. topic id
  2. microblog rank between 1 and 64
  3. microblog id
  4. microblog author
  5. microblog content

There is an extra constraint: an author should not appear more than 8 times in a topic.
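
Before uploading, a run can be checked locally against the rank range and the per-author limit described above. The following is only a minimal sketch: the function name validate_run and the representation of a run as a list of five-element tuples are assumptions made for illustration, not part of the official submission interface.

```python
from collections import Counter

MAX_RANK = 64        # ranks must lie between 1 and 64
MAX_PER_AUTHOR = 8   # an author may appear at most 8 times in a topic


def validate_run(rows):
    """Return a list of problems found in a run (empty if none).

    `rows` is an iterable of 5-tuples, in the field order listed above:
    (topic_id, rank, microblog_id, author, content).
    This in-memory layout is a hypothetical stand-in for the run table.
    """
    problems = []
    author_counts = Counter()

    for topic_id, rank, _microblog_id, author, _content in rows:
        # Check the rank range stated in the field list.
        if not 1 <= rank <= MAX_RANK:
            problems.append(f"topic {topic_id}: rank {rank} outside 1..{MAX_RANK}")
        # Count how often each author appears within each topic.
        author_counts[(topic_id, author)] += 1

    # Check the extra constraint on authors per topic.
    for (topic_id, author), count in sorted(author_counts.items()):
        if count > MAX_PER_AUTHOR:
            problems.append(
                f"topic {topic_id}: author {author} appears {count} times "
                f"(limit {MAX_PER_AUTHOR})"
            )
    return problems
```

An empty list means that neither constraint is violated; any other result names the topic and the offending rank or author.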

Articles in this section

  • Microblog Data Set

    by Eric SanJuan

    The document collection provided by the GAFES project consists of a pool of more than 70M unique microblogs from different sources, with their meta-information and expanded URLs, on a MySQL server. Due to legal terms, access to this database is restricted to registered participants under privacy (...)