2 - MicroBlog Search

Organizers: University of Avignon, Derby and London Universities

Synopsis

Given a cultural query about festivals in Arabic, English, French or Spanish, search for the 64 most relevant microblogs in a collection covering 18 months of news about festivals in all languages.

Topics

Arabic and English queries are extracted from the Arab Spring Microblog corpus:
Features Extraction To Improve Comparable Tweet Corpora Building by Malek Hajjem and Chiraz Latiri (JADT 2016).

French queries are extracted from the VodKaster Micro Film Reviews:
Contextualisation de messages courts : l’importance des métadonnées by Jean-Valère Cossu, Julien Gaillard, Juan-Manuel Torres-Moreno and Marc El Bèze.

Spanish queries are sentences from the Mexican newspaper La jornada.

Data

A login is required to access the data. Once registered on CLEF, each registered team can obtain up to 4 extra individual logins by writing to admin@talne.eu.

Submission

Each individual participant can submit at most three runs, so up to 15 runs per team. Submissions will be uploaded to a MySQL server through a web interface.

The expected format for each language is one table per run with five fields:

  1. topic id
  2. microblog rank between 1 and 64
  3. microblog id
  4. microblog author
  5. microblog content

There is an extra constraint: an author should not appear more than 8 times in a topic.
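The format rules above can be checked locally before uploading. The following is a minimal sketch in Python, not an official validator: the names `Row` and `validate_run` are hypothetical, rows are assumed to be already loaded in memory, and the check that a rank is not reused within a topic is an assumption that the task description does not state.

```python
from collections import Counter
from typing import NamedTuple

MAX_RANK = 64        # ranks run from 1 to 64 within a topic
MAX_PER_AUTHOR = 8   # an author may appear at most 8 times in a topic


class Row(NamedTuple):
    """One line of a run table, with the five expected fields (hypothetical type)."""
    topic_id: str
    rank: int
    microblog_id: str
    author: str
    content: str


def validate_run(rows):
    """Return a list of human-readable problems found in one run."""
    problems = []
    ranks_seen = set()
    per_author = Counter()
    for row in rows:
        if not 1 <= row.rank <= MAX_RANK:
            problems.append(
                f"topic {row.topic_id}: rank {row.rank} outside 1..{MAX_RANK}"
            )
        # Assumption: a rank should be used only once per topic.
        if (row.topic_id, row.rank) in ranks_seen:
            problems.append(f"topic {row.topic_id}: rank {row.rank} used twice")
        ranks_seen.add((row.topic_id, row.rank))
        per_author[(row.topic_id, row.author)] += 1
    for (topic_id, author), count in per_author.items():
        if count > MAX_PER_AUTHOR:
            problems.append(
                f"topic {topic_id}: author {author} appears {count} times "
                f"(max {MAX_PER_AUTHOR})"
            )
    return problems
```

An empty list means the run satisfies the rank range and the per-author limit; any other result lists the offending topics so they can be fixed before submission.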


News

  • CLEF 2017 Microblog Cultural Contextualization overviews in Dublin

    Labs 4, 13:45-15:45, CMC, room 5039

    Content analysis and Microblog Search:

    1. Detailed overview
    2. Participant presentations
    3. Discussion towards Cultural Image Queries over Social Media.

    Labs 5, 16:45-18:15, CMC, room 5039

    Time Line Illustration:

    1. Detailed overview
    2. Evaluation material release
    3. Discussion towards dealing with Language Dialects and Varieties in Mining and Search over Cultural Social Media posts.

    View online : CLEF 2017 program

  • Topics released for task 3

    Topics are given in the file clef_mc2_task3_topics.xml

    They are extracted from 4 festival programs (see readme file): Vieilles Charrues 2015,
    Transmusicales 2015, Avignon 2016, Edinburgh 2016.

  • Topics released for tasks 1 and 2

    Topics have been released for tasks 1 and 2.

    A login is required to access the data. Once registered on CLEF, each registered team can obtain up to 4 extra individual logins by writing to admin@talne.eu.

    The complete stream of 70 000 000 microblogs is available for registered participants.
    An Indri index with a web interface and an online API are available to query the whole set of microblogs.

Articles in this section

  • Microblog Data Set

    by Eric SanJuan

    The document collection provided by the GAFES project consists of a pool of more than 70M unique microblogs from different sources, with their meta-information and expanded URLs, on a MySQL server. Due to legal terms, access to this database is restricted to registered participants under privacy (...)