MC2 2018 Lab

Multilingual Cultural Mining and Retrieval


1 - Cross Language cultural microblog search


Given a movie title and microcritics from the French VodKaster network, the task is to find all relevant microblogs in the MC2 corpus in French, English, Spanish, Portuguese and Arabic. Runs will be evaluated on the informativeness of the top-ranked microblogs, which combines graded relevance scores and diversity.

Task description

Use case

Browsing the VodKaster website lets French readers find short personal comments about movies, called microcritics. Similar or complementary opinions can be found on Twitter, but they are less specific to movies and harder to find. The use case is to display to the reader a concise summary of microblogs related to the microcritics he or she is reading, taking into account bilingual and trilingual users who would read microblogs in languages other than French.


Topics are a selection of VodKaster microcritics in French mentioning the term festival.
Each topic contains:

  • A topic ID
  • A title made of the movie name
  • A narrative showing a microcritic about the movie
  • A list of nuggets (i.e. terms and expressions) manually extracted from the microcritic

Microblog Corpus

The collection of microblogs is provided by the French national research project GAFES on festivals. It was collected using the keyword festival from May 2015 to November 2016 and complemented with microblogs about cities such as Cannes, Avignon, Lyon, Rennes and Edinburgh. It contains microblogs in all languages. Its use is restricted to active participants, for research purposes only.

Access to the data requires a login, available once registered for CLEF.


Runs will be evaluated primarily on informativeness, following the INEX Tweet Contextualization methodology [1] and using the FRESA [2] software extended to Arabic, French, Portuguese and Spanish. All FRESA metrics will be computed between the top-ranked microblog extracts of each run and a textual reference provided by the organizers. Following the evaluation process in [1], this reference will be based on both manual runs and pools of runs from participant submissions.

[1] INEX Tweet Contextualization task: Evaluation, results and lesson learned
Patrice Bellot, Véronique Moriceau, Josiane Mothe, Eric SanJuan, Xavier Tannier. Inf. Process. Manage. 52(5): 801-819 (2016)


Submitted summaries should be in TREC format, in a tab-separated file with six fields:

  1. a run ID (the team name, for example)
  2. the tweet ID: a long integer representation of the unique identifier of the tweet
  3. an integer indicating its position in the summary
  4. a float number as an estimation of its relevance
  5. the main language of the microblog content (fr, en, es, pt or ar)
  6. an extract of the microblog content with the author name if considered as relevant
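
As an illustration, the fields listed above could be written out as one tab-separated line per microblog. This is only a minimal sketch: the function name `format_run_line`, the four-decimal score format and the whitespace clean-up of the extract are assumptions made for the example, not requirements stated by the organizers.

```python
# Hypothetical helper: builds one line of a run file from the listed fields.
ALLOWED_LANGS = {"fr", "en", "es", "pt", "ar"}


def format_run_line(run_id, tweet_id, rank, score, lang, extract):
    """Return one tab-separated run line: run ID, tweet ID, rank, score, language, extract."""
    if lang not in ALLOWED_LANGS:
        raise ValueError(f"unexpected language code: {lang!r}")
    # Assumption: collapse tabs and newlines inside the extract so that
    # the line keeps exactly one tab between consecutive fields.
    clean_extract = " ".join(extract.split())
    fields = [
        run_id,
        str(int(tweet_id)),   # long integer tweet identifier
        str(int(rank)),       # position in the summary
        f"{score:.4f}",       # estimated relevance (format is an assumption)
        lang,
        clean_extract,
    ]
    return "\t".join(fields)


if __name__ == "__main__":
    print(format_run_line("teamA", 123456789012345678, 1, 0.87, "fr",
                          "@user  Un film\tsuperbe\n"))
```

Normalizing whitespace in the extract is a design choice for this sketch; it keeps the file parseable by column and the last column readable on its own.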

Runs will be truncated at 50, 150 and 300 words. The content will be concatenated and displayed to evaluators, who will highlight relevant passages. The concatenated content of the last column should therefore be readable by a human, i.e. this column needs to be readable on its own.
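
To preview how a run might read after truncation, the extracts can be concatenated in rank order and cut at each word budget. The sketch below assumes plain whitespace tokenization, which may differ from the word counting actually applied by the organizers; `truncate_summary` and `BUDGETS` are hypothetical names.

```python
# Word budgets mentioned in the task description.
BUDGETS = (50, 150, 300)


def truncate_summary(extracts, budget):
    """Concatenate extracts in rank order and keep at most `budget` words.

    Assumption: words are whitespace-separated tokens.
    """
    words = []
    for extract in extracts:
        words.extend(extract.split())
        if len(words) >= budget:
            break  # later extracts cannot fit in the budget
    return " ".join(words[:budget])


if __name__ == "__main__":
    ranked_extracts = ["@user Un film superbe", "Great movie at the festival"]
    for budget in BUDGETS:
        print(budget, truncate_summary(ranked_extracts, budget))
```

Reading the truncated text at each budget is a quick way to check that the last column stays understandable on its own.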


  • Registration closes: 30 April 2018
  • End Evaluation Cycle: 04 June 2018
  • Submission of Participant Papers [CEUR-WS]: 08 June 2018
  • Notification of Acceptance Participant Papers: 15 June 2018
  • Camera Ready Copy of Participant Papers [CEUR-WS]: 29 June 2018
  • CLEF 2018 Conference: 10-14 September 2018

Task organizers