MC2 2018 Lab

The goal of the MC2 2018 edition is to develop processing methods and resources to mine the social media sphere surrounding cultural events such as festivals, music, books, movies and museums. Social media posts linked to an event form a very noisy corpus: informal language, out-of-language phrases and symbols, hashtags, hyperlinks, and so on. The information is also often imprecise, duplicated, or non-informative. The value of mining such data lies in extracting relevant and informative content, and potentially in discovering new information.

Following the 2016 CMC workshop and the 2017 MC2 lab centered on festivals, this lab will also give registered participants access to the microblog collection of the GAFES project, funded by the French National Research Agency and led by the University of Avignon. However, this year the lab will be centered on social media posts in Arabic and in Latin-script languages, including (but not restricted to) English, French and Spanish. The GAFES data will be extended with VodKaster micro-reviews. Data is available in XML format, one document per author containing all of that author's posts, including reposts.
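The exact element and attribute names of the GAFES XML files are not given on this page. As a minimal sketch only, assuming a hypothetical layout with an `<author>` root element holding `<post>` children and a `repost` attribute, one per-author document could be loaded with Python's standard `xml.etree.ElementTree` module and reposts separated from original posts:

```python
import xml.etree.ElementTree as ET

# HYPOTHETICAL schema for illustration: the real GAFES element and
# attribute names (author, post, repost, lang) may differ.
SAMPLE = """<author id="u42">
  <post id="1" repost="false" lang="fr">Super concert ce soir #Avignon2016</post>
  <post id="2" repost="true" lang="en">RT great show at #Transmusicales</post>
</author>"""


def load_author_posts(xml_text):
    """Parse one per-author document into (author_id, list of post dicts)."""
    root = ET.fromstring(xml_text)
    posts = []
    for post in root.iter("post"):
        posts.append({
            "id": post.get("id"),
            # Assumed convention: repost="true" marks a repost.
            "is_repost": post.get("repost") == "true",
            "lang": post.get("lang"),
            "text": (post.text or "").strip(),
        })
    return root.get("id"), posts


def original_posts(posts):
    """Keep only the author's own posts, dropping reposts."""
    return [p for p in posts if not p["is_repost"]]


if __name__ == "__main__":
    author, posts = load_author_posts(SAMPLE)
    print(author, len(posts), len(original_posts(posts)))
```

Filtering reposts early is one simple way to reduce the duplication mentioned above before any search or summarization step; the attribute used for this is an assumption and should be checked against the actual files.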


  • CLEF 2017 Microblog Cultural Contextualization overviews in Dublin

    Labs 4, 13:45-15:45, CMC, room 5039

    Content analysis and Microblog Search:

    1. Detailed overview
    2. Participant presentations
    3. Discussion towards cultural image queries over social media.

    Labs 5, 16:45-18:15, CMC, room 5039

    Time Line Illustration:

    1. Detailed overview
    2. Evaluation material release
    3. Discussion towards dealing with language dialects and varieties in mining and search over cultural social media posts.

    View online: CLEF 2017 program

  • Topics released for task 3

    Topics are given in the file clef_mc2_task3_topics.xml

    They are extracted from four festival programs (see the readme file): Vieilles Charrues 2015,
    Transmusicales 2015, Avignon 2016 and Edinburgh 2016.

  • Topics released for tasks 1 and 2

    Topics have been released for tasks 1 and 2.

    A login is required to access the data. Once registered on CLEF, each team can obtain up to 4 extra individual logins by writing to

    The complete stream of 70,000,000 microblogs is available to registered participants.
    An Indri index with a web interface and an online API is available to query the whole set of microblogs.

Latest articles

  • CLEF Conference deadlines

    by Eric SanJuan

    Submission of Long Papers: 28 April 2017
    Submission of Short Papers: 5 May 2017
    Notification of Acceptance: 9 June 2017
    Camera Ready Copy due: 23 June 2017

  • TimeLine Illustration based on Microblogs

    by Lorraine, Philippe

    This paper by Nayanika DOGRA, Philippe MULHEM, Nawal OULD AMER, and Lorraine GOEURIOT presents the approach used by the LIG-MRIM research group in its participation in the pilot task TimeLine Illustration based on Microblogs of the 2016 CLEF Cultural Microblog (...)

  • Wikipedia XML corpus for summary generation

    by Eric SanJuan

    Wikipedia is under a Creative Commons license, and its content can be used to contextualize tweets or to build complex queries referring to Wikipedia entities.
    We have extracted an average of 10 million XML documents from Wikipedia per year since 2012 in the four main Twitter languages: en, (...)

  • The festival galleries dataset

    by Eric SanJuan

    This dataset supports experiments on microblog search and stream summarization.
    Microblog collection
    The document collection is provided to registered participants by the ANR GAFES project. It consists of a pool of more than 50M unique microblogs from different sources, with their meta-information (...)
