Microblog Cultural Contextualization

CLEF 2017 Lab

The MC2 CLEF 2017 lab deals with how the cultural context of a microblog affects its social impact at large. This involves microblog search, classification, filtering, language recognition, localization, entity extraction, linking open data, and summarization.

Regular lab participants have access to the private, massive, multilingual microblog stream of The Festival Galleries project. Festivals have a large presence on social media, and the resulting microblog stream and its related URLs are well suited to experimenting with advanced social media search and mining methods.


  • CLEF 2017 Microblog Cultural Contextualization overviews in Dublin

    Labs 4, 13:45-15:45, CMC, room 5039

    Content analysis and Microblog Search:

    1. Detailed overview
    2. Participant presentations
    3. Discussion towards Cultural Image Queries over Social Media.

    Labs 5, 16:45-18:15, CMC, room 5039

    Time Line Illustration:

    1. Detailed overview
    2. Evaluation material release
    3. Discussion towards dealing with Language Dialects and Varieties in Mining and Search over Cultural Social Media posts.

    View online: CLEF 2017 program

  • Topics released for task 3

    Topics are given in the file clef_mc2_task3_topics.xml.

    They are extracted from four festival programs (see the readme file): Vieilles Charrues 2015, Transmusicales 2015, Avignon 2016, and Edinburgh 2016.

  • Topics released for tasks 1 and 2

    Topics have been released for tasks 1 and 2.

    A login is required to access the data. Once registered on CLEF, each team can obtain up to 4 extra individual logins by writing to admin@talne.eu.

    The complete stream of 70 000 000 microblogs is available to registered participants.
    An Indri index, with a web interface and an online API, is available for querying the whole set of microblogs.

Latest articles

  • CLEF Conference deadlines

    by Eric SanJuan

    Submission of Long Papers: 28 April 2017
    Submission of Short Papers: 5 May 2017
    Notification of Acceptance: 9 June 2017
    Camera Ready Copy due: 23 June 2017

  • TimeLine Illustration based on Microblogs

    by Lorraine, Philippe

    This paper by Nayanika DOGRA, Philippe MULHEM, Nawal OULD AMER, and Lorraine GOEURIOT presents the approach used by the LIG-MRIM research group in its participation in the pilot task TimeLine Illustration based on Microblogs of the 2016 CLEF Cultural Microblog (...)

  • Wikipedia XML corpus for summary generation

    by Eric SanJuan

    Wikipedia is released under a Creative Commons license, so its contents can be used to contextualize tweets or to build complex queries referring to Wikipedia entities.
    Since 2012, we have extracted an average of 10 million XML documents per year from Wikipedia in the four main Twitter languages: en, (...)

  • The festival galleries dataset

    by Eric SanJuan

    This dataset allows experimenting with microblog search and stream summarization.
    Microblog collection
    The document collection is provided to registered participants by the ANR GAFES project. It consists of a pool of more than 50M unique microblogs from different sources, with their meta-information (...)
