Data

The lab gives registered participants access to a massive collection of microblogs and URLs related to cultural festivals around the world.

It allows researchers in IR and NLP to experiment with a broad variety of multilingual microblog search techniques (Wikipedia entity search, automatic summarization, language identification, text localization, etc.).

A login is required to access the data. Once registered on CLEF, each registered team can obtain up to 5 extra individual logins by writing to eric.sanjuan@univ-avignon.fr or malek.hajjem@univ-avigon.fr.


Articles in this section

  • Content Analysis Results: Language identification 2017

    by Malek Hajjem

    Results
    Topics are a random selection of original microblogs posted in June 2016 that contain no external links and are longer than 80 characters.
    Submissions and scores for the two best teams are available here: Syllabs and Lia.
    The task paper can be found here (...)

  • Available resources CLEF 2018: detailed description

    by Malek Hajjem

    The festival galleries dataset
    A massive collection of microblogs and URLs related to cultural festivals is provided to registered participants here. To make such a large dataset easier to handle, we provide it in different formats:
    CSV format: a tab-separated CSV file that could be useful (...)

  • The festival galleries dataset

    by Eric SanJuan

    This dataset allows researchers to experiment with microblog search and stream summarization.
    Microblog collection
    The document collection is provided to registered participants by the ANR GAFES project. It consists of a pool of more than 50M unique microblogs from different sources, with their meta-information (...)
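The topic-selection rule from the 2017 language identification results (original microblogs, no external links, more than 80 characters) and the tab-separated CSV format of the 2018 resources can be illustrated together. The following is a minimal sketch under stated assumptions, not the organizers' actual pipeline: the excerpts above do not give the file's column layout, so this assumes a header row with hypothetical "id" and "text" columns, treats any http(s) URL as an external link, and treats a leading "RT @" as marking a non-original post (retweet).

```python
import csv
import io
import re

# Assumption: an "external link" is any http(s) URL appearing in the text.
URL_RE = re.compile(r"https?://\S+", re.IGNORECASE)


def is_candidate_topic(text: str, min_chars: int = 80) -> bool:
    """Return True if a microblog matches the stated topic criteria:
    original post, no external link, more than `min_chars` characters."""
    if text.startswith("RT @"):
        return False  # assumption: a leading "RT @" marks a retweet, not an original post
    if URL_RE.search(text):
        return False  # contains an external link
    return len(text) > min_chars


def load_tsv(stream) -> list:
    """Read a tab-separated file with a header row into a list of dicts.
    The column names ("id", "text") are hypothetical."""
    reader = csv.DictReader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    return list(reader)


# Usage on a small in-memory sample standing in for the real file.
sample = (
    "id\ttext\n"
    "1\tShort post\n"
    "2\t" + "A long original post about the festival line-up. " * 2 + "\n"
    "3\tRT @someone " + "x" * 90 + "\n"
    "4\tSee the photos https://example.org " + "y" * 80 + "\n"
)
rows = load_tsv(io.StringIO(sample))
topics = [row["id"] for row in rows if is_candidate_topic(row["text"])]
print(topics)
```

Only row 2 passes: row 1 is too short, row 3 is a retweet, and row 4 contains a link. On the real collection, the rule that identifies original posts should be taken from the dataset's meta-information rather than from this text heuristic.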