A cross-domain ontology has been developed for COVID-related topics as discussed in parliamentary data and social media data. It has been integrated into EOSC as a resource that makes data comparable, interpretable and highly communicative for researchers, as well as journalists, NGOs and citizens.
Challenge
The pandemic has generated a huge variety of research activities, studies and policies across both the life sciences and the social sciences and humanities (SSH). Examples include genomic sequencing, assays of immune response, clinical trials, population health analyses, studies of vaccine hesitancy, investigations of the role of social media and public debate, and economic analyses of the impact of public policy measures (e.g., lockdowns, face masking).
Potential insights from combining the data and conclusions from these heterogeneous forms of research are, however, obscured by the lack of interoperable ways to describe them.
For example, in SSH there is a need to support the systematic investigation of public attitudes towards COVID-related public health measures, as expressed in parliament, social media and social surveys, across languages and regions.
This challenge requires the integration of various sources of metadata so that COVID-related societal issues can be easily coupled to specific scientific concepts from multiple disciplines and become automatically detectable from parliamentary data and social media data.
An additional challenge is the diversity of the target data, and in particular of parliamentary records. These data may seem to be quite similar across Europe, and they are widely available because the Right to Information Act mandates the timely release of data by national parliaments. Nevertheless, each country has a different parliamentary system, and each parliamentary office publishes debates in its own way. Formats, metadata, and even debate structures are all different. In their original form, the records could not be compared or analysed as a unit, nor could the COVID debates be coupled to data from the health domain in a straightforward way. Semantic interoperability with textual data from social media and surveys is also not trivial.
Solution
"You have to know what the public thinks to be able to effectively communicate your policy."
Pieter Fivez
With harmonised metadata for COVID-related data, it is possible to couple public political debate with biomedical records as well as data reflecting societal responses and opinions, such as social survey data and social media sources.
To generate a harmonised model for metadata, an ontology of COVID-related topics from parliamentary data and social media has been developed, providing a societally relevant categorisation to which subtopics of a diverse set of scientific fields can be easily linked. This includes, for example, links to high-level concepts in medical ontologies such as the ICD-10 or SNOMED-CT, as well as identifiers of public policy issues such as lockdowns and face masking. (This work was funded in the context of the H2020 project EOSC Future.)
The project also used the harmonised, language-independent representation format for parliamentary debates developed in the context of ParlaMint, an initiative funded by CLARIN ERIC. The data were made available for download and search, so that interested parties can now extract the relevant information in a comparable format and identify suitable topics that link to the COVID-19 pandemic and for which resources are available in the health domain, survey data, and social media data.
With this cross-domain ontology and the harmonisation of data formats across languages, the democratic process can be studied more effectively. This facilitates analysing speaker and party statistics: who spoke more and on which topic, who changed their mind, and which party defended or opposed which proposals. It also enables the analysis of debate topics (which topics are most popular at what time, and how topics change and interrelate) and the tracking of time- and context-bound social tendencies. The analysis potential also extends to social media data and social survey data.
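As a minimal sketch of the speaker and party statistics mentioned above, the following Python snippet counts speeches per party and topic over a handful of harmonised records. The record fields (`speaker`, `party`, `topic`) and the sample values are hypothetical and chosen only for illustration; they do not reflect the actual ParlaMint schema or real debate data.

```python
from collections import Counter, defaultdict

# Hypothetical harmonised speech records; field names and values are
# illustrative assumptions, not the ParlaMint schema.
speeches = [
    {"speaker": "A", "party": "P1", "topic": "lockdown"},
    {"speaker": "B", "party": "P2", "topic": "lockdown"},
    {"speaker": "A", "party": "P1", "topic": "face masking"},
    {"speaker": "C", "party": "P2", "topic": "lockdown"},
]


def speeches_per_party_topic(records):
    """Count how many speeches each party gave on each topic."""
    counts = defaultdict(Counter)
    for record in records:
        counts[record["party"]][record["topic"]] += 1
    return counts


def most_active_speaker(records, topic):
    """Return the speaker with the most speeches on a topic, or None."""
    tally = Counter(r["speaker"] for r in records if r["topic"] == topic)
    return tally.most_common(1)[0][0] if tally else None


party_topic_counts = speeches_per_party_topic(speeches)
```

Because every record shares the same fields regardless of the originating parliament, the same two functions could in principle be applied to any country's data once it has been converted to the common format.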
Impact
Ontology
The main strength of the developed ontology is that it is concise, transparent and hierarchically organised (close to one hundred concepts in five layers), while allowing for cross-domain referencing and covering the majority of high-level topics related to COVID-19.
It can be used in tandem with its associated pre-trained machine learning models for topic detection in social media and parliamentary data, as part of our provided workflow.
Moreover, its concise and transparent nature allows for easily including specific domains or extending its current functionality.
This ontology will also serve as a clear example of empirical data-oriented development of application-oriented knowledge resources which are focused on cross-domain referencing.
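The hierarchical organisation and cross-domain referencing described above can be pictured with a small Python sketch. This is not the published ontology: the `Concept` class, the labels, the layering and the placeholder identifier are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """One node in a hierarchical topic ontology (illustrative only)."""
    label: str
    # Cross-domain references keyed by resource name; values are placeholders.
    external_refs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)

    def add_child(self, child):
        """Attach a sub-concept and return it for further nesting."""
        self.children.append(child)
        return child

    def depth(self):
        """Number of layers in the subtree rooted at this concept."""
        return 1 + max((c.depth() for c in self.children), default=0)

    def find(self, label):
        """Depth-first search for a concept by label; None if absent."""
        if self.label == label:
            return self
        for child in self.children:
            match = child.find(label)
            if match is not None:
                return match
        return None


# Hypothetical fragment; labels and layering are assumptions.
root = Concept("COVID-19")
measures = root.add_child(Concept("public health measures"))
measures.add_child(Concept("lockdown"))
measures.add_child(Concept("face masking"))
health = root.add_child(
    Concept("health", external_refs={"SNOMED-CT": "PLACEHOLDER-ID"})
)
```

Extending such a structure with a new domain amounts to adding a child node with its own external references, which is one way to read the claim that a concise, transparent hierarchy is easy to extend.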
Workflow
A workflow is now available for the automated monitoring of public discourse (social media) and parliamentary discourse (parliamentary data) for societally relevant COVID-related topics. This monitoring workflow, which is currently available for Dutch text data, can be deployed for observing the gap between public opinion and public policy making in a large-scale quantitative way which allows for concrete statistical analysis.
The workflow uses state-of-the-art Artificial Intelligence to detect COVID-related topics along a broad range of scientific disciplines, and cross-reference these topics to domain-specific knowledge resources. This will for instance allow for observing public attitudes towards specific policy measures, which could provide crucial information to policymakers.
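To make the idea of quantifying the gap between public and parliamentary discourse more concrete, here is a minimal Python sketch. It assumes topic labels have already been assigned to each text by some detector, and it uses a simple difference of relative topic frequencies as the measure. Both the sample labels and the choice of measure are assumptions for illustration, not the project's actual method.

```python
from collections import Counter


def topic_shares(labels):
    """Turn a list of detected topic labels into relative frequencies."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {topic: n / total for topic, n in counts.items()} if total else {}


def share_gap(public_labels, parliament_labels):
    """Per-topic difference: public share minus parliamentary share."""
    public = topic_shares(public_labels)
    parliament = topic_shares(parliament_labels)
    topics = set(public) | set(parliament)
    return {t: public.get(t, 0.0) - parliament.get(t, 0.0) for t in topics}


# Hypothetical topic labels, as if already assigned by a topic detector.
social_media = ["lockdown", "lockdown", "vaccination", "face masking"]
parliament = ["lockdown", "vaccination", "vaccination", "vaccination"]

gap = share_gap(social_media, parliament)
```

A positive value indicates a topic that takes up a larger share of public discourse than of parliamentary debate, and a negative value the reverse.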
The workflow is completely open-source and will be integrated into EOSC via the CLARIN-Flanders repository. To increase visibility and accessibility, the pre-trained machine learning model will also be hosted on Hugging Face, the popular platform for large language models.
While this workflow is retroactively applicable to data from the COVID pandemic, it will also serve as a clear example for similar applications in the future, such as upcoming pandemics or other scenarios which require societal crisis management.
Strategy and data availability
The ParlaMint initiative has established a strategy for handling and processing parliamentary data in times of emergency (COVID-19 being a straightforward example). This will enable the creation of reference corpora with parliamentary records from previous crises where the analysis cuts across disciplines because of the layered interaction between causes and societal responses. Examples include the global economic recession, periods of floods in Europe, the Ebola outbreak, and migration patterns.
Development of standards
The encoding standard adopted for parliamentary data (Parla-CLARIN) will be further developed to cover more detailed and specific metadata across languages and parliaments. The corpora will serve as a baseline for further updates. Uniform updates across the corpora will strongly support various methods of comparative research on public debate.
ParlaMint Tutorial
Pretnar Žagar A., Pahor de Maiti K., Fišer D. (2022). What's on the agenda? Topic modelling parliamentary debates before and during the COVID-19 pandemic. https://sidih.github.io/agenda.html
Applications beyond COVID
The project 'ParlaMint - A Resource for Democracy' set out to analyse public discourse on migration and migrants in more detail. It is looking into parliamentary debates as well as news articles to explore how migration and migrants were referred to during the so-called migration crisis (2015/16) and the advent of COVID-19 (2020) in two countries – Italy and the UK – and show how this may impact public opinion on the topic.
Contributors: Dario Del Fante and Virginia Zorzi
The project 'Networks of Power - Gender Analysis in European Parliaments' examines different aspects of power in parliamentary discourse in three European countries, with a particular focus on gender distribution in the debates.
Contributors: Jure Skubic, Alexandra Bruncrona, Jan Angermeier, Bojan Evkoski and Larissa Leiminger.
"ParlaMint is a tool for democracy."
Virginia Zorzi