Making our data FAIR - why should we care?

Authors Margareta Hellström

License CC-BY

Making our data FAIR
- why should we care?
            Margareta Hellström
ICOS Carbon Portal and LU domain specialist for SND
            FAIR data seminar @ LU, 2019-02-05




 This presentation by M. Hellström is distributed under a Creative Commons CC-BY license.
Today’s research: survival in the data ocean!
 •   Big Data tsunami
 •   Not enough metadata
 •   Fragmentation of information
 •   Machine inoperability
 •   Human intervention needed
 •   Increasing data & metadata losses
 •   Too many standards & formats
 •   Reproducibility problem




The fate of research data
 • Big data: Massive datasets produced through large science projects,
   government records, social media, large corporations, …
 • Long-tail data: Large amounts of small-to-medium size datasets
   from very heterogeneous sources
 • Literature limit: Many data files are never published,
   catalogued or even explicitly mentioned in scientific
   literature. They become “dark data”.

 [Figure: Data size vs. Number of data sets; labels: Organized “big data”, Literature limit, Long-tail data, Unpublished and dark data]


A.R. Ferguson et al. (2014), “Big data from small data: data-sharing in the 'long tail' of neuroscience”, Nature Neuroscience, 17(11), 1442-1447.
FAIR principles to the rescue!
 • stands for Findable, Accessible, Interoperable, Reusable
 • not a standard, but a set of 15 guiding principles*)
 • aims to free up researchers from “data wrangling”,
   leaving them time to “do science”
 • was coined by FORCE11 in 2014, out of discussions in the
   Life Sciences community
 • has become the new fashion (and Holy Grail!)
 • is increasingly called for by funders & policy makers



*) see the last 4 slides for a list of the principles!


What FAIR isn’t
• FAIR is not a standard
• FAIR is not equal to ‘Open’ or ‘Free’
• Data are often Open but not FAIR
• Data could be Closed, yet perfectly FAIR
• FAIR is not equal to Linked Data, Semantic Web or RDF
• FAIR is not assuming that only humans can find and re-use data
• FAIR is not for humans only but for machines as well
• Data that are not FAIR are pretty ‘Re-useless’…


B. Mons et al. (2017): Cloudy, increasingly FAIR; revisiting the FAIR Data guiding principles for the European Open Science
Cloud, http://doi.org/10.3233/ISU-170824, and GO-FAIR (2018): FAIR Data Stewardship Awareness Course.


The research data lifecycle
 • Research projects can be broken down into steps or phases
 • Correspondingly, data comes in several levels, from raw values to
   finalized analysis results
 • FAIRness needs to be applied where it makes sense

Diagrams from Z. Zhao (2018) and U. Schwardmann (2016)
What’s in it for me?
 Making your data “FAIR enough” gives you better control of what
 happens to your data by:
 • helping to make your data sustainable
 • ensuring your data can be found by others
 • facilitating collection of relevant metadata
 • guaranteeing data can be cited when used
 • enabling collection of data usage statistics
 • supporting the creation of data management plans
 • simplifying reporting to funders & employers
 • streamlining estimations of data curation & archival costs


How can my data become FAIR?
• Make a plan for the data before you start a project!
• Collect detailed descriptive information (= metadata)
  throughout
• Use standards and formats common to your discipline
• Store the data in a trusted & sustainable repository or
  data center
• See to it that the data get persistent identifiers (e.g. DOIs)
• Apply a suitable usage license
• Provide end users with information on “intended use”
• Make the data “as open as possible, as closed as
  necessary”
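
As a rough illustration, the checklist above could be tracked per dataset in a few lines of Python. This is only a sketch: the class `DatasetPlan`, its field names and the example values are hypothetical and not part of the FAIR principles or of any existing tool.

```python
from dataclasses import dataclass, fields


@dataclass
class DatasetPlan:
    """Hypothetical per-dataset checklist mirroring the bullets above."""
    has_data_management_plan: bool = False
    metadata_collected: bool = False
    uses_community_formats: bool = False
    in_trusted_repository: bool = False
    has_persistent_identifier: bool = False
    has_usage_license: bool = False
    documents_intended_use: bool = False


def open_items(plan: DatasetPlan) -> list[str]:
    """Return the names of the checklist items that are still unticked."""
    return [f.name for f in fields(plan) if not getattr(plan, f.name)]


# Example: a project that has a plan and a repository, but nothing else yet.
plan = DatasetPlan(has_data_management_plan=True, in_trusted_repository=True)
remaining = open_items(plan)
print(f"{len(remaining)} items left:", ", ".join(remaining))
```

Keeping the list as plain boolean fields makes it easy to paste the open items into a data management plan or a project report.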



Learn more!
• “Turning FAIR into reality”, S. Jones ed. (2018), Report from the
  European Commission’s high level expert group on FAIR data,
  http://doi.org/10.2777/1524
• “The FAIR data principles”, FORCE11 (2014),
  https://www.force11.org/group/fairgroup/fairprinciples
• “Cloudy, increasingly FAIR; revisiting the FAIR Data guiding principles
  for the European Open Science Cloud”, B. Mons et al. (2017),
  http://doi.org/10.3233/ISU-170824
• “A design framework and exemplar metrics for FAIRness”, M.
  Wilkinson et al. (2017), bioRxiv preprint,
  http://dx.doi.org/10.1101/225490




Thanks for your attention!
  • Comments or suggestions?
  • Angry or happy about FAIR?
  • Want to discuss (environmental & earth
    science) data management?

  • E-mail margareta.hellstrom@nateko.lu.se,
    or come see me at Geocentrum II!




                 Some extras…




     Motivation behind the FAIR principles
       • The challenge of enabling optimal use of research data and methods
         is a complex one with multiple stakeholders: Researchers,
         Professional data publishers, Funding agencies (private and public),
         and a Data Science community
       • Computational analysis to discover meaningful patterns in massive,
         interlinked datasets is rapidly becoming a routine research activity.
       • Providing machine-readable data as the main substrate for
         Knowledge Discovery and for these eScientific processes to run
         smoothly and sustainably is one of the Grand Challenges of eScience.
       • The FAIR principles were formulated as guidelines & best practices for
         both data producers and data consumers
FORCE11, 2014 https://www.force11.org/fairprinciples

F for Findable
 • F1. (meta)data are assigned a globally unique
  and persistent identifier
 • F2. data are described with rich metadata
  (defined by R1)
 • F3. metadata clearly and explicitly include the
  identifier of the data they describe
 • F4. (meta)data are registered or indexed in a
  searchable resource
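
A minimal sketch of how F1 to F3 might be checked on a single metadata record. The record layout (`identifier`, `title`, `description`, `data_identifier`), the example DOIs and the loose DOI pattern are all assumptions made for illustration. F4 (registration in a searchable resource) cannot be checked locally and is left out.

```python
import re

# Loose heuristic for DOI syntax ("10.<registrant>/<suffix>"). This is an
# assumption for illustration, not an official validation rule.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def looks_like_doi(identifier: str) -> bool:
    """Return True if the string has the rough shape of a DOI."""
    return bool(DOI_PATTERN.match(identifier))


def findability_gaps(metadata: dict) -> list[str]:
    """List which of F1-F3 a (hypothetical) metadata record appears to miss."""
    gaps = []
    if not looks_like_doi(metadata.get("identifier", "")):
        gaps.append("F1: no persistent identifier for the metadata record")
    if not metadata.get("title") or not metadata.get("description"):
        gaps.append("F2: descriptive metadata is thin")
    if not looks_like_doi(metadata.get("data_identifier", "")):
        gaps.append("F3: metadata does not name the data it describes")
    return gaps


# Hypothetical record: identifiers are present, but the description is empty.
record = {
    "identifier": "10.1234/example.meta",
    "title": "Example flux measurements",
    "description": "",
    "data_identifier": "10.1234/example.data",
}
print(findability_gaps(record))
```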


 FORCE11, 2014 https://www.force11.org/fairprinciples




A for Accessible
 • A1. (meta)data are retrievable by their identifier
   using a standardised communications protocol
 • A1.1 the protocol is open, free, and universally
   implementable
 • A1.2 the protocol allows for an authentication and
   authorization procedure, where necessary
 • A2. metadata are accessible, even when the data
   are no longer available
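
To make A1 concrete, the sketch below prepares an HTTPS request that asks the DOI resolver for machine-readable metadata via an `Accept` header. It only builds the request and does not send it. The helper name `build_metadata_request` is hypothetical, and whether a given DOI's registration agency serves the chosen media type is an assumption.

```python
from urllib.request import Request

DOI_RESOLVER = "https://doi.org/"


def build_metadata_request(
    doi: str,
    media_type: str = "application/vnd.citationstyles.csl+json",
) -> Request:
    """Prepare (but do not send) an HTTPS request for a DOI's metadata."""
    return Request(DOI_RESOLVER + doi, headers={"Accept": media_type})


# DOI taken from the reading list in this presentation.
req = build_metadata_request("10.3233/ISU-170824")
print(req.get_method(), req.full_url, req.get_header("Accept"))
```

Passing `req` to `urllib.request.urlopen` would perform the actual retrieval. HTTPS is an open and freely implementable protocol (A1.1), and an authorization header could be added to the same request where access is restricted (A1.2).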


 FORCE11, 2014 https://www.force11.org/fairprinciples




I for Interoperable
 • I1. (meta)data use a formal, accessible, shared,
   and broadly applicable language for knowledge
   representation
 • I2. (meta)data use vocabularies that follow FAIR
   principles
 • I3. (meta)data include qualified references to
   other (meta)data
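
One possible reading of I1 to I3, sketched as a JSON-LD style record that uses schema.org terms. The dataset, its name and the `10.1234/...` identifiers are hypothetical, and choosing schema.org as the shared vocabulary is an assumption, since communities often prefer their own domain vocabularies.

```python
import json

# Hypothetical dataset description using schema.org property names.
record = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "@id": "https://doi.org/10.1234/example.processed",
    "name": "Example processed flux data",
    # Qualified references: the key states HOW the two resources relate.
    "isBasedOn": {"@id": "https://doi.org/10.1234/example.raw"},
    "citation": {"@id": "https://doi.org/10.3233/ISU-170824"},
}


def qualified_references(doc: dict) -> dict:
    """Map each relation name to the identifier of the resource it points at."""
    return {
        key: value["@id"]
        for key, value in doc.items()
        if isinstance(value, dict) and "@id" in value
    }


print(json.dumps(record, indent=2))
print(qualified_references(record))
```

The point of I3 is visible in the last function: a machine can tell not only that two resources are linked, but also which kind of link it is.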




 FORCE11, 2014 https://www.force11.org/fairprinciples




R for Reusable (and Reproducible)
• R1. (meta)data are richly described with a plurality
  of accurate and relevant attributes
• R1.1. (meta)data are released with a clear and
  accessible data usage license
• R1.2. (meta)data are associated with detailed
  provenance
• R1.3. (meta)data meet domain-relevant community
  standards
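
R1.1 and R1.2 can be illustrated with a toy catalogue in which every dataset records its license and the dataset it was derived from. The catalogue, its keys and the dataset names are hypothetical. The sketch walks the provenance chain back to the raw data and reports entries that lack a license.

```python
# Hypothetical catalogue: each entry names its license and its parent dataset.
catalogue = {
    "raw-2018": {"license": "CC-BY-4.0", "derived_from": None},
    "cleaned-2018": {"license": "CC-BY-4.0", "derived_from": "raw-2018"},
    "annual-means-2018": {"license": None, "derived_from": "cleaned-2018"},
}


def provenance_chain(dataset_id: str, entries: dict) -> list[str]:
    """Follow 'derived_from' links from a dataset back to its raw source."""
    chain = []
    current = dataset_id
    while current is not None:
        if current in chain:
            raise ValueError(f"circular provenance at {current!r}")
        chain.append(current)
        current = entries[current]["derived_from"]
    return chain


def unlicensed(chain: list[str], entries: dict) -> list[str]:
    """Return the datasets in the chain that have no usage license recorded."""
    return [name for name in chain if not entries[name]["license"]]


chain = provenance_chain("annual-means-2018", catalogue)
print(" <- ".join(chain))
print("missing license:", unlicensed(chain, catalogue))
```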


FORCE11, 2014 https://www.force11.org/fairprinciples



