Research data

The basic rule for sharing research data is that data should be as open as possible and as closed as necessary.

What is research data?

Research data are materials in digital and analog form, observed, collected, processed or produced as part of scientific activities. They are considered by the scientific community to be essential for evaluating the results of scientific research, as well as useful for implementing new research. 

Two types of research data can be distinguished in the research process: 

Secondary research data - data that results from previous research and analysis or comes from source documents. 

Examples: previously published research datasets, publications, library collections, archives, museum collections, official documents (CSO, etc.), legal acts. 

Primary research data - data produced in the course of ongoing research or projects. 

Examples: surveys, questionnaires and their analyses, audiovisual materials, photographs, notes, software, results of computer simulations, algorithms, samples, laboratory protocols, methodological descriptions, etc. 

 

Research data management

The cost of data management is not as great as the cost of producing new data.

Adequate management of research data stems from the need for responsible stewardship of public funds. Care must be taken to ensure that these funds are not spent on similar research already funded from public sources.

Data sharing 

Research data produced during publicly funded projects should be made available according to the principle "as open as possible - as closed as necessary". Research data should be made available as soon as the project is completed and/or simultaneously with the publication of research results, e.g. in an article. 

Open research data is collected in data repositories, to which any interested party has free access. 

If there are reasons why research data or parts of it cannot be made available in an open model (legal issues, commercialization of results, the research is a prelude to another project, etc.), the researcher can grant access to the data "on request" or after a temporary embargo (grace period).  

Data protection 

If personal or sensitive data is collected in the course of research, it should be pseudonymized or anonymized before sharing.   

Pseudonymization means processing personal data in such a way that it can no longer be attributed to the data subject without the use of a "key." Additional information that allows the data to be reassigned to a specific person must be stored separately with appropriate security measures. Pseudonymized data is still personal data. 
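The separation between pseudonymized data and its "key" can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `pseudonymize`, the field names and the sample records are all hypothetical, and a real project would also need to assess indirect identifiers (such as age) and store the key under appropriate security measures.

```python
import secrets


def pseudonymize(records, id_field):
    """Replace the identifying field with a random pseudonym.

    Returns (pseudonymized_records, key). The key maps each pseudonym
    back to the original value and must be stored separately.
    """
    key = {}      # pseudonym -> original value (the "key")
    reverse = {}  # original value -> pseudonym, so one person keeps one pseudonym
    output = []
    for record in records:
        original = record[id_field]
        if original not in reverse:
            pseudonym = "P" + secrets.token_hex(4)
            while pseudonym in key:  # extremely unlikely collision, but cheap to rule out
                pseudonym = "P" + secrets.token_hex(4)
            reverse[original] = pseudonym
            key[pseudonym] = original
        cleaned = dict(record)  # copy, so the input records stay untouched
        cleaned[id_field] = reverse[original]
        output.append(cleaned)
    return output, key


# Hypothetical sample data, invented for this illustration.
records = [
    {"name": "Anna Kowalska", "age": 34, "score": 7},
    {"name": "Jan Nowak", "age": 51, "score": 4},
    {"name": "Anna Kowalska", "age": 34, "score": 9},
]
shared, key = pseudonymize(records, "name")
```

Here `shared` is what could be passed on, while `key` stays with the researcher. As long as the key exists, the records can be reattributed, which is why pseudonymized data is still personal data.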

Anonymization is the processing of personal data that makes it impossible to assign data to a specific person. Anonymized data is not personal in nature. 

For more information on data protection, visit the Data Protection Inspector's Office.

Metadata

Metadata is data about data.  

In research data management, two kinds of metadata are important: the file metadata that the researcher specifies, and the metadata describing the dataset, which follows the ready-made metadata schema used in the data repository. 

File metadata 

In the course of scientific research, a large number of files in different formats and different versions are usually produced. Proper file management during research facilitates the identification and efficient use of the files collected. 

In file management, it is very important to name files properly. When naming files, use only numbers, letters and underscore characters. Special characters, hyphens and spaces should not be used. Dates should have a uniform format, e.g. DDMMYYYY. If the research is expected to produce a large number of files and the file names are numbered, start with 001 rather than 1 so that the files sort in the correct order.  

The file name should contain enough descriptive and contextual information to reflect the contents of the file in a way that can be understood by the researcher, their colleagues and future users. Avoid giving files overly generic names that can become problematic, for example, when files are relocated.  

It is also important to uniformly label successive file versions. The easiest way to organize data file versions is to use ordinal numbers such as 1, 2 and 3 for major version changes and decimal numbers for minor changes, e.g.: version 1.1. Avoid names such as "final version," "copy 2," etc. 
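The naming and versioning rules above can be sketched as a small Python helper. The function name `build_file_name`, the order of the name parts and the sample values are assumptions made for this illustration. Because a version such as 1.1 contains a dot, which the character rule excludes, the sketch writes it as `v1_1`; that encoding is also an assumption, not a rule from this guide.

```python
import re
from datetime import date

# Only ASCII letters, digits and underscores are accepted in the name stem.
VALID_STEM = re.compile(r"^[A-Za-z0-9_]+$")


def build_file_name(project, description, created, sequence, major, minor, extension):
    """Assemble a file name from descriptive parts and validate its characters."""
    stem = "_".join([
        project,
        description,
        created.strftime("%d%m%Y"),  # uniform DDMMYYYY date
        f"{sequence:03d}",           # zero-padded numbering: 001, 002, ...
        f"v{major}_{minor}",         # major and minor version
    ])
    if not VALID_STEM.match(stem):
        raise ValueError(f"file name contains disallowed characters: {stem!r}")
    if not extension.isalnum():
        raise ValueError(f"unexpected file extension: {extension!r}")
    return f"{stem}.{extension}"


name = build_file_name("soilstudy", "ph_measurements", date(2024, 3, 5), 7, 1, 1, "csv")
print(name)  # soilstudy_ph_measurements_05032024_007_v1_1.csv
```

A call with a hyphen or a space in any part, for example `"soil-study"`, raises `ValueError` instead of producing a name that breaks the convention.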

Metadata describing the dataset 

Dublin Core is one of the metadata storage standards used to describe a dataset in data repositories.

Example metadata:

    Title 
    Author/Authors 
    Data description 
    Project summary 
    Keywords 
    Areas according to MEiN and OECD 
    Related publications 
    Linked dataset 
    Data producer 
    Funding (for the project) 
    Data collection period 
    Type of data in the collection
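As a rough illustration of how such a description can be made machine-readable, the Python sketch below writes a few fields as Dublin Core elements in XML. The mapping of the fields above to Dublin Core element names (`title`, `creator`, `description`, `subject`, `date`, `type`, `rights`), the wrapper element `metadata` and all sample values are assumptions made for this example; a repository may use a different or richer schema.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def to_dublin_core(fields):
    """Serialize a dict of {element name: [values]} as Dublin Core XML."""
    root = ET.Element("metadata")  # arbitrary wrapper element for this sketch
    for element, values in fields.items():
        for value in values:  # Dublin Core elements are repeatable
            child = ET.SubElement(root, f"{{{DC_NS}}}{element}")
            child.text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical record, invented for this illustration.
record = {
    "title": ["Example survey dataset"],
    "creator": ["Doe, Jane", "Roe, Richard"],
    "description": ["Responses to an example questionnaire."],
    "subject": ["survey", "example"],
    "date": ["2023"],
    "type": ["Dataset"],
    "rights": ["CC BY 4.0"],
}
xml_text = to_dublin_core(record)
print(xml_text)
```

Each value becomes one element such as `<dc:title>`, and fields with several values (authors, keywords) are simply repeated.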

Data licensing

When sharing research data, indicate the licenses under which users can use the data. Sharing under Creative Commons open licenses is recommended. 

Licenses for research data 

CC0 - Copyright Waiver - dedicates the dataset to the public domain and allows users to use it without restrictions or obligations. 

CC BY - Attribution - allows users to copy, modify, distribute and create new works or collections based on the licensed dataset, including for commercial purposes, provided that the authorship of the dataset is indicated. 

CC BY-NC - Attribution-NonCommercial - allows users to copy, modify and distribute the licensed dataset for non-commercial purposes only, provided that the authorship of the dataset is indicated. 

CC BY-SA - Attribution-ShareAlike - allows users to copy, modify and distribute the dataset as long as they indicate the authorship and share modified data under the same license. 

CC BY-NC-SA - Attribution-NonCommercial-ShareAlike - allows users to copy, modify and distribute the dataset for non-commercial purposes only, provided that the authorship is indicated and modified data is made available under the same license. 

CC BY-ND - Attribution-NoDerivatives - allows users to reuse the dataset as long as they indicate the authorship. However, the license does not allow modification of the dataset. It is not advisable for licensing research data, as it makes further work on the data virtually impossible. 

CC BY-NC-ND - Attribution-NonCommercial-NoDerivatives - is the most restrictive of the licenses. It allows users to download a dataset and share it, provided authorship is indicated. The dataset cannot be modified or used commercially. It is not advisable for licensing research data, as it makes further work on the data virtually impossible. 
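The license names above are combinations of a few conditions (BY, NC, SA, ND). The short Python sketch below shows how a set of decisions maps to a license abbreviation. The function `cc_license` and its parameters are hypothetical and only illustrate the logic; they are not a substitute for reading the license texts.

```python
def cc_license(attribution=True, commercial_use=True, derivatives=True, share_alike=False):
    """Return the Creative Commons abbreviation matching the chosen conditions."""
    if not attribution:
        # Waiving attribution means CC0, which carries no conditions at all.
        if not commercial_use or not derivatives or share_alike:
            raise ValueError("CC0 cannot be combined with other conditions")
        return "CC0"
    if share_alike and not derivatives:
        raise ValueError("ShareAlike applies only when modifications are allowed")
    parts = ["BY"]
    if not commercial_use:
        parts.append("NC")
    if not derivatives:
        parts.append("ND")
    elif share_alike:
        parts.append("SA")
    return "CC " + "-".join(parts)


print(cc_license())                                         # CC BY
print(cc_license(commercial_use=False, share_alike=True))   # CC BY-NC-SA
print(cc_license(commercial_use=False, derivatives=False))  # CC BY-NC-ND
print(cc_license(attribution=False))                        # CC0
```

The two `ValueError` cases reflect combinations that do not exist among the licenses listed above.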

Database licenses 

PDDL (Open Data Commons Public Domain Dedication and License) - public domain for databases. Allows unrestricted downloading, sharing and modification of databases. 

ODC-BY (Open Data Commons Attribution License) - allows copying and modifying the database, provided that the authorship of the database is indicated. 

ODbL (Open Data Commons Open Database License) - permits the copying, processing and distribution of a database, provided that its authorship is acknowledged and the modified database is disseminated under the same terms and conditions under which the original database was made available.

Licenses for computer programs 

GNU GPL - General Public License - permits running, analyzing, distributing and modifying a program for any purpose. Derivative works (including modified source code) must be made available under this license,

GNU LGPL - Lesser General Public License - allows a program to be run, analyzed, distributed and modified for any purpose. It applies the restrictions known as copyleft only to the licensed code itself: modified versions of the LGPL-licensed source files must be released under the same license, while programs that merely use or link to the licensed code may be distributed under other terms.

FAIR principles

In line with the idea of open research data "open to the greatest extent possible, closed only to the extent necessary," FAIR rules were formulated.
The rules also apply to metadata, the data that describes research data in data repositories. 
 
FAIR is an acronym for:  

    Findable - easy to find and search for 
    Accessible - available to all 
    Interoperable - viable for integration/connection e.g. with other data sets 
    Reusable - allowing multiple use 

 
Findable 

    data are described with metadata (e.g., in data repositories) that make them easy to find by both humans and computer systems 
    the data is provided with a unique identifier, e.g. DOI 
    metadata and research data are indexed in searchable data aggregators (e.g. OpenAIRE) 

 
Accessible 

    research data are retrievable thanks to metadata and an assigned identifier, e.g. DOI 
    data are freely available to the public through common and free computer tools, and if specialized software is required, it is only due to the specifics of the data and the preservation of its quality 
    metadata for the data is always available, even if the dataset itself has been moved, deleted, or access to the data has been restricted at the request of the researcher 

 
Interoperable   

    the data are stored using publicly available programs, allowing them to be combined or exchanged with other datasets 
    a suitable standard for recording metadata (e.g., Dublin Core) and data ensures that they are easily read by both humans and computers 
    metadata and datasets contain links to related subsequent versions of these studies, other datasets, or publications 

 
Reusable  

    the data can be used again, multiple times 
    the data are well described, i.e., the metadata includes information about authorship, where the research was performed, etc. 
    datasets carry licenses under which the data can be reused or processed 

 
The FAIR principles are still-evolving recommendations for both sharing and protecting research data in open access. Initiatives such as the European Open Science Cloud (EOSC) and GO FAIR continue to work on developing FAIR standards. 
 

Other useful links: GO FAIR Principles, FORCE11, terms4FAIRskills, How to be FAIR with your data, FAIRassist

Data Management Plan (DMP)

A proper data management plan created at the beginning of a project saves a lot of time during the collection of research data, as well as during its consolidation at the end of the entire project. 

Data management plan (DMP) instructions

What is the DMP?

A Data Management Plan (DMP; in Polish, PZD) facilitates the planning of procedures related to the acquisition, processing, and sharing of research data. It is part of the project proposal and describes what will happen to the data both during and after the project or research. The DMP is referred to as a "living document" that should be updated as changes occur at each stage of the research work. 

According to the principles of FAIR (Findability, Accessibility, Interoperability, Reusability), a data management plan should describe what steps will be taken to make the research data produced easy to find, accessible, linkable to other data through easy reading by both humans and computers, and reusable.  

Preparing a DMP also helps take into account legal issues that may arise during the implementation of the research. It is necessary to identify the owners of copyrights and intellectual property rights to any data acquired and produced. It is also necessary to specify under what licenses the research data will be made available in open access after the project.  

Research data produced in the course of research is the property of the scientific unit where the research was conducted. 

The DMP serves as a support for researchers in conducting research, but it is also an informative document for those responsible for sustainable data management activities at the Jagiellonian University. 

What should a data management plan include? 

Plans may vary depending on the research funding institution. In any case, they should contain information common to all research activities, covering: 

    Data - ways of acquiring and producing new data (primary data) and reusing existing data (secondary data), specifying their type (e.g., experimental or observational data), format (e.g., .xls, .pdf) and file volume (this information can be modified during the project or after its completion); 
    Documentation of the research methodology and of how the data is organized (folders, files and their naming); 
    Open access metadata standard (e.g., Dublin Core); 
    A description of procedures to ensure data quality control - division of responsibilities and activities related to the supervision and control of data accuracy; 
    Storage and backup during research - security of data and metadata (physical and virtual media, e.g., cloud); 
    Ethical requirements and legal issues - how to ensure compliance with regulations on personal and sensitive data and the security of its processing, how to manage other legal issues, such as intellectual property rights or copyrights, the licenses under which the data will be made available in open access (e.g., in a research data repository); 
    Data sharing and long-term storage (archiving) - how (e.g., in a repository) and when (e.g., during or after a study) the data will be shared, restrictions on access, reasons for embargoes, ways to select data, where long-term storage will take place, methods or software tools to access and use the data, assignment of an identifier, e.g., DOI, to each dataset; 
    Data management tasks - the selection of a data manager during the research (e.g., project manager) and after it (e.g., repository administrators); 
    Data management costs - identification of financial resources for data management in accordance with FAIR principles during and after the project (e.g., additional storage, long-term storage, open access), which will be covered by 2% of the project's indirect costs. 

Obligation to create DMPs 

In Poland, most projects are funded by the National Science Centre, which as of 2019 has made it mandatory to include a research data management plan in the project funding application form (Announcement: "NCN plans for research data management" dated 3.04.2019). 

The NCN website has posted “Guidelines for Applicants for Creating a DMP in a Research Project”.  

The Research Data Management Plan may be subject to changes during the course of the project, like the research plan with which it is associated, without consulting NCN. The final version of the DMP is required at the final report submission stage.

NCN allows for the possibility that some projects will not produce, reuse, or analyze any research data and other similar materials. In such cases, however, justification is required.  

Other research funding organizations, institutions and agencies that require the creation of DMPs also include: 

    European Commission (EC) 
    Ministry of Education and Science (MEiN) 
    National Centre for Research and Development (NCBR) 
    Agency for Medical Research (ABM) 

It is advisable to use the DMP forms available on the Horizon Europe website and on the DMPTool and DMPonline websites. Examples of data management plans can also be found on the website of the British institution specializing in research data management, Digital Curation Centre.

Advantages of DMP 

Decisions made at the beginning of the research affect access to research data both immediately after the project and in the long term. 

A well-designed data management plan has many benefits: 

    assists in the selection of hardware and software,  
    regulates intellectual property rights and ethics, 
    facilitates the selection of data for long-term archiving and for further sharing,  
    helps to prepare later publications using data recorded and consistently documented throughout the project, 
    increases citations of both articles and datasets, 
    enables continuity of work if the composition of the project team changes, 
    guarantees access to data in the future, 
    leads to greater cooperation and advanced research,  
    prevents unnecessary duplication, e.g., re-collection or reprocessing of data, 
    allows validation of the results,  
    prevents data loss.

Cracow Open Research Data Repository RODBUK

RODBUK was established as a collaboration of six Kraków universities: Jagiellonian University in Krakow, Stanislaw Staszic Academy of Mining and Metallurgy in Krakow, Tadeusz Kosciuszko Cracow University of Technology, Cracow University of Economics, Pedagogical University of the Commission of National Education in Krakow, and Bronislaw Czech University of Physical Education in Krakow.

It is the first repository in Poland to adopt a distributed operating model. Each participating university administers its own instance of the system, and all research data resources are visible to users in a common aggregator https://rodbuk.pl

In support of the open science policy, RODBUK allows academics, postdocs and students conducting research projects to deposit, archive and share data across disciplines and in a variety of formats. Each dataset that is deposited in the repository will be automatically assigned a DOI identifier.

RODBUK is free of charge for users.

NAVOICA educational platform - research data management courses

NAVOICA invites you to take online courses in research data management. The aim of the courses is to provide participants with knowledge of research data management and to develop skills and competencies to put this knowledge into practice, during the implementation of research projects. Successful completion of the course results in a certificate.

NAVOICA is a nationwide educational platform owned by the Ministry of Education and Science and developed by the Information Processing Centre - National Research Institute (OPI PIB). The platform is named after Nawojka, who, according to legend, was the first Polish female student. NAVOICA offers free online courses of the MOOC (Massive Open Online Courses) type by universities and educational institutions. The Polish MOOC project is non-commercial in nature. 

The platform offers:

    a wide range of virtual courses
    a wide range of thematic courses with varying levels of difficulty
    high quality level of training created by experts, lecturers and academics of Polish universities
    the ability to learn anytime, anywhere, at your own pace
    obtaining an electronic certificate, which can be downloaded as a PDF file and printed out

Research data management courses:

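Kurs dla naukowców - poziom podstawowy (Course for researchers - basic level)

Kurs dla naukowców - poziom średnio zaawansowany (Course for researchers - intermediate level)

Zarządzanie danymi badawczymi dla data stewardów - kurs podstawowy (Research data management for data stewards - basic course)

Zarządzanie danymi badawczymi dla data stewardów - kurs średnio zaawansowany (Research data management for data stewards - intermediate course)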

UJ Research Data Support Team

The UJ Research Data Support Team includes the Deputy Director for Digital Resources and UJ Coordinator for Open Access to Scholarly Publications and Research Data, Dr. Leszek Szafranski, as well as members of the Digital Collections Department of the Jagiellonian Library: Małgorzata Galik, Joanna Konik and Michał Romek. 

The team undertakes a number of activities to support UJ faculty members in research data management. It organizes training sessions where researchers can learn about research data topics. It provides assistance with the revision of Research Data Management Plans (DMPs) and consultation on depositing data in the Cracow Open Research Data Repository RODBUK. The team cooperates with the Research Support Centre UJ, which coordinates the implementation of research projects of the Jagiellonian University community, and with the Data Protection Inspector's Office.

Contact information:

email: l.szafranski@uj.edu.pl
phone: 12 663 3556

email: malgorzata.galik@uj.edu.pl
email: joanna.konik@uj.edu.pl
email: michal.romek@uj.edu.pl
phone: (+48) 12 663 3589