Research Data Management

Research data management is an important part of research. It helps organize the research process, ensures its transparency, and increases the impact of research results. Good research data management allows for validation and replication of research, and promotes the reuse of data in new research.

Here you will find frequently asked questions about key aspects of research data management - documenting and organizing data, personal data and ethics issues, creating management plans, and data storage and sharing.

Frequently asked questions

Research data, its storage and organization

What is research data?

Research data is any information that is collected, created, observed or generated in the course of a research project and serves as a basis for obtaining research results and drawing conclusions.

Data that is not directly related to the scientific content or methodology of the research, such as administrative documents, internal or external communications, legal documents or research publicity materials, is not considered research data.

What is a dataset?

A dataset is a structured set of information that consists of collected and analyzed data elements, often organized in tables or other formats. The type and form of datasets vary depending on the field of research. A well-organized dataset also includes documentation and explanatory information that help users navigate it. In the context of repositories, a dataset is understood as research data and its accompanying documentation, deposited or self-archived in an online repository.

What is metadata, what is a metadata record, why are they important, and where should metadata be recorded?

Metadata is data about data that provides information about the content, structure, origin, format, and other essential aspects of a dataset. It is recommended to collect metadata in a ReadMe file or in a metadata record created in a repository.

A metadata record is a structured description of a dataset that helps identify, understand, and reuse it. It is created during data deposit and is intended to be human and machine readable.

High-quality metadata is essential for other researchers (and computer systems) to be able to find, interpret, and reuse a dataset. The more accurately and completely the metadata describes the dataset, the more transparent and reusable the dataset is, and the more likely it is to be cited.
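As an illustration of what "human and machine readable" can mean in practice, the following is a minimal sketch of a metadata record written as JSON. The field names, the list of required fields and all values are assumptions made for this example; they do not follow the schema of any particular repository, which will define its own fields.

```python
import json

# Hypothetical metadata record; the field names are illustrative and do not
# follow any specific repository schema.
record = {
    "title": "Example survey dataset",
    "creators": ["Surname, Name"],
    "description": "Placeholder description of what the dataset contains.",
    "date_collected": "2025-07-17",
    "file_formats": ["csv"],
    "license": "to be decided",
    "keywords": ["example", "survey"],
}

# Fields this sketch treats as mandatory (an assumption, not a standard).
REQUIRED_FIELDS = ("title", "creators", "description", "date_collected")


def missing_fields(metadata: dict) -> list[str]:
    """Return the required fields that are absent or empty."""
    return [name for name in REQUIRED_FIELDS if not metadata.get(name)]


# Serialise to JSON so that both people and software can read the record.
record_json = json.dumps(record, indent=2, ensure_ascii=False)
print(record_json)
print("Missing fields:", missing_fields(record))
```

A check like `missing_fields` can be run before deposit to catch incomplete descriptions early.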

What is a ReadMe file and a codebook?

A ReadMe file is a simple text document that accompanies a dataset and provides basic information about its content, structure, origin, processing steps, and usage conditions, so that users can understand and use the data without direct contact with the authors. A codebook, on the other hand, is a detailed explanatory document that describes the meaning, format, value interpretation, and possible codes or classifications used during data collection or processing.

What are good practices for naming files and folders?

File and folder names should be short, understandable, and consistent so that they are easy to read and easy to use in the long term. Detailed information should be included in the documentation or metadata, not in the file name itself. The folder structure should be logical and concise so that the path to a specific file is as short as possible - excessively long paths can cause problems when opening, sharing, or archiving files, as some operating systems limit the total path length.

It is recommended to use only Latin letters (A–Z, a–z), numbers (0–9), hyphens (-), and underscores (_) in names, without using spaces. For dates, use the internationally recognized format YYYY-MM-DD (for example, 2025-07-17), and for numbering, use two-digit ordinal numbers (01, 02, 03, etc.), so that files are automatically sorted in chronological or sequential order. It is not recommended to use Latvian diacritical marks (ā, ē, š, etc.), symbols of other languages (æ, ø, å, etc.) or special characters (*, /, \, ?, :, ", <, >, |, etc.), as they may not be interpreted correctly in different systems. It is recommended to use only one period in the file name - right before the extension, for example, data_2025-07-17.xlsx.
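The naming recommendations above can be checked automatically. The sketch below is one possible way to do so; the function name `check_file_name` and the wording of the messages are invented for this example, and the check covers only the character, space and period rules, not name length or date format.

```python
import string

# Characters recommended above, plus the single period before the extension.
ALLOWED = set(string.ascii_letters + string.digits + "-_.")


def check_file_name(name: str) -> list[str]:
    """Return a list of problems found in a file name (empty if none)."""
    problems = []
    if " " in name:
        problems.append("contains a space")
    # Collect every character outside the recommended set (spaces are
    # already reported separately above).
    bad = sorted({ch for ch in name if ch not in ALLOWED and ch != " "})
    if bad:
        problems.append("contains disallowed characters: " + "".join(bad))
    if name.count(".") != 1:
        problems.append("should contain exactly one period, before the extension")
    return problems


for candidate in ["data_2025-07-17.xlsx", "my data.xlsx", "dati_ā.v2.csv"]:
    print(candidate, "->", check_file_name(candidate) or "OK")
```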

Personal data and ethics

What is personal data and special categories of personal data?

Personal data is any information that directly or indirectly identifies a natural person, for example, a name, surname, email address, health data or IP address. Special categories of personal data (sensitive data) include information about a person's race, ethnic origin, political or religious beliefs, health, sexual orientation, as well as genetic and biometric data. The processing of such data in research requires stricter protection measures, legal justification and explicit consent.

According to the requirements of the GDPR, researchers must ensure the protection of the privacy of this data at all stages of the research data life cycle - from planning to archiving. This includes anonymization or pseudonymization, encrypted storage, restriction of access and clear information to participants about the use of the data.

If the study is expected to involve personal data, particularly sensitive data, or to interfere with the privacy of research participants, it is necessary to obtain an opinion from the Research Ethics Committee. The need for an opinion must be assessed taking into account the specifics of the study and the rules of the institution or funder.

What is the difference between data anonymization and pseudonymization?

Anonymization is the process of completely transforming data so that it is no longer possible to identify a specific person, even with the use of additional information. Pseudonymization, on the other hand, is the process of replacing personal data with fictitious identifiers (pseudonyms), such as unique codes or numbers, while maintaining the ability to restore the original information if necessary to achieve the research goals.
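Pseudonymization as described above can be sketched in a few lines. This is an illustration under stated assumptions, not a complete or legally sufficient procedure: the function `pseudonymize`, the code format (`P` plus eight hexadecimal characters) and the sample records are all invented for the example. The key table is what keeps the original information restorable, so it must be stored separately from the data and protected.

```python
import secrets


def pseudonymize(records, id_field):
    """Replace the identifying field with a random code.

    Returns the pseudonymized records and a key table that maps each code
    back to the original value.
    """
    key_table = {}  # code -> original identifier
    codes = {}      # original identifier -> code
    output = []
    for record in records:
        original = record[id_field]
        if original not in codes:
            code = "P" + secrets.token_hex(4)
            while code in key_table:  # regenerate on the unlikely collision
                code = "P" + secrets.token_hex(4)
            codes[original] = code
            key_table[code] = original
        new_record = dict(record)  # copy, so the input is left unchanged
        new_record[id_field] = codes[original]
        output.append(new_record)
    return output, key_table


# Hypothetical example records, invented for illustration.
participants = [
    {"name": "Participant A", "age_group": "30-39"},
    {"name": "Participant B", "age_group": "40-49"},
    {"name": "Participant A", "age_group": "30-39"},
]
pseudonymized, key_table = pseudonymize(participants, "name")
print(pseudonymized)
```

The same person always receives the same code, so records can still be linked. Replacing a name does not make data anonymous if the remaining fields can still identify someone.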

Research data management

What is research data management and why is it important?

Research data management (RDM) is a systematic approach to planning, organizing, and maintaining data throughout its life cycle - from acquisition or creation to documentation, processing, short-term and long-term storage, and sharing. Good data management helps ensure data quality, accessibility, and reusability, as well as compliance with legal, ethical, and institutional requirements and funder conditions.

What is a Research Data Management Plan and why is it needed?

A Research Data Management Plan (DMP) is a document that describes how the information and data used in a research project will be collected, organized, processed, stored, protected, and shared. It helps ensure data quality, transparency, and compliance with ethical and regulatory requirements, such as the General Data Protection Regulation (GDPR). It also promotes effective collaboration with colleagues and funders and facilitates data reuse.

What are the FAIR principles?

The FAIR principles are guidelines for managing research data to ensure their maximum usability. They require that data be Findable, Accessible, Interoperable and Reusable. This means that data are described with clear and standardised metadata, stored in trusted repositories with clearly defined access conditions, and prepared in a way that can be understood, processed and used by both humans and computer systems.

The FAIR principles apply to both the data themselves and their metadata, and their aim is to promote the usability of data, but this does not mean mandatory publication - access to data can also be limited if required by legal, ethical or other justified considerations.

What is version control?

Version control means that all changes to files, scripts, or documentation are recorded transparently, so that you can return to a previous version at any time and see what was changed, when, and why. This helps avoid errors, maintains transparency, and facilitates collaboration within a team, especially when several people work on the same files. To implement it, you can use simple techniques such as dated file versions with comments, or a version control tool such as Git, optionally with a hosting platform such as GitHub, which keeps a history of every change you commit.
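For the simple technique of dated file versions with comments, a small helper can keep names consistent. The naming scheme below (`name_YYYY-MM-DD_vNN.ext`) and both function names are assumptions made for this sketch; any scheme works as long as it is applied consistently.

```python
from datetime import date
from pathlib import Path


def versioned_name(path: str, version: int, on: date) -> str:
    """Build a file name of the form name_YYYY-MM-DD_vNN.ext."""
    p = Path(path)
    return f"{p.stem}_{on.isoformat()}_v{version:02d}{p.suffix}"


def changelog_entry(file_name: str, on: date, comment: str) -> str:
    """Format one line for a plain-text change log kept next to the files."""
    return f"{on.isoformat()}  {file_name}  {comment}"


day = date(2025, 7, 17)
name = versioned_name("analysis.R", 3, day)
print(name)
print(changelog_entry(name, day, "Corrected recoding of missing values"))
```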

Data storage and sharing

How to store research data securely?

It is recommended to store research data in secure and encrypted environments that provide access control, such as institutionally hosted servers or authorized cloud services (Nextcloud, SharePoint) in the short term and in trusted data repositories (RSU Dataverse, DataverseLV) in the long term. It is also important to regularly create backup copies to prevent data loss and to comply with applicable data protection requirements, including the GDPR.
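When creating backup copies, it can be useful to confirm that the copy is identical to the original. The sketch below copies a file and compares SHA-256 checksums. The function names and the sample file are invented for the example, and it runs in a temporary directory only to stay self-contained; a real backup belongs on a separate, approved storage location.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_file(source: Path, backup_dir: Path) -> Path:
    """Copy a file into backup_dir and verify the copy by checksum."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / source.name
    shutil.copy2(source, target)  # copy2 also preserves timestamps
    if sha256_of(source) != sha256_of(target):
        raise IOError(f"Backup of {source} does not match the original")
    return target


# Usage with a throwaway directory, so the sketch can be run safely.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "data_2025-07-17.csv"
    source.write_text("id,value\n1,2\n", encoding="utf-8")
    copy = backup_file(source, Path(tmp) / "backup")
    backup_ok = copy.exists() and sha256_of(copy) == sha256_of(source)
print("Backup verified:", backup_ok)
```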

What are data repositories and how to choose one?

Data repositories are online storage or digital platforms designed for the secure long-term storage, organization, sharing and reuse of research data. They provide access not only to the data itself, but also to the accompanying metadata, promoting adherence to good data management practices.

Research data can be deposited in institutional repositories such as ZDIS Pure and RSU Dataverse, in the national repository DataverseLV, or in other international or sectoral repositories such as Zenodo, Figshare or Dryad.

When choosing a repository, it is recommended to check its reliability: whether it is registered in re3data.org and certified by CoreTrustSeal. Trusted repositories provide long-term availability, DOI assignment, metadata standards and support for compliance with regulatory, ethical and funder requirements.

Before data is deposited, it must be prepared in a structured format and supplemented with high-quality metadata, taking into account the requirements of the specific repository regarding formats, descriptions and access conditions.

What is a DOI?

A DOI (Digital Object Identifier) is a persistent alphanumeric string used to uniquely identify digital objects, such as scientific articles, datasets, or books. It provides long-term and consistent access to a specific object, even if its Internet address (URL) changes. The DOI system facilitates the search, access, and accurate citation of information in the research process.
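A DOI is usually turned into a stable link by appending it to the resolver address https://doi.org/. The sketch below shows this under simplifying assumptions: the regular expression is a rough shape check rather than a full DOI validator, the function name `doi_to_url` is invented, and the DOI in the example is a placeholder that is not expected to resolve.

```python
import re
from urllib.parse import quote

# Simplified shape check: prefix "10.", a registrant code, "/", a suffix.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/\S+")


def doi_to_url(doi: str) -> str:
    """Turn a DOI string into a link via the doi.org resolver."""
    doi = doi.strip()
    if not DOI_PATTERN.fullmatch(doi):
        raise ValueError(f"Not a DOI-like string: {doi!r}")
    # Percent-encode characters that are not safe in a URL path.
    return "https://doi.org/" + quote(doi, safe="/")


# Placeholder DOI, used only to show the format.
print(doi_to_url("10.1234/example-dataset"))
```

Citing the resolver link rather than the repository page address keeps the reference valid if the page is moved.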

What should I do if I do not want to share my research data?

The principle of good practice is that data should be as open as possible and as limited as necessary.

Data sharing promotes transparency and reproducibility of research; however, data openness is not mandatory in all cases. If the reluctance to share is based on valid reasons, such as sensitive information or legal or ethical constraints, the FAIR principles allow for limited access, with metadata kept publicly available and the access procedure specified.

Unjustified refusal to share data is not in line with good research practice, and in many cases funders, institutions and journals also require a minimum level of data transparency. If full openness is still not possible, the data can be deposited with limited access, for example in RSU Dataverse, and protected with anonymization or encryption, which keeps the data secure without publishing it in full.

How long should research data be retained after the project ends?

The length of time research data should be retained depends on the type of study, the requirements of the funder, institutional regulations, and legal requirements. It is generally recommended to retain data for at least three to five years after its last use, but if it has lasting value, such as in clinical trials, the retention period may be ten years or more.

Contacts for consultations on research data management

Research data management - RSU Data Curators Unit, datukuratori[at]rsu[dot]lv
Personal data processing and related issues - Data Security and Management Department, personu[dot]dati[at]rsu[dot]lv
Quantitative data processing (R, STATA, IBM SPSS, REDCap), statistical processing of quantitative data, qualitative data processing with NVivo - Statistics Training Laboratory, statistika[at]rsu[dot]lv
Issues related to the adaptation and development of psychometric instruments - Psychology Laboratory, tar[at]rsu[dot]lv
Evaluation of ethical aspects of medical research - Research Ethics Committee, pek[at]rsu[dot]lv
Simulation technologies - Medical Education Technology Centre, mitc[at]rsu[dot]lv
Other research services offered by RSU - https://www.rsu.lv/en/research/research-services
