Findable #1: (Meta)data are assigned globally unique and persistent identifiers
One of the big problems with data is being able to cite your own and other people’s data and to keep pointing to its exact location online, because that location might change. Persistent identifiers solve this. They work like a big index or registry in which you assign a unique key (the identifier) to each data set. When someone follows the identifier (often referred to as “resolving a persistent identifier”), the resolver points to the correct web address (URL). If the URL changes, i.e. if the data are moved, whoever created the key is responsible for giving the resolver the new location. In this way, you do not end up in blind alleys of “page not found”. That is why the identifier is called persistent.
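The registry idea can be illustrated with a minimal sketch. The class `ToyResolver`, its method names, the identifier `dataset-001` and the `example.org` URLs are all hypothetical and chosen for illustration only; a real resolver is a hosted service, not an in-memory dictionary.

```python
class ToyResolver:
    """In-memory stand-in for a persistent-identifier registry (hypothetical)."""

    def __init__(self):
        self._locations = {}  # identifier -> current URL

    def register(self, identifier, url):
        # Each identifier may be assigned only once, which keeps it unique.
        if identifier in self._locations:
            raise ValueError(f"identifier already assigned: {identifier}")
        self._locations[identifier] = url

    def update_location(self, identifier, new_url):
        # The owner reports a move; the identifier itself never changes.
        if identifier not in self._locations:
            raise KeyError(identifier)
        self._locations[identifier] = new_url

    def resolve(self, identifier):
        # Following the identifier returns the current location.
        return self._locations[identifier]


resolver = ToyResolver()
resolver.register("dataset-001", "https://old.example.org/data/001")
resolver.update_location("dataset-001", "https://new.example.org/data/001")
print(resolver.resolve("dataset-001"))
```

Anyone who cited `dataset-001` before the move still reaches the data afterwards, because only the registry entry changed.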
A DOI (Digital Object Identifier) is an example of such an identifier and looks like this: 10.1234/abba (prefix/suffix). It can be resolved to the URL where the data currently live. Most data repositories can issue and maintain DOIs or other persistent identifiers that you can use. Persistent identifiers are usually registered together with some basic descriptive metadata such as title and author.
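As a small sketch of the prefix/suffix structure, the following splits a DOI and builds a link through the public resolver at `https://doi.org/`. The function names `split_doi` and `doi_to_url` are made up for this example, and 10.1234/abba is the illustrative DOI from the text, so the resulting link is not expected to lead to a real data set. No network request is made.

```python
from urllib.parse import quote


def split_doi(doi):
    """Split a DOI into (prefix, suffix) at the first slash."""
    prefix, sep, suffix = doi.partition("/")
    if not sep or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a DOI of the form prefix/suffix: {doi!r}")
    return prefix, suffix


def doi_to_url(doi):
    """Build the resolver link for a DOI (validates the shape first)."""
    split_doi(doi)
    # Percent-encode unusual characters but keep the slash separators.
    return "https://doi.org/" + quote(doi, safe="/")


print(split_doi("10.1234/abba"))
print(doi_to_url("10.1234/abba"))
```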
Other persistent identifiers are used to prevent ambiguity, e.g. by giving a person a number instead of a name. This solves the problem of distinguishing Sam Smith from Sam Smith; yes, there is more than one person with that name! ORCID is an example of a service that assigns a person a unique code for further reference, e.g. '0000-0002-1825-0097', which points to one, and only one, person.
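A detail not covered above, added here as a hedged aside: the last character of an ORCID iD is a check character computed with the ISO 7064 MOD 11-2 scheme, which lets software catch most typing errors before looking anyone up. The sketch below is one way to verify it; the function names are assumptions for this example, and passing the check only shows that the code is well formed, not that it has been assigned to a person.

```python
import re

ORCID_PATTERN = re.compile(r"[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]")


def orcid_check_character(base_digits):
    """Compute the ISO 7064 MOD 11-2 check character for 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid):
    """Check the shape and the check character of an ORCID iD."""
    if not ORCID_PATTERN.fullmatch(orcid):
        return False
    compact = orcid.replace("-", "")
    return orcid_check_character(compact[:15]) == compact[15]


print(is_valid_orcid("0000-0002-1825-0097"))  # the example from the text
print(is_valid_orcid("0000-0002-1825-0098"))  # last character altered
```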
Findable #2: Data are described with rich metadata
When humans and machines look for data, metadata are often the first point of contact, as they are usually indexed in search engines etc. It is often the metadata that determine whether the data set they describe is perceived as relevant or not for a given usage scenario.
The words a person would type when searching for a data set like yours should be present in the metadata. This means metadata about the context and/or prerequisites for the data set, quality issues etc., as well as discipline-specific details, e.g. sample size, equipment etc. It also includes details about the data set that may not be important to you but could make your data findable outside your own discipline. So, try to think outside the box when adding metadata to your data.
Findable #3: Metadata clearly and explicitly include the identifier of the data they describe
Metadata and data are two separate things and should be treated as such. They can have a life of their own. An example is metadata that are harvested for big indexes that do not hold or index the contents of the data files, just as article metadata are fetched for indexing in commercial and non-commercial search engines that never look at the content of the article itself. If the metadata do not include a reference to the data they describe, it is doubtful, especially for a machine, that the data will ever be found.
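A minimal sketch of what this means for a machine: given harvested metadata records, a program can only get back to the data if each record carries an identifier. The field name `identifier`, the function `missing_identifier` and the sample records are hypothetical and do not follow any particular metadata standard.

```python
def missing_identifier(record):
    """Return True if a metadata record has no usable identifier."""
    identifier = record.get("identifier")
    return not (isinstance(identifier, str) and identifier.strip())


# Hypothetical harvested records; only the first points back to its data.
harvested = [
    {"title": "Survey responses", "identifier": "10.1234/abba"},
    {"title": "Field notes"},
    {"title": "Sensor log", "identifier": "  "},
]

orphans = [r["title"] for r in harvested if missing_identifier(r)]
print(orphans)
```

The records listed in `orphans` describe data that a harvester has no way of locating.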
Findable #4: (Meta)data are registered or indexed in a searchable resource
Traversing the entire internet for (research) data sets is not feasible. It also leaves too much room for serendipity, which can be nice in some cases but is not desirable when making structured searches for data sets. Making research data available only on project websites etc. usually increases the risk that they are found by coincidence only. Repositories, most often websites, are a common way of building structured indexes of the metadata and data sets that are uploaded to them. The indexes often describe the data using a common standard. This allows both the repository and other search engines to harvest and index these registries, often aggregating them into larger indexes that can eventually be cross-searched.
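The harvesting and aggregation described above can be sketched as a tiny keyword index built from the metadata of two repositories. Everything here is an assumption made for illustration: the record fields, the repository contents, the identifiers and the functions `build_index` and `search`. Real aggregators use standard harvesting protocols and far more capable search engines.

```python
from collections import defaultdict


def build_index(records):
    """Map each word in title and keywords to the identifiers that use it."""
    index = defaultdict(set)
    for record in records:
        text = " ".join([record["title"], *record.get("keywords", [])])
        for word in text.lower().split():
            index[word].add(record["identifier"])
    return index


def search(index, query):
    """Return identifiers whose metadata contain every query word."""
    words = query.lower().split()
    if not words:
        return set()
    return set.intersection(*(index.get(w, set()) for w in words))


# Metadata harvested from two hypothetical repositories.
repo_a = [{"identifier": "10.1234/abba", "title": "Soil moisture measurements",
           "keywords": ["agriculture", "sensor"]}]
repo_b = [{"identifier": "10.1234/waterloo", "title": "Soil chemistry survey",
           "keywords": ["agriculture"]}]

index = build_index(repo_a + repo_b)  # one aggregated, cross-searchable index
print(sorted(search(index, "soil agriculture")))
print(sorted(search(index, "soil sensor")))
```

A single query now reaches data sets in both repositories, and only the words present in the metadata can produce a hit.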
Repositories come in many shapes and forms. Some are generic repositories that will take almost any data set, while others are targeted towards specific disciplines or research data types. Repositories are usually owned and operated by institutions, research communities, or private companies. The question of where exactly to deposit your data is a matter of determining the best repository for your specific data set, thereby maximizing its findability and potential. This is often evaluated on a case-by-case basis.
Go to the webpage for A FAIRy tale for more information about the FAIR principles.
Based on 'A FAIRy tale' CC-BY-SA 4.0 ‘DK Fair på tværs’.