Release Types
When planning a release, your first task is to determine the type of release that is most appropriate for your product. The USGS offers two types of releases that may include code: data releases and software releases. Software releases are further divided into provisional and official releases. This page discusses the differences between these types of releases.
tl;dr
Use a software release whenever you write new code. Data releases are for documenting how existing tools were run. Use an official software release; provisional releases cannot be cited or used in publications.
Types of Releases
Software vs. Data
Software releases are the most common way to release code. In general, use a software release whenever you write new code to conduct a computational routine, scientific analysis, or other procedure. This can include full-fledged software libraries, extensions or modifications to existing libraries, code documenting the steps of a scientific analysis, and scripts used to clean and parse data. Most research code should use a software release. If you develop code and then use that code to produce a dataset, you may need to complete both a software release and a data release.
By contrast, data releases are intended to document data. They can include code, but the code is limited in scope. Critically, code in a data release documents how an existing software tool was used to produce a dataset. The main purpose of the code is to record the inputs and settings used to run the tool. Code in a data release is typically very short: as a rule of thumb, no more than about 10 lines. One exception is a configuration file that records the settings used to run the tool; such a file may exceed 10 lines if the tool has many options to specify. We reiterate that code in a data release should leverage existing software. If you produced a new dataset but wrote new code to do so, consider completing both a software release and a data release.
This guide is focused on software releases and does not discuss data releases in detail. However, if you have decided on a data release, you can read more about the requirements here: Data Releases.
I think I need both?
Don’t panic! This is pretty common. Often you’ll write code (or extend an existing codebase) to implement a new scientific routine. You then use that code to produce a dataset and conduct some sort of analysis. If you think your new code is a significant scientific contribution, consider first publishing it in a software release. This helps other scientists build on your work, and will usually save you time in the long run. Then, once the software release is complete, create a data release that includes:
Your new dataset/analysis, and
The code that uses your software release to produce the data.
Software Releases
Software releases are divided into two types: provisional releases and official releases. Official releases are the most common and are almost always preferred over provisional releases.
Provisional releases are for temporary or unfinished code used in an ongoing collaboration. Typically, these releases are used for sharing code within an active project. Critically, provisional releases cannot be cited and are not suitable for use in publications. This includes use as supplementary information or in an archive linked to by a publication. As such, provisional releases are generally less common for scientific purposes. When a provisional release is deemed necessary, we recommend that authors upgrade to an official release as the project stabilizes and matures.
Use an official release for semi-stable code, completed code, or code used for a publication. Code in an official release does not need to be complete, and it’s common to release code that will undergo future development. As a rule of thumb, consider an official release once your code implements its core intended purpose. You can then update the release as new features are added.
Note
The resources in this guide are written for official software releases.