class: center, middle ## Easy reproducibility with Dockter
#### Code for Science and Society - Commmunity Call #### 7th December 2018
@stencila
Press
P
to switch to presenter mode
--- ### Complex reproducibility Reproducible research can be tough: - packages versions, - system requirements and dependencies, - capturing and curating metadata.  ??? True reproducible research is hard to achieve. Reproducible documents, including Stencila, contain much more than just text. Reproducible and interactive documents require installing the right version of, for example, R or Python, the correct packages (and their versions), any necessary system requirements and so on. --- ### Dockerfile complexity ```Dockerfile FROM ubuntu:18.04 # This section adds system repositories required to install extra system packages. RUN apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 51716619E084DAB9 ... # This section installs system packages required for your project # If you need extra system packages add them here. RUN apt-get update \ && DEBIAN_FRONTEND=noninteractive apt-get install -y \ gdal-bin \ libgdal-dev \ libproj-dev \ r-base \ ... # This section copies package requirement files into the image COPY DESCRIPTION . ... ``` ??? Containers such as Docker provide a solution for packaging together all necessary dependencies for reproducible documents. However, containers themselves can be complex and are yet another skill to learn. Docker containers are generated through user-written Dockerfiles which are not the most straightforward tool to learn. --- ### Dockter simplicity R project requiring two spatial data packages: ```R # main.R library("sp") library("rgdal") ... ``` Tell Dockter to compile the project: ```bash $ dockter compile ``` Dockter generates files which you can learn from / edit: ```bash .environ.jsonld # JSON-LinkedData file of project meta-data .DESCRIPTION # R package description file .Dockerfile # Dockerfile ``` ??? At Stencila we developed a tool called Dockter to help researchers create Docker containers. Dockter generates the Dockerfile directly from the source code used in the research document. You can inspect the generated Dockerfile and edit it manually. --- ### Dockter who? Capturing `schema.org` compliant metadata: ```json-ld "@context": "https://stencila.github.io/schema/context.jsonld", "type": "SoftwareEnvironment", .... "softwareRequirements": [ { "type": "SoftwarePackage", "description": "Provides bindings to the 'Geospatial'... "name": "rgdal", ... "author": [ { "type": "Person", "name": "Roger Bivand", "familyName": [ "Bivand" ], "givenName": [ "Roger" ] }, ... ``` Credit where credit is due! ```bash $ dockter who Roger Bivand (rgdal, sp), Tim Keitt (rgdal), Barry Rowlingson (rgdal), .... ``` ??? we leverage the fact that we generate a JSON-LD document with all the software requirements for a project, including their authors. The who subcommand could be a "standing on the shoulders of giants" list of the authors of the current project (if available) and all of the software packages it relies on. --- ### Next steps - Improving reproducibility and slimming images with Nix: ```sh $ dockter compile --nix ``` - Adding support for Jupyter notebook and JATS project source files ### Help wanted Try it out, give us your feedback! - https://github.com/stencila/dockter - https://community.stenci.la/t/dockter-for-nix/219 ??? Docker images can be quite large. We are working on getting Dockter to generate smaller images using Nix - which is a package manager. Nix not only is smart enough to create smaller images but also has a better way to define the dependencies between packages.