"A codec is a device or computer program for encoding or decoding a digital data stream or signal. Codec is a portmanteau of coder-decoder." - Wikipedia
Encoda provides a collection of codecs for converting between, and composing together, documents in various formats. The aim is not to achieve perfect lossless conversion between alternative document formats; there are already several tools for that. Instead, the focus of Encoda is to use existing tools to encode and compose semantic documents in alternative formats.
As far as possible, Encoda piggybacks on top of existing tools for parsing and serializing documents in various formats. It uses extensions to schema.org as the central data model for all documents. For many formats, it simply transforms the data model of the external tool (e.g. Pandoc types, SheetJS spreadsheet model) to that schema ("decoding") and back again ("encoding"). In this sense, you can think of Encoda as a Rosetta Stone with schema.org at its centre.
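The decode/encode round trip through a central data model can be sketched as follows. This is a minimal illustration only: the `Codec` interface, the toy `txt` and `json` codecs, and the `convert` helper are hypothetical stand-ins written for this example, not Encoda's actual API or types.

```typescript
// Hypothetical, heavily simplified stand-ins for the central data model.
interface Paragraph {
  type: 'Paragraph'
  content: string[]
}

interface Article {
  type: 'Article'
  content: Paragraph[]
}

// A codec translates between one format and the central data model.
interface Codec {
  decode(text: string): Article // format -> data model
  encode(node: Article): string // data model -> format
}

// Toy plain text codec: paragraphs are separated by blank lines.
const txt: Codec = {
  decode: (text): Article => ({
    type: 'Article',
    content: text
      .split(/\n\s*\n/)
      .map((block) => block.trim())
      .filter((block) => block.length > 0)
      .map((block): Paragraph => ({ type: 'Paragraph', content: [block] })),
  }),
  encode: (node): string =>
    node.content.map((paragraph) => paragraph.content.join('')).join('\n\n'),
}

// Toy JSON codec: the data model is serialized directly.
const json: Codec = {
  decode: (text): Article => JSON.parse(text) as Article,
  encode: (node): string => JSON.stringify(node),
}

const codecs: Record<string, Codec> = { txt, json }

function getCodec(name: string): Codec {
  const codec = codecs[name]
  if (codec === undefined) throw new Error(`No codec for format "${name}"`)
  return codec
}

// Converting is decoding with one codec, then encoding with another.
function convert(input: string, from: string, to: string): string {
  return getCodec(to).encode(getCodec(from).decode(input))
}
```

Because every codec only needs to know about the central model, adding a new format in this sketch means writing one `decode` and one `encode` function rather than a converter for every pair of formats.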
⚡ Tip: If a codec for your favorite format is missing below, see if there is already an issue for it and 👍 or comment. If there is no issue regarding the converter you need, feel free to create one.
Format | Codec | Powered by | Status |
---|---|---|---|
Text | |||
Plain text | txt | toString | ✔ |
Markdown | md | Remark | ✔ |
LaTeX | latex | Pandoc | α |
Microsoft Word | docx | Pandoc | β |
Google Docs | gdoc | JSON | β |
Open Document Text | odt | Pandoc | α |
HTML | html | jsdom, hyperscript | ✔ |
JATS XML | jats | xml-js | ✔ |
JATS XML | jats-pandoc | Pandoc | β |
Portable Document Format | pdf | pdf-lib, Puppeteer | β |
Math | |||
TeX | tex | mathconverter | ✔ |
MathML | mathml | MathJax | ✔ |
Visualization | |||
Plotly | plotly | Plotly.js | ✔ |
Vega / Vega-Lite | vega | Vega | ✔ |
Bibliographic | |||
Citation Style Language JSON | csl | Citation.js | ✔ |
BibTeX | bib | Citation.js | ✔ |
Notebooks | |||
Jupyter | ipynb | JSON | ✔ |
RMarkdown | xmd | Remark | ✔ |
Spreadsheets | |||
Microsoft Excel | xlsx | SheetJS | β |
Open Document Spreadsheet | ods | SheetJS | β |
Tabular data | |||
CSV | csv | SheetJS | β |
Tabular Data Package | tdp | datapackage-js | α |
Collections | |||
Filesystem Directory | dir | fs | β |
Data interchange, other | |||
JSON | json | JSON | ✔ |
JSON-LD | jsonld | jsonld.js | ✔ |
JSON5 | json5 | json5 | ✔ |
YAML | yaml | js-yaml | ✔ |
Pandoc | pandoc | Pandoc | ✔ |
Reproducible PNG | rpng | Puppeteer | ✔ |
XML | xml | xml-js | ✔ |
Key
Several of the codecs in Encoda deal with fetching content from a particular publisher. For example, to get an eLife article and read it in Markdown:
stencila convert https://elifesciences.org/articles/45187v2 ye-et-al-2019.md
Some of these publisher codecs deal with metadata, e.g.
stencila convert "Watson and Crick 1953" - --from crossref --to yaml
type: Article
title: Genetical Implications of the Structure of Deoxyribonucleic Acid
authors:
  - familyNames:
      - WATSON
    givenNames:
      - J. D.
    type: Person
  - familyNames:
      - CRICK
    givenNames:
      - F. H. C.
    type: Person
datePublished: '1953,5'
isPartOf:
  issueNumber: '4361'
  isPartOf:
    volumeNumber: '171'
    isPartOf:
      title: Nature
      type: Periodical
    type: PublicationVolume
  type: PublicationIssue
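The nested isPartOf chain in this output (issue, inside volume, inside periodical) can be walked to build a short citation string. The sketch below assumes a simplified `Part` shape that mirrors only the properties visible in the YAML above; the `containers` and `summarize` helpers are hypothetical and not part of Encoda.

```typescript
// Simplified shape covering only the properties shown in the YAML output.
interface Part {
  type: string
  title?: string
  volumeNumber?: string
  issueNumber?: string
  isPartOf?: Part
}

// The `isPartOf` value from the output above.
const isPartOf: Part = {
  type: 'PublicationIssue',
  issueNumber: '4361',
  isPartOf: {
    type: 'PublicationVolume',
    volumeNumber: '171',
    isPartOf: { type: 'Periodical', title: 'Nature' },
  },
}

// Collect each container, from the innermost (issue) to the outermost (periodical).
function containers(part: Part | undefined): Part[] {
  const chain: Part[] = []
  for (let current = part; current !== undefined; current = current.isPartOf) {
    chain.push(current)
  }
  return chain
}

// Build a "Periodical volume(issue)" summary from whatever parts are present.
function summarize(part: Part): string {
  const chain = containers(part)
  const find = (type: string): Part | undefined =>
    chain.find((item) => item.type === type)
  const title = find('Periodical')?.title ?? 'Unknown periodical'
  const volume = find('PublicationVolume')?.volumeNumber
  const issue = find('PublicationIssue')?.issueNumber
  return title + (volume ? ` ${volume}` : '') + (issue ? `(${issue})` : '')
}
```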
Source | Codec | Base codec/s | Status | Coverage |
---|---|---|---|---|
General | ||||
HTTP | http | Based on Content-Type or extension | β | ![][http-cov] |
Person | ||||
ORCID | orcid | jsonld | β | ![][orcid-cov] |
Article metadata | ||||
DOI | doi | csl | β | ![][doi-cov] |
Crossref | crossref | jsonld | β | ![][crossref-cov] |
Article content | ||||
eLife | elife | jats | β | ![][elife-cov] |
PLoS | plos | jats | β | ![][plos-cov] |
The easiest way to use Encoda is to install the stencila command line tool. Encoda powers stencila convert, and other commands, in that CLI. However, the version of Encoda in stencila can lag behind the version in this repo. So if you want the latest functionality, install Encoda as a Node.js package:
npm install @stencila/encoda --global
Encoda is intended to be used primarily as a library for other applications. However, it comes with a simple command line script which allows you to use the convert function directly.
encoda convert notebook.ipynb notebook.docx
Encoda will determine the input and output formats based on the file extensions. You can override these using the --from and --to options, e.g.
encoda convert notebook.ipynb notebook.xml --to jats
You can also convert to more than one file / format (in this case the --to argument only applies to the first output file), e.g.
encoda convert report.docx report.Rmd report.html report.jats
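The rules described above (format taken from the file extension, overridden by --from or --to, with --to applying only to the first output) can be sketched as below. This is an illustration under assumptions, not Encoda's implementation: `formatOf` and `outputFormats` are hypothetical helpers, and the sketch simply treats the lowercased extension as the format name, whereas Encoda maps extensions to codecs itself.

```typescript
// Hypothetical helper: pick a format name for a path, unless one is given explicitly.
function formatOf(path: string, override?: string): string | undefined {
  if (override !== undefined) return override
  const dot = path.lastIndexOf('.')
  const slash = Math.max(path.lastIndexOf('/'), path.lastIndexOf('\\'))
  // No extension in the file name, or a dotfile such as ".gitignore"
  if (dot <= slash + 1) return undefined
  return path.slice(dot + 1).toLowerCase()
}

// Hypothetical helper: the `to` override only applies to the first output path.
function outputFormats(
  outputs: string[],
  to?: string
): Array<string | undefined> {
  return outputs.map((path, index) =>
    formatOf(path, index === 0 ? to : undefined)
  )
}
```

In this sketch, `formatOf('notebook.xml', 'jats')` resolves to `jats` because the override wins, while every output after the first falls back to its own extension.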
You can decode an entire directory into a Collection. Encoda will traverse the directory, including subdirectories, decoding each file matching your glob pattern. You can then encode the Collection using the dir codec into a tree of HTML files, e.g.
encoda convert myproject myproject-published --to dir --pattern '**/*.{rmd,csv}'
You can also read content from the first argument. In that case, you'll need to specify the --from format, e.g.
encoda convert "{type: 'Paragraph', content: ['Hello world!']}" --from json5 paragraph.md
You can send output to the console by using - as the second argument and specifying the --to format, e.g.
encoda convert paragraph.md - --to yaml
Option | Description |
---|---|
--from | The format of the input content e.g. --from md |
--to | The format for the output content e.g. --to html |
--theme | The theme for the output (only applies to HTML, PDF and RPNG output) e.g. --theme eLife. Either a Thema theme name or a path/URL to a directory containing a styles.css and an index.js file. |
--standalone | Generate a standalone document, not a fragment (default true) |
--bundle | Bundle all assets (e.g. images, CSS and JS) into the document (default false) |
--debug | Print debugging information |
Encoda exposes the decode and encode methods of the Executa API. Register Encoda so that it can be discovered by other executors on your machine:
npm run register
You can then use Encoda as a plugin for Executa that provides additional format conversion capabilities. For example, you can use the query REPL on a Markdown document:
npx executa query CHANGELOG.md --repl
You can then use the REPL to explore the structure of the document and do things like create summary documents from it. For example, let's say for some reason we wanted to create a short JATS XML file with the five most recent releases of this package:
jmp > %format jats
jmp > %dest latest-releases.jats.xml
jmp > {type: 'Article', content: content[? type==`Heading` && depth==`1`] | [1:5]}
This creates the latest-releases.jats.xml file:
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.1 20151215//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
  <front>
    <title-group>
      <article-title/>
    </title-group>
    <contrib-group/>
  </front>
  <body>
    <sec>
      <title>
        <ext-link ext-link-type="uri" xlink:href="https://github.com/stencila/encoda/compare/v0.79.0...v0.80.0">0.80.0</ext-link> (2019-09-30)
      </title>
    </sec>
    ...
You can query a document in any format supported by Encoda. As another example, let's fetch a CSV file from GitHub and get the names of its columns:
npx executa query https://gist.githubusercontent.com/jncraton/68beb88e6027d9321373/raw/381dcf8c0d4534d420d2488b9c60b1204c9f4363/starwars.csv --repl
🛈 INFO encoda:http Fetching "https://gist.githubusercontent.com/jncraton/68beb88e6027d9321373/raw/381dcf8c0d4534d420d2488b9c60b1204c9f4363/starwars.csv"
jmp > columns[].name
[
'SetID',
'Number',
'Variant',
'Theme',
'Subtheme',
'Year',
'Name',
'Minifigs',
'Pieces',
'UKPrice',
'USPrice',
'CAPrice',
'EUPrice',
'ImageURL',
'Owned',
'Wanted',
'QtyOwned',
]
jmp >
See the %help REPL command for more examples.
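For readers less familiar with the query syntax, the columns[].name expression used above projects the name of every column of the decoded table. A rough TypeScript equivalent is sketched below; the `Datatable` and `DatatableColumn` interfaces are simplified assumptions for this example, and the sample table carries only the first three column names from the output above, with empty values.

```typescript
// Simplified, assumed shapes for a decoded table and its columns.
interface DatatableColumn {
  type: 'DatatableColumn'
  name: string
  values: unknown[]
}

interface Datatable {
  type: 'Datatable'
  columns: DatatableColumn[]
}

// Equivalent of the `columns[].name` query: one name per column, in order.
function columnNames(table: Datatable): string[] {
  return table.columns.map((column) => column.name)
}

// A cut-down table using the first three column names shown above.
const starwars: Datatable = {
  type: 'Datatable',
  columns: ['SetID', 'Number', 'Variant'].map(
    (name): DatatableColumn => ({ type: 'DatatableColumn', name, values: [] })
  ),
}
```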
Note: If you have executa installed globally, then the npx prefix above is not necessary.
Self-hosted documentation (converted from various formats to HTML) and API documentation (generated from source code) are available at: https://stencila.github.io/encoda.
Check how to contribute back to the project. All PRs are most welcome! Thank you!
Clone the repository and install a development environment:
git clone https://github.com/stencila/encoda.git
cd encoda
npm install
You can manually test conversion against the current TypeScript source in src using:
npm start -- convert simple.md simple.html
That can be slow because the TypeScript has to be compiled on the fly (using ts-node). Alternatively, compile the TypeScript to JavaScript first, and then run node on the dist folder:
npm run build:dist
node dist convert simple.md simple.html
If you are using VSCode, you can use the Auto Attach feature to attach to the CLI when running the debug NPM script:
npm run debug -- convert simple.gdoc simple.ipynb
Run the test suite using:
npm test
Or, run a single test file e.g.
npx jest tests/xlsx.test.ts --watch
To display debug logs during testing, set the environment variable DEBUG=1, e.g.
DEBUG=1 npm test
To get coverage statistics:
npm run cover
There's also a Makefile if you prefer to run tasks that way, e.g.
make lint cover
You can also test this package using a Docker container:
npm run test:docker
As far as possible, tests should be able to run with no network access. We use Nock Back to record and play back network requests and responses. Use the nockRecord helper function for this, with the convention of starting the fixture file name with nock-record-, e.g.
const stopRecording = await nockRecord('nock-record-<name-of-test>.json')
// Do some things that connect to the interwebs
stopRecording()
Note that the HTTP fetcher implements caching, so you may need to remove the cache for the recording of fixtures to work, e.g. rm -rf /tmp/stencila/encoda/cache/
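Why a warm cache can stop fixtures from being recorded is easier to see in a small model. The sketch below is not Nock or Encoda code: `recordTo`, `playbackFrom`, and `withCache` are hypothetical helpers, and the fetcher is synchronous purely to keep the example short.

```typescript
// Hypothetical stand-in for a network client.
type Fetcher = (url: string) => string
// Recorded responses, keyed by URL (the "fixture").
type Fixture = Record<string, string>

// Record mode: call the real fetcher and save each response in the fixture.
function recordTo(fetcher: Fetcher, fixture: Fixture): Fetcher {
  return (url) => {
    const body = fetcher(url)
    fixture[url] = body
    return body
  }
}

// Playback mode: never touch the network, only serve recorded responses.
function playbackFrom(fixture: Fixture): Fetcher {
  return (url) => {
    const body = fixture[url]
    if (body === undefined) throw new Error(`No recorded response for ${url}`)
    return body
  }
}

// A cache sitting above the network layer, as an HTTP fetcher with caching would.
function withCache(fetcher: Fetcher, cache: Map<string, string>): Fetcher {
  return (url) => {
    const hit = cache.get(url)
    if (hit !== undefined) return hit // the layer below is never called
    const body = fetcher(url)
    cache.set(url, body)
    return body
  }
}

// With a warm cache, the recording layer never sees the request,
// so the fixture stays empty until the cache is cleared.
const network: Fetcher = (url) => `live response for ${url}`
const fixture: Fixture = {}
const cache = new Map<string, string>([['https://example.org/a', 'cached']])
const fetchWithCache = withCache(recordTo(network, fixture), cache)
fetchWithCache('https://example.org/a') // served from cache, nothing recorded
cache.clear()
fetchWithCache('https://example.org/a') // reaches the recorder, fixture now filled
```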
If there are changes in the URLs that your test fetches, or you want to check that your test still works against an external API that may have changed, remove the Nock recording and rerun the test, e.g.
rm src/codecs/elife/__fixtures__/nock-record-*.json
npx jest src/codecs/elife/ --testTimeout 30000
We 💕 contributions! All contributions: ideas 🤔, examples 💡, bug reports 🐛, documentation 📖, code 💻, questions 💬. See CONTRIBUTING.md for more on where to start. You can also provide your feedback on the Community Forum and Gitter channel.
Aleksandra Pawlik 💻 📖 🐛 |
Nokome Bentley 💻 📖 🐛 |
Jacqueline 📖 🎨 |
Hamish Mackenzie 💻 📖 |
Alex Ketch 💻 📖 🎨 |
Ben Shaw 💻 🐛 |
Phil Neff 🐛 |
Raniere Silva 📖 |
Lorenzo Cangiano 🐛 |
FAtherden-eLife 🐛 🎨 |
Giorgio Sironi 👀 |
To add yourself, or someone else, to the above list, either,
Ask the @all-contributors bot to do it for you by commenting on an issue or PR like this:
@all-contributors please add