# Notebooks and Publishing Infrastructure

[Andreas Wagner](https://www.lhlt.mpg.de/wagner/en) ([@anwagnerdreas](https://twitter.com/anwagnerdreas/status/1450393915778117641))

<https://s.gwdg.de/x217Pd>

2 November 2021

_Workshop "[Digital Publications in the Humanities](https://rdm.mpdl.mpg.de/mpdl-services/workshops/workshop-digital-publications-in-the-humanities/)"_

---

## What are "notebooks"?

##### Prose + Executable Code

- [Jupyter](https://jupyter.org/) (Also, cf. [JupyterLab in the MPS](https://rdm.mpdl.mpg.de/2021/11/01/jupyterlab-in-the-max-planck-society/)) <img src="https://pad.gwdg.de/uploads/14d58ed6-11f7-48ac-a29c-422d9787e19a.png" style="float: right; height: 200px;"></img>
- [RMarkdown](https://rmarkdown.rstudio.com/)
- [Observable](https://observablehq.com/)
- [ERA](https://elifesciences.org/collections/d72819a9/executable-research-articles) (Sponsored by [MPS](https://elifesciences.org/about))
- [ERC](https://o2r.info/pilots/) (DFG funded, general [definition](https://research-compendium.science/))
- [Neo4J Browser Guides](https://neo4j.com/cloud/aura/developer/guide-create-neo4j-browser-guide/) (e.g. [Offshore Leaks](https://offshoreleaks.icij.org/pages/database))
- [ArcGIS StoryMaps](https://www.esri.com/en-us/arcgis/products/arcgis-storymaps/overview) (e.g. [History and the City](https://storymaps.arcgis.com/stories/c54162ae75cd471286793dde45807be9))

----

#### In the Humanities?

Exploration more important than reproducibility?

![](https://pad.gwdg.de/uploads/b2ebc2dc-c7b8-4aab-8018-781f5217e3ee.gif)

---

## How are notebooks published?

1. Generating static output
    - [NBViewer](https://nbviewer.org/)
    - [GitHub](https://github.blog/2015-05-07-github-jupyter-notebooks-3/)
2. Running interactive computation environment
    - [Binder](https://mybinder.org/) (Docker on demand)
    - [Google Colaboratory](https://colab.research.google.com/)
    - [Kaggle](https://www.kaggle.com/), [Gradient](https://gradient.run/free-gpu), [and others](https://www.dataschool.io/cloud-services-for-jupyter-notebook/)
    - [Observable](https://observablehq.com/) (Browser)
    - Neo4j, ArcGIS, ...
    - [Streamlit](https://streamlit.io/)

----

### Journals & Publication Platforms

##### Collections

- [Pangeo](https://gallery.pangeo.io/contributing.html) (Jupyter documents)
- [NeuroLibre](https://www.neurolibre.com/about/) (Jupyter documents)

<br/>

##### Journals

- [Copernicus ESSD](https://essd.copernicus.org/articles/) (ERC documents)
- [eLife](https://elifesciences.org/about/aims-scope) (ERA documents)

<br/>

##### Platforms

- [CurveNote](https://curvenote.com/) (Jupyter derivative)
- [Stenci.la](https://stenci.la/) (ERA documents)
- Brill/MPIWG
- Melusina Press/DHd

[Software Sustainability Institute Overview](https://www.software.ac.uk/resources/guides/where-can-i-publish-executable-papers) (a bit dated?)

----

##### Components

- Document, Code, Data
- Runtime Environment
- Language Kernels
- Output Renderer, UI

<br/>

##### Standards

- Source/View Document Format
- Interactive Computing Protocol
- Environment Description
- Integration in Editorial/Publishing/Preservation Infrastructure

<!--
----

### Challenges

- **Environment** → environment.yml (anaconda/binder), pre-run cmds?, nix? <img src="https://pad.gwdg.de/uploads/bffbd0b8-8cf4-4006-8985-1c5a05dd3108.png" style="float: right; height: 260px;"></img>
- **Missing functionality/linkages**
    - See [discussion](https://discourse.jupyter.org/t/feature-idea-jupyterhub-binderhub-jupyter-book-as-a-publishing-platform/8359) on Jupyter forums...
- CPU/GPU?
- Bandwidth for Big Datasets?
-->

---

## Publication aspects

Which environments do publications operate in?

- Editorial workflows
- Publishing
- Hosting
- Archiving
- Citation, PID

<br/>

----

### Editorial workflows

<!-- <img src="https://pad.gwdg.de/uploads/6fb052a5-add1-485d-a99a-c4ae76af4c15.png" style="height: 600px;"></img> -->

![](https://pad.gwdg.de/uploads/a5f7dcaa-0f72-4bd4-b22e-f3ad595255c9.png =x600)

----

### Hosting

<!-- <img src="https://pad.gwdg.de/uploads/d0fe0700-37b4-4326-9d1d-7601951d220e.png" style="float: right; height: 100px;"></img> -->

<img src="https://pad.gwdg.de/uploads/580d8147-2748-4162-9b76-5d6b1fc2f139.png" style="height: 220px;"></img>
<img src="https://pad.gwdg.de/uploads/d510d8fb-ce6f-40d3-be64-7b3f98f632ef.png" style="float: right; height: 300px;"></img>
<img src="https://pad.gwdg.de/uploads/f1fbe6e6-8be5-4f82-98f5-d161383903bc.png" style="height: 200px;"></img>

---

## Summary (1)

#### Information to capture/document

- H/w requirements (GPU? RAM?)
- Libraries
- Services/Servers (Elasticsearch, ...)
- File/directory manifest
- "Start" (and "view") document(s)
- Languages and file formats
- License, authors etc.

----

## Summary (2)

#### Tooling to implement

- Validator
- Entry form
- Diff'ing, Annotating
- Quotas for runtime sessions

----

## Summary (3)

#### Much of this already exists ...

<br/>

... in many divergent ways in different places/disciplines.

<br/>

⇒ Infrastructure politics challenge? (NFDI?)

----

#### Literature

- Nüst, D. (2021). A Web service for executable research compendia ... (Version 2). <https://doi.org/10.5281/zenodo.5108218>
- Guizzardi, G. et al. (2021). Announcing the next phase of Executable Research Articles. <https://elifesciences.org/labs/a04d2b80/announcing-the-next-phase-of-executable-research-articles>
- Feature Discussion (8359) on Jupyter forums: <https://discourse.jupyter.org/t/feature-idea-jupyterhub-binderhub-jupyter-book-as-a-publishing-platform/8359>...
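
----

#### Sketch: manifest + validator

The items listed under "Information to capture/document" and the "Validator" from the tooling list could be combined roughly as follows. This is only a minimal sketch: `NotebookManifest`, its field names and the rules in `validate` are hypothetical illustrations, not an existing standard or the format of any platform mentioned in these slides.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class NotebookManifest:
    """Hypothetical descriptor for one executable publication (illustration only)."""
    start_document: str                                      # the "start" document
    files: List[str]                                         # file/directory manifest
    libraries: List[str] = field(default_factory=list)       # e.g. Python or R packages
    services: List[str] = field(default_factory=list)        # e.g. "elasticsearch"
    languages: List[str] = field(default_factory=list)       # languages and file formats
    hardware: Dict[str, str] = field(default_factory=dict)   # e.g. {"ram": "8GB"}
    license: str = ""
    authors: List[str] = field(default_factory=list)


def validate(manifest: NotebookManifest) -> List[str]:
    """Return a list of problems; an empty list means all (assumed) checks passed."""
    problems = []
    if manifest.start_document not in manifest.files:
        problems.append("start document is not listed in the file manifest")
    if not manifest.libraries:
        problems.append("no libraries declared")
    if not manifest.license:
        problems.append("no license given")
    if not manifest.authors:
        problems.append("no authors given")
    return problems


# Made-up example values, for demonstration only.
example = NotebookManifest(
    start_document="analysis.ipynb",
    files=["analysis.ipynb", "data/letters.csv"],
    libraries=["pandas"],
    license="CC-BY-4.0",
    authors=["A. Author"],
)
print(validate(example))  # prints []
```

An entry form could fill the same structure, and a platform could refuse submissions for which `validate` returns a non-empty list. Which checks are actually mandatory would be an editorial decision.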
<!-- ## Notes Exploration/Variation more important than reproduction (cf. [Citation file format](https://citation-file-format.github.io), support by platforms, tooling) -->
{"title":"Executable Publications","subtitle":"Electronic notebooks and (humanities) publishing infrastructures","author":"Andreas Wagner","type":"slide","slideOptions":{"transition":"none"},"tags":"Slides, Tools, Notebooks"}