HyperScience

First Reproducible Research Paper

I have been playing around with reproducible research in org-mode. As an example for students, I have produced a paper written entirely in org-mode and containing all the required calculations within the document itself. The diagram was made using DITAA and the values were calculated and plotted using calls to elisp and python routines. The paper is formatted as if it were a paper in the Springer journal Shock Waves, as I wanted to demonstrate the ease of using org-mode with a \(\LaTeX\) style for journal papers. It should be possible, though not necessarily straightforward, to change the style to suit whatever journal format is required. I just chose this one because I like the Springer journal format and fonts.

The paper is on the eternal problem of whether your tea will be cooler after 10 minutes if you add the milk immediately and then wait 10 minutes, or if you wait 10 minutes and then add the milk. You’ll have to read the paper to find out the answer! The problem is simple enough to demonstrate how equations and figures are generated and presented using the org-mode markup features.
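For readers who want to experiment before reading the paper, here is a minimal sketch of the two scenarios. It is not the model used in the paper: it assumes Newton's law of cooling with a single cooling constant, equal specific heats for tea and milk, and made-up temperatures and quantities. Every name and number in it is a hypothetical chosen for illustration.

```python
import math


def cool(temp, ambient, k, minutes):
    """Newton's law of cooling: exponential relaxation towards ambient."""
    return ambient + (temp - ambient) * math.exp(-k * minutes)


def mix(tea_temp, milk_temp, tea_fraction):
    """Mass-weighted mixing temperature, assuming equal specific heats."""
    return tea_fraction * tea_temp + (1.0 - tea_fraction) * milk_temp


# All values below are illustrative assumptions, not taken from the paper.
TEA_START = 90.0    # initial tea temperature, deg C
MILK = 5.0          # milk temperature, deg C
AMBIENT = 20.0      # room temperature, deg C
K = 0.05            # cooling constant, 1/min (assumed equal in both cases)
TEA_FRACTION = 0.9  # mass fraction of tea in the final cup
WAIT = 10.0         # waiting time, minutes

# Scenario 1: add the milk immediately, then wait.
milk_first = cool(mix(TEA_START, MILK, TEA_FRACTION), AMBIENT, K, WAIT)
# Scenario 2: wait, then add the milk.
milk_last = mix(cool(TEA_START, AMBIENT, K, WAIT), MILK, TEA_FRACTION)

print(f"milk first: {milk_first:.2f} C, milk last: {milk_last:.2f} C")
```

Changing the assumed constants at the top and re-running is the same idea as the paper itself, where the numbers in the text follow from the input parameters.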

It was an interesting experience to write this paper. It took me around 2 days of full-time work to get it up and running, but now that I’ve done it, the process should be much quicker for an actual paper. It was not always straightforward to get working either: I found the referencing of figures and tables was hit-and-miss. I’m not sure whether this is normal, or something to do with my configuration, but I often found myself looking at and compiling the .tex files produced by org to determine where my labels ended up and why they were sometimes not found. But eventually I did get it working.

My knowledge of python was not really good enough to allow me to do much in the way of calculating and plotting data, and I was not able to work out how to call a python routine to put numbers in a table. I therefore decided to do most of the table calculations in elisp, where the interface to the org-table is quite seamless. Although I find mathematics in lisp a little awkward, I was able to get it all working with a minimum of fuss, and elisp is pretty easy to debug in emacs.

It’s certainly very neat to be able to populate the text with computed numbers that can change whenever the input parameters for the paper change. And having the plots automatically update when the data is changed is also wonderful. For me, this is the way to properly write a paper, even though there is certainly more groundwork that needs to be done to get the paper written. It may also not work so well for papers where there is a lot of computational work, or where commercial GUI-based software is used. But most of my papers contain only small calculations using scripting languages called from the command-line, and org-mode is perfect for that workflow.

An annoyance that I was not expecting is that for the Springer journal file, the abstract occurs in the preamble, so I could not just include a bunch of #+LATEX_HEADER: commands. Instead I needed to use an \input command to include the \(\LaTeX\) within the document.

My plan is for this paper to form a template for a document on how to set up emacs and org-mode from scratch in a new linux distribution, so any student gets a head-start in how to make a reproducible research paper. I could have added more bells and whistles, but I deliberately chose a minimal useful set to not cause unnecessary confusion.

Here is the pdf of the paper

Here is the org-mode file

Here is the bibliography file

Here is the \(\LaTeX\) header file

Because of the wordpress limits on file extensions I had to change all but the pdf file to a .txt extension.

Sometime soon I plan to write the installation from scratch document that allows one to go from a new installation of ubuntu to being able to produce this document.

Org-babel for J

As part of my emacs org-mode workflow, I have been using org-babel for a while. This allows you to insert code blocks into org buffers and have those blocks be executed when your file is compiled. This is a really handy method for doing reproducible research. For example, you can call a source code function in R to do the statistical calculations for data in a table. If the data in the table changes, so will the calculated output. This prevents the perennial problem of having data in one file (typically a spreadsheet) and not knowing whether the document you generated for a paper used the 12th of September or the 15th of September version of the spreadsheet. By having explicit links to data and to the algorithm that manipulates that data, you can explicitly record the calculations you used to produce your data. And so can anyone else if they want to. This is very important for producing believable data.

Org-babel is built into emacs org-mode, and supports an amazing array of programming languages, from compiled languages like C to interpreted languages like python or MATLAB, to specialised scripting languages like awk or gnuplot.

The best feature for me is that org-mode can read from or write to org tables, allowing a seamless integration between code and document. However, this capability differs between programming languages. Some languages, like python and common lisp, seem to be very well catered for in this regard. My favourite programming language, J, is rather less well catered for. In particular, there does not seem to be a built-in way to pass variables to and from the code block. Instead, you can run your code as if it were a script, and the source block will provide the last calculated value as an output. For example,

#+BEGIN_SRC j :exports both
 NB. The square root of the sum of the squares of the numbers
 NB. between 1 and 10
 [a =: %: +/ *: 1 + i.10
#+END_SRC
#+RESULTS:
: 19.6214

As the comment states, the block computes the Euclidean norm of the integers between 1 and 10 inclusive, which is 19.6214, and displays it as the result of evaluating the source block. In other programming languages, however, one can use the :var header argument to pass a value into the code block. So, for example, the 10 in the example above could be replaced by each of the values in the column of a table.
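To make the idea concrete, here is a rough Python equivalent of the J expression, written as a function of the upper limit so that it can be applied to each value in a column. The list named column is a hypothetical stand-in for the values that a :var header argument would supply from an org table; the function name is also just an illustrative choice.

```python
import math


def euclidean_norm_1_to_n(n):
    """Square root of the sum of the squares of the integers 1..n."""
    return math.sqrt(sum(k * k for k in range(1, n + 1)))


# Hypothetical stand-in for a table column passed in via :var.
column = [3, 10, 20]

# One result per table row, rounded to four decimal places.
results = [round(euclidean_norm_1_to_n(n), 4) for n in column]
print(results)
```

With n set to 10 this reproduces the 19.6214 returned by the J block above.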

Like most things in emacs, the code for executing commands in code blocks is available as elisp. So, in theory, it should be possible to modify the existing elisp export code to pass variables, including rows and columns in tables, to a J function. At the moment though, my understanding of elisp is not sufficiently good to be able to work out how to do this, but it sounds like a very useful thing to do, and necessary if J is to be seriously used from within org-mode. If anyone has managed to do this, I’d be very interested to know how it’s done. If not, I’ll need to learn some more elisp and try to reverse engineer how it’s already been done for MATLAB code to see if I can do something equivalent for J.

Oh, and happy new year for 2021. I wanted to get one more blog entry done before the end of 2020, as an old-year’s resolution…

Hello, Org

This is a test blog post for writing wordpress blog items using org-mode. I’m hoping this will let me populate my blog in a more seamless manner than editing it via the WordPress web interface.

Introduction

When I started this blog, I got an account with an ISP and installed WordPress using some kind of automatic template. The good thing about this is that I could start a blog without knowing what I was doing. The unfortunate part about it is that when I need to configure the blog, I have no idea whatsoever about how things are set up. Also, I would need to log in to the WordPress site and use their editor to build the pages.

But what I really want to do is use org-mode directly to upload posts to my blog. As I do most of my work documentation in org-mode, it would be awesome to convert some of that to my web site. Fortunately some awesome person has already done the work to make this possible. It’s called Org2Blog, created by Puneeth Chaganti, and currently maintained by Grant Rettke. It’s an org-mode package that allows you to edit posts using org-mode markup and extensions. You can edit and upload the post without leaving emacs, which is great!

The remainder of this post contains some fairly useless examples of this package’s capabilities.

Examples

Figure 1 is an example of html import for images. One of the nice things about org-mode is you can specify different output parameters for images in pdf and html export formats. This can be handy for making things work in HTML.

avatar image

Here is an equation, in case you ever need to know the solution to a quadratic equation:

\[ x = \frac{-b \pm \sqrt{b^2-4 a c}}{2a} \]
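As a quick numerical check of the standard quadratic formula, in which the whole numerator \(-b \pm \sqrt{b^2-4ac}\) is divided by \(2a\), here is a minimal sketch. The function name is a hypothetical choice, and cmath is used so that a negative discriminant returns complex roots rather than raising an error.

```python
import cmath


def quadratic_roots(a, b, c):
    """Return both roots of a*x**2 + b*x + c = 0 (a must be non-zero)."""
    if a == 0:
        raise ValueError("not a quadratic: a must be non-zero")
    disc = cmath.sqrt(b * b - 4 * a * c)
    return (-b + disc) / (2 * a), (-b - disc) / (2 * a)


# x**2 - 3x + 2 factorises as (x - 1)(x - 2).
r1, r2 = quadratic_roots(1, -3, 2)
print(r1, r2)
```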

  • Here are some points
    • \(\LaTeX\) is great to put inline into your blog, like \(\sum_{i=0}^n i^2 = \frac{(n^2+n)(2n+1)}{6}\)
    • Centered equations use two dollar signs, as opposed to inline equations like that above, which use only one

    \[\sum_{i=0}^n i^2 = \frac{(n^2+n)(2n+1)}{6}\]
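The closed form for the sum of squares can be checked by brute force. This is only a sanity check over a finite range, not a proof, and the function names are illustrative.

```python
def sum_of_squares(n):
    """Direct evaluation of 0**2 + 1**2 + ... + n**2."""
    return sum(i * i for i in range(n + 1))


def closed_form(n):
    """The closed form (n**2 + n)(2n + 1) / 6, using exact integer division."""
    return (n * n + n) * (2 * n + 1) // 6


# Compare the two for n = 0..99.
all_match = all(sum_of_squares(n) == closed_form(n) for n in range(100))
print(all_match)
```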

I have not yet done much to see what can be done with org-mode and WordPress. There are several things I don’t know how to do. For example, to center the image above I had to set margins, and if I were to caption it, the caption would be left aligned. I also don’t know how to number equations. References would be the next interesting thing to try [hruschka2010two]. Well, it seems that citations from a bibtex file will export to WordPress, but the citation style, based on the key, does not seem to change (i.e. I can’t get numbered or superscript citations). Nonetheless, I’m declaring the export sufficiently capable to be useful for most blog posting purposes.

Bibliography

  • [hruschka2010two] Hruschka, O’Byrne & Kleine, Two-component Doppler-shift fluorescence velocimetry applied to a generic planetary entry probe model, Experiments in Fluids, 48(6), 1109-1120 (2010).

Research notes

One of the problems I have been engaged with for many years now is the idea of how to keep a consistent record of my research that I can come back to years later and pick up where I left off.

Having thought a lot about this, I have some important criteria for such a record.  Lab notes must be

  1. Future-proof (not dependent on proprietary software);
  2. Based fundamentally on ASCII text;
  3. Flexible enough to cope with a variety of experiment structures (one-off experiments, repetitive experiments, numerical experiments etc);
  4. Easy to input and maintain;
  5. Able to time-stamp the state of an experiment at the time it was performed;
  6. Able to incorporate the source code used to analyse the data;
  7. Able to facilitate collaboration with my group members and external colleagues;
  8. Easy to transform into outputs like papers and presentations.

I have tried many different note-taking systems to achieve this, including commercial systems like LabArchives, Microsoft OneNote, Microsoft SharePoint, my own TikiWiki webpage, Zim desktop wiki, Tiddlywiki and others I have forgotten, but none of them could give me what I need according to the above 8 points.  I believe I now have a system that works the way I need it to for lab notes, though I know it is not for everyone: emacs org-mode.  I hope to write up a little more about org-mode in the future, and why I think it’s a great way to produce consistently high-quality, self-documenting research outputs.