Automating workflows
Besides the actual writing, an empirical paper involves reading in data, analyzing it, producing results, and finally presenting them in tables and figures.
When starting a Ph.D., one typically imagines producing tables by means of lots of copy-pasting. I strongly advise you not to do that and instead to use built-in commands or add-ons that produce LaTeX (or LyX) tables for you. There are at least two good reasons for this. First, it will save you time fairly soon: maybe already when you put together the first draft of your paper, and at the latest when you revise that draft for the first time. You will produce similar tables over and over again, because you will change your specification, the selection of your sample, or something else, and because you will run robustness checks. Second, automating the creation of tables helps you make fewer mistakes, such as pasting results into the wrong cells or accidentally putting too many or too few significance stars next to the coefficient estimates.
Here’s an example of one way to do it in Stata and LaTeX (I usually use Stata for organizing my data, matching data sets, producing summary statistics, figures, and so on). I think the approach in that post is actually quite elegant. It is also useful when you’re using LyX, by the way, because you can always put LaTeX code into a LyX document.
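To make the idea concrete outside of Stata, here is a minimal sketch in Python of what such a table command does under the hood. It is not the approach from the post mentioned above. The function names, the 1%/5%/10% star thresholds, and the numbers are all assumptions made up for illustration.

```python
def stars(p_value):
    """Return significance stars (assumed convention: 1%, 5%, 10%)."""
    if p_value < 0.01:
        return "***"
    if p_value < 0.05:
        return "**"
    if p_value < 0.10:
        return "*"
    return ""


def latex_table(rows):
    """Build a LaTeX tabular from (name, coefficient, std. error, p-value) tuples."""
    lines = [r"\begin{tabular}{lc}", r"\hline", r"Variable & Estimate \\", r"\hline"]
    for name, coef, se, p in rows:
        # Coefficient with stars on one line, standard error in parentheses below.
        lines.append(rf"{name} & {coef:.3f}{stars(p)} \\")
        lines.append(rf" & ({se:.3f}) \\")
    lines += [r"\hline", r"\end{tabular}"]
    return "\n".join(lines)


# Made-up estimates, for illustration only.
example_rows = [
    ("Education", 0.0812, 0.0104, 0.000),
    ("Experience", 0.0127, 0.0071, 0.074),
]
table = latex_table(example_rows)
print(table)
```

Because the stars are derived from the p-values by a rule, and the numbers are never typed by hand, rerunning the script after a change of specification regenerates the whole table consistently. You would write the resulting string to a .tex file and include it in the paper.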
So far this has all been about generating tables. The underlying idea, though, is to organize everything so that you press a button and the data set for your analysis is built from the raw data, you press another and the analysis is run and the tables and figures are produced, and you press a third and the paper is typeset anew. This is described very nicely in Gentzkow and Shapiro’s Practitioner’s Guide, which I have already referred to in an earlier post. This is best practice because it ensures that your results can be replicated, and it will also save you time when you revise your paper. Believe me, you will likely have to do that many times.
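The “press a button” idea can be sketched as a small driver script. The version below is a rough Python illustration under stated assumptions: the file names (build_data.do, analysis.do, paper.tex) are hypothetical, and it assumes that a `stata` executable supporting batch mode and `pdflatex` are both on your path. Adapt the commands to your own setup.

```python
import subprocess

# Hypothetical file names and executables; replace with your own.
STEPS = [
    ("Build analysis data from raw data", ["stata", "-b", "do", "build_data.do"]),
    ("Run analysis, write tables and figures", ["stata", "-b", "do", "analysis.do"]),
    ("Typeset the paper", ["pdflatex", "paper.tex"]),
]


def run_pipeline(steps, runner=subprocess.run):
    """Run each command in order; stop at the first one that fails."""
    for label, command in steps:
        print(f"Running: {label}")
        # check=True makes subprocess.run raise an error on a non-zero exit code,
        # so a later step never runs on stale output from a failed earlier step.
        runner(command, check=True)


# To rebuild everything from the raw data, call:
# run_pipeline(STEPS)
```

Keeping the steps in one ordered list documents which script produces what, so rebuilding the paper after a revision is a single call.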