Archive | advice to Ph.D. students

The role of empirical work

I just came across a nice article by Dan Hamermesh in a recent issue of the AER. It was discussed by Einav and Levin in another interesting publication in Science related to big data.

Einav and Levin write:

Hamermesh recently reviewed publications from 1963 to 2011 in top economics journals. Until the mid-1980s, the majority of papers were theoretical; the remainder relied mainly on “ready-made” data from government statistics or surveys. Since then, the share of empirical papers in top journals has climbed to more than 70%.

Isn’t that remarkable? I certainly was under the wrong impression when I was a Ph.D. student in Berkeley and Mannheim and thought that it was all about theory and methods. Where did this come from? Maybe it was because one sees so much theory in the first year of a full-blown Ph.D. program, which is full of core courses in Micro, Macro and Econometrics covering the foundations of good economic research. In any case, my advice to Ph.D. students would be to strongly consider working with real data as soon as possible. There is certainly room for theoretical and methodological contributions, but this should not mean that one never touches data. At least in theory 😉 everybody should be able to do an empirical analysis, and for this one has to practice early on. This holds even if one wants to do econometric theory in the end, because even then one should know what one is talking about. Or would you trust somebody who talks about cooking but never cooks himself? OK, I admit, this goes a bit too far.

Having said this, let me speculate a bit. My personal feeling is that one of the next big things, and maybe a good topic for a Ph.D., could be to combine structural econometrics with some of the methods that are now used and developed in data science (see the Einav and Levin article along with Varian’s nice piece). In Tilburg, for instance, we have a field course in big data, by the way, and another sequence in structural econometrics (empirical IO).

Ballet, van Gogh and behavioral economics

picture taken from here

At the recent Netspar Pension Workshop I talked to Susann Rohwedder from the RAND Corporation. We talked about van Gogh and how he spent his youth in Brabant, not far away from Tilburg. His painting at that time can be described as relatively dark and gloomy and not nearly as amazing as what he produced later in his life in the south of France, with the exception of The Potato Eaters, probably. What dominates in this early work, arguably, is good craftsmanship. What I find remarkable is that he learned painting from scratch before moving on and developing something new.

Likewise, Picasso first learned painting from scratch, producing paintings that were well done but far more realistic than what he is known for now. Susann remarked that people often say the same about modern dance: one should first learn ballet in order to get a good grip on technical skills before moving on. Interesting.

This discussion made me realize that there is a strong commonality with my thinking about behavioral economics. There are many people who do research in behavioral economics without ever learning classical economics from scratch, and I always wondered why they do that. Standard economic theory is the simplest possible model we can think of, and it works just fine for many questions we may want to answer. There is of course a lot to be gained by studying behavioral aspects of individual decision making, as recently demonstrated once more by Raj Chetty in his Ely lecture. But I think the best way to get there is to first fully understand classical economic theory and only then build on that. In passing, another thing that Chetty pointed out very nicely was that the best way to go about doing behavioral economics is probably not to point out where the classical theory is wrong (any model is wrong, because it abstracts from some aspects of economic behavior in order to focus on others), but to ask how we can use the insights from behavioral economics for policy making.

Brushing up the basics and online lectures in general

Yesterday, we had Mirko Draca over as a guest, also presenting in the economics seminar. Over dinner, he mentioned that there are two main lecture series that he would recommend when it comes to learning more about time series analysis and statistics in general. They are:

  1. Ben Lambert: A large series of short videos at the undergraduate and master’s level, including time series:
  2. Joseph Blitzstein: His probability course at Harvard, which starts with the basics and then goes on to a lot of useful distributions and stochastic processes:

This reminded me of my wish to use online resources more actively myself. I would also like to encourage Ph.D. students in particular to actively look for interesting content on the web. It seems to me that such web lectures tend to be underused and underappreciated, and that we usually don’t take the time to watch them as if they were real seminar talks or real lectures. That may be a mistake, and by making use of these resources ourselves, we may actually learn how to use the web more effectively when it comes to designing courses.

This is more broadly related to the challenges faced by universities, as described in a piece published by The Economist earlier this year.

But it also concerns conference visits. For example, most people don’t know that the plenary talks of many conferences are freely available on the internet. See here for some nice examples. All of them are highly recommended.

Correct and incorrect models

Today, somebody asked a question in my panel data econometrics class. The question concerned the assumption of strict exogeneity and whether it was violated in the example I had given before. I replied that yes, it could indeed be violated, but that most of the time, in one way or another, a model will be mis-specified and assumptions will not hold in the strict sense. What I meant was that, in some vague sense, the assumption was a good enough approximation (without me going into the details of my example, think of the correlation between the error term and the regressor as being almost zero).
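To put a number on “almost zero”, here is a minimal simulation sketch. It is deliberately much simpler than the panel setting from class: a single cross-section with one regressor, where the OLS slope converges to beta + Cov(x, u) / Var(x). The function names, the sample size and the values of rho are all made up for illustration.

```python
import random


def ols_slope(xs, ys):
    """Slope of a simple regression of y on x (with intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def simulated_bias(rho, beta=1.0, n=100_000, seed=0):
    """OLS slope minus the true beta when corr(x, u) = rho.

    x and u both have unit variance, so the large-sample bias is rho itself.
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        e = rng.gauss(0.0, 1.0)
        u = rho * x + (1.0 - rho ** 2) ** 0.5 * e  # error correlated with x
        xs.append(x)
        ys.append(beta * x + u)
    return ols_slope(xs, ys) - beta


if __name__ == "__main__":
    for rho in (0.0, 0.02, 0.5):
        print(f"corr(x, u) = {rho:4.2f}  ->  bias of OLS slope: {simulated_bias(rho):+.3f}")
```

With a correlation of 0.02 the slope is off by roughly 0.02, which may well be good enough for the purpose at hand; with 0.5 it clearly is not. Whether a given violation is “small” in a real application is of course exactly what one cannot read off the data.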

That made me think again of Milton Friedman, who argues in a famous essay that a model should be judged by its ability to predict counterfactual outcomes, or in his own words, “to make correct predictions about the consequences of any change in circumstances”. Sometimes, this is what we are after, and this is referred to as a positive approach (being able to make the right predictions)—as opposed to a normative one (where we can talk about welfare and how one can maximize it).

That sounds reasonable at first. But can we really make such a clear distinction? Can’t we simply see welfare as the outcome we would like to predict? Of course, we always need a model to make statements about welfare, but then it could also be that all models agree on the direction of the welfare effects of certain policy changes and only differ with respect to the exact magnitude. Therefore, I prefer to think of a model as a set of assumptions that are for sure wrong in the zero-one sense. But the question is how wrong, and that depends on the purpose the model is meant to serve. So, it’s a matter of degree. If the model allows me to make fairly accurate welfare statements (and I can be sure of that for whatever reason; this is the red herring here), then I can apply Friedman’s argument that it’s good in his sense, but then I can even use it for welfare comparisons, so it serves a normative purpose. In a way, all this brings me back to an earlier post and in particular the part about Marschak.

PS on September 19, 2014: There are two interesting related articles in the most recent issue of the Journal of Economic Literature, in discussions of the books by Chuck Manski and Kenneth Wolpin, respectively. In these discussions, John Geweke and John Rust touch upon the risk of making mistakes when formulating a theoretical model, and how we should think about that.

Structural estimation in Stata

This goes to the ones who already know what they want to do, and it has to do with structural modeling. It’s about how to do this in Stata (of all places).

There are many reasons why you may want to use Stata for your empirical analysis, from beginning to end. Usually, you will use Stata anyway to put together your data set and to do your descriptive analysis; it’s just so much easier than in many other packages because many useful tools come with it. Plus, it’s a quasi industry standard among economists, so using it and providing code will be most effective.

So, if your structural model is not all that complicated, you can just as well estimate it in Stata.

Today, I want to point you to two useful guides for that. The first one is the guide by Glenn Harrison. This is actually how I first learned to program up a simulated maximum likelihood estimator. It’s focused around experiments and the situation you usually have there, namely choices between two alternatives. It’s a structural estimation problem because each alternative will generate utility, and the utility function depends on parameters that we seek to estimate.
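To make the logic of such a binary-choice estimation concrete, here is a minimal sketch. It is not taken from Harrison’s guide and it is written in Python, not Stata. I assume a CRRA utility function, a logistic choice error with a known scale, and a made-up menu of ten choices between a “safe” and a “risky” lottery; all names and numbers are hypothetical. It is also plain maximum likelihood with a grid search, not simulated maximum likelihood.

```python
import math
import random

# Hypothetical menu: each lottery pays `high` with probability p and `low` otherwise.
SAFE = (2.00, 1.60)
RISKY = (3.85, 0.10)
PROBS = [k / 10 for k in range(1, 11)]
NOISE = 0.5  # scale of the logistic choice error, treated as known (an assumption)


def crra(x, r):
    """CRRA utility with risk-aversion parameter r (r must differ from 1)."""
    return x ** (1.0 - r) / (1.0 - r)


def prob_safe(p, r):
    """Probability of choosing the safe lottery given expected-utility difference."""
    eu_safe = p * crra(SAFE[0], r) + (1 - p) * crra(SAFE[1], r)
    eu_risky = p * crra(RISKY[0], r) + (1 - p) * crra(RISKY[1], r)
    z = (eu_safe - eu_risky) / NOISE
    return 1.0 / (1.0 + math.exp(-z))


def simulate_counts(true_r, n_subjects, seed=0):
    """Number of subjects choosing the safe lottery at each p in PROBS."""
    rng = random.Random(seed)
    return [sum(rng.random() < prob_safe(p, true_r) for _ in range(n_subjects))
            for p in PROBS]


def log_likelihood(r, counts, n_subjects):
    """Binomial log-likelihood of the observed safe choices, as a function of r."""
    total = 0.0
    for p, k in zip(PROBS, counts):
        q = min(max(prob_safe(p, r), 1e-12), 1.0 - 1e-12)  # guard against log(0)
        total += k * math.log(q) + (n_subjects - k) * math.log(1.0 - q)
    return total


def estimate_r(counts, n_subjects):
    """Grid-search maximum likelihood estimate of r on [-1.00, 0.90]."""
    grid = [i / 100 for i in range(-100, 91)]
    return max(grid, key=lambda r: log_likelihood(r, counts, n_subjects))


if __name__ == "__main__":
    counts = simulate_counts(true_r=0.5, n_subjects=200, seed=1)
    print("estimated r:", estimate_r(counts, 200))
```

The structure is the same whatever the package: specify utility, map the utility difference into a choice probability, write down the likelihood, and maximize it over the parameters. In a real application one would also estimate the noise scale and use a proper optimizer instead of a grid.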

Then, today I bumped into the lecture notes by Simon Quinn, which I found particularly insightful and useful if what you’re doing has components of a life cycle model. What I like particularly about his guide is that it explains how you would make some choices related to the specification of your model and functional forms.

Of course, there are also many reasons why you may not want to use Stata for your analysis. But in any case, it may not hurt to give it a thought.

Automating workflows

Writing an empirical paper involves—next to the actual writing—reading in data, analyzing it, producing results, and finally presenting them using tables and figures.

When starting a Ph.D., one typically imagines producing tables by means of lots of copy-pasting. But actually, I strongly advise you not to do that and instead to use built-in commands or add-ons that allow you to produce LaTeX (or LyX) tables. There are at least two good reasons for this. First, it’ll save you time fairly soon, maybe already when you put together the first draft of your paper, and at the latest when you do the first revision of that draft. The reason is that you will produce similar tables over and over again, because you will change your specification, the selection of your sample, or something else. And you will do robustness checks. The second reason to automate the creation of tables is that it will help you make fewer mistakes, which can come about when you paste results into the wrong cells or when you accidentally put too many or too few stars denoting significance next to the coefficient estimates.

Here’s an example of one way to do it in Stata and LaTeX (I usually use Stata for organizing my data, matching data sets, producing summary statistics, figures, and so on). I think the way it’s done here is actually quite elegant. This post is also useful when you’re using LyX, by the way, because you can always put LaTeX code into a LyX document.
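Independent of the package you use, the idea can be sketched in a few lines. The following is a minimal illustration in Python, not the approach from the linked post: the function names, the variable names and the numbers are made up, and the stars use the usual normal critical values for the 10%, 5% and 1% levels.

```python
def stars(coef, se):
    """Significance stars based on the absolute t-ratio (normal critical values)."""
    if se <= 0:
        raise ValueError("standard error must be positive")
    t = abs(coef / se)
    if t >= 2.576:
        return "***"
    if t >= 1.960:
        return "**"
    if t >= 1.645:
        return "*"
    return ""


def latex_table(rows, caption, label):
    """Build a LaTeX table from (name, coefficient, standard error) tuples.

    Names are inserted as they are, so they must not contain characters
    that are special in LaTeX (such as underscores).
    """
    lines = [
        r"\begin{table}",
        r"\centering",
        r"\caption{" + caption + "}",
        r"\label{" + label + "}",
        r"\begin{tabular}{lc}",
        r"\hline",
    ]
    for name, coef, se in rows:
        s = stars(coef, se)
        sup = "^{" + s + "}" if s else ""
        lines.append(f"{name} & ${coef:.3f}{sup}$ \\\\")
        lines.append(f" & $({se:.3f})$ \\\\")
    lines += [r"\hline", r"\end{tabular}", r"\end{table}"]
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up estimates, only to show the output format.
    results = [("Education", 0.5, 0.1), ("Experience", 0.1, 0.1)]
    print(latex_table(results, "Wage regression", "tab:wage"))
```

Because the stars are computed from the estimates and never typed by hand, they cannot get out of sync with the numbers when the specification changes. You would write the resulting string to a .tex file and include that file in the paper.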

So far this is all about generating tables. But actually, the underlying idea is that you organize everything so that you can press a button and the data set you will use for the analysis is built from the raw data, then press a button and the analysis is run and the tables and figures are produced, and finally press a button and the paper is typeset anew. This is described very nicely in Gentzkow and Shapiro’s Practitioner’s Guide that I have already referred to in an earlier post. On the one hand, this is best practice because it ensures replicability of results, but on the other hand it will also save you time when you revise your paper, and believe me, you will likely have to do that many times.

Writing papers and theses

It’s August, which means that students are finishing up their research master’s or master’s theses. Here is some advice that I give most of them at one point or another, and I think Ph.D. students may not be aware of all of it either. I’ll focus on the form for now, and will talk about the contents of a good paper at a later point in time.

Let’s start with the very basics. You want your paper to be pleasant and easy to read in terms of the layout. My usual advice is to use a font like Times with a size of 11pt, to change the spacing to 1.5 or double space, and to use margins of 3cm at the top and bottom, and 2.5cm on the left and right.

Footnotes are usually placed at the end of a sentence, after the full stop. And acronyms should be defined before being used. You can do this by writing out the full term and putting the acronym in parentheses right after it. From then on you can use the acronym. References to particular sections or figures should start with capital letters. So, you would say “in the previous section”, but “in Section 3”.

Equations should only be numbered when you refer to them. Also, when you have an equation that is a “displayed formula” (so it takes a whole line) and the sentence ends with that equation, then the equation should end with a full stop. When the text continues after the equation, then there should sometimes be a comma, for instance because the equation uses something that is defined afterwards using the expression “where”.
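Here is a small LaTeX sketch of these rules. The model and the label name are made up for illustration, and I assume the amsmath package is loaded (it provides equation* and \eqref). The first equation is numbered because the text refers back to it, and it ends with a comma because the sentence continues with “where”. The second one is unnumbered and ends the sentence, so it takes a full stop.

```latex
The model is
\begin{equation}
  y_{it} = x_{it}'\beta + \alpha_i + \varepsilon_{it},
  \label{eq:model}
\end{equation}
where $\alpha_i$ is an individual-specific effect. Taking first differences
of \eqref{eq:model} removes $\alpha_i$:
\begin{equation*}
  \Delta y_{it} = \Delta x_{it}'\beta + \Delta\varepsilon_{it}.
\end{equation*}
```

Because the reference is made through the label and not typed in as a number, it stays correct when you insert another numbered equation before it.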

Overall, I think the best advice I can give is to be very careful so that the writing is of high quality. First of all, the English should be correct. There should not be any typos, and you should make extensive use of the spell checker. Then, the references should be in good order.

The following mostly applies to Ph.D. students in economics and related disciplines. When you’re writing papers, you should definitely use LyX or LaTeX together with BibTeX. Also references to figures, tables, equations, sections, and so on should be programmed so that when you change the structure of the document or insert a section or another figure, all the references are updated. This will save you a lot of time in the future, when you go through the 10th or so revision of a paper.

Generally, learn from others. In an earlier post I’ve already suggested that you should read papers in top 5 journals. Not all of them are well-written. But the chance that you get a well-written paper is higher than in other journals. Look at how introductions are structured, how the research is motivated. And spend a lot of time working out the arguments.

Andrew Chesher told me once, when I was visiting UCL as a Ph.D. student, that one may want to think about the following structure: this is what I’m doing > this is why I’m doing it and why it’s interesting > this is how I’m doing it > this is what I find. I think this is a great way to think about presenting research. He also said that academic papers should not have any superfluous text and that for every word one should ask oneself whether it’s really necessary. That way, one can make the text shorter and ultimately clearer.

Always make sure you use short, easy-to-understand sentences, mostly in the active voice, and that each paragraph roughly corresponds to one line of thought. But don’t be too mechanical.

Respect the reader by explaining well. Think of your reader as not being an expert on the topic you’re writing on, but as being smart and having a general education in economics. That way, you will not make the mistake of not explaining things that may be clear to you, but not to most readers.

And before I forget: many students write that “coefficients are significant”, but it should actually say that they are “significantly different from zero”.

If you want to learn more, have a look at my earlier post on the challenge of writing, where I also provide a reference to Silvia’s book. And if you’re interested in working some more on your writing, you may also want to consider having a look at a classic, the “Elements of Style”.