Essays on machine learning applications in economics: causal inference and prediction
This study comprises three chapters on machine learning applications, each focused on a different empirical topic. The first chapter discusses a recently developed causal estimation method and applies it to an education setting. The second chapter examines the salaries of young economics professors. The third chapter uses text analysis to predict the publication outcomes of scientific papers and examines gender bias in those outcomes.

In the first chapter, I discuss Double/Debiased Machine Learning (DML), a causal estimation method recently developed by Chernozhukov, Chetverikov, Demirer, Duflo, Hansen, Newey, and Robins (2018), and apply it to an empirical analysis in education. I explain what DML does and why it is useful in practice; I also use a bootstrap procedure to improve on the built-in DML standard errors in the curriculum adoption application. Extending existing studies of how curriculum materials affect student achievement, my work compares the results of DML, kernel matching, and ordinary least squares (OLS). In my study, the DML estimators avoid the possible misspecification bias of linear models and obtain statistically significant results that improve upon the kernel matching results.

In the second chapter, we analyze the effects of gender, the rank of the PhD-granting school, and undergraduate major on young economics professors' salaries. The dataset is novel: it contains detailed, time-varying research productivity measures and other demographic information on young economics professors from 28 of the top 50 public research universities in the United States. We apply DML to obtain consistent estimators in the presence of a high-dimensional set of control variables. Tracking the first 10 years of their professional work experience, we find that these three factors have barely any effect on young faculty members' salaries in most experience years. However, the gender effect on salary in experience year 7 is both statistically and economically significant (large enough in magnitude to have practical meaning). In experience years 5 to 7, which are also close to most faculty members' promotion years, the gender effects are evident. For both PhD-granting school rank and undergraduate major, the estimates for experience years 7 to 9 are large in magnitude; however, they are not statistically significant. Overall, the effects tend to grow with years of experience. We also discuss possible economic mechanisms and reasons.

In the third chapter, we build machine learning and simple linear models to predict academic paper publication outcomes, as measured by journal H-indices, and we discuss the gender bias associated with these outcomes. We use a novel dataset containing each paper's text content, its associated H-index, its authors' genders, and other information, collected from recently published economics journals. We apply term frequency-inverse document frequency (TF-IDF) vectorization and other Natural Language Processing (NLP) tools to convert text content into numerical model inputs. We find that when paper text content is used to predict an H-index, the predictive power is around 60% in our classification model (4 tiers), and the root mean squared error is around 44 in our regression model. Moreover, once we control for paper text, there is hardly any causal effect of gender: as long as papers contain similar text, the authors' gender does not influence the H-index. Additionally, we discuss the real-world interpretation of the models.
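As an illustration of the DML approach used in the first two chapters, the following is a minimal sketch of a cross-fitted partialling-out estimator for a partially linear model. It is not the thesis's implementation: the ridge nuisance learner, the simulated data, the true effect of 0.5, and all function and variable names are assumptions chosen only to keep the example self-contained.

```python
import numpy as np

def ridge_fit_predict(X_train, y_train, X_test, alpha=1.0):
    """Fit a ridge regression with intercept and predict on X_test.

    Ridge is an assumed stand-in for whatever ML learner is used for
    the nuisance functions.
    """
    x_mean = X_train.mean(axis=0)
    y_mean = y_train.mean()
    Xc = X_train - x_mean
    p = Xc.shape[1]
    beta = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(p),
                           Xc.T @ (y_train - y_mean))
    return y_mean + (X_test - x_mean) @ beta

def dml_plr(y, d, X, n_folds=5, alpha=1.0, seed=0):
    """Cross-fitted partialling-out estimate of theta in
    y = theta * d + g(X) + noise."""
    rng = np.random.default_rng(seed)
    n = len(y)
    folds = np.array_split(rng.permutation(n), n_folds)
    y_res = np.empty(n)
    d_res = np.empty(n)
    for test_idx in folds:
        train = np.ones(n, dtype=bool)
        train[test_idx] = False
        # Nuisance models are fit on the other folds only (cross-fitting).
        y_res[test_idx] = y[test_idx] - ridge_fit_predict(
            X[train], y[train], X[test_idx], alpha)
        d_res[test_idx] = d[test_idx] - ridge_fit_predict(
            X[train], d[train], X[test_idx], alpha)
    # Regress outcome residuals on treatment residuals.
    theta = (d_res @ y_res) / (d_res @ d_res)
    # Plug-in standard error based on the orthogonal score.
    psi = d_res * (y_res - theta * d_res)
    se = np.sqrt(np.mean(psi ** 2) / np.mean(d_res ** 2) ** 2 / n)
    return theta, se

# Simulated data (hypothetical): true treatment effect is 0.5.
rng = np.random.default_rng(42)
n, p = 2000, 20
X = rng.normal(size=(n, p))
d = X @ rng.normal(size=p) * 0.3 + rng.normal(size=n)
y = 0.5 * d + X @ rng.normal(size=p) * 0.3 + rng.normal(size=n)

theta_hat, se_hat = dml_plr(y, d, X)
print(f"theta_hat = {theta_hat:.3f}, se = {se_hat:.3f}")
```

The standard error here is the plug-in estimate from the orthogonal score; the bootstrap refinement mentioned for the first chapter would replace that final step and is not shown.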