/r/econometrics
Econometric analysis, newly published papers, and anything to do with the field.
Do not request paid-for help.
Do not ask someone to do your homework.
If requesting help with a homework/quiz question, demonstrate what you have managed to achieve by yourself.
I'm very confused by my problem set on DID.
I'm supposed to replicate Table 1, Panel A of this paper. I can do it fairly easily by running the specification
$\ln(e/p)_{it} = \alpha_i + \gamma_t + \beta_1 \ln(\text{minwage})_{it} + \beta_2 X_{it} + \varepsilon_{it}$
where $X_{it}$ collects the covariates: the unemployment rate and the relative size of the youth population.
My issue is that 1) I know this is the specification they used, because I can replicate the entire table perfectly with it, and 2) they call this diff-in-diff. But everything I had seen before (for example, this Callaway, Goodman-Bacon, and Sant'Anna paper) indicates that for this to be a DiD specification there should be an interaction of $\ln(\text{minwage})$ with $POST_t$, a dummy for the post-treatment period.
I have no idea how I could implement that in my regression, since states are treated multiple times (the minimum wage increases repeatedly) over the sample period, so I don't know what the POST dummy would look like. Moreover, I'm fairly certain the authors don't do that.
So I guess my question is, are the authors running a DiD or just a standard regression with state and time fixed effects? And what is the interpretation of the parameter of interest? Would it still be ATT if the DiD assumptions hold?
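For reference, here's a minimal sketch of how I'm estimating it in R with fixest (column names are placeholders for my panel of states and years):

```r
library(fixest)

# Two-way fixed effects with a continuous treatment (log minimum wage);
# placeholder column names, standard errors clustered by state
m <- feols(log_ep ~ log_minwage + unemp_rate + youth_share | state + year,
           data = df, cluster = ~state)
summary(m)
```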
Thank you in advance for the help!
I am working on a paper that uses a Dynamic Stochastic General Equilibrium (DSGE) model to study macroeconomic policy changes. I am looking to replicate this paper but add other models that have different starting assumptions, like System Dynamics modeling.
What other models can I add that would help make the results more robust?
Hi everyone,
I'm struggling to understand the concept of the cumulative dependent variable in local projections, specifically when it's written as $y_{t+h} - y_{t-1}$. For example, if I have the inflation rate on the left-hand side, how should this be computed?
In the lpirfs package in R, it seems they compute it literally as $y_{t+h} - y_{t-1}$. So, if $y_{t+h} = 5$ and $y_{t-1} = 2$, they get $3$. However, I thought cumulative inflation should be the sum of the rates from period $t-1$ to $t+h$, which would be something like $2 + 5 = 7$ (plus whatever the rates in between are).
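To make the two computations concrete, here's a toy example in R (made-up numbers matching the ones above):

```r
# Toy inflation-rate series; y[t] is the inflation rate in period t
y <- c(2, 3, 4, 5)
t <- 2; h <- 2          # so y[t - 1] = 2 and y[t + h] = 5

# What lpirfs seems to compute: the level difference
y[t + h] - y[t - 1]     # 5 - 2 = 3

# What I had in mind: summing the rates over the whole horizon
sum(y[(t - 1):(t + h)]) # 2 + 3 + 4 + 5 = 14
```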
Thanks in advance!
Hello, we're currently using free software called DEAP to run our analysis. Is there other software that's not very complicated to use that you could recommend, and that would also give me the efficiency frontier in the results? Any help would be greatly appreciated!
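In case it helps others answer: I've read that the Benchmarking package in R can do this, something like the sketch below (toy data; I haven't verified this myself, so treat it as a starting point):

```r
library(Benchmarking)

# Toy data: 5 DMUs, one input, one output
X <- matrix(c(2, 3, 5, 6, 8))
Y <- matrix(c(1, 3, 4, 5, 5))

# Input-oriented DEA under variable returns to scale
e <- dea(X, Y, RTS = "vrs", ORIENTATION = "in")
eff(e)  # efficiency scores per DMU

# Frontier plot for the one-input/one-output case
dea.plot.frontier(X, Y, RTS = "vrs")
```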
Hello, I am currently developing an algorithm that will retrieve, process, and store log returns and realised volatility on option derivatives of stock symbols (e.g., I have been using TSLA for testing purposes so far). I am also looking to store options chain data, and I have successfully set up a PostgreSQL database to store historical options chain data, log returns, and realised volatility. I am now looking to expand this system with a live 2-tick data feed and a Redis database, so that I can cache at 1-hour intervals and then feed back into my PostgreSQL database by continuously updating the historical options chain data with the live feed. I currently only have a free Redis plan, which offers 30 MB of cache memory; that might be fine for testing purposes but probably won't handle production deployment. I was wondering if anyone with experience with live feeds has tips for being extremely memory-efficient when retrieving a live feed, or whether there are other services like Redis that might be useful? Is there another way to set this up? What is the optimal amount of database memory one needs for high-frequency trading? Any and all advice is highly appreciated!
I'm working on a project for a political economy class on economic voting in the EU since 2019. I'm a real beginner with this kind of stuff, but I put together a dataset with the % vote change for the incumbent party, a dummy variable = 1 if the incumbent party lost vote share, and another = 1 if the incumbent party maintained power. I then assigned each election CPI change data for 1, 2, and 3 months and quarters before the election, as well as the total inflation rate leading up to that election since 2019. I tested numerous regressions for the 50 or so elections in my dataset and got no statistically significant relationship between inflation and whether incumbents were punished or lost power. All the literature I've read suggests the result should be otherwise. Any thoughts?
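For reference, the kinds of specifications I tested look roughly like this in R (placeholder column names for my dataset):

```r
# punished = 1 if the incumbent lost vote share;
# cpi_3m = CPI change over the 3 months before the election
m1 <- lm(vote_change ~ cpi_3m, data = elections)                  # linear
m2 <- glm(punished ~ cpi_3m, family = binomial, data = elections) # logit
summary(m1); summary(m2)
```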
I have two models and I'm trying to compare whether adding a lagged dependent variable further reduces heteroskedasticity. Model 1 already shows no heteroskedasticity. My regression equation is something like:
$y_t = \alpha + \beta x_{1,t} + \gamma y_{t-1} + u_t$
Would I need to run the original regression and then regress the squared residuals on both explanatory variables, $x_{1,t}$ and $y_{t-1}$?
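Here's what I have so far as a sketch in R (placeholder names; ts_data is my time series data):

```r
library(dynlm)   # convenient lag notation for time series
library(lmtest)  # Breusch-Pagan test

m1 <- dynlm(y ~ x1, data = ts_data)            # Model 1
m2 <- dynlm(y ~ x1 + L(y, 1), data = ts_data)  # Model 2: adds lagged y

# bptest() regresses the squared residuals on the regressors internally,
# so I assume I don't need to square them by hand?
bptest(m1)
bptest(m2)
```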
Hi everyone, I'm working on a paper about geopolitical risk and commodity markets, and I'm struggling to decide which variable to use to represent commodity market prices.
The assignment involves building a panel data model with the country-specific GPR (geopolitical risk) index and the EPU (Economic Policy Uncertainty) index as independent variables. However, I'm unsure whether to use the Primary Commodity Price Index (PCPI), the Commodity Terms of Trade index (CTOT), or possibly another index.
If I choose the PCPI, would it still be compatible with a panel data model? On the other hand, would using the CTOT be more relevant for my study, since it's country-specific?
Any advice would be greatly appreciated!
Just curious if the course is worthwhile/insightful. My modeling skills are a bit rusty -- is this course worth taking? It seems to focus on classical models (ARMA, VAR, VECM), which I suppose could make sense in a small n/macro context, but I question to what extent this stuff is cutting edge in 2024.
https://www.imf.org/en/Capacity-Development/Training/ICDTC/Courses/MFx
Hi everyone! So I graduated a few months ago (BA econ), and my degree only had an introductory econometrics module. I actually passed and scored better than average, which is surprising, but I'm convinced that passing a course and actually getting a feel for it are very different things. So I'm taking time out to learn it myself.
From the research I did, this is the way to start: basic stats knowledge, basic programming, and knowing vectors & matrices. Some of the most suggested resources are Ben Lambert and Wooldridge's textbook. I would like to know what else I should keep in mind to really understand it. Any suggestions?
I'm getting very, very confused about the difference between fixed and random effects, because the definitions are not the same in the panel data and longitudinal data contexts.
For starters, panel data is essentially longitudinal data, right? Observing individuals over time.
For panel data and panel data regression, I have read several papers saying that fixed effects models are the ones with varying intercepts, while random effects models have one general intercept. Even in Stata and R, this seems to be the case in terms of the coefficients. And the test used to identify which is more appropriate is the Hausman test.
However, for longitudinal data, when a linear mixed model is considered, the random effects model is the one with varying intercepts, and the fixed effects are the constant estimates. And what I was told to use to determine whether fixed or random effects are appropriate is an LRT (likelihood ratio test).
I am really confused. So can anyone help me?
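To show exactly what I mean, here's how I understand the two setups in R (placeholder names):

```r
library(plm)   # panel econometrics tradition
library(lme4)  # mixed-model / longitudinal tradition

pdat <- pdata.frame(mydata, index = c("id", "year"))

fe <- plm(y ~ x, data = pdat, model = "within")  # "fixed effects" here
re <- plm(y ~ x, data = pdat, model = "random")  # "random effects" here
phtest(fe, re)                                   # Hausman test

# Mixed-model version: a random intercept for each individual
mm <- lmer(y ~ x + (1 | id), data = mydata)
```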
I'm reading through Greene's section on maximum likelihood estimation, and I think I need some reassurance about the following representation of a Hessian (included in the image).
If I understand $H_i$ correctly, we take each individual observation $\{x_i, y_i\}$, form the matrix of second partial derivatives of its log density, and then sum these matrices together? I just want to make sure I'm not missing something here.
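In symbols, my reading is (assuming independent observations, so the log-likelihood is a sum over $i$):

$$\mathbf{H} = \frac{\partial^2 \ln L(\theta)}{\partial \theta \, \partial \theta'} = \sum_{i=1}^{n} \frac{\partial^2 \ln f(y_i \mid x_i, \theta)}{\partial \theta \, \partial \theta'} = \sum_{i=1}^{n} \mathbf{H}_i,$$

i.e., each $\mathbf{H}_i$ is the matrix of second partial derivatives of observation $i$'s log density, and the sample Hessian is their sum.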
I see a lot of topics revolving around the use of DiD in econometrics, mostly focused on the calculations and estimates, but I am interested in real-life application. For example, given a specific case or research project, how did you test or justify SUTVA, NEPT, and EXOG (since they're not always subject to testing)? In practice, without the mathematical machinery, how do you go about interpreting the results?
Can somebody tell me the difference between the GLS omega, which we know exactly, and the GMM omega? How do we get the GMM omega?
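To say where I've gotten so far (please correct me if this is wrong): in feasible GLS we assume a known structure for the error covariance $\Omega$, whereas in two-step GMM the analogous object is estimated from first-step residuals, something like

$$\hat{\Omega} = \frac{1}{n} \sum_{i=1}^{n} g_i(\hat{\theta}_1)\, g_i(\hat{\theta}_1)', \qquad \hat{W} = \hat{\Omega}^{-1},$$

where $g_i$ is the moment function and $\hat{\theta}_1$ is a first-step estimate (e.g., using the identity weighting matrix).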
Hello there,
I'm currently starting my research project for my undergrad econometrics course. I was thinking about how IRS budget increases are advocated for as a way to increase tax revenue, and described as an investment that pays for itself.
My research question was whether increased funding to the IRS increases tax collection effectiveness. I came up with the following model based on data I was able to collect:
Tax Collection Effectiveness = β0 + β1(Full-Time Employees) + β2(IRS Budget) + β3(Working-Age Population) + β4(Average Tax Per Capita) + β5(Cost of Collecting $100) + ε
The main point of interest is the budget, but holding the working-age population, average tax per capita, and cost of collecting $100 constant seemed like a good way to control for changes in the number of tax filings, tax increases that might result in more misfilings, and easier filing technologies (such as online filing). I have data from at least the past 20 years for every category of interest.
I decided to look at two measures of tax collection effectiveness: The number of identified math errors on individual tax returns, and the number of convictions from criminal investigations. I reason that either one should increase with a more effective force.
When I ran them, I got bupkis for significant effects, shown below:
I'm a bit disappointed, since it seems there ought to be some effect, and figure I'm likely doing something wrong given my inexperience. Would you happen to have any suggestions on a better model to approach this question with, or different data to try and collect? I figure that 20 years might just be too little data, or perhaps I ought to look specifically at personnel in the departments focused on narcotics/financial crimes and mathematical errors. Any suggestions are appreciated!
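For reference, here's essentially what I ran, as a minimal sketch in R (placeholder column names for my annual data):

```r
# ~20 annual observations; placeholder column names
m_errors <- lm(math_errors ~ employees + budget + working_age_pop +
                 tax_per_capita + cost_per_100, data = irs)
m_convic <- lm(convictions ~ employees + budget + working_age_pop +
                 tax_per_capita + cost_per_100, data = irs)
summary(m_errors); summary(m_convic)
```

With only about 20 observations and five regressors, I suspect low power alone could explain the null results, but I'd appreciate a sanity check on that reasoning too.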
Hi! I am looking for advice on what laptop to buy.
I am an MSc economics student who will start specializing in econometrics, potentially to the point of eventually doing a PhD. If not, I would like the option of using the laptop for a job in data analytics later. I am also considering doing some elementary courses in machine learning.
I have been happy with my MacBook Air 2017 (though I've only used it for RStudio, Stata, Gretl, and some Python), and I have found a good price for a 2022 MacBook Air M3. Does anyone have experience with it? Any recommendations?
Thanks!
I want to analyze how incomes among construction workers differ based on whether they live in a state with prevailing wage laws, whether it is a right-to-work state, and what percent of workers in their state are in unions (see below). I am using the 2022 ACS 5-year sample from IPUMS. The paper I'm replicating is here. Please let me know what your thoughts are, and whether the subscripts make sense (I've written out my reconstruction of the full equation after the definitions below).
Prevailing wage laws ensure that, on construction projects with state/federal funding, workers are paid a living wage. This matters because bids for contracts start high and then go low, and the contractor foots the bill.
Worker i, Year t, and State S
a = intercept
B1 is the coefficient on a dummy for whether the state a worker lives in is a right-to-work state.
B2 is the coefficient on an interaction term: PWL, a dummy for whether the state has an existing prevailing wage law, times the prevailing wage minimum for that state in raw nominal dollars.
B3 is the coefficient on the percent of construction workers in that state and year who are unionized (unionstats.com).
B4 is the coefficient on a dummy for laborer: while all subjects work in the construction industry, not all of them are laborers (as defined by the ACS; codes 6200-6950). I want to see if office workers/management have higher wages than laborers.
B5 is a set of occupational dummies for the occupations that are laborers; office workers get 0 in every column.
B6 is a set of demographic controls (age, age^2, dummies for each race, female, dummies for each marital status, a metropolitan dummy, dummies for each level of education, a head-of-household dummy, a veteran status dummy, and an immigrant status dummy).
E = Error term
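Putting the definitions together, the equation I believe I'm estimating is the following (my reconstruction, so please check the subscripts):

$$\text{Income}_{itS} = a + B_1 \text{RTW}_S + B_2 (\text{PWL}_S \times \text{PWLmin}_S) + B_3 \text{Union}_{St} + B_4 \text{Laborer}_i + B_5' \text{Occ}_i + B_6' \text{Dem}_i + E_{itS}$$

(I'm not sure yet whether income should enter in logs, as is common in wage regressions.)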
I am an undergraduate student and I need to find the income, price, and interest rate elasticities of money demand for homework. I used M2, the CPI, GDP, and the consumer loan rate as variables (2006:Q1-2024:Q2). I estimated a double-log model in first differences, but I cannot get meaningful values; income and the interest rate come out insignificant. The variables do not have unit roots, and all are seasonally adjusted stock data. What is wrong? I need help.
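Roughly what I ran in R (placeholder names; all series quarterly, 2006:Q1-2024:Q2):

```r
# Double-log money demand model in first differences; placeholder names
d_m <- diff(log(m2))    # money growth
d_p <- diff(log(cpi))   # inflation
d_y <- diff(log(gdp))   # income growth
d_r <- diff(log(rate))  # change in the (log) consumer loan rate

summary(lm(d_m ~ d_p + d_y + d_r))
```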
Finance student here, working on my thesis.
I aim to create a model to analyze the relationship between future stock returns and credit returns of a company depending on their past returns, with other control variables.
I have a sample of 130 companies' stocks and CDS prices over 10 years, with stock volume (also for 130 companies).
But despite my best efforts, I have difficulty understanding the difference between a pooled VAR and a panel VAR, and which one is better suited to my model, which is in the form of a [2, 1] matrix.
If anyone could tell me the difference, I would be very grateful, thank you.
Hello guys. I don't understand why the regression model should take another form. Isn't the form already sufficient as first stated? Here is the school assignment question.
Hi all,
I would like to learn more about economics. I am in an MSF program right now, and my professor has changed my mind about mathematics in the field of finance and economics. I had learned and subscribed to Warren Buffett's view that fundamental analysis is really all you need. But taking this class, my professor has shown some really cool mathematical methods for predicting the future. I don't know if any of it works or is reliable, but it's really interesting and I would love to learn more. It's mostly statistical analysis and maybe some calculus; I'm not too sure, as I've only taken Calc 1.
I think it's cool and I would love to learn more. Does anyone recommend any easy-to-read economic/financial mathematics books? Any intermediate books?
Hi folks, I am just starting my analysis of inflation across Europe and I've been thinking about how to deal with the following. The indices measuring inflation in various sectors are heavily dependent on their past values. However, the main focus of my analysis is, first, the difference between countries and, second, the difference between kinds of products (fresh food, processed food, bakery, fruit, non-perishable food, etc.), so the persistence and dependency is basically just in my proxy variables. Can it ruin the inference I need to do about my dummies for countries/categories? Thanks 💪.
How can I estimate inflation in the main categories? What equations can I use to predict changes in inflation across these categories?
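A sketch of what I currently have in mind in R (placeholder names), keeping the country/category dummies of interest and soaking up the persistence with a lag:

```r
library(sandwich)  # clustered covariance estimators
library(lmtest)    # coeftest

# Long format: one row per country x category x period (placeholder names)
m <- lm(infl ~ lag_infl + factor(country) + factor(category), data = panel)

# Cluster by country so the serial dependence doesn't distort the
# inference on the dummies
coeftest(m, vcov = vcovCL, cluster = ~country)
```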
I just wanted to search UN Comtrade SITC 3, but my student email can't do it because my campus doesn't have a subscription to the UN Comtrade dataset. Maybe someone can suggest something, or maybe there are volunteers who can help me. Hopefully there will be kind people.
I am taking my first-ever econometrics class in undergrad, and as part of the class we write a short 5-page paper answering an economic question. Maybe I am overcomplicating things; my professor said to use data available in Stata, like the auto dataset, but I wanted to do something different. I decided on 'How does secondary school enrollment in Ecuador affect GDP per capita?' So I used the World Bank API in R and got data from 1991 to 2023. I only have 33 observations per variable.
To recap, my dependent variable is GDP per capita, and my independent variables are: School enrollment, secondary (% gross), and Foreign Direct Investment (FDI), net inflows (% of GDP). I have 33 observations (1991 to 2023).
I ran my regression and got an R^2 value of 0.87. School enrollment, secondary (% gross) was statistically significant, and FDI was not. I'm just worried that 33 as my sample size (n) isn't good or that it makes my results less reliable.
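For what it's worth, this is roughly how I pulled the data and ran the regression (the World Bank indicator codes are from memory, so please double-check them):

```r
library(WDI)

# GDP per capita, secondary enrollment (% gross), FDI inflows (% of GDP);
# indicator codes from memory, worth double-checking
dat <- WDI(country = "EC",
           indicator = c(gdp_pc = "NY.GDP.PCAP.CD",
                         enroll = "SE.SEC.ENRR",
                         fdi    = "BX.KLT.DINV.WD.GD.ZS"),
           start = 1991, end = 2023)

summary(lm(gdp_pc ~ enroll + fdi, data = dat))  # n = 33
```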
I, of course, emailed my professor, but he won't answer over the weekend, so any insights would be welcomed!
Hi, I'm an undergrad student very interested in multivariate time series topics. I want to learn more about T-VAR, especially regarding the potential of this model for building causal/counterfactual analyses. What are some good reads on this topic?
Hi guys,
I am a senior economics undergrad and I am considering starting my master's degree in financial economics. I will be using my laptop very often for data analysis, but it is pretty old now and I need a new one because it is having difficulty running regressions. I don't have the biggest budget; something average-priced is okay for me (I am using GFN for gaming purposes, so you know what I mean). Anyway, if you could recommend something average for an econ graduate student, that would be great.
(Edit: I live in Turkey.)
Cheers.