Defining a new analysis: help defining the feature space
I am considering an informal analysis of innovation and its effect on economic performance.
So far, I have pulled the following data; from a preliminary look, most datasets appear to be mostly non-null. I am thinking of fitting an OLS linear regression. The data is grouped by country and would be analyzed per capita.
Independent variables:
- New patent applications (discrete)
- Average work hours per week (continuous)
- Government type (categorical)
- Social progress score (continuous)
Dependent variable:
- GDP (continuous)
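To make the setup concrete, here is a minimal sketch of how a regression like this might be fit, assuming government type is dummy-coded with one reference level dropped. The data is synthetic and every name in it (`design_matrix`, `fit_ols`, the three government categories, the coefficients used to generate `gdp`) is hypothetical and only there for illustration.

```python
import numpy as np

def design_matrix(numeric, categories):
    """Build an OLS design matrix: intercept, numeric columns, dummy columns.

    The first category level (in sorted order) is dropped and acts as the
    reference level, which avoids perfect collinearity with the intercept.
    """
    numeric = np.asarray(numeric, dtype=float)
    categories = np.asarray(categories)
    levels = sorted(set(categories.tolist()))
    dummies = np.column_stack(
        [(categories == lvl).astype(float) for lvl in levels[1:]]
    )
    intercept = np.ones((numeric.shape[0], 1))
    return np.hstack([intercept, numeric, dummies]), levels[1:]

def fit_ols(X, y):
    """Ordinary least squares via numpy's least-squares solver."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Synthetic, made-up data standing in for the per-country table.
rng = np.random.default_rng(0)
n = 150
patents = rng.poisson(20, n).astype(float)       # discrete count
work_hours = rng.normal(40, 4, n)                # continuous
social_progress = rng.normal(70, 10, n)          # continuous
gov = rng.choice(["monarchy", "parliamentary", "presidential"], n)

# Assumed data-generating process, chosen only so the fit can be checked.
gdp = (1.0 + 0.05 * patents - 0.02 * work_hours + 0.01 * social_progress
       + 0.5 * (gov == "presidential") + rng.normal(0, 0.1, n))

X, dummy_levels = design_matrix(
    np.column_stack([patents, work_hours, social_progress]), gov
)
coef = fit_ols(X, gdp)

names = (["intercept", "patents", "work_hours", "social_progress"]
         + [f"gov[{lvl}]" for lvl in dummy_levels])
for name, b in zip(names, coef):
    print(f"{name:>20s}: {b: .4f}")
```

Each dummy coefficient is read relative to the dropped reference level, so the choice of reference changes the interpretation but not the fit.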
However, I have two concerns. First, I would like to have more variables as inputs, as what I have so far seems to be a weak proxy for “innovation”. One option is to add in confounders (addressed below), normalize for these, and create an “innovation composite score”.
Second, if I build an innovation composite score, I am unclear on exactly how to normalize the input variables for the confounding variables. If I do not build one, I am also at a loss for how to add these features to the feature space: categorical binning of a “developed” score? Am I overthinking it?
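One possible reading of "normalize for the confounders" is to residualize each innovation indicator on the confounders, z-score the residuals, and average them. The sketch below shows that idea on synthetic data. It assumes equal weights and that every indicator is oriented so that higher means "more innovative". The function names (`zscore`, `residualize`, `composite_score`) are hypothetical, and this is only one of several reasonable constructions.

```python
import numpy as np

def zscore(x):
    """Standardize each column to mean 0 and standard deviation 1."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean(axis=0)) / x.std(axis=0)

def residualize(indicators, confounders):
    """Remove the part of each indicator linearly explained by confounders."""
    Z = np.column_stack([np.ones(len(confounders)), confounders])
    beta, *_ = np.linalg.lstsq(Z, indicators, rcond=None)
    return indicators - Z @ beta

def composite_score(indicators, confounders=None):
    """Equal-weight average of z-scored (optionally residualized) indicators."""
    X = np.asarray(indicators, dtype=float)
    if confounders is not None:
        X = residualize(X, np.asarray(confounders, dtype=float))
    return zscore(X).mean(axis=1)

# Synthetic example: two indicators that both depend on an education score.
rng = np.random.default_rng(42)
n = 120
education = rng.normal(0, 1, n)
patents_pc = 2.0 * education + rng.normal(0, 1, n)
social_progress = 1.5 * education + rng.normal(0, 1, n)

indicators = np.column_stack([patents_pc, social_progress])
confounders = education.reshape(-1, 1)

raw_score = composite_score(indicators)
adj_score = composite_score(indicators, confounders)

# The raw score tracks education closely; the adjusted score does not.
print("corr(raw, education):", np.corrcoef(raw_score, education)[0, 1])
print("corr(adj, education):", np.corrcoef(adj_score, education)[0, 1])
```

The adjusted score is, by construction, linearly uncorrelated with the confounders in the sample, which is the sense in which it has been "normalized" for them.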
Potential confounders:
- Education score (continuous)
- Income (DON’T HAVE - need to find)
- Poverty (proxied through “number of calories per day”, continuous)
- Infrastructure score (continuous)
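If no composite score is built, a common alternative is to include the confounders directly as additional regressors. By the Frisch-Waugh-Lovell theorem, the coefficient on the variable of interest is then identical to what you would get by first residualizing both the outcome and that variable on the confounders. The sketch below checks this on synthetic data; the variable names and the coefficients used to generate the data are assumptions made up for the example.

```python
import numpy as np

def ols(X, y):
    """Return OLS coefficients for y regressed on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Synthetic data: confounders drive both the "innovation" proxy and the outcome.
rng = np.random.default_rng(1)
n = 200
education = rng.normal(0, 1, n)
infrastructure = rng.normal(0, 1, n)
patents = 0.8 * education + 0.5 * infrastructure + rng.normal(0, 1, n)
gdp = (0.3 * patents + 1.0 * education + 0.7 * infrastructure
       + rng.normal(0, 0.5, n))

ones = np.ones(n)

# 1) Naive regression that ignores the confounders.
naive = ols(np.column_stack([ones, patents]), gdp)[1]

# 2) Regression with the confounders added as covariates.
C = np.column_stack([ones, education, infrastructure])
adjusted = ols(np.column_stack([C, patents]), gdp)[-1]

# 3) Frisch-Waugh-Lovell: residualize both sides on C, then regress.
patents_res = patents - C @ ols(C, patents)
gdp_res = gdp - C @ ols(C, gdp)
fwl = ols(patents_res[:, None], gdp_res)[0]

print("naive patents coefficient:   ", naive)
print("adjusted patents coefficient:", adjusted)
print("FWL patents coefficient:     ", fwl)
```

In this made-up setup the naive coefficient is inflated because it absorbs the effect of education and infrastructure, while the adjusted and residualized estimates agree with each other.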
In summary, I am looking to further define my feature space, including accounting for confounders. Thank you for your thoughts!
Sources:
New patents by country (2023, 2024)
- https://worldpopulationreview.com/country-rankings/patents-by-country
Education levels by country (2023)
- https://worldpopulationreview.com/country-rankings/education-rankings-by-country
Average hours in a work week by country (2023)
- https://worldpopulationreview.com/country-rankings/average-work-week-by-country
Poverty, proxied through daily supply of calories per person (2023)
- https://ourworldindata.org/grapher/daily-per-capita-caloric-supply?time=2022..latest&country=~USA
Infrastructure (various factors) (2023)
- https://worldpopulationreview.com/country-rankings/infrastructure-by-country
Government type -
- https://worldpopulationreview.com/country-rankings/government-system-by-country
World Happiness Report (various factors) (2023, 2024)
- https://www.worldhappiness.report/data-sharing/
Social progress by country (2023)
- https://worldpopulationreview.com/country-rankings/social-progress-index-by-country
Population (2023)
- https://data.worldbank.org/indicator/SP.POP.TOTL?end=2024&start=2022
Output: GDP change % YoY (per capita)
- https://data.worldbank.org/indicator/NY.GDP.MKTP.KD?end=2024&start=2021