Inequality in Education: Sex/Gender
For International Women’s Day, I was inspired by tweets that zeroed in on women (more so than usual, anyway). I am a day late because, leading up to it, I was focused on analyzing this dataset for regional variances (a sneak peek of that is in the output). My interest was mostly at the nexus of place and education in Tanzania, and in how such data can support a more targeted organizational approach to tackling the toughest educational inequalities.
One such inequality is between the sexes. No patience for punchlines here: educational outcomes for Tanzanian girls are unequal compared to those of their male counterparts.
We already have some indication of which levers the government (local and national) can begin to tinker with to close these gaps. In this quick linear regression, I use the 2013 Primary School Leaving Examination (PSLE) outcomes to try to capture the magnitude of the inequality. My past research has mostly found this magnitude expressed in terms of absenteeism, pass-fail differences, and the like. To be clear, those measures are likely more impactful and more clear-cut. This analysis is closely tied to them, since I’ll be looking at the Calculated Average PSLE score and the effect of gender on it. My assumption is that if you close the pass-fail gaps or address the various causes of absenteeism, we would see the effect of gender narrow. Let’s get to it. (A note: gender is complex and different from sex; however, I don’t have the data to truly do justice to gender non-conforming communities. Regrettably, gender and sex are used interchangeably here to mean female vs. male.)
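Written out, the model is roughly the following. The notation is my own, for illustration: Female is the Sex dummy (Female = 1, Male = 0), the region dummies leave one region out as the baseline, and the coefficient on Female is the "effect of gender" on the Calculated Average score, holding region fixed.

```latex
\text{CalcAverage}_i = \beta_0 + \beta_1\,\text{Female}_i
  + \sum_{r=2}^{R} \gamma_r\,\text{Region}_{ri} + \varepsilon_i
% R = number of regions; region 1 is the omitted baseline.
% beta_1 < 0 means girls score lower than boys in the same region, on average.
```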
I have been giving Jupyter a second try (I’m usually a PyCharm faithful). I can see why people love it so much, especially since I can export to HTML and paste the result into my blog. Find my notebook and analysis in my GitHub account! First, we read in our data using pandas and drop rows with N/A values. For this quick analysis, I look at two categorical variables: Sex and Region. I already dummy coded Sex (Female = 1, Male = 0) when cleaning up the scraped data, and pandas can also dummy code all 25 regions. Statsmodels’ ols function does this step for you when fitting the model, so I opt for that instead. (Many thanks to scipy-lectures for their tutorial on setting this up using ols.)
```python
import pandas as pd
from statsmodels.formula.api import ols

# Read in the CSV of scraped 2013 PSLE results
psle2013 = pd.read_csv("~/Documents/GitHub/ImportingNECTA/CompleteDatasets/necta_psle_2013.csv")

# Drop rows with any missing values; call .head() to check the dataframe if desired
psle2013_noNA = psle2013.dropna(axis=0, how="any")

# Optional: explicit region dummies (one 0/1 column per region)
psle2013_noNA2 = pd.get_dummies(psle2013_noNA, columns=["Region"])

# Assign variables for building the model
CalcAverage = psle2013_noNA.CalcAverage
sex = psle2013_noNA.SEX          # already dummy coded: Female = 1, Male = 0
regions = psle2013_noNA.Region   # string column; the formula treats it as categorical

# For just the DAR-ES-SALAAM dummy variable
dar = psle2013_noNA2.Region_DAR

# Build the model (ols dummy codes `regions` itself), print the model summary
model = ols("CalcAverage ~ sex + regions", data=psle2013_noNA).fit()
print(model.summary())
```
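Since the scraped NECTA file isn’t reproduced here, the sketch below uses small synthetic data to show that letting `ols` dummy code a string column gives the same coefficients as building the dummies yourself with `pd.get_dummies`. Everything in it is hypothetical: the three region labels, the sample size, the noise level, and the region effects are made up, and the true sex effect is set to -0.13 only to mirror the size of the 2013 estimate discussed in this post.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols

# Hypothetical synthetic data: NOT the real PSLE 2013 results
rng = np.random.default_rng(0)
n = 2000
region_labels = np.array(["ARUSHA", "DAR", "MWANZA"])  # assumed example labels
df = pd.DataFrame({
    "SEX": rng.integers(0, 2, size=n),           # Female = 1, Male = 0
    "Region": rng.choice(region_labels, size=n),
})
region_effect = df["Region"].map({"ARUSHA": 0.0, "DAR": 0.4, "MWANZA": -0.2})
df["CalcAverage"] = 2.8 - 0.13 * df["SEX"] + region_effect + rng.normal(0, 0.5, size=n)

# 1) Let the formula interface dummy code the string column
#    (the first level alphabetically, ARUSHA, becomes the baseline)
auto = ols("CalcAverage ~ SEX + Region", data=df).fit()

# 2) Build the dummies by hand, dropping the first level to avoid perfect collinearity
dummies = pd.get_dummies(df, columns=["Region"], drop_first=True, dtype=int)
manual = ols("CalcAverage ~ SEX + Region_DAR + Region_MWANZA", data=dummies).fit()

print(auto.params.round(3))
print(manual.params.round(3))
```

The two fits should agree term for term; only the labels differ (`Region[T.DAR]` vs. `Region_DAR`). Dropping one level matters because a full set of region dummies plus an intercept would be perfectly collinear.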
The results come from a regression that accounts for region. Even so, all else equal, girls on average scored 0.13 points lower on their calculated average score. Assuming a score of 3 (an average of 2.5 or higher) is passing, a gap of that size could very well be the difference between passing and failing the exam for students near the cutoff. This bears repeating: because of the system as currently implemented, being born a girl could have been the difference between passing and failing the PSLE in 2013. Of course, I am oversimplifying the issue, but it does boggle the mind.
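To make the "difference between passing and failing" point concrete, here is a small back-of-the-envelope sketch. It takes the -0.13 coefficient and the 2.5 cutoff from the paragraph above and, as a deliberate simplification, applies an average gap to individual students; the `passes` helper and the example scores are hypothetical.

```python
# Hypothetical illustration: an average gap applied to individual scores
SEX_COEF = -0.13     # estimated difference for girls (Female = 1) in the 2013 model
PASS_CUTOFF = 2.5    # assumed: averages of 2.5 or above round to a passing 3


def passes(calc_average: float) -> bool:
    """Return True if a calculated average clears the assumed passing cutoff."""
    return calc_average >= PASS_CUTOFF


# A boy scoring anywhere in [band_low, band_high) passes, while an otherwise
# identical girl (score shifted by SEX_COEF) falls below the cutoff.
band_low, band_high = PASS_CUTOFF, PASS_CUTOFF - SEX_COEF  # roughly [2.5, 2.63)

for boy_score in [2.45, 2.55, 2.60, 2.70]:
    girl_score = boy_score + SEX_COEF
    print(f"boy {boy_score:.2f} -> pass={passes(boy_score)}; "
          f"girl {girl_score:.2f} -> pass={passes(girl_score)}")
```

Only students whose scores land in that narrow band flip from pass to fail, which is why the effect is easy to overlook in the aggregate and still matters for the individuals near the cutoff.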
Recently, the PSLE stopped being a barrier to attending secondary education. However, the fight is not won, as other discriminatory practices have taken its place (e.g., banning girls who have been pregnant from continuing their education). Unfortunately, this discussion is still very high level. I’m going to continue piecing together data that might yield more actionable steps to close these gaps. In the meantime, check out the work of good people like Dropwall (eagleanalytics.co.tz), who are building a system that predicts potential dropouts. The literature shows that the issues they are finding are of acute importance to girls.