Hidden Pitfalls of Data Interpretation

Development policy cannot divorce itself from statistics and data. To measure the impact of development policy and to calibrate new policy decisions, we need quality data at our disposal. In fact, the latest advances in development economics rely on careful collection of data ab initio (Randomized Controlled Trials) to give credence to policy analysis. Robust data is also a must for identifying areas that require urgent policy intervention. For example, one of the most commonly used indicators of gender-related discrimination is the sex ratio, the number of females per 1000 males. An adverse ratio, especially in the lowest age bracket (0-24 months), points to female infanticide.

The manner in which data is collected has engendered controversy in our state. The issue of the 2011 Census comes to mind. There have also been innumerable complaints about the indifferent quality of data collection and the pitfalls inherent in such an exercise. The call for more robust data follows naturally from these deficiencies. The initiative of the Statistics and Evaluation department to bring out the Annual State Economic Surveys is a move in the right direction.

However, a related but less explored issue is of equal importance: data interpretation. It is as crucial as data collection, because an interpretation based on wrong assumptions and economic models can lead to wrong policy interventions with undesired impacts. I look at the multiple interpretations that can come out of the same data set by zooming in on a recent study that took the policy and academic world by storm this past week.

The controversy erupted in the wake of an NBER paper by Roland Fryer Jr, “An Empirical Analysis of Racial Differences in Police Use of Force.” Roland Fryer Jr is the Henry Lee Professor of Economics at Harvard and won the John Bates Clark Medal in 2015. The youngest African-American professor to receive tenure at Harvard, Prof. Fryer has done pioneering work on education and on how racial injustice prevails in the education sector in the US. He has used randomized experiments to analyse the differential impact of policy interventions in that sector.

Study

The paper in question, however, analyses the use of force by police on citizens in the US and examines whether police behaviour differs with the race of the citizen. In other words, the paper looks at whether policemen in the US respond differently to citizens depending on whether they are white, black, Hispanic or Asian.
This question matters in the US because there have been many complaints, not just now but historically, that the police in that country victimize citizens of colour. Prof. Fryer's findings are based on four surveys. Two of these are specific to the state of Texas, one to the city of New York, and the fourth is a triennial survey of a nationally representative sample of US citizens. His findings confirmed the prior of a racial bias against blacks and Hispanics in respect of non-lethal use of force. Depending on the model and specification used, blacks are between 19.3% and 21.3% more likely to be involved in an interaction with the police in NY. However, where police shootings are involved, the prior that policemen discriminate against blacks and other races is not borne out by the data. The finding is that blacks are 23.8% less likely to be shot at by police than whites. This is contrary to popular perception and to other studies that have looked at similar issues of racial discrimination.

A possible explanation

However, these results have problems of an interpretative nature, and the same data admit another reading. Suppose the police are biased against all people who are black. Then, well before any question of lethal force arises, they stop people who fit their racial prejudice. A racially biased police officer would on average arrest blacks who are a real threat (armed blacks) as well as those who are not (a black person returning from a library with books in hand). The latter pose less of a real threat than the former. Because the average threat level among all such arrestees is lower, a racially biased officer ends up shooting a smaller share of the people arrested. The existing bias thus has the effect of reducing the average threat level of the arrested individuals.
That is, officers arrest individuals with both high and low actual threat, which lowers the average actual threat among arrestees. This happens only for the group against whom they are biased. A racially biased policeman would arrest only those whites who pose a high actual threat; in the above example, he would not typically arrest a white woman carrying books in her hand. In statistical parlance, this is referred to as selection bias.

Another criticism of the study was that Prof. Fryer used data provided by police departments, which could bias the results in one direction, since police have an incentive to provide only data that show them in a positive light. Prof. Fryer answers the criticism by pointing out that the same data do show a bias in non-lethal use of force; hence there is no reason to believe that data on lethal use of force would be reported differently.

Omitted Variables

Even superior minds and quality statisticians have been found susceptible to the pitfalls of interpretation. R. A. Fisher, a statistical guru of sorts, believed that smoking did not cause cancer. His argument was that those who are more susceptible to lung cancer have genes that also make them smokers. Thus, he argued, it is the gene that causes lung cancer and not cigarettes. Fisher did not provide any statistics to back his claim. However, Kaprio and Koskenvuo in 1989 used empirical results from monozygotic twins, one of whom was a smoker and the other a non-smoker. By selecting monozygotic twins, who share the same genes, these researchers ruled out the possibility of a differential impact of genes. They found that the smoking twin died earlier than the non-smoking twin. This empirical result therefore negates the Fisherian interpretation of smoking data.

In short, interpreting data is as tricky a proposition as collecting robust data.
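
For readers who want to see the selection-bias argument in numbers, here is a minimal sketch in Python. Every figure in it is a made-up assumption chosen for illustration, not a number from Prof. Fryer's paper, and the function name `rates` is hypothetical. It assumes two groups of equal size with the same number of genuinely threatening individuals stopped, but five times as many harmless people stopped in the group the police are biased against.

```python
# Illustrative arithmetic only: all counts are invented assumptions,
# not data from Fryer's paper or from any police department.

def rates(population, stopped_high_threat, stopped_low_threat, shot):
    """Return shootings per stop and per head of population for one group."""
    stops = stopped_high_threat + stopped_low_threat
    return {
        "stops": stops,
        "shots_per_stop": shot / stops,
        "shots_per_capita": shot / population,
    }

# Group stopped mostly when the threat is real (assumed counts).
white = rates(population=10_000, stopped_high_threat=50,
              stopped_low_threat=100, shot=5)

# Group stopped far more often when harmless, and shot slightly more
# often in absolute terms (assumed counts).
black = rates(population=10_000, stopped_high_threat=50,
              stopped_low_threat=500, shot=6)

for name, group in (("white", white), ("black", black)):
    print(f"{name}: stops={group['stops']}, "
          f"shots per stop={group['shots_per_stop']:.4f}, "
          f"shots per capita={group['shots_per_capita']:.4f}")
```

Under these assumed counts, the over-stopped group shows fewer shootings per stop even though it suffers more shootings per head of population. The extra harmless stops inflate the denominator, which is how a rate measured among those stopped can point in the opposite direction from the underlying bias.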

(The author is an IAS Officer of Nagaland cadre and can be reached at vyasan_r@yahoo.com. The views expressed above are personal.)


