Wednesday, 8 February 2017

How useful are data?

Data are not only used in the natural sciences; they are now widely used in every sector of our lives. Economics and many other social sciences rely on data to make estimates, draw relationships between variables, and evaluate the impact of policies. Many people also believe that data represent the truth. But is this belief correct? It is not: data can give a picture that is close to the truth, but they can never be the truth itself. Even when we are able to collect data from the whole population, unavoidable human error and imperfections in the research process mean that we cannot obtain the true values for that population.

Moreover, even with correct or perfect data, conclusions drawn from analysing them can be inaccurate and are sometimes very different from reality. Firstly, researchers may not start with the best-fitting model. Many data analyses begin with linear regression, especially in the social sciences. However, the relationships between variables can be much more complicated than that, and often it is simply impossible to build a perfect model. Secondly, many existing regression methods rest on a large number of assumptions. The more assumptions a model makes, the less credible it becomes. The strength of the assumptions also affects the credibility of the model: common sense tells us that a model with weaker assumptions is better than one with stronger assumptions. Many newer models and methods have been developed precisely to weaken the assumptions of existing ones. Thirdly, it is inherently difficult to include all the relevant data and variables in a model. Adding more variables not only increases the sample size required, but may also move the covariances between the independent variables away from zero. Leaving a variable out does not solve the problem, however, because its effect is then hidden in the error term.
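
The last point, about a left-out variable ending up in the error term, can be illustrated with a small simulation. This is only a sketch under assumptions of my own: the variable names, the coefficients (2 and 3), the 0.8 link between the two regressors and the sample size are all made up for illustration and do not come from any real data set. The data are generated so that the true effect of x1 on y is 2, and then y is regressed on x1 with and without the correlated variable x2.

```python
import numpy as np


def ols(X, y):
    """Least-squares coefficients, with an intercept column prepended to X."""
    X1 = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef


# Hypothetical data-generating process (all numbers are illustrative assumptions).
rng = np.random.default_rng(0)
n = 10_000
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + 0.6 * rng.normal(size=n)   # x2 is correlated with x1
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)

# "Long" regression: both variables are included.
full = ols(np.column_stack([x1, x2]), y)

# "Short" regression: x2 is left out, so its effect moves into the error term.
short = ols(x1, y)

print("slope on x1 with x2 included:", full[1])
print("slope on x1 with x2 omitted: ", short[1])
```

With x2 included, the estimated slope on x1 should land close to the true value of 2. With x2 left out, the slope should land near 2 + 3 × 0.8 = 4.4, because x1 picks up the part of x2 that moves together with it. The data here are "perfect" in the sense that nothing is mismeasured, yet the conclusion from the short regression would still be far from the truth.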

Because data analysis methods are imperfect, data cannot be the only evidence we rely on. Of course, we can still use the results of data analysis, but we have to understand the possible errors in the model and adjust our conclusions accordingly.
