Monday 6 February 2017

The effect of outliers on the overall model

When we conduct a test or a survey, some results can lie very far from the average. Many researchers therefore want to measure the effect of these extreme data points, the so-called outliers, on the final model in order to improve its accuracy. Several methods already exist for testing whether an outlier is an influential point. The two main measures of influence are leverage and Cook's distance (also called Cook's statistic).
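For reference, the standard definitions for an ordinary least-squares model are sketched below. The notation is added for this sketch rather than taken from the post: X is the n × p design matrix (assumed to have full column rank), e_i = y_i − ŷ_i is the i-th residual, and the last line is the special case of a simple regression with an intercept and one slope (p = 2).

```latex
H = X (X^\top X)^{-1} X^\top, \qquad h_{ii} = H_{ii}, \qquad
\sum_{i=1}^{n} h_{ii} = \operatorname{tr}(H) = p \;\Rightarrow\; \bar{h} = \frac{p}{n}

D_i = \frac{e_i^2}{p\, s^2} \cdot \frac{h_{ii}}{(1 - h_{ii})^2}, \qquad
s^2 = \frac{1}{n - p} \sum_{j=1}^{n} e_j^2

h_{ii} = \frac{1}{n} + \frac{(x_i - \bar{x})^2}{\sum_{j=1}^{n} (x_j - \bar{x})^2} \quad \text{(simple regression, } p = 2\text{)}
```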

The formula for leverage suggests that, if the sample size is large enough, the leverage of a typical data point will be very small: the leverages of all n points sum to the number of model parameters p, so their average is p/n. This means that a single data point usually has only a limited influence on the overall result. Cook's distance, in turn, depends on both the size of the residual and the size of the leverage, so once the sample is large enough, Cook's distance also becomes small. Therefore, in my opinion, testing the influence of outliers in a large sample seems unnecessary. Conversely, if the sample is so small that the outliers do become influential, the small sample size itself causes many other problems in the model and makes its results less accurate and less credible. In that case, worrying about the influence of outliers seems less important. For these reasons, I think that measuring the influence of outliers is not an especially important or necessary step in building a model.
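A minimal pure-Python sketch of both measures for a simple linear regression y = b0 + b1·x (so p = 2) is below. The data are made up: evenly spaced x values in [0, 10], a straight line with a small alternating wiggle, and one observation in the middle pushed up by 10 units. The function names (`fit_simple_ols`, `leverages`, `cooks_distances`, `demo`) are hypothetical. The sketch assumes the extra observations cover the same x range as the small sample; a point whose x lies far outside the rest of the data is a different case and can keep a high leverage however large n gets.

```python
from statistics import mean


def fit_simple_ols(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def leverages(x):
    """h_ii = 1/n + (x_i - x_bar)^2 / Sxx (intercept plus one slope, so p = 2)."""
    n, x_bar = len(x), mean(x)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]


def cooks_distances(x, y):
    """D_i = e_i^2 / (p * s^2) * h_ii / (1 - h_ii)^2 with p = 2."""
    p, n = 2, len(x)
    b0, b1 = fit_simple_ols(x, y)
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s2 = sum(e ** 2 for e in resid) / (n - p)  # residual mean square
    return [e ** 2 / (p * s2) * h / (1 - h) ** 2
            for e, h in zip(resid, leverages(x))]


def demo(n, shift=10.0):
    """Hypothetical data: n evenly spaced x in [0, 10], y = 1 + 2x +/- 0.5,
    with the middle observation pushed up by `shift`.
    Returns (average leverage, Cook's distance of the shifted point)."""
    x = [10 * i / (n - 1) for i in range(n)]
    y = [1 + 2 * xi + (0.5 if i % 2 else -0.5) for i, xi in enumerate(x)]
    mid = n // 2
    y[mid] += shift
    return mean(leverages(x)), cooks_distances(x, y)[mid]


if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        avg_h, d_mid = demo(n)
        print(f"n={n:>5}  mean leverage={avg_h:.5f}  Cook's D of outlier={d_mid:.4f}")
```

Running it prints, for each n, the average leverage (exactly 2/n here, because the leverages sum to p = 2) and the Cook's distance of the shifted point. For this made-up data the outlier's Cook's distance falls steadily as n grows, which is the pattern the argument above relies on.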
