Thursday 1 August 2019

Some thoughts about data

These are just some of my personal thoughts about the current data world. First, data analysis would be much easier if observations had uniform ID numbers regardless of their source. At the moment, observations that come from different sources tend to have different ID numbers; for example, an individual whose information appears in two databases usually has two ID numbers if the two databases do not come from the same source. One database is sometimes not enough to answer the question we care about, and if observations had uniform ID numbers across databases, it would be much easier to merge databases and carry out the analysis we want. However, although a uniform ID number makes data analysis easier, it is also problematic. Firms and individuals do have unique IDs that identify them (for example, company registration numbers and individuals’ passport and driving license numbers). These unique ID numbers, especially those of individuals, are left out of most databases to protect the people and firms being observed. Data are powerful, and misusing data is dangerous. This is why large tech companies should be regulated in how they handle the data they collect from their users. Therefore, I think that databases should not include individuals’ ID numbers, in order to protect their privacy; but for certain kinds of information, such as information on publicly listed companies, we should have a uniform ID system for databases across the world.
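To make the merging point concrete, here is a minimal sketch of how two databases could be joined if they shared a uniform ID. Everything in it is made up for illustration: the IDs (`C001` and so on), the field names, and the helper `merge_on_id` are my own assumptions, not taken from any real database.

```python
# Hypothetical records from two unrelated sources, keyed by an assumed shared company ID.
financials = {
    "C001": {"revenue": 120.0},
    "C002": {"revenue": 75.5},
}
governance = {
    "C001": {"board_size": 9},
    "C003": {"board_size": 5},
}


def merge_on_id(left, right):
    """Inner join: keep only IDs present in both sources and combine their fields."""
    merged = {}
    for uid in left.keys() & right.keys():
        merged[uid] = {**left[uid], **right[uid]}
    return merged


merged = merge_on_id(financials, governance)
print(merged)  # {'C001': {'revenue': 120.0, 'board_size': 9}}
```

Without a shared key, the same join would need fuzzy matching on names or addresses, which is exactly the extra work a uniform ID would remove.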
A uniform ID system for publicly listed companies is not as easy as it sounds. Yes, publicly listed companies have unique ID numbers, but these numbers convey very limited information. For example, if one publicly listed company acquires another, the ID number only shows the acquirer’s identity after the acquisition, and the acquired company simply disappears from the database. An ID number can be seen as a point on a graph, and a point can only carry very limited information, which may not even be enough to trace back to the original company. What I think may work is to create a multi-dimensional ID number. Multiple dimensions could hold enough information to identify the companies.
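As a rough illustration of what a multi-dimensional ID might look like, here is one possible sketch. The class `CompanyID`, its fields, and the functions `acquire` and `trace` are all hypothetical choices of mine; the point is only that extra dimensions (here a start year and a list of predecessor IDs) let an acquired company stay traceable.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class CompanyID:
    """A hypothetical multi-dimensional ID: each field is one 'dimension'."""
    registration_number: str                    # the usual one-dimensional ID
    valid_from: int                             # year this version of the ID starts
    predecessors: Tuple["CompanyID", ...] = ()  # earlier IDs this one was built from


def acquire(acquirer: CompanyID, target: CompanyID, year: int) -> CompanyID:
    """Return the acquirer's post-acquisition ID, keeping both earlier IDs traceable."""
    return CompanyID(
        registration_number=acquirer.registration_number,
        valid_from=year,
        predecessors=(acquirer, target),
    )


def trace(cid: CompanyID) -> set:
    """Collect every registration number that ever fed into this ID."""
    numbers = {cid.registration_number}
    for predecessor in cid.predecessors:
        numbers |= trace(predecessor)
    return numbers


a = CompanyID("A-100", 2001)
b = CompanyID("B-200", 2005)
combined = acquire(a, b, 2019)
print(sorted(trace(combined)))  # ['A-100', 'B-200']
```

The combined ID still carries the acquirer's registration number, but the acquired company no longer disappears: it can be recovered by walking back through the predecessors.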
