A few years ago, when I was serving as technology director of a suburban New York City school district, the leadership was looking for ways to make students' and district employees' data more accessible and useful for strategic planning. As part of this effort, we conducted evaluations of our student, human resource, and financial information systems. We wanted to know what data was being captured, where it was being stored, how well it was maintained, and what value it could add in district-wide decision making. While reviewing the district's human resources information system (HRIS), I met with a clerk who kept track of teachers' professional development records. At the time, I thought I'd uncovered a treasure chest of untapped data that could be used to explore relationships between teacher participation in professional development programs and students' academic performance. For an exciting moment, I imagined that with this information, we'd be able to find the training programs with the most positive impact on student success and then target our future professional development investments accordingly.
My enthusiasm waned quickly when the clerk explained that she did not keep the professional development records in the district's HRIS because it was too difficult to use and did not have enough data entry fields to meet her needs. Instead, she had developed a makeshift spreadsheet that held staff demographic and professional development information. Even though the data was reasonably accurate and current, it would not easily give us the answers we needed. There were several obstacles to making use of all the collected data:
The data was not compatible with other district data. A quick look at the spreadsheet revealed that data was organized by teachers' last names. There was no identification number or key that uniquely identified each record. This was not a problem for the clerk since she scrolled through names alphabetically to find the records she needed to update. However, the lack of a key meant that the data could not be automatically joined with additional teacher data from other sources in the district. In other words, the computer would not know Mr. Theodore Evans from Ms. Ingrid Evans. If we wanted to measure the impact of a particular training program on student performance, we would have to run a costly and time-consuming project to manually combine the data from the spreadsheet with information from the HRIS and the student information system. The advantage of a district-wide automated information system was lost because this department wasn't using it.
The data was not readily available to key decision makers. The spreadsheet was stored on the clerk's hard drive, so every professional development inquiry had to go through the clerk, who would look up the answer in the spreadsheet. Decision makers could not run their own ad hoc queries or test hypotheses, which severely limited their ability to tease nuances out of the data.
The data was kept in one of several parallel systems. When I asked the clerk how she kept her information in sync with the district's HRIS, she showed me a preprinted slip that had spaces to note changes in a teacher's demographic data. At the bottom of the slip was a routing list of seven offices including HR, payroll, the staff development center, and the superintendent's office. I learned that this routing slip was used not only by the HR clerk to maintain her spreadsheet, but also by individuals in six other offices to update other collections of staff data that did not fit into the district's HRIS. This practice of entering and maintaining the same data in several places is known as having parallel systems. Not only is this practice extraordinarily inefficient, it also calls into question the validity of any one data source. We did not know which of the seven databases was the most accurate. In reality, they were all inaccurate to varying degrees. …
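The first obstacle above, the missing key, is easy to see in a small sketch. The Python below is only an illustration under stated assumptions: the `staff_id` field, its values, the course title, and the `join_on` helper are all hypothetical and do not come from the district's actual systems. It joins a tiny HRIS extract to a professional development list twice, once on last name and once on a unique identifier.

```python
# Hypothetical records: staff IDs, field names, and the course title
# are invented for illustration only.
hris = [
    {"staff_id": 101, "first": "Theodore", "last": "Evans"},
    {"staff_id": 102, "first": "Ingrid", "last": "Evans"},
]

# The spreadsheet as described: organized by last name, with no unique key.
pd_by_name = [
    {"last": "Evans", "course": "Literacy Workshop"},
]

# The same record as it might look if the spreadsheet carried a staff ID.
pd_by_id = [
    {"staff_id": 102, "course": "Literacy Workshop"},
]


def join_on(left, right, field):
    """Return every (left, right) pair whose values for `field` match."""
    return [(a, b) for a in left for b in right if a[field] == b[field]]


name_matches = join_on(hris, pd_by_name, "last")
id_matches = join_on(hris, pd_by_id, "staff_id")

# Joining on last name pairs the course with both Evans records.
print(len(name_matches))
# Joining on the key pairs the course with exactly one teacher.
print(len(id_matches), id_matches[0][0]["first"])
```

With only a last name to match on, the training record attaches to both teachers, and someone has to resolve each such case by hand. With a shared identifier, the match is unambiguous and can be automated.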