RealScape-Metropolitan Fixed Assets Change Judgment by Pixel-by-Pixel Stereo Processing of Aerial Photographs
Koizumi, Hirokazu, Yagyu, Hiroyuki, Hashizume, Kazuaki, Kamiya, Toshiyuki, Kunieda, Kazuo, Shimazu, Hideo, AI Magazine
This article describes the Fixed Assets Change Judgment (FACJ) system and its core tool, RealScape. RealScape automatically detects changes in the height and color of buildings based on three-dimensional analysis of aerial photographs. The three-dimensional analysis employs a pixel-by-pixel stereo processing method that calculates the height of each pixel in aerial photographs and thus enables precise difference detection between previous and current aerial photographs. The FACJ system reduces labor costs to one third of those of the traditional approach and the required judgment duration to about two weeks per 100 km². The FACJ system was experimentally used by the Tokyo Metropolitan Government for the first time in 2005. Since then, it has been used at its tax bureau every year to calculate the municipality's fixed-asset tax. After the success in Tokyo, other major city governments, including Osaka and Sapporo, have followed suit.
The Japanese fixed-property tax is imposed by municipalities on the owners of land, buildings, and depreciation assets (all hereinafter referred to as "fixed assets") as of January 1 of every year, with the tax sum calculated according to current asset values. For this purpose, the municipalities take aerial photographs every year on January 1 and compare the photographs with those of the previous year to identify building-change information (new construction, loss, enlargement, reform, reconstruction, work in progress, and so on). The identification of such changes is entrusted to survey companies that hire a large number of workers (figure 1, left). However, reliance on human labor has led to the problems detailed in the following paragraphs.
[FIGURE 1 OMITTED]
Huge Costs, and the Impossibility of Eliminating Human Judgment Errors
It takes about 10 hours to read and interpret a single photograph, and the average municipality must perform this work for several hundred photographs. Each photograph covers an actual area of 800 by 600 meters or 500 by 600 meters (depending on the municipality). Errors, in particular oversights of actual changes to buildings, are not acceptable from the viewpoint of fair taxation. Nevertheless, the traditional approach depends on the capabilities of individual workers, so errors are unavoidable. Visual-identification work attempts to prevent oversights by reading each area several times (figure 2), but this further increases the cost. As a result, it is not rare for the person-hours required for the photograph-reading operation to exceed 10,000.
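The arithmetic behind these figures can be sketched as follows. This is only a rough illustration: the function names, the example of 400 photographs, and the three reads per area are assumptions, not values reported in the article, and the photograph count ignores the overlap that stereo pairs need, so it is a lower bound.

```python
import math


def estimate_person_hours(num_photos, hours_per_photo=10.0, reads_per_area=1):
    """Person-hours for manual reading: photos x hours per photo x repeated reads.

    hours_per_photo defaults to the roughly 10 hours quoted in the text;
    reads_per_area is an assumed number of repeated read operations.
    """
    return num_photos * hours_per_photo * reads_per_area


def photos_to_cover(area_km2, footprint_m=(800, 600)):
    """Lower bound on the number of photographs needed to tile an area.

    Assumes non-overlapping footprints, which real stereo coverage does not have.
    """
    footprint_m2 = footprint_m[0] * footprint_m[1]
    return math.ceil(area_km2 * 1_000_000 / footprint_m2)


if __name__ == "__main__":
    # Hypothetical municipality: 400 photographs, each area read three times.
    print(estimate_person_hours(400, reads_per_area=3))
    # Minimum photographs for 100 km^2 at each footprint size mentioned above.
    print(photos_to_cover(100, (800, 600)), photos_to_cover(100, (500, 600)))
```

Under these assumed inputs, several hundred photographs read more than once push the total past 10,000 person-hours, which matches the order of magnitude stated above.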
Under these circumstances, the incentives for municipalities to overcome such challenges by automating or systematizing the photograph-reading work are now higher than ever. The criteria for the identification of changes are based on laws and guidelines issued by the Research Center for Property Assessment System. Specifically, the criteria are designed to detect height changes of 2 meters or more in an area of approximately 2 by 2 meters, and color changes in an area of approximately 2 by 2 meters. In other words, any change covering an area of roughly 2 by 2 meters or more must be detected. Because these requirements set such a high hurdle for automatic processing, attempts to automate this work have so far been limited to the use of a few specific tools, and the available technology is still far from real systematization. The automation of this work involves the following problems.
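To make the height criterion concrete, the following minimal sketch flags windows of about 2 by 2 meters in which the height changed by 2 meters or more. It is not the RealScape algorithm: it assumes that per-pixel height grids for the previous and current year already exist as nested lists, that the ground size of a pixel is known, and that "a change in a 2-by-2-meter area" means every pixel in the window changed. The function and parameter names are hypothetical.

```python
def detect_height_changes(prev, curr, pixel_size_m, min_delta_m=2.0, window_m=2.0):
    """Return the top-left (row, col) of every window in which all pixels
    changed in height by at least min_delta_m.

    prev, curr   -- equally sized 2D lists of heights in meters (assumed inputs)
    pixel_size_m -- ground size of one pixel in meters (assumed known)
    """
    # Window side in pixels, rounded to the nearest whole pixel (at least 1).
    k = max(1, round(window_m / pixel_size_m))
    rows, cols = len(prev), len(prev[0])

    # Per-pixel mask: True where the absolute height difference meets the threshold.
    changed = [
        [abs(curr[r][c] - prev[r][c]) >= min_delta_m for c in range(cols)]
        for r in range(rows)
    ]

    hits = []
    for r in range(rows - k + 1):
        for c in range(cols - k + 1):
            # Assumption: the window counts only if every pixel in it changed.
            if all(changed[r + dr][c + dc] for dr in range(k) for dc in range(k)):
                hits.append((r, c))
    return hits


if __name__ == "__main__":
    prev = [[0.0] * 4 for _ in range(4)]
    curr = [row[:] for row in prev]
    # A hypothetical 3 m rise over a 2 x 2 pixel block at 1 m per pixel.
    for r in (1, 2):
        for c in (1, 2):
            curr[r][c] = 3.0
    print(detect_height_changes(prev, curr, pixel_size_m=1.0))
```

The exhaustive window scan is chosen for readability rather than speed. The sketch also shows why the criterion is demanding: it only works if a reliable height value is available for every pixel.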
[FIGURE 2 OMITTED]
Height and Color Change Detection
The height information may be obtained by aerial surveys using a laser profiler, but in normal use the laser profiler's resolution is too sparse to satisfy the area requirement. …