Establishing Classification and Hierarchy in Populated Place Labeling for Multiscale Mapping for the National Map
Butzler, Stephen J., Brewer, Cynthia A., Stroh, Wesley J., Cartography and Geographic Information Science
Computers and the Internet have an increasing role in the use and production of maps, but we have been alarmed by professionally produced and popular mapping services that leave out seemingly obvious placenames, such as omitting Pittsburgh and Philadelphia on a map of Pennsylvania while many smaller towns are named. Online maps are viewed using a variety of platforms, with varied combinations of features, and at multiple scales on demand. Thus, decisions about what labels to display need to be made dynamically and automatically. As more and more information becomes available for mapping purposes, a production challenge is to incorporate socio-economic attributes at the design stage in order to create better digital maps. For example, we want to have enough data considered by labeling algorithms so that smaller and less significant places are omitted before major cities when some labels are necessarily crowded out at smaller scales. Conversely, there is a potential danger of information overload (or slow processing) when additional data is used to create an overly complicated algorithm when a simple but elegant solution would suffice.
When creating a map product that is meant to be viewed at multiple scales, all of the design issues become increasingly complicated. Labeling map features is an especially burdensome problem, even with advanced label placement tools, such as Maplex (Esri ArcGIS). There are many tips in the cartographic literature on label placement for manual methods, and numerous articles that detail particular placement algorithms (for reviews see Kern and Brewer 2008; Huffman and Cromley 2002; Edmonson et al. 1996), and the topic remains of interest, evident in recent conference presentations, for example, Jordan and Michna's presentations at ICC2009. There are few resources, however, that explain how to marshal federal data attributes to best make use of commercial off the shelf (COTS) tools to map entire nations in an automated fashion. That practical goal is partly addressed by this paper, within a specific context of improving The National Map of the United States.
Labeling populated places provides an excellent case for considering the perceived importance of a given locale based on its position and labeling in given geographic extents and at given scales requested by online map users. In their current form, online tools such as The National Map Palanterra-based Viewer (served by the U.S. Geological Survey for viewing and downloading U.S. geographic data at viewer.nationalmap.gov) render the somewhat complicated national point layers of federal placenames when representing populated places. Our work considers the current state of labeling of populated places in The National Map Viewer and comparable consumer navigation mapping (Google Maps in this pilot study), considers the polygon alternative to point-feature labeling of places, and asks whether richer attribute sets add value to labeling populated places.
This pilot project addresses the specific problem of labeling U.S. places through a set range of scales. The broader applicability of this project lies within the challenges of labeling in general. By using comprehensive data sets and adding attributes that enhance place distinctions, cartographers have the ability to make more refined decisions about hierarchy of place which can be incorporated directly by automated labeling decisions. The challenge of using the most suitable data sets for labeling is compounded for us by the inherent limitations of the overarching project--information has to be nationally available (not just prepared for some cities or some states), copyright free, and meet quality expectations of the U.S. federal government so that the maps are authoritative sources of geographic information. These limitations place fairly stringent restrictions on using user contributed data, vernacular geographies, or other local placename inventories.
Source for Geographic Names
The Geographic Names Information System (GNIS; gnis.usgs.gov) is the official repository for domestic geographic names and is maintained by the U.S. Geological Survey (USGS) to support the efforts of the Board on Geographic Names (BGN) Domestic Names Committee (DNC), the body tasked with maintaining domestic geographic names (U.S. BGN 2003). The DNC rarely initiates name changes or corrections (except in the case of derogatory names). State naming agencies/committees, local governments and constituencies, as well as relevant federal agencies, are consulted by the committee as needed for its deliberations. Current Federal Geographic Data Committee (FGDC) standards name GNIS as the official source for the naming of geographic features on federal maps. Categories of feature names within GNIS include natural features, populated places, civil divisions, areas and regions, and cultural features. The focus of our work is on the populated place names listed in GNIS.
Names contained within GNIS originate from multiple sources. Many were derived from the original USGS 1:24000 topographic map series, though names are also submitted by Federal government partners and by local constituents through a formal submission process. All geographic name records contain name and coordinate location attributes. Some also contain a variant or multiple variants which may represent additional spellings or historical variations. Some records contain a feature designation such as State Capitol or County Seat.
Defining Populated Place
In its broadest definition, the term populated place encompasses the villages, towns, and cities that comprise the settlement landscape in the United States. However, other types of organized settlement are included in this broad category, such as townships and boroughs. The latter are types of civil divisions that are incorporated, but exist primarily in the eastern half of the U.S. and serve as sub-county divisions. In comparison, for much of the western U.S., a place is either incorporated but contained within a county, or is simply an unincorporated area within a county. Populated places also name neighborhoods--which do not possess formal boundaries, except perhaps at a very local level. Neighborhoods are often cultural or historic designations which may possess limited value beyond the local constituency.
Within GNIS there is a Feature Class called "Populated Place." GNIS also provides distinctive classes for "Civil" and "Census" places. The USGS (2011) defines these three feature classes as quoted below:
Census: A statistical area delineated locally specifically for the tabulation of Census Bureau data (census designated place, census county division, unorganized territory, various types of American Indian/Alaska Native statistical areas). Distinct from Civil and Populated Place.
Civil: A political division formed for administrative purposes (borough, county, incorporated place, municipio, parish, town, township). Distinct from Census and Populated Place.
Populated Place: Place or area with clustered or scattered buildings and a permanent human population (city, settlement, town, village). A populated place is usually not incorporated and by definition has no legal boundaries. However, a populated place may have a corresponding "civil" record, the legal boundaries of which may or may not coincide with the perceived populated place. Distinct from Census and Civil classes.
Additional related classes in GNIS are (quoted from USGS 2011):
Locale: Place at which there is or was human activity; it does not include populated places, mines, and dams ([but does include] battlefield, crossroad, camp, farm, ghost town, landing, railroad siding, ranch, ruins, site, station, windmill).
Military: Place or facility used for various aspects of or relating to military activity.
We have chosen to disregard locales in this pilot study.
We consider military places when they coincide with census features only.
[FIGURE 1 OMITTED]
Individual records can be searched through a GNIS web interface; however, the BGN also provides the contents of the GNIS database as various digital gazetteer files. The records can be derived topically, such as "populated place," containing only the feature class populated place, or "concise," containing all large features appropriate for mapping at a scale of 1:250,000. The records are also accessible as national or state files containing all feature classes for a given geography. The third type of gazetteer file expands the records to add FIPS (Federal Information Processing Standards) 55 Census ID and class codes. The legacy FIPS ID codes have been replaced with American National Standards Institute (ANSI) based GNIS IDs.
Yet the FIPS55 class codes will persist for the near term and are extremely useful in further categorizing populated places. A subset of 78 class codes that are of interest to our work follows (quoted from USGS 2006; also see Figure 1):
Class C--Incorporated Places
C1: an active incorporated place that ... does not serve as a primary county division equivalent
C5: an incorporated place that also serves as a primary county division ... it is not included in any adjacent primary county division of class T.
Class M--Federal Facilities
M2: an installation of the U.S. Department of Defense... [that] has been reported by the Census Bureau as a CDP [census designated place].
Class T--Active Minor Civil Divisions
T1: Identifies an active minor civil division (MCD) that is not coextensive with an incorporated place.
Class U--Populated (Community) Place
U1: a [CDP] with a name identical to the authoritative common name.
U2: a CDP with a name not identical to an authoritative common name of essentially the same area.
U4: a populated place wholly or substantially within the boundaries of an incorporated place with a different name.
U6: a populated place located wholly or substantially outside the boundaries of any incorporated place or CDP with an authoritative common name recognized by the U.S. Geological Survey.
The reader will see a fair amount of repetition among categories as they examine the definitions, and that was one challenge for this pilot project. One placename may appear in two or three categories, but we did not want to include many repeated names on the maps since space is scarce as scale decreases. For example, Pittsburgh has two separate records in GNIS as a P1 and C5, and Reserve is a T1 and a U2. We wanted to elevate labels for a place to the highest category suited to each and remove lower-level versions that would be repeats of that name.
GNIS features are all identified by a single XY coordinate. However, linear and areal features that spread across multiple 1:24,000 topographic maps possess secondary points to ensure that labeling occurs even when the primary point is on a different map sheet. At larger scales, these locator coordinates may not provide value in label placement if they fall out of the area being viewed. Rather, the actual areal feature may allow better placement options in GIS-based mapping environments.
In contrast to approved and defined placename types listed in U.S. federal data infrastructures (e.g., from GNIS or the U.S. Census Bureau), vernacular geography defines the world around us using natural language definitions and concepts of space and place (Montello et al. 2003; Garcia Adeva 2008). These definitions are rarely included in existing official databases or gazetteers because the spatial extent of vernacular geographies is difficult to define precisely, and map users may not agree on common meanings for these names or place types. At small map scales, vernacular terms such as "The South" or "The Steel City" are of limited value because these terms are less useful for reference-map reading tasks, such as navigation planning, in comparison to official place names that are realized at the same scale. At large scales, vernacular geography can add a level of detail that increases the usefulness of a map. However, given the scope of this project, in particular the requirement that the data be nationally consistent and approved by the U.S. federal government, using vernacular geography is not yet feasible. In addition, we focused our work on scales where placenames were congested, from 1:100,000 to 1:1,000,000, which are smaller than the scales at which vernacular geographies typically are useful.
Our goals of this pilot project included:
* determining point or area label placement for populated places;
* evaluating the relationship between populated place, civil place, and census place;
* evaluating the use of the census class codes in categorizing places;
* determining attributes which could provide an ordered classification and be used to produce label hierarchy;
* establishing an actual hierarchy of places within the constraint of real geographies crowded with placenames; and
* using labels with fonts and sizes readable and differentiable at desktop computer screen resolution.
For our pilot project, we chose to map GNIS points for the states of Colorado and Pennsylvania. Western states have a comparatively simpler hierarchy because there are no Minor Civil Divisions (MCDs) such as townships or boroughs. (Western states do possess subcounty divisions. These divisions are called Census County Divisions (CCDs), and while CCDs do subdivide the county, they are primarily statistical units and do not typically correspond with a defined place.) Sub-county place labels comprise incorporated municipalities, Census Designated Places (CDPs), or U4/U6 populated places--neighborhoods. In many eastern states, the MCD category complicates labeling because MCDs can be, but are not always, coextensive with municipalities.
Our first consideration was the type of feature representation to use for label placement. Despite secondary coordinates that are contained in many GNIS records, we decided that area label placement was best suited for a GIS mapping environment of dynamic labeling that responds to zooming and panning. Hence, we acquired and matched polygon features (Census TIGER/Line Shapefiles) in all cases where a GNIS point record had a corresponding polygon. This choice also helped us to remove duplicate placename records and refine the GNIS populated places point records into an initial coarse ranking of places used for an initial prioritization of labels: (1) incorporated municipalities, (2) CDPs, and (3) populated places which are neither incorporated nor CDPs.
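The duplicate removal and coarse ranking described above can be sketched in a few lines of Python. This is only an illustration, not the procedure we ran in ArcGIS: the assignment of individual FIPS 55 class codes to the three tiers, the fallback rank for codes outside those tiers, the record layout, and the choice to key duplicates on name plus county are all assumptions made for the sketch.

```python
# Coarse rank by FIPS 55 class code, following the three tiers in the text.
# The mapping of individual codes to tiers is an assumption of this sketch.
COARSE_RANK = {
    "C1": 1, "C5": 1,  # (1) incorporated municipalities
    "U1": 2, "U2": 2,  # (2) census designated places (CDPs)
    "U4": 3, "U6": 3,  # (3) populated places, neither incorporated nor CDP
}
UNRANKED = 99  # hypothetical fallback for codes outside the three tiers


def coarse_rank(class_code):
    """Return the coarse tier for a class code (lower is more prominent)."""
    return COARSE_RANK.get(class_code, UNRANKED)


def dedupe_by_name(records):
    """Keep one record per (name, county), preferring the best coarse rank."""
    best = {}
    for rec in records:
        key = (rec["name"], rec["county"])
        current = best.get(key)
        if current is None or coarse_rank(rec["class_code"]) < coarse_rank(current["class_code"]):
            best[key] = rec
    # Order the survivors by tier, then alphabetically, for label priority.
    return sorted(best.values(), key=lambda r: (coarse_rank(r["class_code"]), r["name"]))


# The two repeated names mentioned earlier in the paper.
records = [
    {"name": "Pittsburgh", "county": "Allegheny", "class_code": "P1"},
    {"name": "Pittsburgh", "county": "Allegheny", "class_code": "C5"},
    {"name": "Reserve", "county": "Allegheny", "class_code": "T1"},
    {"name": "Reserve", "county": "Allegheny", "class_code": "U2"},
]
labels = dedupe_by_name(records)
for rec in labels:
    print(rec["name"], rec["class_code"], coarse_rank(rec["class_code"]))
```

Keying on name alone would wrongly merge identically named places in different counties, which is why the sketch uses a compound key.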
Categories 1 and 2 could be further classified internally with rankings by population from Census Bureau decennial data (during the pilot, Census 2000 was the latest nationally complete dataset). However, residential population may not be the only attribute relevant to perceptions of place.
To create a more holistic representation of place, we turned to the Census Bureau's Economic Census, which is conducted every five years, most recently in 2007. The Economic Census creates a subset of places (historically, only incorporated places) which meet a given population threshold, varying from census to census. These subsets of places are called "Economic Places." The 2007 Economic Census changed the criteria to include both incorporated places and CDPs and the threshold is met either by residential population of 5000 or 5000 employees in a place. Adding economic place to our regime allowed us to add economic attribute data to our categorization.
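The 2007 criterion can be written as a simple test. The sketch below is illustrative only; the function name, the `kind` labels, and the treatment of the threshold as inclusive (5000 or more) are assumptions, since the text states the threshold without specifying the boundary case.

```python
ECONOMIC_PLACE_THRESHOLD = 5000  # residents or employees, per the 2007 criteria


def is_economic_place(kind, population, employees, threshold=ECONOMIC_PLACE_THRESHOLD):
    """Return True if a place would qualify as a 2007 economic place.

    kind: "incorporated" or "cdp" (hypothetical labels); other kinds never qualify.
    The comparison is inclusive, which is an assumption of this sketch.
    """
    if kind not in ("incorporated", "cdp"):
        return False
    return population >= threshold or employees >= threshold


# A small residential population can still qualify through employment.
print(is_economic_place("cdp", population=1200, employees=8000))
print(is_economic_place("incorporated", population=3000, employees=900))
```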
When evaluating the Economic Census attribute data, we looked first at composite sector data. Sectors are broad measures of industry within the economy such as mining, agriculture, retail, manufacturing, and services. Hence a composite of all sectors characterizes the overall economy of a specific geography. However, the Economic Census does not aggregate data coarser than two-digit sector codes at place-level geography, and efforts to aggregate the multiple sector data manually proved cumbersome. We turned to the Survey of Business Owners (SBO) data which provides aggregate data on sales and receipts, number of firms, and number of employees at the place level. At the time of our pilot tests, the 2007 SBO data had not yet been released at the economic place level, so we reverted back to the 2002 SBO attribute data for number of employees (U.S. Census Bureau 2010a). However, the population thresholds defining an economic place had changed from the 2002 SBO to the 2007 SBO. This did create some problems because we used 2009 place and economic place TIGER/Line Shapefiles, and some places met the new thresholds in 2007 that had not done so in 2002. Hence, we did have cases of 2007 defined economic places with no 2002 SBO data, but as the 2007 definition is more robust, we chose to overlook this mismatch until the 2007 SBO data are released and we can join those attributes to our most current TIGER/Line Shapefiles, which we continued to use (U.S. Census Bureau 2010b).
The economic place polygons are, essentially, a subset of place polygons, based on 2007 economic place definitions, and hence we can land all economic places directly on top of an identically named place. Having already joined the 2002 SBO data to the economic place shapefile, we collapsed all relevant polygons into points and used a Spatial Join (in ArcGIS 9.3.1) to merge the economic attributes into the attribute tables for place polygon layers. We used this process, as opposed to a union or a merge, because many CDPs had geometries that were not coextensive with geometries of other places. Sometimes these CDPs straddled other place boundaries, causing certain areas to belong to multiple other places (Figure 2). Additionally, the geometries across the three input layers often had mismatches that created polygon slivers when attempting to aggregate the polygons using other methods.
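The logic of collapsing polygons to points and then joining spatially can be illustrated without GIS software. The sketch below is not the ArcGIS Spatial Join tool: it assumes simple, single-ring polygons, uses the area-weighted centroid as the collapsed point (the paper does not say which point was used), and relies on a standard ray-casting test. All function and field names are hypothetical.

```python
def polygon_centroid(ring):
    """Area-weighted centroid of a simple polygon given as [(x, y), ...]."""
    area2 = cx = cy = 0.0
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    return cx / (3.0 * area2), cy / (3.0 * area2)


def point_in_polygon(pt, ring):
    """Ray-casting test: True if pt lies inside the ring."""
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]
        if (y0 > y) != (y1 > y):  # edge crosses the horizontal line through pt
            x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            if x < x_cross:
                inside = not inside
    return inside


def join_economic_attributes(places, economic_places):
    """Copy 'employees' onto each place whose polygon contains an economic-place point."""
    points = [(polygon_centroid(e["ring"]), e["employees"]) for e in economic_places]
    joined = []
    for place in places:
        employees = None
        for pt, value in points:
            if point_in_polygon(pt, place["ring"]):
                employees = value
                break
        joined.append({**place, "employees": employees})
    return joined


places = [
    {"name": "A", "ring": [(0, 0), (4, 0), (4, 4), (0, 4)]},
    {"name": "B", "ring": [(10, 0), (14, 0), (14, 4), (10, 4)]},
]
# Economic place whose boundary does not quite match place A (a sliver case).
economic_places = [
    {"name": "A", "ring": [(0.1, 0.1), (3.9, 0.1), (3.9, 4.2), (0.1, 4.2)], "employees": 1200},
]
result = join_economic_attributes(places, economic_places)
```

Because only a single point per economic place is tested, small boundary mismatches between layers do not produce slivers. A centroid can fall outside a strongly concave polygon, so a production workflow would likely use a point guaranteed to lie inside the feature.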
[FIGURE 2 OMITTED]
[FIGURE 3 OMITTED]
At this point, we were able to create separate layers using definition queries and exporting the selections for:
1. Incorporated places that are economic places (Figure 3a),
2. Incorporated places that are not economic places (Figure 3b),
3. CDPs that are also economic places (coextensive with MCDs in PA) (Figure 3c),
4. MCDs that are also economic places (PA only) (Figure 3d),
5. CDPs that are not economic places (coextensive with MCDs in PA) (Figure 3e), and
6. MCDs that are not economic places (PA only) (Figure 3f).
Figure 3 shows how this hierarchy builds the Pittsburgh example data.
In the case of Pennsylvania, there were additional CDPs that did not have coextensive geography with MCDs, thus creating a seventh layer (Figure 3g). Finally, at large scales, the remainder of the GNIS populated place points that were not represented by the Census polygons were added as an eighth layer (U4 and U6 points).
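The eight layers can be summarized as a lookup from place type to layer number, where a lower number means higher label priority. This is a minimal sketch under stated assumptions: the `kind` labels and the `separate_cdp_layer` flag are hypothetical names, and the sketch treats the seventh layer as a single layer regardless of economic status, which the text does not specify.

```python
def layer_for(kind, is_economic=False, separate_cdp_layer=False):
    """Return the label layer (1 = highest priority) for a place.

    kind: "incorporated", "cdp", "mcd", or "gnis_point" (hypothetical labels).
    separate_cdp_layer: True only for Pennsylvania CDPs whose geography is
    not coextensive with an MCD (the seventh layer in the text).
    """
    if kind == "incorporated":
        return 1 if is_economic else 2
    if kind == "cdp":
        if separate_cdp_layer:
            return 7
        return 3 if is_economic else 5
    if kind == "mcd":
        return 4 if is_economic else 6
    if kind == "gnis_point":  # U4 and U6 points, shown only at large scales
        return 8
    raise ValueError(f"unknown place kind: {kind!r}")


# Sorting candidate places by layer gives the order in which labels compete.
candidates = [
    ("gnis_point", False, False),
    ("mcd", True, False),
    ("incorporated", True, False),
    ("cdp", False, True),
]
ordered = sorted(candidates, key=lambda c: layer_for(*c))
```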
[FIGURE 4 OMITTED]
Incorporated places that are also economic places were then classified using the attribute "Number of Employees" from the SBO. Other economic attributes are available, but employees--the number of people working in a place--provides a logical complement to residential population, another attribute that is easily accessible for place.
The final step was to showcase the place hierarchy with the appropriate use of a labeling hierarchy. Places were labeled with Maplex dynamic labeling, and not converted to annotation, using ArcGIS 9.3.1. Incorporated places with number of workers received the most prominent label while GNIS populated place points received the least prominent label at large scales; as the scale decreased, the lower layers were not labeled. For example, Denver, Aurora, and Englewood in Colorado (CO) and Pittsburgh, West Mifflin, and Bethel Park in Pennsylvania (PA) are incorporated places coextensive with economic places ranked according to number of workers. The hierarchy of places is listed in the detailed legends in Figure 4.
[FIGURE 5 OMITTED]
The colors seen on the maps and that run across rows in Figure 4 aid in interpreting these pilot maps, but they are not recommended colors for a final version of national mapping. Likewise, the particular label styles, sizes, and fonts in the pilot are not a final set we recommend for national mapping. They instead are intended to clarify different types of polygons--CDPs in brown fills, incorporated places in purple fills, and blue-green labels for MCDs. At a practical level, we chose to focus on mapping Pittsburgh and Denver because of the authors' familiarity with these cities that enabled immediate visual evaluations of the quality of draft hierarchies based on local knowledge.
To evaluate our results effectively, we also mapped other sets of features to simulate the feature-rich background common in cartographic design. In this iteration, two vector sets, transportation and hydrography, were included. We did not include labels for these features, so there is a minor mismatch in our comparisons because the Google Maps and The National Map Viewer maps do include a scatter of other feature-type names, such as road shields and airports, that compete for map space with names for populated places.
We evaluated our results in two ways. The first was to ask if the additional hierarchy provided value. This was addressed by examining the design to see if a clear visual hierarchy emerged. The second method was to compare the performance of our visual hierarchy, mapped with ArcMap 9.3.1, with two other online maps: Google Maps and The National Map Viewer (see Figure 5). Since this project focuses on place name hierarchy, we wanted to test at scales where there was competition for label placement; testing at a larger map scale would allow most if not all places to be labeled unless there was competition from other feature types. Google Maps and The National Map Viewer use a common set of scales and the same projection for their cached map tiles (and we could display at any scale and projection) so we tested in Web Mercator at 1:144,448 (listed as 144K in tables and figures); 1:288,895 (289K); and 1:577,791 (578K). At scales larger than 1:144,448 there was minimal label competition and the next smaller map scale beyond 578K was 1:1,155,581, which falls outside the scope of our project (we top out at 1:1,000,000 where the U.S. National Atlas mapping resources take over from The National Map).
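The test scales are not arbitrary: in the cached tiling scheme each level halves the scale denominator of the level before it. The sketch below reproduces the four denominators quoted above. The level-0 denominator (about 1:591,657,550.5, the value commonly published for this tiling scheme at 96 dots per inch) and the zoom-level numbers are assumptions of the sketch; the paper itself reports only the resulting scales.

```python
# Assumed scale denominator of zoom level 0 in the common Web Mercator
# tiling scheme (96 dpi); not stated in the paper.
LEVEL0_DENOMINATOR = 591_657_550.5


def scale_denominator(zoom):
    """Scale denominator at a zoom level; each level halves the previous one."""
    return LEVEL0_DENOMINATOR / (2 ** zoom)


# Zoom levels 9 to 12 are assumed to correspond to the scales in the text.
for zoom in (9, 10, 11, 12):
    print(f"zoom {zoom}: 1:{round(scale_denominator(zoom)):,}")
```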
[FIGURE 6 OMITTED]
We printed out a list of all possible labels that could be displayed on our pilot map using a simple selection drag box over a set extent (approximately 6.75 x 9.25 inches) at each scale. This list was organized to reflect the hierarchy of places the attribute data gave us. We compared this list to realized map clips from Google Maps, The National Map Viewer, and maps we produced with ArcMap and Maplex to calculate the percentage of available labels landing on each map clip.
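The comparison reduces to a simple ratio: of the labels available in the extent, what share appears on a given map clip. A minimal sketch follows; the function name and the sample lists are illustrative only and are not measured results.

```python
def placement_rate(available, placed):
    """Percentage of available labels that appear on a map clip."""
    available = set(available)
    if not available:
        return 0.0
    # Count only labels from the available list; extra names on the clip are ignored.
    return 100.0 * len(available & set(placed)) / len(available)


# Illustrative lists, not results from the pilot.
available = ["Denver", "Aurora", "Englewood", "Centennial"]
placed_on_clip = ["Denver", "Aurora", "Interstate 25"]
rate = placement_rate(available, placed_on_clip)
print(f"{rate:.0f}% of available labels placed")
```

Running the same function separately for each category in the hierarchy would give per-category percentages of the kind compared across the three maps.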
Overall, we were pleased with initial tests of incorporating attributes to further create category and hierarchy in labeling of the populated places in the GNIS database. Figures 6 through 11, which have been reduced by 50 percent from original screen captures for inclusion in this paper, show the results of the labeling scheme through multiple scales. We did not alter the Maplex rules for particular scale intervals except for switching on additional layers. We evaluated the results using the same method across scales.
Figure 6 shows the Denver metropolitan area. There is balance to the overall feel of this scheme at this scale. Centennial seems too prominent, based on Wes' local knowledge, in comparison to Boulder and Aurora. A clear overall label hierarchy is operating. Figure 7 shows the Pittsburgh metropolitan area. The same labeling scheme was used in Pittsburgh as in Denver, though there were additional categories added in Pittsburgh to reflect the presence of meaningful MCDs. Again there is clear hierarchy of place. There is a danger with these labeling styles that the labels dominate the map, but the styles were used explicitly for studying map labeling and are not intended for general reference mapping.
[FIGURE 7 OMITTED]
[FIGURE 8 OMITTED]
[FIGURE 10 OMITTED]
At the larger scales (Figures 8 to 11), the scheme holds up well because there are simply fewer labels competing. At this scale, we brought in the U4 and U6 point features because of the space available. There may be a natural break between 1:288,895 and 1:577,791 at which point some smaller features should be omitted by category rather than compete for space in Maplex processing. Again Pittsburgh has a larger number of labels in part because the U4 dataset for Pittsburgh contains more points. The polygon schema used for Pittsburgh and Denver was different because of the importance of MCDs in Pennsylvania. Where in the scale continuum to bring in point data remains to be decided. At scales of 1:144,448 and larger there is ample space to bring in all of the U4/U6 features. We deemed these valuable primarily at the local level, as they are representations of neighborhoods. There is no easily identifiable attribute for population or for economic activity at these points. Future work may address this problem by, for example, using a surface analysis of census attribute data and associating the results with U4/U6 points.
[FIGURE 9 OMITTED]
[FIGURE 11 OMITTED]
[FIGURE 12 OMITTED]
Figure 12 and Table 1 show how the pilot map performed for all placenames outlined in Figure 4. A key aspect of better performance by our maps was the placement of a large number of incorporated places. The good performance of the pilot map in placing township names (MCDs) is a reflection of our decision to include this category at scales where the other maps actively did not include them, so this is a less comparable aspect of the improved performance.
Beyond the Pilot Study
For our pilot study, the labeling schema has been applied to Colorado and Pennsylvania. We plan to expand our analysis to several other states to determine whether additional schemas are necessary. Differences between urban and rural places, primarily in density but also size, may be addressed as well.
We may also incorporate the 2007 SBO data and generate the labels again to determine where truly obvious omissions are occurring. Consideration will be given to determining scale breaks at which classification rules should change. We also plan to test using both worker and residential populations to ensure comparatively large places are always dynamically labeled on maps.
We have already done some work exploring the viability of different attributes. Figures 13 to 15 show the Pittsburgh example using simply residential population data and keeping the label settings seen in Figures 6 to 11. Though there are differences in the hierarchy of place, using population still resulted in a visible and logical hierarchy. We would like to continue with the work to decide whether the extra processing of economic polygons and data provides an important enough change in performance to be worthwhile.
Another follow-on task is exploring how to create a hierarchy among U4/U6 neighborhood points. Currently they have no attribute data that we can manipulate. A possible solution would be to do a surface analysis of census data, either economic or decennial, at a suitably granular level such as block groups. The U4/U6 points that fall within a certain buffer of surface highs could potentially be used to introduce local hierarchy at very large map scales.
[FIGURE 13 OMITTED]
[FIGURE 14 OMITTED]
Place labels may well identify the most recognizable features on a map. Though perceptions of place may be somewhat personal to the map reader, the relative importance of place is often communal. By incorporating richer attributes and using the implicit categories, maps can better represent place to the reader. Furthermore, a methodology which incorporates more meaningful attributes, applied thoughtfully across scales, will enhance overall map usability.
[FIGURE 15 OMITTED]
Thanks to Lou Yost, Jennifer Runyon, and members of the Domestic Names Committee of the Board on Geographic Names at the USGS for assistance and perspective on the current state of geographic naming. We appreciate the encouragement of Lynn Usery, Director of the USGS Center for Excellence for Geospatial Information Science (CEGIS). The research is funded by a grant from CEGIS, announcement 09HQPA1000. The Gould Center and National Mapping Expertise Exchange program at Penn State provided facilities for the research.
Edmonson, S., J. Christensen, J. Marks and S.M. Shieber. 1996. A General Cartographic Labelling Algorithm. Cartographica 33(4): 13-24.
Garcia Adeva, J. 2008. Translating Vernacular Terms into Geographical Locations. Geospatial Services and Applications for the Internet 2008, pp. 135-153.
Huffman, F.T. and R. Cromley. 2002. An Automated Multi-Criteria Cartographic Aid for Point Annotation. The Cartographic Journal 39 (1): 51-64.
Jordan, P. 2009. Some considerations on the function of place names on maps. In: Proceedings of the 24th International Cartographic Conference (ICC2009), Santiago, Chile, November 15-21, 10 pp.
Kern, J.R. and C.A. Brewer. 2008. Automation and the Map Label Placement Problem. Cartographic Perspectives 60: 22-45.
Michna, I. 2009. Generalization of geographic names on atlas maps. In: Proceedings of the 24th International Cartographic Conference (ICC2009), Santiago, Chile, November 15-21, 10 pp.
Montello, D., M. Goodchild, J. Gottsegan and P. Fohl. 2003. Where's Downtown?: Behavioral Methods for Determining Referents of Vague Spatial Queries. Spatial Cognition and Computation 3: 185-204.
U.S. Board on Geographic Names. 2003. Principles, Policies, and Procedures. Online: geonames.usgs.gov/domestic/policies.htm (accessed 3 March 2011).
U.S. Census Bureau. 2010a. 2002 Survey of Business Owners: Economy Wide Estimates of Business Ownership: State of Colorado. Online: factfinder.census.gov (accessed 12 March 2010).
U.S. Census Bureau. 2010b. 2009 TIGER/Line Shapefiles. Online: www.census.gov/geo/www/tiger/tgrshp2009/tgrshp2009.html (accessed 15 March 2010).
U.S. Geological Survey. 2011. Feature Class Definitions. Online: www.geographic.org/geographic_names/Feature%20Class%20Definition.htm (accessed 3 March 2011).
U.S. Geological Survey. 2006. FIPS 55-3 Class Code Definitions. Online: geonames.usgs.gov/domestic/fips55codedef.html (accessed 3 March 2011).
Stephen J. Butzler, Cynthia A. Brewer, Wesley J. Stroh, Department of Geography, The Pennsylvania State University, University Park, PA 16802, USA.
Publication information: Article title: Establishing Classification and Hierarchy in Populated Place Labeling for Multiscale Mapping for the National Map. Contributors: Butzler, Stephen J. - Author, Brewer, Cynthia A. - Author, Stroh, Wesley J. - Author. Journal title: Cartography and Geographic Information Science. Volume: 38. Issue: 2 Publication date: April 2011. Page number: 100+. © 2008 American Congress on Surveying & Mapping. COPYRIGHT 2011 Gale Group.