| Title: | Easy Access for South Korea Census Data and Boundaries |
|---|---|
| Description: | Census and administrative data in South Korea are a basic source for quantitative and mixed-methods research by social and urban scientists. This package provides a standardized workflow built on 'sf' (Pebesma et al., 2024 <doi:10.32614/CRAN.package.sf>), drawing on direct open API access to the major census and administrative data sources in South Korea and on pre-generated files. |
| Authors: | Insang Song [aut, cre] (ORCID: <https://orcid.org/0000-0001-8732-3256>), Sohyun Park [aut, ctb] (ORCID: <https://orcid.org/0000-0002-1231-5662>), Hyesop Shin [aut, ctb] (ORCID: <https://orcid.org/0000-0003-2637-7933>) |
| Maintainer: | Insang Song <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.2.8 |
| Built: | 2026-06-03 09:11:57 UTC |
| Source: | https://github.com/sigmafelix/tidycensuskr |
District level boundary data in South Korea in 2020. adm2_code column
can be used to join with an anycensus() output.
adm2_sf_2020
An sf object with 250 rows and 3 variables:
year Year of the census data, e.g., 2010, 2015, or 2020
adm2_code Code of the district/municipal-level (Sigungu) administrative unit
geometry Geometry list-column
Statistical Geographic Information Service (SGIS)
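The join on adm2_code mentioned above can be sketched with base R. This is a minimal illustration using toy data frames: the codes, the column name population_total, and the values are all made up, and the boundary table here carries no geometry. With the real sf object, merge() with the sf object as the first argument (or a dplyr join) is expected to keep the geometry column, but that is an assumption to verify against the sf documentation.

```r
# Toy stand-in for adm2_sf_2020 (hypothetical codes, no geometry column).
boundaries <- data.frame(
  year = 2020L,
  adm2_code = c(11010L, 11020L, 21010L)
)

# Toy stand-in for an anycensus() result (hypothetical column name and values).
census_wide <- data.frame(
  adm2_code = c(11010L, 21010L),
  population_total = c(1000, 2000)
)

# Left join on adm2_code: every boundary row is kept,
# and districts without census data get NA.
joined <- merge(boundaries, census_wide, by = "adm2_code", all.x = TRUE)
print(joined)
```

Keeping all boundary rows (all.x = TRUE) makes missing census values visible as NA instead of silently dropping districts from a map.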
Queries the long-format census data frame (censuskor), optionally
filtered by administrative codes, and returns the result as a wide table.
    anycensus(
      year = 2020,
      codes = NULL,
      type = c("population", "housing", "tax", "mortality", "economy", "medicine",
               "migration", "environment", "welfare", "social security", "landuse"),
      level = c("adm2", "adm1"),
      aggregator = sum,
      geometry = FALSE,
      ...
    )
| Argument | Description |
|---|---|
| year | integer(1). One of 2010, 2015, or 2020. |
| codes | integer vector of admin codes (e.g. |
| type | character(1). "population", "housing", "tax", "mortality", "economy", "medicine", "migration", "environment", "welfare", "social security", or "landuse". Defaults to "population". |
| level | character(1). "adm1" for province-level or "adm2" for municipal-level. Defaults to "adm2". |
| aggregator | function to aggregate values when |
| geometry | logical(1). If |
| ... | additional arguments passed to the |
A data.frame object containing census data for the specified codes and year.
Passing character values to codes has the side effect of returning
all rows in the dataset that match year and type.
The 'wide' table is returned with a separate column for each
combination of class1, class2, and (abbreviated) unit.
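The long-to-wide step described above can be illustrated with base R's reshape(). This is only a sketch on toy rows shaped like censuskor: the values are made up, and the column naming scheme (class1, class2, and unit joined by underscores) is an assumption, not necessarily the naming that anycensus() produces.

```r
# Toy long-format rows shaped like censuskor (values are made up).
long <- data.frame(
  year      = 2020L,
  adm2_code = c(11010L, 11010L, 11020L, 11020L),
  type      = "population",
  class1    = "all households",
  class2    = c("male", "female", "male", "female"),
  unit      = "persons",
  value     = c(10, 12, 20, 22)
)

# One key per class1/class2/unit combination (naming scheme is an assumption).
long$key <- gsub(" ", "_", paste(long$class1, long$class2, long$unit, sep = "_"))

# Spread the keys into columns: one row per year/adm2_code/type.
wide <- reshape(
  long[, c("year", "adm2_code", "type", "key", "value")],
  idvar     = c("year", "adm2_code", "type"),
  timevar   = "key",
  v.names   = "value",
  direction = "wide",
  sep       = "__"
)

# Drop the "value__" prefix that reshape() adds to the new columns.
names(wide) <- sub("^value__", "", names(wide))
print(wide)
```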
    # Query mortality data for adm2_code 21 (Busan)
    anycensus(codes = 21, type = "mortality")

    # Query housing data for adm1 "Seoul" or "Daejeon"
    anycensus(codes = c("Seoul", "Daejeon"), type = "housing", year = 2015)

    # Aggregate tax data to adm1 (province) level using sum
    anycensus(
      codes = c(11, 23, 31),
      type = "tax",
      year = 2020,
      level = "adm1",
      aggregator = sum,
      na.rm = TRUE
    )
District-level data including tax, population, business entities, housing, economy, medicine, and mortality in South Korea in 2010, 2015, and/or 2020. The available years and variables depend on the type of data.
censuskor
A data.frame with 103,626 rows and 10 variables:
year Year of the census data, e.g., 2010, 2015, or 2020
adm1 Name of the province-level (Sido) administrative unit
adm1_code Code of the province-level (Sido) administrative unit
adm2 Name of the district/municipal-level (Sigungu) administrative unit
adm2_code Code of the district/municipal-level (Sigungu) administrative unit
type Type of variable, e.g., "population", "tax", "mortality", "housing", "medicine", "migration", "environment", "welfare", or "economy"
class1 First-level classification of the variable depending on the type
class2 Second-level classification of the variable depending on the type
unit Unit of measurement for the variable
value Value of the variable
NA values in the value field indicate that the data was omitted or suppressed. We kept these NA values as-is to reflect the original data from the source. For temporal comparison, province names in adm1 field are standardized to the common names with no suffix in metropolitan cities and "-do" suffix in provinces. For example, "Seoul" instead of "Seoul Metropolitan City", and "Jeollabuk-do" instead of "Jeonbuk State". "KRW" in the unit field stands for South Korean Won. Values are as-is unless otherwise noted in the unit field (e.g., "per 100k population" or "million KRW").
KOSIS (Korean Statistical Information Service)
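Because NA values are kept as-is, they propagate through aggregation unless they are explicitly removed. The sketch below shows the difference on a toy table (made-up codes and values). It uses base R's tapply() as a stand-in and is not the package's own aggregation code.

```r
# Toy district-level values; NA marks an omitted or suppressed value.
toy <- data.frame(
  adm1_code = c(11L, 11L, 21L, 21L),
  value     = c(5, NA, 7, 8)
)

# Default sum: any NA in a group makes the group total NA.
strict <- tapply(toy$value, toy$adm1_code, sum)

# Passing na.rm = TRUE drops NA values before summing.
lenient <- tapply(toy$value, toy$adm1_code, sum, na.rm = TRUE)

print(strict)
print(lenient)
```

Whether to drop suppressed values is an analytical choice: a total computed with na.rm = TRUE understates the true figure whenever a suppressed value was nonzero.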
adm2_code often refers to the codes of autonomous _Gu_s or non-autonomous _Gu_s. The head table of the data frame may contain either or both of the two types of codes. This function detects the type of the codes in the adm2_code field and returns the exact codes accordingly.
detect_adm2_type(df, year = NULL, mode = "non", adm2_code = "adm2_code")
| Argument | Description |
|---|---|
| df | A head data frame containing the full dataset. i.e., |
| year | The year for which to filter the data. If not specified, the function will use the data.frame as is. |
| mode | Either "atn" (autonomous) or "non" (non-autonomous). Defaults to "non". |
| adm2_code | Name of the adm2_code field. Defaults to "adm2_code". |

A filtered data frame with exact codes.
    # Load 2020 census population
    pop20 <- anycensus(year = 2020, type = "population")
    pop20_nonauto <- detect_adm2_type(pop20, mode = "non")
    pop20_auto <- detect_adm2_type(pop20, mode = "atn")
    unique(pop20_nonauto$adm2_code)
    unique(pop20_auto$adm2_code)
A geofacet grid for South Korea administrative districts (Si-Gun-Gu) based on the Statistical Geographic Information Service (SGIS) standard in 2020. Non-autonomous districts in cities are retained as separate entities. This grid can be used with the geofacet package to create faceted visualizations based on geographic layout.
kr_grid_adm2_sgis_2020
A data.frame with 250 rows and 6 variables
name Name of the district/municipal-level (Sigungu) administrative unit
code SGIS code of the district/municipal-level (Sigungu) administrative unit
row Row position in the geofacet grid
col Column position in the geofacet grid
Statistical Geographic Information Service (SGIS)
GitHub user chichead, via the geofacet GitHub issue page
Load district boundaries for a specific year
load_districts(year = 2020)

| Argument | Description |
|---|---|
| year | The year for which to load district boundaries (2010, 2015, or 2020) |

An sf object containing district boundaries for the specified year
This function requires the tidycensuskr.sf package to be installed.
No explicit dependency is defined, but users should install the package following
the instructions at vignette('v01_intro') or, more succinctly:
install.packages('tidycensuskr.sf', repos = 'https://sigmafelix.r-universe.dev')
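Since the dependency is not declared, a script can check for the data package before calling load_districts(). The following is a minimal sketch using base R's requireNamespace(). The variable names are illustrative, and the call to load_districts() assumes that tidycensuskr itself is installed.

```r
# TRUE only if the optional boundary data package can be loaded.
has_sf_data <- requireNamespace("tidycensuskr.sf", quietly = TRUE)

if (has_sf_data) {
  # Documented call; assumes tidycensuskr is installed as well.
  districts <- tidycensuskr::load_districts(year = 2020)
} else {
  message(
    "tidycensuskr.sf is not installed. Install it with:\n",
    "install.packages('tidycensuskr.sf', ",
    "repos = 'https://sigmafelix.r-universe.dev')"
  )
}
```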
This function reads a KOSIS API key from a specified file and sets it for use in KOSIS API calls.
set_kosis_key(file)

| Argument | Description |
|---|---|
| file | A character string specifying the path to the file containing the KOSIS API key. |

The file should contain the API key as a single line of text. If the file does not exist, an error will be raised.
No return value. A message will be printed to confirm that the key has been set.
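The file contract described above (a single line of text, and an error if the file is missing) can be sketched as follows. The helper read_key_file() is hypothetical and written only for illustration. It is not the package's implementation, and the key shown is a placeholder.

```r
# Write a placeholder key to a temporary file (not a real KOSIS key).
key_file <- tempfile(fileext = ".txt")
writeLines("YOUR_KOSIS_API_KEY", key_file)

# Hypothetical helper mirroring the documented contract:
# stop if the file is missing, otherwise read the first line.
read_key_file <- function(file) {
  if (!file.exists(file)) {
    stop("Key file not found: ", file)
  }
  trimws(readLines(file, n = 1L, warn = FALSE))
}

key <- read_key_file(key_file)

# With the package installed, the documented call would be:
# set_kosis_key(key_file)
```

Keeping the key in a file outside the project directory avoids committing it to version control by accident.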