UN Demographic Yearbook, Greenland

a real life example, from 2 days down to 2 minutes

(Published October 2024, last updated 2024-10-30)

click to see/hide used packages

# CRAN
library(tidyverse)
library(janitor)
library(XLConnect)
library(httr)

# As showcase some tables are defined using statgl 
# - a package Statistics Greenland has developed, to fit our needs
# Can be used on all Pxweb datacollections 
# https://www.scb.se/en/services/statistical-programs-for-px-files/px-web/
#
# GitHub - https://statisticsgreenland.github.io/
# devtools::install_github("StatisticsGreenland/statgl")

library(statgl)

Reporting tables for the UN Demographic Yearbook used to take two full days, just 10 years ago.

Filling in questionnaires using tables organized as sheets in a macro-supported Excel spreadsheet was time-consuming and prone to many errors.

When ‘saved queries’ were introduced in Pxweb, the time required was reduced by two-thirds, but it still took more than half a day to complete just two questionnaires (Population Estimates & Vital Statistics).

For many years, Statistics Greenland has been begging DYBquest for the use of SDMX or a similar to allow us to fill in the questionnaires programmatically. However, DYBquest has not been able to modify the process, and Statistics Greenland has been unable to prioritize this task for the past five years.

But now, Statistics Greenland has found a solution. As demonstrated by this script, the questionnaires can be answered using the R package XLConnect.

In fact, DYBquest could run this script themselves to collect the data !

ISSUE: When saving a protected xls/xlsx file all sheets are write protected, with the original code. I am still searching to find a workaround for this. Luckily the original protection code still works! Solutions welcomed.

Questionnaire on Population Estimates Reporting Year: 2024

Definitions

1. Definitions of population:

The following definitions are extracted from the United Nations’ Principles and Recommendations for Population and Housing Censuses, Revision 3.

Please indicate the definition that your office employs for population estimates/counts (e.g. present/de facto, usual resident or other) and the type of population estimates (e.g. intercensal or postcensal) in the notes box.

Present population (de facto): In a questionnaire-based census where no reference is made to usual residence, people are enumerated at the place where they are found, usually the dwelling where they spend census night.
Foreign residents who are in the country at the time of the census will be included but usual residents of the country who are absent at that time will be excluded (para. 4.30).
Usual resident population: A usual resident population count is a count of all usual residents of a country at the time of the census. Usual residents may or may not have citizenship of the country and they may also include undocumented persons, applicants for asylum or refugees.
Usual residents may include foreigners who reside or intend to reside in the country continuously for either most of the last 12 months or for 12 months or more, depending on the definition of place of usual residence that is adopted by the country.

general_def1 <- "Population Estimates are compiled from administrative registers 
reflecting 'de jura' population. The registers have 99.9 pct accuracy, 
when compiled 1 month after the reference date. Midyear refers to July 1.st"

2. Definitions of demographic variables: Please review the instructions provided below each table for variable definitions.

The following general definitions are extracted from the United Nations’ Principles and Recommendations for Population and Housing Censuses, Revision 3.

Age: the interval of time between the date of birth and the date of the census, expressed in completed solar years (para. 4.151).
Marital status: the personal status of each individual in relation to the marriage laws or customs of the country (para. 4.164). The categories of marital status to be identified should at least include the following:
(a) Single (never married)
(b) Married
(c) Married but separated: this category should comprise both the legally and the de facto separated.
(d) In consensual union: In some countries, this status includes registered partnerships and consensual unions, which are legal and binding under law.
(e) Widowed and not remarried
(f) Divorced and not remarried
Treatment of same-sex marriage:
In countries with legal provision for registered or legal partnership for same-sex couples, or where same-sex couples can legally marry, same-sex marriage/partnership may be included in the category “Married” or “In consensual union”.
Please indicate this in the notes box.

general_def2 <- "Same sex marriages and legal partnerships are calculated as 
married etc due to privacy concerns"

3. Urban and rural definitions:

Given the national differences in the characteristics that distinguish urban from rural areas, the distinction between the urban and the rural population is not yet amenable to a single definition that would be applicable to all countries (para. 4.92).
Thus, please provide detailed information about the national definition of urban and rural areas in the notes box.
The present questionnaire requests data on city proper and urban agglomeration as defined below.
City proper: A locality with legally fixed boundaries and an administratively recognized urban status, usually characterized by some form of local government.
Urban agglomeration: Urban agglomeration is defined as comprising the city proper and its suburban fringe or densely settled territory lying outside of, but adjacent to, the city boundaries.

general_def3 <- "Urban/Rural are based on a Greenlandic classification 
from 2020 categorizing localities by size. 
The settlements are categorized by size into the following groups:
a) Capital
b) Main settlements
c) Larger settlements: 700 - 3,000 inhabitants
d) Settlements: 200 - 699 inhabitants
e) Smaller settlements: 50 - 199 inhabitants
f) Smallest settlements: fewer than 50 inhabitants
g) Other localities: Outside municipal divisions, such as Pituffik, Sirius, etc.

Urban is now A + B + C
Rural is now D + E + F + G
"

4. Reference materials Principles and Recommendations for Population and Housing Censuses: Revision 3, United Nations, 2017

Tables

Data from StatBank Greenland into Excel-template

Template: Greenland_PE_2024_E.xls

This is the template we receive from UNstat::DYBquest annually on Population Estimates. As mid year data is collected, new data is ready first week in August, every year.

In operation, if all the same; change Parameter - time - and run

Parameters

time <- 2024

wb <- XLConnect::loadWorkbook(file.path(getwd(),time,
                    paste0("Greenland_PE_",time,"_E"),
                    paste0("Greenland_PE_",time,"_E.xls")))

PE_tmpfile <- file.path(getwd(),time,
                    paste0("Greenland_PE_",time,"_E"),
                    paste0("Greenland_PE_",time,"_E_out.xls"))

Table 0

Surface area (in square kilometers)

click to see/hide code

time_value <- time

writeWorksheet(wb, data = "2166086 km2", sheet = "Table 0", startRow = 18, startCol = 3, header = FALSE)

writeWorksheet(wb, data = "410449 km2", sheet = "Table 0", startRow = 18, startCol = 4, header = FALSE)

writeWorksheet(wb, data = "Population Statistics are compiled, based on administrative registers, published on https://bank.stat.gl/BEESTM1 August 8th, 2024", sheet = "Table 0", startRow = 44, startCol = 2, header = FALSE)

saveWorkbook(wb, PE_tmpfile)

Table 1

Estimated population by sex and urban/rural residence as of 1 July

click to see/hide code

link <- "BEESTM1"
footnote <- paste0(general_def1,"\n\n",general_def3,"\n\n","published on https://bank.stat.gl/",link)

time_value <- time
midyear <- format(as.Date(paste0(as.character(time_value),"-07-01")), "%d-%b-%Y")

tab177v11 <- statgl_fetch(statgl_url("BEXSTA"), gender=px_all(),
                     "residence type"=px_all(),
                     .val_code = T, .col_code = T) %>%
  clean_names() %>% 
  filter(residence_type!="T" & time<= 2010) %>% 
  mutate(residence_type = ifelse(residence_type %in% c("A", "B", "C"), "Urban", "Rural")) %>% 
  group_by(time, residence_type, gender) %>%
  summarise(value = sum(value, na.rm = TRUE)) %>%
  pivot_wider(names_from = c(residence_type, gender), values_from = value) %>%
  mutate(
    Total_T = Rural_T + Urban_T,
    Total_M = Rural_M + Urban_M,
    Total_K = Rural_K + Urban_K
  ) %>%
  select(
    time, 
    Total_T, Total_M, Total_K,
    Urban_T, Urban_M, Urban_K,
    Rural_T, Rural_M, Rural_K
  ) %>%
  ungroup() %>%
  select(-time)



tab1 <- statgl_fetch(statgl_url("BEXSTM1"), gender=px_all(),
                     "residence type"=px_all(),
                     .val_code = T, .col_code = T) %>%
  clean_names() %>% 
  filter(residence_type!="T" & time<=time_value) %>% 
  mutate(residence_type = ifelse(residence_type %in% c("A", "B"), "Urban", "Rural")) %>% 
  group_by(time, residence_type, gender) %>%
  summarise(value = sum(value, na.rm = TRUE)) %>%
  pivot_wider(names_from = c(residence_type, gender), values_from = value) %>%
  mutate(
    Total_T = Rural_T + Urban_T,
    Total_M = Rural_M + Urban_M,
    Total_K = Rural_K + Urban_K
  ) %>%
  select(
    time, 
    Total_T, Total_M, Total_K,
    Urban_T, Urban_M, Urban_K,
    Rural_T, Rural_M, Rural_K
  ) %>%
  ungroup() %>%
  select(-time)


writeWorksheet(wb, data = tab177v11, sheet = "Table 1", startRow = 41, startCol = 3, header = FALSE)

writeWorksheet(wb, data = tab1, sheet = "Table 1", startRow = 75, startCol = 3, header = FALSE)

writeWorksheet(wb, data = footnote, sheet = "Table 1", startRow = 93, startCol = 2, header = FALSE)

saveWorkbook(wb, PE_tmpfile)

Table 2a

Estimated population by five year age groups, sex and urban/rural residence as of midyear

click to see/hide code

link <- "BEESTM1"
footnote <- paste0(general_def1,"\n\n",general_def3,"\n\n","published on https://bank.stat.gl/",link)
time_value <- time

# 

midyear <- format(as.Date(paste0(as.character(time_value),"-07-01")), "%d-%b-%Y")

url <- "https://bank.stat.gl:443/api/v1/en/Greenland/BE/BE01/BE0140/BEXSTM1.PX"

# Create the JSON structure with the dynamic time value
data <- sprintf('
{
  "query": [
    {
      "code": "age",
      "selection": {
        "filter": "agg:0,1-4,5-year.agg",
        "values": [
          "0",
          "1-4",
          "5-9",
          "10-14",
          "15-19",
          "20-24",
          "25-29",
          "30-34",
          "35-39",
          "40-44",
          "45-49",
          "50-54",
          "55-59",
          "60-64",
          "65-69",
          "70-74",
          "75-79",
          "80-84",
          "85-89",
          "90-94",
          "95+"
        ]
      }
    },
    {
      "code": "residence type",
      "selection": {
        "filter": "item",
        "values": [
          "A",
          "B",
          "C",
          "D",
          "E",
          "F",
          "G"
        ]
      }
    },
    {
      "code": "gender",
      "selection": {
        "filter": "item",
        "values": [
          "T",
          "M",
          "K"
        ]
      }
    },
    {
      "code": "time",
      "selection": {
        "filter": "item",
        "values": [
          "%s"
        ]
      }
    }
  ],
  "response": {
    "format": "csv3"
  }
}
', time_value)


response <- POST(
  url, 
  body = data,
  encode = "raw", 
  content_type("text/csv")
)

# Specify the desired age group order
age_order <- c("0", "1-4", "5-9", "10-14", "15-19", "20-24", "25-29", "30-34", 
               "35-39", "40-44", "45-49", "50-54", "55-59", "60-64", "65-69", 
               "70-74", "75-79", "80-84", "85-89", "90-94", "95+")

# Apply the transformations and set the age order
tab2a_raw <- content(response, as = "parsed", show_col_types = FALSE) %>%
  as.data.frame() %>%
  clean_names() %>%
  rename_with(~ "value", .cols = last_col()) 

tab2a <- tab2a_raw %>% 
  mutate(
    residence_type = ifelse(residence_type %in% c("A", "B"), "Urban", "Rural"),
    age = factor(age, levels = age_order, ordered = TRUE)  # Set age as an ordered factor
  ) %>%
  group_by(time, residence_type, gender, age) %>%
  summarise(value = sum(value, na.rm = TRUE)) %>%
  pivot_wider(names_from = c(residence_type, gender), values_from = value) %>%
  mutate(
    Total_T = Rural_T + Urban_T,
    Total_M = Rural_M + Urban_M,
    Total_K = Rural_K + Urban_K
  ) %>%
  select(
    time, age,
    Total_T, Total_M, Total_K,
    Urban_T, Urban_M, Urban_K,
    Rural_T, Rural_M, Rural_K
  ) %>%
  ungroup()

# Convert the time column to character in tab2a_raw
tab2a_raw <- tab2a_raw %>%
  mutate(time = as.character(time))

# Calculate the total row
total_row <- tab2a %>%
  summarise(
    time = "Total",        # Label for the total row
    age = NA,              # No specific age for the total row
    Total_T = sum(Total_T, na.rm = TRUE),
    Total_M = sum(Total_M, na.rm = TRUE),
    Total_K = sum(Total_K, na.rm = TRUE),
    Urban_T = sum(Urban_T, na.rm = TRUE),
    Urban_M = sum(Urban_M, na.rm = TRUE),
    Urban_K = sum(Urban_K, na.rm = TRUE),
    Rural_T = sum(Rural_T, na.rm = TRUE),
    Rural_M = sum(Rural_M, na.rm = TRUE),
    Rural_K = sum(Rural_K, na.rm = TRUE)
  )

first_rows <- tab2a %>%
  filter(age %in% c("0", "1-4"))

xtra_row <- tab2a %>%
  filter(age %in% c("0", "1-4")) %>% 
  mutate(age = "0-4") %>%
  group_by(age) %>% 
  summarise(across(everything(), ~sum(.)))  # Replace `sum` with appropriate summary function for each column

last_rows <- tab2a %>%
  filter(!(age %in% c("0", "1-4")))

total_row_o <- total_row %>% select(-time,-age)
first_rows_o <- first_rows %>% select(-time,-age)
xtra_row_o <- xtra_row %>% select(-time,-age)
last_rows_o <- last_rows %>% select(-time,-age)

tab2a <- bind_rows(total_row_o, first_rows_o, xtra_row_o, last_rows_o)

year <- tab2a_raw %>% select(time) %>% filter(time!="Total") %>% distinct()  %>% 
  pull() %>% as.character()

midyear <- format(as.Date(paste0(as.character(year),"-07-01")), "%d-%b-%Y")

writeWorksheet(wb, data = midyear, sheet = "Table 2a", startRow = 12, startCol = 4, header = FALSE)

writeWorksheet(wb, data = tab2a, sheet = "Table 2a", startRow = 15, startCol = 3, header = FALSE)

writeWorksheet(wb, data = footnote, sheet = "Table 2a", startRow = 44, startCol = 2, header = FALSE)

# Save the workbook
saveWorkbook(wb, PE_tmpfile)

Table 2a.1

Estimated population by five year age groups, sex and urban/rural residence as of midyear - Second latest available date

click to see/hide code

link <- "BEESTM1"
footnote <- paste0(general_def1,"\n\n",general_def3,"\n\n","published on https://bank.stat.gl/",link)
time_value <- time

midyear <- format(as.Date(paste0(as.character(time_value),"-07-01")), "%d-%b-%Y")

url <- "https://bank.stat.gl:443/api/v1/en/Greenland/BE/BE01/BE0140/BEXSTM1.PX"

# Create the JSON structure with the dynamic time_value
data <- sprintf('
{
  "query": [
    {
      "code": "age",
      "selection": {
        "filter": "agg:0,1-4,5-year.agg",
        "values": [
          "0",
          "1-4",
          "5-9",
          "10-14",
          "15-19",
          "20-24",
          "25-29",
          "30-34",
          "35-39",
          "40-44",
          "45-49",
          "50-54",
          "55-59",
          "60-64",
          "65-69",
          "70-74",
          "75-79",
          "80-84",
          "85-89",
          "90-94",
          "95+"
        ]
      }
    },
    {
      "code": "residence type",
      "selection": {
        "filter": "item",
        "values": [
          "A",
          "B",
          "C",
          "D",
          "E",
          "F",
          "G"
        ]
      }
    },
    {
      "code": "gender",
      "selection": {
        "filter": "item",
        "values": [
          "T",
          "M",
          "K"
        ]
      }
    },
    {
      "code": "time",
      "selection": {
        "filter": "item",
        "values": [
          "%s"
        ]
      }
    }
  ],
  "response": {
    "format": "csv3"
  }
}
', time_value)


response <- POST(
  url, 
  body = data,
  encode = "raw", 
  content_type("text/csv")
)

# Specify the desired age group order
age_order <- c("0", "1-4", "5-9", "10-14", "15-19", "20-24", "25-29", "30-34", 
               "35-39", "40-44", "45-49", "50-54", "55-59", "60-64", "65-69", 
               "70-74", "75-79", "80-84", "85-89", "90-94", "95+")

# Apply the transformations and set the age order
tab2a.1_raw <- content(response, as = "parsed", show_col_types = FALSE) %>%
  as.data.frame() %>%
  clean_names() %>%
  rename_with(~ "value", .cols = last_col()) 

tab2a.1 <- tab2a.1_raw %>% 
  mutate(
    residence_type = ifelse(residence_type %in% c("A", "B", "C"), "Urban", "Rural"),
    age = factor(age, levels = age_order, ordered = TRUE)  # Set age as an ordered factor
  ) %>%
  group_by(time, residence_type, gender, age) %>%
  summarise(value = sum(value, na.rm = TRUE)) %>%
  pivot_wider(names_from = c(residence_type, gender), values_from = value) %>%
  mutate(
    Total_T = Rural_T + Urban_T,
    Total_M = Rural_M + Urban_M,
    Total_K = Rural_K + Urban_K
  ) %>%
  select(
    time, age,
    Total_T, Total_M, Total_K,
    Urban_T, Urban_M, Urban_K,
    Rural_T, Rural_M, Rural_K
  ) %>%
  ungroup()

# Convert the time column to character in tab2a_raw
tab2a.1_raw <- tab2a.1_raw %>%
  mutate(time = as.character(time))

# Calculate the total row
total_row <- tab2a.1 %>%
  summarise(
    time = "Total",        # Label for the total row
    age = NA,              # No specific age for the total row
    Total_T = sum(Total_T, na.rm = TRUE),
    Total_M = sum(Total_M, na.rm = TRUE),
    Total_K = sum(Total_K, na.rm = TRUE),
    Urban_T = sum(Urban_T, na.rm = TRUE),
    Urban_M = sum(Urban_M, na.rm = TRUE),
    Urban_K = sum(Urban_K, na.rm = TRUE),
    Rural_T = sum(Rural_T, na.rm = TRUE),
    Rural_M = sum(Rural_M, na.rm = TRUE),
    Rural_K = sum(Rural_K, na.rm = TRUE)
  )

first_rows <- tab2a.1 %>%
  filter(age %in% c("0", "1-4"))
  
xtra_row <- tab2a.1 %>%
  filter(age %in% c("0", "1-4")) %>% 
  mutate(age = "0-4") %>%
  group_by(age) %>% 
  summarise(across(everything(), ~sum(.)))  # Replace `sum` with appropriate summary function for each column

last_rows <- tab2a.1 %>%
  filter(!(age %in% c("0", "1-4")))

total_row_o <- total_row %>% select(-time,-age)
first_rows_o <- first_rows %>% select(-time,-age)
xtra_row_o <- xtra_row %>% select(-time,-age)
last_rows_o <- last_rows %>% select(-time,-age)

tab2a.1 <- bind_rows(total_row_o, first_rows_o, xtra_row_o, last_rows_o)

writeWorksheet(wb, data = midyear, sheet = "Table 2a.1", startRow = 12, startCol = 4, header = FALSE)

writeWorksheet(wb, data = tab2a.1, sheet = "Table 2a.1", startRow = 15, startCol = 3, header = FALSE)

writeWorksheet(wb, data = footnote, sheet = "Table 2a.1", startRow = 44, startCol = 2, header = FALSE)

# Save the workbook
saveWorkbook(wb, PE_tmpfile)

Table 2b

Estimated population by single years of age and sex as of midyear

click to see/hide code

link <- "BEESTM1"
footnote <- paste0(general_def1,"\n\n","published on https://bank.stat.gl/",link)
time_value <- time

midyear <- format(as.Date(paste0(time_value,"-07-01")), "%d-%b-%Y")

url <- "https://bank.stat.gl:443/api/v1/en/Greenland/BE/BE01/BE0140/BEXSTM1.PX"


data <- sprintf('
{
  "query": [
    {
      "code": "age",
      "selection": {
        "filter": "all",
        "values": ["*"]
      }
    },
    {
      "code": "gender",
      "selection": {
        "filter": "item",
        "values": [
          "T",
          "M",
          "K"
        ]
      }
    },
    {
      "code": "time",
      "selection": {
        "filter": "item",
        "values": [
          "%s"
        ]
      }
    }
  ],
  "response": {
    "format": "csv3"
  }
}
', time_value)



response <- POST(
  url, 
  body = data,
  encode = "raw", 
  content_type("text/csv")
)

tab2b_raw <- content(response, as = "parsed",show_col_types = FALSE) %>%
  as.data.frame() %>%
  clean_names() %>% 
  rename_with(~ "value", .cols = last_col()) %>% 
  mutate(value=strtoi(value)) %>% 
  arrange(age)


tab2b <- tab2b_raw %>%
  pivot_wider(names_from = c(gender),values_from = value) %>% 
  select(age,T,K,M) %>% 
  rename(total=T, female=K, male=M) %>% 
  select(-age)

writeWorksheet(wb, data = midyear, sheet = "Table 2b", startRow = 12, startCol = 4, header = FALSE)

writeWorksheet(wb, data = tab2b, sheet = "Table 2b", startRow = 15, startCol = 3, header = FALSE)

writeWorksheet(wb, data = footnote, sheet = "Table 2b", startRow = 119, startCol = 2, header = FALSE)

# Save the workbook
saveWorkbook(wb, PE_tmpfile)

Table 2b.1

Estimated population by single years of age and sex as of midyear - Second latest available date

click to see/hide code

link <- "BEESTM1"
footnote <- paste0(general_def1,"\n\n","published on https://bank.stat.gl/",link)
time_value <- time - 1 

midyear <- format(as.Date(paste0(as.character(time_value),"-07-01")), "%d-%b-%Y")

url <- "https://bank.stat.gl:443/api/v1/da/Greenland/BE/BE01/BE0140/BEXSTM1.PX"

data <- sprintf('
{
  "query": [
    {
      "code": "age",
      "selection": {
        "filter": "all",
        "values": ["*"]
      }
    },
    {
      "code": "gender",
      "selection": {
        "filter": "all",
        "values": ["*"]
      }
    },
    {
      "code": "time",
      "selection": {
        "filter": "item",
        "values": [
          "%s"
        ]
      }
    }
  ],
  "response": {
    "format": "csv3"
  }
}
', time_value)


response <- POST(
  url, 
  body = data,
  encode = "raw", 
  content_type("text/csv")
)

tab2b.1_raw <- content(response, as = "parsed",show_col_types = FALSE) %>%
  as.data.frame() %>%
  clean_names() %>% 
  rename_with(~ "value", .cols = last_col()) %>% 
  mutate(value=strtoi(value)) %>% 
  arrange(age)

year <- tab2b.1_raw %>% select(time) %>% distinct() %>%  pull() %>% as.character()

tab2b.1 <- tab2b.1_raw %>%
  pivot_wider(names_from = c(gender),values_from = value) %>% 
  select(age,T,K,M) %>% 
  rename(total=T, female=K, male=M) %>% 
  select(-age)


writeWorksheet(wb, data = midyear, sheet = "Table 2b.1", startRow = 12, startCol = 4, header = FALSE)

writeWorksheet(wb, data = tab2b.1, sheet = "Table 2b.1", startRow = 15, startCol = 3, header = FALSE)

writeWorksheet(wb, data = footnote, sheet = "Table 2b.1", startRow = 119, startCol = 2, header = FALSE)

# Save the workbook
saveWorkbook(wb, PE_tmpfile)

Table 3

Estimated population of the capital city and cities of 100 000 or more inhabitants as of midyear

click to see/hide code

link <- "BEESTMNUK"
footnote <- paste0(general_def1,"\n\n","published on https://bank.stat.gl/",link)
time_value <- time

midyear <- format(as.Date(paste0(as.character(time_value),"-07-01")), "%d-%b-%Y")

tab3 <- statgl_fetch(statgl_url("BEXSTMNUK"), citydistrict=px_all(),
                     time=px_top(), gender=px_all(),
                     .val_code = T, .col_code = T) %>%
  clean_names() %>% 
  filter(citydistrict!="N") %>% 
  mutate(citydistrict = ifelse(citydistrict %in% c("1"), "Urban", "Rural")) %>% 
  group_by(time, citydistrict, gender) %>%
  summarise(value = sum(value, na.rm = TRUE)) %>%
  pivot_wider(names_from = c(citydistrict, gender), values_from = value) %>%
  mutate(
    R_area = "",
    U_area = ""
  ) %>%
  select(
    time, 
    Urban_T, Urban_M, Urban_K, U_area,
    Rural_T, Rural_M, Rural_K, R_area
  ) %>%
  ungroup() %>%
  select(-time)

writeWorksheet(wb, data = midyear, sheet = "Table 3", startRow = 13, startCol = 4, header = FALSE)

writeWorksheet(wb, data = tab3, sheet = "Table 3", startRow = 16, startCol = 5, header = FALSE)

writeWorksheet(wb, data = footnote, sheet = "Table 3", startRow = 26, startCol = 2, header = FALSE)

# Save the workbook
saveWorkbook(wb, PE_tmpfile)

Table 3.1

Additional data: Estimated population of the capital city and cities of 100 000 or more inhabitants as of midyear

click to see/hide code

# source("Tab3.1.R")

Table 4a

Estimated population by marital status, age and sex as of midyear: Male

click to see/hide code

link <- "BEESTM5"
footnote <- paste0(general_def1,"\n\n",general_def2,"\n\n","published on https://bank.stat.gl/",link)
time_value <- time

midyear <- format(as.Date(paste0(as.character(time_value),"-07-01")), "%d-%b-%Y")

tab4a_tot <- statgl_fetch(statgl_url("BEXSTM5"), "marital status"=px_all(),
                     time=px_top(), gender="M",
                     .val_code = T, .col_code = T) %>%
  clean_names() %>% 
  pivot_wider(names_from = marital_status,values_from = value)


tab4a <- statgl_fetch(statgl_url("BEXSTM5"), "marital status"=px_all(),
                          time=px_top(), gender=px_all(), age=px_all(),
                          .val_code = T, .col_code = T) %>%
  clean_names() %>% 
  mutate(age=strtoi(age),
         age_grp = cut(age,
                       breaks = c(0, 14, 19, 24, 29, 34, 39, 44, 49, 54, 59, 64, 69, 74, 79, 84, 89, 94, Inf),
                       labels = c("0-14", "15-19", "20-24", "25-29", "30-34", 
                                  "35-39", "40-44", "45-49", "50-54", "55-59", 
                                  "60-64", "65-69", "70-74", "75-79", "80-84", 
                                  "85-89", "90-94", "95+"),
                       right = FALSE)) %>% 
  group_by(marital_status,age_grp) %>% 
  summarise(value=sum(value)) %>% 
  ungroup() %>% 
  pivot_wider(names_from = marital_status,values_from = value)


total_row_o <- tab4a_tot %>% select(-time,-gender)
rows_o <- tab4a %>% select(-age_grp)

tab4a <- bind_rows(total_row_o, rows_o)

col_part1 <- tab4a %>% select(T,U,G)
col_part2 <- tab4a %>% select(E,F)

writeWorksheet(wb, data = midyear, sheet = "Table 4a", startRow = 12, startCol = 4, header = FALSE)

writeWorksheet(wb, data = col_part1, sheet = "Table 4a", startRow = 16, startCol = 3, header = FALSE)
writeWorksheet(wb, data = col_part2, sheet = "Table 4a", startRow = 16, startCol = 7, header = FALSE)

writeWorksheet(wb, data = footnote, sheet = "Table 4a", startRow = 39, startCol = 2, header = FALSE)

# Save the workbook
saveWorkbook(wb, PE_tmpfile)

Table 4b

Estimated population by marital status, age and sex as of midyear: Female

click to see/hide code

link <- "BEESTM5"
footnote <- paste0(general_def1,"\n\n",general_def2,"\n\n","published on https://bank.stat.gl/",link)
time_value <- time

midyear <- format(as.Date(paste0(as.character(time_value),"-07-01")), "%d-%b-%Y")

tab4b_tot <- statgl_fetch(statgl_url("BEXSTM5"), "marital status"=px_all(),
                     time=px_top(), gender="K",
                     .val_code = T, .col_code = T) %>%
  clean_names() %>% 
  pivot_wider(names_from = marital_status,values_from = value)


tab4b <- statgl_fetch(statgl_url("BEXSTM5"), "marital status"=px_all(),
                          time=px_top(), gender="K", age=px_all(),
                          .val_code = T, .col_code = T) %>%
  clean_names() %>% 
  mutate(age=strtoi(age),
         age_grp = cut(age,
                       breaks = c(0, 14, 19, 24, 29, 34, 39, 44, 49, 54, 59, 64, 69, 74, 79, 84, 89, 94, Inf),
                       labels = c("0-14", "15-19", "20-24", "25-29", "30-34", 
                                  "35-39", "40-44", "45-49", "50-54", "55-59", 
                                  "60-64", "65-69", "70-74", "75-79", "80-84", 
                                  "85-89", "90-94", "95+"),
                       right = FALSE)) %>% 
  group_by(marital_status,age_grp) %>% 
  summarise(value=sum(value)) %>% 
  ungroup() %>% 
  pivot_wider(names_from = marital_status,values_from = value)


total_row_o <- tab4b_tot %>% select(-time,-gender)
rows_o <- tab4b %>% select(-age_grp)

tab4b <- bind_rows(total_row_o, rows_o)

col_part1 <- tab4b %>% select(T,U,G)
col_part2 <- tab4b %>% select(E,F)


writeWorksheet(wb, data = midyear, sheet = "Table 4b", startRow = 12, startCol = 4, header = FALSE)

writeWorksheet(wb, data = col_part1, sheet = "Table 4b", startRow = 16, startCol = 3, header = FALSE)
writeWorksheet(wb, data = col_part2, sheet = "Table 4b", startRow = 16, startCol = 7, header = FALSE)

writeWorksheet(wb, data = footnote, sheet = "Table 4b", startRow = 39, startCol = 2, header = FALSE)

# Save the workbook
saveWorkbook(wb, PE_tmpfile)

Table 4a.1

Estimated population by marital status, age and sex as of midyear: Male - Second latest available date

click to see/hide code

link <- "BEESTM5"
footnote <- paste0(general_def1,"\n\n",general_def2,"\n\n","published on https://bank.stat.gl/",link)
time_value <- time - 1

midyear <- format(as.Date(paste0(as.character(time_value),"-07-01")), "%d-%b-%Y")

time_value_c <- as.character(time_value)

tab4a.1_tot <- statgl_fetch(statgl_url("BEXSTM5"), "marital status"=px_all(),
                     time=time_value_c, gender="M",
                     .val_code = T, .col_code = T) %>%
  clean_names() %>% 
  pivot_wider(names_from = marital_status,values_from = value)


tab4a.1 <- statgl_fetch(statgl_url("BEXSTM5"), "marital status"=px_all(),
                          time=time_value_c, gender=px_all(), age=px_all(),
                          .val_code = T, .col_code = T) %>%
  clean_names() %>% 
  mutate(age=strtoi(age),
         age_grp = cut(age,
                       breaks = c(0, 14, 19, 24, 29, 34, 39, 44, 49, 54, 59, 64, 69, 74, 79, 84, 89, 94, Inf),
                       labels = c("0-14", "15-19", "20-24", "25-29", "30-34", 
                                  "35-39", "40-44", "45-49", "50-54", "55-59", 
                                  "60-64", "65-69", "70-74", "75-79", "80-84", 
                                  "85-89", "90-94", "95+"),
                       right = FALSE)) %>% 
  group_by(marital_status,age_grp) %>% 
  summarise(value=sum(value)) %>% 
  ungroup() %>% 
  pivot_wider(names_from = marital_status,values_from = value)


total_row_o <- tab4a.1_tot %>% select(-time,-gender)
rows_o <- tab4a.1 %>% select(-age_grp)

tab4a.1 <- bind_rows(total_row_o, rows_o)

col_part1 <- tab4a.1 %>% select(T,U,G)
col_part2 <- tab4a.1 %>% select(E,F)


writeWorksheet(wb, data = midyear, sheet = "Table 4a.1", startRow = 12, startCol = 4, header = FALSE)

writeWorksheet(wb, data = col_part1, sheet = "Table 4a.1", startRow = 16, startCol = 3, header = FALSE)
writeWorksheet(wb, data = col_part2, sheet = "Table 4a.1", startRow = 16, startCol = 7, header = FALSE)

writeWorksheet(wb, data = footnote, sheet = "Table 4a.1", startRow = 39, startCol = 2, header = FALSE)

# Save the workbook
saveWorkbook(wb, PE_tmpfile)

Table 4b.1

Estimated population by marital status, age and sex as of midyear: Female - Second latest available date

click to see/hide code

link <- "BEESTM5"
footnote <- paste0(general_def1,"\n\n",general_def2,"\n\n","published on https://bank.stat.gl/",link)
time_value <- time - 1

midyear <- format(as.Date(paste0(as.character(time_value),"-07-01")), "%d-%b-%Y")

time_value_c <- as.character(time_value)

tab4b.1_tot <- statgl_fetch(statgl_url("BEXSTM5"), "marital status"=px_all(),
                     time=time_value_c, gender="K",
                     .val_code = T, .col_code = T) %>%
  clean_names() %>% 
  pivot_wider(names_from = marital_status,values_from = value)


tab4b.1 <- statgl_fetch(statgl_url("BEXSTM5"), "marital status"=px_all(),
                          time=time_value_c, gender="K", age=px_all(),
                          .val_code = T, .col_code = T) %>%
  clean_names() %>% 
  mutate(age=strtoi(age),
         age_grp = cut(age,
                       breaks = c(0, 14, 19, 24, 29, 34, 39, 44, 49, 54, 59, 64, 69, 74, 79, 84, 89, 94, Inf),
                       labels = c("0-14", "15-19", "20-24", "25-29", "30-34", 
                                  "35-39", "40-44", "45-49", "50-54", "55-59", 
                                  "60-64", "65-69", "70-74", "75-79", "80-84", 
                                  "85-89", "90-94", "95+"),
                       right = FALSE)) %>% 
  group_by(marital_status,age_grp) %>% 
  summarise(value=sum(value)) %>% 
  ungroup() %>% 
  pivot_wider(names_from = marital_status,values_from = value)

total_row_o <- tab4b.1_tot %>% select(-time,-gender)
rows_o <- tab4b.1 %>% select(-age_grp)

tab4b.1 <- bind_rows(total_row_o, rows_o)

col_part1 <- tab4b.1 %>% select(T,U,G)
col_part2 <- tab4b.1 %>% select(E,F)

writeWorksheet(wb, data = midyear, sheet = "Table 4b.1", startRow = 12, startCol = 4, header = FALSE)

writeWorksheet(wb, data = col_part1, sheet = "Table 4b.1", startRow = 16, startCol = 3, header = FALSE)
writeWorksheet(wb, data = col_part2, sheet = "Table 4b.1", startRow = 16, startCol = 7, header = FALSE)

writeWorksheet(wb, data = footnote, sheet = "Table 4b.1", startRow = 39, startCol = 2, header = FALSE)

# Save the workbook
saveWorkbook(wb, PE_tmpfile)

--- always_allow_html: true format: html: code-tools: true self-contained: true toc: true toc-location: right toc-title: "**Contents**:" toc_summary: true toc_float: collapsed: true output: statgl::statgl_report: output: "pretty_html" execute: echo: true editor: source --- # **UN Demographic Yearbook, Greenland** ## [a real life example](https://stat.gl/pxmake/Greenland_PE_2024_E.xls), from 2 days down to 2 minutes (Published October 2024, last updated `r format(Sys.time(), '%Y-%m-%d')`) ```{r, setup, message=FALSE,warning=FALSE} #| code-fold: true #| code-summary: "click to see/hide used packages" # CRAN library(tidyverse) library(janitor) library(XLConnect) library(httr) # As showcase some tables are defined using statgl # - a package Statistics Greenland has developed, to fit our needs # Can be used on all Pxweb datacollections # https://www.scb.se/en/services/statistical-programs-for-px-files/px-web/ # # GitHub - https://statisticsgreenland.github.io/ # devtools::install_github("StatisticsGreenland/statgl") library(statgl) ``` ### ------------------------------------------------------------------------ **Reporting tables for the UN Demographic Yearbook used to take two full days, just 10 years ago.** Filling in questionnaires using tables organized as sheets in a macro-supported Excel spreadsheet was time-consuming and prone to many errors. When 'saved queries' were introduced in Pxweb, the time required was reduced by two-thirds, but it still took more than half a day to complete just two questionnaires (Population Estimates & Vital Statistics). For many years, Statistics Greenland has been begging DYBquest for the use of SDMX or a similar to allow us to fill in the questionnaires programmatically. However, DYBquest has not been able to modify the process, and Statistics Greenland has been unable to prioritize this task for the past five years. But now, Statistics Greenland has found a solution. As demonstrated by this script, the questionnaires can be answered using the R package XLConnect. **In fact, DYBquest could run this script themselves to collect the data !** ***ISSUE:** When saving a protected xls/xlsx file all sheets are write protected, with the original code. I am still searching to find a workaround for this. Luckily the original protection code still works! Solutions welcomed.* ------------------------------------------------------------------------ **Questionnaire on Population Estimates** **Reporting Year: 2024** ------------------------------------------------------------------------ ### Definitions **1. Definitions of population:** The following definitions are extracted from the United Nations’ Principles and Recommendations for Population and Housing Censuses, Revision 3.\ Please indicate the definition that your office employs for population estimates/counts (e.g. present/de facto, usual resident or other) and the type of population estimates (e.g. intercensal or postcensal) in the notes box.\ \ **Present population (de facto):** In a questionnaire-based census where no reference is made to usual residence, people are enumerated at the place where they are found, usually the dwelling where they spend census night.\ Foreign residents who are in the country at the time of the census will be included but usual residents of the country who are absent at that time will be excluded (para. 4.30).\ **Usual resident population:** A usual resident population count is a count of all usual residents of a country at the time of the census. Usual residents may or may not have citizenship of the country and they may also include undocumented persons, applicants for asylum or refugees.\ Usual residents may include foreigners who reside or intend to reside in the country continuously for either most of the last 12 months or for 12 months or more, depending on the definition of place of usual residence that is adopted by the country.\ ```{r, Def1, message=FALSE,warning=FALSE} general_def1 <- "Population Estimates are compiled from administrative registers reflecting 'de jura' population. The registers have 99.9 pct accuracy, when compiled 1 month after the reference date. Midyear refers to July 1.st" ``` **2. Definitions of demographic variables:** Please review the instructions provided below each table for variable definitions.\ The following general definitions are extracted from the United Nations’ Principles and Recommendations for Population and Housing Censuses, Revision 3.\ **Age:** the interval of time between the date of birth and the date of the census, expressed in completed solar years (para. 4.151).\ **Marital status:** the personal status of each individual in relation to the marriage laws or customs of the country (para. 4.164). The categories of marital status to be identified should at least include the following:\ (a) Single (never married)\ (b) Married\ (c) Married but separated: this category should comprise both the legally and the de facto separated.\ (d) In consensual union: In some countries, this status includes registered partnerships and consensual unions, which are legal and binding under law.\ (e) Widowed and not remarried\ (f) Divorced and not remarried\ Treatment of same-sex marriage:\ In countries with legal provision for registered or legal partnership for same-sex couples, or where same-sex couples can legally marry, same-sex marriage/partnership may be included in the category “Married” or “In consensual union”.\ **Please indicate this in the notes box.**\ ```{r, Def2, message=FALSE,warning=FALSE} general_def2 <- "Same sex marriages and legal partnerships are calculated as married etc due to privacy concerns" ``` **3. Urban and rural definitions:** Given the national differences in the characteristics that distinguish urban from rural areas, the distinction between the urban and the rural population is not yet amenable to a single definition that would be applicable to all countries (para. 4.92).\ Thus, please provide detailed information about the national definition of urban and rural areas in the notes box.\ The present questionnaire requests data on city proper and urban agglomeration as defined below.\ **City proper:** A locality with legally fixed boundaries and an administratively recognized urban status, usually characterized by some form of local government.\ **Urban agglomeration:** Urban agglomeration is defined as comprising the city proper and its suburban fringe or densely settled territory lying outside of, but adjacent to, the city boundaries.\ ```{r, Def3, message=FALSE,warning=FALSE} general_def3 <- "Urban/Rural are based on a Greenlandic classification from 2020 categorizing localities by size. The settlements are categorized by size into the following groups: a) Capital b) Main settlements c) Larger settlements: 700 - 3,000 inhabitants d) Settlements: 200 - 699 inhabitants e) Smaller settlements: 50 - 199 inhabitants f) Smallest settlements: fewer than 50 inhabitants g) Other localities: Outside municipal divisions, such as Pituffik, Sirius, etc. Urban is now A + B + C Rural is now D + E + F + G " ``` **4. Reference materials** Principles and Recommendations for Population and Housing Censuses: Revision 3, United Nations, 2017\ ------------------------------------------------------------------------ ### Tables **Data from StatBank Greenland into Excel-template**\ **Template: Greenland_PE_2024_E.xls** This is the template we receive from UNstat::DYBquest annually on Population Estimates. As mid year data is collected, new data is ready first week in August, every year. In operation, if all the same; change Parameter - time - and run ------------------------------------------------------------------------ ## Parameters ```{r, parameters, message=FALSE,warning=FALSE} time <- 2024 wb <- XLConnect::loadWorkbook(file.path(getwd(),time, paste0("Greenland_PE_",time,"_E"), paste0("Greenland_PE_",time,"_E.xls"))) PE_tmpfile <- file.path(getwd(),time, paste0("Greenland_PE_",time,"_E"), paste0("Greenland_PE_",time,"_E_out.xls")) ``` ## Table 0 Surface area (in square kilometers)\ ```{r, tab0, message=FALSE,warning=FALSE} #| code-fold: true #| code-summary: "click to see/hide code" time_value <- time writeWorksheet(wb, data = "2166086 km2", sheet = "Table 0", startRow = 18, startCol = 3, header = FALSE) writeWorksheet(wb, data = "410449 km2", sheet = "Table 0", startRow = 18, startCol = 4, header = FALSE) writeWorksheet(wb, data = "Population Statistics are compiled, based on administrative registers, published on https://bank.stat.gl/BEESTM1 August 8th, 2024", sheet = "Table 0", startRow = 44, startCol = 2, header = FALSE) saveWorkbook(wb, PE_tmpfile) ``` ## Table 1 Estimated population by sex and urban/rural residence as of 1 July\ ```{r, tab1, message=FALSE,warning=FALSE} #| code-fold: true #| code-summary: "click to see/hide code" link <- "BEESTM1" footnote <- paste0(general_def1,"\n\n",general_def3,"\n\n","published on https://bank.stat.gl/",link) time_value <- time midyear <- format(as.Date(paste0(as.character(time_value),"-07-01")), "%d-%b-%Y") tab177v11 <- statgl_fetch(statgl_url("BEXSTA"), gender=px_all(), "residence type"=px_all(), .val_code = T, .col_code = T) %>% clean_names() %>% filter(residence_type!="T" & time<= 2010) %>% mutate(residence_type = ifelse(residence_type %in% c("A", "B", "C"), "Urban", "Rural")) %>% group_by(time, residence_type, gender) %>% summarise(value = sum(value, na.rm = TRUE)) %>% pivot_wider(names_from = c(residence_type, gender), values_from = value) %>% mutate( Total_T = Rural_T + Urban_T, Total_M = Rural_M + Urban_M, Total_K = Rural_K + Urban_K ) %>% select( time, Total_T, Total_M, Total_K, Urban_T, Urban_M, Urban_K, Rural_T, Rural_M, Rural_K ) %>% ungroup() %>% select(-time) tab1 <- statgl_fetch(statgl_url("BEXSTM1"), gender=px_all(), "residence type"=px_all(), .val_code = T, .col_code = T) %>% clean_names() %>% filter(residence_type!="T" & time<=time_value) %>% mutate(residence_type = ifelse(residence_type %in% c("A", "B"), "Urban", "Rural")) %>% group_by(time, residence_type, gender) %>% summarise(value = sum(value, na.rm = TRUE)) %>% pivot_wider(names_from = c(residence_type, gender), values_from = value) %>% mutate( Total_T = Rural_T + Urban_T, Total_M = Rural_M + Urban_M, Total_K = Rural_K + Urban_K ) %>% select( time, Total_T, Total_M, Total_K, Urban_T, Urban_M, Urban_K, Rural_T, Rural_M, Rural_K ) %>% ungroup() %>% select(-time) writeWorksheet(wb, data = tab177v11, sheet = "Table 1", startRow = 41, startCol = 3, header = FALSE) writeWorksheet(wb, data = tab1, sheet = "Table 1", startRow = 75, startCol = 3, header = FALSE) writeWorksheet(wb, data = footnote, sheet = "Table 1", startRow = 93, startCol = 2, header = FALSE) saveWorkbook(wb, PE_tmpfile) ``` ## Table 2a Estimated population by five year age groups, sex and urban/rural residence as of midyear\ ```{r, tab2a, message=FALSE,warning=FALSE} #| code-fold: true #| code-summary: "click to see/hide code" link <- "BEESTM1" footnote <- paste0(general_def1,"\n\n",general_def3,"\n\n","published on https://bank.stat.gl/",link) time_value <- time # midyear <- format(as.Date(paste0(as.character(time_value),"-07-01")), "%d-%b-%Y") url <- "https://bank.stat.gl:443/api/v1/en/Greenland/BE/BE01/BE0140/BEXSTM1.PX" # Create the JSON structure with the dynamic time value data <- sprintf(' { "query": [ { "code": "age", "selection": { "filter": "agg:0,1-4,5-year.agg", "values": [ "0", "1-4", "5-9", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40-44", "45-49", "50-54", "55-59", "60-64", "65-69", "70-74", "75-79", "80-84", "85-89", "90-94", "95+" ] } }, { "code": "residence type", "selection": { "filter": "item", "values": [ "A", "B", "C", "D", "E", "F", "G" ] } }, { "code": "gender", "selection": { "filter": "item", "values": [ "T", "M", "K" ] } }, { "code": "time", "selection": { "filter": "item", "values": [ "%s" ] } } ], "response": { "format": "csv3" } } ', time_value) response <- POST( url, body = data, encode = "raw", content_type("text/csv") ) # Specify the desired age group order age_order <- c("0", "1-4", "5-9", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40-44", "45-49", "50-54", "55-59", "60-64", "65-69", "70-74", "75-79", "80-84", "85-89", "90-94", "95+") # Apply the transformations and set the age order tab2a_raw <- content(response, as = "parsed", show_col_types = FALSE) %>% as.data.frame() %>% clean_names() %>% rename_with(~ "value", .cols = last_col()) tab2a <- tab2a_raw %>% mutate( residence_type = ifelse(residence_type %in% c("A", "B"), "Urban", "Rural"), age = factor(age, levels = age_order, ordered = TRUE) # Set age as an ordered factor ) %>% group_by(time, residence_type, gender, age) %>% summarise(value = sum(value, na.rm = TRUE)) %>% pivot_wider(names_from = c(residence_type, gender), values_from = value) %>% mutate( Total_T = Rural_T + Urban_T, Total_M = Rural_M + Urban_M, Total_K = Rural_K + Urban_K ) %>% select( time, age, Total_T, Total_M, Total_K, Urban_T, Urban_M, Urban_K, Rural_T, Rural_M, Rural_K ) %>% ungroup() # Convert the time column to character in tab2a_raw tab2a_raw <- tab2a_raw %>% mutate(time = as.character(time)) # Calculate the total row total_row <- tab2a %>% summarise( time = "Total", # Label for the total row age = NA, # No specific age for the total row Total_T = sum(Total_T, na.rm = TRUE), Total_M = sum(Total_M, na.rm = TRUE), Total_K = sum(Total_K, na.rm = TRUE), Urban_T = sum(Urban_T, na.rm = TRUE), Urban_M = sum(Urban_M, na.rm = TRUE), Urban_K = sum(Urban_K, na.rm = TRUE), Rural_T = sum(Rural_T, na.rm = TRUE), Rural_M = sum(Rural_M, na.rm = TRUE), Rural_K = sum(Rural_K, na.rm = TRUE) ) first_rows <- tab2a %>% filter(age %in% c("0", "1-4")) xtra_row <- tab2a %>% filter(age %in% c("0", "1-4")) %>% mutate(age = "0-4") %>% group_by(age) %>% summarise(across(everything(), ~sum(.))) # Replace `sum` with appropriate summary function for each column last_rows <- tab2a %>% filter(!(age %in% c("0", "1-4"))) total_row_o <- total_row %>% select(-time,-age) first_rows_o <- first_rows %>% select(-time,-age) xtra_row_o <- xtra_row %>% select(-time,-age) last_rows_o <- last_rows %>% select(-time,-age) tab2a <- bind_rows(total_row_o, first_rows_o, xtra_row_o, last_rows_o) year <- tab2a_raw %>% select(time) %>% filter(time!="Total") %>% distinct() %>% pull() %>% as.character() midyear <- format(as.Date(paste0(as.character(year),"-07-01")), "%d-%b-%Y") writeWorksheet(wb, data = midyear, sheet = "Table 2a", startRow = 12, startCol = 4, header = FALSE) writeWorksheet(wb, data = tab2a, sheet = "Table 2a", startRow = 15, startCol = 3, header = FALSE) writeWorksheet(wb, data = footnote, sheet = "Table 2a", startRow = 44, startCol = 2, header = FALSE) # Save the workbook saveWorkbook(wb, PE_tmpfile) ``` ## Table 2a.1 Estimated population by five year age groups, sex and urban/rural residence as of midyear - Second latest available date\ ```{r, tab2a1, message=FALSE,warning=FALSE} #| code-fold: true #| code-summary: "click to see/hide code" link <- "BEESTM1" footnote <- paste0(general_def1,"\n\n",general_def3,"\n\n","published on https://bank.stat.gl/",link) time_value <- time midyear <- format(as.Date(paste0(as.character(time_value),"-07-01")), "%d-%b-%Y") url <- "https://bank.stat.gl:443/api/v1/en/Greenland/BE/BE01/BE0140/BEXSTM1.PX" # Create the JSON structure with the dynamic time_value data <- sprintf(' { "query": [ { "code": "age", "selection": { "filter": "agg:0,1-4,5-year.agg", "values": [ "0", "1-4", "5-9", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40-44", "45-49", "50-54", "55-59", "60-64", "65-69", "70-74", "75-79", "80-84", "85-89", "90-94", "95+" ] } }, { "code": "residence type", "selection": { "filter": "item", "values": [ "A", "B", "C", "D", "E", "F", "G" ] } }, { "code": "gender", "selection": { "filter": "item", "values": [ "T", "M", "K" ] } }, { "code": "time", "selection": { "filter": "item", "values": [ "%s" ] } } ], "response": { "format": "csv3" } } ', time_value) response <- POST( url, body = data, encode = "raw", content_type("text/csv") ) # Specify the desired age group order age_order <- c("0", "1-4", "5-9", "10-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40-44", "45-49", "50-54", "55-59", "60-64", "65-69", "70-74", "75-79", "80-84", "85-89", "90-94", "95+") # Apply the transformations and set the age order tab2a.1_raw <- content(response, as = "parsed", show_col_types = FALSE) %>% as.data.frame() %>% clean_names() %>% rename_with(~ "value", .cols = last_col()) tab2a.1 <- tab2a.1_raw %>% mutate( residence_type = ifelse(residence_type %in% c("A", "B", "C"), "Urban", "Rural"), age = factor(age, levels = age_order, ordered = TRUE) # Set age as an ordered factor ) %>% group_by(time, residence_type, gender, age) %>% summarise(value = sum(value, na.rm = TRUE)) %>% pivot_wider(names_from = c(residence_type, gender), values_from = value) %>% mutate( Total_T = Rural_T + Urban_T, Total_M = Rural_M + Urban_M, Total_K = Rural_K + Urban_K ) %>% select( time, age, Total_T, Total_M, Total_K, Urban_T, Urban_M, Urban_K, Rural_T, Rural_M, Rural_K ) %>% ungroup() # Convert the time column to character in tab2a_raw tab2a.1_raw <- tab2a.1_raw %>% mutate(time = as.character(time)) # Calculate the total row total_row <- tab2a.1 %>% summarise( time = "Total", # Label for the total row age = NA, # No specific age for the total row Total_T = sum(Total_T, na.rm = TRUE), Total_M = sum(Total_M, na.rm = TRUE), Total_K = sum(Total_K, na.rm = TRUE), Urban_T = sum(Urban_T, na.rm = TRUE), Urban_M = sum(Urban_M, na.rm = TRUE), Urban_K = sum(Urban_K, na.rm = TRUE), Rural_T = sum(Rural_T, na.rm = TRUE), Rural_M = sum(Rural_M, na.rm = TRUE), Rural_K = sum(Rural_K, na.rm = TRUE) ) first_rows <- tab2a.1 %>% filter(age %in% c("0", "1-4")) xtra_row <- tab2a.1 %>% filter(age %in% c("0", "1-4")) %>% mutate(age = "0-4") %>% group_by(age) %>% summarise(across(everything(), ~sum(.))) # Replace `sum` with appropriate summary function for each column last_rows <- tab2a.1 %>% filter(!(age %in% c("0", "1-4"))) total_row_o <- total_row %>% select(-time,-age) first_rows_o <- first_rows %>% select(-time,-age) xtra_row_o <- xtra_row %>% select(-time,-age) last_rows_o <- last_rows %>% select(-time,-age) tab2a.1 <- bind_rows(total_row_o, first_rows_o, xtra_row_o, last_rows_o) writeWorksheet(wb, data = midyear, sheet = "Table 2a.1", startRow = 12, startCol = 4, header = FALSE) writeWorksheet(wb, data = tab2a.1, sheet = "Table 2a.1", startRow = 15, startCol = 3, header = FALSE) writeWorksheet(wb, data = footnote, sheet = "Table 2a.1", startRow = 44, startCol = 2, header = FALSE) # Save the workbook saveWorkbook(wb, PE_tmpfile) ``` ## Table 2b Estimated population by single years of age and sex as of midyear\ ```{r, tab2b, message=FALSE,warning=FALSE} #| code-fold: true #| code-summary: "click to see/hide code" link <- "BEESTM1" footnote <- paste0(general_def1,"\n\n","published on https://bank.stat.gl/",link) time_value <- time midyear <- format(as.Date(paste0(time_value,"-07-01")), "%d-%b-%Y") url <- "https://bank.stat.gl:443/api/v1/en/Greenland/BE/BE01/BE0140/BEXSTM1.PX" data <- sprintf(' { "query": [ { "code": "age", "selection": { "filter": "all", "values": ["*"] } }, { "code": "gender", "selection": { "filter": "item", "values": [ "T", "M", "K" ] } }, { "code": "time", "selection": { "filter": "item", "values": [ "%s" ] } } ], "response": { "format": "csv3" } } ', time_value) response <- POST( url, body = data, encode = "raw", content_type("text/csv") ) tab2b_raw <- content(response, as = "parsed",show_col_types = FALSE) %>% as.data.frame() %>% clean_names() %>% rename_with(~ "value", .cols = last_col()) %>% mutate(value=strtoi(value)) %>% arrange(age) tab2b <- tab2b_raw %>% pivot_wider(names_from = c(gender),values_from = value) %>% select(age,T,K,M) %>% rename(total=T, female=K, male=M) %>% select(-age) writeWorksheet(wb, data = midyear, sheet = "Table 2b", startRow = 12, startCol = 4, header = FALSE) writeWorksheet(wb, data = tab2b, sheet = "Table 2b", startRow = 15, startCol = 3, header = FALSE) writeWorksheet(wb, data = footnote, sheet = "Table 2b", startRow = 119, startCol = 2, header = FALSE) # Save the workbook saveWorkbook(wb, PE_tmpfile) ``` ## Table 2b.1 Estimated population by single years of age and sex as of midyear - Second latest available date\ ```{r, tab2b1, message=FALSE,warning=FALSE} #| code-fold: true #| code-summary: "click to see/hide code" link <- "BEESTM1" footnote <- paste0(general_def1,"\n\n","published on https://bank.stat.gl/",link) time_value <- time - 1 midyear <- format(as.Date(paste0(as.character(time_value),"-07-01")), "%d-%b-%Y") url <- "https://bank.stat.gl:443/api/v1/da/Greenland/BE/BE01/BE0140/BEXSTM1.PX" data <- sprintf(' { "query": [ { "code": "age", "selection": { "filter": "all", "values": ["*"] } }, { "code": "gender", "selection": { "filter": "all", "values": ["*"] } }, { "code": "time", "selection": { "filter": "item", "values": [ "%s" ] } } ], "response": { "format": "csv3" } } ', time_value) response <- POST( url, body = data, encode = "raw", content_type("text/csv") ) tab2b.1_raw <- content(response, as = "parsed",show_col_types = FALSE) %>% as.data.frame() %>% clean_names() %>% rename_with(~ "value", .cols = last_col()) %>% mutate(value=strtoi(value)) %>% arrange(age) year <- tab2b.1_raw %>% select(time) %>% distinct() %>% pull() %>% as.character() tab2b.1 <- tab2b.1_raw %>% pivot_wider(names_from = c(gender),values_from = value) %>% select(age,T,K,M) %>% rename(total=T, female=K, male=M) %>% select(-age) writeWorksheet(wb, data = midyear, sheet = "Table 2b.1", startRow = 12, startCol = 4, header = FALSE) writeWorksheet(wb, data = tab2b.1, sheet = "Table 2b.1", startRow = 15, startCol = 3, header = FALSE) writeWorksheet(wb, data = footnote, sheet = "Table 2b.1", startRow = 119, startCol = 2, header = FALSE) # Save the workbook saveWorkbook(wb, PE_tmpfile) ``` ## Table 3 Estimated population of the capital city and cities of 100 000 or more inhabitants as of midyear\ ```{r, tab3, message=FALSE,warning=FALSE} #| code-fold: true #| code-summary: "click to see/hide code" link <- "BEESTMNUK" footnote <- paste0(general_def1,"\n\n","published on https://bank.stat.gl/",link) time_value <- time midyear <- format(as.Date(paste0(as.character(time_value),"-07-01")), "%d-%b-%Y") tab3 <- statgl_fetch(statgl_url("BEXSTMNUK"), citydistrict=px_all(), time=px_top(), gender=px_all(), .val_code = T, .col_code = T) %>% clean_names() %>% filter(citydistrict!="N") %>% mutate(citydistrict = ifelse(citydistrict %in% c("1"), "Urban", "Rural")) %>% group_by(time, citydistrict, gender) %>% summarise(value = sum(value, na.rm = TRUE)) %>% pivot_wider(names_from = c(citydistrict, gender), values_from = value) %>% mutate( R_area = "", U_area = "" ) %>% select( time, Urban_T, Urban_M, Urban_K, U_area, Rural_T, Rural_M, Rural_K, R_area ) %>% ungroup() %>% select(-time) writeWorksheet(wb, data = midyear, sheet = "Table 3", startRow = 13, startCol = 4, header = FALSE) writeWorksheet(wb, data = tab3, sheet = "Table 3", startRow = 16, startCol = 5, header = FALSE) writeWorksheet(wb, data = footnote, sheet = "Table 3", startRow = 26, startCol = 2, header = FALSE) # Save the workbook saveWorkbook(wb, PE_tmpfile) ``` ## Table 3.1 Additional data: Estimated population of the capital city and cities of 100 000 or more inhabitants as of midyear\ ```{r, tab31, message=FALSE,warning=FALSE} #| code-fold: true #| code-summary: "click to see/hide code" # source("Tab3.1.R") ``` ## Table 4a Estimated population by marital status, age and sex as of midyear: Male\ ```{r, tab4a, message=FALSE,warning=FALSE} #| code-fold: true #| code-summary: "click to see/hide code" link <- "BEESTM5" footnote <- paste0(general_def1,"\n\n",general_def2,"\n\n","published on https://bank.stat.gl/",link) time_value <- time midyear <- format(as.Date(paste0(as.character(time_value),"-07-01")), "%d-%b-%Y") tab4a_tot <- statgl_fetch(statgl_url("BEXSTM5"), "marital status"=px_all(), time=px_top(), gender="M", .val_code = T, .col_code = T) %>% clean_names() %>% pivot_wider(names_from = marital_status,values_from = value) tab4a <- statgl_fetch(statgl_url("BEXSTM5"), "marital status"=px_all(), time=px_top(), gender=px_all(), age=px_all(), .val_code = T, .col_code = T) %>% clean_names() %>% mutate(age=strtoi(age), age_grp = cut(age, breaks = c(0, 14, 19, 24, 29, 34, 39, 44, 49, 54, 59, 64, 69, 74, 79, 84, 89, 94, Inf), labels = c("0-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40-44", "45-49", "50-54", "55-59", "60-64", "65-69", "70-74", "75-79", "80-84", "85-89", "90-94", "95+"), right = FALSE)) %>% group_by(marital_status,age_grp) %>% summarise(value=sum(value)) %>% ungroup() %>% pivot_wider(names_from = marital_status,values_from = value) total_row_o <- tab4a_tot %>% select(-time,-gender) rows_o <- tab4a %>% select(-age_grp) tab4a <- bind_rows(total_row_o, rows_o) col_part1 <- tab4a %>% select(T,U,G) col_part2 <- tab4a %>% select(E,F) writeWorksheet(wb, data = midyear, sheet = "Table 4a", startRow = 12, startCol = 4, header = FALSE) writeWorksheet(wb, data = col_part1, sheet = "Table 4a", startRow = 16, startCol = 3, header = FALSE) writeWorksheet(wb, data = col_part2, sheet = "Table 4a", startRow = 16, startCol = 7, header = FALSE) writeWorksheet(wb, data = footnote, sheet = "Table 4a", startRow = 39, startCol = 2, header = FALSE) # Save the workbook saveWorkbook(wb, PE_tmpfile) ``` ## Table 4b Estimated population by marital status, age and sex as of midyear: Female\ ```{r, tab4b, message=FALSE,warning=FALSE} #| code-fold: true #| code-summary: "click to see/hide code" link <- "BEESTM5" footnote <- paste0(general_def1,"\n\n",general_def2,"\n\n","published on https://bank.stat.gl/",link) time_value <- time midyear <- format(as.Date(paste0(as.character(time_value),"-07-01")), "%d-%b-%Y") tab4b_tot <- statgl_fetch(statgl_url("BEXSTM5"), "marital status"=px_all(), time=px_top(), gender="K", .val_code = T, .col_code = T) %>% clean_names() %>% pivot_wider(names_from = marital_status,values_from = value) tab4b <- statgl_fetch(statgl_url("BEXSTM5"), "marital status"=px_all(), time=px_top(), gender="K", age=px_all(), .val_code = T, .col_code = T) %>% clean_names() %>% mutate(age=strtoi(age), age_grp = cut(age, breaks = c(0, 14, 19, 24, 29, 34, 39, 44, 49, 54, 59, 64, 69, 74, 79, 84, 89, 94, Inf), labels = c("0-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40-44", "45-49", "50-54", "55-59", "60-64", "65-69", "70-74", "75-79", "80-84", "85-89", "90-94", "95+"), right = FALSE)) %>% group_by(marital_status,age_grp) %>% summarise(value=sum(value)) %>% ungroup() %>% pivot_wider(names_from = marital_status,values_from = value) total_row_o <- tab4b_tot %>% select(-time,-gender) rows_o <- tab4b %>% select(-age_grp) tab4b <- bind_rows(total_row_o, rows_o) col_part1 <- tab4b %>% select(T,U,G) col_part2 <- tab4b %>% select(E,F) writeWorksheet(wb, data = midyear, sheet = "Table 4b", startRow = 12, startCol = 4, header = FALSE) writeWorksheet(wb, data = col_part1, sheet = "Table 4b", startRow = 16, startCol = 3, header = FALSE) writeWorksheet(wb, data = col_part2, sheet = "Table 4b", startRow = 16, startCol = 7, header = FALSE) writeWorksheet(wb, data = footnote, sheet = "Table 4b", startRow = 39, startCol = 2, header = FALSE) # Save the workbook saveWorkbook(wb, PE_tmpfile) ``` ## Table 4a.1 Estimated population by marital status, age and sex as of midyear: Male - Second latest available date\ ```{r, tab4a1, message=FALSE,warning=FALSE} #| code-fold: true #| code-summary: "click to see/hide code" link <- "BEESTM5" footnote <- paste0(general_def1,"\n\n",general_def2,"\n\n","published on https://bank.stat.gl/",link) time_value <- time - 1 midyear <- format(as.Date(paste0(as.character(time_value),"-07-01")), "%d-%b-%Y") time_value_c <- as.character(time_value) tab4a.1_tot <- statgl_fetch(statgl_url("BEXSTM5"), "marital status"=px_all(), time=time_value_c, gender="M", .val_code = T, .col_code = T) %>% clean_names() %>% pivot_wider(names_from = marital_status,values_from = value) tab4a.1 <- statgl_fetch(statgl_url("BEXSTM5"), "marital status"=px_all(), time=time_value_c, gender=px_all(), age=px_all(), .val_code = T, .col_code = T) %>% clean_names() %>% mutate(age=strtoi(age), age_grp = cut(age, breaks = c(0, 14, 19, 24, 29, 34, 39, 44, 49, 54, 59, 64, 69, 74, 79, 84, 89, 94, Inf), labels = c("0-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40-44", "45-49", "50-54", "55-59", "60-64", "65-69", "70-74", "75-79", "80-84", "85-89", "90-94", "95+"), right = FALSE)) %>% group_by(marital_status,age_grp) %>% summarise(value=sum(value)) %>% ungroup() %>% pivot_wider(names_from = marital_status,values_from = value) total_row_o <- tab4a.1_tot %>% select(-time,-gender) rows_o <- tab4a.1 %>% select(-age_grp) tab4a.1 <- bind_rows(total_row_o, rows_o) col_part1 <- tab4a.1 %>% select(T,U,G) col_part2 <- tab4a.1 %>% select(E,F) writeWorksheet(wb, data = midyear, sheet = "Table 4a.1", startRow = 12, startCol = 4, header = FALSE) writeWorksheet(wb, data = col_part1, sheet = "Table 4a.1", startRow = 16, startCol = 3, header = FALSE) writeWorksheet(wb, data = col_part2, sheet = "Table 4a.1", startRow = 16, startCol = 7, header = FALSE) writeWorksheet(wb, data = footnote, sheet = "Table 4a.1", startRow = 39, startCol = 2, header = FALSE) # Save the workbook saveWorkbook(wb, PE_tmpfile) ``` ## Table 4b.1 Estimated population by marital status, age and sex as of midyear: Female - Second latest available date\ ```{r, tab4b1, message=FALSE,warning=FALSE} #| code-fold: true #| code-summary: "click to see/hide code" link <- "BEESTM5" footnote <- paste0(general_def1,"\n\n",general_def2,"\n\n","published on https://bank.stat.gl/",link) time_value <- time - 1 midyear <- format(as.Date(paste0(as.character(time_value),"-07-01")), "%d-%b-%Y") time_value_c <- as.character(time_value) tab4b.1_tot <- statgl_fetch(statgl_url("BEXSTM5"), "marital status"=px_all(), time=time_value_c, gender="K", .val_code = T, .col_code = T) %>% clean_names() %>% pivot_wider(names_from = marital_status,values_from = value) tab4b.1 <- statgl_fetch(statgl_url("BEXSTM5"), "marital status"=px_all(), time=time_value_c, gender="K", age=px_all(), .val_code = T, .col_code = T) %>% clean_names() %>% mutate(age=strtoi(age), age_grp = cut(age, breaks = c(0, 14, 19, 24, 29, 34, 39, 44, 49, 54, 59, 64, 69, 74, 79, 84, 89, 94, Inf), labels = c("0-14", "15-19", "20-24", "25-29", "30-34", "35-39", "40-44", "45-49", "50-54", "55-59", "60-64", "65-69", "70-74", "75-79", "80-84", "85-89", "90-94", "95+"), right = FALSE)) %>% group_by(marital_status,age_grp) %>% summarise(value=sum(value)) %>% ungroup() %>% pivot_wider(names_from = marital_status,values_from = value) total_row_o <- tab4b.1_tot %>% select(-time,-gender) rows_o <- tab4b.1 %>% select(-age_grp) tab4b.1 <- bind_rows(total_row_o, rows_o) col_part1 <- tab4b.1 %>% select(T,U,G) col_part2 <- tab4b.1 %>% select(E,F) writeWorksheet(wb, data = midyear, sheet = "Table 4b.1", startRow = 12, startCol = 4, header = FALSE) writeWorksheet(wb, data = col_part1, sheet = "Table 4b.1", startRow = 16, startCol = 3, header = FALSE) writeWorksheet(wb, data = col_part2, sheet = "Table 4b.1", startRow = 16, startCol = 7, header = FALSE) writeWorksheet(wb, data = footnote, sheet = "Table 4b.1", startRow = 39, startCol = 2, header = FALSE) # Save the workbook saveWorkbook(wb, PE_tmpfile) ```