# pxmake version 12.1

(updated `r format(Sys.time(), '%Y-%m-%d')`)

This note will be presented at the annual PX-meeting in Skopje, North Macedonia, November 7th, 2024.

Title: *pxmake - multilingual px files with R, 2024*
# **PXWeb-logfiles**
Unfortunately, 'PXWeb-out-of-the-box' does not offer usage statistics of any kind. Yet most PXWeb-based services regularly need to deliver management information, as PXWeb is vital to the dissemination policy of many statistical offices. The good news is that, by default, PXWeb stores the needed information in logfiles.

The tricky part is, ironically, transferring that information to px-files. But as this example demonstrates, it is not all that difficult when using pxmake. To serve as learning material, the example uses R, free to all, to show how it can be done.
And do not forget: the easiest way to disseminate information is with PXWeb/PxWin.
**All code below runs in a project called 'log'.**
```{r packages_in_use, warning=FALSE, message=FALSE, echo=TRUE}
#| code-fold: true
#| code-summary: "click to see/hide list of packages used"

# Standard R packages
if (!require("tidyverse")) {install.packages("tidyverse")}
library(tidyverse)
if (!require("lubridate")) {install.packages("lubridate")}
library(lubridate)
if (!require("janitor")) {install.packages("janitor")}
library(janitor)
if (!require("dplyr")) {install.packages("dplyr")}
library(dplyr)
if (!require("RCurl")) {install.packages("RCurl")}
library(RCurl)
if (!require("arrow")) {install.packages("arrow")}
library(arrow)
if (!require("xml2")) {install.packages("xml2")}
library(xml2)
# To get subjects used in Statbank Greenland
if (!require("httr")) {install.packages("httr")}
library(httr)
if (!require("jsonlite")) {install.packages("jsonlite")}
library(jsonlite)

# Commands used to set up pxmake ---------------------------------------------
# install.packages('devtools')
# devtools::install_github('StatisticsGreenland/pxmake')
library(pxmake)
library(statgl)
```
## Log Types
PXWeb has three different log-files apart from the normal web-server logs. They are stored on the server in `~/PxWeb/logs` as (a small R sketch after the list shows how to check for them):

1) api.log\*
2) pxweb.log\*
3) stats.log\*
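A quick way to confirm the files are present is to list them from R. This is a minimal sketch, assuming the IIS path used later in this note (on the server itself, `~/PxWeb/logs` resolves differently):

```{r, list_logs, echo=TRUE, eval=FALSE}
# List PXWeb's three log families on the web-server
list.files("C:/inetpub/wwwroot/PxWeb/logs",
           pattern = "^(api|pxweb|stats).*\\.log", ignore.case = TRUE)
```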
### Data Preparation
If you do not have R installed on the web-server, different strategies can be followed:

1) Set permissions on the server to allow the log files to be read
2) Copy all log-files you want to analyse to a file-server
3) Concatenate all log-files of the same type into one .txt file before copying

Regarding 3: on a Windows server this can be done with the DOS copy command.
Log on the server, open a cmd prompt and enter the DOS copy commands:

```{r, concatinate_on_webserver, echo=TRUE, eval=FALSE}
cd c:/inetpub/wwwroot/PxWeb/logs/
copy /y stats*.* stats.txt
copy /y api*.* api.txt
copy /y pxweb*.* pxweb.txt

:: Concatenate all logs matching stats*.* into statsall.txt
copy /y "C:\inetpub\wwwroot\PxWeb\logs\stats*.*" "C:\inetpub\wwwroot\home\pxmake\statsall.txt"

:: Define source and destination paths
set "source=C:\inetpub\wwwroot\home\pxmake\statsall.txt"
set "destination=C:\inetpub\wwwroot\home\pxmake\statsall.zip"

:: Compress statsall.txt into statsall.zip using PowerShell
powershell -command "Compress-Archive -Path '%source%' -DestinationPath '%destination%' -Force"
echo Zip file created at %destination%
```
On https://bank.stat.gl a task was set up to run every night, concatenating the logs and zipping the result as statsall.zip in the pxmake subfolder.
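For reference, such a nightly task can be registered from an elevated command prompt with schtasks. This is a minimal sketch; the .bat file name is hypothetical and stands for a batch file holding the copy/compress commands above:

```{r, schedule_nightly, echo=TRUE, eval=FALSE}
:: Minimal sketch: run a (hypothetical) concat_logs.bat every night at 02:00
schtasks /Create /TN "PXWeb log concat" /TR "C:\inetpub\wwwroot\home\pxmake\concat_logs.bat" /SC DAILY /ST 02:00
```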
## stats.log
First let's look at **stats.log**, saved as one file per day. One row is added to the log-file for each table being extracted or manipulated by PXWeb.

Content example:

```{r, stats_content, echo=TRUE, eval=FALSE}
from http://bank.stat.gl:

a) 2024-07-19 06:43:01,028 [20] INFO PCAxis.Web.Controls.PxDefaultLogger -
b) Context=Selection,
c) UserId=userid,
d) Language=da,
e) Database=Greenland,
f) ActionType=Presentation,
g) ActionName=tableViewLayout1,
h) TableId=C:\inetpub\wwwroot\PxWeb\Resources\PX\Databases\Greenland\...UDXISCPROH.px,
i) NumberOfCells=3,
j) NumberOfContents=1

In the Greenlandic StatBank sample-files we find:

Context: Selection, Presentation
UserId: is not set
Language: en, da, kl
Database: Greenland, GSmicro
ActionType: Presentation, SaveAs, Operation
ActionName: tableViewSorted, tableViewLayout2, tableViewLayout1, sum, subtract, pivotManual, pivotCW, pivotCCW, perPart, informationView, footnoteView, FileTypeRelational, FileTypePX, FileTypeJsonStat, FileTypeJson, FileTypeHtml5Table, FileTypeHtml, FileTypeExcelXDoubleColumn, FileTypeExcelX, FileTypeExcelDoubleColumn, FileTypeExcel, FileTypeCsvWithoutHeadingAndTabulator, FileTypeCsvWithHeadingAndTabulator, FileTypeCsvWithHeadingAndSpace, FileTypeCsvWithHeadingAndSemiColon, FileTypeCsvWithHeadingAndComma, FileTypeChartPng, FileTypeChartJpeg, divide, deleteVariable, deleteValue, chartViewRadar, chartViewPopulationPyramid, chartViewPoint, chartViewPie, chartViewLine, chartViewColumnStacked100, chartViewColumnStacked, chartViewColumnLine, chartViewColumn, chartViewBarStacked100, chartViewBarStacked, chartViewBar, chartViewAreaStacked100, chartViewAreaStacked, chartViewArea, changeValueOrder, changeTextCodePresentation
TableId: UDXISCPROH.px
NumberOfCells
NumberOfContents
```

### Read and extract
Downloading and reading the log file can be time-consuming, so only run the next chunk when needed. If a zipped version already exists, it is less of a problem.
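A simple guard is to skip the download when a fresh local copy exists. This is a minimal sketch, using the same file names as the chunk below:

```{r, conditional_download, echo=TRUE, eval=FALSE}
# Download only if statsall.zip is missing or more than a day old
if (!file.exists("statsall.zip") ||
    difftime(Sys.time(), file.mtime("statsall.zip"), units = "days") > 1) {
  download.file("https://stat.gl/pxmake/statsall.zip", "statsall.zip", mode = "wb")
}
```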
The 'stat_log_data' dataframe is saved to disk in the parquet format for a minimal footprint. In this example both Rds (R dataset) and parquet are written to the filesystem; the parquet file is 5.4 MB, down from 69 MB as Rds.
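Once both files are written, the footprints can be compared directly. A minimal sketch, assuming the file names used in the chunk below:

```{r, compare_footprint, echo=TRUE, eval=FALSE}
# On-disk size of the two serialisations, in MB
round(c(rds = file.size("statlog.Rds"),
        parquet = file.size("statlog.parquet")) / 2^20, 1)
```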
As pxmake uses a variable called language, the language variable in the statlog file is renamed to 'Lang' (lang after clean_names()) in this step.
The output is too lengthy, so the chunk option eval is set to FALSE. This is the code; the result is saved to a parquet file for later use:
```{r read_stattxt, message=FALSE, warning=FALSE, echo=TRUE, eval=FALSE}
# URL and destination file path
url <- "https://stat.gl/pxmake/statsall.zip"
destfile <- "statsall.zip"

# Download and unzip the file
download.file(url, destfile, mode = "wb")
unzip("statsall.zip", exdir = ".")

log_data <- readLines("statsall.txt")

# Convert log_data into a data frame
log_df <- data.frame(log_entry = log_data, stringsAsFactors = FALSE)

# Extract relevant components using regular expressions
stat_log_data <- log_df %>%
  mutate(
    Timestamp = str_extract(log_entry, "^[^\\[]+"),
    LogLevel = str_extract(log_entry, "(?<=\\[)[^\\]]+"),
    Context = str_extract(log_entry, "(?<=Context=)[^,]+"),
    UserId = str_extract(log_entry, "(?<=UserId=)[^,]+"),
    Lang = str_extract(log_entry, "(?<=Language=)[^,]+"), # Language is used by pxmake
    Database = str_extract(log_entry, "(?<=Database=)[^,]+"),
    ActionType = str_extract(log_entry, "(?<=ActionType=)[^,]+"),
    ActionName = str_extract(log_entry, "(?<=ActionName=)[^,]+"),
    TableId = sub("\\.[pP][xX]$", "",
                  basename(str_extract(log_entry, "(?<=TableId=)[^,]+"))),
    NumberOfCells = as.integer(str_extract(log_entry, "(?<=NumberOfCells=)[0-9]+")),
    NumberOfContents = as.integer(str_extract(log_entry, "(?<=NumberOfContents=)[0-9]+"))
  ) %>%
  select(-log_entry) %>%
  clean_names()

write_rds(stat_log_data, "statlog.Rds")
write_parquet(stat_log_data, "statlog.parquet")
```
## api log
Coding with ChatGPT, starting from the prompt:

> i want load data from a log file into a dataframe in r. some lines have errors:

followed by a number of lines pasted from the log files.
An IP-to-country lookup from https://lite.ip2location.com/ was also used, and the resulting dataframe was saved to apilog.parquet.
```{r wrangle_apidata, message=FALSE, warning=FALSE, echo=TRUE, eval=FALSE}
#| code-fold: true
#| code-summary: "code"

# Step 1: Read in the log file
log_lines <- readLines("api.txt")

# Step 2: Keep lines that contain structured log information
log_data <- log_lines %>%
  # Filter lines with date pattern; modify if needed based on your log format
  grep("^\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}", ., value = TRUE) %>%
  # Create a data frame
  data.frame(line = ., stringsAsFactors = FALSE) %>%
  # Extract structured components
  mutate(
    datetime = str_extract(line, "^\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}"),
    thread_id = str_extract(line, "\\[\\d+\\]") %>% str_remove_all("\\[|\\]"),
    level = str_extract(line, "\\b(INFO|ERROR|WARN|DEBUG)\\b"),
    url = str_extract(line, "url=[^,]+") %>% str_remove("url="),
    type = str_extract(line, "type=[^,]+") %>% str_remove("type="),
    caller = str_extract(line, "caller=[^,]+") %>% str_remove("caller="),
    cached = str_extract(line, "cached=[^,]+") %>% str_remove("cached="),
    message = str_extract(line, "message=.*$") %>% str_remove("message=")
  ) %>%
  select(-line) # Remove the original line column

# IP-to-country lookup: https://lite.ip2location.com/
library(ip2location)

# Initialize the IP2Location database
ip_db <- "IP2LOCATION/IP2LOCATION-LITE-DB1.BIN"
ip_lookup <- ip2location::open(ip_db)

# The lookup works with a single IP address at a time
get_country_code <- function(ip) {
  result <- ip2location::get_all(ip) %>% as.data.frame()
  list(country_short = result$country_short, country_long = result$country_long)
}

# Use rowwise() to apply get_country_code to each row individually
l_data <- log_data %>%
  rowwise() %>%
  mutate(
    country_info = list(get_country_code(caller)),
    countrycode = country_info$country_short,
    countrytxt = country_info$country_long
  ) %>%
  ungroup() %>%
  select(-country_info) # Remove the intermediate list column

write_parquet(l_data, "apilog.parquet")
```
## pxweb log
```{r wrangle_pxwebdata, message=FALSE, warning=FALSE, echo=TRUE, eval=FALSE}
#| code-fold: true
#| code-summary: "code"

# Define the path to the log file
log_file <- "pxweb.txt"

# Define a function to parse log entries
parse_log_entry <- function(line) {
  # Regular expression to match the structured log format
  pattern <- "^(\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3}) \\[(\\d+)\\] (\\w+)\\s+([\\w\\-]+) - (.+)$"

  # Attempt to match the pattern
  match <- str_match(line, pattern)

  # If a match is found, return a data frame row, else return NULL
  if (!is.na(match[1, 1])) {
    data.frame(
      timestamp = match[1, 2],
      thread_id = as.integer(match[1, 3]),
      log_level = match[1, 4],
      category = match[1, 5],
      details = match[1, 6],
      stringsAsFactors = FALSE
    )
  } else {
    NULL
  }
}

# Read and parse the log file
pxweb_log_data <- readLines(log_file) %>%
  # Apply the parsing function to each line
  lapply(parse_log_entry) %>%
  # Remove NULLs from the list
  compact() %>%
  # Bind all entries into a data frame
  bind_rows()

write_parquet(pxweb_log_data, "pxweblog.parquet")
```
# Let the fun begin
This section converts a dataframe to a px-object with minimal metadata. For maximum satisfaction, make sure PxWin is installed on your PC (Windows only).

A statlog.txt file has been uploaded to www.stat.gl/pxmake in parquet format, despite the .txt extension, to allow free play.
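To play along without server access, fetch the file and read it as parquet despite the .txt name. A minimal sketch, producing the statlog.parquet used in the chunk below:

```{r, fetch_demo_file, echo=TRUE, eval=FALSE}
# The .txt extension hides parquet content; save under a .parquet name and read it
download.file("https://www.stat.gl/pxmake/statlog.txt", "statlog.parquet", mode = "wb")
stat_log_data <- read_parquet("statlog.parquet")
```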
```{r stats_px, warning=FALSE, message=FALSE, echo=TRUE}
# Read the local log-parquet
stat_log_data <- read_parquet("statlog.parquet")

SBXSTAT <- stat_log_data %>%
  mutate(time = format(as.Date(timestamp), "%YM%m")) %>%
  filter(database == "Greenland") %>%
  mutate(subjectcode = substr(table_id, 1, 2)) %>%
  select(time, subjectcode, lang, action_type, action_name) %>%
  count(across(everything()))

px(SBXSTAT) %>%
  px_add_totals(c("subjectcode", "lang", "action_name", "action_type")) %>%
  px_save("SBXSTAT.px")

px(SBXSTAT) %>%
  px_elimination(tribble(~`variable-code`, ~elimination,
                         'subjectcode', 'YES',
                         'lang', 'Total',
                         "time", "YES")) %>%
  px_add_totals(c("lang", "action_name", "action_type")) %>%
  px_save("SBXSTAT.px")

# Read and view the first 12 lines of the resulting file
first_12_lines <- readLines("SBXSTAT.px", n = 12)
first_12_lines
```

```{r api_px, warning=FALSE, message=FALSE, echo=TRUE, eval=FALSE}
#| code-fold: true
#| code-summary: "code"

api_log_data <- read_parquet("apilog.parquet") %>%
  mutate(countrycode = ifelse(countrycode == "-", "GS", countrycode),
         countrytxt = ifelse(countrytxt == "-", "Statistics Greenland", countrytxt))

# Handle missing values and derive the px-file name from the url
api_data <- api_log_data %>%
  replace_na(list(url = "", type = "", caller = "", cached = "", message = "")) %>%
  mutate(time = format(as.Date(datetime), "%YM%m"),
         px = basename(url),
         px = sub("\\.px.*|\\.PX.*|\\.csv.*|\\.json.*|\\?.*", "", px)) %>%
  select(time, type, px, countrytxt, caller) %>%
  mutate(countrytxt = ifelse(countrytxt == "-", caller, countrytxt)) %>%
  select(time, type, px, countrytxt)

api <- api_data %>%
  filter(str_sub(px, 3, 3) == "X" &
           type %in% c("config", "data", "metadata", "error") &
           px != "" & px != "variables" & px != "Variables") %>%
  count(across(everything()))

px(api) %>%
  px_elimination(tribble(~`variable-code`, ~elimination,
                         'type', 'YES',
                         'px', 'YES',
                         'countrytxt', 'YES',
                         'time', 'YES')) %>%
  px_save("API.px")
```

```{r pxweb_px, warning=FALSE, message=FALSE, echo=TRUE, eval=FALSE}
#| code-fold: true
#| code-summary: "code"

pxweb_log_data <- read_parquet("pxweblog.parquet")

# Handle missing values and aggregate to monthly data
pxweb_data <- pxweb_log_data %>%
  replace_na(list(url = "", type = "", caller = "", cached = "", message = "")) %>%
  mutate(time = format(as.Date(timestamp), "%YM%m")) %>%
  select(time, log_level, category, details)

pxweb <- pxweb_data %>%
  select(time, log_level, category) %>%
  count(across(everything()))

px(pxweb) %>%
  px_elimination(tribble(~`variable-code`, ~elimination,
                         'log_level', 'YES',
                         'category', 'YES',
                         "time", "YES")) %>%
  px_save("PXWEBLOG.px")
```
## Metadata

A PX-file will need a lot more metadata to make sense to users, and in multiple languages.
**Texts instead of codes**
StatBank Greenland names px-files with a 2-character prefix for subjects:

AR - ARbejdsmarked (labour market)
BE - BEfolkning (population)
etc...

All subjects/subject codes can be obtained via the API:
```{r subjects, message=FALSE, warning=FALSE, echo=TRUE}
# Function to fetch data from the API and add a language column
fetch_data <- function(url, language) {
  response <- GET(url)
  data <- content(response, as = "text")
  json_data <- fromJSON(data, flatten = TRUE) %>% as_tibble()
  json_data <- json_data %>% mutate(language = language)
  return(json_data)
}

# URLs for the APIs
url_en <- "https://bank.stat.gl/api/v1/en/Greenland"
url_da <- "https://bank.stat.gl/api/v1/da/Greenland"
url_kl <- "https://bank.stat.gl/api/v1/kl/Greenland"

# Fetch data from each API
df_en <- fetch_data(url_en, 'en')
df_da <- fetch_data(url_da, 'da')
df_kl <- fetch_data(url_kl, 'kl')

# Combine the dataframes
subjectcode_text <- bind_rows(df_en, df_da, df_kl) %>% select(-type)

subjectcode_text %>%
  sample_n(5) %>%
  statgl_table()
```

**Adding in code**

```{r metadata, warning=FALSE, message=FALSE, echo=TRUE}
description <- tribble(
  ~language, ~value,
  "en", str_glue("Usage statistics 1999-{year(Sys.Date())} <em>[SBESTAT01]</em>"),
  "da", str_glue("Antal udtræk per emne 2020-{year(Sys.Date())} <em>[SBDSTAT01]</em>"),
  "kl", str_glue("Tabelimit ataatsimit emne 2020-{year(Sys.Date())} <em>[SBNSTAT01]</em>")
)

content_txt <- tribble(
  ~language, ~value,
  "en", "Usage statistics",
  "da", "Antal udtræk per emne",
  "kl", "Tabelimit ataatsimit emne"
)

subjects <- subjectcode_text %>%
  mutate(`variable-code` = "subjectcode", code = id, value = text) %>%
  select(`variable-code`, code, language, value)

prefix_values <- tibble::tribble(
  ~`variable-code`, ~code, ~language, ~value, ~order,
  "lang", "en", "en", "English", 1,
  "lang", "da", "en", "Danish", 2,
  "lang", "kl", "en", "Greenlandic", 3,
  "lang", "en", "da", "Engelsk", NA,
  "lang", "da", "da", "Dansk", NA,
  "lang", "kl", "da", "Grønlandsk", NA,
  "lang", "en", "kl", "Tuluttut", NA,
  "lang", "da", "kl", "Qallunaatut", NA,
  "lang", "kl", "kl", "Kalaallisut", NA
) %>%
  bind_rows(subjects)

variable_label <- tribble(
  ~`variable-code`, ~language, ~`variable-label`,
  'subjectcode', 'en', 'Subject',
  'subjectcode', 'da', 'Emne',
  'subjectcode', 'kl', 'Qineq',
  'lang', 'en', 'Language',
  'lang', 'da', 'Sprog',
  'lang', 'kl', 'Killiffigassaq'
)
```

## Data for a Table

```{r wrangle_sample, message=FALSE, warning=FALSE, echo=TRUE}
SBXSTAT %>%
  sample_n(6) %>%
  statgl_table()
```

## Add to a px-object

Here metadata is added, either by keywords or as previously defined.

```{r pxmake, warning=FALSE, message=FALSE, echo=TRUE}
creation_date <- "20240301 09:00"
last_updated <- "20240711 09:00"
next_update <- "20250301 09:00"

x <- SBXSTAT %>%
  pxmake::px() %>%
  px_codepage("utf-8") %>%
  px_language("en") %>%
  px_languages(c("en", "da", "kl")) %>%
  px_decimals("0") %>%
  px_showdecimals("0") %>%
  px_matrix("SBXSTAT01") %>%
  px_subject_code("SB") %>%
  px_subject_area(tribble(~language, ~value,
                          "en", "Statbank Greenland",
                          "da", "Statistikbank",
                          "kl", "Kisitsisaataasivik")) %>%
  px_contents(content_txt) %>%
  px_stub(c("subjectcode")) %>%
  px_heading(c("time", "lang")) %>%
  px_values(prefix_values %>% select(-order)) %>%
  px_variable_label(variable_label) %>%
  px_timeval("time") %>%
  px_description(description) %>%
  px_title(content_txt) %>%
  px_units(tribble(~language, ~value,
                   "en", "number of table extracts",
                   "da", "antal tabeludtræk",
                   "kl", "tabeluttræk")) %>%
  px_update_frequency("When needed") %>%
  px_creation_date(creation_date) %>%
  px_last_updated(last_updated) %>%
  px_next_update(next_update) %>%
  px_contact("Lars Pedersen, LARP at STAT.gl") %>%
  px_source(tribble(~language, ~value,
                    "en", "Statistics Greenland",
                    "da", "Grønlands Statistik",
                    "kl", "Kalaallit Nunaanni Naatsorsueqqissaartarfik")) %>%
  px_add_totals(c("lang", "subjectcode")) %>%
  px_elimination(tribble(~`variable-code`, ~elimination,
                         'subjectcode', 'Total',
                         'lang', 'Total'))
```
## Read and Write a px-file

px_save() saves the px-object as .px ... or .xlsx. Yes, the object can also be saved to Excel, edited there, converted back to a px-object, and finally saved to px with the changes made in Excel.

pxmake can read a .px or an .xlsx into a px-object using px().

If the dataframe has more than 1 million rows (the Excel limit), data can be stored as parquet or Rds files instead; see the sketch after the chunk below.
```{r savenread, warning=FALSE, message=FALSE, echo=TRUE}
px_save(x, "stat_test.px")
px_save(x, "stat_test.xlsx")

# y <- px("stat_test.xlsx")
# z <- px("stat_test.px")
```
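For dataframes beyond the Excel row limit, one option is to keep the data itself in parquet and rebuild the px-object from it. A minimal sketch; the parquet file name is hypothetical:

```{r, parquet_roundtrip, echo=TRUE, eval=FALSE}
# Parquet has no 1-million-row ceiling: store the data frame ...
write_parquet(SBXSTAT, "sbxstat_data.parquet")
# ... and rebuild a px-object from it when needed
x_big <- px(read_parquet("sbxstat_data.parquet"))
```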
To learn more about pxmake, read the articles on <https://statisticsgreenland.github.io/pxmake/>.

# For Researchers

... or just for further analysis: the dataframe 'statlog' can be transformed into a px-file for each selected variable. Footnotes in the px-files can be used to present additional information on the data. The px-files will be stored in a folder, along with the Alias text files needed for presentation in PXWeb.
In statlog, the variables log_level and user_id do not hold usable information, so they are excluded.
Timestamp is too detailed, so data is aggregated to monthly values before presentation, by:

time = format(as.Date(timestamp), "%YM%m")

or biannually as:

time = ifelse(month(as.Date(timestamp)) <= 6, paste0(year(as.Date(timestamp)), "H1"), paste0(year(as.Date(timestamp)), "H2"))
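Both expressions can be checked on a single date. A minimal sketch, using lubridate's month() and year() loaded earlier:

```{r, time_agg_check, echo=TRUE, eval=FALSE}
ts <- as.Date("2024-07-19")
format(ts, "%YM%m")            # monthly:  "2024M07"
ifelse(month(ts) <= 6,
       paste0(year(ts), "H1"),
       paste0(year(ts), "H2")) # biannual: "2024H2"
```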
This presentation allows researchers to learn what to expect and what the variables' distributions look like. They can then request access at the most detailed level.

The px-files hold a frequency count of each variable's unique code set, plus the knowledge needed to work with the data.

```{r researchers, warning=FALSE, message=FALSE, echo=TRUE}
# Biannual
statlogh <- stat_log_data %>%
  mutate(time = ifelse(month(as.Date(timestamp)) <= 6,
                       paste0(year(as.Date(timestamp)), "H1"),
                       paste0(year(as.Date(timestamp)), "H2"))) %>%
  select(time, context, lang, table_id, action_type, action_name)

# Monthly
statlogm <- stat_log_data %>%
  mutate(time = format(as.Date(timestamp), "%YM%m")) %>%
  select(time, context, lang, table_id, action_type, action_name)

y <- px(statlogh)
px_heading(y)

y <- px(statlogh) %>%
  px_stub("action_type") %>%
  px_heading("time")

individual_keywords_ml <- tibble::tribble(
  ~variable, ~language, ~px_description, ~px_matrix,
  "context", "en", "context", "context",
  "context", "da", "context", NA,
  "context", "kl", "context", NA,
  "lang", "en", "lang", "lang",
  "lang", "da", "lang", NA,
  "lang", "kl", "lang", NA,
  "table_id", "en", "table_id", "table_id",
  "table_id", "da", "table_id", NA,
  "table_id", "kl", "table_id", NA,
  "action_type", "en", "Action type", "actiont",
  "action_type", "da", "Action type", NA,
  "action_type", "kl", "Action type", NA,
  "action_name", "en", "Action name", "actionn",
  "action_name", "da", "Action name", NA,
  "action_name", "kl", "Action name", NA
)

z <- y %>%
  px_codepage("utf-8") %>%
  px_language("en") %>%
  px_languages(c("en", "da", "kl")) %>%
  px_decimals("0") %>%
  px_showdecimals("0") %>%
  px_subject_code("SB") %>%
  px_subject_area(tribble(~language, ~value,
                          "en", "Statbank Greenland",
                          "da", "Statistikbank",
                          "kl", "Kisitsisaataasivik")) %>%
  px_contents(content_txt) %>%
  px_timeval("time") %>%
  px_description(description) %>%
  px_title(content_txt) %>%
  px_units(tribble(~language, ~value,
                   "en", "number of table extracts",
                   "da", "antal tabeludtræk",
                   "kl", "tabeluttræk")) %>%
  px_update_frequency("When needed") %>%
  px_creation_date(creation_date) %>%
  px_last_updated(last_updated) %>%
  px_next_update(next_update) %>%
  px_contact("Lars Pedersen, LARP at STAT.gl") %>%
  px_autopen('YES')

unlink("SB", recursive = TRUE)
dir.create("SB")

px_micro(z, out_dir = "SB", keyword_values = individual_keywords_ml)
```