Our first introduction to pointblank
pointblank provides data quality assessment and metadata reporting for data frames and database tables. https://github.com/rstudio/pointblank
🧰 The pointblank::scan_data() function produces an HTML report of the input data to help you understand it.
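As a minimal sketch of what a scan looks like, the example below runs scan_data() on small_table, an example dataset bundled with pointblank that stands in for the workshop data here. The sections = "OV" argument limits the report to the Overview and Variables sections so it runs quickly; the output file name is an arbitrary choice.

```r
library(pointblank)

# `small_table` ships with pointblank; it is only a stand-in for real data.
# sections = "OV" keeps just the Overview and Variables sections.
report <- scan_data(small_table, sections = "OV")

# Save the report as a standalone HTML file in a temporary directory.
# The file name is an illustrative choice, not a pointblank convention.
export_report(report, filename = "small_table_scan.html", path = tempdir())
```

In an interactive session, printing report should open it in the viewer instead of writing a file.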
Activity
👉 Open the file materials/activities/activity-01_raw_data_exploration.qmd
Activity objective: explore our Chicago Food Inspections data to get familiar with it
📣 Production data belongs in a database.
Use DBI to make a connection with DBI::dbConnect()
The driver you pass is your DBI-compliant R package: odbc::odbc() + an ODBC driver installed on your system
Alternatively, use a database-specific package (RPostgres, RMariaDB, RSQLite, bigrquery, etc.) instead of odbc + a driver. In many cases, these are more performant (especially when writing data) and may have more translations available for query types.
What tables are in a database? DBI::dbListTables(con)
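The connection and table-listing steps above can be sketched as follows. This is an illustration under assumptions: an in-memory SQLite database (via RSQLite) stands in for a production database so the example runs without a server, and the mtcars demo table and the DSN name in the commented odbc call are hypothetical placeholders.

```r
library(DBI)

# Assumption: in-memory SQLite stands in for the production database.
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# With odbc the call has the same shape (the DSN name is hypothetical):
# con <- dbConnect(odbc::odbc(), dsn = "my_dsn")

# Write a small demo table so there is something to list.
dbWriteTable(con, "mtcars", mtcars)

# Ask the database which tables it contains.
tables <- dbListTables(con)
print(tables)

# Always close the connection when finished.
dbDisconnect(con)
```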
Use dplyr to interact with the database table in the same manner you would a local data frame
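A short sketch of that workflow, again assuming an in-memory SQLite database and the mtcars demo table as stand-ins. dplyr::tbl() creates a lazy reference to the remote table, and the usual verbs are translated to SQL (this relies on the dbplyr package being installed).

```r
library(DBI)
library(dplyr)  # database translation also needs dbplyr installed

# Assumption: in-memory SQLite and mtcars stand in for real production data.
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "mtcars", mtcars)

# tbl() returns a lazy reference; no rows are pulled into R yet.
cars_db <- tbl(con, "mtcars")

# The same verbs used on a local data frame build up a query.
mpg_by_cyl <- cars_db %>%
  group_by(cyl) %>%
  summarise(avg_mpg = mean(mpg, na.rm = TRUE), n = n()) %>%
  arrange(cyl)

# Inspect the SQL that dplyr generated for the pipeline.
show_query(mpg_by_cyl)

dbDisconnect(con)
```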
📣 Do as much work as possible in the database to save time and resources before bringing the table into local memory.
Use dplyr::collect() to bring the table into memory. Try to use collect() as late as possible in your queries / transformations
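To illustrate collecting late, the sketch below filters and aggregates inside the database and only then calls collect(), so just the small summary is transferred into R. The in-memory SQLite database, the mtcars table, and the hp > 100 filter are illustrative assumptions.

```r
library(DBI)
library(dplyr)  # database translation also needs dbplyr installed

# Assumption: in-memory SQLite and mtcars stand in for real production data.
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "mtcars", mtcars)

# Filter and aggregate in the database first; this is still a lazy query.
summary_db <- tbl(con, "mtcars") %>%
  filter(hp > 100) %>%
  group_by(cyl) %>%
  summarise(avg_mpg = mean(mpg, na.rm = TRUE))

# collect() runs the query and brings only the summarised rows into memory.
summary_local <- collect(summary_db)

dbDisconnect(con)
```

Calling collect() before the filter and summarise steps would produce the same result, but the whole table would be pulled into local memory first.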
Best practices in working with databases
🧰 Deploy and schedule your ETL and reports on Posit Connect
Activity
👉 Open the file materials/activities/activity-02_publish_and_schedule_data_pull.Rmd
Activity objective: Write production data to a database, then deploy and schedule this work on Posit Connect so it runs automatically.