Data Science Workflows
with Posit Tools

R Focus

Instructors:
Ryan Johnson
Katie Masiello

TAs:
Sam Edwardes
Rich Iannone

Introduction

Logistics

🛜 WiFi credentials:

  • Network: Posit Conf 2023

  • Password: conf2023

  • Important locations:

    • Bathrooms: Gender-neutral bathrooms are located among the Grand Suite Bathrooms.
    • Meditation/prayer room: Grand Suite 2A and Grand Suite 2B. Open Sunday - Tuesday 7:30 a.m. - 7:00 p.m., Wednesday 8:00 a.m. - 6:00 p.m.
    • Lactation room: located in Grand Suite 1. Open Sunday - Tuesday 7:30 a.m. - 7:00 p.m., Wednesday 8:00 a.m. - 6:00 p.m.

Logistics

  • Participants who do not wish to be photographed have red lanyards; please note everyone’s lanyard colors before taking a photo and respect their choices.
  • The Code of Conduct and COVID policies can be found at https://posit.co/code-of-conduct/. Please review them carefully. You can report Code of Conduct violations in person, by email, or by phone. Please see the policy linked above for contact information.

Code of Conduct

  • Everyone who comes to learn and enjoy the experience should feel welcome at posit::conf. Posit is committed to providing a professional, friendly and safe environment for all participants at its events, regardless of gender, sexual orientation, disability, race, ethnicity, religion, national origin or other protected class.

  • This code of conduct outlines the expectations for all participants, including attendees, sponsors, speakers, vendors, media, exhibitors, and volunteers. Posit will actively enforce this code of conduct throughout posit::conf.

https://posit.co/code-of-conduct/

Meet the Team!

Ryan Johnson

Data Science Advisor @ Posit

Katie Masiello

Solutions Engineer @ Posit

Sam Edwardes

Solutions Engineer @ Posit

Rich Iannone

Software Engineer @ Posit

Meet your Neighbor!

Agenda

Time             Activity
~9:00 - 10:30    Workshop Introduction; Reading, Cleaning, Writing and Validating Data
10:30 - 11:00    Coffee break
~11:00 - 12:30   Creating, Delivering, and Monitoring a model using Vetiver
12:30 - 1:30     Lunch break 🥪
~1:30 - 3:00     Delivery
3:00 - 3:30      Coffee break
~3:30 - 5:00     Advancing your Workflow

The Sticky Situation

“I’m lost / need help”

“I’m done and ready to move along”



👨‍💻 Put your sticky note on the back of your laptop screen 👩‍💻

Workshop approach

We will use an end-to-end real-world project to demonstrate workflows and best practices using open source packages and Posit professional tools.


Conventions

🧰 Add this to your toolbox.
📣 I will stand on my soapbox and profess this until I am blue in the face.

Detour warning. We could get really into this, but there isn’t time today.

Asking Questions

Discord - #data-science-workflows-with-posit-tools-r-focus




👉 Submit questions and respond to polls here


You are always welcome to raise your hand! 🙋

Go to Discord now and tell us what you’re excited about!

Getting help (R Functions)

Functions are the 🍞 and 🧈 of R programming!


If you want to access any function’s help page:

# Method 1
help(function_name_here)

# Method 2
?function_name_here

# Method 3
# Highlight the function and press F1 🤯
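
As a concrete illustration of the methods above, here is a minimal sketch that uses base R's `mean()` as the example function. The calls after the first two (`help()` with a `package` argument, `help.search()`, and `args()`) are not part of the slides; they are standard base R helpers that may be useful when a function name is ambiguous or only partly remembered.

```r
# Methods 1 and 2, applied to a real function: base R's mean()
help(mean)   # opens the help page for mean()
?mean        # shorthand for the same thing

# Additional base R helpers (not from the slides):
help("filter", package = "stats")    # choose one package when several export the same name
help.search("standard deviation")    # keyword search across installed help pages
mean_args <- args(mean)              # inspect a function's arguments without leaving the console
print(mean_args)
```

In RStudio the help pages open in the Help pane; in a plain terminal session they are printed as text.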

🌮 Hungry to get started? 🌭

Chicago Food Inspections Project

What is this project?

  • Over 15,000 food establishments across the City of Chicago
  • Only a handful of food inspectors currently employed

The Question

Can we help inspectors identify which establishments are at highest risk for failing an inspection?

The Answer

Use the historical (validated) inspections data to create a model that will predict fail likelihood!

Project Data

This workshop will use data from the City of Chicago Open Data Portal: https://data.cityofchicago.org.

  1. 🍕 Food inspections: https://data.cityofchicago.org/Health-Human-Services/Food-Inspections/4ijn-s7e5

dba_name | license_number | facility_type | risk | address | results | violations
MID AMERICA CARE CENTER | 2206864 | LONG TERM CARE | RISK 1 (HIGH) | 4920 N KENMORE AVE | PASS | 32. FOOD AND NON-FOOD CONTA...
LA CEBOLLITA RESTAURANT | 2689434 | RESTAURANT | RISK 1 (HIGH) | 4343 W 47TH ST | PASS | 55. PHYSICAL FACILITIES INS...
C FISH HOUSE | 2364469 | RESTAURANT | RISK 1 (HIGH) | 20 W KINZIE ST | PASS | NA
THE CHALKBOARD INC | 2215547 | DAYCARE (2 - 6 YEARS) | RISK 3 (LOW) | 450 W MENOMONEE ST | PASS | 35. WALLS, CEILINGS, ATTACH...
NEW CHINA BUFFET ENTERPRISES | 2032302 | RESTAURANT | RISK 1 (HIGH) | 7310 W FOSTER AVE | FAIL | 24. DISH WASHING FACILITIES...

  2. 📒 Business licenses: https://data.cityofchicago.org/Community-Economic-Development/Business-Licenses/r5kz-chrr

license_number | doing_business_as_name | address | license_description | license_term_expiration_date
2368796 | BURGER KING #7268 | 13770 S AVENUE O | RETAIL FOOD ESTABLISHMENT | 2019-03-15
5057 | HAROLD'S CHICKEN SHACK | 7310 S HALSTED ST 1ST | RETAIL FOOD ESTABLISHMENT | 2004-11-15
2172 | THE LODGE TAVERN | 21 W DIVISION ST 1 | RETAIL FOOD ESTABLISHMENT | 2006-02-15
15542 | SAM'S CUT RATE FOOD & LIQ INC | 500 E 75TH ST 1ST | RETAIL FOOD ESTABLISHMENT | 2006-11-15
1954015 | WALGREENS #12426 | 315 W CHICAGO AVE 1 | RETAIL FOOD ESTABLISHMENT | 2012-05-15

🕐 Take 5 minutes and explore these two datasets
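
One way to start exploring is to load a few rows into a data frame and summarise them. The sketch below is only an illustration: it types in the five sample food inspection rows shown above by hand (leaving out the `violations` column, which is truncated on the slide) and uses base R only. The object names `inspections`, `results_by_facility`, and `fail_rate` are made up for this example, and five rows say nothing about the real failure rate in the full dataset.

```r
# The five sample rows from the food inspections table, entered by hand.
# The violations column is omitted because it is truncated on the slide.
inspections <- data.frame(
  dba_name = c(
    "MID AMERICA CARE CENTER", "LA CEBOLLITA RESTAURANT", "C FISH HOUSE",
    "THE CHALKBOARD INC", "NEW CHINA BUFFET ENTERPRISES"
  ),
  license_number = c(2206864, 2689434, 2364469, 2215547, 2032302),
  facility_type = c(
    "LONG TERM CARE", "RESTAURANT", "RESTAURANT",
    "DAYCARE (2 - 6 YEARS)", "RESTAURANT"
  ),
  risk = c(
    "RISK 1 (HIGH)", "RISK 1 (HIGH)", "RISK 1 (HIGH)",
    "RISK 3 (LOW)", "RISK 1 (HIGH)"
  ),
  results = c("PASS", "PASS", "PASS", "PASS", "FAIL")
)

# Column types and a preview of the values
str(inspections)

# Cross-tabulate inspection results by facility type
results_by_facility <- table(inspections$facility_type, inspections$results)
print(results_by_facility)

# Share of these sample rows that failed
fail_rate <- mean(inspections$results == "FAIL")
print(fail_rate)
```

Both datasets carry a `license_number` column, so it looks like a natural key for linking inspections to licenses; treat that as an assumption to check against the full data rather than a given.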

Project Objective

  • Provide food inspectors with a self-service tool that predicts the likelihood of a food establishment failing its next inspection.


Project Requirements

  • 🤖 Automate the pipeline
  • ⚠️ Receive alerts if there are issues in the pipeline
  • 🔄 Project is easy to maintain and iterate upon
  • Work is reusable by other teams, even if they don’t use R (Lookin’ at you )

Project Overview

Get Your Environment Set Up

Your Tools

Access Your Tools

Visit the workshop tools landing page at https://conf23workflows.training.posit.co to access your tools.

Connect Setup // Step 1

Visit: https://connect.conf23workflows.training.posit.co

  • Use any email you have access to (personal is preferred to avoid corporate spam filters)
  • Set your username as firstname.lastname
  • Choose any password

Connect Setup // Step 2

  • Check your email for a confirmation email.
  • It will likely be in your spam folder.
  • The email will be from conf23workflows@training.rstudio.com

Workbench Setup // Step 1

Visit: https://workbench.conf23workflows.training.posit.co

  • Your account has already been created!
  • Log in with username as firstname.lastname.

    For example:
    • Name: Ryan Johnson
    • Username: ryan.johnson
  • Password = password

Workbench Setup // Step 1

Your account has already been created!

aaron.amos

david.sluder

jenny.vo

katie.masiello

madison.gipson

nicholas.skaff

rich

sean.hackett

thiago.moreira

yuting.chen

aaron.novotny

emily.prezzato

joanne.ang

kelly.carscadden

matthew.defrank

nick.einterz

richard.carder

shannon.coulter

thiyanga.talagala

yuzhi.lin

amrita.sridhar

ephraim.infante

jordan.creed

kwangshi.shu

mccrea.cobb

nicklas.smith

robin.donatello

stephen.howe

thuy.scanlon

angie.reed

eric.prager

joseph.orton

lia.bozzone

miriam.skorupa

patrick.vandenberg

ryan

steve.hummel

ubuntu

ashley.naimi

erika.manning

julia.silge

lu.mao

naomi.smorenburg

payam.khodabakhshi

sam.edwardes

test_user

user1

carter.coughlin

gagan

justin.lee

luis.dominguez

nathaniel.hawkins

rahul.sangole

sarah.gibson

test_user1

user2

dale.hess

george.tenney

justin.mary

luke.slipski

nergis.zaim

rebecca.butler

scott.tucholka

test_user2

user3

Workbench Setup // Step 2

  1. Click New Session
  2. Start an RStudio Pro session
  3. Navigate to the folder ds-workflows-r
  4. Open the ds-workflows-r.Rproj project

Project Navigation

📁 materials : Everything you need for this workshop is in here!


  • 📁 materials/activities : Contains interactive exercises that we/you will complete during the workshop.

  • 📁 materials/slides : Quarto slides for the workshop, broken up by section.

  • 📁 materials/project : Source material for the Food Inspections Project.

Saving your work 💾

  • All source material can be found on the GitHub page: https://github.com/posit-conf-2023/ds-workflows-r
  • The environment we’re working on will stay on for a few days after conf…but that’s it!
  • If you would like to save your work, we recommend:
    • Exporting any source code to your local machine.
    • Linking the project to a personal GitHub Repo. Information for how to do this can be found here.