LICENSE: CC BY-NC-SA 4.0 - Emily Riederer

Page not found!

Looking for content from a post or talk? Try searching the table below or returning home.

Otherwise, comment below the table with the link you were trying to reach, and I’ll point you in the right direction.

Date Title Description
Nov 10, 2024 Role-Based Access Control for Quarto sites with Netlify Identity A quick tech note on Netlify’s managed authentication solution
Aug 15, 2024 Python Rgonomics A survey of modern python tooling that “feels good” to R users
Jul 18, 2024 Crosspost: Data discovery doesn’t belong in ad hoc queries Data teams may struggle to quantify the benefits of good data documentation. But running countless ad hoc validation queries can incur both computational and cognitive cost.
Jan 20, 2024 Base Python Rgonomic Patterns Getting comfortable in a new language is more than the packages you use. Syntactic sugar in base python increases the efficiency and aesthetics of python code in ways that R users may enjoy in packages like glue and purrr. This post collects a miscellaneous grab bag of tools for wrangling, formatting (f-strings), repeating (list comprehensions), faking data, and saving objects (pickle)
Jan 15, 2024 Crosspost: Why You Need Data Documentation in 2024 Data documentation isn’t a box to check; it’s an active member of your team with many jobs-to-be-done. In this cross-post with Select Star, I write about how effective documentation can be your data products’ developer advocate for users, project manager for developers, and chief of staff for data leaders
Jan 13, 2024 polars’ Rgonomic Patterns In this follow-up post to Python Rgonomics, we deep dive into some of the advanced data wrangling functionality in python’s polars package to see how its powertools like column selectors and nested data structures mirror the best of dplyr and tidyr’s expressive and concise syntax
Jan 5, 2024 Crosspost: Why you’re closer to data documentation than you think Writing is thinking; documenting is planning and executing. In this cross-post with Select Star, I write about how teams can produce high-quality and maintainable documentation by smartly structuring planning and development documentation and efficiently recycling them into long-term, user-friendly docs
Dec 30, 2023 Python Rgonomics Switching languages is about switching mindsets - not just syntax. New developments in python data science toolings, like polars and seaborn’s object interface, can capture the ‘feel’ that converts from R/tidyverse love while opening the door to truly pythonic workflows
Nov 18, 2023 Big ideas from the 2023 Causal Data Science Meeting Five highlights and links to select talks
Oct 23, 2023 Data Downtime Horror Stories Panel Panel discussion with Chad Sanderson and Joe Reis, hosted by Monte Carlo Data, on our thorniest brushes with data downtime, leading data teams to tackle data quality at scale with testing, contracts, observability and monitoring, and more.
Sep 21, 2023 Operationalizing Column-Name Contracts with dbtplyr An exploration of how data producers and consumers can use column names as interfaces, configurations, and code to improve data quality and discoverability. The second half of the talk demonstrates how to implement these ideas with my dbtplyr dbt package.
Jun 21, 2023 Scaling Personalized Volunteer Emails An overview of the data stack used to automate over 50,000 personalized emails to voter turnout volunteers using BigQuery, dbt, Census, and MailChimp
Jun 7, 2023 Causal Design Patterns An overview of basic research design patterns in causal inference, modern extensions, and data management strategies to set up a causal inference initiative for success
May 30, 2023 Industry information management for causal inference Proactive collection of data to comply with or confront assumptions
May 12, 2023 DataFold Data Quality Meet Up Joined a panel of speakers to discuss tips and tricks for running dbt at scale
May 3, 2023 Crosspost: The Art of Abstraction in ETL Rounding out my three-part ETL series from Airbyte’s developer blog
Apr 13, 2023 Posit Data Science Hangout Each week, host Rachael Dempsey invites an accomplished data science leader to talk about their experience and answer questions from the audience. The discussion focuses mainly on the human elements of data science leadership. There’s no sales or marketing fluff, just great insights from inspiring professionals.
Mar 22, 2023 The Art of Abstraction in ETL: Dodging Data Extraction Errors Cross-post from guest post on Airbyte’s developer blog
Mar 22, 2023 Evaluation without Experimentation An introduction to inverse propensity of treatment weighting for program evaluation with applications to Two Million Texans’ relational organizing campaign during the 2022 midterms
Mar 15, 2023 Taking Flight with Shiny: a Modules-First Approach An argument for the individual and organization-wide benefits of teaching new developers Shiny with a modules-first paradigm.
Jan 17, 2023 Crosspost: Power up your data quality with grouped checks After a prior post on the merits of grouped data quality checks, I demo my newly merged implementation for dbt
Nov 12, 2022 The Data (error) Generating Process Interrogating the data generating process to devise better data quality tests.
Sep 25, 2022 Goin’ to Carolina in my mind (or on my hard drive) Out-of-memory processing of North Carolina’s voter file with DuckDB and Apache Arrow
Sep 5, 2022 Oh, I’m sure it’s probably nothing How we do (or don’t) think about null values and why the polyglot push makes it all the more important
Aug 26, 2022 Update: grouped data quality check PR merged to dbt-utils After a prior post on the merits of grouped data quality checks, I demo my newly merged implementation for dbt
Jan 12, 2022 The Data Engineering Podcast: Column Names as Contracts Discussing how column names can serve as a light-weight alternative to data catalogs and contracts and how to implement this approach with dbtplyr
Jan 2, 2022 Using databases with Shiny Key issues when adding persistent storage to a Shiny application, featuring {golem} app development and Digital Ocean serving
Dec 11, 2021 How to Make R Markdown Snow Much like ice sculpting, applying powertools to absolutely frivolous pursuits
Nov 27, 2021 Make grouping a first-class citizen in data quality checks Which of these numbers doesn’t belong? -1, 0, 1, NA. You can’t judge data quality without data context, so our tools should enable as much context as possible.
Nov 17, 2021 UIUC STAT447 (Data Science Programming) Guest Lecture Discussing how to move from scripting to tool development, designing tools in enterprise, and navigating diverse data career paths
Nov 10, 2021 Why machine learning hates vegetables A personal encounter with ‘intelligent’ data products gone wrong
Sep 21, 2021 Update: column-name contracts with dbtplyr Following up on ‘Embedding Column-Name Contracts… with dbt’ to demo my new dbtplyr package to further streamline the process
Aug 26, 2021 A lightweight data validation ecosystem with R, GitHub, and Slack A right-sized solution to automated data monitoring, alerting, and reporting using R (pointblank, projmgr), GitHub (Actions, Pages, issues), and Slack
Jul 14, 2021 Workflows for querying databases via R Tricks for modularizing and refactoring your project’s SQL/R interface. (Image source techdaily.ca)
May 27, 2021 Understanding the data (error) generating processes for data validation A data consumer’s guide to validating data based on the failure modes data producers try to avoid
May 8, 2021 A Tale of Six States: Flexible data extraction with scraping and browser automation Exploring how Playwright’s headless browser automation (and its friends) can help unite the states’ data
Feb 26, 2021 Column Names as Contracts Exploring the benefits of using controlled vocabularies to encode metadata in column names, and demonstrations of implementing this approach with the convo R package or dbt extensions of SQL.
Feb 6, 2021 Embedding column-name contracts in data pipelines with dbt dbt supercharges SQL with Jinja templating, macros, and testing – all of which can be customized to enforce controlled vocabularies and their implied contracts on a data model
Jan 30, 2021 Causal design patterns for data analysts An informal primer to causal analysis designs and data structures
Jan 30, 2021 Resource Round-Up: Causal Inference Free books, lectures, blogs, papers, and more for a causal inference crash course
Jan 21, 2021 Building a team of internal R packages On the jobs-to-be-done and design principles for internal tools
Jan 21, 2021 oRganization: Design patterns for internal packages An overview of the unique design challenges and opportunities when building R packages for use inside of a single organization versus open-source. By using the jobs-to-be-done framework, this talk explores how internal packages can be better teammates by following specific design patterns for API design, testing, documentation, and more.
Jan 16, 2021 Generating SQL with {dbplyr} and sqlfluff Using the tidyverse’s expressive data wrangling vocabulary as a preprocessor for elegant SQL scripts. (Image source techdaily.ca)
Dec 30, 2020 Introducing the {convo} package An R package for maintaining controlled vocabularies to encode contracts between data producers and consumers
Sep 20, 2020 Sticker-driven maintenance Marketing maintenance work with irrational exuberance
Sep 12, 2020 crosstalk: Dynamic filtering for R Markdown An introduction to browser-based interactivity of htmlwidgets – no Shiny server required!
Sep 6, 2020 Column Names as Contracts Using controlled dictionaries for low-touch documentation, validation, and usability of tabular data
Jul 26, 2020 A beginner’s guide to Shiny modules Don’t believe the documentation! Shiny modules aren’t just for advanced users; they might just be a great entry point for development
Jul 6, 2020 projmgr: Managing the human dependencies of your project A lightning talk on key features of the projmgr package, which enables code-based planning and reporting workflows grounded in GitHub issues and milestones
Jul 3, 2020 Resource Round-Up: Latent and Lasting Documentation Readings and assorted ideas about creating and maintaining low-overhead documentation
Jun 30, 2020 RMarkdown CSS Selector Tips A few tips and tools for finding the right selectors to style in RMarkdown
May 14, 2020 projmgr: Managing the human dependencies of your projects A walkthrough of using the projmgr package for GitHub-based project management via R
Feb 1, 2020 RMarkdown Driven Development: the Technical Appendix A recommended tech stack for implementing RMarkdown Driven Development
Jan 30, 2020 RMarkdown Driven Development How and why to refactor one time analyses in RMarkdown into sustainable data products
Aug 30, 2019 Resource Round-Up: R in Industry Edition Case studies of the impact of R use on organizational culture and collaboration
Aug 30, 2019 Resource Round-Up: Reproducible Research Edition An annotated bibliography of advice for getting started with reproducible research
May 25, 2019 Rtistic: A package-by-numbers repo A walkthrough of a GitHub template for making your own RMarkdown and ggplot2 theme package
May 7, 2019 Notes on supporting conference speakers Conference planning tips to design a good speaker experience
May 4, 2019 RMarkdown Driven Development (RmdDD) A workflow for refactoring one-time analyses to sustainable data products
Apr 20, 2019 Notes on preparing a tech talk A proposed workflow for methodically developing a good presentation
Nov 1, 2017 Assorted talks on designing analytical tools and communities for enterprise A variety of talks related to creating innersource culture with R packages and related tools
Nov 1, 2017 tidycf: Turning analysis on its head by turning cashflows on their side A case study on building an internal R package for customer lifetime value modeling at Capital One and leading broader analyst adoption of open-source tooling and reproducible workflows through a community of practice.
