Hey guys, welcome back to my R-tips newsletter. Getting quick insights into your data is absolutely critical to data understanding, predictive modeling, and production. But it can be challenging if you’re just getting started. Today, I’m going to show you how to analyze your data faster using the summarytools package in R. Let’s go!
Here’s what you’re learning today:
summarytools to Summarize Your Data
dfSummary()
descr()
freq()
Get the Code (In the R-Tip 084 Folder)
Inside the workshop I’ll share how I built a Machine Learning Powered Production Shiny App with ChatGPT (extends this data analysis to an insane production app):
What: ChatGPT for Data Scientists
When: Wednesday September 25th, 2pm EST
How It Will Help You: Whether you are new to data science or are an expert, ChatGPT is changing the game. There’s a ton of hype. But how can ChatGPT actually help you become a better data scientist and help you stand out in your career? I’ll show you inside my free ChatGPT for Data Scientists workshop.
Price: Does Free sound good?
How To Join: Register Here
This article is part of R-Tips Weekly, a weekly video tutorial that shows you step-by-step how to do common R coding tasks. Pretty cool, right?
Here are the links to get set up.
I have a 6-minute video that walks you through setting up summarytools in R and running your first exploratory data analysis with it.
summarytools
In the fast-paced world of data science, getting quick insights into your data is crucial. It allows you to understand your data better, make informed decisions, and expedite the modeling process. However, performing exploratory data analysis (EDA) can be time-consuming if you’re not using the right tools.
summarytools
The summarytools package in R simplifies the process of data exploration by providing functions that generate comprehensive summaries of your data with minimal code.
Let’s dive into how you can use summarytools to speed up your data analysis.
summarytools
I’ll show off some of the most important functionality in summarytools. I’ll use a customer churn dataset. You can get all of the data and code here (it’s in the R-Tip 084 Folder).
First, make sure you have the summarytools and tidyverse packages installed. Then load the libraries and data needed to complete this tutorial.
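A minimal setup sketch for that step. The file path and the `churn_tbl` name are hypothetical placeholders, not the names used in the R-Tip 084 folder, so point them at wherever you saved the churn data:

```r
# Run once if the packages are not installed yet:
# install.packages(c("summarytools", "tidyverse"))

library(summarytools)
library(tidyverse)

# Hypothetical path: replace with the churn CSV from the R-Tip 084 folder
churn_path <- "data/customer_churn.csv"

# Only read the file if it is actually there, so the script does not error out
if (file.exists(churn_path)) {
  churn_tbl <- read_csv(churn_path)
  glimpse(churn_tbl)  # quick look at column names and types
}
```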
dfSummary()
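The dfSummary() function provides a detailed summary of your data frame, including each variable’s name and type, summary statistics or distinct values, frequencies, a small distribution graph, and counts of valid and missing values.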
This code will open an interactive HTML report that summarizes your entire data frame, making it easy to spot anomalies or areas that need attention. Run this code:
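A minimal sketch of that call. The small `churn_tbl` data frame below is a made-up stand-in for the customer churn data (its column names and values are assumptions for illustration only), so swap in your real data frame:

```r
library(summarytools)

# Toy stand-in for the churn data (made-up values, illustration only)
churn_tbl <- data.frame(
  tenure          = c(1, 34, 2, 45, 8, 22),
  monthly_charges = c(29.85, 56.95, 53.85, 42.30, 99.65, NA),
  contract        = factor(c("Month-to-month", "One year", "Month-to-month",
                             "One year", "Month-to-month", "Two year")),
  churn           = factor(c("No", "No", "Yes", "No", "Yes", "No"))
)

# Build the data frame summary: one row per variable
churn_summary <- dfSummary(churn_tbl)

# Plain-text version in the console
print(churn_summary)

# HTML report in the RStudio Viewer or your browser (interactive sessions only)
if (interactive()) {
  summarytools::view(churn_summary)
}
```

Calling `summarytools::view()` with the namespace spelled out avoids a clash with the `view()` function that the tidyverse also exports.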
descr()
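To get descriptive statistics for your numeric variables, use the descr() function. It reports statistics such as the mean, standard deviation, minimum, quartiles, median, maximum, skewness, and kurtosis, along with the number of valid observations.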
Run this code:
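A minimal sketch, again using a made-up stand-in for the numeric columns of the churn data (the column names are assumptions):

```r
library(summarytools)

# Toy numeric columns (made-up values, illustration only)
churn_num <- data.frame(
  tenure          = c(1, 34, 2, 45, 8, 22),
  monthly_charges = c(29.85, 56.95, 53.85, 42.30, 99.65, NA)
)

# Full set of descriptive statistics; missing values are dropped by default
numeric_stats <- descr(churn_num)
print(numeric_stats)

# A shorter table: pick a few statistics and put one variable per row
short_stats <- descr(
  churn_num,
  stats     = c("mean", "sd", "min", "med", "max"),
  transpose = TRUE
)
print(short_stats)
```

The `stats` argument is handy when the full table is more than you need for a quick check.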
freq()
For categorical variables, the freq() function generates frequency tables that show the distribution of categories. This helps you understand the distribution and prevalence of each category within your data.
Run this code:
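A minimal sketch with two made-up categorical columns standing in for the churn data (names and values are assumptions for illustration):

```r
library(summarytools)

# Toy categorical columns (made-up values, illustration only)
churn_cat <- data.frame(
  contract = factor(c("Month-to-month", "One year", "Month-to-month",
                      "One year", "Month-to-month", "Two year")),
  churn    = factor(c("No", "No", "Yes", "No", "Yes", "No"))
)

# Frequency table for one variable: counts, percentages, cumulative percentages
contract_freq <- freq(churn_cat$contract)
print(contract_freq)

# Sort by count and leave out the missing-value rows
churn_freq <- freq(churn_cat$churn, order = "freq", report.nas = FALSE)
print(churn_freq)
```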
By leveraging the summarytools package, you can perform a comprehensive exploratory data analysis with just a few lines of code. This not only saves you time but also enhances your understanding of the data, allowing you to make better-informed decisions. That understanding carries through to better predictive modeling and production deployment.
But there’s more to becoming a data scientist.
If you would like to grow your Business Data Science skills with R, then please read on…
I’ve helped 6,107+ students learn data science for business from an elite business consultant’s perspective.
I’ve worked with Fortune 500 companies like S&P Global, Apple, MRM McCann, and more.
And I built a training program that gets my students life-changing data science careers (don’t believe me? see my testimonials here):
Here’s the system that has gotten aspiring data scientists, career transitioners, and lifelong learners data science jobs and promotions…
Join My 5-Course R-Track Program Now!
(And Become The Data Scientist You Were Meant To Be…)
P.S. – Samantha landed her NEW Data Science R Developer job at CVS Health (Fortune 500). This could be you.