
    Top 10 R Packages for Exploratory Data Analysis (EDA) (Bookmark this!)

    Business Science, published 2024-10-03 12:33:00
    [This article was first published on business-science.io, and kindly contributed to R-bloggers].
    Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

    Hey guys, welcome back to my R-tips newsletter. Today, I’m excited to share with you the Top 10 R Packages for Exploratory Data Analysis (EDA). These packages will help you streamline your data analysis workflow and gain deeper insights into your datasets. Let’s dive in!

    Table of Contents

    Here’s what you’re learning today:

    • Importance of Exploratory Data Analysis

    • Top 10 R Packages for EDA:
      • skimr
      • psych
      • corrplot
      • PerformanceAnalytics
      • GGally
      • DataExplorer
      • summarytools
      • SmartEDA
      • janitor
      • inspectdf
    • BONUS: 5 More Underrated EDA Libraries in R

    • Get the Code: Join the R-Tips Newsletter to get the code and stay updated.

    Analyze Your Data Faster with gt_summarytools()

    Get the Code (In the R-Tip 086 Folder)


    SPECIAL ANNOUNCEMENT: ChatGPT for Data Scientists Workshop on October 23rd

    Inside the workshop I’ll share how I built a machine-learning-powered production Shiny app with ChatGPT (it extends this data analysis into an insane production app):

    ChatGPT for Data Scientists

    What: ChatGPT for Data Scientists

    When: Wednesday October 23rd, 2pm EST

    How It Will Help You: Whether you are new to data science or are an expert, ChatGPT is changing the game. There’s a ton of hype. But how can ChatGPT actually help you become a better data scientist and stand out in your career? I’ll show you inside my free ChatGPT for Data Scientists workshop.

    Price: Does free sound good?

    How To Join: 👉 Register Here


    R-Tips Weekly

    This article is part of R-Tips Weekly, a weekly video tutorial that shows you step-by-step how to do common R coding tasks. Pretty cool, right?

    Here are the links to get set up. 👇

    • Sign up for our R-Tips Newsletter and get the code.
    • YouTube Tutorial

    This Tutorial is Available in Video (12 Minutes)

    I have a 12-minute video that walks you through these top 10 R packages for EDA and how to use them in R. (These are the ones I use most often.) 👇

    Importance of Exploratory Data Analysis

    Exploratory Data Analysis (EDA) is a crucial step in any data science project. It helps you understand the underlying structure of your data, identify patterns, detect anomalies, and test hypotheses. EDA enables you to make informed decisions about data cleaning, feature selection, and model selection.

    Top 10 R Packages for EDA

    To make your EDA process more efficient and insightful, here are the top 10 R packages you should know. Get the R code and dataset so you can follow along here.

    Set Up the EDA Packages and Dataset in R:

    First, make sure you install all of the R packages I’ll be demoing today. Then load the dataset I’ll be using so you can reproduce the results. Run this code:

    Libraries and Data

    Get the Code (In the R-Tip 086 Folder)
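
    The exact package list and dataset live in the gated R-Tip 086 folder. As a stand-in, here is a minimal setup sketch (not the author's code), assuming all ten packages come from CRAN and using R's built-in mtcars data in place of the post's dataset. The names eda_pkgs and data_tbl are illustrative.

```r
# The ten EDA packages covered in this post (assumption: all installed from CRAN)
eda_pkgs <- c("skimr", "psych", "corrplot", "PerformanceAnalytics", "GGally",
              "DataExplorer", "summarytools", "SmartEDA", "janitor", "inspectdf")

# Install only the packages that are not already available
to_install <- setdiff(eda_pkgs, rownames(installed.packages()))
if (length(to_install) > 0) {
  install.packages(to_install, repos = "https://cloud.r-project.org")
}

# Stand-in dataset (assumption): built-in mtcars, not the dataset used in the post
data_tbl <- mtcars
str(data_tbl)
```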

    1. skimr: Summary of the Dataset

    skimr provides a convenient and elegant summary of your data. Run this code:

    • I wrote a deeper write-up on skimr: get the deep dive here.

    skimr summary of dataset

    Get the Code (In the R-Tip 086 Folder)
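
    A minimal skimr sketch, using mtcars as stand-in data (an assumption, not the post's dataset or code). skim() returns a data frame with one row per column, grouped by variable type, with missingness counts and summary statistics for each.

```r
library(skimr)

# Stand-in data (assumption): the built-in mtcars dataset
skim_tbl <- skim(mtcars)

# Printing shows missingness, mean, sd, percentiles and a small histogram per numeric column
print(skim_tbl)
```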

    2. psych: Descriptive Statistics

    The psych package offers functions for psychological, psychometric, and personality research, including descriptive statistics. Run this code:

    • We’ll use the describe() function.
    • I personally like to output tables, so you can optionally use gt::gt() to convert the result to a gt HTML table. (I made a deep dive on the gt R package here.)

    Get Descriptive Statistics with Psych

    Get the Code (In the R-Tip 086 Folder)
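
    A minimal sketch of the describe() step, assuming mtcars as stand-in data (not the author's code). The optional gt conversion only runs if gt is installed; rownames_to_stub = TRUE keeps the variable names, which describe() stores as row names.

```r
library(psych)

# Stand-in data (assumption): the built-in mtcars dataset
desc_tbl <- describe(mtcars)

# One row per variable: n, mean, sd, median, trimmed, mad, min, max, range, skew, kurtosis, se
print(desc_tbl)

# Optional: build a gt HTML table (print desc_gt in RStudio to see it in the Viewer)
if (requireNamespace("gt", quietly = TRUE)) {
  desc_gt <- gt::gt(as.data.frame(desc_tbl), rownames_to_stub = TRUE)
}
```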

    3. corrplot: Correlation Matrix Visualization

    corrplot visualizes correlation matrices with a variety of display methods (circles, squares, ellipses, numbers, shading, colors, and pie charts). There are tons of customizations you can do. Run this code:

    Correlation Matrix Visualization with Corrplot

    Get the Code (In the R-Tip 086 Folder)
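
    A minimal corrplot sketch on mtcars (stand-in data, not the post's code). The correlation method itself (pearson, spearman, or kendall) is chosen in base R's cor(); corrplot's own method argument only controls the glyph drawn in each cell. The particular settings below are illustrative choices.

```r
library(corrplot)

# Stand-in data (assumption): the built-in mtcars dataset
cor_mat <- cor(mtcars, method = "spearman")  # rank-based correlation matrix

corrplot(cor_mat,
         method = "ellipse",  # glyph used for each cell
         type   = "upper",    # show only the upper triangle
         order  = "hclust",   # reorder variables by hierarchical clustering
         tl.col = "black")    # colour of the variable labels
```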

    4. PerformanceAnalytics: Correlation Matrix with Scatterplots and Histograms

    PerformanceAnalytics provides advanced charts and statistical functions for financial analysis (I actually use PerformanceAnalytics inside my tidyquant package for easier financial analysis). But most people have no idea it has an amazing chart.Correlation() function that is fast and awesome: a single chart shows histograms on the diagonal, scatterplots below it, and correlation coefficients with significance stars above it. Run this code:

    PerformanceAnalytics Chart Correlation

    Get the Code (In the R-Tip 086 Folder)
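
    A minimal chart.Correlation() sketch, assuming five numeric mtcars columns as stand-in data (not the author's code). Keeping the column count small keeps the scatterplot matrix readable.

```r
library(PerformanceAnalytics)

# Stand-in data (assumption): five numeric columns from mtcars
num_tbl <- mtcars[, c("mpg", "disp", "hp", "wt", "qsec")]

# Histograms on the diagonal, scatterplots below, correlations plus significance stars above
chart.Correlation(num_tbl, histogram = TRUE, method = "pearson")
```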

    5. GGally: Scatterplot Matrix with Pairwise Relationships

    GGally extends ggplot2 by adding several functions that reduce the complexity of combining geometric objects. The ggpairs() function is one of my favorite functions for assessing pairwise relationships. So powerful. Run this code:

    GGally Pairwise Relationships

    Get the Code (In the R-Tip 086 Folder)
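
    A minimal ggpairs() sketch, assuming mtcars as stand-in data (not the post's code). Converting the 0/1 transmission column am into a labelled factor makes ggpairs() treat it as categorical, so it can also serve as the colour grouping.

```r
library(GGally)

# Stand-in data (assumption): mtcars with the 0/1 transmission flag as a labelled factor
cars_tbl <- mtcars
cars_tbl$am <- factor(cars_tbl$am, levels = c(0, 1), labels = c("automatic", "manual"))

# Pairwise plots for a few columns, coloured by transmission type
pairs_plot <- ggpairs(cars_tbl,
                      columns = c("mpg", "hp", "wt", "am"),
                      mapping = ggplot2::aes(colour = am))
print(pairs_plot)
```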

    6. DataExplorer: Generate a Full EDA Report

    DataExplorer automates the EDA process and generates comprehensive reports. Run this code:

    • I did a deeper dive on DataExplorer (get my deep dive here).

    DataExplorer

    Get the Code (In the R-Tip 086 Folder)
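
    A minimal DataExplorer sketch on mtcars (stand-in data, not the author's code). The quick overview functions run anywhere. create_report() renders an HTML file through rmarkdown and then tries to open it, so this sketch only calls it in an interactive session; the output file name is a hypothetical choice.

```r
library(DataExplorer)

# Stand-in data (assumption): the built-in mtcars dataset
intro_tbl <- introduce(mtcars)   # rows, columns, missing values, memory usage
print(intro_tbl)

plot_intro(mtcars)      # the same overview as a bar chart
plot_histogram(mtcars)  # histograms for every continuous column

# Full HTML report (hypothetical file name), only in an interactive session
if (interactive()) {
  create_report(mtcars, output_file = "dataexplorer_report.html")
}
```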

    7. summarytools: Summary Table for the Dataset

    summarytools provides tools to neatly and quickly summarize data. Run this code:

    • I did a deep dive on summarytools (Get the deep dive here.)
    • I’m a big fan of gt tables, so I converted summarytools to gt (get that article here.)

    Summarytools

    Get the Code (In the R-Tip 086 Folder)
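
    A minimal summarytools sketch, assuming mtcars as stand-in data (not the post's code). dfSummary() builds the one-table overview; view() renders it as HTML in RStudio, while the console fallback prints a plain-text version.

```r
library(summarytools)

# Stand-in data (assumption): the built-in mtcars dataset
df_summary <- dfSummary(mtcars)   # per-variable stats, frequencies, mini-graphs, missing counts

if (interactive()) {
  view(df_summary)                      # HTML version in the RStudio Viewer
} else {
  print(df_summary, method = "pander")  # plain-text version for the console
}

# descr() gives classic descriptive statistics for the numeric columns
desc_stats <- descr(mtcars)
print(desc_stats)
```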

    8. SmartEDA: Generate a Detailed EDA Report in HTML

    SmartEDA creates automated EDA reports with detailed analyses. This is a newer package, but I already love it. Run this code:

    SmartEDA

    Get the Code (In the R-Tip 086 Folder)
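
    A minimal SmartEDA sketch on mtcars (stand-in data, not the author's code). ExpData() returns overview tables; ExpReport() renders the full HTML report through rmarkdown, so it is guarded for interactive use here, and the output file name is a hypothetical choice.

```r
library(SmartEDA)

# Stand-in data (assumption): the built-in mtcars dataset
overview_tbl <- ExpData(mtcars, type = 1)  # dataset-level overview
variable_tbl <- ExpData(mtcars, type = 2)  # variable-level structure
print(overview_tbl)
print(variable_tbl)

# Full HTML report (hypothetical file name), only in an interactive session
if (interactive()) {
  ExpReport(mtcars, op_file = "smarteda_report.html")
}
```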

    9. janitor: Frequency Table for a Categorical Variable

    janitor helps with data cleaning tasks, including frequency tables. We’ll use tabyl() to create a frequency table and the adorn_* functions to modify the table. Run this code:

    Janitor Tabyl

    Get the Code (In the R-Tip 086 Folder)
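
    A minimal tabyl() sketch, assuming the cyl column of mtcars as the stand-in categorical variable (not the post's code). adorn_totals() appends a Total row and adorn_pct_formatting() turns the proportions into percentage strings; the native pipe needs R 4.1 or later.

```r
library(janitor)

# Stand-in data (assumption): the built-in mtcars dataset
cyl_freq <- tabyl(mtcars, cyl)   # columns: cyl, n, percent

cyl_freq_fmt <- cyl_freq |>
  adorn_totals("row") |>            # append a "Total" row
  adorn_pct_formatting(digits = 1)  # format proportions as percentages
print(cyl_freq_fmt)
```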

    10. inspectdf: Visualize Missing Values in the Dataset

    inspectdf provides tools to visualize data frames, including missing values and correlations. Run this code:

    InspectDF

    Get the Code (In the R-Tip 086 Folder)
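
    A minimal inspect_na() sketch. mtcars has no missing values, so this stand-in (an assumption, not the post's data or code) blanks out a few cells first. inspect_na() returns one row per column and show_plot() turns that into a bar chart.

```r
library(inspectdf)

# Stand-in data (assumption): mtcars with a few cells deliberately set to NA
cars_na <- mtcars
cars_na$hp[c(2, 5, 9)] <- NA
cars_na$wt[c(1, 3)]    <- NA

# One row per column: col_name, cnt (number missing) and pcnt (percent missing)
na_tbl <- inspect_na(cars_na)
print(na_tbl)

# Bar chart of missingness by column
show_plot(na_tbl)
```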

    Bonus: Five (5) Underrated EDA Libraries in R:

    I had to call it quits at 10. But here are 5 more up-and-coming EDA libraries that are underrated:

    1. Radiant: A Shiny app for creating reproducible business and data analytics reports. Get my radiant deep dive here.

    2. Correlationfunnel: I use this R package all the time for quick correlation analysis and detecting critical relationships. Full disclosure: I authored this R package. (Get the introduction here.)

    3. GWalkR: Like Tableau in R for $0. Get my GWalkR deep dive here.

    4. Esquisse: Also like Tableau in R for $0. Get my Esquisse deep dive here.

    5. Explore: A simple Shiny app for quickly exploring data. Get my explore deep dive here.

    Want the Full R Code?

    To get access to the full source code for this tutorial, subscribe to the R-Tips Newsletter. This code is available exclusively to subscribers!

    Get the Code (In the R-Tip 086 Folder)

    Conclusion: Enhance Your Data Analysis Workflow

    By using these top 10 R packages for EDA, you can significantly enhance your exploratory data analysis workflow, gain deeper insights, and make data-driven decisions more effectively.

    But there’s more to becoming a data scientist.

    If you would like to grow your Business Data Science skills with R, then please read on…

    Need to advance your business data science skills?

    I’ve helped 6,107+ students learn data science for business from an elite business consultant’s perspective.

    I’ve worked with Fortune 500 companies like S&P Global, Apple, MRM McCann, and more.

    And I built a training program that gets my students life-changing data science careers (don’t believe me? see my testimonials here):

    6-Figure Data Science Job at CVS Health ($125K)
    Senior VP Of Analytics At JP Morgan ($200K)
    50%+ Raises & Promotions ($150K)
    Lead Data Scientist at Northwestern Mutual ($175K)
    2X-ed Salary (From $60K to $120K)
    2 Competing ML Job Offers ($150K)
    Promotion to Lead Data Scientist ($175K)
    Data Scientist Job at Verizon ($125K+)
    Data Scientist Job at CitiBank ($100K + Bonus)

    Whenever you are ready, here’s the system they are taking:

    Here’s the system that has gotten aspiring data scientists, career transitioners, and lifelong learners data science jobs and promotions…

    What They're Doing - 5 Course R-Track

    Join My 5-Course R-Track Program Now!
    (And Become The Data Scientist You Were Meant To Be…)

    P.S. – Samantha landed her NEW Data Science R Developer job at CVS Health (Fortune 500). This could be you.

    Success Samantha Got The Job


