Introduction
Welcome back, fellow data enthusiasts! Today we take a look at the latest addition to the TidyDensity package: the triangular distribution. Compact and versatile, it is a handy tool for data simulation and analysis. In this blog post, we’ll go through the functions provided, explain their arguments, and work through an example from simulation to plotting.
What’s So Special About Triangular Distributions?
- Flexibility in uncertainty: They model situations where you have a minimum, maximum, and most likely value, but the exact distribution between those points is unknown.
- Common in real-world scenarios: Project cost estimates, task completion times, expert opinions, and even natural phenomena often exhibit triangular patterns.
- Simple to understand and visualize: Their straightforward shape makes them accessible for interpretation and communication.
The triangular distribution is a continuous probability distribution with lower limit a, upper limit b, and mode c, where a < b and a ≤ c ≤ b. The distribution resembles a tent shape.
The probability density function of the triangular distribution is:
$$
f(x) =
\begin{cases}
\dfrac{2(x - a)}{(b - a)(c - a)} & \text{for } a \le x \le c \\[1ex]
\dfrac{2(b - x)}{(b - a)(b - c)} & \text{for } c \le x \le b
\end{cases}
$$
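To make the formula concrete, here is a minimal base R sketch of the density. The helper name dtri_sketch is made up for this post and is not a TidyDensity function; it simply evaluates the two branches above and returns zero outside the interval from a to b.

```r
# Density of a triangular distribution with limits a < b and mode a <= c <= b.
# dtri_sketch is a hypothetical helper name, not part of TidyDensity.
dtri_sketch <- function(x, a, b, c) {
  stopifnot(a < b, a <= c, c <= b)
  dens  <- numeric(length(x))           # zero outside [a, b]
  left  <- x >= a & x < c               # rising side
  right <- x > c & x <= b               # falling side
  dens[left]  <- 2 * (x[left] - a) / ((b - a) * (c - a))
  dens[right] <- 2 * (b - x[right]) / ((b - a) * (b - c))
  dens[x == c] <- 2 / (b - a)           # peak height; also covers c == a or c == b
  dens
}

# Peak height and total area for a = 0, b = 1, c = 0.5
peak <- dtri_sketch(0.5, a = 0, b = 1, c = 0.5)
area <- integrate(dtri_sketch, lower = 0, upper = 1, a = 0, b = 1, c = 0.5)$value
```

The peak height is always 2 / (b - a), which is what makes the area of the triangle equal to one.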
The key parameters of the triangular distribution are:
- a: the minimum value
- b: the maximum value
- c: the mode (most frequent value)
The triangular distribution is often used as a subjective description of a population for which there is only limited sample data. It is useful when a process has a natural minimum and maximum.
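As a small worked example of that kind of subjective description, suppose an expert gives a three-point estimate for a task. The numbers below are invented for illustration, and ptri_sketch is a hypothetical helper implementing the standard closed-form CDF, not a TidyDensity function. The mode is called m here so that it does not mask R's c() function.

```r
# Hypothetical three-point estimate for a task, in days (illustrative values only).
a <- 2    # optimistic (minimum)
b <- 10   # pessimistic (maximum)
m <- 4    # most likely (mode)

# Closed-form CDF of the triangular distribution.
ptri_sketch <- function(q, a, b, m) {
  ifelse(q <= a, 0,
  ifelse(q <= m, (q - a)^2 / ((b - a) * (m - a)),
  ifelse(q <  b, 1 - (b - q)^2 / ((b - a) * (b - m)), 1)))
}

expected_days <- (a + b + m) / 3          # mean of the triangular distribution
p_within_6    <- ptri_sketch(6, a, b, m)  # chance of finishing within 6 days
```

With these inputs the expected duration is 16/3, a little over five days, and the chance of finishing within six days is two thirds.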
Triangular Functions
Let’s start by introducing TidyDensity’s main functions for the triangular distribution:
tidy_triangular(): This function generates random draws from a triangular distribution with the specified minimum, maximum, and mode, for a chosen number of simulations.
- .n: Specifies the number of x values for each simulation.
- .min: Sets the minimum value of the triangular distribution.
- .max: Determines the maximum value of the triangular distribution.
- .mode: Specifies the mode (peak) value of the triangular distribution.
- .num_sims: Controls the number of simulations to perform.
- .return_tibble: A logical value indicating whether to return the result as a tibble.
util_triangular_param_estimate(): This function estimates the parameters of a triangular distribution from a numeric vector of data.
- .x: Requires a numeric vector, with all values satisfying 0 <= x <= 1.
- .auto_gen_empirical: A boolean value (TRUE/FALSE) with a default set to TRUE. It automatically generates tidy_empirical() output for the .x parameter and utilizes tidy_combine_distributions().
util_triangular_stats_tbl(): This function creates a tidy data frame with statistics for a triangular distribution.
- .data: The data being passed from a tidy_ distribution function.
triangle_plot(): This function creates a ggplot2 object for a triangular distribution.
- .data: Tidy data from the tidy_triangular function.
- .interactive: A logical value indicating whether to return an interactive plot using plotly. Default is FALSE.
Using tidy_triangular for Simulations
Suppose you want to simulate a triangular distribution with 100 x values, a minimum of 0, a maximum of 1, and a mode at 0.5. You’d use the following code:
library(TidyDensity)
triangular_data <- tidy_triangular(
  .n = 100,
  .min = 0,
  .max = 1,
  .mode = 0.5,
  .num_sims = 1,
  .return_tibble = TRUE
)
triangular_data
# A tibble: 100 × 7
   sim_number     x     y      dx      dy     p     q
   <fct>      <int> <dbl>   <dbl>   <dbl> <dbl> <dbl>
 1 1              1 0.853 -0.140  0.00158 0.957 0.853
 2 1              2 0.697 -0.128  0.00282 0.816 0.697
 3 1              3 0.656 -0.116  0.00484 0.764 0.656
 4 1              4 0.518 -0.103  0.00805 0.536 0.518
 5 1              5 0.635 -0.0909 0.0130  0.733 0.635
 6 1              6 0.838 -0.0786 0.0202  0.948 0.838
 7 1              7 0.645 -0.0662 0.0304  0.748 0.645
 8 1              8 0.482 -0.0539 0.0444  0.464 0.482
 9 1              9 0.467 -0.0416 0.0627  0.437 0.467
10 1             10 0.599 -0.0293 0.0859  0.678 0.599
# ℹ 90 more rows
This generates a tidy tibble with simulated data, ready for your analysis.
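If you are curious how triangular draws can be produced in the first place, one common approach is inverse-CDF sampling: draw a uniform number and push it through the quantile function. The sketch below is base R only and is not a description of what TidyDensity does internally; rtri_sketch is a hypothetical name, and the mode is called m to avoid masking c().

```r
# Inverse-CDF sampling for a triangular distribution (a sketch, not package internals).
rtri_sketch <- function(n, a, b, m) {
  stopifnot(a < b, a <= m, m <= b)
  u  <- runif(n)
  fc <- (m - a) / (b - a)                          # CDF value at the mode
  ifelse(u < fc,
         a + sqrt(u * (b - a) * (m - a)),          # left of the mode
         b - sqrt((1 - u) * (b - a) * (b - m)))    # right of the mode
}

set.seed(123)
draws <- rtri_sketch(10000, a = 0, b = 1, m = 0.5)
range(draws)   # stays inside [0, 1]
mean(draws)    # should land close to (0 + 1 + 0.5) / 3 = 0.5
```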
Estimating Parameters and Creating Stats Tables
Use the util_triangular_param_estimate() function to estimate parameters and create tidy empirical data:
param_estimate <- util_triangular_param_estimate(.x = triangular_data$y)
t(param_estimate$parameter_tbl)
          [,1]
dist_type "Triangular"
samp_size "100"
min       "0.0572515"
max       "0.8822025"
mode      "0.8822025"
method    "Basic"
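In the output above, the estimated mode coincides with the sample maximum. If you would like a second opinion on the mode, one simple alternative is a method-of-moments estimate: since the mean of a triangular distribution is (min + max + mode) / 3, you can solve for the mode. The helper below is a hypothetical base R sketch and not the method TidyDensity uses; note also that the sample minimum and maximum tend to sit inside the true limits.

```r
# Rough parameter estimates from a sample (hypothetical helper, not TidyDensity's method).
estimate_tri_sketch <- function(x) {
  stopifnot(is.numeric(x), length(x) >= 2)
  a_hat <- min(x)
  b_hat <- max(x)
  # mean = (a + b + mode) / 3, so solve for the mode and clamp it to the observed range
  m_hat <- min(max(3 * mean(x) - a_hat - b_hat, a_hat), b_hat)
  data.frame(min = a_hat, max = b_hat, mode = m_hat)
}

# Toy data, symmetric around 0.5
est <- estimate_tri_sketch(c(0.1, 0.4, 0.5, 0.6, 0.9))
est
```

You could call it on the simulated values in the same way, for example estimate_tri_sketch(triangular_data$y).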
For statistics table creation:
stats_table <- util_triangular_stats_tbl(.data = triangular_data)
t(stats_table)
                  [,1]
tidy_function     "tidy_triangular"
function_call     "Triangular c(0, 1, 0.5)"
distribution      "Triangular"
distribution_type "continuous"
points            "100"
simulations       "1"
mean              "0.5"
median            "0.3535534"
mode              "1"
range_low         "0.0572515"
range_high        "0.8822025"
variance          "0.04166667"
skewness          "0"
kurtosis          "-0.6"
entropy           "-0.6931472"
computed_std_skew "-0.1870017"
computed_std_kurt "2.778385"
ci_lo             "0.08311609"
ci_hi             "0.8476985"
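A quick way to sanity-check a table like this is to compute a few of the statistics from the standard closed-form expressions. The sketch below uses base R only; tri_stats_sketch is a hypothetical helper, not a TidyDensity function, and the mode is called m to avoid masking c().

```r
# Closed-form mean, median, mode, and variance of a triangular distribution.
tri_stats_sketch <- function(a, b, m) {
  stopifnot(a < b, a <= m, m <= b)
  mid <- (a + b) / 2
  med <- if (m >= mid) {
    a + sqrt((b - a) * (m - a) / 2)
  } else {
    b - sqrt((b - a) * (b - m) / 2)
  }
  list(
    mean     = (a + b + m) / 3,
    median   = med,
    mode     = m,
    variance = (a^2 + b^2 + m^2 - a * b - a * m - b * m) / 18
  )
}

theory <- tri_stats_sketch(a = 0, b = 1, m = 0.5)
str(theory)
```

For a minimum of 0, maximum of 1, and mode of 0.5 this gives a mean of 0.5 and a variance of about 0.0417, in line with the table. The closed-form median and mode are both 0.5, which differs from the median and mode rows printed above, so those two rows may be worth checking against your installed version of the package.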
Visualizing the Triangular Distribution
Now, let’s visualize the triangular distribution using the triangle_plot() function:
triangle_plot(.data = triangular_data, .interactive = TRUE)
triangle_plot(.data = triangular_data, .interactive = FALSE)
Each call returns a plot of the simulated distribution. With .interactive set to TRUE, you can explore the distribution interactively using plotly; with FALSE, you get a static ggplot2 object.
Conclusion
In this blog post, we’ve explored the powerful functionalities of the triangular distribution in TidyDensity. Whether you’re simulating data, estimating parameters, or creating insightful visualizations, these functions provide a robust toolkit for your statistical endeavors. Happy coding, and may your distributions always be tidy!