This is a bare-bones introduction to ggplot2, a visualization package in R. It assumes no knowledge of R.
For a better-looking version of this post, see this GitHub repository, which also contains some of the example datasets I use and a literate programming version of this tutorial.
Preview
Let’s start with a preview of what ggplot2 can do.
Given Fisher’s iris data set and one simple command…
…we can produce this plot of sepal length vs. petal length, colored by species.
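A command along these lines should produce that kind of plot. Treat it as a sketch: it assumes ggplot2 is already installed (installation is covered further down), and which variable goes on which axis is an assumption. Newer ggplot2 releases mark qplot() as deprecated, so you may see a warning.

```r
library(ggplot2)

# Assumed layout: sepal length on the x-axis, petal length on the y-axis,
# with one color per species.
p <- qplot(Sepal.Length, Petal.Length, data = iris, color = Species)
print(p)
```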
Installation
You can download R here. After installation, you can launch R in interactive mode by either typing R on the command line or opening the standard GUI (which should have been included in the download).
R Basics
Vectors
Vectors are a core data structure in R, and are created with c(). Elements in a vector must be of the same type.
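Here's a small sketch of creating vectors. The variable names and values are made up for illustration.

```r
# A numeric vector and a character vector (illustrative values).
ages <- c(19, 21, 24)
student_names <- c("Alice", "Bob", "Carol")

# Mixing types does not fail; R coerces everything to a common type.
# Here every element becomes a character string.
mixed <- c(1, "a", TRUE)
class(mixed)
```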
Elements are indexed starting at 1, and are accessed with [] notation.
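For example, with a made-up vector of scores:

```r
scores <- c(90, 85, 70)  # illustrative values

first <- scores[1]              # the first element (indexing starts at 1)
last_two <- scores[2:3]         # a range of elements
last <- scores[length(scores)]  # the last element
```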
Data frames
Data frames are like matrices, but with named columns of different types (similar to database tables).
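As a sketch, here is one way to build a small data frame by hand. The column names and values are hypothetical, chosen to match the student examples used later.

```r
# Each argument becomes a named column; columns may have different types.
students <- data.frame(
  age   = c(19, 21, 24),
  score = c(90, 85, 70),
  name  = c("Alice", "Bob", "Carol")
)

# str() prints a compact summary of the columns and their types.
str(students)
```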
You can access columns of a data frame with $.
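A quick sketch, using a made-up data frame:

```r
students <- data.frame(
  age   = c(19, 21, 24),
  score = c(90, 85, 70)
)

# $ returns the column as a plain vector, so vector functions work on it.
students$age
mean_score <- mean(students$score)
```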
You can also create new columns with $.
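For instance (the data frame, the new column name, and the cutoff of 75 are all invented for illustration):

```r
students <- data.frame(
  age   = c(19, 21, 24),
  score = c(90, 85, 70)
)

# Assigning to a column name that does not exist yet creates the column.
students$passed <- students$score >= 75
students
```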
read.table
Suppose you want to import a TSV file into R as a data frame.
TSV file without header
For example, consider the data/students.tsv file (with columns describing each student's age, test score, and name).
We can import this file into R using read.table().
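Here is a self-contained sketch of that import. Rather than relying on data/students.tsv, it writes a small stand-in file with made-up rows to a temporary path; the column names passed to col.names are also my own choice.

```r
# Create a tiny headerless, tab-delimited file (hypothetical contents).
path <- tempfile(fileext = ".tsv")
writeLines(c("19\t90\tAlice", "21\t85\tBob", "24\t70\tCarol"), path)

# header = FALSE: the first line is data, not column names.
# sep = "\t": fields are separated by tabs.
# col.names: names to give the columns, since the file has none.
students <- read.table(
  path,
  header = FALSE,
  sep = "\t",
  col.names = c("age", "score", "name")
)
students
```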
We can now access the different columns in the data frame with students$age, students$score, and students$name.
CSV file with header
For an example of a file in a different format, look at the data/studentsWithHeader.tsv file.
Here we have the same data, but now the file is comma-delimited and contains a header. We can import this file with read.table() again, this time setting header = TRUE and sep = ",".
(Note: there is also a read.csv function that uses sep = "," by default.)
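A sketch of both approaches, again using a stand-in file with made-up rows written to a temporary path:

```r
# Create a small comma-delimited file with a header line (hypothetical contents).
path <- tempfile(fileext = ".csv")
writeLines(c("age,score,name", "19,90,Alice", "21,85,Bob", "24,70,Carol"), path)

# With read.table(), say explicitly that there is a header and that
# fields are separated by commas.
students <- read.table(path, header = TRUE, sep = ",")

# read.csv() assumes a header and comma separators by default.
students_csv <- read.csv(path)
```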
help
There are many more options that read.table can take. For a list of these, just type help(read.table) (or ?read.table) at the prompt to access documentation.
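For example (args() is an extra I find handy for a quick reminder of the argument list; it isn't required for reading the documentation):

```r
help(read.table)   # open the full documentation page
?read.table        # shorthand for the same thing

# Print just the argument names and their defaults.
args(read.table)
```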
ggplot2
With these R basics in place, let’s dive into the ggplot2 package.
Installation
One of R’s greatest strengths is its excellent set of packages. To install a package, you can use the install.packages() function.
To load a package into your current R session, use library().
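Put together, installing and loading ggplot2 might look like this. The requireNamespace() check and the explicit CRAN mirror are my additions so the snippet can be re-run safely; a plain install.packages("ggplot2") at the prompt works too.

```r
# Install only if ggplot2 is not already available.
if (!requireNamespace("ggplot2", quietly = TRUE)) {
  install.packages("ggplot2", repos = "https://cloud.r-project.org")
}

# Load the package into the current session.
library(ggplot2)
```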
Scatterplots with qplot()
Let’s look at how to create a scatterplot in ggplot2. We’ll use the iris data frame that’s automatically loaded into R.
What does the data frame contain? We can use the head function to look at the first few rows.
(The data frame actually contains three species: setosa, versicolor, and virginica.)
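A quick way to check both points yourself:

```r
# First six rows: four numeric measurements plus a Species column.
first_rows <- head(iris)
first_rows

# The distinct species in the data set.
species <- levels(iris$Species)
species
```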
Let’s plot Sepal.Length against Petal.Length using ggplot2’s qplot() function.
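A minimal sketch of that call. Putting Sepal.Length on the x-axis is an assumption, and recent ggplot2 versions may print a deprecation warning for qplot().

```r
library(ggplot2)

# The first two arguments are x and y; data says where to look them up.
p <- qplot(Sepal.Length, Petal.Length, data = iris)
print(p)
```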
To see where each species is located in this graph, we can color each point by adding a color = Species argument.
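As a sketch, with the same assumed axis order as before:

```r
library(ggplot2)

# color = Species maps each species to its own color and adds a legend.
p <- qplot(Sepal.Length, Petal.Length, data = iris, color = Species)
print(p)
```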
Similarly, we can let the size of each point denote sepal width by adding a size = Sepal.Width argument.
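Something like the following should work. The second call adds alpha = I(0.7), which is my own optional touch: it makes the points slightly transparent so that overlapping ones stay visible.

```r
library(ggplot2)

# Point size now reflects sepal width.
p_size <- qplot(Sepal.Length, Petal.Length, data = iris,
                color = Species, size = Sepal.Width)

# Optional: I() marks 0.7 as a fixed value rather than a variable to map.
p_alpha <- qplot(Sepal.Length, Petal.Length, data = iris,
                 color = Species, size = Sepal.Width, alpha = I(0.7))
print(p_alpha)
```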
Finally, let’s fix the axis labels and add a title to the plot.
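One way to do this is with qplot()'s xlab, ylab, and main arguments. The label text below is just an example.

```r
library(ggplot2)

p <- qplot(Sepal.Length, Petal.Length, data = iris, color = Species,
           xlab = "Sepal Length", ylab = "Petal Length",
           main = "Sepal vs. Petal Length in Fisher's Iris Data")
print(p)
```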
Other common geoms
In the scatterplot examples above, we implicitly used a point geom, the default when you supply two arguments to qplot().
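In other words, the two calls in this sketch should draw the same scatterplot:

```r
library(ggplot2)

# Implicit: with x and y supplied, qplot() picks the point geom.
p_implicit <- qplot(Sepal.Length, Petal.Length, data = iris)

# Explicit: ask for the point geom by name.
p_explicit <- qplot(Sepal.Length, Petal.Length, data = iris, geom = "point")
print(p_explicit)
```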
But we can also easily use other types of geoms to create more kinds of plots.
Bar charts: geom = "bar"
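Here's a minimal sketch using R's built-in mtcars data set (my choice of example data). With a single variable, the bar geom counts how many rows fall into each category.

```r
library(ggplot2)

# One bar per cylinder count; bar height is the number of cars.
# factor() treats the numeric cyl column as discrete categories.
p <- qplot(factor(cyl), data = mtcars, geom = "bar",
           xlab = "Cylinders", ylab = "Number of cars")
print(p)
```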
Line charts: geom = "line"
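As a sketch, here is a line chart built from R's built-in Orange data set (again my choice of example data), which records the trunk circumference of five orange trees at several ages.

```r
library(ggplot2)

# One line per tree: color = Tree both colors and groups the lines.
p <- qplot(age, circumference, data = Orange, geom = "line", color = Tree)
print(p)
```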
And that’s all I’ll cover.
Next Steps
I skipped over a lot of aspects of R and ggplot2 in this intro.
For example,
- There are many geoms (and other features) in ggplot2 that I didn’t cover, e.g., boxplots and histograms.
- I didn’t talk about ggplot2’s layering system, or the grammar of graphics it’s based on.
So I’ll end with some additional resources on R and ggplot2.
- I don’t use it myself, but RStudio is a popular IDE for R.
- The official ggplot2 documentation is great and has lots of examples. There’s also an excellent book.
- plyr is another fantastic R package that’s also by Hadley Wickham (the author of ggplot2).
- The official R introduction is okay, but definitely not great. I haven’t found any R tutorials I really like, but I’ve heard good things about The Art of R Programming.