Count Continuous Variable in R Ggplot X Axis Show Actual Values
Customizing Graphs
Graph defaults are fine for quick data exploration, but when you want to publish your results to a blog, paper, article or poster, you'll probably want to customize the results. Customization can improve the clarity and attractiveness of a graph.
This chapter describes how to customize a graph's axes, gridlines, colors, fonts, labels, and legend. It also describes how to add annotations (text and lines).
Axes
The x-axis and y-axis represent numeric, categorical, or date values. You can modify the default scales and labels with the functions below.
Quantitative axes
A quantitative axis is modified using the scale_x_continuous or scale_y_continuous function.
Options include
- breaks - a numeric vector of positions
- limits - a numeric vector with the min and max for the scale
# customize numerical x and y axes
library(ggplot2)
ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  scale_x_continuous(breaks = seq(1, 7, 1),
                     limits = c(1, 7)) +
  scale_y_continuous(breaks = seq(10, 45, 5),
                     limits = c(10, 45))
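As a further sketch of the breaks option, the breaks can be set to the values that actually occur in a variable, so that only those values are labeled on the axis. The choice of the cyl variable and the names cyl_breaks and p_cyl are illustrative assumptions, not part of the original example.

```r
library(ggplot2)

# breaks at the values that actually occur in the data
cyl_breaks <- sort(unique(mpg$cyl))

# label the x-axis only at the observed cylinder counts
p_cyl <- ggplot(mpg, aes(x = cyl, y = hwy)) +
  geom_point(alpha = .4) +
  scale_x_continuous(breaks = cyl_breaks)
p_cyl
```

Without the breaks argument, ggplot2 picks evenly spaced breaks, which may include values that never appear in the data.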
Numeric formats
The scales package provides a number of functions for formatting numeric labels. Some of the most useful are
- dollar
- comma
- percent
Let's demonstrate these functions with some synthetic data.
# create some data
set.seed(1234)
df <- data.frame(xaxis = rnorm(50, 100000, 50000),
                 yaxis = runif(50, 0, 1),
                 pointsize = rnorm(50, 1000, 1000))
library(ggplot2)
# plot the axes and legend with formats
ggplot(df, aes(x = xaxis, y = yaxis, size = pointsize)) +
  geom_point(color = "cornflowerblue", alpha = .6) +
  scale_x_continuous(labels = scales::comma) +
  scale_y_continuous(labels = scales::percent) +
  scale_size(range = c(1, 10), # point size range
             labels = scales::dollar)
To format currency values as euros, you can use labels = scales::dollar_format(prefix = "", suffix = "\u20ac").
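A minimal sketch of the euro formatter on its own, outside of a plot. The name euro and the sample values are assumptions chosen for illustration.

```r
library(scales)

# formatter that drops the dollar prefix and appends a euro sign
euro <- dollar_format(prefix = "", suffix = "\u20ac")

# apply it to a few sample amounts (illustrative values)
euro_labels <- euro(c(5, 1250, 20000))
euro_labels
```

The resulting function can be passed to the labels argument of a scale in the same way as scales::dollar.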
Categorical axes
A categorical axis is modified using the scale_x_discrete or scale_y_discrete function.
Options include
- limits - a character vector (the levels of the categorical variable in the desired order)
- labels - a character vector of labels (optional labels for these levels)
library(ggplot2)
# customize categorical x axis
ggplot(mpg, aes(x = class)) +
  geom_bar(fill = "steelblue") +
  scale_x_discrete(limits = c("pickup", "suv", "minivan",
                              "midsize", "compact", "subcompact",
                              "2seater"),
                   labels = c("Pickup \n Truck", "Sport Utility \n Vehicle",
                              "Minivan", "Mid-size", "Compact",
                              "Subcompact", "2-Seater"))
Date axes
A date axis is modified using the scale_x_date or scale_y_date function.
Options include
- date_breaks - a string giving the distance between breaks like "2 weeks" or "10 years"
- date_labels - a string giving the formatting specification for the labels
The table below gives the formatting specifications for date values.
Symbol | Meaning | Example |
---|---|---|
%d | day as a number (01-31) | 01-31 |
%a | abbreviated weekday | Mon |
%A | unabbreviated weekday | Monday |
%m | month (01-12) | 01-12 |
%b | abbreviated month | Jan |
%B | unabbreviated month | January |
%y | 2-digit year | 07 |
%Y | 4-digit year | 2007 |
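The symbols in the table can be tried directly with base R's format function before using them in date_labels. This is a small sketch; the date and the variable name d are arbitrary choices for illustration.

```r
# a fixed example date (chosen arbitrarily)
d <- as.Date("2007-01-15")

format(d, "%d")     # day of the month as a two-digit number
format(d, "%m")     # month as a two-digit number
format(d, "%y")     # 2-digit year
format(d, "%Y")     # 4-digit year
format(d, "%b-%y")  # abbreviated month and 2-digit year (month name depends on locale)
```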
library(ggplot2)
# customize date scale on x axis
ggplot(economics, aes(x = date, y = unemploy)) +
  geom_line(color = "darkgreen") +
  scale_x_date(date_breaks = "5 years",
               date_labels = "%b-%y")
Here is a help sheet for modifying scales developed from the online help.
Colors
The default colors in ggplot2 graphs are functional, but often not as visually appealing as they can be. Happily this is easy to change.
Specific colors can be
- specified for points, lines, bars, areas, and text, or
- mapped to the levels of a variable in the dataset.
Specifying colors manually
To specify a color for points, lines, or text, use the color = "colorname" option in the appropriate geom. To specify a color for bars and areas, use the fill = "colorname" option.
Examples:
- geom_point(color = "blue")
- geom_bar(fill = "steelblue")
Colors can be specified by name or hex code.
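As a brief sketch of the hex code form: "#4682B4" is the hex code for the named color "steelblue" used above, so the two specifications are interchangeable. The variable names same_color and p_hex are illustrative.

```r
library(ggplot2)

# the named color and its hex code resolve to the same RGB values
same_color <- all(col2rgb("steelblue") == col2rgb("#4682B4"))
same_color

# specify the point color by hex code instead of by name
p_hex <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(color = "#4682B4")
p_hex
```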
To assign colors to the levels of a variable, use the scale_color_manual and scale_fill_manual functions. The former is used to specify the colors for points and lines, while the latter is used for bars and areas.
Here is an example, using the diamonds dataset that ships with ggplot2. The dataset contains the prices and attributes of nearly 54,000 round cut diamonds.
# specify fill color manually
library(ggplot2)
ggplot(diamonds, aes(x = cut, fill = clarity)) +
  geom_bar() +
  scale_fill_manual(values = c("darkred", "steelblue",
                               "darkgreen", "gold",
                               "brown", "purple",
                               "grey", "khaki4"))
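The same approach applies to points and lines through scale_color_manual. The sketch below assumes the drv variable from the mpg dataset and an arbitrary choice of three colors; naming the values by level is optional but makes the mapping explicit.

```r
library(ggplot2)

# map each drive type to a hand-picked color (colors are illustrative)
drv_colors <- c("4" = "darkred", "f" = "steelblue", "r" = "darkgreen")

p_manual <- ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point(size = 2) +
  scale_color_manual(values = drv_colors)
p_manual
```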
If you are aesthetically challenged like me, an alternative is to use a predefined palette.
Color palettes
There are many predefined color palettes available in R.
RColorBrewer
The most popular alternative palettes are probably the ColorBrewer palettes.
You can specify these palettes with the scale_color_brewer and scale_fill_brewer functions.
# use a ColorBrewer fill palette
ggplot(diamonds, aes(x = cut, fill = clarity)) +
  geom_bar() +
  scale_fill_brewer(palette = "Dark2")
Adding direction = -1 to these functions reverses the order of the colors in a palette.
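For instance, the previous graph with the palette order reversed might look like this (a sketch; the name p_rev is illustrative):

```r
library(ggplot2)

# same ColorBrewer palette, with the color order reversed
p_rev <- ggplot(diamonds, aes(x = cut, fill = clarity)) +
  geom_bar() +
  scale_fill_brewer(palette = "Dark2", direction = -1)
p_rev
```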
Viridis
The viridis palette is another popular choice.
For continuous scales use
- scale_fill_viridis_c
- scale_color_viridis_c
For discrete (categorical) scales use
- scale_fill_viridis_d
- scale_color_viridis_d
# Use a viridis fill palette
ggplot(diamonds, aes(x = cut, fill = clarity)) +
  geom_bar() +
  scale_fill_viridis_d()
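The continuous variants work the same way when color is mapped to a numeric variable. A minimal sketch, assuming city mileage (cty) from the mpg dataset as the mapped variable:

```r
library(ggplot2)

# color points by a numeric variable using the continuous viridis scale
p_viridis <- ggplot(mpg, aes(x = displ, y = hwy, color = cty)) +
  geom_point(size = 2) +
  scale_color_viridis_c()
p_viridis
```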
Other palettes
Other palettes to explore include dutchmasters, ggpomological, LaCroixColoR, nord, ochRe, palettetown, pals, rcartocolor, and wesanderson.
If you want to explore all the palette options (or nearly all), take a look at the paletter package.
To learn more about color specifications, see the R Cookbook page on ggplot2 colors. Also see the color choice advice in this book.
Points & Lines
Points
For ggplot2 graphs, the default point is a filled circle. To specify a different shape, use the shape = # option in the geom_point function. To map shapes to the levels of a categorical variable use the shape = variablename option in the aes function.
Examples:
- geom_point(shape = 1)
- geom_point(aes(shape = sex))
Available shapes are given in the table below.
Shapes 21 through 26 provide for both a fill color and a border color.
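A short sketch of a fillable shape: with shape 21 (a circle), fill sets the interior and color sets the border. The specific colors and the name p_shape are illustrative choices.

```r
library(ggplot2)

# shape 21 is a circle with separate fill and border colors
p_shape <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(shape = 21,
             fill = "steelblue",  # interior color
             color = "black",     # border color
             size = 3)
p_shape
```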
Lines
The default line type is a solid line. To change the linetype, use the linetype = # option in the geom_line function. To map linetypes to the levels of a categorical variable use the linetype = variablename option in the aes function.
Examples:
- geom_line(linetype = 1)
- geom_line(aes(linetype = sex))
Available linetypes are given in the table below.
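Linetypes can be given by number or by name. The sketch below uses the name "dashed" with the economics dataset that ships with ggplot2; the name p_line is illustrative.

```r
library(ggplot2)

# draw the unemployment series with a dashed line
p_line <- ggplot(economics, aes(x = date, y = unemploy)) +
  geom_line(linetype = "dashed")
p_line
```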
Fonts
R does not have great support for fonts, but with a bit of work, you can change the fonts that appear in your graphs. First you need to install and set up the extrafont package.
# one time install
install.packages("extrafont")
library(extrafont)
font_import()
# see what fonts are now available
fonts()
Apply the new font(s) using the text option in the theme function.
# specify new font
library(extrafont)
ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  labs(title = "Displacement by Highway Mileage",
       subtitle = "MPG dataset") +
  theme(text = element_text(size = 16, family = "Comic Sans MS"))
To learn more about customizing fonts, see Working with R, Cairo graphics, custom fonts, and ggplot.
Labels
Labels are a key ingredient in rendering a graph understandable. They are added with the labs function. Available options are given below.
option | Use |
---|---|
title | main title |
subtitle | subtitle |
caption | caption (bottom right by default) |
x | horizontal axis |
y | vertical axis |
color | color legend title |
fill | fill legend title |
size | size legend title |
linetype | linetype legend title |
shape | shape legend title |
alpha | transparency legend title |
For example
# add plot labels
ggplot(mpg, aes(x = displ, y = hwy,
                color = class,
                shape = factor(year))) +
  geom_point(size = 3, alpha = .5) +
  labs(title = "Mileage by engine displacement",
       subtitle = "Data from 1999 and 2008",
       caption = "Source: EPA (http://fueleconomy.gov)",
       x = "Engine displacement (litres)",
       y = "Highway miles per gallon",
       color = "Car Class",
       shape = "Year") +
  theme_minimal()
This is not a great graph - it is too busy, making the identification of patterns difficult. It would be better to facet the year variable, the class variable or both. Trend lines would also be helpful.
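One possible version of that suggestion is sketched below: the year variable is moved to facets and a trend line is added to each panel. The linear fit (method = "lm") is an assumption made for simplicity; other smoothers may suit the data better.

```r
library(ggplot2)

# facet by year and add a linear trend line to each panel
p_facet <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(alpha = .5) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  facet_wrap(~year) +
  labs(title = "Mileage by engine displacement",
       subtitle = "Data from 1999 and 2008",
       x = "Engine displacement (litres)",
       y = "Highway miles per gallon") +
  theme_minimal()
p_facet
```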
Annotations
Annotations are additional information added to a graph to highlight important points.
Adding text
There are two primary reasons to add text to a graph.
One is to identify the numeric qualities of a geom. For example, we may want to identify points with labels in a scatterplot, or label the heights of bars in a bar chart.
Another reason is to provide additional information. We may want to add notes about the data, point out outliers, etc.
Labeling values
Consider the following scatterplot, based on the car data in the mtcars dataset.
# basic scatterplot
data(mtcars)
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()
Let's label each point with the name of the car it represents.
# scatterplot with labels
data(mtcars)
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_text(label = row.names(mtcars))
The overlapping labels make this chart difficult to read. There is a package called ggrepel that can help us here.
# scatterplot with non-overlapping labels
data(mtcars)
library(ggrepel)
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_text_repel(label = row.names(mtcars),
                  size = 3)
Much better.
Adding labels to bar charts is covered in the aptly named labeling bars section.
Adding additional information
We can place text anywhere on a graph using the annotate function. The format is
annotate("text", x, y, label = "Some text", color = "colorname", size=textsize)
where x and y are the coordinates on which to place the text. The color and size parameters are optional.
By default, the text will be centered. Use hjust and vjust to change the alignment.
- hjust: 0 = left justified, 0.5 = centered, and 1 = right justified.
- vjust: 0 = above, 0.5 = centered, and 1 = below.
Continuing the previous example.
# scatterplot with explanatory text
data(mtcars)
library(ggrepel)
txt <- paste("The relationship between car weight",
             "and mileage appears to be roughly linear",
             sep = " \n ")
ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(color = "red") +
  geom_text_repel(label = row.names(mtcars),
                  size = 3) +
  ggplot2::annotate("text", 6, 30,
                    label = txt,
                    color = "red",
                    hjust = 1) +
  theme_bw()
See this blog post for more details.
Adding lines
Horizontal and vertical lines can be added using:
- geom_hline(yintercept = a)
- geom_vline(xintercept = b)
where a is a number on the y-axis and b is a number on the x-axis. Other options include linetype and color.
# add annotation line and text label
min_cty <- min(mpg$cty)
mean_hwy <- mean(mpg$hwy)
ggplot(mpg, aes(x = cty, y = hwy, color = drv)) +
  geom_point(size = 3) +
  geom_hline(yintercept = mean_hwy,
             color = "darkred",
             linetype = "dashed") +
  ggplot2::annotate("text", min_cty, mean_hwy + 1,
                    label = "Mean",
                    color = "darkred") +
  labs(title = "Mileage by drive type",
       x = "City miles per gallon",
       y = "Highway miles per gallon",
       color = "Drive")
We could add a vertical line for the mean city miles per gallon as well. In any case, always label annotation lines in some way. Otherwise the reader will not know what they mean.
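A sketch of the vertical-line variant follows. The label text and its placement (just right of the line, at the top of the plot) are illustrative choices rather than a prescribed layout.

```r
library(ggplot2)

mean_cty <- mean(mpg$cty)
mean_hwy <- mean(mpg$hwy)

# horizontal line at mean highway mpg, vertical line at mean city mpg,
# with a text label identifying the vertical line
p_lines <- ggplot(mpg, aes(x = cty, y = hwy, color = drv)) +
  geom_point(size = 3) +
  geom_hline(yintercept = mean_hwy,
             color = "darkred", linetype = "dashed") +
  geom_vline(xintercept = mean_cty,
             color = "darkred", linetype = "dashed") +
  annotate("text", x = mean_cty + 0.5, y = max(mpg$hwy),
           label = "Mean city mpg",
           color = "darkred", hjust = 0) +
  labs(x = "City miles per gallon",
       y = "Highway miles per gallon",
       color = "Drive")
p_lines
```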
Highlighting a single group
Sometimes you want to highlight a single group in your graph. The gghighlight function in the gghighlight package is designed for this.
Here is an example with a scatterplot.
# highlight a set of points
library(ggplot2)
library(gghighlight)
ggplot(mpg, aes(x = cty, y = hwy)) +
  geom_point(color = "red",
             size = 2) +
  gghighlight(class == "midsize")
Below is an example with a bar chart.
# highlight a single bar
library(gghighlight)
ggplot(mpg, aes(x = class)) +
  geom_bar(fill = "red") +
  gghighlight(class == "midsize")
There is nothing here that could not be done with base graphics, but it is more convenient.
Themes
ggplot2 themes control the appearance of all non-data related components of a plot. You can change the look and feel of a graph by altering the elements of its theme.
Altering theme elements
The theme function is used to modify individual components of a theme.
The parameters of the theme function are described in a cheatsheet developed from the online help.
Consider the following graph. It shows the number of male and female faculty by rank and discipline at a particular university in 2008-2009. The data come from the Salaries for Professors dataset.
# create graph
data(Salaries, package = "carData")
p <- ggplot(Salaries, aes(x = rank, fill = sex)) +
  geom_bar() +
  facet_wrap(~discipline) +
  labs(title = "Academic Rank by Gender and Discipline",
       x = "Rank",
       y = "Frequency",
       fill = "Gender")
p
Let's make some changes to the theme.
- Change label text from black to navy blue
- Change the panel background color from grey to white
- Add solid grey lines for major y-axis grid lines
- Add dashed grey lines for minor y-axis grid lines
- Eliminate x-axis grid lines
- Change the strip background color to white with a grey border
Using the cheat sheet gives us
p +
  theme(text = element_text(color = "navy"),
        panel.background = element_rect(fill = "white"),
        panel.grid.major.y = element_line(color = "grey"),
        panel.grid.minor.y = element_line(color = "grey",
                                          linetype = "dashed"),
        panel.grid.major.x = element_blank(),
        panel.grid.minor.x = element_blank(),
        strip.background = element_rect(fill = "white", color = "grey"))
Wow, this looks pretty awful, but you get the idea.
ggThemeAssist
If you would like to create your own theme using a GUI, take a look at ggThemeAssist. After you install the package, a new menu item will appear under Addins in RStudio.
Highlight the code that creates your graph, then choose the ggThemeAssist option from the Addins drop-down menu. You can change many of the features of your theme using point-and-click. When you're done, the theme code will be appended to your graph code.
Pre-packaged themes
I'm not a very good artist (just look at the last example), so I often look for pre-packaged themes that can be applied to my graphs. There are many available.
Some come with ggplot2. These include theme_classic, theme_dark, theme_gray, theme_grey, theme_light, theme_linedraw, theme_minimal, and theme_void. We've used theme_minimal often in this book. Others are available through add-on packages.
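Applying one of these built-in themes takes a single extra term. The sketch below uses a simple scatterplot saved as p_base (an illustrative name) and adds two of the themes listed above.

```r
library(ggplot2)

# a basic plot to which themes can be added
p_base <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point()

# built-in themes are added like any other plot component
p_classic <- p_base + theme_classic()
p_light   <- p_base + theme_light()

p_classic
p_light
```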
ggthemes
The ggthemes package comes with a number of themes, listed below.
Theme | Description |
---|---|
theme_base | Theme Base |
theme_calc | Theme Calc |
theme_economist | ggplot color theme based on the Economist |
theme_economist_white | ggplot color theme based on the Economist |
theme_excel | ggplot color theme based on old Excel plots |
theme_few | Theme based on Few's "Practical Rules for Using Color in Charts" |
theme_fivethirtyeight | Theme inspired by fivethirtyeight.com plots |
theme_foundation | Foundation Theme |
theme_gdocs | Theme with Google Docs Chart defaults |
theme_hc | Highcharts JS theme |
theme_igray | Inverse gray theme |
theme_map | Clean theme for maps |
theme_pander | A ggplot theme originated from the pander package |
theme_par | Theme which takes its values from the current 'base' graphics parameter values in 'par'. |
theme_solarized | ggplot color themes based on the Solarized palette |
theme_solarized_2 | ggplot color themes based on the Solarized palette |
theme_solid | Theme with nothing other than a background color |
theme_stata | Themes based on Stata graph schemes |
theme_tufte | Tufte Maximal Data, Minimal Ink Theme |
theme_wsj | Wall Street Journal theme |
To demonstrate their use, we'll first create and save a graph.
# create basic plot
library(ggplot2)
p <- ggplot(mpg, aes(x = displ, y = hwy,
                     color = class)) +
  geom_point(size = 3, alpha = .5) +
  labs(title = "Mileage by engine displacement",
       subtitle = "Data from 1999 and 2008",
       caption = "Source: EPA (http://fueleconomy.gov)",
       x = "Engine displacement (litres)",
       y = "Highway miles per gallon",
       color = "Car Class")
# display graph
p
Now let's apply some themes.
# add economist theme
library(ggthemes)
p + theme_economist()
# add fivethirtyeight theme
p + theme_fivethirtyeight()
# add wsj theme
p + theme_wsj(base_size = 8)
By default, the font size for the wsj theme is usually too large. Changing the base_size option can help.
Each theme also comes with scales for colors and fills. In the next example, both the few theme and colors are used.
# add few theme
p + theme_few() + scale_color_few()
Try out different themes and scales to find one that you like.
hrbrthemes
The hrbrthemes package is focused on typography-centric themes. The results are charts that tend to have a clean look.
Continuing the example plot from above
# add ipsum theme
library(hrbrthemes)
p + theme_ipsum()
See the hrbrthemes homepage for additional examples.
ggthemr
The ggthemr package offers a wide range of themes (17 as of this printing).
The package is not available on CRAN and must be installed from GitHub.
# one time install
install.packages("devtools")
devtools::install_github('cttobin/ggthemr')
The functions work a bit differently. Use the ggthemr("themename") function to set future graphs to a given theme. Use ggthemr_reset() to return future graphs to the ggplot2 default theme.
Current themes include flat, flat dark, camoflauge, chalk, copper, dust, earth, fresh, grape, grass, greyscale, light, lilac, pale, sea, sky, and solarized.
# set graphs to the flat dark theme
library(ggthemr)
ggthemr("flat dark")
p
I would not actually use this theme for this particular graph. It is difficult to distinguish colors. Which green represents compact cars and which represents subcompact cars?
Select a theme that best conveys the graph's information to your audience.
Source: https://rkabacoff.github.io/datavis/Customizing.html