Those plots did look promising, didn’t they? Let’s start from the beginning and go through basic plots in R. The following section is based on the wonderful book R in Action (p. 119) by Robert Kabacoff, which is listed in your course references. http://www.statmethods.net/ is the book’s quick reference.
To make it a bit more interesting, we return to the LinkedIn and Facebook view numbers. We would like to investigate their relationship. They should be loaded already. Try linkedin to get the LinkedIn views of your week.
linkedin
## [1] 16 9 13 5 2 17 14
Let’s look at the Facebook views again. Type facebook.
facebook
## [1] 17 7 5 16 8 13 14
With the function plot, each value of a vector is drawn against its index, that is, its position in the vector. Try plot(linkedin).
plot(linkedin)
That’s ok but not very pretty. Let’s produce a line plot by using the argument type in plot. With plot(linkedin, type='o', col='blue') you create a blue plot in which the points and the connecting lines are drawn over each other. Find out which other values type accepts by researching it online.
plot(linkedin, type='o', col='blue')
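The type argument controls how the values are drawn. As a minimal sketch, the following draws four common types side by side using the linkedin values printed above. The panel layout via par(mfrow) and the names plot_types and old_par are additions for illustration and not part of the lesson.

```r
# LinkedIn views as printed earlier in the lesson
linkedin <- c(16, 9, 13, 5, 2, 17, 14)

# 'p' = points, 'l' = lines, 'b' = both (with gaps), 'o' = both, overplotted
plot_types <- c("p", "l", "b", "o")

old_par <- par(mfrow = c(2, 2))   # arrange the next plots in a 2 x 2 grid
for (tp in plot_types) {
  plot(linkedin, type = tp, col = "blue",
       main = paste0("type = '", tp, "'"))
}
par(old_par)                      # restore the previous layout
```

Comparing the panels shows why the lesson uses 'o': you see both the individual days and the trend between them.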
Now type title(main='LinkedIn', col.main='red', font.main=4) to add a red main title in bold italics. Note that font.main selects the font face (1 = plain, 2 = bold, 3 = italic, 4 = bold italic), not the font size. Until we start a new plot with the plot function, R adds to the existing one. In this case, we add a title.
plot(linkedin, type='o', col='blue')
title(main='LinkedIn', col.main='red', font.main=4)
Better. Now, we would like to compare LinkedIn and Facebook views and create a graph containing both. Let’s start again with plot(linkedin, type='o', col='blue', xlab = 'Days', ylab = 'Views', xaxt = 'n'). Please note how we change the colour using col. We also defined the names of the x- and y-axes with xlab and ylab. You have already seen those earlier. Finally, xaxt = 'n' suppresses the x-axis, so no ticks or tick labels are drawn. We want to define these ourselves later.
plot(linkedin, type='o', col='blue', xlab = 'Days', ylab = 'Views', xaxt = 'n')
Let’s add the Facebook graph with lines(facebook, type='o', pch=22, lty=2, col='red'). lines is the built-in way to add a line graph to an existing plot without starting a new one. We chose red as the colour, but how do we get the square points and the dashed line? That is what pch=22 and lty=2 do. Go and look up the other values of pch and lty online, or with ?points and ?par.
plot(linkedin, type='o', col='blue', xlab = 'Days', ylab = 'Views', xaxt = 'n')
lines(facebook, type='o', pch=22, lty=2, col='red')
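One thing to keep in mind: plot fixes the axis ranges from the first series, and lines does not rescale them. With this week's numbers the Facebook views happen to fit inside the LinkedIn range, but in general a second series could be cut off. A minimal sketch of one way to guard against that, assuming the two vectors shown earlier; the name y_range is introduced here for illustration only.

```r
linkedin <- c(16, 9, 13, 5, 2, 17, 14)
facebook <- c(17, 7, 5, 16, 8, 13, 14)

# Smallest and largest value across both series
y_range <- range(linkedin, facebook)

plot(linkedin, type = "o", col = "blue", xlab = "Days", ylab = "Views",
     xaxt = "n", ylim = y_range)
# pch = 22 draws square symbols, lty = 2 a dashed line
lines(facebook, type = "o", pch = 22, lty = 2, col = "red")
```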
Add a title with title(main='LinkedIn-Facebook-Week', col.main='red', font.main=4).
plot(linkedin, type='o', col='blue', xlab = 'Days', ylab = 'Views', xaxt = 'n')
lines(facebook, type='o', pch=22, lty=2, col='red')
title(main='LinkedIn-Facebook-Week', col.main='red', font.main=4)
There are many more ways to improve this graph. You can, for instance, add better x-axis labels with axis(1, at=1:7, lab=c('Mon','Tue','Wed','Thu','Fri','Sat','Sun')). Do you understand this expression?
plot(linkedin, type='o', col='blue', xlab = 'Days', ylab = 'Views', xaxt = 'n')
lines(facebook, type='o', pch=22, lty=2, col='red')
title(main='LinkedIn-Facebook-Week', col.main='red', font.main=4)
axis(1, at=1:7, lab=c('Mon','Tue','Wed','Thu','Fri','Sat', 'Sun'))
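To unpack the axis call: the first argument is the side of the plot (1 = bottom, 2 = left, 3 = top, 4 = right), at gives the positions of the tick marks, and lab is a shortened spelling of the labels argument, which R accepts through partial argument matching. A minimal sketch with the arguments written out in full; the variable days is an illustrative addition.

```r
linkedin <- c(16, 9, 13, 5, 2, 17, 14)
days <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")

# Suppress the default x-axis, then draw our own on the bottom side
plot(linkedin, type = "o", col = "blue", xlab = "Days", ylab = "Views",
     xaxt = "n")
axis(side = 1, at = seq_along(days), labels = days)
```

Using seq_along(days) instead of 1:7 keeps the tick positions in step with the labels if the number of days ever changes.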
Finally, let us add a legend in the bottom right corner. This is a bit more complicated and the Internet is definitely your friend here. The command is legend('bottomright', inset=.05, title='', c('LinkedIn','Facebook'), fill=c('blue','red'), horiz=FALSE). Note that the colours in fill must be listed in the same order as the labels: blue for LinkedIn and red for Facebook.
plot(linkedin, type='o', col='blue', xlab = 'Days', ylab = 'Views', xaxt = 'n')
lines(facebook, type='o', pch=22, lty=2, col='red')
title(main='LinkedIn-Facebook-Week', col.main='red', font.main=4)
axis(1, at=1:7, lab=c('Mon','Tue','Wed','Thu','Fri','Sat', 'Sun'))
legend('bottomright', inset=.05, title='', c('LinkedIn','Facebook'), fill=c('blue','red'), horiz=FALSE)
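A legend can also mirror the line styles instead of showing filled boxes. The following is a sketch of that variant, not the lesson's own command: it passes col, lty and pch to legend so that each entry looks like the line it describes. The named vector series_col is a hypothetical helper introduced here to keep labels and colours together.

```r
linkedin <- c(16, 9, 13, 5, 2, 17, 14)
facebook <- c(17, 7, 5, 16, 8, 13, 14)

# Label -> colour, in the order the series are drawn
series_col <- c(LinkedIn = "blue", Facebook = "red")

plot(linkedin, type = "o", col = series_col[["LinkedIn"]],
     xlab = "Days", ylab = "Views", xaxt = "n")
lines(facebook, type = "o", pch = 22, lty = 2, col = series_col[["Facebook"]])

# pch = 1 (open circle) is the default symbol used by plot
legend("bottomright", inset = 0.05, legend = names(series_col),
       col = series_col, lty = c(1, 2), pch = c(1, 22))
```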
Much better. You could add this graph already to your presentations. It looks good enough. There are, however, a million ways to improve this even further in R. If you are interested, just search the web for all the fantastic visualisations people have created with R. But we will move on to look at how visualisations can be used with a data frame. Remember, data frames are the workhorses of R, which we use in almost all our data analysis tasks.
First let’s create a simple data frame with views <- data.frame(linkedin, facebook).
views <- data.frame(linkedin, facebook)
Now, let’s create a simple barplot of facebook views with barplot(views$facebook). Do you remember what the $ operator does?
barplot(views$facebook)
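As a reminder, $ extracts a single column from a data frame by name. A small sketch, assuming the two vectors from earlier in the lesson, showing that views$facebook is the same vector you would get with double square brackets:

```r
linkedin <- c(16, 9, 13, 5, 2, 17, 14)
facebook <- c(17, 7, 5, 16, 8, 13, 14)
views <- data.frame(linkedin, facebook)

# Two equivalent ways to pull out one column as a plain vector
by_dollar  <- views$facebook
by_bracket <- views[["facebook"]]

identical(by_dollar, by_bracket)
```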
And an advanced version with barplot(views$facebook, main='Facebook', xlab='Days', ylab='Total', names.arg=c('Mon','Tue','Wed','Thu','Fri','Sat','Sun'), border='blue', density=c(50,20,10,50,20,35,35)). This expression is quite something. Try to figure out what it does with the help of your new best friend, the Internet.
barplot(views$facebook, main='Facebook', xlab='Days',ylab='Total', names.arg=c('Mon','Tue','Wed','Thu','Fri','Sat','Sun'), border='blue', density=c(50,20,10,50,20,35,35))
Finally, a stacked barplot is produced with barplot(t(views), main='Views', ylab='Total', col=heat.colors(2), space=0.1, cex.axis=0.8, las=1, names.arg=c('Mon','Tue','Wed','Thu','Fri','Sat','Sun'), cex=0.8). I would recommend taking a piece of paper and researching each argument online in order to find out exactly what it does. Talk to your neighbour in class about it, too. This is already advanced R work. You do not need to understand all the internal workings of this expression yet, but try your best. I am just trying to give you an impression of the limitless opportunities of plots in R.
barplot(t(views), main='Views', ylab='Total', col=heat.colors(2), space=0.1, cex.axis=0.8, las=1, names.arg=c('Mon','Tue','Wed','Thu','Fri','Sat','Sun'), cex=0.8)
Lastly, add a legend with legend(2,30, names(views), cex=0.8, fill=heat.colors(2)).
barplot(t(views), main='Views', ylab='Total', col=heat.colors(2), space=0.1, cex.axis=0.8, las=1, names.arg=c('Mon','Tue','Wed','Thu','Fri','Sat','Sun'), cex=0.8)
legend(2,30, names(views), cex=0.8, fill=heat.colors(2))
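The key step in the stacked barplot is t(views). It transposes the data frame into a matrix with one row per network and one column per day, and barplot stacks the rows of each column on top of each other. A minimal sketch to inspect that matrix, followed by a grouped (side-by-side) variant using beside = TRUE; the names m and days are introduced for illustration.

```r
linkedin <- c(16, 9, 13, 5, 2, 17, 14)
facebook <- c(17, 7, 5, 16, 8, 13, 14)
views <- data.frame(linkedin, facebook)
days <- c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")

m <- t(views)   # 2 x 7 matrix: rows = networks, columns = days
dim(m)
colSums(m)      # total height of each stacked bar

# Grouped bars instead of stacked ones; legend.text adds a legend
# built from the row names of the matrix
barplot(m, beside = TRUE, main = "Views", ylab = "Total",
        col = heat.colors(2), names.arg = days, las = 1,
        legend.text = rownames(m))
```

Stacked bars emphasise the daily total, while grouped bars make it easier to compare the two networks on the same day.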
Finally, in order to demonstrate the great power of R visualisations as long as you have the data, let us take a quick look at geographical maps. We will use a set of geo-coordinates expressed as longitude and latitude to map data on a map of the USA.
A dataset of Catholic dioceses in the USA, called dioceses, is pre-loaded. Check it out with head(dioceses). We downloaded it from yet another repository of R data with lots of social and historical datasets. Take a look at https://ropensci.org/.
head(dioceses)
##   X                    diocese   date.erected date.metropolitan  rite
## 1 1        Baltimore, Maryland  April 6, 1789     April 8, 1808 Latin
## 2 2     New Orleans, Louisiana April 25, 1793     July 19, 1850 Latin
## 3 3      Boston, Massachusetts  April 8, 1808 February 12, 1875 Latin
## 4 4       Louisville, Kentucky  April 8, 1808 December 10, 1937 Latin
## 5 5         New York, New York  April 8, 1808     July 19, 1850 Latin
## 6 6 Philadelphia, Pennsylvania  April 8, 1808 February 12, 1875 Latin
##     geo.lon  geo.lat
## 1 -76.61219 39.29038
## 2 -90.07153 29.95107
## 3 -71.05977 42.35843
## 4 -85.75846 38.25266
## 5 -74.00597 40.71435
## 6 -75.16379 39.95233
Let’s look for the geo-coordinates with str(dioceses). The corresponding columns are called geo.lon and geo.lat. Fairly obvious and easy to find! In other cases, you might have to try harder if the columns are not well described.
str(dioceses)
## 'data.frame': 355 obs. of 7 variables:
## $ X : int 1 2 3 4 5 6 7 8 9 10 ...
## $ diocese : Factor w/ 341 levels "Acapulco, Guerrero",..: 22 183 31 156 186 211 231 41 50 282 ...
## $ date.erected : Factor w/ 277 levels " April 20, 1977",..: 25 17 27 27 27 27 154 125 155 136 ...
## $ date.metropolitan: Factor w/ 57 levels "","April 30, 1960",..: 3 30 17 10 30 17 1 1 30 31 ...
## $ rite : Factor w/ 6 levels "","Antiochian",..: 6 6 6 6 6 6 6 6 6 6 ...
## $ geo.lon : num -76.6 -90.1 -71.1 -85.8 -74 ...
## $ geo.lat : num 39.3 30 42.4 38.3 40.7 ...
We will produce a simple map, for which we need the package maps. The library command gives you access to all of a package’s R functions and data. The maps package was installed when you started the lesson, so you can just type library(maps).
library(maps)
Next, we need to draw the template of a map of the USA with map('state').
map('state')
Finally, the magic. We just need to give R the coordinates, and it can plot them on the map of the USA by itself. Use points(dioceses$geo.lon, dioceses$geo.lat, col = 'red', pch = 19) to create a red point for each diocese in the USA.
map('state')
points(dioceses$geo.lon, dioceses$geo.lat, col = 'red', pch = 19)
Let’s make this a bit more informative with a title and the names (text) of the dioceses. Type title('Catholic Dioceses').
map('state')
points(dioceses$geo.lon, dioceses$geo.lat, col = 'red', pch = 19)
title('Catholic Dioceses')
Add the names of the dioceses with text(dioceses$geo.lon, dioceses$geo.lat+1, labels = dioceses$diocese, cex = 0.7). Adding 1 to the latitude places each name slightly above its point. This is more for demonstration purposes, as there are too many names to plot on the map. If you really intend to add names, it is a good idea to filter the data down to a few dioceses first.
map('state')
points(dioceses$geo.lon, dioceses$geo.lat, col = 'red', pch = 19)
title('Catholic Dioceses')
text(dioceses$geo.lon, dioceses$geo.lat+1, labels = dioceses$diocese, cex = 0.7)
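Filtering before labelling could look like the following sketch. It is built on assumptions: dioceses_small holds only the six rows printed by head(dioceses) above rather than the full dataset, and the latitude cut-off of 40 degrees is a hypothetical filter chosen purely for illustration. The map is only drawn if the maps package is available.

```r
# Six rows copied from the head(dioceses) output shown earlier;
# the lesson's full dataset has 355 rows
dioceses_small <- data.frame(
  diocese = c("Baltimore, Maryland", "New Orleans, Louisiana",
              "Boston, Massachusetts", "Louisville, Kentucky",
              "New York, New York", "Philadelphia, Pennsylvania"),
  geo.lon = c(-76.61219, -90.07153, -71.05977, -85.75846, -74.00597, -75.16379),
  geo.lat = c(39.29038, 29.95107, 42.35843, 38.25266, 40.71435, 39.95233)
)

# Hypothetical filter: label only the dioceses north of 40 degrees latitude
to_label <- subset(dioceses_small, geo.lat > 40)

if (requireNamespace("maps", quietly = TRUE)) {
  maps::map("state")
  # All points are drawn, but only the filtered ones get a name
  points(dioceses_small$geo.lon, dioceses_small$geo.lat, col = "red", pch = 19)
  text(to_label$geo.lon, to_label$geo.lat + 1,
       labels = to_label$diocese, cex = 0.7)
}
```

With the real data you would replace the condition in subset with whatever criterion matters for your question, for example a particular rite.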
Let’s check what we have learned next.
Once you have loaded the library maps, what is the order to plot points on a map?
Map then points
Plot the facebook data with type 'o' and the colour 'blue'. Type in the answer.
plot(facebook, type='o', col='blue')
Plot a barplot of the linkedin views.
barplot(views$linkedin)
Finally, work together in your group through the many possibilities of plotting in R at http://www.statmethods.net/graphs/creating.html.
That’s it for the introduction to basic R. This should be only the beginning, though. We have seen the power of R, but this was only a small snapshot. I hope you are sufficiently motivated to follow up on what you have learned so far and to look at some of the books and online resources of the course. Also, don’t forget the wonderful online TryR course and datacamp.com, or look up further SWIRL resources under https://github.com/swirldev/swirl_courses#swirl-courses.