Introduction

R is an interpreted programming language for statistical computing and graphics, supported by the R Foundation. It is widely used by statisticians and data miners for developing statistical software and for data analysis. This page shows how to run R in Jupyter Notebook to improve documentation, reproducibility, and sharing of results. Python, Jupyter Notebook, and R should already be installed.

Run R on Jupyter Notebook

R distributions for Linux, macOS, and Windows are available from the Comprehensive R Archive Network (CRAN). Download the version for your operating system and install it on your machine.

Install R kernel for Jupyter Notebook

Jupyter Notebook uses kernels: processes that run interactive code in a particular programming language and return the output to the user. IRkernel is an R kernel for Jupyter Notebook.

The IRkernel package can be installed by running the following command in an R console:

install.packages('IRkernel')

Make the Kernel Available to Jupyter Notebook

Next, make Jupyter aware of the newly installed R kernel by installing a kernel spec. To install it system-wide, set user to FALSE in the installspec command:

IRkernel::installspec(user = FALSE)
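
For a single-user machine, the kernel spec can instead be installed for the current user only. The following is a minimal sketch: the helper name register_r_kernel is hypothetical (it is not part of IRkernel), and it assumes that both the IRkernel package and Jupyter are already installed.

```r
# Hypothetical helper: register the R kernel with Jupyter.
register_r_kernel <- function(system_wide = FALSE) {
  if (!requireNamespace("IRkernel", quietly = TRUE)) {
    stop("IRkernel is not installed; run install.packages('IRkernel') first")
  }
  # user = TRUE installs for the current user; user = FALSE installs system-wide
  IRkernel::installspec(user = !system_wide)
}

# Usage (not run here, because it needs a working Jupyter installation):
# register_r_kernel()                    # current user only
# register_r_kernel(system_wide = TRUE)  # same as IRkernel::installspec(user = FALSE)
```

A system-wide install usually needs administrator rights, so the per-user form is often the easier starting point.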

Launch Jupyter Notebook with R

Jupyter Notebook now supports both Python 3 and R. Type jupyter notebook in the Anaconda console and press Enter to open an empty workspace. Then create a new R notebook by clicking the New button and selecting R.

Install R Packages and Load in Jupyter Notebook

R packages should be installed from an R console. For example, to install "ggplot2", type:

install.packages("ggplot2")

Then you can load the package in Jupyter Notebook and plot an embedded image.
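
When a notebook is shared, it can help to install a package only if it is missing. A minimal sketch follows; the helper name ensure_package is hypothetical and is not part of R or ggplot2.

```r
# Hypothetical helper: install a package only when it is not already available.
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # Report whether the package can now be loaded
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# 'stats' ships with R, so nothing is downloaded in this demonstration
ok <- ensure_package("stats")
print(ok)
```

Calling ensure_package("ggplot2") in the same way would install ggplot2 on first use and do nothing afterwards.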

Add a Table of Contents in Jupyter Notebook

First, install jupyter_nbextensions_configurator by typing this command in the Anaconda console and pressing Enter:

pip install jupyter_nbextensions_configurator

Then, type jupyter notebook in the Anaconda console.

Finally, click the Table of Contents icon to generate the table of contents.

The table of contents can also be placed in the first cell. First click Contents at the top left (Step 1), then tick Add notebook TOC cell (Step 2).

Orientation to R

In [1]:
print("Hello World", quote=FALSE)
[1] Hello World
In [2]:
a<- 1  # a=1
b<- 2  # b=2
In [3]:
a+b
3
In [4]:
round(3.1415)
3
In [5]:
factorial(3)  # 3!=3*2*1
6
In [6]:
sqrt(14)  # square root
3.74165738677394
In [7]:
factorial (round(2.0015)+1)
6
In [8]:
log10(1000)+log10(100)
5
In [9]:
pi
3.14159265358979
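
The round() call above uses its default of zero decimal places. As a small illustration (the value of x is chosen here only for demonstration), round() also accepts a digits argument, and signif() rounds to a number of significant digits instead:

```r
x <- 3.14159

r0 <- round(x)               # zero decimal places (the default)
r2 <- round(x, digits = 2)   # two decimal places
s3 <- signif(x, digits = 3)  # three significant digits

print(c(r0, r2, s3))
```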

Data Structures

In [10]:
# WorldPhones is a data set built into R
WorldPhones  # Number of telephones (in thousands) by region and year
A matrix: 7 × 7 of type dbl
      N.Amer  Europe  Asia  S.Amer  Oceania  Africa  Mid.Amer
1951   45939   21574  2876    1815     1646      89       555
1956   60423   29990  4708    2568     2366    1411       733
1957   64721   32510  5230    2695     2526    1546       773
1958   68484   35218  6662    2845     2691    1663       836
1959   71799   37598  6856    3000     2868    1769       911
1960   76036   40341  8220    3145     3054    1905      1008
1961   79831   43173  9053    3338     3224    2005      1076

Vectors (single dimension)

In [11]:
vec <- c(1,2,3,10,100) # c() combines values into a vector
vec
  1. 1
  2. 2
  3. 3
  4. 10
  5. 100
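
Once a vector exists, individual elements can be picked out by position or by condition. The sketch below reuses the same values as vec; the names first, middle, large, and n are introduced here only for illustration.

```r
vec <- c(1, 2, 3, 10, 100)

first  <- vec[1]        # R indexes from 1, so this is the first element
middle <- vec[2:4]      # elements 2 through 4
large  <- vec[vec > 5]  # logical indexing keeps elements greater than 5
n      <- length(vec)   # number of elements

print(large)
```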

Matrix (two dimensions)

In [12]:
mat <- matrix (c(1,2,3,4,5,6),nrow=2)
mat
A matrix: 2 × 3 of type dbl
1  3  5
2  4  6
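
The output above shows that matrix() fills values column by column. A short sketch of the alternative, filling row by row with byrow = TRUE, together with basic indexing (the names by_col and by_row are illustrative):

```r
by_col <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2)                # default: fill column by column
by_row <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, byrow = TRUE)  # fill row by row

dim(by_col)    # number of rows and columns
by_col[2, 3]   # element in row 2, column 3
by_row[2, ]    # the whole second row
```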

Math: Element-wise Operations

In [13]:
vec+4
  1. 5
  2. 6
  3. 7
  4. 14
  5. 104
In [14]:
vec*4
  1. 4
  2. 8
  3. 12
  4. 40
  5. 400
In [15]:
vec*vec
  1. 1
  2. 4
  3. 9
  4. 100
  5. 10000
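
Element-wise arithmetic relies on R recycling the shorter operand. The sketch below illustrates this with made-up vectors (long and short are names chosen for this example):

```r
vec <- c(1, 2, 3, 10, 100)

# A length-one value is recycled across the whole vector
plus_four <- vec + 4

# A shorter vector is recycled too: c(10, 20) is reused as 10, 20, 10, 20
long     <- c(1, 2, 3, 4)
short    <- c(10, 20)
recycled <- long + short

print(recycled)
```

If the longer length is not a multiple of the shorter one, R still recycles but issues a warning.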

Matrix multiplication

In [16]:
vec %*% vec # inner multiplication
A matrix: 1 × 1 of type dbl
10114
In [17]:
vec %o% vec  # outer multiplication
A matrix: 5 × 5 of type dbl
  1    2    3    10    100
  2    4    6    20    200
  3    6    9    30    300
 10   20   30   100   1000
100  200  300  1000  10000
In [18]:
# Transpose a matrix with t(); the original matrix is shown first
mat
A matrix: 2 × 3 of type dbl
1  3  5
2  4  6
In [19]:
t(mat)
A matrix: 3 × 2 of type dbl
1  2
3  4
5  6
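
Transposition and %*% combine naturally: a 2 × 3 matrix can be multiplied by its 3 × 2 transpose. A minimal sketch using the same mat (the names prod_22 and prod_33 are illustrative):

```r
mat <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2)

# (2 x 3) %*% (3 x 2) gives a 2 x 2 matrix
prod_22 <- mat %*% t(mat)

# (3 x 2) %*% (2 x 3) gives a 3 x 3 matrix
prod_33 <- t(mat) %*% mat

dim(prod_22)
dim(prod_33)
```

The inner dimensions must agree, otherwise %*% stops with a "non-conformable arguments" error.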

Array (n-dimensional)

In [20]:
array(c(1,2,3,4,5,6), dim=c(2,2,3))
  1. 1
  2. 2
  3. 3
  4. 4
  5. 5
  6. 6
  7. 1
  8. 2
  9. 3
  10. 4
  11. 5
  12. 6
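
The flat listing above hides the shape of the array. The six input values are recycled to fill the 2 × 2 × 3 = 12 cells, which is why 1 to 6 appears twice. A short sketch of inspecting the array by dimension (arr is a name introduced here):

```r
arr <- array(c(1, 2, 3, 4, 5, 6), dim = c(2, 2, 3))

dim(arr)       # 2 rows, 2 columns, 3 layers
arr[, , 1]     # first layer: a 2 x 2 matrix
arr[1, 2, 2]   # row 1, column 2 of the second layer
```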

Data Types

In [21]:
# Numeric
class(0.00001) # class() returns the class (type) of an object
'numeric'
In [22]:
# Character
class('hello')
'character'
In [23]:
nchar('hello')
5
In [24]:
paste ('Hi','There')
'Hi There'
In [25]:
# Logical
class(TRUE)
'logical'
In [26]:
class(T)
'logical'
In [27]:
# factor or categorical data
In [28]:
fac <- factor(c('a','b','c'))
fac
  1. a
  2. b
  3. c
Levels:
  1. 'a'
  2. 'b'
  3. 'c'
In [29]:
class (fac)
'factor'
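
A factor stores each value as an integer code that points into its set of levels. The sketch below uses made-up categories (sizes is an illustrative name) and sets the level order explicitly:

```r
sizes <- factor(c("low", "high", "low", "medium"),
                levels = c("low", "medium", "high"))

levels(sizes)       # the categories, in the order given above
as.integer(sizes)   # the integer code stored for each value
table(sizes)        # how many times each level occurs
```

Without the levels argument, factor() orders the levels alphabetically.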

Lists and Data Frames

Lists

A list is a one-dimensional group of R objects, which may be of different types. Create lists with list():

In [30]:
lst <- list(1,'R', TRUE)
lst
  1. 1
  2. 'R'
  3. TRUE
In [31]:
class(lst)
'list'
In [32]:
list(c(1,2),TRUE,c('a','b','c'))
    1. 1
    2. 2
  1. TRUE
    1. 'a'
    2. 'b'
    3. 'c'
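
List elements can be named and then retrieved by name or by position. A minimal sketch with made-up contents (item and its element names are illustrative):

```r
item <- list(id = 1, label = "R", flag = TRUE)

item$label      # access by name with $
item[["id"]]    # access by name with [[ ]]
item[[3]]       # access by position
names(item)     # all element names
```

Single brackets, as in item[1], return a shorter list rather than the element itself.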

Data Frame

In [33]:
# Each column can be a different data type
df <- data.frame(c(1,2,3),c('R','S','T'),c(TRUE,FALSE,TRUE))
df
A data.frame: 3 × 3
c.1..2..3.  c..R....S....T..  c.TRUE..FALSE..TRUE.
<dbl>       <chr>             <lgl>
1           R                 TRUE
2           S                 FALSE
3           T                 TRUE
In [34]:
class(df)
'data.frame'
In [35]:
nvec <- c(one=1, two=2, three=3)
nvec
one    1
two    2
three  3
In [36]:
nvec+1
one    2
two    3
three  4
In [37]:
ndf <- data.frame(numbers=c(1,2,3),letters=c('R','S','T'),logic=c(TRUE,FALSE,TRUE))
ndf
A data.frame: 3 × 3
numbers  letters  logic
<dbl>    <chr>    <lgl>
1        R        TRUE
2        S        FALSE
3        T        TRUE
In [38]:
names(ndf)
  1. 'numbers'
  2. 'letters'
  3. 'logic'
In [39]:
names(nvec) <- c('uno','dos','tres')
nvec
uno   1
dos   2
tres  3
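
Named columns make it easy to pull out parts of a data frame. The following sketch rebuilds ndf from the cell above and shows a few common ways to subset it:

```r
ndf <- data.frame(numbers = c(1, 2, 3),
                  letters = c("R", "S", "T"),
                  logic   = c(TRUE, FALSE, TRUE))

ndf$numbers        # one column as a vector
ndf[2, ]           # the second row
ndf[ndf$logic, ]   # only the rows where logic is TRUE
nrow(ndf)          # number of rows
ncol(ndf)          # number of columns
```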

Visualizing Data

Scatterplot

In [40]:
# First install ggplot2 (uncomment the next line if it is not installed yet)
#install.packages("ggplot2")
In [41]:
library('ggplot2')
diamonds[1:5,] # diamonds data set comes with ggplot2 package
A tibble: 5 × 10
carat  cut      color  clarity  depth  table  price  x      y      z
<dbl>  <ord>    <ord>  <ord>    <dbl>  <dbl>  <int>  <dbl>  <dbl>  <dbl>
0.23   Ideal    E      SI2      61.5   55     326    3.95   3.98   2.43
0.21   Premium  E      SI1      59.8   61     326    3.89   3.84   2.31
0.23   Good     E      VS1      56.9   65     327    4.05   4.07   2.31
0.29   Premium  I      VS2      62.4   58     334    4.20   4.23   2.63
0.31   Good     J      SI2      63.3   58     335    4.34   4.35   2.75
In [42]:
qplot(x,y,data=diamonds,color=price)
Warning message:
"`qplot()` was deprecated in ggplot2 3.4.0."
In [43]:
?mpg
In [44]:
View(mpg)
A tibble: 234 × 11
manufacturer  model               displ  year   cyl    trans       drv    cty    hwy    fl     class
<chr>         <chr>               <dbl>  <int>  <int>  <chr>       <chr>  <int>  <int>  <chr>  <chr>
audi          a4                  1.8    1999   4      auto(l5)    f      18     29     p      compact
audi          a4                  1.8    1999   4      manual(m5)  f      21     29     p      compact
audi          a4                  2.0    2008   4      manual(m6)  f      20     31     p      compact
audi          a4                  2.0    2008   4      auto(av)    f      21     30     p      compact
audi          a4                  2.8    1999   6      auto(l5)    f      16     26     p      compact
audi          a4                  2.8    1999   6      manual(m5)  f      18     26     p      compact
audi          a4                  3.1    2008   6      auto(av)    f      18     27     p      compact
audi          a4 quattro          1.8    1999   4      manual(m5)  4      18     26     p      compact
audi          a4 quattro          1.8    1999   4      auto(l5)    4      16     25     p      compact
audi          a4 quattro          2.0    2008   4      manual(m6)  4      20     28     p      compact
audi          a4 quattro          2.0    2008   4      auto(s6)    4      19     27     p      compact
audi          a4 quattro          2.8    1999   6      auto(l5)    4      15     25     p      compact
audi          a4 quattro          2.8    1999   6      manual(m5)  4      17     25     p      compact
audi          a4 quattro          3.1    2008   6      auto(s6)    4      17     25     p      compact
audi          a4 quattro          3.1    2008   6      manual(m6)  4      15     25     p      compact
audi          a6 quattro          2.8    1999   6      auto(l5)    4      15     24     p      midsize
audi          a6 quattro          3.1    2008   6      auto(s6)    4      17     25     p      midsize
audi          a6 quattro          4.2    2008   8      auto(s6)    4      16     23     p      midsize
chevrolet     c1500 suburban 2wd  5.3    2008   8      auto(l4)    r      14     20     r      suv
chevrolet     c1500 suburban 2wd  5.3    2008   8      auto(l4)    r      11     15     e      suv
chevrolet     c1500 suburban 2wd  5.3    2008   8      auto(l4)    r      14     20     r      suv
chevrolet     c1500 suburban 2wd  5.7    1999   8      auto(l4)    r      13     17     r      suv
chevrolet     c1500 suburban 2wd  6.0    2008   8      auto(l4)    r      12     17     r      suv
chevrolet     corvette            5.7    1999   8      manual(m6)  r      16     26     p      2seater
chevrolet     corvette            5.7    1999   8      auto(l4)    r      15     23     p      2seater
chevrolet     corvette            6.2    2008   8      manual(m6)  r      16     26     p      2seater
chevrolet     corvette            6.2    2008   8      auto(s6)    r      15     25     p      2seater
chevrolet     corvette            7.0    2008   8      manual(m6)  r      15     24     p      2seater
chevrolet     k1500 tahoe 4wd     5.3    2008   8      auto(l4)    4      14     19     r      suv
chevrolet     k1500 tahoe 4wd     5.3    2008   8      auto(l4)    4      11     14     e      suv
⋮             ⋮                   ⋮      ⋮      ⋮      ⋮           ⋮      ⋮      ⋮      ⋮      ⋮
toyota        toyota tacoma 4wd   3.4    1999   6      auto(l4)    4      15     19     r      pickup
toyota        toyota tacoma 4wd   4.0    2008   6      manual(m6)  4      15     18     r      pickup
toyota        toyota tacoma 4wd   4.0    2008   6      auto(l5)    4      16     20     r      pickup
volkswagen    gti                 2.0    1999   4      manual(m5)  f      21     29     r      compact
volkswagen    gti                 2.0    1999   4      auto(l4)    f      19     26     r      compact
volkswagen    gti                 2.0    2008   4      manual(m6)  f      21     29     p      compact
volkswagen    gti                 2.0    2008   4      auto(s6)    f      22     29     p      compact
volkswagen    gti                 2.8    1999   6      manual(m5)  f      17     24     r      compact
volkswagen    jetta               1.9    1999   4      manual(m5)  f      33     44     d      compact
volkswagen    jetta               2.0    1999   4      manual(m5)  f      21     29     r      compact
volkswagen    jetta               2.0    1999   4      auto(l4)    f      19     26     r      compact
volkswagen    jetta               2.0    2008   4      auto(s6)    f      22     29     p      compact
volkswagen    jetta               2.0    2008   4      manual(m6)  f      21     29     p      compact
volkswagen    jetta               2.5    2008   5      auto(s6)    f      21     29     r      compact
volkswagen    jetta               2.5    2008   5      manual(m5)  f      21     29     r      compact
volkswagen    jetta               2.8    1999   6      auto(l4)    f      16     23     r      compact
volkswagen    jetta               2.8    1999   6      manual(m5)  f      17     24     r      compact
volkswagen    new beetle          1.9    1999   4      manual(m5)  f      35     44     d      subcompact
volkswagen    new beetle          1.9    1999   4      auto(l4)    f      29     41     d      subcompact
volkswagen    new beetle          2.0    1999   4      manual(m5)  f      21     29     r      subcompact
volkswagen    new beetle          2.0    1999   4      auto(l4)    f      19     26     r      subcompact
volkswagen    new beetle          2.5    2008   5      manual(m5)  f      20     28     r      subcompact
volkswagen    new beetle          2.5    2008   5      auto(s6)    f      20     29     r      subcompact
volkswagen    passat              1.8    1999   4      manual(m5)  f      21     29     p      midsize
volkswagen    passat              1.8    1999   4      auto(l5)    f      18     29     p      midsize
volkswagen    passat              2.0    2008   4      auto(s6)    f      19     28     p      midsize
volkswagen    passat              2.0    2008   4      manual(m6)  f      21     29     p      midsize
volkswagen    passat              2.8    1999   6      auto(l5)    f      16     26     p      midsize
volkswagen    passat              2.8    1999   6      manual(m5)  f      18     26     p      midsize
volkswagen    passat              3.6    2008   6      auto(s6)    f      17     26     p      midsize
In [45]:
qplot(displ,hwy,data=mpg)
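
Because qplot() is reported as deprecated since ggplot2 3.4.0 (see the warning earlier in this section), the same scatterplot can also be written with ggplot(). This is a sketch of one equivalent form, assuming ggplot2 is installed; the variable name p is illustrative.

```r
library(ggplot2)

# Same scatterplot as qplot(displ, hwy, data = mpg):
# the data and aesthetic mappings go into ggplot(), the geom is added as a layer
p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point()

p
```

Further qplot() arguments such as color = class would move inside aes() in this form.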

Aesthetics

In [46]:
qplot(displ,hwy,data=mpg,color=class)
In [47]:
qplot(displ,hwy,data=mpg,shape=class)
Warning message:
"The shape palette can deal with a maximum of 6 discrete values because
more than 6 becomes difficult to discriminate; you have 7. Consider
specifying shapes manually if you must have them."
Warning message:
"Removed 62 rows containing missing values (`geom_point()`)."

Facetting

Faceting splits a plot into smaller plots that each display a different subset of the data. It is useful for exploring conditional relationships and for large data sets.

In [48]:
qplot(displ,hwy,data=mpg)+facet_grid(. ~ cyl)
In [49]:
qplot(displ,hwy,data=mpg)+facet_grid(drv ~ .)
In [50]:
qplot(displ,hwy,data=mpg)+facet_grid(drv ~ cyl)
In [51]:
qplot(displ,hwy,data=mpg)+facet_grid( ~ class)
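
With seven classes, a single row of panels becomes narrow. One possible alternative, sketched here with ggplot() and assuming ggplot2 is installed, is facet_wrap(), which wraps the panels of one variable over several rows (ncol = 4 is an arbitrary choice for this example):

```r
library(ggplot2)

# facet_grid() lays panels out in a strict row/column grid;
# facet_wrap() wraps the panels of one variable over several rows
p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  facet_wrap(~ class, ncol = 4)

p
```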

Geoms

In [52]:
qplot(displ,hwy,data=mpg,geom=c("point","smooth"))
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'
In [53]:
qplot(class,hwy,data=mpg,geom='boxplot')
In [54]:
qplot(reorder(class,hwy),hwy,data=mpg,geom='boxplot') # Reorder is based on mean
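
reorder() sorts the levels by the mean of the second argument unless another function is supplied through FUN. The sketch below uses toy data, made up for illustration, in which the mean and the median give different orders:

```r
# Toy data, made up for illustration
group <- c("a", "a", "a", "b", "b", "b", "c", "c", "c")
value <- c(1, 2, 100, 10, 11, 12, 5, 6, 7)

by_mean   <- reorder(group, value)                # default: FUN = mean
by_median <- reorder(group, value, FUN = median)  # order by median instead

levels(by_mean)
levels(by_median)
```

For boxplots, ordering by the median often matches the line drawn inside each box more closely than ordering by the mean.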

Position Adjustments

Histogram

In [55]:
qplot(x,data=diamonds)
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Bar Chart

In [56]:
qplot(cut,data=diamonds,geom='bar',fill=cut)
In [57]:
qplot(color,data=diamonds,geom='bar',fill=cut)

Visualizing Distributions

In [58]:
qplot(carat,data=diamonds,binwidth=0.1)

Adding Zoom

In [59]:
zoom <- coord_cartesian(xlim=c(50,70))
In [60]:
qplot(depth,data=diamonds,binwidth=0.2)+zoom
In [61]:
qplot(depth,data=diamonds,binwidth=0.2,fill=cut)+zoom
In [62]:
qplot(depth,data=diamonds,geom="freqpoly", color=cut)+zoom
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
In [63]:
qplot(depth,data=diamonds,geom="density", color=cut)+zoom
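
coord_cartesian() zooms the view without discarding data, whereas setting scale limits with xlim() removes rows outside the range before statistics such as bins are computed. A minimal sketch contrasting the two, assuming ggplot2 is installed (the names base, zoomed, and clipped are illustrative):

```r
library(ggplot2)

base <- ggplot(diamonds, aes(x = depth)) +
  geom_histogram(binwidth = 0.2)

# coord_cartesian() only changes the visible window; all rows are still binned
zoomed <- base + coord_cartesian(xlim = c(50, 70))

# xlim() sets scale limits; rows outside 50 to 70 are dropped before binning
clipped <- base + xlim(50, 70)
```

Printing clipped typically produces a warning about removed rows, which is a quick way to tell the two approaches apart.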

Visualizing Big Data

In [64]:
# geom='hex' needs the hexbin package; run install.packages('hexbin') first
qplot(carat,price,data=diamonds,geom='hex')
Warning message:
"Computation failed in `stat_binhex()`
Caused by error in `compute_group()`:
! The package `hexbin` is required for `stat_binhex()`"
In [65]:
qplot(carat,price,data=diamonds,geom='density2d')
In [66]:
qplot(carat,price,data=diamonds,geom=c('point','density2d'))
In [67]:
qplot(carat,price,data=diamonds,geom='smooth')
`geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'
In [68]:
qplot(carat,price,data=diamonds,geom='smooth',color=cut)
`geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'
In [69]:
qplot(carat,price,data=diamonds,geom='smooth',group=cut)
`geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'
In [70]:
qplot(carat,price,data=diamonds,geom='smooth',color=cut,se=FALSE)
`geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'
In [71]:
qplot(carat,price,data=diamonds,geom='smooth',color=cut,method=lm)
`geom_smooth()` using formula = 'y ~ x'
In [72]:
qplot(carat,price,data=diamonds,color='blue')  # 'blue' is mapped as a data value, so the points are not drawn in blue
In [73]:
qplot(carat,price,data=diamonds,color=I('blue'))  # I() sets the literal colour blue
In [74]:
qplot(carat,price,data=diamonds,size=I(0.5), alpha=I(0.1))
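
In ggplot() syntax, the difference between mapping and setting an aesthetic depends on whether the value sits inside aes(). The following sketch assumes ggplot2 is installed; mapped and fixed are illustrative names.

```r
library(ggplot2)

# Mapping: "blue" inside aes() is treated as a one-level variable,
# so ggplot2 picks a colour from its default palette and adds a legend
mapped <- ggplot(diamonds, aes(x = carat, y = price, color = "blue")) +
  geom_point()

# Setting: outside aes() the values are used literally
fixed <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(color = "blue", size = 0.5, alpha = 0.1)
```

The second form corresponds to wrapping the values in I() when using qplot().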

Saving Graphs

In [75]:
# get working directory
#getwd()
In [76]:
# Save the last plot displayed; width and height are given in inches
ggsave("my-plot.pdf", width=6, height=6)
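
ggsave() saves the most recently displayed plot by default. To avoid depending on display order, a plot object can be passed explicitly. This sketch assumes ggplot2 is installed and writes to a temporary directory so that the working directory is left untouched; p and out_file are illustrative names.

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point()

# Build a path inside R's per-session temporary directory
out_file <- file.path(tempdir(), "my-plot.pdf")

# Pass the plot explicitly and state the units of width and height
ggsave(out_file, plot = p, width = 6, height = 6, units = "in")

file.exists(out_file)
```

The output format is chosen from the file extension, so changing .pdf to .png would write a PNG instead.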