Introduction
R is an interpreted programming language for statistical computing and graphics, supported by the R Foundation. It is widely used by statisticians and data miners for developing statistical software and analyzing data. This page shows how to run R in Jupyter Notebook for better documentation, reproducibility, and sharing of results. Python, Jupyter Notebook, and R should already be installed.
R distributions for Linux, macOS, and Windows are available from the Comprehensive R Archive Network (CRAN). Download R for your operating system and install it on your machine.
Jupyter Notebook uses kernels, which are processes that run interactive code in a particular programming language and return the output to the user. IRkernel is an R kernel for Jupyter Notebook.
The IRkernel package can be installed by running the following command in an R console:
install.packages('IRkernel')
Then make Jupyter aware of the newly installed R kernel by installing a kernel spec. To install it system-wide, set user to FALSE in the installspec command:
IRkernel::installspec(user = FALSE)
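Where a system-wide install is not possible (for example, without administrator rights), a per-user install is one alternative. The sketch below assumes the IRkernel package is already installed and that installspec accepts the name and displayname arguments, as in current IRkernel releases. The helper register_r_kernel and the names ir_demo and R (demo) are hypothetical, chosen only for illustration.

```r
# Sketch: register the R kernel for the current user only.
# 'ir_demo' and 'R (demo)' are hypothetical names chosen for this example.
register_r_kernel <- function(name = "ir_demo", displayname = "R (demo)") {
  if (!requireNamespace("IRkernel", quietly = TRUE)) {
    stop("IRkernel is not installed; run install.packages('IRkernel') first")
  }
  # user = TRUE writes the kernel spec into the user's own Jupyter directory
  IRkernel::installspec(user = TRUE, name = name, displayname = displayname)
}

# Call register_r_kernel() once from an R console, then restart Jupyter.
```

The function is only defined here, not called, so that nothing is registered until you decide to run it.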
Jupyter Notebook now supports both Python 3 and R. Type jupyter notebook in the Anaconda console and press Enter to open an empty workspace. Then create a new R notebook by clicking the New button and selecting R.
R packages should be installed from the R console. For example, to install ggplot2, type:
install.packages("ggplot2")
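To avoid reinstalling a package every time a notebook is rerun, the install can be made conditional. This is a minimal sketch; ensure_package is a hypothetical helper name, not part of R or ggplot2.

```r
# Install a package only when it is not already available.
# ensure_package() is a hypothetical helper written for this example.
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # Report whether the package can now be loaded
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# Usage: ensure_package("ggplot2")
```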
Then you can load the package in Jupyter Notebook and display plots embedded in the notebook.
To add a table of contents to a notebook, first install jupyter_nbextensions_configurator by typing this command in the Anaconda console and pressing Enter:
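A minimal sketch of loading ggplot2 and drawing an inline plot, assuming ggplot2 is installed and the notebook runs on IRkernel. The repr.plot.width and repr.plot.height options are read by IRkernel's repr package to size inline plots; outside Jupyter they are simply stored and have no visible effect.

```r
library(ggplot2)

# Size of inline plots in inches (used by IRkernel; ignored elsewhere)
options(repr.plot.width = 6, repr.plot.height = 4)

# Scatter plot of engine displacement against highway mileage (mpg ships with ggplot2)
p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point()

p  # in a notebook cell, evaluating the plot object displays it inline
```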
pip install jupyter_nbextensions_configurator
Then type jupyter notebook in the Anaconda console.
Finally, click the Table of Contents icon to generate a table of contents.
The table of contents can also be placed in the first cell of the notebook. First click Contents at the top left (Step 1), then tick Add notebook TOC cell (Step 2).
print("Hello World", quote=FALSE)
a <- 1 # assign 1 to a
b <- 2 # assign 2 to b
a + b
round(3.1415)
factorial(3) # 3! = 3*2*1
sqrt(14) # square root
factorial(round(2.0015) + 1)
log10(1000)+log10(100)
pi
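Nested calls are evaluated from the inside out. The sketch below walks through two of the expressions above step by step; the variable names are introduced only for this illustration.

```r
# round() rounds to the nearest integer by default; digits controls precision
rounded_int <- round(3.1415)              # 3
rounded_two <- round(3.1415, digits = 2)  # 3.14

# factorial(round(2.0015) + 1): round() gives 2, adding 1 gives 3, and 3! is 6
inner_value <- round(2.0015) + 1
result <- factorial(inner_value)
result                                    # 6

# log10(1000) + log10(100) is 3 + 2
log_sum <- log10(1000) + log10(100)
log_sum                                   # 5
```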
# WorldPhones is a built-in data set
WorldPhones # Number of telephones (in thousands) by region and year
vec <- c(1,2,3,10,100) # c() combines values into a vector
vec
mat <- matrix(c(1,2,3,4,5,6), nrow=2)
mat
vec+4
vec*4
vec*vec
vec %*% vec # inner product
vec %o% vec # outer product
# Transpose a matrix with t()
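The three products above return results of different shapes. A short sketch with the same vector makes the difference explicit:

```r
vec <- c(1, 2, 3, 10, 100)

# Element-wise product: each element times its counterpart
elementwise <- vec * vec        # 1 4 9 100 10000

# Inner product: the sum of the element-wise products, returned as a 1x1 matrix
inner <- vec %*% vec            # 10114
dim(inner)                      # 1 1

# Outer product: a 5x5 matrix whose [i, j] entry is vec[i] * vec[j]
outer_prod <- vec %o% vec
outer_prod[4, 5]                # 10 * 100 = 1000
```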
mat
t(mat)
array(c(1,2,3,4,5,6), dim=c(2,2,3))
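Two details of the calls above are easy to miss: matrix() fills its cells column by column, and array() recycles its input when there are fewer values than cells. A small sketch of both:

```r
# matrix() fills column by column
mat <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2)
mat[1, ]          # first row: 1 3 5
dim(t(mat))       # 3 2 after transposing

# A 2 x 2 x 3 array has 12 cells; the 6 values are recycled to fill them
arr <- array(c(1, 2, 3, 4, 5, 6), dim = c(2, 2, 3))
length(arr)       # 12
arr[, , 2]        # second slice holds 5, 6, 1, 2 (filled column by column)
```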
# Numeric
class(0.00001) # class() returns the class (type) of an object
# Character
class('hello')
nchar('hello')
paste('Hi','There')
# Logical
class(TRUE)
class(T)
# Factor (categorical data)
fac <- factor(c('a','b','c'))
fac
class(fac)
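A factor stores each value as an integer code that points into its set of levels. The sketch below uses made-up size labels to show the default level order and how to override it.

```r
# Illustrative values, not taken from any data set
sizes <- factor(c("low", "high", "low", "medium"))

levels(sizes)       # "high" "low" "medium" (sorted alphabetically by default)
as.integer(sizes)   # 2 1 2 3: the position of each value in levels(sizes)
table(sizes)        # counts per level

# Supply levels explicitly to control their order
ordered_sizes <- factor(c("low", "high", "low", "medium"),
                        levels = c("low", "medium", "high"))
as.integer(ordered_sizes)  # 1 3 1 2
```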
A list is a one dimensional group of R objects. Create lists with list:
lst <- list(1,'R', TRUE)
lst
class(lst)
list(c(1,2),TRUE,c('a','b','c'))
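List elements are extracted with double brackets, while single brackets return a shorter list. A brief sketch; the named list is an extra illustration, not part of the example above.

```r
lst <- list(1, "R", TRUE)

lst[[2]]          # "R": double brackets extract the element itself
class(lst[[2]])   # "character"
class(lst[2])     # "list": single brackets return a sub-list

# Elements can be named and then accessed with $
named <- list(number = 1, letter = "R", flag = TRUE)
named$letter      # "R"
```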
# Each column can be a different data type
df <- data.frame(c(1,2,3),c('R','S','T'),c(TRUE,FALSE,TRUE))
df
class(df)
nvec <- c(one=1, two=2, three=3)
nvec
nvec+1
ndf <- data.frame(numbers=c(1,2,3),letters=c('R','S','T'),logic=c(TRUE,FALSE,TRUE))
ndf
names(ndf)
names(nvec) <- c('uno','dos','tres')
nvec
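Names are mainly useful for selecting elements and columns. A short sketch using the same objects:

```r
nvec <- c(uno = 1, dos = 2, tres = 3)
nvec["dos"]                 # the element named "dos", value 2
unname(nvec["dos"])         # 2, with the name dropped

ndf <- data.frame(numbers = c(1, 2, 3),
                  letters = c("R", "S", "T"),
                  logic   = c(TRUE, FALSE, TRUE))
ndf$numbers                 # column selected by name: 1 2 3
ndf[ndf$logic, "numbers"]   # rows where logic is TRUE: 1 3
```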
#first install ggplot2
#install.packages("ggplot2")
library('ggplot2')
diamonds[1:5,] # diamonds data set comes with ggplot2 package
qplot(x,y,data=diamonds,color=price)
?mpg
View(mpg)
qplot(displ,hwy,data=mpg)
qplot(displ,hwy,data=mpg,color=class)
qplot(displ,hwy,data=mpg,shape=class)
Facets split a plot into smaller panels, each showing a different subset of the data. They are useful for exploring conditional relationships and for large data sets.
qplot(displ,hwy,data=mpg)+facet_grid(. ~ cyl)
qplot(displ,hwy,data=mpg)+facet_grid(drv ~ .)
qplot(displ,hwy,data=mpg)+facet_grid(drv ~ cyl)
qplot(displ,hwy,data=mpg)+facet_grid( ~ class)
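Recent ggplot2 releases mark qplot() as deprecated, so it may help to know the ggplot() form of the same facets. This is a sketch of the equivalent calls, assuming ggplot2 is installed; the variable names are illustrative.

```r
library(ggplot2)

# Base scatter plot, equivalent to qplot(displ, hwy, data = mpg)
base_plot <- ggplot(mpg, aes(x = displ, y = hwy)) + geom_point()

# One column of panels per number of cylinders
by_cyl <- base_plot + facet_grid(. ~ cyl)

# Rows by drive type, columns by number of cylinders
by_drv_cyl <- base_plot + facet_grid(drv ~ cyl)

by_drv_cyl
```

In the formula, the left side of ~ defines the rows and the right side defines the columns; a dot means no split in that direction.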
qplot(displ,hwy,data=mpg,geom=c("point","smooth"))
qplot(class,hwy,data=mpg,geom='boxplot')
qplot(reorder(class,hwy),hwy,data=mpg,geom='boxplot') # reorder() sorts class by the mean of hwy (its default)
qplot(x,data=diamonds)
qplot(cut,data=diamonds,geom='bar',fill=cut)
qplot(color,data=diamonds,geom='bar',fill=cut)
qplot(carat,data=diamonds,binwidth=0.1)
zoom <- coord_cartesian(xlim=c(50,70))
qplot(depth,data=diamonds,binwidth=0.2)+zoom
qplot(depth,data=diamonds,binwidth=0.2,fill=cut)+zoom
qplot(depth,data=diamonds,geom="freqpoly", color=cut)+zoom
qplot(depth,data=diamonds,geom="density", color=cut)+zoom
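The zoom object above uses coord_cartesian(), which changes the visible window without discarding data. Setting scale limits with xlim() instead removes observations outside the range before the histogram is computed. A sketch of the two approaches in ggplot() form, assuming ggplot2 is installed:

```r
library(ggplot2)

hist_plot <- ggplot(diamonds, aes(x = depth)) +
  geom_histogram(binwidth = 0.2)

# Zoom: all observations are still binned, only the visible window changes
zoomed <- hist_plot + coord_cartesian(xlim = c(50, 70))

# Scale limits: observations outside 50 to 70 are dropped before binning
# (ggplot2 warns about the removed rows when the plot is drawn)
clipped <- hist_plot + xlim(50, 70)
```

For a histogram the two plots look similar, but with smoothers or summaries the dropped rows can change the result, which is why zooming is often the safer choice.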
# geom='hex' requires the hexbin package: install.packages('hexbin')
qplot(carat,price,data=diamonds,geom='hex')
qplot(carat,price,data=diamonds,geom='density2d')
qplot(carat,price,data=diamonds,geom=c('point','density2d'))
qplot(carat,price,data=diamonds,geom='smooth')
qplot(carat,price,data=diamonds,geom='smooth',color=cut)
qplot(carat,price,data=diamonds,geom='smooth',group=cut)
qplot(carat,price,data=diamonds,geom='smooth',color=cut,se=FALSE)
qplot(carat,price,data=diamonds,geom='smooth',color=cut,method=lm)
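qplot(carat,price,data=diamonds,color='blue') # 'blue' is treated as a data value: default colour plus a legend
qplot(carat,price,data=diamonds,color=I('blue')) # I() sets the colour literally to blue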
qplot(carat,price,data=diamonds,size=I(0.5), alpha=I(0.1))
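The I() wrapper marks a value as a fixed setting rather than a variable to map. In ggplot() form the same distinction is made by placing the value inside or outside aes(). A sketch, assuming ggplot2 is installed; the variable names are illustrative.

```r
library(ggplot2)

# Mapping: "blue" is treated as a data value, so ggplot2 picks a default
# colour for it and adds a legend
mapped <- ggplot(diamonds, aes(carat, price, color = "blue")) + geom_point()

# Setting: the points are literally drawn in blue, with no legend
set_blue <- ggplot(diamonds, aes(carat, price)) + geom_point(color = "blue")

# Fixed size and transparency help with overplotting in large data sets
faint <- ggplot(diamonds, aes(carat, price)) +
  geom_point(size = 0.5, alpha = 0.1)
```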
# get working directory
#getwd()
# Specify size in inches
ggsave("my-plot.pdf", width=6, height=6)
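By default ggsave() saves the last plot that was displayed. Passing the plot explicitly is more predictable in a notebook where cells may run out of order. The sketch below writes to a temporary directory so that it does not touch the working directory; the file name is reused from the call above.

```r
library(ggplot2)

p <- ggplot(mpg, aes(displ, hwy)) + geom_point()

# Write to a temporary location so the example leaves the working directory alone
out_file <- file.path(tempdir(), "my-plot.pdf")

# plot = names the plot explicitly; width and height are in inches here
ggsave(out_file, plot = p, width = 6, height = 6, units = "in")

file.exists(out_file)
```

The output format is chosen from the file extension, so changing .pdf to .png would produce a raster image instead.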