Introduction
R is an interpreted programming language for statistical computing and graphics, supported by the R Foundation. It is widely used by statisticians and data miners for developing statistical software and analyzing data. This page shows how to run R in Jupyter Notebook for better documentation, reproducibility, and sharing of results. Python and Jupyter Notebook should already be installed.
Run R on Jupyter Notebook
R distributions for Linux, macOS, and Windows are available from the Comprehensive R Archive Network (CRAN). Download the version for your operating system and install it on your machine.
Install R kernel for Jupyter Notebook
Jupyter Notebook relies on kernels: processes that run interactive code in a particular programming language and return the output to the user. IRkernel is an R kernel for Jupyter Notebook.
The IRkernel package can be installed by running the following command in an R console:
install.packages('IRkernel')
Kernel Available on Jupyter Notebook
Next, make Jupyter aware of the newly installed R kernel by installing a kernel spec. To install it system-wide, set user to FALSE in the installspec command:
IRkernel::installspec(user = FALSE)
Launch Jupyter Notebook with R
Jupyter Notebook now supports both Python 3 and R. Type jupyter notebook in the Anaconda console and press Enter to open an empty workspace. Then create a new R notebook by clicking the New button and selecting R.
Install R Packages and Load in Jupyter Notebook
R packages should be installed from the R console. For example, to install "ggplot2", type:
install.packages("ggplot2")
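Before installing, it can help to check which packages are already present. The sketch below uses a hypothetical helper, `missing_packages` (not part of the original walkthrough), built on base R's `requireNamespace()`; the package names passed to it are only examples.

```r
# Hypothetical helper: return the entries of `pkgs` that are not installed.
missing_packages <- function(pkgs) {
  # requireNamespace() returns TRUE when the package can be loaded
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  unname(pkgs[!installed])
}

# Base packages ship with R, so nothing is reported as missing here
missing_packages(c("stats", "utils"))

# Install only what is missing (uncomment to run; needs network access)
# to_install <- missing_packages(c("IRkernel", "ggplot2"))
# if (length(to_install) > 0) install.packages(to_install)
```

The length check matters because `install.packages()` expects at least one package name.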
Then load the package in a Jupyter Notebook cell; plots are rendered inline as embedded images.
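A minimal sketch of such a cell, assuming ggplot2 is installed. The `mpg` data set ships with ggplot2, and the `repr.plot.*` options are the ones the IRkernel stack (the `repr` package) reads to size inline figures; the sizes chosen here are arbitrary.

```r
library(ggplot2)

# Inline figure size in inches (read by IRkernel; values are arbitrary)
options(repr.plot.width = 6, repr.plot.height = 4)

# Scatterplot of engine displacement against highway mileage
p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point()
p  # printing the plot object renders it in the notebook
```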
Add Table of Contents on Jupyter Notebook
First, install jupyter_nbextensions_configurator by typing this command in the Anaconda console and pressing Enter:
pip install jupyter_nbextensions_configurator
Then, type jupyter notebook in the Anaconda console.
Finally, click the Table of Contents icon to generate the table of contents.
The table of contents can also be placed in the first cell: click Contents at the top left (Step 1), then tick Add notebook TOC cell (Step 2).
Orientation to R
print("Hello World", quote=FALSE)
[1] Hello World
a <- 1 # a = 1
b <- 2 # b = 2
a + b
round(3.1415)
factorial(3) # 3! = 3*2*1
sqrt(14) # square root
factorial(round(2.0015) + 1)
log10(1000)+log10(100)
pi
Data Structures
# Load the WorldPhones data set
WorldPhones # Number of telephones by region (in thousands)
Year | N.Amer | Europe | Asia | S.Amer | Oceania | Africa | Mid.Amer |
---|---|---|---|---|---|---|---|
1951 | 45939 | 21574 | 2876 | 1815 | 1646 | 89 | 555 |
1956 | 60423 | 29990 | 4708 | 2568 | 2366 | 1411 | 733 |
1957 | 64721 | 32510 | 5230 | 2695 | 2526 | 1546 | 773 |
1958 | 68484 | 35218 | 6662 | 2845 | 2691 | 1663 | 836 |
1959 | 71799 | 37598 | 6856 | 3000 | 2868 | 1769 | 911 |
1960 | 76036 | 40341 | 8220 | 3145 | 3054 | 1905 | 1008 |
1961 | 79831 | 43173 | 9053 | 3338 | 3224 | 2005 | 1076 |
Vectors (single dimension)
vec <- c(1,2,3,10,100) # c() creates a vector
vec
- 1
- 2
- 3
- 10
- 100
Matrix (two dimensions)
mat <- matrix(c(1,2,3,4,5,6), nrow=2)
mat
1 | 3 | 5 |
2 | 4 | 6 |
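The output above shows that `matrix()` fills column by column. As a small illustration (the variable names `vals`, `by_col`, and `by_row` are made up for this sketch), the `byrow` argument switches to row-wise filling:

```r
vals <- c(1, 2, 3, 4, 5, 6)

by_col <- matrix(vals, nrow = 2)                # default: fill column by column
by_row <- matrix(vals, nrow = 2, byrow = TRUE)  # fill row by row instead

dim(by_col)   # number of rows, then number of columns
by_col[2, 1]  # row 2, column 1 of the column-filled matrix
by_row[2, 1]  # same position in the row-filled matrix
by_row[1, ]   # entire first row
```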
Math: Element-wise Operations
vec+4
- 5
- 6
- 7
- 14
- 104
vec*4
- 4
- 8
- 12
- 40
- 400
vec*vec
- 1
- 4
- 9
- 100
- 10000
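The results above rely on R's recycling rule: when two vectors differ in length, the shorter one is repeated to match the longer one. A brief sketch with illustrative values:

```r
vec <- c(1, 2, 3, 10, 100)

vec + 4                    # the single value 4 is recycled across vec
c(1, 2, 3, 4) + c(10, 20)  # c(10, 20) is recycled to 10, 20, 10, 20
vec > 5                    # comparisons are element-wise too
sum(vec > 5)               # TRUE counts as 1, so this counts the matches
```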
Matrix multiplication
vec %*% vec # inner product
10114 |
vec %o% vec # outer product
1 | 2 | 3 | 10 | 100 |
2 | 4 | 6 | 20 | 200 |
3 | 6 | 9 | 30 | 300 |
10 | 20 | 30 | 100 | 1000 |
100 | 200 | 300 | 1000 | 10000 |
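To connect the two operators to more familiar operations, the sketch below checks that the inner product equals the sum of element-wise products and that `%o%` matches base R's `outer()`. The names `inner` and `outer_prod` are chosen for this example only.

```r
vec <- c(1, 2, 3, 10, 100)

inner <- vec %*% vec       # 1 x 1 matrix holding the inner product
outer_prod <- vec %o% vec  # 5 x 5 matrix of all pairwise products

# The inner product is the sum of the element-wise products
as.numeric(inner) == sum(vec * vec)

# %o% is shorthand for outer(); entry [i, j] equals vec[i] * vec[j]
identical(outer_prod, outer(vec, vec))
outer_prod[4, 5] == vec[4] * vec[5]
```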
# Transpose a matrix with t()
mat
1 | 3 | 5 |
2 | 4 | 6 |
t(mat)
1 | 2 |
3 | 4 |
5 | 6 |
Array (n-dimensional)
array(c(1,2,3,4,5,6), dim=c(2,2,3))
- 1
- 2
- 3
- 4
- 5
- 6
- 1
- 2
- 3
- 4
- 5
- 6
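The flattened output above lists the six values twice because a 2 x 2 x 3 array has 12 cells, so the input is recycled. A short sketch (the name `arr` is introduced here) showing how to inspect and index such an array:

```r
arr <- array(c(1, 2, 3, 4, 5, 6), dim = c(2, 2, 3))

dim(arr)      # 2 rows, 2 columns, 3 layers
length(arr)   # 12 cells, so the 6 values are used twice
arr[, , 1]    # first layer as a 2 x 2 matrix
arr[1, 2, 3]  # row 1, column 2 of the third layer
```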
Data Types
# Numeric
class(0.00001) # class() returns the type of the data
# Character
class('hello')
nchar('hello')
paste('Hi','There')
# Logical
class(TRUE)
class(T)
# Factor (categorical data)
fac <- factor(c('a','b','c'))
fac
- a
- b
- c
Levels:
- 'a'
- 'b'
- 'c'
class(fac)
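A factor stores its categories as levels plus integer codes pointing into them. The sketch below uses a slightly longer example vector than the one above (an assumption made so that one level repeats):

```r
fac <- factor(c('a', 'b', 'c', 'a'))

levels(fac)      # distinct categories, sorted
as.integer(fac)  # integer codes that index into the levels
table(fac)       # count of observations per level
```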
Lists and Data Frames
Lists
A list is a one-dimensional collection of R objects, which may be of different types. Create lists with list():
lst <- list(1,'R', TRUE)
lst
- 1
- 'R'
- TRUE
class(lst)
list(c(1,2),TRUE,c('a','b','c'))
-
- 1
- 2
- TRUE
-
- 'a'
- 'b'
- 'c'
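Once a list exists, its elements can be pulled out in two ways. This sketch contrasts single and double brackets; the named list `named` is a made-up example:

```r
lst <- list(1, 'R', TRUE)

lst[[2]]         # double brackets extract the element itself
class(lst[[2]])  # "character"
class(lst[2])    # single brackets return a sub-list: "list"

# Elements can be named and then accessed with $
named <- list(number = 1, letter = 'R')
named$letter
```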
Data Frame
# Each column can be a different data type
df <- data.frame(c(1,2,3),c('R','S','T'),c(TRUE,FALSE,TRUE))
df
c.1..2..3. | c..R....S....T.. | c.TRUE..FALSE..TRUE. |
---|---|---|
<dbl> | <chr> | <lgl> |
1 | R | TRUE |
2 | S | FALSE |
3 | T | TRUE |
class(df)
nvec <- c(one=1, two=2, three=3)
nvec
- one
- 1
- two
- 2
- three
- 3
nvec+1
- one
- 2
- two
- 3
- three
- 4
ndf <- data.frame(numbers=c(1,2,3),letters=c('R','S','T'),logic=c(TRUE,FALSE,TRUE))
ndf
numbers | letters | logic |
---|---|---|
<dbl> | <chr> | <lgl> |
1 | R | TRUE |
2 | S | FALSE |
3 | T | TRUE |
names(ndf)
- 'numbers'
- 'letters'
- 'logic'
names(nvec) <- c('uno','dos','tres')
nvec
- uno
- 1
- dos
- 2
- tres
- 3
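Names also make it easier to pull pieces out of a data frame. A brief sketch, rebuilding the `ndf` data frame from above so the block runs on its own:

```r
ndf <- data.frame(numbers = c(1, 2, 3),
                  letters = c('R', 'S', 'T'),
                  logic   = c(TRUE, FALSE, TRUE))

ndf$letters                # one column as a vector
ndf[2, ]                   # second row as a data frame
ndf[ndf$logic, "numbers"]  # numbers in rows where logic is TRUE
nrow(ndf)                  # number of rows
ncol(ndf)                  # number of columns
```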
Visualizing Data
Scatterplot
# First install ggplot2
# install.packages("ggplot2")
library('ggplot2')
diamonds[1:5,] # the diamonds data set comes with the ggplot2 package
carat | cut | color | clarity | depth | table | price | x | y | z |
---|---|---|---|---|---|---|---|---|---|
<dbl> | <ord> | <ord> | <ord> | <dbl> | <dbl> | <int> | <dbl> | <dbl> | <dbl> |
0.23 | Ideal | E | SI2 | 61.5 | 55 | 326 | 3.95 | 3.98 | 2.43 |
0.21 | Premium | E | SI1 | 59.8 | 61 | 326 | 3.89 | 3.84 | 2.31 |
0.23 | Good | E | VS1 | 56.9 | 65 | 327 | 4.05 | 4.07 | 2.31 |
0.29 | Premium | I | VS2 | 62.4 | 58 | 334 | 4.20 | 4.23 | 2.63 |
0.31 | Good | J | SI2 | 63.3 | 58 | 335 | 4.34 | 4.35 | 2.75 |
qplot(x,y,data=diamonds,color=price)
Warning message: "`qplot()` was deprecated in ggplot2 3.4.0."
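Since the warning above reports that `qplot()` is deprecated, here is a sketch of the same scatterplot written with `ggplot()`, assuming ggplot2 is installed. The variable name `p` is only for illustration.

```r
library(ggplot2)

# Equivalent of qplot(x, y, data = diamonds, color = price)
p <- ggplot(diamonds, aes(x = x, y = y, color = price)) +
  geom_point()
p
```

The remaining `qplot()` calls on this page can be translated the same way: the data and aesthetic mappings go into `ggplot()` and `aes()`, and the geom becomes an explicit layer.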
?mpg
View(mpg)
manufacturer | model | displ | year | cyl | trans | drv | cty | hwy | fl | class |
---|---|---|---|---|---|---|---|---|---|---|
<chr> | <chr> | <dbl> | <int> | <int> | <chr> | <chr> | <int> | <int> | <chr> | <chr> |
audi | a4 | 1.8 | 1999 | 4 | auto(l5) | f | 18 | 29 | p | compact |
audi | a4 | 1.8 | 1999 | 4 | manual(m5) | f | 21 | 29 | p | compact |
audi | a4 | 2.0 | 2008 | 4 | manual(m6) | f | 20 | 31 | p | compact |
audi | a4 | 2.0 | 2008 | 4 | auto(av) | f | 21 | 30 | p | compact |
audi | a4 | 2.8 | 1999 | 6 | auto(l5) | f | 16 | 26 | p | compact |
audi | a4 | 2.8 | 1999 | 6 | manual(m5) | f | 18 | 26 | p | compact |
audi | a4 | 3.1 | 2008 | 6 | auto(av) | f | 18 | 27 | p | compact |
audi | a4 quattro | 1.8 | 1999 | 4 | manual(m5) | 4 | 18 | 26 | p | compact |
audi | a4 quattro | 1.8 | 1999 | 4 | auto(l5) | 4 | 16 | 25 | p | compact |
audi | a4 quattro | 2.0 | 2008 | 4 | manual(m6) | 4 | 20 | 28 | p | compact |
audi | a4 quattro | 2.0 | 2008 | 4 | auto(s6) | 4 | 19 | 27 | p | compact |
audi | a4 quattro | 2.8 | 1999 | 6 | auto(l5) | 4 | 15 | 25 | p | compact |
audi | a4 quattro | 2.8 | 1999 | 6 | manual(m5) | 4 | 17 | 25 | p | compact |
audi | a4 quattro | 3.1 | 2008 | 6 | auto(s6) | 4 | 17 | 25 | p | compact |
audi | a4 quattro | 3.1 | 2008 | 6 | manual(m6) | 4 | 15 | 25 | p | compact |
audi | a6 quattro | 2.8 | 1999 | 6 | auto(l5) | 4 | 15 | 24 | p | midsize |
audi | a6 quattro | 3.1 | 2008 | 6 | auto(s6) | 4 | 17 | 25 | p | midsize |
audi | a6 quattro | 4.2 | 2008 | 8 | auto(s6) | 4 | 16 | 23 | p | midsize |
chevrolet | c1500 suburban 2wd | 5.3 | 2008 | 8 | auto(l4) | r | 14 | 20 | r | suv |
chevrolet | c1500 suburban 2wd | 5.3 | 2008 | 8 | auto(l4) | r | 11 | 15 | e | suv |
chevrolet | c1500 suburban 2wd | 5.3 | 2008 | 8 | auto(l4) | r | 14 | 20 | r | suv |
chevrolet | c1500 suburban 2wd | 5.7 | 1999 | 8 | auto(l4) | r | 13 | 17 | r | suv |
chevrolet | c1500 suburban 2wd | 6.0 | 2008 | 8 | auto(l4) | r | 12 | 17 | r | suv |
chevrolet | corvette | 5.7 | 1999 | 8 | manual(m6) | r | 16 | 26 | p | 2seater |
chevrolet | corvette | 5.7 | 1999 | 8 | auto(l4) | r | 15 | 23 | p | 2seater |
chevrolet | corvette | 6.2 | 2008 | 8 | manual(m6) | r | 16 | 26 | p | 2seater |
chevrolet | corvette | 6.2 | 2008 | 8 | auto(s6) | r | 15 | 25 | p | 2seater |
chevrolet | corvette | 7.0 | 2008 | 8 | manual(m6) | r | 15 | 24 | p | 2seater |
chevrolet | k1500 tahoe 4wd | 5.3 | 2008 | 8 | auto(l4) | 4 | 14 | 19 | r | suv |
chevrolet | k1500 tahoe 4wd | 5.3 | 2008 | 8 | auto(l4) | 4 | 11 | 14 | e | suv |
⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
toyota | toyota tacoma 4wd | 3.4 | 1999 | 6 | auto(l4) | 4 | 15 | 19 | r | pickup |
toyota | toyota tacoma 4wd | 4.0 | 2008 | 6 | manual(m6) | 4 | 15 | 18 | r | pickup |
toyota | toyota tacoma 4wd | 4.0 | 2008 | 6 | auto(l5) | 4 | 16 | 20 | r | pickup |
volkswagen | gti | 2.0 | 1999 | 4 | manual(m5) | f | 21 | 29 | r | compact |
volkswagen | gti | 2.0 | 1999 | 4 | auto(l4) | f | 19 | 26 | r | compact |
volkswagen | gti | 2.0 | 2008 | 4 | manual(m6) | f | 21 | 29 | p | compact |
volkswagen | gti | 2.0 | 2008 | 4 | auto(s6) | f | 22 | 29 | p | compact |
volkswagen | gti | 2.8 | 1999 | 6 | manual(m5) | f | 17 | 24 | r | compact |
volkswagen | jetta | 1.9 | 1999 | 4 | manual(m5) | f | 33 | 44 | d | compact |
volkswagen | jetta | 2.0 | 1999 | 4 | manual(m5) | f | 21 | 29 | r | compact |
volkswagen | jetta | 2.0 | 1999 | 4 | auto(l4) | f | 19 | 26 | r | compact |
volkswagen | jetta | 2.0 | 2008 | 4 | auto(s6) | f | 22 | 29 | p | compact |
volkswagen | jetta | 2.0 | 2008 | 4 | manual(m6) | f | 21 | 29 | p | compact |
volkswagen | jetta | 2.5 | 2008 | 5 | auto(s6) | f | 21 | 29 | r | compact |
volkswagen | jetta | 2.5 | 2008 | 5 | manual(m5) | f | 21 | 29 | r | compact |
volkswagen | jetta | 2.8 | 1999 | 6 | auto(l4) | f | 16 | 23 | r | compact |
volkswagen | jetta | 2.8 | 1999 | 6 | manual(m5) | f | 17 | 24 | r | compact |
volkswagen | new beetle | 1.9 | 1999 | 4 | manual(m5) | f | 35 | 44 | d | subcompact |
volkswagen | new beetle | 1.9 | 1999 | 4 | auto(l4) | f | 29 | 41 | d | subcompact |
volkswagen | new beetle | 2.0 | 1999 | 4 | manual(m5) | f | 21 | 29 | r | subcompact |
volkswagen | new beetle | 2.0 | 1999 | 4 | auto(l4) | f | 19 | 26 | r | subcompact |
volkswagen | new beetle | 2.5 | 2008 | 5 | manual(m5) | f | 20 | 28 | r | subcompact |
volkswagen | new beetle | 2.5 | 2008 | 5 | auto(s6) | f | 20 | 29 | r | subcompact |
volkswagen | passat | 1.8 | 1999 | 4 | manual(m5) | f | 21 | 29 | p | midsize |
volkswagen | passat | 1.8 | 1999 | 4 | auto(l5) | f | 18 | 29 | p | midsize |
volkswagen | passat | 2.0 | 2008 | 4 | auto(s6) | f | 19 | 28 | p | midsize |
volkswagen | passat | 2.0 | 2008 | 4 | manual(m6) | f | 21 | 29 | p | midsize |
volkswagen | passat | 2.8 | 1999 | 6 | auto(l5) | f | 16 | 26 | p | midsize |
volkswagen | passat | 2.8 | 1999 | 6 | manual(m5) | f | 18 | 26 | p | midsize |
volkswagen | passat | 3.6 | 2008 | 6 | auto(s6) | f | 17 | 26 | p | midsize |
qplot(displ,hwy,data=mpg)
Aesthetics
qplot(displ,hwy,data=mpg,color=class)
qplot(displ,hwy,data=mpg,shape=class)
Warning message: "The shape palette can deal with a maximum of 6 discrete values because more than 6 becomes difficult to discriminate; you have 7. Consider specifying shapes manually if you must have them."
Warning message: "Removed 62 rows containing missing values (`geom_point()`)."
Faceting
Facets are smaller plots that each display a different subset of the data. They are useful for exploring conditional relationships and for large data sets.
qplot(displ,hwy,data=mpg)+facet_grid(. ~ cyl)
qplot(displ,hwy,data=mpg)+facet_grid(drv ~ .)
qplot(displ,hwy,data=mpg)+facet_grid(drv ~ cyl)
qplot(displ,hwy,data=mpg)+facet_grid( ~ class)
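The same facets can be sketched with `ggplot()`, assuming ggplot2 is installed. The name `base` is made up for this example, and `facet_wrap()` is shown as an alternative that the original calls do not use.

```r
library(ggplot2)

base <- ggplot(mpg, aes(x = displ, y = hwy)) + geom_point()

p_cols <- base + facet_grid(. ~ cyl)    # one column of panels per cylinder count
p_rows <- base + facet_grid(drv ~ .)    # one row of panels per drive type
p_grid <- base + facet_grid(drv ~ cyl)  # rows by drv, columns by cyl
p_wrap <- base + facet_wrap(~ class)    # panels wrapped over several rows

p_grid  # print any of the four to render it
```

In the formula, the variable on the left of `~` defines the rows and the one on the right defines the columns.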
Geoms
qplot(displ,hwy,data=mpg,geom=c("point","smooth"))
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'
qplot(class,hwy,data=mpg,geom='boxplot')
qplot(reorder(class,hwy),hwy,data=mpg,geom='boxplot') # reorder() sorts class by mean hwy
Position Adjustments
Histogram
qplot(x,data=diamonds)
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Bar Chart
qplot(cut,data=diamonds,geom='bar',fill=cut)
qplot(color,data=diamonds,geom='bar',fill=cut)
Visualizing Distributions
qplot(carat,data=diamonds,binwidth=0.1)
Adding Zoom
zoom <- coord_cartesian(xlim=c(50,70))
qplot(depth,data=diamonds,binwidth=0.2)+zoom
qplot(depth,data=diamonds,binwidth=0.2,fill=cut)+zoom
qplot(depth,data=diamonds,geom="freqpoly", color=cut)+zoom
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
qplot(depth,data=diamonds,geom="density", color=cut)+zoom
Visualizing Big Data
# The hexbin package is needed for geom='hex'
# install.packages('hexbin')
qplot(carat,price,data=diamonds,geom='hex')
Warning message:
"Computation failed in `stat_binhex()`
Caused by error in `compute_group()`:
! The package `hexbin` is required for `stat_binhex()`"
qplot(carat,price,data=diamonds,geom='density2d')
qplot(carat,price,data=diamonds,geom=c('point','density2d'))
qplot(carat,price,data=diamonds,geom='smooth')
`geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'
qplot(carat,price,data=diamonds,geom='smooth',color=cut)
`geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'
qplot(carat,price,data=diamonds,geom='smooth',group=cut)
`geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'
qplot(carat,price,data=diamonds,geom='smooth',color=cut,se=FALSE)
`geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'
qplot(carat,price,data=diamonds,geom='smooth',color=cut,method=lm)
`geom_smooth()` using formula = 'y ~ x'
qplot(carat,price,data=diamonds,color='blue')
qplot(carat,price,data=diamonds,color=I('blue'))
qplot(carat,price,data=diamonds,size=I(0.5), alpha=I(0.1))
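The difference between `color='blue'` and `color=I('blue')` is the difference between mapping an aesthetic and setting it. A sketch of the `ggplot()` equivalents, assuming ggplot2 is installed (the names `mapped` and `fixed` are illustrative):

```r
library(ggplot2)

# Mapped: "blue" is treated as a data value, so ggplot2 assigns it a
# default colour from its palette and draws a legend
mapped <- ggplot(diamonds, aes(carat, price, color = "blue")) +
  geom_point()

# Set: the points are literally drawn in blue, with no legend;
# size and alpha are fixed values as well
fixed <- ggplot(diamonds, aes(carat, price)) +
  geom_point(color = "blue", size = 0.5, alpha = 0.1)

fixed
```

Low `alpha` values like the one above help reveal density when many points overlap.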
Saving Graphs
# Get the working directory
# getwd()
# Specify the size in inches
ggsave("my-plot.pdf", width=6, height=6)
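By default `ggsave()` writes the most recently displayed plot. A slightly more explicit sketch, assuming ggplot2 is installed, passes the plot object directly and writes to a temporary directory; the names `p` and `out_file` are chosen for this example.

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = displ, y = hwy)) + geom_point()

# Write to a temporary directory so the example leaves no files behind
out_file <- file.path(tempdir(), "my-plot.pdf")
ggsave(out_file, plot = p, width = 6, height = 6, units = "in")

file.exists(out_file)  # check that the file was written
```

The output format is inferred from the file extension, so changing `.pdf` to another supported extension switches the device.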