Introduction to Git¶
What is Git, and how does it differ from GitHub and GitLab?
Imagine working on the same code base or project with several other developers. How can we do this without overwriting each other's work? We can do it with Git, together with a hosting service such as GitHub or GitLab.
Git is a distributed version control system (VCS). It tracks changes to project files over time, which lets you record project changes and go back to a specific version of the tracked files at any point in time.
GitHub and GitLab are hosting services for Git repositories. Git is the underlying system that runs on your machine, whereas GitHub and GitLab are hosted in the cloud and accessed over the web. They allow developers from all over the world to work together and collaborate on code and projects.
Some benefits of Git/GitHub:
- Track changes. Git records what changed and when.
- Historical backup ("snapshot"). It works like a "Save as" button: we can keep a previous version and revert to it if we need to, while still copying it and making changes on top of that previous version.
- Team-based development.
- Flexible. It allows you to work locally on a project, or Git/GitHub can be part of your DevOps flow: the repository can be integrated with other tools so that a push kicks off automated testing.
- Git is typically used from the command line on a local machine, while GitHub is used on the web.
- Trunk-based development. See the figure below. Each developer can branch off the main branch of code, make some changes, and merge the code back into the main branch.
A repository is a place to store your code and the changes to your code. It can be hosted in the cloud or on the web (GitHub). If I want to work on that code on my local machine, I clone the repo onto my machine and follow trunk-based development. First I create a branch of the code called, for example, Mehdi-Branch. Next we create our application (call it working-copy), commit it, and finally push it back to the repo. Coworkers go through the same process and open a pull request to ask for a review of the code.
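The workflow described above can be sketched end to end on a single machine. This is only an illustration under stated assumptions: a local bare repository stands in for the hosted GitHub/GitLab repo, and the temporary paths, the files `README.md` and `app.txt`, and the identity `Demo User` are hypothetical placeholders. `Mehdi-Branch` is the branch name used in the text.

```shell
# Sandbox directory so nothing outside it is touched (hypothetical paths).
work=$(mktemp -d)

# A bare repository plays the role of the hosted remote (assumption).
git init -q --bare "$work/remote.git"

# 1. Clone the repo onto the local machine.
git clone -q "$work/remote.git" "$work/local"
cd "$work/local"

# Placeholder identity, set for this repository only.
git config user.name "Demo User"
git config user.email "demo@example.com"

# Seed the main line of development so there is something to branch from.
echo "demo project" > README.md
git add README.md
git commit -q -m "Initial commit"
git push -q origin HEAD

# 2. Create a branch and switch to it.
git checkout -q -b Mehdi-Branch

# 3. Make a change, stage it, and commit it.
echo "first version of the app" > app.txt
git add app.txt
git commit -q -m "Add app.txt"

# 4. Push the branch back to the remote, where a pull request could be opened.
git push -q origin Mehdi-Branch

# List the branches the remote now knows about.
git ls-remote --heads origin
```

On a real hosting service, the last step is followed by opening a pull request in the web interface; that part has no plain Git command.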
In short, Git is a tool that manages changes made to the files and directories in a project.
Work that has been committed to Git is very hard to lose, because every committed version stays in the repository's history.
When your work conflicts with someone else's, Git notifies you, so it is harder to accidentally overwrite work.
Set up¶
Git can be downloaded from https://git-scm.com/downloads. Download Git and follow the instructions to set it up on your computer.
After installing, type git --version in your terminal to verify that Git is ready to use.
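A quick check might look like the sketch below. The version number printed depends on your installation, and the variable `git_ready` is a hypothetical name used only for illustration.

```shell
# Print the installed Git version; the exact number depends on your installation.
git --version

# In a script, the same check can guard later steps.
if command -v git >/dev/null 2>&1; then
    git_ready=yes
else
    git_ready=no
fi
echo "Git available: $git_ready"
```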
Setting Name & Email¶
Run the following commands in your terminal to identify yourself to Git. Note that this name and email are only used to label the commits you make:
git config --global user.name "Name"
git config --global user.email "Example@email.com"
Repositories¶
Where is information stored by Git?
Each Git project has two parts:
- The files and directories that you create and edit directly
- Extra information that Git stores about the project's history
The combination of these two parts is called a repository.
A repository is created by running git init in the project directory.
The extra information is stored in a directory called .git, located in the root directory of the repository. `.git` should not be edited or deleted.
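As a small illustration, the sketch below initializes a repository in a throwaway directory and shows where Git keeps its extra information. The path and the directory name `init-demo` are hypothetical.

```shell
# Create and initialize a repository in a throwaway directory (hypothetical path).
repo=$(mktemp -d)/init-demo
git init -q "$repo"

# The repository's bookkeeping lives in the hidden .git directory.
ls -a "$repo"

# Ask Git where it keeps that information for this repository.
git -C "$repo" rev-parse --git-dir
```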
Two major types of Git repositories are:
- Local repository - stored on your own computer, where you can work on the local version of your project in isolation.
- Remote repository - stored outside your local system, usually on a remote server, on the web, or in the cloud, such as GitHub or GitLab. This is the place to share project code, see other people's code and integrate it into the local version of the project, and push your own changes.
Check the status¶
How do we track what has been changed?
Git has a staging area that holds the changes you want to include in the next commit but have not committed yet. Putting files in the staging area is like putting them in a box, while committing is like mailing that box: the contents of the box can still be changed or added to, but once the box is committed (mailed), further changes cannot be made to it.
git status shows which files are in the staging area and which files have changes that have not yet been put there.
In a repository where nothing has been changed, git status reports that there is nothing to commit.
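The sketch below shows how git status reports a staged file and an unstaged one side by side. The file names `notes.txt` and `report.txt` and the temporary directory are hypothetical.

```shell
# Throwaway repository (hypothetical path).
repo=$(mktemp -d)
cd "$repo"
git init -q

echo "draft" > notes.txt      # new file, not yet staged
echo "ready" > report.txt
git add report.txt            # staged, waiting to be committed

# Full, human-readable report.
git status

# Compact form: "A " marks a newly staged file, "??" an untracked one.
git status --short
```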
Staging files¶
We can use the git add command to add files to the staging area, which allows them to be tracked. The example below creates a directory diff-demo, initializes a repo in it, creates a file myfile.txt, and stages it.
mkdir diff-demo
cd diff-demo
git init
echo hello > myfile.txt
git add myfile.txt
We can append a new line to the end of the file:
echo "text_2" >> myfile.txt
To add multiple files, we can do this:
git add myfile1.txt myfile2.txt myfile3.txt
Adding files one by one can be tedious. Instead, all the files inside the project folder can be added to the staging area with the command below:
git add .
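To see what git add . actually staged, one option is to list the staged file names afterwards. The sketch below does this in a throwaway repository; the temporary path and the variable `staged` are hypothetical, while the file names follow the example above.

```shell
# Throwaway repository (hypothetical path).
repo=$(mktemp -d)
cd "$repo"
git init -q

echo one > myfile1.txt
echo two > myfile2.txt
echo three > myfile3.txt

# Stage every new or modified file under the current directory.
git add .

# List the files that are now staged for the next commit.
staged=$(git diff --cached --name-only)
echo "$staged"
```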
Making commits¶
A commit is a snapshot of our code at a particular time, which we save to the commit history of our repository. git commit is used to save the changes in the staging area. It always saves everything that is in the staging area as one unit. Git requires you to enter a log message when committing changes. The message tells the next person who examines the repository why you made the change. By default, Git launches a text editor for writing this message. Alternatively, you can pass -m "some message in quotes" on the command line:
git commit -m "Add text file"
If a message is mistyped, the message of the most recent commit can be changed with the --amend flag:
git commit --amend -m "edited message"
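A minimal sketch of committing and then correcting the message is shown below. The temporary path and the identity `Demo User` are placeholders, set only for this throwaway repository.

```shell
# Throwaway repository with a placeholder identity (assumptions).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo hello > myfile.txt
git add myfile.txt
git commit -q -m "Add text flie"           # message with a typo

# Rewrite the message of the most recent commit.
git commit -q --amend -m "Add text file"

# Still a single commit, now carrying the corrected message.
git log --oneline
```

Amending replaces the last commit with a new one, so it is best kept to commits that have not been pushed yet.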
View Commit History¶
The command git log is used to view the log of the project's history. Log entries are shown most recent first.
The commit line of each entry displays a unique ID for the commit, called a hash. The other lines tell who made the change, when, and what log message they wrote for the change.
Since a project's entire log can be overwhelming, it is often useful to inspect only the changes to particular files or directories. This can be done using git log path, where path is the path to a specific file or directory.
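The following sketch builds a two-commit history and then views it in full and restricted to one directory. The directory `docs`, the file names, and the identity are hypothetical placeholders.

```shell
# Throwaway repository with a placeholder identity (assumptions).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo hello > myfile.txt
git add myfile.txt
git commit -q -m "Add myfile.txt"

mkdir docs
echo "notes" > docs/notes.txt
git add docs
git commit -q -m "Add notes"

# Full history, most recent first: hash, author, date, and message per entry.
git log

# Only the commits that touched the docs directory, one line each.
git log --oneline -- docs
```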
The following command can be used to go back to a previous state of the committed project code:
git checkout <commit-hash>
Replace <commit-hash> with the actual hash of the commit to visit, as listed by the git log command.
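As an illustration, the sketch below visits an earlier commit and then returns to the branch it started from. The paths, identity, and the variables `first` and `old_content` are hypothetical.

```shell
# Throwaway repository with a placeholder identity (assumptions).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo hello > myfile.txt
git add myfile.txt
git commit -q -m "Add myfile.txt"
first=$(git rev-parse HEAD)          # remember the hash of the first commit

echo "text_2" >> myfile.txt
git commit -q -a -m "Append text_2"

# Visit the first commit; the working files return to that state.
git checkout -q "$first"
old_content=$(cat myfile.txt)
echo "$old_content"

# Return to the branch we were on before.
git checkout -q -
cat myfile.txt
```

While an old commit is checked out this way, Git is in a "detached HEAD" state, so switching back to a branch before making new commits avoids surprises.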
What is in a diff?¶
A diff is a formatted display of the differences between two sets of files. git diff shows the changes in your repository that have not been staged yet, while git diff directory shows the changes to the files in some directory.
In the diff-demo example above, myfile.txt was modified after it was staged, so a first comparison is already available. Just run git diff, and the results will include:
- A header line in which a and b are placeholders meaning "the first version" and "the second version".
- An index line showing keys into Git's internal database of changes.
- --- a/myfile.txt and +++ b/myfile.txt, which name the two versions being compared.
- A line starting with @@ that tells where the changes are being made.
- The changed lines themselves, with - showing deletions and + showing additions.
The changed file can then be added to the staging area with git add filename (here, git add myfile.txt).
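A small sketch of producing a diff, before and after staging, is given below. The temporary path and identity are placeholders; the file name and contents follow the earlier example.

```shell
# Throwaway repository with a placeholder identity (assumptions).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo hello > myfile.txt
git add myfile.txt
git commit -q -m "Add myfile.txt"

# Change the file without staging it.
echo "text_2" >> myfile.txt

# Unstaged changes: working directory compared with the staging area.
git diff

# After staging, plain `git diff` is empty; --cached shows what will be committed.
git add myfile.txt
git diff --cached
```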
Repositories¶
How Information is Stored by Git
Git uses a three-level structure to store the information for each commit:
- commit: contains metadata, including the author, the time of the commit, and the commit message.
- tree: each commit also has a tree, which tracks the names and locations of the files in the repository when that commit happened.
- blob: for each file listed in the tree there is a blob, which contains a compressed snapshot of the contents of the file when the commit happened. blob stands for binary large object, a SQL database term for "may contain data of any kind".
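The three levels can be inspected directly with git cat-file, as in the sketch below. The temporary path and identity are placeholders.

```shell
# Throwaway repository with a placeholder identity (assumptions).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo hello > myfile.txt
git add myfile.txt
git commit -q -m "Add myfile.txt"

# HEAD names the latest commit object.
git cat-file -t HEAD                 # prints: commit

# The commit points at a tree...
git cat-file -t 'HEAD^{tree}'        # prints: tree

# ...and the tree lists one entry per file, each pointing at a blob.
git cat-file -p 'HEAD^{tree}'

# The blob holds the file contents as of that commit.
git cat-file -t HEAD:myfile.txt      # prints: blob
git cat-file -p HEAD:myfile.txt      # prints: hello
```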
What is a hash?
Every commit to a repository has a unique identifier called a hash, which is computed from the commit's contents by a hash function. It is a 40-character hexadecimal string like 8ffdfb132458dc9b377bc5698cc500c7052121f7.
Hashes enable Git to share data efficiently between repositories. The same content is guaranteed to produce the same hash: if two files are identical, their hashes are the same, and if two commits contain the same files and metadata and have the same ancestors, their hashes will be the same as well.
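That identical content yields identical hashes can be checked with git hash-object, which computes the hash Git would assign to a piece of content. The sketch below hashes file contents rather than whole commits, and the variable names are hypothetical.

```shell
# Hash some content the way Git would store it as a blob (nothing is written).
h1=$(printf 'hello\n' | git hash-object --stdin)
h2=$(printf 'hello\n' | git hash-object --stdin)
h3=$(printf 'hello!\n' | git hash-object --stdin)

echo "$h1"
echo "$h2"   # identical to the line above: same content, same hash
echo "$h3"   # different content, different hash
```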
How to view a specific commit?
The command git show with the first few characters of the commit's hash can be used to view the details of a specific commit, for example git show 8ffdfb. The output starts with the same information as the git log entry for that commit, followed by a diff in which lines that the change removed are prefixed with -, while lines that it added are prefixed with +.
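A hedged sketch of looking up a commit by an abbreviated hash follows. Because hashes differ from repository to repository, the abbreviation is obtained with git rev-parse --short rather than typed by hand; the paths, identity, and the variable `short` are hypothetical.

```shell
# Throwaway repository with a placeholder identity (assumptions).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo hello > myfile.txt
git add myfile.txt
git commit -q -m "Add myfile.txt"
echo "text_2" >> myfile.txt
git commit -q -a -m "Append text_2"

# Abbreviated hash of the latest commit.
short=$(git rev-parse --short HEAD)
echo "$short"

# Log entry plus the diff introduced by that commit.
git show "$short"
```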
Branches¶
A branch can be thought of as an individual timeline of project commits.
Git allows us to create many alternative environments (branches), so that other versions of our project code can exist and be tracked in parallel.
This lets us add new features in separate branches without touching the 'official' or master branch of the project. When a repository is initialized and commits are made, they are saved to the master branch by default.
Creating a new branch¶
The following command can be used to create a new branch:
git branch <new-branch-name>
Changing branches¶
The git checkout command, followed by a branch name, can be used to switch to a different branch.
To go back to the master branch, we can use this command:
git checkout master
See All Branches in My Repository¶
Every Git repository by default has a branch called master (newer Git installations and GitHub may name it main instead). You can run the command git branch to list all of the branches in a repository. The branch you are currently in will be shown with a * beside its name.
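Creating, switching, and listing branches can be tried together as sketched below. The branch name `new-feature`, the paths, and the identity are hypothetical, and the initial branch is explicitly named master so the example does not depend on local defaults.

```shell
# Throwaway repository with a placeholder identity (assumptions).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the first branch master regardless of local defaults
git config user.name "Demo User"
git config user.email "demo@example.com"

echo hello > myfile.txt
git add myfile.txt
git commit -q -m "Add myfile.txt"

# Create a branch (hypothetical name) and switch to it.
git branch new-feature
git checkout -q new-feature

# List all branches; the current one is marked with *.
git branch

# Go back to master and confirm where we are.
git checkout -q master
current=$(git rev-parse --abbrev-ref HEAD)
echo "$current"
```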
View Difference between Branches¶
Branches and revisions are closely connected, and commands that work on the latter usually work on the former. For example, just as git diff revision-1..revision-2 shows the difference between two versions of a repository, git diff branch-1..branch-2 shows the difference between two branches.
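A minimal sketch of comparing two branches is shown below. The branch names follow the text, while the paths, identity, and file contents are hypothetical.

```shell
# Throwaway repository with a placeholder identity (assumptions).
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo hello > myfile.txt
git add myfile.txt
git commit -q -m "Add myfile.txt"

# Both branches start from the same commit.
git branch branch-1
git checkout -q -b branch-2

# Only branch-2 receives the extra line.
echo "text_2" >> myfile.txt
git commit -q -a -m "Append text_2 on branch-2"

# What would have to change to get from branch-1 to branch-2.
git diff branch-1..branch-2
```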
Create a Local Repository and Push it into GitHub¶
- First, create a local project.
- Convert the working directory into a Git repository with git init (this creates a .git folder in the project folder, confirming that it is a Git repository).
- Use the git add . command to add all files to the staging area.
- Add the first commit with git commit -m "First Commit".
- Then go to GitHub and create a repository as below:
- Copy the link of the GitHub repository.
- Go back to the terminal and run git remote add origin followed by the URL of the repository.
- Next, use git push -u origin master to push the local repository to GitHub.
- The -u flag is short for --set-upstream, so this is the same as git push --set-upstream origin master: the local master branch is now linked to the remote master branch.
- Any later change to the local repository can be pushed to GitHub with:
  - git add . : add files to the staging area.
  - git commit -m "add a message" : make a commit.
  - git push : push the local branch to GitHub.
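The steps above can be rehearsed without a GitHub account, as in the sketch below. It assumes a local bare repository as a stand-in for the empty repository created on GitHub; with a real one, the URL copied from GitHub takes the place of the local path. The directory names, `README.md`, and the identity are hypothetical.

```shell
work=$(mktemp -d)

# Stand-in for the empty repository created on GitHub (assumption).
git init -q --bare "$work/github-standin.git"

# Create the local project and turn it into a Git repository.
mkdir "$work/project"
cd "$work/project"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the first branch master regardless of local defaults
git config user.name "Demo User"          # placeholder identity
git config user.email "demo@example.com"

echo "# My project" > README.md
git add .
git commit -q -m "First Commit"

# With a real GitHub repository, its URL goes here instead of the local path.
git remote add origin "$work/github-standin.git"

# Push and link local master to the remote master (-u is short for --set-upstream).
git push -q -u origin master

# Later changes follow the add / commit / push cycle.
echo "More text" >> README.md
git add .
git commit -q -m "Update README"
git push -q

# Show which remote branch the local master tracks.
git rev-parse --abbrev-ref 'master@{upstream}'
```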