Git
Local repositories
Creating and tracking files
To set up a local git repository (also called a repo), move to the directory of interest in the terminal/shell and run the following command.
git init
This creates a git repo for the current directory, including a subdirectory called .git, which contains all the information git needs to do its job. The repo also covers all subdirectories of the current directory and their files, so there is no need to initialize a new git repo every time a new subdirectory is created.
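The next step is to rename the default branch to main. In a fresh repo with no commits yet, creating and switching to a new branch of that name does exactly this.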
git checkout -b main
We change the default branch name because the community as a whole is moving away from the previous conventional name, master.
At any point we can also check the current status of the git repo.
git status
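The setup steps above can be tried end to end in a throwaway location. This is a minimal sketch that assumes a temporary directory from mktemp; the project name firstgit and the notes/ subdirectory are hypothetical. It also shows that one git init covers subdirectories: inside notes/, git reports its position relative to the same repo.

```shell
set -e
# Hypothetical project directory inside a temporary location.
project="$(mktemp -d)/firstgit"
mkdir -p "$project/notes"
cd "$project"

git init -q               # creates the .git subdirectory
git checkout -q -b main   # fresh repo with no commits: effectively renames the default branch
git status                # reports the current branch (main) and that there are no commits yet

# The same repo covers notes/; this prints its path relative to the repo root.
(cd notes && git rev-parse --show-prefix)
```

On Git 2.28 or newer, git init -b main (or setting the init.defaultBranch configuration option) gives the same result in one step.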
Now for tracking changes with git.
Suppose you create a new file called sentence.txt.
echo "Tracking the first change on git woohoo" >> sentence.txt
Checking the status of the git repo (via git status) will reveal sentence.txt listed under Untracked files. To track this file, we need to add it.
git add sentence.txt
Checking the status will now show this change under Changes to be committed. The next step is to commit this file, with a short message that describes the changes made.
git commit -m "added text for first git commit"
The git commit command permanently records the changes staged by the git add command in the .git directory. Each commit has a hash value, called its identifier, which is practically unique.
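A minimal sketch of the untracked, staged, committed sequence, assuming a throwaway temporary directory and a placeholder name and email stored only in this repo's config. git status --short gives a compact view of the two states, and git rev-parse prints the new commit's identifier in full and abbreviated form.

```shell
set -e
cd "$(mktemp -d)"
git init -q && git checkout -q -b main
git config user.name "Example User"        # placeholder identity, this repo only
git config user.email "user@example.com"   # placeholder identity, this repo only

echo "Tracking the first change on git woohoo" >> sentence.txt
git status --short          # "?? sentence.txt": untracked
git add sentence.txt
git status --short          # "A  sentence.txt": staged, i.e. a change to be committed
git commit -q -m "added text for first git commit"

git rev-parse HEAD          # full identifier of the new commit
git rev-parse --short HEAD  # abbreviated identifier
```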
We can also view the commits made to a repo, in reverse chronological order, with git log. If your project has many commits this can be a bit overwhelming, so we can pass -\(k\) to show only the last \(k\) commits; for example, the following shows only the most recent one.
git log -1
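Other useful flags for git log include --oneline, which shows each commit on a single line (abbreviated identifier and message), and --graph, which draws the branch and merge history as a text graph.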
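To see these flags on a small history, the sketch below makes three hypothetical commits in a temporary repo (placeholder identity) and then limits and reformats the log.

```shell
set -e
cd "$(mktemp -d)"
git init -q && git checkout -q -b main
git config user.name "Example User"
git config user.email "user@example.com"

# Three small commits so there is some history to look at.
for n in 1 2 3; do
  echo "line $n" >> sentence.txt
  git add sentence.txt
  git commit -q -m "add line $n"
done

git log -1                   # only the most recent commit
git log -2 --oneline         # the last two commits, one line each
git log --oneline --graph    # the whole history, with a text graph on the left
```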
Now, suppose we add another line to this file.
echo "Second line to the first file created" >> sentence.txt
Checking the status (via git status) will this time reveal this change under Changes not staged for commit. When making a change to a file, it is good practice to first view the exact changes being applied, which can be done via the following command.
git diff
From here on we can use git add and git commit to track these changes.
Once changes have been staged via git add, plain git diff no longer shows them; to see the staged changes relative to the last commit, use the following.
git diff --staged
Other flags for git diff include --color-words, which highlights changed words using colours instead of whole lines; this helps when lines are long and a line-by-line diff is too coarse.
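The difference between the two diff commands is easiest to see side by side. This sketch (temporary repo, placeholder identity) commits the first line, appends the second, and runs git diff before and after staging it.

```shell
set -e
cd "$(mktemp -d)"
git init -q && git checkout -q -b main
git config user.name "Example User"
git config user.email "user@example.com"
echo "Tracking the first change on git woohoo" >> sentence.txt
git add sentence.txt && git commit -q -m "added text for first git commit"

echo "Second line to the first file created" >> sentence.txt
git diff                  # unstaged change: working tree vs staging area
git diff --color-words    # the same change, highlighted word by word

git add sentence.txt
git diff                  # now empty: nothing is left unstaged
git diff --staged         # staged change: staging area vs last commit
```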
The process of tracking changes can be thought of as taking snapshots of a project as it progresses, where git add defines what goes into the snapshot (putting changes in a staging area), and git commit actually takes the snapshot and makes a permanent record of it. If nothing is staged, git will suggest using git commit -a, which automatically stages every change to already-tracked files before committing. This is not good practice and should be avoided, since you may commit changes that you forgot you made.
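The snapshot idea can be sketched with two edited files where only one belongs in the next commit. The second file, notes.txt, is hypothetical, and the repo and identity are throwaway placeholders. Staging just sentence.txt leaves notes.txt modified but uncommitted, whereas git commit -a would have swept it in too.

```shell
set -e
cd "$(mktemp -d)"
git init -q && git checkout -q -b main
git config user.name "Example User"
git config user.email "user@example.com"
echo "first" > sentence.txt
echo "draft" > notes.txt                      # hypothetical second file
git add sentence.txt notes.txt && git commit -q -m "initial snapshot"

# Two edits, but only one belongs in the next snapshot.
echo "second" >> sentence.txt
echo "half-finished idea" >> notes.txt

git add sentence.txt                          # choose what goes into the snapshot
git commit -q -m "add second line to sentence.txt"

git status --short                            # " M notes.txt": modified, not committed
```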
To make git ignore certain files/directories, list what to ignore in a file called .gitignore. Below is an example of a .gitignore file which does the following:
- Ignore all files with the extension ".xcf".
- Ignore the directory "passwd".
- Ignore only the directory "img" inside the directory "data".
- Keep a specific .xcf file called "results.xcf" (an exception to the first rule).
- Ignore all ".dicom" files in "data/scans".
- Ignore all ".pdf" files anywhere in the directory tree, regardless of how deeply they are nested.
*.xcf
passwd/
data/img/
!results.xcf
data/scans/*.dicom
**/*.pdf
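One way to check that each pattern does what the list says is git check-ignore, which exits with status 0 when a path is ignored and 1 when it is not. This sketch writes the patterns above into a temporary repo and tests them against hypothetical files and directories. The directories are created on disk because patterns ending in / only match directories.

```shell
set -e
cd "$(mktemp -d)"
git init -q

# The patterns from above, one per line.
cat > .gitignore <<'EOF'
*.xcf
passwd/
data/img/
!results.xcf
data/scans/*.dicom
**/*.pdf
EOF

# Hypothetical files and directories to test the patterns against.
mkdir -p passwd data/img data/scans docs/reports
touch image.xcf results.xcf passwd/secret data/img/a.png \
      data/scans/s1.dicom docs/reports/r.pdf notes.txt

for path in image.xcf results.xcf passwd data/img \
            data/scans/s1.dicom docs/reports/r.pdf notes.txt; do
  if git check-ignore -q "$path"; then
    echo "ignored:     $path"
  else
    echo "not ignored: $path"
  fi
done
```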
Exploring History
The most recent commit can also be referred to by the identifier HEAD. Suppose we make new changes to our file.
echo "Third line to the first file created" >> sentence.txt
We can see the differences between the file now and its version from \(k\) commits before the current HEAD by adding HEAD~k to the git diff command. Alternatively, we can use the full identifier hash, or just its first 7 characters, instead. For example, to see the difference between the file now and its version from 1 commit ago, we can do the following.
git diff HEAD~1 sentence.txt
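A small sketch to make HEAD~k concrete, assuming a temporary repo and a placeholder identity. It builds two commits plus an uncommitted third line, then compares the file against HEAD and HEAD~1. The second comparison is repeated using an abbreviated hash instead of HEAD~1.

```shell
set -e
cd "$(mktemp -d)"
git init -q && git checkout -q -b main
git config user.name "Example User"
git config user.email "user@example.com"

echo "Tracking the first change on git woohoo" >> sentence.txt
git add sentence.txt && git commit -q -m "added text for first git commit"
echo "Second line to the first file created" >> sentence.txt
git add sentence.txt && git commit -q -m "added second line"
echo "Third line to the first file created" >> sentence.txt   # not committed

git diff HEAD sentence.txt     # only the uncommitted third line
git diff HEAD~1 sentence.txt   # second and third lines, relative to the first commit

# The same comparison using an abbreviated hash instead of HEAD~1.
first=$(git rev-parse --short HEAD~1)
git diff "$first" sentence.txt
```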
To restore a file to its state at a certain commit, we use git checkout with that commit and the file name.
git checkout HEAD sentence.txt
The command above restores sentence.txt to its state in the last commit, thus deleting the uncommitted third line. Instead of HEAD, we can go back to any other previous commit using its short identifier, say abcd123; note that in this case, the snapshot of sentence.txt from the abcd123 commit will also be placed in the staging area. If instead you do not specify a file name when using git checkout with a previous commit, you enter a detached HEAD state, from which you can "reattach" HEAD by checking out the main branch via git checkout main.
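The three uses of git checkout described above can be walked through in one script. This is a sketch assuming a temporary repo and a placeholder identity; the variable old holds a real abbreviated hash and plays the role of the hypothetical abcd123. git symbolic-ref -q HEAD fails when HEAD is detached, which the script uses to report that state.

```shell
set -e
cd "$(mktemp -d)"
git init -q && git checkout -q -b main
git config user.name "Example User"
git config user.email "user@example.com"
echo "line 1" >> sentence.txt
git add sentence.txt && git commit -q -m "first"
echo "line 2" >> sentence.txt
git add sentence.txt && git commit -q -m "second"
echo "line 3" >> sentence.txt                  # uncommitted change

# 1. Discard the uncommitted third line by restoring the file from HEAD.
git checkout HEAD sentence.txt
cat sentence.txt                               # line 1, line 2

# 2. Bring back the version from an older commit; it lands in the staging area.
old=$(git rev-parse --short HEAD~1)            # stands in for abcd123
git checkout "$old" sentence.txt
git status --short                             # "M  sentence.txt": staged
git checkout HEAD sentence.txt                 # undo step 2

# 3. Checking out a whole commit detaches HEAD; checking out main reattaches it.
git checkout -q "$old"
git symbolic-ref -q HEAD || echo "detached HEAD at $(git rev-parse --short HEAD)"
git checkout -q main
```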
Remote repositories
Often you want a copy of your git repo somewhere besides your own personal computer, and this is where git remotes come in. These are repos hosted online, commonly on services like GitHub, GitLab, Bitbucket etc. Suppose we are using GitHub with the username ghuser1, and that the directory of the git repo above is called firstgit. After creating a git repo with the same name on GitHub, we need to connect our local repo to this remote.
git remote add origin git@github.com:ghuser1/firstgit.git
In the above, origin is the conventional name used to refer to the remote repo. Note that we assume you have SSH set up with GitHub.
To push local changes to the remote repo, we can do the following.
git push origin main
To pull remote changes to the local repo, we can do the following.
git pull origin main
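The remote commands can be tried without a GitHub account. This sketch assumes a local bare repository as a stand-in for the GitHub repo. A bare repo has no working tree, which is how hosting services store repos. Only the URL passed to git remote add differs from the GitHub case.

```shell
set -e
tmp=$(mktemp -d)
# Stand-in for the GitHub repo: a local bare repository (assumption for this sketch).
git init -q --bare "$tmp/firstgit.git"
git -C "$tmp/firstgit.git" symbolic-ref HEAD refs/heads/main

mkdir "$tmp/firstgit" && cd "$tmp/firstgit"
git init -q && git checkout -q -b main
git config user.name "Example User"
git config user.email "user@example.com"
echo "Tracking the first change on git woohoo" >> sentence.txt
git add sentence.txt && git commit -q -m "added text for first git commit"

# With GitHub this URL would be git@github.com:ghuser1/firstgit.git
git remote add origin "$tmp/firstgit.git"
git remote -v                 # lists the fetch and push URLs of origin

git push -q origin main       # upload local commits
git pull -q origin main       # download remote commits (nothing new here)
```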
Collaborative workflow
Suppose now that you are another person (not the owner of the repo) who nonetheless has access to it. In this case, the first step to work on the project is to clone it to your computer, say into a directory called projects (git creates it if needed; if it already exists, it must be empty).
git clone git@github.com:ghuser1/firstgit.git ~/projects
From here, a basic workflow is to pull, add, commit, then push every time you make a change.
However, by the time you want to push your changes to the remote repo, someone else may have pushed their own changes to it. In this case, git rejects your push and you have to pull the remote changes first. The changes you just pulled may conflict with the changes you were about to push; git then reports a merge conflict, which you have to resolve manually by editing the files where it occurred.
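The rejected push and the merge conflict can be reproduced locally. This sketch assumes a bare repository standing in for the GitHub remote and two clones playing the owner and the collaborator. Both users edit the same line of a hypothetical sentence.txt. --no-rebase asks git pull to merge, so the conflict appears as described. The resolution here simply keeps the collaborator's line.

```shell
set -e
tmp=$(mktemp -d)
git init -q --bare "$tmp/firstgit.git"         # stand-in for the GitHub remote (assumption)
git -C "$tmp/firstgit.git" symbolic-ref HEAD refs/heads/main
identity() { git config user.name "$1"; git config user.email "$1@example.com"; }

# Owner: first commit, pushed to the remote.
mkdir "$tmp/owner" && cd "$tmp/owner"
git init -q && git checkout -q -b main && identity owner
echo "The sky is blue" > sentence.txt
git add sentence.txt && git commit -q -m "first sentence"
git remote add origin "$tmp/firstgit.git"
git push -q origin main

# Collaborator: clone into a fresh directory and edit the same line.
git clone -q "$tmp/firstgit.git" "$tmp/projects"
cd "$tmp/projects" && identity collaborator
echo "The sky is grey" > sentence.txt
git add sentence.txt && git commit -q -m "grey sky"

# Meanwhile the owner pushes a different edit to that line.
(cd "$tmp/owner" && echo "The sky is green" > sentence.txt &&
 git add sentence.txt && git commit -q -m "green sky" && git push -q origin main)

# The collaborator's push is rejected; pulling then reports a merge conflict.
git push -q origin main 2>/dev/null || echo "push rejected: pull first"
git pull -q --no-rebase origin main || echo "merge conflict in sentence.txt"
cat sentence.txt        # shows the <<<<<<< ======= >>>>>>> conflict markers

# Resolve by hand (here: keep our line), then commit the merge and push.
echo "The sky is grey" > sentence.txt
git add sentence.txt
git commit -q -m "merge: resolve sky colour conflict"
git push -q origin main
```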
Merge conflicts cause a bit of friction since they have to be resolved manually. They can be reduced by:
- pulling from the upstream repo more frequently,
- making smaller, atomic commits,
- breaking files into smaller ones, or any other change that makes it unlikely for more than one person to work on the same file,
- defining the tasks required for the project and assigning them accordingly,
- using conventions for code style,
- and, perhaps most importantly, using different branches.
Using other repos in ongoing research projects
- Create a new repository and use the duplicate feature.
- Make this duplicated repository private (hidden); it will also contain the full commit history from the original authors.
- Add this duplicated repository to your research project as a git submodule, namely,
git submodule add git@github.com:mnazaal/repo-name.git ./submodule-dir-name
- Changes made within the submodule will now be pushed to your hidden duplicated repo. This is in contrast to forking the repo, in which case your changes go to the fork, and forked repos cannot be made private.
- #TODO After the research project is done, find a way to push the changes made in your hidden duplicated repo to a forked version of the repo, so that the contribution of the original authors is clear as well.
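The submodule step above can be sketched locally. This assumes a local repository named repo-name as a stand-in for the private duplicated GitHub repo, and a hypothetical research project that uses it. Recent Git versions refuse to clone local-path submodules by default, so the sketch passes -c protocol.file.allow=always for this one command. With a real SSH URL, as in the command above, that option is not needed.

```shell
set -e
tmp=$(mktemp -d)
ident() { git config user.name "Example User"; git config user.email "user@example.com"; }

# Stand-in for the private duplicated repo (assumption: a local repo instead of GitHub).
mkdir "$tmp/repo-name" && cd "$tmp/repo-name"
git init -q && git checkout -q -b main && ident
echo "upstream code" > lib.txt
git add lib.txt && git commit -q -m "original authors' work"

# Hypothetical research project that uses it.
mkdir "$tmp/research" && cd "$tmp/research"
git init -q && git checkout -q -b main && ident
# Allow the local file transport for this command only (needed for local paths).
git -c protocol.file.allow=always submodule add "$tmp/repo-name" ./submodule-dir-name
git commit -q -m "add repo-name as a submodule"

cat .gitmodules        # records the submodule's path and URL
git submodule status   # shows the commit the submodule is pinned to
```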
Extras
- You can shorten some of the commands via git alias.
- For more details on undoing changes, see here.
- Specific collaborative workflows are detailed here.
- Some notes of contributing via Github here.
- If you use GitHub, use your GitHub noreply email as your git email (git asks for this before your first commit on a new system, and it can be updated later).
- Pijul is an alternative to git based on an idea they call patches.
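The alias and email tips above both come down to git config. This sketch stays inside a temporary repo; in everyday use you would normally add --global instead. The alias names st and lg are hypothetical choices. The email is only the shape of a GitHub noreply address, with ID and USERNAME as placeholders for your own values.

```shell
set -e
cd "$(mktemp -d)"
git init -q && git checkout -q -b main
git config user.name "Example User"
# Placeholder noreply address; replace ID and USERNAME with your own values.
git config user.email "ID+USERNAME@users.noreply.github.com"

# Hypothetical aliases (the names are up to you).
git config alias.st "status --short"
git config alias.lg "log --oneline --graph"

echo "hello" > sentence.txt
git st                      # same as: git status --short
git add sentence.txt && git commit -q -m "first commit"
git lg                      # same as: git log --oneline --graph
```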