Cjku/git

Outline

Topics that are not covered in this article:

tag
git reset [--merge | --keep]
git reset --patch: interactive reset.
git replace: http://git-scm.com/blog/2010/03/17/replace.html
git bisect
git stash
git alias: http://goo.gl/TrBUbz
git hook: might skip in the first session
git rerere: might skip in the first session

git directory

Figure 1 describes the tree structure of the git directory after cloning from a remote.

 # Clone from a repository which has a README.md file and single commit only
 $ git clone https://github.com/CJKu/SingleCommit.git

Figure 1. git directory

Table 1 roughly describes the four object types. I will explain them in more detail in the next section; just keep this table as a reference.

Object type	Description
Blobs	Binary large objects. A blob holds only the content of a file, nothing more.
Trees	Similar to a file system directory. A tree object holds path names and the identifiers (SHA1) of blobs and sub-tree objects.
Commits	A snapshot of a content change, which consists of the parent commit (if any), the names of the author and committer, the commit message, and the identifier (SHA1) of the new root tree.
Tags	A human-readable name for an object.

Table 1. Object types.

In the remaining sections, I am going to explain the role of each object type and how git commands interact with them.

object relation

To begin, let's create a repository named sample_1 with a single commit, using the following steps.

 ###############################################
 ## Create a simple commit graph
 ###############################################
 $ mkdir sample_1 && cd sample_1
 $ git init
 $ echo a > a.txt && echo b > b.txt
 $ mkdir sub_dir && echo c > sub_dir/c.txt
 $ git add --all
 $ git commit -m "add a.txt, b.txt, c.txt"
 [master (root-commit) 73b1c13] add a.txt, b.txt, c.txt
  3 files changed, 3 insertions(+)
  create mode 100644 a.txt
  create mode 100644 b.txt
  create mode 100644 sub_dir/c.txt
 $ tree .git # or "find .git"
 17 directories, 24 files

Figure 2. object relation

Figure 2 depicts the outline of the object store. The tree structure on your device should be exactly the same as in Figure 2, except for the SHA1 of the commit object. That is because a commit records the author, the committer and a timestamp, and those differ between your environment and mine.
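
To see why blobs get the same SHA1 on every machine, here is a minimal sketch. It assumes git uses its default SHA1 object format; the throwaway repository created with mktemp and the variable names are only for illustration.

```shell
# Work in a throwaway repository (illustrative setup).
repo=$(mktemp -d) && cd "$repo" && git init -q

# A blob's identifier depends only on the content, not on the file name,
# the author or the time.
blob_a=$(echo a | git hash-object --stdin)
blob_a_again=$(echo a | git hash-object --stdin)
blob_b=$(echo b | git hash-object --stdin)

echo "$blob_a"   # 78981922613b2afb6025042ff6bd878ac1994e85, same as a.txt in this article
echo "$blob_b"   # 61780798228d17af2d34fce4cfbdf35556832472, same as b.txt in this article
```

The same content always maps to the same blob, which is why only the commit object differs between your repository and mine.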

The next step is to figure out how many objects were created and how they relate to each other.

 #######################################################
 ## Look into commit/tree/blob objects in object store.
 #######################################################
 # HEAD(symref) refers to current branch, which is master
 $ cat .git/HEAD
 ref: refs/heads/master
 # master(ref) refers to the latest commit object
 $ cat .git/refs/heads/master
 ed0a8533886a3ca5810046506fec786648e89971 # SHA1 of the commit object
 
 # Alternatively, you may use the rev-parse command to get the SHA1 that a ref points to, which gives you the same result.
 $ git rev-parse HEAD
 ed0a8533886a3ca5810046506fec786648e89971
 $ git rev-parse master
 ed0a8533886a3ca5810046506fec786648e89971
 
 # Read the type of object{ed0a8}. Generally, a branch points to a commit object.
 $ git cat-file -t ed0a8
 commit
 
 # Read the content of commit{ed0a8}.
 $ git cat-file -p ed0a8
 tree 92e349e2e492b65e8137a9b435c5360b51f16f24
 author CJKu <cjcool.tw@gmail.com> 1414401560 +0800
 committer CJKu <cjcool.tw@gmail.com> 1414401560 +0800
 
 add a.txt, b.txt, c.txt
 
 # commit{ed0a8} refers to tree{92e34}. Read content of tree{92e34}.
 $ git cat-file -t 92e34
 tree
 $ git cat-file -p 92e34
 100644 blob 78981922613b2afb6025042ff6bd878ac1994e85    a.txt
 100644 blob 61780798228d17af2d34fce4cfbdf35556832472    b.txt
 040000 tree cf67e9ef3a0fc6d858423fc177f2fbbe985a6f17    sub_dir
 $ git cat-file -p cf67e
 100644 blob f2ad6c76f0115a6ba5b00456a849810e7ec0af20    c.txt
 
 # Or, use ls-tree -r to dump all blobs in tree.
 $ git ls-tree -r 92e349e
 100644 blob 78981922613b2afb6025042ff6bd878ac1994e85    a.txt
 100644 blob 61780798228d17af2d34fce4cfbdf35556832472    b.txt
 100644 blob f2ad6c76f0115a6ba5b00456a849810e7ec0af20    sub_dir/c.txt
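
The same walk from HEAD down to a blob can be done without copying SHA1s by hand. This is a small sketch using the rev-parse peeling syntax (`^{tree}` and `<rev>:<path>`); the temporary repository and the "Demo User" identity are assumptions made only so the script runs anywhere.

```shell
# Recreate the sample repository in a throwaway directory (illustrative setup).
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name "Demo User" && git config user.email "demo@example.com"
echo a > a.txt && echo b > b.txt
mkdir sub_dir && echo c > sub_dir/c.txt
git add --all && git commit -q -m "add a.txt, b.txt, c.txt"

commit_type=$(git cat-file -t HEAD)             # HEAD resolves to a commit
root_tree=$(git rev-parse 'HEAD^{tree}')        # the tree the commit refers to
root_tree_type=$(git cat-file -t "$root_tree")
sub_tree_type=$(git cat-file -t HEAD:sub_dir)   # a directory is a sub-tree
blob_a=$(git rev-parse HEAD:a.txt)              # a file is a blob
blob_count=$(git ls-tree -r HEAD | wc -l | tr -d ' ')

echo "$commit_type -> $root_tree_type -> $sub_tree_type, $blob_count blobs"
```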

Commit graphs

Figure: XXXX a figure need here to explain DAG

Git models its commit graph as a directed acyclic graph (DAG). For more about DAGs, you may refer to Wikipedia (listed in the Reference section).

Acyclic: there are no cycles. Starting from any vertex, no path leads back to it.
Directed: every edge has exactly one direction.

This one-way design shows up all over git:

If you know the SHA1 of a commit object, you can trace back all of its ancestors, but you have no idea how many descendants it has. That is why we refer to HEAD (the tip of a path) so often.
A local repository that was cloned from a remote knows the URL of that remote, while the remote has no idea how many clones are scattered across the internet.
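
A short sketch of the "child knows parent, parent does not know child" rule, assuming a throwaway repository with three linear commits (the file name log.txt and the commit messages are made up for the example):

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name "Demo User" && git config user.email "demo@example.com"

# Build a linear history: first <- second <- third
for msg in first second third; do
  echo "$msg" >> log.txt && git add log.txt && git commit -q -m "$msg"
done

# From the tip we can reach every ancestor...
ancestors=$(git rev-list --count HEAD)
# ...because each commit stores the SHA1 of its parent.
head_and_parent=$(git rev-list --parents -n 1 HEAD)      # "<commit> <parent>"
# The root commit stores no parent, and no commit stores its children.
root_and_parent=$(git rev-list --parents -n 1 'HEAD~2')  # "<commit>" only

echo "$ancestors commits reachable from HEAD"
```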

Cache and commit

 Linus Torvalds argued on the Git mailing list that you can’t grasp and fully appreciate the power of Git without first understanding the purpose of the index.
   - Version Control with Git, 2nd Edition, By: Jon Loeliger; Matthew McCullough, Publisher: O'Reilly Media, Inc.

To prevent confusion, when a git user says

Cache a file
Put a file in the index
Stage a file

All these statements mean the same thing: "I git add a file"

The index (sometimes referred to as the "staging area") holds what you propose for the next commit. It does not contain file content itself; it records each path together with the identifier (SHA1) of its blob. When you run git-commit, Git checks the index rather than your working directory to discover what to commit.
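
A minimal sketch of that last point, assuming a throwaway repository: a file is edited again after git-add, and the commit still contains the staged version, not the one in the working directory.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name "Demo User" && git config user.email "demo@example.com"

echo "staged" > a.txt && git add a.txt    # the index now points at this content
echo "not staged" >> a.txt                # edited again, but not added
git commit -q -m "commit what is in the index"

committed=$(git show HEAD:a.txt)          # content recorded by the commit
pending=$(git diff --name-only)           # still differs from the index

echo "committed: $committed"
echo "still modified: $pending"
```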

Here is a sample that uses the git-add and git-commit commands to modify the index and tree objects. Figures 3 and 4 show how the git directory changes accordingly.

 $ mkdir git_sample_cache
 $ cd git_sample_cache
 $ git init
 
 # Cache a.txt and b.txt, and use ls-files to peer into the index
 $ echo a > a.txt && git add a.txt
 $ echo b > b.txt && git add b.txt
 $ git ls-files --stage
 100644 78981922613b2afb6025042ff6bd878ac1994e85 0       a.txt
 100644 61780798228d17af2d34fce4cfbdf35556832472 0       b.txt
 
 # Submit the change.
 $ git commit -m "add"
 [master (root-commit) 9d262d6] add
  2 files changed, 2 insertions(+)
  create mode 100644 a.txt
  create mode 100644 b.txt
 
 # Look into master-commit-tree chain. 
 $ git rev-parse HEAD
 9d262d6394b25573fb3287b5cd019281bc8dc3b8
 $ git cat-file -t 9d262d6
 commit
 $ git cat-file -p 9d262d6
 tree f4b354863caa9cea99b95422c9dab70465757d87
 author CJKu <cjcool.tw@gmail.com> 1414482450 +0800
 committer CJKu <cjcool.tw@gmail.com> 1414482450 +0800
 
 add
 $ git cat-file -t f4b35486
 tree
 $ git cat-file -p f4b354
 100644 blob 78981922613b2afb6025042ff6bd878ac1994e85    a.txt
 100644 blob 61780798228d17af2d34fce4cfbdf35556832472    b.txt
 
 # Modify content of a.txt and cache it. 
 $ echo "add a line" >> a.txt
 $ git add a.txt
 
 # Print index. 
 $ git ls-files --stage
 100644 2915b75977f7d84d291f3329ce1cc251743a7c54 0       a.txt
 100644 61780798228d17af2d34fce4cfbdf35556832472 0       b.txt
 
 $ git commit -m "modify"
 [master 0492bef] modify
  1 file changed, 1 insertion(+)
 
 # Dig into the newly created tree.
 $ git cat-file -t 0492bef
 commit
 $ git cat-file -p 0492bef
 tree d3de1600ad651843b4659fc896c8686f76841824
 parent 9d262d6394b25573fb3287b5cd019281bc8dc3b8
 author CJKu <cjcool.tw@gmail.com> 1414483295 +0800
 committer CJKu <cjcool.tw@gmail.com> 1414483295 +0800
 
 modify
 $ git cat-file -t d3de160
 tree
 $ git cat-file -p d3de160
 100644 blob 2915b75977f7d84d291f3329ce1cc251743a7c54    a.txt
 100644 blob 61780798228d17af2d34fce4cfbdf35556832472    b.txt

Figure 3. cache - add files.

Figure 4. cache - modify a file.

To store what you have changed in the working directory into the object store, you have to

Cache the changed file.
Snapshot the index: a new tree object is created from it.
Create a new commit object, which refers to this new tree and records the commit message, author and committer.

All of this work can be done with the git-add and git-commit commands. However, let's look deeper into the commit process. This sample reveals how git appends a vertex to the commit graph.

 ###############################################
 ## git-commit ~= write-tree + commit-tree + reset
 ###############################################
 # There are two commit objects in store
 $ git log --graph --pretty=oneline --abbrev-commit
 * 0492bef modify
 * 9d262d6 add
 # Reset to the first commit{"add"}, roll back index but don't touch working directory.
 $ git reset --mixed HEAD^
 $ git log --graph --pretty=oneline --abbrev-commit
 * 9d262d6 add
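 # Clean up unreachable objects (optional: the old commit is still referenced by the reflog, so it is not pruned right away).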
 $ git gc
 # Add a.txt into index again.
 $ git add a.txt
 # Generate a tree object according to index
 $ git write-tree
 d3de1600ad651843b4659fc896c8686f76841824
 # Confirm HEAD still refers to commit{SHA1: 9d262}
 $ git log --graph --pretty=oneline --abbrev-commit
 * 9d262d6 add
 # Create a new commit object associated with tree{SHA1: d3de1} and set its parent to commit{SHA1: 9d262}
 $ echo -n "modify" | git commit-tree -p 9d262 d3de1
 43cf743105b49941565f2295db6cf7aae6f2cdc1
 # Move HEAD to this new commit{SHA1: 43cf7}
 $ git reset --soft 43cf7
 # Done! Check log
 $ git log --graph --pretty=oneline --abbrev-commit
 * 43cf743 modify
 * 9d262d6 add
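
The same idea as a script, in case you want to replay it. This is only a sketch: it runs in a throwaway repository, uses the -m option of commit-tree instead of piping the message, and moves the branch with git update-ref rather than git reset --soft. It illustrates the steps; it is not how git-commit is implemented internally.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name "Demo User" && git config user.email "demo@example.com"
echo a > a.txt && git add a.txt && git commit -q -m "add"
parent=$(git rev-parse HEAD)

# 1. Stage the change.
echo "add a line" >> a.txt && git add a.txt
# 2. Snapshot the index into a tree object.
tree=$(git write-tree)
# 3. Wrap the tree in a commit object whose parent is the current tip.
commit=$(git commit-tree -p "$parent" -m "modify" "$tree")
# 4. Point the current branch (through HEAD) at the new commit.
git update-ref HEAD "$commit"

subject=$(git log -1 --pretty=%s)
head_tree=$(git rev-parse 'HEAD^{tree}')
head_parent=$(git rev-parse 'HEAD^')
left_over=$(git status --porcelain)   # empty: index and files match the new commit
```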

While the index is dirty, your next commit writes those changes into a new tree object; in other words, git updates the tree from the index during the commit process. When you check out another branch, git regenerates the index from the tree object of that branch; in other words, git updates the index from the tree while switching branches. The next two examples show how git regenerates the index from a tree.

 ###############################################
 ## reconstruct index via git-read-tree(plumbing)
 ###############################################  
 # Take a look at the index's content before wiping it out.
 $ git ls-files -s 
 100644 4331a357983b7acf195679e68e543405ef86cc15 0       README.md
 
 # Oops... the index file is deleted accidentally.
 $ rm .git/index
 
 # get the identifier of commit object.
 $ git rev-parse HEAD
 46fbc5468dab716b1baf2141b89780586a00556f
 $ git cat-file -p 46fbc54
 tree 57a161d717cf98e78a5eea9a30f313b464fc0429
 author CJKu <cku@mozilla.com> 1414423500 +0800
 committer CJKu <cku@mozilla.com> 1414423500 +0800
 
 Initial commit
 
 # git-read-tree - Reads tree information into the index
 # you can read-tree from a commit object, or the root tree object
 # Let's read from a commit object (a <tree-ish>) first.
 $ git read-tree 46fbc546  
 
 # Hooray! The index file is back!
 $ git ls-files -s 
 100644 4331a357983b7acf195679e68e543405ef86cc15 0       README.md
 
 # Read again from the root tree node.
 $ rm .git/index
 $ git read-tree 57a161d7 
 $ git ls-files -s 
 100644 4331a357983b7acf195679e68e543405ef86cc15 0       README.md

In practice, using the read-tree plumbing command directly is not encouraged. The porcelain command git-reset is a better choice.

 ###############################################
 ## reconstruct index via git-reset(porcelain) 
 ############################################### 
 # Kill the index file again.
 $ rm .git/index
 
 # git reset [--mixed | --soft | --hard] [<commit>]
 # --soft    reset HEAD only
 # --mixed   reset HEAD and index (the default)
 # --hard    reset HEAD, index and working tree
 # Either of the following regenerates the index:
 $ git reset --mixed HEAD  
 $ git reset --hard HEAD
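
To convince yourself that the index really comes back unchanged, you can compare the output of ls-files before and after. A small sketch, assuming a throwaway repository with a single README.md:

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name "Demo User" && git config user.email "demo@example.com"
echo hello > README.md && git add README.md && git commit -q -m "Initial commit"

before=$(git ls-files -s)     # index content while everything is fine
rm .git/index
missing=$(git ls-files -s)    # empty: there are no index entries any more
git reset -q --mixed HEAD     # rebuild the index from the tree of HEAD
after=$(git ls-files -s)

[ "$before" = "$after" ] && echo "index restored"
```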

status, diff and show

"git diff-files -p" == "git diff": compare the index against the working directory
"git diff-index -p HEAD" == "git diff HEAD": compare a tree against the working directory
"git diff-index --cached -p HEAD" == "git diff --cached HEAD": compare a tree against the index
"git diff-tree -p HEAD [HEAD^]", "git diff-tree -p commit_1 commit_2": compare two trees

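The difference between these comparisons is easiest to see with one staged and one unstaged change. A minimal sketch, assuming a throwaway repository with two files (the variable names are only illustrative):

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name "Demo User" && git config user.email "demo@example.com"
echo a > a.txt && echo b > b.txt && git add --all && git commit -q -m "add"

echo "staged line" >> a.txt && git add a.txt   # change that is in the index
echo "unstaged line" >> b.txt                  # change only in the working directory

index_vs_worktree=$(git diff --name-only)              # b.txt
head_vs_index=$(git diff --cached --name-only)         # a.txt
head_vs_worktree=$(git diff --name-only HEAD)          # a.txt and b.txt
plumbing_head_vs_index=$(git diff-index --cached --name-only HEAD)

echo "index vs working directory: $index_vs_worktree"
echo "HEAD vs index:              $head_vs_index"
```
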
log and reflog

change history

Reset and checkout

git-checkout and git-reset are a bit confusing at times, at least for me. The functionality of these two commands overlaps:

Both of them are able to move HEAD.
Either of them can discard what you changed in the working directory.

At a high level, you may also notice some differences between them:

You can reset HEAD to a commit (a commit-ish) or a branch (also a commit-ish). You can check out a commit or a branch as well, though we usually use git-checkout with a branch.
git-checkout changes the working branch, while git-reset does not.

Figure N. reset and checkout at commit-ish level

OK, let's take a look at reset first:

 $ man git-reset
 $ git reset -h
 usage: git reset [--mixed | --soft | --hard] [<commit>]
    or: git reset <tree-ish> [--] <paths>...

Note: what is <tree-ish>

The definition of <tree-ish> from the git manual page:

<object> indicates the object name for any type of object.
<blob> indicates a blob object name.
<tree> indicates a tree object name.
<commit> indicates a commit object name.
<tree-ish> indicates a tree, commit or tag object name. A command that takes a <tree-ish> argument ultimately wants to operate on a <tree> object but automatically dereferences <commit> and <tag> objects that point at a <tree>.
<commit-ish> indicates a commit or tag object name. A command that takes a <commit-ish> argument ultimately wants to operate on a <commit> object but automatically dereferences <tag> objects that point at a <commit>.

Here are some examples of valid git-reset command

$ git reset master -- README.txt

$ git reset master^ -- README.txt

$ git reset HEAD -- README.txt

$ git reset v1.0 -- README.txt # v1.0 is a tag name

$ git reset 0a11f0 -- README.txt # 0a11f0 is the name of a commit object

$ git reset 0b7123 -- README.txt # 0b7123 is the name of a tree object

$ git reset refs/heads/master -- README.txt

In short, a tree-ish is anything that leads to a specific tree object. If you give something (an object name, a refname, a tag, etc.) to git, and git can resolve it to a unique tree object, then it is a tree-ish. In the following diagram, only the name of a blob is not a tree-ish, since git cannot reach any tree object starting from a blob.

Figure N. tree-ish
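
You can ask git directly whether something is a tree-ish by peeling it with `^{tree}`. A rough sketch, assuming a throwaway repository with one commit and an annotated tag v1.0 (both created only for this example):

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name "Demo User" && git config user.email "demo@example.com"
echo hello > README.txt && git add README.txt && git commit -q -m "initial commit"
git tag -a v1.0 -m "release 1.0"
branch=$(git symbolic-ref --short HEAD)   # master or main, depending on your setup

# A commit, a branch, a full refname and a tag all lead to the same tree.
tree=$(git rev-parse 'HEAD^{tree}')
from_branch=$(git rev-parse "${branch}^{tree}")
from_ref=$(git rev-parse "refs/heads/${branch}^{tree}")
from_tag=$(git rev-parse 'v1.0^{tree}')

# A blob does not lead to any tree, so peeling it fails.
blob=$(git rev-parse HEAD:README.txt)
if git rev-parse --verify -q "${blob}^{tree}" >/dev/null 2>&1; then
  blob_is_tree_ish=yes
else
  blob_is_tree_ish=no
fi
echo "blob is a tree-ish: $blob_is_tree_ish"
```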

There are two forms: reset HEAD to a <commit>, and reset paths from a <tree-ish>. In the first form, you have three options (--merge and --keep are ignored here):

 git reset [--mixed | --soft | --hard] [<commit>]

Depending on how hard you want the reset to be, a different scope of change applies:

soft option: change what the current branch refers to.
- the index file and working directory stay untouched.
- scenario: squash.
mixed option: change what the current branch refers to; regenerate the index from the tree of the newly assigned commit object.
- the working directory stays untouched.
- scenario: cherry-pick.
hard option: change what the current branch refers to; regenerate the index from the tree of the newly assigned commit object; overwrite all modified tracked files in the working directory. (Note: only tracked files are affected by a hard reset. Untracked files are out of scope.)
- scenario: working overtime, did lots of stupid things
- scenario: clearing out a failed or stale merge
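
The three levels can be compared side by side. The sketch below assumes a throwaway repository with two commits and resets from the second commit back to the first, once per option:

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name "Demo User" && git config user.email "demo@example.com"
echo "one" > a.txt && git add a.txt && git commit -q -m "first"
echo "two" >> a.txt && git commit -q -am "second"
tip=$(git rev-parse HEAD)

# --soft: only the branch moves; the change from "second" is still staged.
git reset -q --soft 'HEAD^'
soft_staged=$(git diff --cached --name-only)

# --mixed: branch and index move; the change is left unstaged in the files.
git reset -q --soft "$tip"                      # go back to "second" first
git reset -q --mixed 'HEAD^'
mixed_staged=$(git diff --cached --name-only)   # nothing staged
mixed_unstaged=$(git diff --name-only)          # a.txt is modified

# --hard: branch, index and tracked files all move; the change is gone.
git reset -q --hard "$tip"
git reset -q --hard 'HEAD^'
hard_content=$(cat a.txt)

echo "soft: staged=$soft_staged | mixed: unstaged=$mixed_unstaged | hard: a.txt=$hard_content"
```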

Another usage of reset is to reset paths:

  git reset [-q] <tree-ish> [--] <paths>

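Use it when you want to restore index entries from a specific revision, for example to unstage a change; it is the opposite of git-add. This form touches neither the branch ref nor the working directory. To recover the file in the working directory as well, use the path form of git-checkout.

 $ echo "Jerry" > a.txt
 $ git add a.txt
 $ git commit -m "initial commit"
 $ echo "is cool" >> a.txt
 $ git add a.txt
 # Oops... that is definitely a typo. He is not cool at all.
 # Unstage it: the index entry of a.txt is reset to the one in HEAD.
 $ git reset HEAD -- a.txt
 # The working directory is untouched.
 $ cat a.txt
 Jerry
 is cool
 # Restore the file itself with the path form of git-checkout.
 $ git checkout HEAD -- a.txt
 $ cat a.txt
 Jerry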
Now, move forward to checkout command:

 $ git checkout -h
 usage: git checkout [options] <branch>
    or: git checkout [options] [<branch>] -- <file>...

Unsurprisingly, there are also two forms of git-checkout: check out a branch (move HEAD to it), or check out files from a branch. Let's talk about the first form:

(Different from git-reset) It changes which branch HEAD refers to.
(Different from git-reset) There is no hardness choice in git-checkout. git-checkout always updates the working directory and the index.
(Similar to git-reset --hard) It regenerates the index from the tree of the newly assigned commit object and updates tracked files in the working directory. The key difference is what happens to local changes: git-reset --hard never aborts and simply discards them, while git-checkout tries to keep them and aborts if a locally modified file differs between the two commits. (Note: untracked files are normally left alone. Checkout aborting is directly related to this principle.)

Scott Chacon and Ben Straub wrote a fantastic article about reset that deserves your time: http://git-scm.com/blog/2011/07/11/reset.html

checkout abort

I give checkout abort its own section, since it is really annoying for a newcomer.

If git thinks you may lose local changes in the working directory, it aborts your checkout command. Here is an example:

 $ mkdir checkout_abort && cd checkout_abort
 $ git init
 $ echo "line 1" > a.txt
 $ git add a.txt
 $ git commit -m "add a.txt"
 
 $ echo "line 2" >> a.txt
 # checkout successful. Why?
 $ git checkout -b alt 
 M       a.txt
 Switched to a new branch 'alt'
 # checkout successful. Why?
 $ echo "line 3" >> a.txt
 $ git checkout master 
 M       a.txt
 Switched to branch 'master'
 $ git add a.txt
 $ git commit -m "modify a.txt"
 $ echo "line 4" >> a.txt
 # checkout aborting. Why?
 $ git checkout alt
 error: Your local changes to the following files would be overwritten by checkout:
       a.txt
 Please, commit your changes or stash them before you can switch branches.
 Aborting

Go back to reset and checkout section, ..............TBD

Then, how do you fix the abort? Depending on your situation, there are several strategies:

Force checkout: if you don't care about losing your changes at all, run "git checkout -f <branch>" bravely.
Hard reset to HEAD: almost the same as the previous one.
Commit your changes: keep everything you changed in the object store with a new commit, and reset the files you don't care about.
Stash: you need to rush to another branch to fix something important, but you don't want to commit your changes on the current branch without more analysis. Use git stash.
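
The stash strategy can be sketched as a script. It assumes a throwaway repository where branch alt is one commit ahead of the initial branch, plus one uncommitted edit; the branch name alt and the file content are taken from the example above, everything else is illustrative.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name "Demo User" && git config user.email "demo@example.com"
echo "line 1" > a.txt && git add a.txt && git commit -q -m "add a.txt"
base=$(git symbolic-ref --short HEAD)          # master or main
git checkout -q -b alt
echo "line 2" >> a.txt && git commit -q -am "modify a.txt on alt"
echo "line 3" >> a.txt                         # local, uncommitted change

# a.txt differs between alt and $base and is modified locally: checkout aborts.
if git checkout -q "$base" 2>/dev/null; then aborted=no; else aborted=yes; fi

# Put the local change aside, switch, do the urgent work, then come back.
git stash >/dev/null
git checkout -q "$base"
on_base=$(cat a.txt)
git checkout -q alt
git stash pop >/dev/null
restored=$(cat a.txt)

echo "checkout aborted: $aborted"
```

After the stash pop, the uncommitted "line 3" is back in the working directory of alt.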

XXX mention git add bug here!

Branch

 $ git show-branch

merge/rebase

Remote branch

Transition

"My local is your remote; your local is his/her remote"

One direction

"That you know me does not mean I have to know you"

Figure 3. Remote and local

remote repository - local/current repository
remote = short name of URL
Remote-tracking branches are associated with a remote and have the specific purpose of following the changes of each branch in that remote repository.
A local-tracking branch is paired with a remote-tracking branch. It is a form of integration branch that collects both the changes from your local development and the changes from the remote-tracking branch.
Any local, nontracking branch is usually generically called a topic or development branch.
Finally, to complete the namespaces, a remote branch is a branch located in a nonlocal, remote repository. It is likely an upstream source for a remote-tracking branch.

Recipes/ Troubles

List the troubles that you have met while using git. Why does this VCS drive you crazy? :)

Questions

Please list questions that you want to ask here.

Reference

Version Control with Git, 2nd Edition, By: Jon Loeliger; Matthew McCullough, Publisher: O'Reilly Media, Inc.
Git Recipes: A Problem-Solution Approach, By: Włodzimierz Gajda, Publisher: Apress
https://github.com/git/git/blob/master/Documentation/gittutorial.txt
https://github.com/git/git/blob/master/Documentation/gittutorial2.txt
https://github.com/git/git/blob/master/Documentation/gitcore-tutorial.txt
http://felipec.wordpress.com/2011/01/16/mercurial-vs-git-its-all-in-the-branches/
http://git-scm.com/blog/2011/07/11/reset.html
http://stackoverflow.com/questions/4044368/what-does-tree-ish-mean-in-git
http://git-scm.com/blog/2010/03/08/rerere.html
http://www.gitguys.com/topics/whats-the-deal-with-the-git-index/
http://alblue.bandlem.com/2011/10/git-tip-of-week-understanding-index.html
Directed acyclic graph, http://en.wikipedia.org/wiki/Directed_acyclic_graph
tree-ish and commit-ish, https://www.kernel.org/pub/software/scm/git/docs/gitrevisions.html#_specifying_revisions
tree-ish and commit-ish, https://www.kernel.org/pub/software/scm/git/docs/