
Git Merge Rebase book

 
Ranch Hand
Posts: 572
Hi,
Your article was interesting. I would have thought the second commit would have stored just the changed lines.
What happens if the disk storing the repo runs out of space?

thanks,
Paul
 
Author
Posts: 37
Hi Paul,

Not the author of the Git Merge Rebase book, but I thought I'd pitch in a few thoughts. One of the biggest misconceptions about Git is the belief that commits store diffs (or deltas). In actuality, every commit in Git stores an entire version of your project. This makes Git extremely efficient, because when you change from one commit to another (say you switch branches, or you check out a commit or a tag), Git simply recreates your working directory to look like the version stored in that commit.

Another misconception is that Git stores files. This (as you might have concluded from the original article) is not true. Git stores the contents of files in blobs, and stores the metadata of the files themselves (name, path, type) somewhere else: the tree. The (root) tree represents the state of the index at the time you made the commit, so the commit simply stores a reference to that tree. This is another great trick: separating the contents of files (blobs) from the metadata about the files (trees).
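
To make the blob idea concrete, here is a minimal Python sketch of how a blob's ID is derived in a SHA-1 repository: the hash covers a small header plus the file's bytes, and nothing else. The function name `git_blob_id` and the two variables are made up for illustration; this assumes the default SHA-1 object format, not the newer SHA-256 one.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID Git assigns to a blob with this content."""
    # A blob is hashed as the header "blob <size>\0" followed by the raw bytes.
    # The file's name and path are not part of the input.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# Two hypothetical files with different names but identical contents.
readme = b"test content\n"
copy_of_readme = b"test content\n"

print(git_blob_id(readme))
print(git_blob_id(readme) == git_blob_id(copy_of_readme))  # True
```

If you have Git installed, the first printed value should match what `git hash-object` reports for a file with the same contents. Because the name plays no part in the hash, it has to be recorded somewhere else, and that somewhere is the tree.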

All this leads to how Git is so efficient. This efficiency comes from Git's internal data structure, which is composed of blobs, trees, and commits (as the article describes), all of which are immutable. This is a powerful idea, leveraged by functional programming languages like Clojure: if something is immutable, it can be shared indiscriminately. That is, even if the same file "blob" is stored in multiple commits, Git does not have to make multiple copies of it. It simply stores a reference to the blob in a tree, and that tree is recorded in a commit.

In other words, multiple commits can share the same blob!
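
A toy model may help here. The sketch below is not Git's real storage format: `ToyStore`, the JSON serialization, and the file names are all assumptions made for illustration. It only shows the principle that when objects are keyed by a hash of their contents and never modified, storing the same file in a second commit adds nothing new.

```python
import hashlib
import json


class ToyStore:
    """A content-addressed store: each object's key is a hash of its bytes."""

    def __init__(self):
        self.objects = {}

    def put(self, kind, payload):
        # Simplified serialization for illustration; real Git uses its own formats.
        data = json.dumps([kind, payload], sort_keys=True).encode("utf-8")
        oid = hashlib.sha1(data).hexdigest()
        self.objects[oid] = (kind, payload)  # storing identical data again changes nothing
        return oid


store = ToyStore()

# First commit: two files.
readme = store.put("blob", "hello\n")
main_v1 = store.put("blob", "print('v1')\n")
tree1 = store.put("tree", {"README": readme, "main.py": main_v1})
commit1 = store.put("commit", {"tree": tree1, "parents": []})

# Second commit: only main.py changed; README is stored again with the same content.
readme_again = store.put("blob", "hello\n")
main_v2 = store.put("blob", "print('v2')\n")
tree2 = store.put("tree", {"README": readme_again, "main.py": main_v2})
commit2 = store.put("commit", {"tree": tree2, "parents": [commit1]})

blob_count = sum(1 for kind, _ in store.objects.values() if kind == "blob")
print(readme == readme_again)  # True: both trees point at the same blob
print(blob_count)              # 3 blobs for 4 file versions across 2 commits
```

Each commit still describes a full snapshot through its tree, yet the unchanged README costs no extra storage.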

I've written about this on my blog, if you'd like a more in-depth read:

https://looselytyped.com/blog/2014/08/31/gits-guts-part-i/
https://looselytyped.com/blog/2014/10/31/gits-guts-part-ii/

Here is an image that might explain what I've been trying to say: https://looselytyped.com/posts/2014-08-31-gits-guts-part-i/drawing.svg (the rectangles represent blobs, triangles represent trees, and circles represent commits). Notice how multiple commits can reference the same blob.

paul nisset wrote: What happens if the disk storing the repo runs out of space?

The same thing that would happen to any other system when you run out of disk space! FWIW, if you did run out of disk space, it would probably be the OS and other critical pieces of software that break first.

Hope this helps. Feel free to reach out if you have any other questions.

Regards,
 
Marshal
Posts: 79180

Raju Gandhi wrote:. . . multiple commits can share the same blob! . . .

And maybe the blobs are compressed? That would make it economical on memory space.
 
paul nisset
Ranch Hand
Posts: 572
Hi Raju,
Thanks for sharing your knowledge. Storing in blobs and tracking metadata is interesting. I can see how that would be more efficient.
Especially if you have a lot of people checking out files but not pushing changes that often.
-Paul
 
Saloon Keeper
Posts: 27764
Hi Raju, glad to see you're still around!

Just for info, I looked at my own git archives to see what I could trim to save some space. I have quite a number of projects on my local server, including the source for my Spring Boot based port of the Gourmet Recipe Manager.

Almost without exception it wasn't worth the effort to try and reclaim space. About the only project whose git archive was really large came to something like 1.5GB in source, most of which was binary objects from external sources whose long-term accessibility was in doubt. I think I deleted that one, since it was passed on to the client several years ago.

So git doesn't carry much guilt.
 
Author
Posts: 141
Yes, Raju is spot on. Also, yup, the blobs are in fact compressed/deflated (and also packed together as the repo grows), but that is another topic.

All in all, it's quite the nifty storage mechanism, but it's also the reason why Git isn't a great version control system for something like games, with a large number of binary artifacts that change often. In those cases, you'll find companies use Perforce.
 
Raju Gandhi
Author
Posts: 37
@Campbell Ritchie

And maybe the blobs are compressed? That would make it economical on memory space.



Yes. Blobs, trees, and commits are compressed using zlib. Pro Git has a great chapter on this as well (https://git-scm.com/book/en/v2/Git-Internals-Git-Objects), and at the very bottom it describes how those objects are built and stored, using Ruby.
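
For those who would rather read Python than Ruby, here is a rough equivalent as a sketch. It builds the bytes of a loose object (a header of type and size, a NUL byte, then the content), deflates them with zlib, and then reverses the process. The helper names `encode_loose_object` and `decode_loose_object` are made up for illustration, and the sketch works purely in memory rather than touching a real `.git/objects` directory.

```python
import hashlib
import zlib
from typing import Tuple


def encode_loose_object(kind: bytes, content: bytes) -> Tuple[str, bytes]:
    """Return (object ID, zlib-compressed bytes) for a loose object."""
    raw = kind + b" " + str(len(content)).encode("ascii") + b"\0" + content
    return hashlib.sha1(raw).hexdigest(), zlib.compress(raw)


def decode_loose_object(compressed: bytes) -> Tuple[bytes, bytes]:
    """Inflate a loose object and split it into (kind, content)."""
    raw = zlib.decompress(compressed)
    header, _, content = raw.partition(b"\0")
    kind, _, size = header.partition(b" ")
    if int(size) != len(content):
        raise ValueError("size in header does not match content length")
    return kind, content


oid, stored = encode_loose_object(b"blob", b"what is up, doc?")
print(oid)
print(decode_loose_object(stored))  # (b'blob', b'what is up, doc?')
```

Note that the ID is computed over the uncompressed bytes, so the compression level never affects an object's identity.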

@Paul Nisset

Especially if you have a lot of people checking out files but not pushing changes that often.



Not to nitpick, but you don't check out files in Git. You check out commits, be that directly via git checkout or git switch or what have you. Furthermore, since blobs, trees, and commits are immutable, when you push or pull, Git traces the commit graph and only fetches or pushes the "new" things.
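
The "only the new things" part can be sketched as a set difference over the commit graph. This is a simplification under stated assumptions: the real protocol negotiates what each side has and also transfers the trees and blobs those commits need, and the function names and the four-commit history below are hypothetical.

```python
def reachable(parents_of, tips):
    """Return every commit reachable from the given tips by following parents."""
    seen = set()
    stack = list(tips)
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents_of.get(commit, []))
    return seen


def commits_to_send(parents_of, sender_tip, receiver_tip):
    """Commits the sender has that the receiver does not already have."""
    return reachable(parents_of, [sender_tip]) - reachable(parents_of, [receiver_tip])


# Hypothetical history: A <- B <- C <- D, and the remote is still at B.
history = {"A": [], "B": ["A"], "C": ["B"], "D": ["C"]}
print(sorted(commits_to_send(history, "D", "B")))  # ['C', 'D']
```

Immutability is what makes this safe: if the receiver already has commit B, everything B points to is guaranteed to be unchanged, so there is no need to look any further back.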

Just for info, I looked at my own git archives to see what I could trim to save some space



In my experience, almost always when I've seen a large Git repository, it's because of binaries (which Git is terrible at managing). That's not to say that there is never a reason to commit them: assets like images for web projects, or proprietary executables that do not lend themselves well to a package management system, are good reasons to add and commit binaries. But if it's just plain old source code, I'd never worry about large repos, because Git compresses a lot of the objects, and the immutable nature of its data structure makes sharing easy.

One final thing to note: Git has another format called "packfiles" that further improves disk usage. You can find more info here: https://git-scm.com/book/en/v2/Git-Internals-Packfiles

Quoting @Tim Holloway

So git doesn't carry much guilt.



Words to live by.

Hope this helps. Feel free to reach out if you have any other questions.

Regards,

 
paul nisset
Ranch Hand
Posts: 572
Thanks Marco.
That is a good point about the type of Repo making a difference.
 
paul nisset
Ranch Hand
Posts: 572
@Raju

when you push or pull, Git traces the commit graph


That's something I don't know about, but maybe that's a reason for reading the book.
I thought that when I did a checkout, Git just overwrote my local copy with the entire file(s) that I checked out.

 
Raju Gandhi
Author
Posts: 37
@Paul

I thought that when I did a checkout, Git just overwrote my local copy with the entire file(s) that I checked out.



You are absolutely correct. But my point was that you are checking out a "commit", not files. Since Git stores the tree pointer in the commit, it knows which files, a.k.a. blobs (and subdirectories, which are stored as nested trees inside other trees), made up the tree that was created when you made that commit.

So it unpacks the root "tree", which tells it which "files" (blobs) are at the root level and which subdirectories (nested trees) it needs to create at the root level. It then recurses into the subtrees, and so on, recreating the entire working directory just as it looked when that commit was made.
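
The recursion described above can be sketched in a few lines of Python. Everything here is a toy under stated assumptions: the object IDs ("b1", "t_root", and so on) are made up, trees are plain dictionaries rather than Git's binary format, and the result is a dictionary of paths instead of files written to disk.

```python
def materialize(objects, tree_id, prefix=""):
    """Walk a tree recursively and return {path: file content}."""
    files = {}
    for name, (kind, oid) in objects[tree_id].items():
        path = prefix + name
        if kind == "tree":
            # A nested tree is a subdirectory: recurse with a longer prefix.
            files.update(materialize(objects, oid, path + "/"))
        else:
            # A blob entry: the tree supplies the name, the blob supplies the bytes.
            files[path] = objects[oid]
    return files


# Hypothetical object store: blobs hold content, trees map names to (kind, id).
objects = {
    "b1": "hello\n",
    "b2": "print('hi')\n",
    "t_src": {"main.py": ("blob", "b2")},
    "t_root": {"README": ("blob", "b1"), "src": ("tree", "t_src")},
}
commit = {"tree": "t_root"}

working_dir = materialize(objects, commit["tree"])
print(sorted(working_dir))  # ['README', 'src/main.py']
```

The commit only needs to hold one pointer, to the root tree; everything else is found by following references downward.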

This is also the reason why Git will sometimes not let you switch branches: it essentially rewrites the entire working directory, and if you have modified or staged files, it can't do so without losing your changes. So it prompts you to either commit or stash them.
 
paul nisset
Ranch Hand
Posts: 572
@Raju
Thanks for the clarification. That makes sense.
 
Tim Holloway
Saloon Keeper
Posts: 27764
Much information of interest here. Many thanks to our authors!

I recently looked into cherry-picking as a possible solution to a problem I had. Ultimately didn't go that route, but it definitely would have helped to have had more documentation on the subject!
 
Raju Gandhi
Author
Posts: 37
Thanks @Tim!

I recently looked into cherry-picking as a possible solution to a problem I had



I don't encourage cherry-picking (in my book I listed it in the "Things we didn't cover" chapter) because it can often be abused. Excessive use of cherry-picking is usually symptomatic of bad Git workflows (that's not to say I think that was your problem; it's just an observation). I can certainly help answer any questions you might have, though.

Regards,
 