* Keeping unreachable objects in a separate pack instead of loose? @ 2012-06-10 12:31 Theodore Ts'o 2012-06-10 23:24 ` Hallvard B Furuseth 2012-06-11 15:40 ` Junio C Hamano 0 siblings, 2 replies; 48+ messages in thread From: Theodore Ts'o @ 2012-06-10 12:31 UTC (permalink / raw) To: git I recently noticed that after a git gc, I had a huge number of loose objects that were unreachable. In fact, about 4.5 megabytes worth of objects. When I packed them, via:

    cd .git/objects ; find [0-9a-f][0-9a-f] -type f | tr -d / | git pack-objects pack

the resulting pack file was 244k. Which got me thinking.... the whole point of leaving the objects loose is to make it easier to expire them, right? But given how expensive it is to have loose objects lying around, why not:

a) Have git-pack-objects have an option which writes the unreachable objects into a separate pack file, instead of kicking them loose?

b) Have git-prune delete a pack only if *all* of the objects in the pack meet the expiry deadline?

What would be the downsides of pursuing such a strategy? Is it worth trying to implement as a proof-of-concept? - Ted ^ permalink raw reply [flat|nested] 48+ messages in thread
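The effect Ted describes is easy to observe with stock git commands. A minimal sketch, assuming a repository where branches were recently rewound or deleted; "count" and "size" in the output are the number and total size (in KiB) of loose objects:

    # Before: mostly packed.
    git count-objects -v
    # gc repacks the reachable objects, but explodes unreachable,
    # not-yet-expired objects out of the old packs as loose files.
    git gc
    # After: "count" and "size" can balloon with unreachable loose objects.
    git count-objects -v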
* Re: Keeping unreachable objects in a separate pack instead of loose? 2012-06-10 12:31 Keeping unreachable objects in a separate pack instead of loose? Theodore Ts'o @ 2012-06-10 23:24 ` Hallvard B Furuseth 2012-06-11 14:44 ` Thomas Rast 2012-06-11 15:40 ` Junio C Hamano 1 sibling, 1 reply; 48+ messages in thread From: Hallvard B Furuseth @ 2012-06-10 23:24 UTC (permalink / raw) To: Theodore Ts'o; +Cc: git On Sun, June 10, 2012 14:31, Theodore Ts'o wrote: > I recently noticed that after a git gc, I had a huge number of loose > objects that were unreachable. In fact, about 4.5 megabytes worth of > objects. I got gigabytes once, and a full disk. See thread "git gc == git garbage-create from removed branch", May 3 2012. > When I packed them, via: > > cd .git/objects ; find [0-9a-f][0-9a-f] -type f | tr -d / | git pack-objects pack > > the resulting pack file was 244k. > > Which got me thinking.... the whole point of leaving the objects loose > is to make it easier to expire them, right? But given how expensive it > is to have loose objects lying around, why not: > > a) Have git-pack-objects have an option which writes the unreachable > objects into a separate pack file, instead of kicking them loose? I think this should be the default. It's very unintuitive that gc can eat up lots of disk space instead of saving space. Until this is fixed, this behavior needs to be documented - along with how to avoid it. > b) Have git-prune delete a pack only if *all* of the objects in the > pack meet the expiry deadline? > > What would be the downsides of pursuing such a strategy? Is it worth > trying to implement as a proof-of-concept? Hallvard ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Keeping unreachable objects in a separate pack instead of loose? 2012-06-10 23:24 ` Hallvard B Furuseth @ 2012-06-11 14:44 ` Thomas Rast 2012-06-11 15:31 ` Ted Ts'o 0 siblings, 1 reply; 48+ messages in thread From: Thomas Rast @ 2012-06-11 14:44 UTC (permalink / raw) To: Hallvard B Furuseth; +Cc: Theodore Ts'o, git, Jeff King, Nicolas Pitre "Hallvard B Furuseth" <h.b.furuseth@usit.uio.no> writes: > On Sun, June 10, 2012 14:31, Theodore Ts'o wrote: >> I recently noticed that after a git gc, I had a huge number of loose >> objects that were unreachable. In fact, about 4.5 megabytes worth of >> objects. > > I got gigabytes once, and a full disk. See thread > "git gc == git garbage-create from removed branch", May 3 2012. > >> When I packed them, via: >> >> cd .git/objects ; find [0-9a-f][0-9a-f] -type f | tr -d / | git pack-objects pack >> >> the resulting pack file was 244k. >> >> Which got me thinking.... the whole point of leaving the objects loose >> is to make it easier to expire them, right? But given how expensive it >> is to have loose objects lying around, why not: >> >> a) Have git-pack-objects have an option which writes the unreachable >> objects into a separate pack file, instead of kicking them loose? > > I think this should be the default. It's very unintuitive that > gc can eat up lots of disk space instead of saving space. > > Until this is fixed, this behavior needs to be documented - > along with how to avoid it. Starting with v1.7.10.2, and in the v1.7.11-rc versions, there's a change by Peff: 7e52f56 (gc: do not explode objects which will be immediately pruned, 2012-04-07). Does it solve your problem? -- Thomas Rast trast@{inf,student}.ethz.ch ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Keeping unreachable objects in a separate pack instead of loose? 2012-06-11 14:44 ` Thomas Rast @ 2012-06-11 15:31 ` Ted Ts'o 2012-06-11 16:08 ` Jeff King 0 siblings, 1 reply; 48+ messages in thread From: Ted Ts'o @ 2012-06-11 15:31 UTC (permalink / raw) To: Thomas Rast; +Cc: Hallvard B Furuseth, git, Jeff King, Nicolas Pitre On Mon, Jun 11, 2012 at 04:44:29PM +0200, Thomas Rast wrote: > > Starting with v1.7.10.2, and in the v1.7.11-rc versions, there's a > change by Peff: 7e52f56 (gc: do not explode objects which will be > immediately pruned, 2012-04-07). Does it solve your problem? I'm currently using 1.7.10.2.552.gaa3bb87, and a "git gc" still kicked loose a little over 4.5 megabytes of loose objects that were not pruned via "git prune" (since they hadn't yet expired). These loose objects could be stored in a 244k pack file. So while I'm sure that change has helped, if you happen to use a workflow that uses git rebase and/or guilt and/or throwaway test integration branches a lot, there will still be a large number of unexpired commits which get kicked loose, and won't get pruned for a week or two. What I think would make sense is for git pack-objects to have a new option which outputs a list of object IDs which would have been kicked out as loose objects if it had been given the (undocumented) --unpack-unreachable option. Then the git-repack shell script (if given the -A option) would use that new option instead of --unpack-unreachable, and then using the list created by this new option, create another pack which contains all of these unreachable-but-not-yet-expired objects. Regards, - Ted ^ permalink raw reply [flat|nested] 48+ messages in thread
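A sketch of the flow being proposed here. The --list-unreachable option is hypothetical (it is the new output mode Ted is asking for, named later in this thread), not an existing git flag, and the file names are invented:

    # Hypothetical: have pack-objects emit the IDs of objects it would
    # otherwise have exploded loose (--list-unreachable does not exist).
    git pack-objects --all --reflog --list-unreachable=unreachable.txt \
        .git/objects/pack/pack </dev/null
    # Then pack the unreachable-but-unexpired objects separately,
    # feeding the saved list back in on stdin:
    git pack-objects .git/objects/pack/pack <unreachable.txt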
* Re: Keeping unreachable objects in a separate pack instead of loose? 2012-06-11 15:31 ` Ted Ts'o @ 2012-06-11 16:08 ` Jeff King 2012-06-11 17:04 ` Nicolas Pitre 2012-06-11 17:27 ` Ted Ts'o 1 sibling, 2 replies; 48+ messages in thread From: Jeff King @ 2012-06-11 16:08 UTC (permalink / raw) To: Ted Ts'o; +Cc: Thomas Rast, Hallvard B Furuseth, git, Nicolas Pitre On Mon, Jun 11, 2012 at 11:31:03AM -0400, Ted Ts'o wrote: > I'm currently using 1.7.10.2.552.gaa3bb87, and a "git gc" still kicked > loose a little over 4.5 megabytes of loose objects that were not pruned via > "git prune" (since they hadn't yet expired). These loose objects > could be stored in a 244k pack file. Out of curiosity, what is the size of the whole repo? If it's a 500M kernel repository, then 4.5M is not all _that_ worrisome. Not that it could not be better, or that it's not worth addressing (since there are corner cases that behave way worse). But it gives a sense of the urgency of the problem, if that is the scope of the issue for average use. > What I think would make sense is for git pack-objects to have a new > option which outputs a list of object IDs which would have been > kicked out as loose objects if it had been given the (undocumented) > --unpack-unreachable option. Then the git-repack shell script (if > given the -A option) would use that new option instead of > --unpack-unreachable, and then using the list created by this new > option, create another pack which contains all of these > unreachable-but-not-yet-expired objects. I don't think that will work, because we will keep repacking the unreachable bits into new packs. And the 2-week expiration is based on the pack timestamp. So if your "repack -Ad" ends in two packs (the one you actually want, and the pack of expired crap), then you would get into this cycle:

1. You run "git repack -Ad". It makes A.pack, with stuff you want, and B.pack, with unreachable junk. They both get a timestamp of "now".

2. A day passes. You run "git repack -Ad" again. It makes C.pack, the new stuff you want, and repacks all of B.pack along with the new expired cruft from A.pack, making D.pack. B.pack can go away. D.pack gets a timestamp of "now".

And so on, as long as you repack within the two week window, the objects from the cruft pack will never get ejected. So you might suggest that the problem is that in step 2, we repack the items from B. But if you don't, then you will accumulate a bunch of cruft packs (2 weeks worth), and those objects won't be delta'd against each other. It's probably better than making them all loose, of course (you get chunks of delta'd objects from each repack, instead of none at all), but it's far from a full solution to the issue. I think solving it for good would involve a separate list of per-object expiration dates. Obviously we get that easily with loose objects (since it is one object per file). As a workaround, it might be worth lowering the default pruneExpire from 2 weeks to 1 day or something. It is really about creating safety for operations in progress (e.g., you write the object, and then are _about_ to add it to the index or update a ref when it gets pruned). I think the 2 weeks number was pulled out of a hat as "absurdly long for an operation to take", and was never revisited because nobody cared or complained. -Peff ^ permalink raw reply [flat|nested] 48+ messages in thread
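The workaround suggested here is a one-line configuration change; gc.pruneExpire is a real knob and accepts an approxidate:

    # Shorten the prune grace period from the 2-week default:
    git config gc.pruneExpire "1.day.ago"
    # Or do it for a single collection only:
    git gc --prune="1.hour.ago"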
* Re: Keeping unreachable objects in a separate pack instead of loose? 2012-06-11 16:08 ` Jeff King @ 2012-06-11 17:04 ` Nicolas Pitre 2012-06-11 17:45 ` Ted Ts'o 2012-06-11 17:46 ` Jeff King 1 sibling, 2 replies; 48+ messages in thread From: Nicolas Pitre @ 2012-06-11 17:04 UTC (permalink / raw) To: Jeff King; +Cc: Ted Ts'o, Thomas Rast, Hallvard B Furuseth, git On Mon, 11 Jun 2012, Jeff King wrote: > As a workaround, it might be worth lowering the default pruneExpire from > 2 weeks to 1 day or something. It is really about creating safety for > operations in progress (e.g., you write the object, and then are _about_ > to add it to the index or update a ref when it gets pruned). I think the > 2 weeks number was pulled out of a hat as "absurdly long for an > operation to take", and was never revisited because nobody cared or > complained. Absolutely. I think lowering this value should even be considered a "fix", not a "workaround". IIRC, the 2 weeks number was instituted when there wasn't any reflog on HEAD and the only way to salvage lost commits was to use 'git fsck --lost-found'. These days, this is used only as a safety measure because there is always a tiny window during which objects are dangling before they're finally all referenced as you say. But someone would have to work hard to hit that race even if the delay was only 30 seconds. So realistically this could even be set to 1 hour. Nicolas ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Keeping unreachable objects in a separate pack instead of loose? 2012-06-11 17:04 ` Nicolas Pitre @ 2012-06-11 17:45 ` Ted Ts'o 2012-06-11 17:54 ` Jeff King 0 siblings, 1 reply; 48+ messages in thread From: Ted Ts'o @ 2012-06-11 17:45 UTC (permalink / raw) To: Nicolas Pitre; +Cc: Jeff King, Thomas Rast, Hallvard B Furuseth, git On Mon, Jun 11, 2012 at 01:04:07PM -0400, Nicolas Pitre wrote: > > IIRC, the 2 weeks number was instituted when there wasn't any reflog on > HEAD and the only way to salvage lost commits was to use 'git fsck > --lost-found'. These days, this is used only as a safety measure > because there is always a tiny window during which objects are dangling > before they're finally all referenced as you say. But someone would > have to work hard to hit that race even if the delay was only 30 > seconds. So realistically this could even be set to 1 hour. There's another useful part of the two week window: it acts as a partial workaround when using .git/objects/info/alternates with one or more rewinding branches. My /usr/projects/e2fsprogs/base repo is a bare repo that contains all of my public branches, including a rewinding "pu" branch. My /usr/projects/e2fsprogs/e2fsprogs uses an alternates file to minimize disk usage, and it points at the base repo. The problem comes when I need to gc the base repo, every 3 months or so. When I do that, objects that belonged to older incarnations of the rewinding pu branch disappear. The two week window gives you a partial saving throw until the development repo breaks due to objects that it depends upon disappearing. It would be nice if a gc of the devel repo knew that some of the objects it was depending on were "expired cruft", and copied them to its local objects directory. But of course things don't work that way. Here's what I do today (please don't barf; I know it's ugly):

1) cd to the base repository; run "git repack -Adfl --window=300 --depth=100"

2) cd to the objects directory, and create my base "cruft" pack:

    find [0-9a-f][0-9a-f] -type f | tr -d / | git pack-objects pack

3) hard link it into my devel repo's pack directory:

    ln pack-* /usr/projects/e2fsprogs/e2fsprogs/.git/objects/pack

4) to save space in my base repo, move it to the pack directory and run git prune-packed:

    mv pack-* pack
    git prune-packed

5) run "git repack -Adfl --window=300 --depth=100" in my devel repo

6) create a cruft pack in my devel repo (to save disk space):

    cd /usr/projects/e2fsprogs/e2fsprogs/.git/objects
    find [0-9a-f][0-9a-f] -type f | tr -d / | git pack-objects pack
    mv pack-* pack
    git prune-packed

This probably falls in the "don't use --shared unless you know what it does" admonition in the git-clone man page. :-) Don't worry, I don't recommend that anyone *else* do this. But it works for me (although it would be nice if I made this workflow a bit more optimized; at the very least I should make a shell script that does all this for me automatically, instead of typing all of the shell commands by hand.) - Ted ^ permalink raw reply [flat|nested] 48+ messages in thread
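The shell script Ted says he should write might look something like this. It is a sketch only, with his paths and window/depth settings hard-coded, and it assumes the loose objects left after each repack are exactly the unreachable-but-unexpired ones:

    #!/bin/sh
    # Sketch of the manual steps above; minimal error handling.
    set -e
    base=/usr/projects/e2fsprogs/base
    devel=/usr/projects/e2fsprogs/e2fsprogs

    # Steps 1-2: repack the base repo, sweep loose objects into a pack.
    cd "$base"
    git repack -Adfl --window=300 --depth=100
    cd objects
    find [0-9a-f][0-9a-f] -type f | tr -d / | git pack-objects pack >/dev/null

    # Step 3: hard-link the cruft pack into the devel repo.
    ln pack-* "$devel/.git/objects/pack"

    # Step 4: move it into place here and drop the loose copies.
    mv pack-* pack
    git prune-packed

    # Steps 5-6: the same dance in the devel repo.
    cd "$devel"
    git repack -Adfl --window=300 --depth=100
    cd .git/objects
    find [0-9a-f][0-9a-f] -type f | tr -d / | git pack-objects pack >/dev/null
    mv pack-* pack
    git prune-packed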
* Re: Keeping unreachable objects in a separate pack instead of loose? 2012-06-11 17:45 ` Ted Ts'o @ 2012-06-11 17:54 ` Jeff King 2012-06-11 18:20 ` Ted Ts'o 0 siblings, 1 reply; 48+ messages in thread From: Jeff King @ 2012-06-11 17:54 UTC (permalink / raw) To: Ted Ts'o; +Cc: Nicolas Pitre, Thomas Rast, Hallvard B Furuseth, git On Mon, Jun 11, 2012 at 01:45:07PM -0400, Ted Ts'o wrote: > There's another useful part of the two week window: it acts as a > partial workaround when using .git/objects/info/alternates with one or more > rewinding branches. > > My /usr/projects/e2fsprogs/base repo is a bare repo that contains all > of my public branches, including a rewinding "pu" branch. > > My /usr/projects/e2fsprogs/e2fsprogs uses an alternates file to > minimize disk usage, and it points at the base repo. > > The problem comes when I need to gc the base repo, every 3 months or > so. When I do that, objects that belonged to older incarnations of > the rewinding pu branch disappear. The two week window gives you a > partial saving throw until the development repo breaks due to objects that > it depends upon disappearing. You're doing it wrong (but you can hardly be blamed, because there isn't good tool support for doing it right). You should never prune or repack in the base repo without taking into account all of the refs of its children. We have a similar setup at github (every time you "fork" a repo, it is creating a new repo that links back to a project-wide "network" repo for its object store). We maintain a refs/remotes/XXX directory for each child repo, which stores the complete refs/ hierarchy of that child. It's all done by a tangled mass of shell scripts. I've considered trying to clean up and open source them, but I really doubt it would help people generally. There's so much stuff specific to our setup. -Peff ^ permalink raw reply [flat|nested] 48+ messages in thread
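The core of the scheme Peff describes needs no special tooling: mirror every child's refs into the shared repo with a broad refspec before repacking it. A sketch; the paths and the fork list are invented:

    # Run inside the shared "network" repo before repacking it.
    for fork in /repos/alice.git /repos/bob.git; do
        name=$(basename "$fork" .git)
        # Mirror the child's entire refs/ hierarchy into a private namespace.
        git fetch "$fork" "+refs/*:refs/remotes/$name/*"
    done
    # Now every child's objects are reachable here, so repacking is safe.
    git repack -Adl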
* Re: Keeping unreachable objects in a separate pack instead of loose? 2012-06-11 17:54 ` Jeff King @ 2012-06-11 18:20 ` Ted Ts'o 2012-06-11 18:43 ` Jeff King 0 siblings, 1 reply; 48+ messages in thread From: Ted Ts'o @ 2012-06-11 18:20 UTC (permalink / raw) To: Jeff King; +Cc: Nicolas Pitre, Thomas Rast, Hallvard B Furuseth, git On Mon, Jun 11, 2012 at 01:54:19PM -0400, Jeff King wrote: > > You're doing it wrong (but you can hardly be blamed, because there isn't > good tool support for doing it right). You should never prune or repack > in the base repo without taking into account all of the refs of its > children. Well, I don't do a simple gc. See the complicated set of steps I use to make sure I don't lose loose commits at the end of my last e-mail message on this thread. It gets worse when I have multiple devel repos, but I simplified things for the purposes of discussion. > We have a similar setup at github (every time you "fork" a repo, it is > creating a new repo that links back to a project-wide "network" repo for > its object store). We maintain a refs/remotes/XXX directory for each > child repo, which stores the complete refs/ hierarchy of that child. So you basically are copying the refs around and making sure the parent repo has an up-to-date copy of all of the child repos' refs, such that when you do the repack, *all* of the commits end up in the parent repo, correct? The system that I'm using means that objects which are local to a child repo stay in the child repo, and if an object is about to be dropped from the parent repo as a result of a gc, the child repo has an opportunity to claim a copy of that object for itself in its object database. You can do things either way. I like knowing that objects only used by a child repo are in the child repo's .git directory, but that's arguably more of a question of taste than anything else. - Ted ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Keeping unreachable objects in a separate pack instead of loose? 2012-06-11 18:20 ` Ted Ts'o @ 2012-06-11 18:43 ` Jeff King 0 siblings, 0 replies; 48+ messages in thread From: Jeff King @ 2012-06-11 18:43 UTC (permalink / raw) To: Ted Ts'o; +Cc: Nicolas Pitre, Thomas Rast, Hallvard B Furuseth, git On Mon, Jun 11, 2012 at 02:20:12PM -0400, Ted Ts'o wrote: > On Mon, Jun 11, 2012 at 01:54:19PM -0400, Jeff King wrote: > > > > You're doing it wrong (but you can hardly be blamed, because there isn't > > good tool support for doing it right). You should never prune or repack > > in the base repo without taking into account all of the refs of its > > children. > > Well, I don't do a simple gc. See the complicated set of steps I use > to make sure I don't lose loose commits at the end of my last e-mail > message on this thread. It gets worse when I have multiple devel > repos, but I simplified things for the purposes of discussion. Ah, right. I was thinking that your first step, which is "git repack -Adfl", would throw out old objects rather than unpack them in recent versions of git. But due to the way I implemented it (namely that you must pass --unpack-unreachable yourself, so this feature only kicks in automatically for "git gc"), that is not the case. I don't recall if that was an accident, or if I was very clever in maintaining backwards compatibility for your case. Let's just assume the latter. :) > > We have a similar setup at github (every time you "fork" a repo, it is > > creating a new repo that links back to a project-wide "network" repo for > > its object store). We maintain a refs/remotes/XXX directory for each > > child repo, which stores the complete refs/ hierarchy of that child. > > So you basically are copying the refs around and making sure the > parent repo has an up-to-date copy of all of the child repos' refs, such > that when you do the repack, *all* of the commits end up in the parent > repo, correct? Yes. The child repositories generally have no objects in them at all (they occasionally do for a period between runs of the migration script). > The system that I'm using means that objects which are local to a > child repo stay in the child repo, and if an object is about to be > dropped from the parent repo as a result of a gc, the child repo has > an opportunity to claim a copy of that object for itself in its object > database. That implies the concept of "local to a child repo", which implies that you have some set of "common" refs. I suspect in your case your base repo represents the master branch, or something similar. We actually treat our network repo as a pure parent; every repo, including the original one that everybody forks from, is a child. That makes it easier to treat the original repo as just another repo (e.g., the original owner is free to delete it, and the forks won't care). > You can do things either way. I like knowing that objects only used > by a child repo are in the child repo's .git directory, but that's > arguably more of a question of taste than anything else. Yeah, I don't think there is any real benefit to it. -Peff ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Keeping unreachable objects in a separate pack instead of loose? 2012-06-11 17:04 ` Nicolas Pitre 2012-06-11 17:45 ` Ted Ts'o @ 2012-06-11 17:46 ` Jeff King 1 sibling, 0 replies; 48+ messages in thread From: Jeff King @ 2012-06-11 17:46 UTC (permalink / raw) To: Nicolas Pitre; +Cc: Ted Ts'o, Thomas Rast, Hallvard B Furuseth, git On Mon, Jun 11, 2012 at 01:04:07PM -0400, Nicolas Pitre wrote: > IIRC, the 2 weeks number was instituted when there wasn't any reflog on > HEAD and the only way to salvage lost commits was to use 'git fsck > --lost-found'. These days, this is used only as a safety measure > because there is always a tiny window during which objects are dangling > before they're finally all referenced as you say. But someone would > have to work hard to hit that race even if the delay was only 30 > seconds. So realistically this could even be set to 1 hour. Nope, the HEAD reflog dates back to bd104db (enable separate reflog for HEAD, 2007-01-26), whereas the discussion leading to gc.pruneExpire was in 2008. See this sub-thread for the discussion on the time limit: http://thread.gmane.org/gmane.comp.version-control.git/76899/focus=76943 The interesting points from that thread are:

1. You might care about object prune times if your session is interrupted. So imagine you are working on something, do not commit, go home for the weekend, and then come back. I think we are OK with most in-progress operations, because any blobs added to the index will of course be considered reachable and not pruned. What would hurt most is that you could do:

    $ hack hack hack
    $ git add foo
    $ hack hack hack
    $ git add foo
    $ hack hack hack
    [oops! I realized I really wanted the initial version of foo!]
    $ git fsck --lost-found

If your session is interrupted during the third "hack hack hack" bit, a short prune might lose the initial version. Of course, that's _if_ a gc is run in the middle. So I find it possible, but somewhat unlikely.

2. If a branch is deleted, all of its reflogs go away immediately. A short expiration time would mean that the objects will probably go away at the next "gc". In many cases, you will be saved by the HEAD reflog, which stays around until the real expiration is reached. But not always (e.g., server-side reflogs). I'd much rather address this in the general case by saving deleted-branch reflogs, though I know you and I have disagreed on that in the past.

One issue not brought up in that thread is that of bare repositories, which do not have reflogs enabled by default. In that case, the 2-week prune window is really doing something. I really wonder if there is a good reason not to turn on reflogs for all repositories, bare included. We have them on for all of the server-side repositories at github, and they have been invaluable for resolving many support tickets from people who forced a push without realizing what they were doing. -Peff ^ permalink raw reply [flat|nested] 48+ messages in thread
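Turning reflogs on for a bare repository is a single configuration bit; this is stock git, not github-specific:

    # Reflogs are off by default in bare repos; enable them:
    git config core.logAllRefUpdates true
    # Refs written after this get reflog entries, so commits that were
    # force-pushed over stay findable until the reflog itself expires.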
* Re: Keeping unreachable objects in a separate pack instead of loose? 2012-06-11 16:08 ` Jeff King 2012-06-11 17:04 ` Nicolas Pitre @ 2012-06-11 17:27 ` Ted Ts'o 2012-06-11 18:34 ` Jeff King 1 sibling, 1 reply; 48+ messages in thread From: Ted Ts'o @ 2012-06-11 17:27 UTC (permalink / raw) To: Jeff King; +Cc: Thomas Rast, Hallvard B Furuseth, git, Nicolas Pitre On Mon, Jun 11, 2012 at 12:08:24PM -0400, Jeff King wrote: > On Mon, Jun 11, 2012 at 11:31:03AM -0400, Ted Ts'o wrote: > > > I'm currently using 1.7.10.2.552.gaa3bb87, and a "git gc" still kicked > > loose a little over 4.5 megabytes of loose objects that were not pruned via > > "git prune" (since they hadn't yet expired). These loose objects > > could be stored in a 244k pack file. > > Out of curiosity, what is the size of the whole repo? If it's a 500M > kernel repository, then 4.5M is not all _that_ worrisome. Not that it > could not be better, or that it's not worth addressing (since there are > corner cases that behave way worse). But it gives a sense of the urgency > of the problem, if that is the scope of the issue for average use. It's my e2fsprogs development repo. I have my "base" repo, which is what has been pushed out to the public (including a rewinding pu branch). The total size of that repo is a little over 15 megs:

<tytso@tytso-glaptop.cam.corp.google.com> {/usr/projects/e2fsprogs/e2fsprogs} [maint]
899% ls ../base/objects/pack/
total 16156
  908 pack-6964a1516433f16e43dcdf4fcec1996052099f31.idx
15248 pack-6964a1516433f16e43dcdf4fcec1996052099f31.pack

I then have my development repo, which uses a .git/objects/info/alternates pointing at the bare "base" repo, so the only things in this repo are my private development branches and other things that haven't been pushed for public consumption.

<tytso@tytso-glaptop.cam.corp.google.com> {/usr/projects/e2fsprogs/e2fsprogs} [maint]
900% ls .git/objects/pack/
total 1048
 28 5a486e6c2156109f7dfc725b36a201c10652803d.idx
 28 pack-7b2a9cccab669338f61a681e34c39362976fb5de.idx
224 5a486e6c2156109f7dfc725b36a201c10652803d.pack
768 pack-7b2a9cccab669338f61a681e34c39362976fb5de.pack

The 4.5 megabytes of loose objects packed down to a 224k "cruft" pack, and 768k worth of private development objects. So depending on how you would want to do the comparison, probably the fairest thing to say is that my "good" packs totaled about 16 megs, and the loose cruft objects were an additional 4.5 megabytes. > I don't think that will work, because we will keep repacking the > unreachable bits into new packs. And the 2-week expiration is based on > the pack timestamp. So if your "repack -Ad" ends in two packs (the one > you actually want, and the pack of expired crap), then you would get > into this cycle: > > 1. You run "git repack -Ad". It makes A.pack, with stuff you want, and > B.pack, with unreachable junk. They both get a timestamp of "now". > > 2. A day passes. You run "git repack -Ad" again. It makes C.pack, the > new stuff you want, and repacks all of B.pack along with the > new expired cruft from A.pack, making D.pack. B.pack can go away. > D.pack gets a timestamp of "now". Hmm, yes. What we'd really want to do is to make D.pack contain those items that are newly unreachable, not including the objects in B.pack, and keep B.pack around until the expiry window goes by. But that's a much more complicated thing, and the proof-of-concept algorithm I had outlined wouldn't do that. > I think solving it for good would involve a separate list of per-object expiration dates.
> Obviously we get that easily with loose objects (since it is one object per file). Well, either that or we need to teach git-repack the difference between packs that are expected to contain good stuff, and packs that contain cruft, and to not copy "old cruft" to new packs, so the old pack can finally get nuked 2 weeks (or whatever the expire window might happen to be) later. - Ted ^ permalink raw reply [flat|nested] 48+ messages in thread
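For readers unfamiliar with the alternates mechanism Ted is using: the devel repo borrows the base repo's objects through a one-line plain-text file. A sketch with his paths:

    # Create a space-saving clone whose object store borrows from base:
    git clone --shared /usr/projects/e2fsprogs/base e2fsprogs
    # The link itself is just a file naming the borrowed object directory:
    cat e2fsprogs/.git/objects/info/alternates
    # -> /usr/projects/e2fsprogs/base/objects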
* Re: Keeping unreachable objects in a separate pack instead of loose? 2012-06-11 17:27 ` Ted Ts'o @ 2012-06-11 18:34 ` Jeff King 2012-06-11 20:44 ` Hallvard Breien Furuseth 2012-06-11 21:14 ` Ted Ts'o 0 siblings, 2 replies; 48+ messages in thread From: Jeff King @ 2012-06-11 18:34 UTC (permalink / raw) To: Ted Ts'o; +Cc: Thomas Rast, Hallvard B Furuseth, git, Nicolas Pitre On Mon, Jun 11, 2012 at 01:27:32PM -0400, Ted Ts'o wrote: > The 4.5 megabytes of loose objects packed down to a 224k "cruft" pack, > and 768k worth of private development objects. > > So depending on how you would want to do the comparison, probably the > fairest thing to say is that my "good" packs totaled about > 16 megs, and the loose cruft objects were an additional 4.5 megabytes. OK, so that 4.5 is at least a respectable percentage of the total repo size. I suspect it may be worse for small repos in that sense, because the 4.5 megabytes is not "how big is this repo" but probably "how much work did you do in this repo in the last 2 weeks". Which should be a constant with respect to the total size of the repo. However, for a very busy repo (e.g., one used for automated integration testing or similar), the "how much work" number could be quite high. We ran into this at github due to our "merge this" button, which does test-merges to see if a pull request can be merged cleanly. Whenever the upstream branch is updated, all of the outstanding pull requests get re-tested, and the old test-merge objects become unreferenced. We ended up dropping pruneExpire to 1 day to keep the cruft to a minimum. > > 1. You run "git repack -Ad". It makes A.pack, with stuff you want, > > and B.pack, with unreachable junk. They both get a timestamp of > > "now". > > > > 2. A day passes. You run "git repack -Ad" again. It makes C.pack, > > the new stuff you want, and repacks all of B.pack along with the > > new expired cruft from A.pack, making D.pack. B.pack can go away. > > D.pack gets a timestamp of "now". > > Hmm, yes. What we'd really want to do is to make D.pack contain those > items that are newly unreachable, not including the objects in > B.pack, and keep B.pack around until the expiry window goes by. But > that's a much more complicated thing, and the proof-of-concept > algorithm I had outlined wouldn't do that. Right. When dumping the list of unreachable objects from pack-objects, you'd want to tell it to ignore objects from any pack that contained only unreachable objects in the first place. Except that is perhaps not quite right, either. Because if a pack has 100 objects, but you happen to re-reference 1 of them, you'd probably want to leave it (even though that re-referenced one is now duplicated, the savings aren't worth the trouble of repacking the cruft objects). Of course, what is the N that makes it worth the trouble? Now you're getting into heuristics. You _could_ make a separate cruft pack for each pack that you repack. So if I have A.pack and B.pack, I'd pack all of the reachable objects into C.pack, and then make D.pack containing the unreachable objects from A.pack, and E.pack with the unreachable objects from B.pack. And then set the mtime of the cruft packs to that of their parent packs. And then the next time you pack, repacking D and E would probably be a no-op that preserves mtime, but might create a new pack that ejects some now-reachable object.
To implement that, I think your --list-unreachable would just have to print a list of "<pack-mtime> <sha1>" pairs, and then you would pack each set with an identical mtime (or even a "close enough" mtime within some slop). But yes, this is all getting complicated. :) > > I think solving it for good would involve a separate list of > > per-object expiration dates. Obviously we get that easily with loose > > objects (since it is one object per file). > > Well, either that or we need to teach git-repack the difference > between packs that are expected to contain good stuff, and packs that > contain cruft, and to not copy "old cruft" to new packs, so the old > pack can finally get nuked 2 weeks (or whatever the expire window > might happen to be) later. That is harder because those objects may become re-reachable during that window. So I think you don't want to deal with "expected to contain..." but rather "what does it contain now?". The latter is easy to figure out by doing a reachability analysis (which we do as part of the repack anyway). -Peff ^ permalink raw reply [flat|nested] 48+ messages in thread
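A sketch of the mtime bookkeeping in that last idea, using Peff's A.pack/D.pack names; touch -r does the work:

    # D.pack now holds the unreachable objects ejected from A.pack.
    # Back-date it so the expiry clock keeps counting from A.pack's age
    # instead of restarting at "now":
    touch -r A.pack D.pack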
* Re: Keeping unreachable objects in a separate pack instead of loose? 2012-06-11 18:34 ` Jeff King @ 2012-06-11 20:44 ` Hallvard Breien Furuseth 2012-06-11 21:14 ` Jeff King 1 sibling, 1 reply; 48+ messages in thread From: Hallvard Breien Furuseth @ 2012-06-11 20:44 UTC (permalink / raw) To: Jeff King; +Cc: Ted Ts'o, Thomas Rast, git, Nicolas Pitre On Mon, 11 Jun 2012 14:34:14 -0400, Jeff King <peff@peff.net> wrote: > On Mon, Jun 11, 2012 at 01:27:32PM -0400, Ted Ts'o wrote: >> So depending on how you would want to do the comparison, probably the >> fairest thing to say is that my "good" packs totaled about 16 megs, >> and the loose cruft objects were an additional 4.5 megabytes. > > OK, so that 4.5 is at least a respectable percentage of the total repo > size. I suspect it may be worse for small repos in that sense, (...) 'git gc' gave a 3100% increase with my example:

    $ git clone --bare --branch linux-overhaul-20010122 \
          git://git.savannah.gnu.org/config.git
    $ cd config.git/
    $ git tag -d `git tag`; git branch -D master
    $ du -s objects
    624     objects
    $ git gc
    $ du -s objects
    19840   objects

Basically: Clone/fetch a repo, keep a small part of it, drop the rest, and gc. Gc explodes all the objects you no longer want. This hits you hard if your small project tracks a big one, and later ceases doing so. Maybe your project is to convert a small part of the remote into a library, or you're just tracking a few files from the remote - e.g. from the GNU Config repo. Not something one does every day, but it does happen. Tweaks to expiration time etc. do not help here. Hallvard ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Keeping unreachable objects in a separate pack instead of loose? 2012-06-11 20:44 ` Hallvard Breien Furuseth @ 2012-06-11 21:14 ` Jeff King 2012-06-11 21:41 ` Hallvard Breien Furuseth 0 siblings, 1 reply; 48+ messages in thread From: Jeff King @ 2012-06-11 21:14 UTC (permalink / raw) To: Hallvard Breien Furuseth Cc: Ted Ts'o, Thomas Rast, git, Nicolas Pitre, Nguyễn Thái Ngọc Duy On Mon, Jun 11, 2012 at 10:44:39PM +0200, Hallvard Breien Furuseth wrote: > >OK, so that 4.5 is at least a respectable percentage of the total repo > >size. I suspect it may be worse for small repos in that sense, (...) > > 'git gc' gave a 3100% increase with my example:
>
> $ git clone --bare --branch linux-overhaul-20010122 \
>       git://git.savannah.gnu.org/config.git
> $ cd config.git/
> $ git tag -d `git tag`; git branch -D master
> $ du -s objects
> 624     objects
> $ git gc
> $ du -s objects
> 19840   objects
>
> Basically: Clone/fetch a repo, keep a small part of it, drop the rest, and gc. Gc explodes all the objects you no longer want. I would argue that this is not a very interesting case in the first place, since the right thing to do is use "clone --single-branch"[1] to avoid transferring all of those objects in the first place. But there are plenty of variant cases, where you are not just deleting all of the refs, but rather doing some manipulation of the branches, diffing them to make sure it is safe to drop some bits, running filter-branch, etc. And it would be nice to make those cases more efficient. -Peff [1] It looks like "--single-branch" does not actually work, and still fetches master. I think this is a bug in the implementation of single-branch (it looks like it fetches HEAD unconditionally). +cc Duy. ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Keeping unreachable objects in a separate pack instead of loose? 2012-06-11 21:14 ` Jeff King @ 2012-06-11 21:41 ` Hallvard Breien Furuseth 0 siblings, 0 replies; 48+ messages in thread From: Hallvard Breien Furuseth @ 2012-06-11 21:41 UTC (permalink / raw) To: Jeff King Cc: Ted Ts'o, Thomas Rast, git, Nicolas Pitre, Nguyễn Thái Ngọc Duy On Mon, 11 Jun 2012 17:14:14 -0400, Jeff King <peff@peff.net> wrote: > On Mon, Jun 11, 2012 at 10:44:39PM +0200, Hallvard Breien Furuseth wrote: > >> >OK, so that 4.5 is at least a respectable percentage of the total repo >> >size. I suspect it may be worse for small repos in that sense, (...) >> >> 'git gc' gave a 3100% increase with my example:
>>
>> $ git clone --bare --branch linux-overhaul-20010122 \
>>       git://git.savannah.gnu.org/config.git
>> $ cd config.git/
>> $ git tag -d `git tag`; git branch -D master
>> $ du -s objects
>> 624     objects
>> $ git gc
>> $ du -s objects
>> 19840   objects
>>
>> Basically: Clone/fetch a repo, keep a small part of it, drop the rest, and gc. Gc explodes all the objects you no longer want. > > I would argue that this is not a very interesting case in the first > place, since the right thing to do is use "clone --single-branch"[1] > to avoid transferring all of those objects in the first place. Yeah, I just wanted a simple way to fetch a lot and drop most of it, since that's what triggers the problem. A simple use case would be where you did a lot of work between cloning and pruning most refs. I described some actual use cases below the command. > But there are plenty of variant cases, where you are not just deleting > all of the refs, but rather doing some manipulation of the branches, > diffing them to make sure it is safe to drop some bits, running > filter-branch, etc. And it would be nice to make those cases more > efficient. -- Hallvard ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Keeping unreachable objects in a separate pack instead of loose? 2012-06-11 18:34 ` Jeff King 2012-06-11 20:44 ` Hallvard Breien Furuseth @ 2012-06-11 21:14 ` Ted Ts'o 2012-06-11 21:39 ` Jeff King 1 sibling, 1 reply; 48+ messages in thread From: Ted Ts'o @ 2012-06-11 21:14 UTC (permalink / raw) To: Jeff King; +Cc: Thomas Rast, Hallvard B Furuseth, git, Nicolas Pitre On Mon, Jun 11, 2012 at 02:34:14PM -0400, Jeff King wrote: > You _could_ make a separate cruft pack for each pack that you repack. So > if I have A.pack and B.pack, I'd pack all of the reachable objects into > C.pack, and then make D.pack containing the unreachable objects from > A.pack, and E.pack with the unreachable objects from B.pack. And then > set the mtime of the cruft packs to that of their parent packs. > > And then the next time you pack, repacking D and E would probably be a > no-op that preserves mtime, but might create a new pack that ejects some > now-reachable object. > > To implement that, I think your --list-unreachable would just have to > print a list of "<pack-mtime> <sha1>" pairs, and then you would pack > each set with an identical mtime (or even a "close enough" mtime within > some slop).... How about this instead? We distinguish between cruft packs and "real" packs by the filename. So we have "cruft-<SHA1>.{idx,pack}" and "pack-<SHA1>.{idx,pack}". Normally, git will look at any pack in the pack directory that has an .idx and .pack extension, but during repack operation, it will only look in the pack-* packs first. If it can't find an object there, it will then fall back to trying to fetch the object from the cruft-* packs, and if it finds the object, it copies it into the new pack which it is creating, thus "rescuing" an object which reappears during the expiry window. This should be a relatively rare event, and if it happens, the object will be in two packs, a pack-* pack and a cruft-* pack, but that's OK. So since git pack-objects isn't even looking in the cruft-* packs except when it needs to rescue an object, the objects in the cruft-* packs won't get copied, and we won't need to have per-object mtimes. It also means it will go faster since it's not copying the cruft-* packs at all, and possibly not even looking at them. Now all we need to do is delete any cruft-* packs which are older than the expiry window. We don't even need to look at their contents. It does imply that we may accumulate a new cruft-<SHA1> pack each time we run git gc, but users shouldn't be running git gc all that often anyway. And even if they do run it all the time, it will still be more efficient than keeping the unreachable objects as loose objects. - Ted ^ permalink raw reply [flat|nested] 48+ messages in thread
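Under this naming scheme the expiry step reduces to an mtime sweep. A sketch, assuming GNU find; the cruft-* naming is the proposal above, not something git actually produces:

    # Delete cruft packs older than the 2-week expiry window.
    # Each pack is an .idx/.pack pair, and 'cruft-*' matches both.
    find .git/objects/pack -name 'cruft-*' -type f -mtime +14 -delete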
* Re: Keeping unreachable objects in a separate pack instead of loose? 2012-06-11 21:14 ` Ted Ts'o @ 2012-06-11 21:39 ` Jeff King 2012-06-11 22:14 ` Ted Ts'o 0 siblings, 1 reply; 48+ messages in thread From: Jeff King @ 2012-06-11 21:39 UTC (permalink / raw) To: Ted Ts'o; +Cc: Thomas Rast, Hallvard B Furuseth, git, Nicolas Pitre On Mon, Jun 11, 2012 at 05:14:01PM -0400, Ted Ts'o wrote: > On Mon, Jun 11, 2012 at 02:34:14PM -0400, Jeff King wrote: > > You _could_ make a separate cruft pack for each pack that you repack. So > > if I have A.pack and B.pack, I'd pack all of the reachable objects into > > C.pack, and then make D.pack containing the unreachable objects from > > A.pack, and E.pack with the unreachable objects from B.pack. And then > > set the mtime of the cruft packs to that of their parent packs. > > > > And then the next time you pack, repacking D and E would probably be a > > no-op that preserves mtime, but might create a new pack that ejects some > > now-reachable object. > > > > To implement that, I think your --list-unreachable would just have to > > print a list of "<pack-mtime> <sha1>" pairs, and then you would pack > > each set with an identical mtime (or even a "close enough" mtime within > > some slop).... > > How about this instead? We distinguish between cruft packs and "real" > packs by the filename. So we have "cruft-<SHA1>.{idx,pack}" and > "pack-<SHA1>.{idx,pack}". > > Normally, git will look at any pack in the pack directory that has an > .idx and .pack extension, but during repack operation, it will only > look in the pack-* packs first. If it can't find an object there, it > will then fall back to trying to fetch the object from the cruft-* > packs, and if it finds the object, it copies it into the new pack > which it is creating, thus "rescuing" an object which reappears during > the expiry window. This should be a relatively rare event, and if it > happens, the object will be in two packs, a pack-* pack and a cruft-* > pack, but that's OK. You don't need to do anything magical for the object lookup process of pack-objects. By definition, the unreachable objects will not be included in the new pack you are creating, because it is only packing reachable things. So the cruft pack does not have to be a fallback at all; it is a regular pack from the object lookup perspective. The important differences from the current behavior would be:

1. In pack-objects, do not explode (or if just listing, do not list) objects from cruft packs.

2. In repack, make a new pack from the list of unreachable objects.

3. During "repack -Ad", prune cruft packs only if they are older than an expiration date.

And then you'd end up with a new cruft pack each time. You could just mark it with a ".cruft" file (similar to the existing ".keep" files), and you don't have to worry about changing the pack-* name. > So since git pack-objects isn't even looking in the cruft-* packs > except when it needs to rescue an object, the objects in the cruft-* > packs won't get copied, and we won't need to have per-object mtimes. > It also means it will go faster since it's not copying the cruft-* > packs at all, and possibly not even looking at them. Yeah. It doesn't eliminate duplicates, but that may not be worth caring about. I find the "cruft" marking a little hacky, because it is only "objects in here _may_ be cruft", but as long as that is understood, it is OK (and it is understood in the sequence above; "repack -Ad" is safe because it knows that it would have repacked any non-cruft).
You would have to be careful with "repack -d" (without the "-a"). It would not necessarily be safe to remove cruft packs, because you might not have rescued the objects. However, AFAICT "repack -d" does not currently delete packs at all, so this would be no different. > It does imply that we may accumulate a new cruft-<SHA1> pack each time > we run git gc, but users shouldn't be running git gc all that often > anyway. And even if they do run it all the time, it will still be > more efficient than keeping the unreachable objects as loose objects. Yeah, it would be nice to keep it all in a single pack, but that means doing the I/O on rewriting the cruft packs each time. And figuring out some way of handling the mtime in such a way that we don't keep refreshing the age during each gc. Speaking of which, what is the mtime of the newly created cruft pack? Is it the current mtime? Then those unreachable objects will stick for another 2 weeks, instead of being back-dated to their pack's date. You could back-date to the mtime of the most recent deleted pack, but that would still prolong the life of objects from the older packs. It may be acceptable to just ignore the issue, though; they will expire eventually. -Peff ^ permalink raw reply [flat|nested] 48+ messages in thread
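The ".cruft" marker would parallel the existing pack-<SHA1>.keep convention: a flag file next to the pack, cheap to create and cheap to test for. A sketch; the name and semantics are Peff's suggestion, not an implemented feature, and the pack name is a placeholder:

    # Hypothetical: flag a pack as containing (possibly) cruft, the way
    # pack-1234abcd.keep flags a pack that must not be deleted.
    touch .git/objects/pack/pack-1234abcd.cruft
    # "repack -Ad" would then delete marked packs past the expiry window,
    # after rescuing any objects that became reachable again.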
* Re: Keeping unreachable objects in a separate pack instead of loose? 2012-06-11 21:39 ` Jeff King @ 2012-06-11 22:14 ` Ted Ts'o 2012-06-11 22:23 ` Jeff King 0 siblings, 1 reply; 48+ messages in thread From: Ted Ts'o @ 2012-06-11 22:14 UTC (permalink / raw) To: Jeff King; +Cc: Thomas Rast, Hallvard B Furuseth, git, Nicolas Pitre On Mon, Jun 11, 2012 at 05:39:48PM -0400, Jeff King wrote: > > Yeah. It doesn't eliminate duplicates, but that may not be worth caring > about. I find the "cruft" marking a little hacky, because it is only > "objects in here _may_ be cruft", but as long as that is understood, it > is OK (and it is understood in the sequence above; "repack -Ad" is safe > because it knows that it would have repacked any non-cruft). Well, all the objects in the file *were* cruft at the time that it was created. And the reason why we are keeping them around is in case we were wrong about their being cruft, so I guess I don't have that much trouble with the name. Something like "KillShelter" (as in the opposite of No-Kill Animal Shelters) would be more descriptive, but I think it's a bit lacking in taste.... > > It does imply that we may accumulate a new cruft-<SHA1> pack each time > > we run git gc, but users shouldn't be running git gc all that often > > anyway. And even if they do run it all the time, it will still be > > more efficient than keeping the unreachable objects as loose objects. > > Yeah, it would be nice to keep it all in a single pack, but that means > doing the I/O on rewriting the cruft packs each time. And figuring out > some way of handling the mtime in such a way that we don't keep > refreshing the age during each gc. Well, I'd like to avoid doing the I/O because I want to minimize wear on SSD drives; and given that it's unlikely that the cruft packs will be referenced, the fact that we have a bunch of cruft packs shouldn't be a big deal, especially if we teach git to search the cruft packs last. > Speaking of which, what is the mtime of the newly created cruft pack? Is > it the current mtime? Then those unreachable objects will stick for > another 2 weeks, instead of being back-dated to their pack's date. You > could back-date to the mtime of the most recent deleted pack, but that > would still prolong the life of objects from the older packs. It may be > acceptable to just ignore the issue, though; they will expire > eventually. Well, we have that problem today when "git pack-objects --unpack-unreachable" explodes unreferenced objects --- they are written with the current mtime. I assume you're worried about pre-existing loose objects that get collected up into a new cruft pack, since they would get the extra two weeks of life. Given how much more efficient storing the cruft objects in a pack is, I think ignoring the issue is what makes the most amount of sense, since it's a one-time extension, and the extra objects really won't do any harm. One last thought: if a sysadmin is really hard up for space, (and if the cruft objects include some really big sound or video files) one advantage of labelling the cruft packs explicitly is that someone who really needs the space could potentially find the oldest cruft files and delete them, since they would be tagged for easy findability. - Ted ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Keeping unreachable objects in a separate pack instead of loose? 2012-06-11 22:14 ` Ted Ts'o @ 2012-06-11 22:23 ` Jeff King 2012-06-11 22:28 ` Ted Ts'o 2012-06-12 0:41 ` Nicolas Pitre 0 siblings, 2 replies; 48+ messages in thread From: Jeff King @ 2012-06-11 22:23 UTC (permalink / raw) To: Ted Ts'o; +Cc: Thomas Rast, Hallvard B Furuseth, git, Nicolas Pitre On Mon, Jun 11, 2012 at 06:14:39PM -0400, Ted Ts'o wrote: > > Speaking of which, what is the mtime of the newly created cruft pack? Is > > it the current mtime? Then those unreachable objects will stick for > > another 2 weeks, instead of being back-dated to their pack's date. You > > could back-date to the mtime of the most recent deleted pack, but that > > would still prolong the life of objects from the older packs. It may be > > acceptable to just ignore the issue, though; they will expire > > eventually. > > Well, we have that problem today when "git pack-objects > --unpack-unreachable" explodes unreferenced objects --- they are > written with the current mtime. No, we don't; they get the mtime of the pack they are coming from (and if the pack is older than pruneExpire, they are not exploded at all, since they would just be pruned immediately anyway). So an exploded object might have only a day or an hour to live after the explosion, but with your strategy they always get two weeks. > I assume you're worried about pre-existing loose objects that get > collected up into a new cruft pack, since they would get the extra two > weeks of life. Given how much more efficient storing the cruft > objects in a pack, I think ignoring the issue is what makes the most > amount of sense, since it's a one-time extension, and the extra > objects really won't do any harm. I'm more specifically worried about large objects which are no better in packs than they are in loose form (e.g., video files). This strategy is a regression, since we are not saving space by putting them in a pack, but we are keeping them around much longer. It also makes it harder to just run "git prune" to get rid of large objects (since prune will never kill off a pack), or to manually delete files from the object database. You have to run "git gc --prune=now" instead, so it can make a new pack and throw away the old bits (or run "git repack -ad"). > One last thought: if a sysadmin is really hard up for space, (and if > the cruft objects include some really big sound or video files) one > advantage of labelling the cruft packs explicitly is that someone who > really needs the space could potentially find the oldest cruft files > and delete them, since they would be tagged for easy findability. No! That's exactly what I was worried about with the name. It is _not_ safe to do so. It's only safe after you have done a full repack to rescue any non-cruft objects. -Peff ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Keeping unreachable objects in a separate pack instead of loose? 2012-06-11 22:23 ` Jeff King @ 2012-06-11 22:28 ` Ted Ts'o 2012-06-11 22:35 ` Jeff King 0 siblings, 1 reply; 48+ messages in thread From: Ted Ts'o @ 2012-06-11 22:28 UTC (permalink / raw) To: Jeff King; +Cc: Thomas Rast, Hallvard B Furuseth, git, Nicolas Pitre On Mon, Jun 11, 2012 at 06:23:08PM -0400, Jeff King wrote: > > I'm more specifically worried about large objects which are no better in > packs than they are in loose form (e.g., video files). This strategy is > a regression, since we are not saving space by putting them in a pack, > but we are keeping them around much longer. It also makes it harder to > just run "git prune" to get rid of large objects (since prune will never > kill off a pack), or to manually delete files from the object database. > You have to run "git gc --prune=now" instead, so it can make a new pack > and throw away the old bits (or run "git repack -ad"). If we're really worried about this, we could set a threshold and only pack small objects in the cruft packs. > > One last thought: if a sysadmin is really hard up for space, (and if > > the cruft objects include some really big sound or video files) one > > advantage of labelling the cruft packs explicitly is that someone who > > really needs the space could potentially find the oldest cruft files > > and delete them, since they would be tagged for easy findability. > > No! That's exactly what I was worried about with the name. It is _not_ > safe to do so. It's only safe after you have done a full repack to > rescue any non-cruft objects. Well, yes. I was thinking it would be a safe thing to do after a "git gc" didn't result in enough space savings. This would require that a git repack always rescue objects from cruft packs even if the -a/-A options are not specified, but since we're already doing a full reachability scan, that shouldn't slow down git gc much, right? - Ted ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Keeping unreachable objects in a separate pack instead of loose? 2012-06-11 22:28 ` Ted Ts'o @ 2012-06-11 22:35 ` Jeff King 0 siblings, 0 replies; 48+ messages in thread From: Jeff King @ 2012-06-11 22:35 UTC (permalink / raw) To: Ted Ts'o; +Cc: Thomas Rast, Hallvard B Furuseth, git, Nicolas Pitre On Mon, Jun 11, 2012 at 06:28:43PM -0400, Ted Ts'o wrote: > On Mon, Jun 11, 2012 at 06:23:08PM -0400, Jeff King wrote: > > > > I'm more specifically worried about large objects which are no better in > > packs than they are in loose form (e.g., video files). This strategy is > > a regression, since we are not saving space by putting them in a pack, > > but we are keeping them around much longer. It also makes it harder to > > just run "git prune" to get rid of large objects (since prune will never > > kill off a pack), or to manually delete files from the object database. > > You have to run "git gc --prune=now" instead, so it can make a new pack > > and throw away the old bits (or run "git repack -ad"). > > If we're really worried about this, we could set a threshold and only > pack small objects in the cruft packs. I think I'd be more inclined to just ignore it. It is only prolonging the lifetime of the files by a finite amount (and we are discussing dropping that finite amount anyway). And as a bonus, this strategy could potentially allow an optimization that would make large files better in this case: if we notice that a pack has _only_ unreachable objects, we can simply mark it as ".cruft" without actually repacking it. Coupled with the recent-ish code to stream large blobs directly to packs, that means a large blob which becomes unreachable would not ever be rewritten. > > No! That's exactly what I was worried about with the name. It is _not_ > > safe to do so. It's only safe after you have done a full repack to > > rescue any non-cruft objects. > > Well, yes. I was thinking it would be a safe thing to do after a "git > gc" didn't result in enough space savings. This would require that a > git repack always rescue objects from cruft packs even if the -a/-A > options are not specified, but since we're already doing a full reachability > scan, that shouldn't slow down git gc much, right? Doing "git gc" will always repack everything, IIRC. It is "git gc --auto" which will make small incremental packs. I think we do a full reachability analysis so we can prune there, but that is something I think we should stop doing. It is typically orders of magnitude slower than the incremental repack. -Peff ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Keeping unreachable objects in a separate pack instead of loose? 2012-06-11 22:23 ` Jeff King 2012-06-11 22:28 ` Ted Ts'o @ 2012-06-12 0:41 ` Nicolas Pitre 2012-06-12 17:10 ` Jeff King 1 sibling, 1 reply; 48+ messages in thread From: Nicolas Pitre @ 2012-06-12 0:41 UTC (permalink / raw) To: Jeff King; +Cc: Ted Ts'o, Thomas Rast, Hallvard B Furuseth, git On Mon, 11 Jun 2012, Jeff King wrote: > On Mon, Jun 11, 2012 at 06:14:39PM -0400, Ted Ts'o wrote: > > > One last thought: if a sysadmin is really hard up for space, (and if > > the cruft objects include some really big sound or video files) one > > advantage of labelling the cruft packs explicitly is that someone who > > really needs the space could potentially find the oldest cruft files > > and delete them, since they would be tagged for easy findability. > > No! That's exactly what I was worried about with the name. It is _not_ > safe to do so. It's only safe after you have done a full repack to > rescue any non-cruft objects. To make it "safe", the cruft packs would have to be searchable for object retrieval, but not during object creation. That nuance would affect the core code in subtle ways and I'm not sure if that would be worth it ... just for the safe handling of cruft. Nicolas ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Keeping unreachable objects in a separate pack instead of loose? 2012-06-12 0:41 ` Nicolas Pitre @ 2012-06-12 17:10 ` Jeff King 2012-06-12 17:30 ` Nicolas Pitre 0 siblings, 1 reply; 48+ messages in thread From: Jeff King @ 2012-06-12 17:10 UTC (permalink / raw) To: Nicolas Pitre; +Cc: Ted Ts'o, Thomas Rast, Hallvard B Furuseth, git On Mon, Jun 11, 2012 at 08:41:03PM -0400, Nicolas Pitre wrote: > > On Mon, Jun 11, 2012 at 06:14:39PM -0400, Ted Ts'o wrote: > > > > > One last thought: if a sysadmin is really hard up for space, (and if > > > the cruft objects include some really big sound or video files) one > > > advantage of labelling the cruft packs explicitly is that someone who > > > really needs the space could potentially find the oldest cruft files > > > and delete them, since they would be tagged for easy findability. > > > > No! That's exactly what I was worried about with the name. It is _not_ > > safe to do so. It's only safe after you have done a full repack to > > rescue any non-cruft objects. > > To make it "safe", the cruft packs would have to be searchable for > object retrieval, but not during object creation. That nuance would > affect the core code in subtle ways and I'm not sure if that would be > worth it ... just for the safe handling of cruft. Why is that? If you do a "repack -Ad", then any referenced objects will have been retrieved and put into the new all-in-one pack. At that point, by deleting the cruft pack, you are guaranteed to be deleting only objects that are either unreferenced, or are duplicated in another pack. -Peff ^ permalink raw reply [flat|nested] 48+ messages in thread
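In command terms, the ordering Peff describes might look like the sketch below. It reuses the hypothetical ".cruft" marker from the earlier sketch and assumes such marked packs are exempt from repack's usual -d deletion pass:

    git repack -a -d    # rescue: every referenced object, even one currently
                        # sitting in a cruft pack, lands in the new all-in-one pack
    for cruft in .git/objects/pack/pack-*.cruft; do
        [ -e "$cruft" ] || continue
        base=${cruft%.cruft}
        rm -f "$cruft" "$base.pack" "$base.idx"   # removes only unreachable objects
    done                                          # or duplicates of rescued ones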
* Re: Keeping unreachable objects in a separate pack instead of loose? 2012-06-12 17:10 ` Jeff King @ 2012-06-12 17:30 ` Nicolas Pitre 2012-06-12 17:32 ` Jeff King 0 siblings, 1 reply; 48+ messages in thread From: Nicolas Pitre @ 2012-06-12 17:30 UTC (permalink / raw) To: Jeff King; +Cc: Ted Ts'o, Thomas Rast, Hallvard B Furuseth, git On Tue, 12 Jun 2012, Jeff King wrote: > On Mon, Jun 11, 2012 at 08:41:03PM -0400, Nicolas Pitre wrote: > > > > On Mon, Jun 11, 2012 at 06:14:39PM -0400, Ted Ts'o wrote: > > > > > > > One last thought: if a sysadmin is really hard up for space, (and if > > > > the cruft objects include some really big sound or video files) one > > > > advantage of labelling the cruft packs explicitly is that someone who > > > > really needs the space could potentially find the oldest cruft files > > > > and delete them, since they would be tagged for easy findability. > > > > > > No! That's exactly what I was worried about with the name. It is _not_ > > > safe to do so. It's only safe after you have done a full repack to > > > rescue any non-cruft objects. > > > > To make it "safe", the cruft packs would have to be searchable for > > object retrieval, but not during object creation. That nuance would > > affect the core code in subtle ways and I'm not sure if that would be > > worth it ... just for the safe handling of cruft. > > Why is that? If you do a "repack -Ad", then any referenced objects will > have been retrieved and put into the new all-in-one pack. At that point, > by deleting the cruft pack, you are guaranteed to be deleting only > objects that are either unreferenced, or are duplicated in another pack. Now what if you fetch and a bunch of objects are already found in your cruft pack? Right now, we search for the existence of any object before creating them, and if the cruft packs are searchable then such objects won't get uncruftified. Nicolas ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Keeping unreachable objects in a separate pack instead of loose? 2012-06-12 17:30 ` Nicolas Pitre @ 2012-06-12 17:32 ` Jeff King 2012-06-12 17:45 ` Shawn Pearce 2012-06-12 17:49 ` Nicolas Pitre 0 siblings, 2 replies; 48+ messages in thread From: Jeff King @ 2012-06-12 17:32 UTC (permalink / raw) To: Nicolas Pitre; +Cc: Ted Ts'o, Thomas Rast, Hallvard B Furuseth, git On Tue, Jun 12, 2012 at 01:30:07PM -0400, Nicolas Pitre wrote: > > > To make it "safe", the cruft packs would have to be searchable for > > > object retrieval, but not during object creation. That nuance would > > > affect the core code in subtle ways and I'm not sure if that would be > > > worth it ... just for the safe handling of cruft. > > > > Why is that? If you do a "repack -Ad", then any referenced objects will > > have been retrieved and put into the new all-in-one pack. At that point, > > by deleting the cruft pack, you are guaranteed to be deleting only > > objects that are either unreferenced, or are duplicated in another pack. > > Now what if you fetch and a bunch of objects are already found in your > cruft pack? Right now, we search for the existence of any object before > creating them, and if the cruft packs are searchable then such objects > won't get uncruftified. Then those objects will remain in the cruft pack. Which is why, as I said, it is not generally safe to just delete a cruft pack. However, when you do a full repack, those objects will be copied into the new pack (because they are referenced). Which is why I am claiming that it is safe to remove cruft packs at that point. -Peff ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Keeping unreachable objects in a separate pack instead of loose? 2012-06-12 17:32 ` Jeff King @ 2012-06-12 17:45 ` Shawn Pearce 2012-06-12 17:50 ` Jeff King 2012-06-12 17:55 ` Nicolas Pitre 2012-06-12 17:49 ` Nicolas Pitre 1 sibling, 2 replies; 48+ messages in thread From: Shawn Pearce @ 2012-06-12 17:45 UTC (permalink / raw) To: Jeff King Cc: Nicolas Pitre, Ted Ts'o, Thomas Rast, Hallvard B Furuseth, git On Tue, Jun 12, 2012 at 10:32 AM, Jeff King <peff@peff.net> wrote: > On Tue, Jun 12, 2012 at 01:30:07PM -0400, Nicolas Pitre wrote: > >> > > To make it "safe", the cruft packs would have to be searchable for >> > > object retrieval, but not during object creation. That nuance would >> > > affect the core code in subtle ways and I'm not sure if that would be >> > > worth it ... just for the safe handling of cruft. >> > >> > Why is that? If you do a "repack -Ad", then any referenced objects will >> > have been retrieved and put into the new all-in-one pack. At that point, >> > by deleting the cruft pack, you are guaranteed to be deleting only >> > objects that are either unreferenced, or are duplicated in another pack. >> >> Now what if you fetch and a bunch of objects are already found in your >> cruft pack? Right now, we search for the existence of any object before >> creating them, and if the cruft packs are searchable then such objects >> won't get uncruftified. > > Then those objects will remain in the cruft pack. Which is why, as I > said, it is not generally safe to just delete a cruft pack. However, > when you do a full repack, those objects will be copied into the new > pack (because they are referenced). Which is why I am claiming that it > is safe to remove cruft packs at that point. But there is a race condition with a concurrent fetch and a concurrent repack. If that fetch needs those cruft objects, and sees them in the cruft pack, and the repack sees the references before the fetch, the repacker might delete things the fetch is about to reference and that will leave you with a corrupt repository. I think we already have this race condition with loose unreachable objects whose mtimes are older than 2 weeks; they are removed by prune but may have just become reachable by a concurrent fetch that doesn't overwrite them because they already exist, and doesn't update the mtime because they aren't writable. ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Keeping unreachable objects in a separate pack instead of loose? 2012-06-12 17:45 ` Shawn Pearce @ 2012-06-12 17:50 ` Jeff King 2012-06-12 17:57 ` Nicolas Pitre 2012-06-12 18:43 ` Andreas Schwab 2012-06-12 17:55 ` Nicolas Pitre 1 sibling, 2 replies; 48+ messages in thread From: Jeff King @ 2012-06-12 17:50 UTC (permalink / raw) To: Shawn Pearce Cc: Nicolas Pitre, Ted Ts'o, Thomas Rast, Hallvard B Furuseth, git On Tue, Jun 12, 2012 at 10:45:22AM -0700, Shawn O. Pearce wrote: > > Then those objects will remain in the cruft pack. Which is why, as I > > said, it is not generally safe to just delete a cruft pack. However, > > when you do a full repack, those objects will be copied into the new > > pack (because they are referenced). Which is why I am claiming that it > > is safe to remove cruft packs at that point. > > But there is a race condition with a concurrent fetch and a concurrent > repack. If that fetch needs those cruft objects, and sees them in the > cruft pack, and the repack sees the references before the fetch, the > repacker might delete things the fetch is about to reference and that > will leave you with a corrupt repository. > > I think we already have this race condition with loose unreachable > objects whose mtimes are older than 2 weeks; they are removed by prune > but may have just become reachable by a concurrent fetch that doesn't > overwrite them because they already exist, and doesn't update the > mtime because they aren't writable. Correct. There is a race condition, but it is there already. I have discussed this with other GitHub folks, because we prune fairly aggressively (in our case it would be a push, not a fetch, of course). So far we have not had any record of it actually happening in practice. We could close it in both cases by tweaking the mtime of the file containing the object when we decide not to write because the object already exists. -Peff ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Keeping unreachable objects in a separate pack instead of loose? 2012-06-12 17:50 ` Jeff King @ 2012-06-12 17:57 ` Nicolas Pitre 2012-06-12 18:43 ` Andreas Schwab 1 sibling, 0 replies; 48+ messages in thread From: Nicolas Pitre @ 2012-06-12 17:57 UTC (permalink / raw) To: Jeff King Cc: Shawn Pearce, Ted Ts'o, Thomas Rast, Hallvard B Furuseth, git On Tue, 12 Jun 2012, Jeff King wrote: > On Tue, Jun 12, 2012 at 10:45:22AM -0700, Shawn O. Pearce wrote: > > > > Then those objects will remain in the cruft pack. Which is why, as I > > > said, it is not generally safe to just delete a cruft pack. However, > > > when you do a full repack, those objects will be copied into the new > > > pack (because they are referenced). Which is why I am claiming that it > > > is safe to remove cruft packs at that point. > > > > But there is a race condition with a concurrent fetch and a concurrent > > repack. If that fetch needs those cruft objects, and sees them in the > > cruft pack, and the repack sees the references before the fetch, the > > repacker might delete things the fetch is about to reference and that > > will leave you with a corrupt repository. > > > > I think we already have this race condition with loose unreachable > > objects whose mtimes are older than 2 weeks; they are removed by prune > > but may have just become reachable by a concurrent fetch that doesn't > > overwrite them because they already exist, and doesn't update the > > mtime because they aren't writable. > > Correct. There is a race condition, but it is there already. I have > discussed this with other GitHub folks, because we prune fairly > aggressively (in our case it would be a push, not a fetch, of course). > So far we have not had any record of it actually happening in practice. > > We could close it in both cases by tweaking the mtime of the file > containing the object when we decide not to write because the object > already exists. Yes, that is a worthwhile thing to do. Nicolas ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Keeping unreachable objects in a separate pack instead of loose? 2012-06-12 17:50 ` Jeff King 2012-06-12 17:57 ` Nicolas Pitre @ 2012-06-12 18:43 ` Andreas Schwab 2012-06-12 19:07 ` Jeff King 2012-06-12 19:09 ` Nicolas Pitre 1 sibling, 2 replies; 48+ messages in thread From: Andreas Schwab @ 2012-06-12 18:43 UTC (permalink / raw) To: Jeff King Cc: Shawn Pearce, Nicolas Pitre, Ted Ts'o, Thomas Rast, Hallvard B Furuseth, git Jeff King <peff@peff.net> writes: > We could close it in both cases by tweaking the mtime of the file > containing the object when we decide not to write because the object > already exists. Though there is always the window between the existence check and the mtime update where pruning can hit you. Andreas. -- Andreas Schwab, schwab@linux-m68k.org GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 "And now for something completely different." ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Keeping unreachable objects in a separate pack instead of loose? 2012-06-12 18:43 ` Andreas Schwab @ 2012-06-12 19:07 ` Jeff King 2012-06-12 19:09 ` Nicolas Pitre 1 sibling, 0 replies; 48+ messages in thread From: Jeff King @ 2012-06-12 19:07 UTC (permalink / raw) To: Andreas Schwab Cc: Shawn Pearce, Nicolas Pitre, Ted Ts'o, Thomas Rast, Hallvard B Furuseth, git On Tue, Jun 12, 2012 at 08:43:41PM +0200, Andreas Schwab wrote: > Jeff King <peff@peff.net> writes: > > > We could close it in both cases by tweaking the mtime of the file > > containing the object when we decide not to write because the object > > already exists. > > Though there is always the window between the existence check and the > mtime update where pruning can hit you. For the loose object case, you could do them both atomically by calling utime() on the object, and considering the object to exist only if it succeeds. Doing it safely for packs would be harder, though; I think you'd have to bump the mtime forward, do the search, and then bump it back. You might err by causing a pack not to be pruned, but that is better than the opposite. Unfortunately it gets trickier with network transfers. If somebody is pushing to your repository, you might tell them you have some set of objects, then they prepare a pack based on that assumption (which might take minutes or hours to transfer), and then finally at the end you find that you actually need the objects in question. Of course, that race is even harder to trigger, because we do not advertise unreachable objects. So you would have to have a sequence where the objects are reachable, the client connects and receives your ref advertisement, then the objects become unreachable (e.g., due to a simultaneous non-ff push or deletion), and you do a prune in that interval which removes the objects. Unlikely, but still possible. -Peff ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Keeping unreachable objects in a separate pack instead of loose? 2012-06-12 18:43 ` Andreas Schwab 2012-06-12 19:07 ` Jeff King @ 2012-06-12 19:09 ` Nicolas Pitre 2012-06-12 19:23 ` Jeff King 1 sibling, 1 reply; 48+ messages in thread From: Nicolas Pitre @ 2012-06-12 19:09 UTC (permalink / raw) To: Andreas Schwab Cc: Jeff King, Shawn Pearce, Ted Ts'o, Thomas Rast, Hallvard B Furuseth, git On Tue, 12 Jun 2012, Andreas Schwab wrote: > Jeff King <peff@peff.net> writes: > > > We could close it in both cases by tweaking the mtime of the file > > containing the object when we decide not to write because the object > > already exists. > > Though there is always the window between the existence check and the > mtime update where pruning can hit you. This is a tiny window compared to 2 weeks. This could be avoided entirely with: 1. check presence of object X 2. update its mtime 3. check presence of object X again 4. create if doesn't exist Nicolas ^ permalink raw reply [flat|nested] 48+ messages in thread
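Nicolas's four steps, sketched in shell terms; the loose-object path uses bash substring syntax, and write_loose_object is a hypothetical stand-in for git's actual C code path:

    path=".git/objects/${sha:0:2}/${sha:2}"
    if [ -e "$path" ]; then           # 1. check presence of object X
        touch "$path" 2>/dev/null     # 2. update its mtime
        [ -e "$path" ] && exit 0      # 3. check presence again; a prune that won
    fi                                #    the race would have removed it by now
    write_loose_object "$sha"         # 4. create if it doesn't exist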
* Re: Keeping unreachable objects in a separate pack instead of loose? 2012-06-12 19:09 ` Nicolas Pitre @ 2012-06-12 19:23 ` Jeff King 2012-06-12 19:39 ` Nicolas Pitre 0 siblings, 1 reply; 48+ messages in thread From: Jeff King @ 2012-06-12 19:23 UTC (permalink / raw) To: Nicolas Pitre Cc: Andreas Schwab, Shawn Pearce, Ted Ts'o, Thomas Rast, Hallvard B Furuseth, git On Tue, Jun 12, 2012 at 03:09:25PM -0400, Nicolas Pitre wrote: > > Jeff King <peff@peff.net> writes: > > > > > We could close it in both cases by tweaking the mtime of the file > > > containing the object when we decide not to write because the object > > > already exists. > > > > Though there is always the window between the existence check and the > > mtime update where pruning can hit you. > > This is a tiny window compared to 2 weeks. I don't think the race window is actually 2 weeks long. If you have this sequence: 1. object X becomes unreferenced 2. 1 week later, you create a new ref that mentions X 3. >2 weeks later, you run "git prune --expire=2.weeks.ago" we will not consider the object for pruning in step 3, because it is reachable. The race is more like: 1. object X becomes unreferenced 2. >2 weeks later, you run "git prune --expire=2.weeks.ago" 3. git-prune reads the list of refs 4. simultaneous to the git-prune, you reference X 5. git-prune removes X 6. your reference is now broken So the race window depends on the time it takes "git prune" to run. I wonder if git-prune could do a double-check of the refs. Something like: 1. calculate reachability on all refs 2. read list of objects to prune, and make a list of unreachable ones 3. calculate reachability again (which should be very cheap, because you can stop when you get to an object you have already seen) 4. Drop any objects found in (3) from the list in (2), and delete items from your list But I think that still has a race where objects are created before step 2, but are not actually referenced until after step 3. I think doing it safely may actually require a repo-wide prune lock. -Peff ^ permalink raw reply [flat|nested] 48+ messages in thread
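A plumbing-level sketch of the double-read idea; as noted above, it is still racy without a repo-wide lock, since a new reference can appear between the two reads:

    git prune --dry-run --expire=2.weeks.ago | cut -d' ' -f1 | sort >candidates
    git rev-list --objects --all | cut -d' ' -f1 | sort -u >reachable
    comm -23 candidates reachable   # prune candidates minus anything that became
                                    # reachable between the scan and the re-read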
* Re: Keeping unreachable objects in a separate pack instead of loose? 2012-06-12 19:23 ` Jeff King @ 2012-06-12 19:39 ` Nicolas Pitre 2012-06-12 19:41 ` Jeff King 0 siblings, 1 reply; 48+ messages in thread From: Nicolas Pitre @ 2012-06-12 19:39 UTC (permalink / raw) To: Jeff King Cc: Andreas Schwab, Shawn Pearce, Ted Ts'o, Thomas Rast, Hallvard B Furuseth, git On Tue, 12 Jun 2012, Jeff King wrote: > So the race window depends on the time it takes "git prune" to run. > > I wonder if git-prune could do a double-check of the refs. Something > like: > > 1. calculate reachability on all refs > > 2. read list of objects to prune, and make a list of unreachable ones > > 3. calculate reachability again (which should be very cheap, because > you can stop when you get to an object you have already seen) > > 4. Drop any objects found in (3) from the list in (2), and delete > items from your list > > But I think that still has a race where objects are created before > step 2, but are not actually referenced until after step 3. I think > doing it safely may actually require a repo-wide prune lock. Yeah... that's what I was thinking too. Maybe we're making our life overly miserable by trying to avoid any locking here. Nicolas ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Keeping unreachable objects in a separate pack instead of loose? 2012-06-12 19:39 ` Nicolas Pitre @ 2012-06-12 19:41 ` Jeff King 0 siblings, 0 replies; 48+ messages in thread From: Jeff King @ 2012-06-12 19:41 UTC (permalink / raw) To: Nicolas Pitre Cc: Andreas Schwab, Shawn Pearce, Ted Ts'o, Thomas Rast, Hallvard B Furuseth, git On Tue, Jun 12, 2012 at 03:39:05PM -0400, Nicolas Pitre wrote: > > So the race window depends on the time it takes "git prune" to run. > > > > I wonder if git-prune could do a double-check of the refs. Something > > like: > > > > 1. calculate reachability on all refs > > > > 2. read list of objects to prune, and make a list of unreachable ones > > > > 3. calculate reachability again (which should be very cheap, because > > you can stop when you get to an object you have already seen) > > > > 4. Drop any objects found in (3) from the list in (2), and delete > > items from your list > > > > But I think that still has a race where objects are created before > > step 2, but are not actually referenced until after step 3. I think > > doing it safely may actually require a repo-wide prune lock. > > Yeah... that's what I was thinking too. Maybe we're making our life > overly miserable by trying to avoid any locking here. I think I would be OK with "prune" locking, as long as everything else was able to happen simultaneously. Especially if we can keep prune's lock as short as possible through double-reads or similar tricks (like we do for ref updates). -Peff ^ permalink raw reply [flat|nested] 48+ messages in thread
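A repo-wide prune lock of the sort discussed could be as simple as this flock(1) sketch (the lock-file name is made up, and git's own locking is implemented in C rather than with flock):

    (
        flock -x 9 || exit 1               # take the exclusive prune lock...
        git prune --expire=2.weeks.ago     # ...and hold it only while pruning
    ) 9>.git/objects/prune.lock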
* Re: Keeping unreachable objects in a separate pack instead of loose? 2012-06-12 17:45 ` Shawn Pearce 2012-06-12 17:50 ` Jeff King @ 2012-06-12 17:55 ` Nicolas Pitre 1 sibling, 0 replies; 48+ messages in thread From: Nicolas Pitre @ 2012-06-12 17:55 UTC (permalink / raw) To: Shawn Pearce Cc: Jeff King, Ted Ts'o, Thomas Rast, Hallvard B Furuseth, git On Tue, 12 Jun 2012, Shawn Pearce wrote: > On Tue, Jun 12, 2012 at 10:32 AM, Jeff King <peff@peff.net> wrote: > > On Tue, Jun 12, 2012 at 01:30:07PM -0400, Nicolas Pitre wrote: > > > >> > > To make it "safe", the cruft packs would have to be searchable for > >> > > object retrieval, but not during object creation. That nuance would > >> > > affect the core code in subtle ways and I'm not sure if that would be > >> > > worth it ... just for the safe handling of cruft. > >> > > >> > Why is that? If you do a "repack -Ad", then any referenced objects will > >> > have been retrieved and put into the new all-in-one pack. At that point, > >> > by deleting the cruft pack, you are guaranteed to be deleting only > >> > objects that are either unreferenced, or are duplicated in another pack. > >> > >> Now what if you fetch and a bunch of objects are already found in your > >> cruft pack? Right now, we search for the existence of any object before > >> creating them, and if the cruft packs are searchable then such objects > >> won't get uncruftified. > > > > Then those objects will remain in the cruft pack. Which is why, as I > > said, it is not generally safe to just delete a cruft pack. However, > > when you do a full repack, those objects will be copied into the new > > pack (because they are referenced). Which is why I am claiming that it > > is safe to remove cruft packs at that point. > > But there is a race condition with a concurrent fetch and a concurrent > repack. If that fetch needs those cruft objects, and sees them in the > cruft pack, and the repack sees the references before the fetch, the > repacker might delete things the fetch is about to reference and that > will leave you with a corrupt repository. > > I think we already have this race condition with loose unreachable > objects whose mtimes are older than 2 weeks; they are removed by prune > but may have just become reachable by a concurrent fetch that doesn't > overwrite them because they already exist, and doesn't update the > mtime because they aren't writable. Splat! Nicolas ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Keeping unreachable objects in a separate pack instead of loose? 2012-06-12 17:32 ` Jeff King 2012-06-12 17:45 ` Shawn Pearce @ 2012-06-12 17:49 ` Nicolas Pitre 2012-06-12 17:54 ` Jeff King 1 sibling, 1 reply; 48+ messages in thread From: Nicolas Pitre @ 2012-06-12 17:49 UTC (permalink / raw) To: Jeff King; +Cc: Ted Ts'o, Thomas Rast, Hallvard B Furuseth, git On Tue, 12 Jun 2012, Jeff King wrote: > On Tue, Jun 12, 2012 at 01:30:07PM -0400, Nicolas Pitre wrote: > > > > > To make it "safe", the cruft packs would have to be searchable for > > > > object retrieval, but not during object creation. That nuance would > > > > affect the core code in subtle ways and I'm not sure if that would be > > > > worth it ... just for the safe handling of cruft. > > > > > > Why is that? If you do a "repack -Ad", then any referenced objects will > > > have been retrieved and put into the new all-in-one pack. At that point, > > > by deleting the cruft pack, you are guaranteed to be deleting only > > > objects that are either unreferenced, or are duplicated in another pack. > > > > Now what if you fetch and a bunch of objects are already found in your > > cruft pack? Right now, we search for the existence of any object before > > creating them, and if the cruft packs are searchable then such objects > > won't get uncruftified. > > Then those objects will remain in the cruft pack. Which is why, as I > said, it is not generally safe to just delete a cruft pack. ... and my reply was about the needed changes to still make cruft packs always crufty even if some of its content suddenly becomes useful again. > However, when you do a full repack, those objects will be copied into > the new pack (because they are referenced). Which is why I am claiming > that it is safe to remove cruft packs at that point. Yes, but then there is no point marking such packs as cruft if at any moment they can become useful again. Nicolas ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Keeping unreachable objects in a separate pack instead of loose? 2012-06-12 17:49 ` Nicolas Pitre @ 2012-06-12 17:54 ` Jeff King 2012-06-12 18:25 ` Nicolas Pitre 0 siblings, 1 reply; 48+ messages in thread From: Jeff King @ 2012-06-12 17:54 UTC (permalink / raw) To: Nicolas Pitre; +Cc: Ted Ts'o, Thomas Rast, Hallvard B Furuseth, git On Tue, Jun 12, 2012 at 01:49:26PM -0400, Nicolas Pitre wrote: > > Then those objects will remain in the cruft pack. Which is why, as I > > said, it is not generally safe to just delete a cruft pack. > > ... and my reply was about the needed changes to still make cruft packs > always crufty even if some of its content suddenly becomes useful again. I think we are somehow missing each other's point, then. My point is that you do not _need_ to make the cruft packs 100% cruft. You can tolerate the duplicated objects until they are pruned. Earlier in the thread, I outlined another scheme by which you could repack and avoid the duplicates. It does not require changes to git's object lookup process, because it would involve manually feeding the list of cruft objects to pack-objects (which will pack what you ask it, regardless of whether the objects are in other packs). > > However, when you do a full repack, those objects will be copied into > > the new pack (because they are referenced). Which is why I am claiming > > that it is safe to remove cruft packs at that point. > > Yes, but then there is no point marking such packs as cruft if at any > moment they can become useful again. How do you know to keep the packs around and expire them after 2 weeks if they are not marked in some way? Otherwise you would delete them as part of a "git gc", pushing the reachable objects into the new pack and the unreachable objects into a new cruft pack. IOW, you need some way of keeping the expiration date on the unreachable objects, or they will keep getting "refreshed" by each gc. -Peff ^ permalink raw reply [flat|nested] 48+ messages in thread
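The "expiration date" here can simply be the cruft pack's own mtime, which is exactly why each gc has to leave existing cruft packs alone instead of rewriting them. A sketch, again using the hypothetical ".cruft" marker whose mtime is set when the pack is first marked:

    # list cruft packs whose objects have been expendable for over two weeks;
    # -mtime +14 only works if gc never refreshes these marker files
    find .git/objects/pack -name 'pack-*.cruft' -mtime +14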
* Re: Keeping unreachable objects in a separate pack instead of loose? 2012-06-12 17:54 ` Jeff King @ 2012-06-12 18:25 ` Nicolas Pitre 2012-06-12 18:37 ` Ted Ts'o 2012-06-12 19:15 ` Jeff King 0 siblings, 2 replies; 48+ messages in thread From: Nicolas Pitre @ 2012-06-12 18:25 UTC (permalink / raw) To: Jeff King; +Cc: Ted Ts'o, Thomas Rast, Hallvard B Furuseth, git On Tue, 12 Jun 2012, Jeff King wrote: > On Tue, Jun 12, 2012 at 01:49:26PM -0400, Nicolas Pitre wrote: > > > > Then those objects will remain in the cruft pack. Which is why, as I > > > said, it is not generally safe to just delete a cruft pack. > > > > ... and my reply was about the needed changes to still make cruft packs > > always crufty even if some of its content suddenly becomes useful again. > > I think we are somehow missing each other's point, then. My point is > that you do not _need_ to make the cruft packs 100% cruft. You can > tolerate the duplicated objects until they are pruned. Absolutely. Duplicated objects are fine, and I was in fact suggesting to actively duplicate any needed object when it is to be found in a cruft pack only. > Earlier in the thread, I outlined another scheme by which you could > repack and avoid the duplicates. It does not require changes to git's > object lookup process, because it would involve manually feeding the > list of cruft objects to pack-objects (which will pack what you ask it, > regardless of whether the objects are in other packs). That might be hard to achieve good delta compression though, as the main key to sort those objects is their path name, and with unreferenced objects you might not necessarily have that information. The ability to reuse pack data might mitigate this though. > > > However, when you do a full repack, those objects will be copied into > > > the new pack (because they are referenced). Which is why I am claiming > > > that it is safe to remove cruft packs at that point. > > > > Yes, but then there is no point marking such packs as cruft if at any > > moment they can become useful again. > > How do you know to keep the packs around and expire them after 2 weeks > if they are not marked in some way? Otherwise you would delete them as > part of a "git gc", pushing the reachable objects into the new pack and > the unreachable objects into a new cruft pack. IOW, you need some way of > keeping the expiration date on the unreachable objects, or they will > keep getting "refreshed" by each gc. My feeling is that we should make a step backward and consider if this is actually the right problem to solve. I don't remember why I might have been opposed to a reflog for deleted branches as you say I did, but that is certainly a feature that could prove to be useful. Then having a repository that can be used as an alternate for other repositories without knowing about it is also a problem that needs fixing and not only because of this object expiry issue. This is not easy to fix though. Then, the creation of unreferenced objects from successive 'git add' shouldn't create that many objects in the first place. They currently never get the chance to be packed to start with. So the problem is really about 'git gc' creating more data on disk which is counter productive for a garbage collecting task. Maybe the trick is simply not to delete any of the old pack which content was repacked into a single new pack and let them age before deleting them, rather than exploding a bunch of loose objects. But then we're back to the same issue I wanted to get away from i.e. identifying real cruft packs and making them safely deletable. Oh well... Nicolas ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Keeping unreachable objects in a separate pack instead of loose? 2012-06-12 18:25 ` Nicolas Pitre @ 2012-06-12 18:37 ` Ted Ts'o 2012-06-12 19:15 ` Nicolas Pitre 0 siblings, 1 reply; 48+ messages in thread From: Ted Ts'o @ 2012-06-12 18:37 UTC (permalink / raw) To: Nicolas Pitre; +Cc: Jeff King, Thomas Rast, Hallvard B Furuseth, git On Tue, Jun 12, 2012 at 02:25:47PM -0400, Nicolas Pitre wrote: > > Earlier in the thread, I outlined another scheme by which you could > > repack and avoid the duplicates. It does not require changes to git's > > object lookup process, because it would involve manually feeding the > > list of cruft objects to pack-objects (which will pack what you ask it, > > regardless of whether the objects are in other packs). > > That might be hard to achieve good delta compression though, as the main > key to sort those objects is their path name, and with unreferenced > objects you might not necessarily have that information. The ability to > reuse pack data might mitigate this though. Compared to loose objects, even not-so-great delta compression is manna from heaven. Remember what originally got me to start this thread. There were 4.5 megabytes worth of loose objects; when I created the object id list and fed the result to git pack-objects, the resulting pack was 244k. OK, maybe the delta compression wasn't optimal. Compared to the 4.5 megabytes of loose objects --- I'll happily settle for that! :-) > So the problem is really about 'git gc' creating more data on disk which > is counter productive for a garbage collecting task. Maybe the trick is > simply not to delete any of the old pack which content was repacked into > a single new pack and let them age before deleting them, rather than > exploding a bunch of loose objects. But then we're back to the same > issue I wanted to get away from i.e. identifying real cruft packs and > making them safely deletable. But the old packs are huge; in my case, a full set of packs was around 16 megabytes. Right now, git gc *increased* my disk usage by 4.5 megabytes. If we don't delete the old packs, then git gc would increase disk usage by 16 megabytes --- which is far, far worse. Writing a 244k cruft pack is soooooo much preferable. - Ted ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Keeping unreachable objects in a separate pack instead of loose? 2012-06-12 18:37 ` Ted Ts'o @ 2012-06-12 19:15 ` Nicolas Pitre 2012-06-12 19:19 ` Ted Ts'o 0 siblings, 1 reply; 48+ messages in thread From: Nicolas Pitre @ 2012-06-12 19:15 UTC (permalink / raw) To: Ted Ts'o; +Cc: Jeff King, Thomas Rast, Hallvard B Furuseth, git On Tue, 12 Jun 2012, Ted Ts'o wrote: > On Tue, Jun 12, 2012 at 02:25:47PM -0400, Nicolas Pitre wrote: > > > Earlier in the thread, I outlined another scheme by which you could > > > repack and avoid the duplicates. It does not require changes to git's > > > object lookup process, because it would involve manually feeding the > > > list of cruft objects to pack-objects (which will pack what you ask it, > > > regardless of whether the objects are in other packs). > > > > That might be hard to achieve good delta compression though, as the main > > key to sort those objects is their path name, and with unreferenced > > objects you might not necessarily have that information. The ability to > > reuse pack data might mitigate this though. > > Compared to loose objects, even not-so-great delta compression is > manna from heaven. Remember what originally got me to start this > thread. There were 4.5 megabytes worth of loose objects; when I > created the object id list and fed the result to git pack-objects, the > resulting pack was 244k. > > OK, maybe the delta compression wasn't optimal. Compared to the 4.5 > megabytes of loose objects --- I'll happily settle for that! :-) Sure. However, I would be even happier if we could delete those unneeded objects outright. The official reason why they're there for two weeks should be to avoid some race conditions, and in this case two weeks is way over the top, as in "normal" conditions the actual window for a race is on the order of a few seconds. Any other use case should be considered abusive. > > So the problem is really about 'git gc' creating more data on disk which > > is counter productive for a garbage collecting task. Maybe the trick is > > simply not to delete any of the old pack which content was repacked into > > a single new pack and let them age before deleting them, rather than > > exploding a bunch of loose objects. But then we're back to the same > > issue I wanted to get away from i.e. identifying real cruft packs and > > making them safely deletable. > > But the old packs are huge; in my case, a full set of packs was around > 16 megabytes. Right now, git gc *increased* my disk usage by 4.5 > megabytes. If we don't delete the old packs, then git gc would > increase disk usage by 16 megabytes --- which is far, far worse. > > Writing a 244k cruft pack is soooooo much preferable. But as you might have noticed, there are a bunch of semantic problems with that as well. Nicolas ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Keeping unreachable objects in a separate pack instead of loose? 2012-06-12 19:15 ` Nicolas Pitre @ 2012-06-12 19:19 ` Ted Ts'o 2012-06-12 19:35 ` Nicolas Pitre 0 siblings, 1 reply; 48+ messages in thread From: Ted Ts'o @ 2012-06-12 19:19 UTC (permalink / raw) To: Nicolas Pitre; +Cc: Jeff King, Thomas Rast, Hallvard B Furuseth, git On Tue, Jun 12, 2012 at 03:15:46PM -0400, Nicolas Pitre wrote: > > But the old packs are huge; in my case, a full set of packs was around > > 16 megabytes. Right now, git gc *increased* my disk usage by 4.5 > > megabytes. If we don't delete the old packs, then git gc would > > increase disk usage by 16 megabytes --- which is far, far worse. > > > > Writing a 244k cruft pack is soooooo much preferable. > > But as you might have noticed, there are a bunch of semantic problems > with that as well. I've proposed something (explicitly labelled cruft packs) which is no worse than before. The one potential problem is that objects in the cruft pack might have their lifespan extended by two weeks (or whatever the expire timeout might be), but Peff has agreed that it's simple enough to ignore that, since the benefits far outweigh the potential that some objects in cruft packs will get to live a bit longer. The race condition you've pointed out exists today, with the git prune racing against the git fetch. - Ted ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Keeping unreachable objects in a separate pack instead of loose? 2012-06-12 19:19 ` Ted Ts'o @ 2012-06-12 19:35 ` Nicolas Pitre 2012-06-12 19:43 ` Ted Ts'o 0 siblings, 1 reply; 48+ messages in thread From: Nicolas Pitre @ 2012-06-12 19:35 UTC (permalink / raw) To: Ted Ts'o; +Cc: Jeff King, Thomas Rast, Hallvard B Furuseth, git On Tue, 12 Jun 2012, Ted Ts'o wrote: > On Tue, Jun 12, 2012 at 03:15:46PM -0400, Nicolas Pitre wrote: > > > But the old packs are huge; in my case, a full set of packs was around > > > 16 megabytes. Right now, git gc *increased* my disk usage by 4.5 > > > megabytes. If we don't delete the old packs, then git gc would > > > increase disk usage by 16 megabytes --- which is far, far worse. > > > > > > Writing a 244k cruft pack is soooooo much preferable. > > > > But as you might have noticed, there are a bunch of semantic problems > > with that as well. > > I've proposed something (explicitly labelled cruft packs) which is no > worse than before. The one potential problem is that objects in the > cruft pack might have their lifespan extended by two weeks (or > whatever the expire timeout might be), but Peff has agreed that it's > simple enough to ignore that, since the benefits far outweigh the > potential that some objects in cruft packs will get to live a bit > longer. > > The race condition you've pointed out exists today, with the git prune > racing against the git fetch. Yes, however that race is trivial to fix when loose objects are used. Nicolas ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Keeping unreachable objects in a separate pack instead of loose? 2012-06-12 19:35 ` Nicolas Pitre @ 2012-06-12 19:43 ` Ted Ts'o 0 siblings, 0 replies; 48+ messages in thread From: Ted Ts'o @ 2012-06-12 19:43 UTC (permalink / raw) To: Nicolas Pitre; +Cc: Jeff King, Thomas Rast, Hallvard B Furuseth, git On Tue, Jun 12, 2012 at 03:35:17PM -0400, Nicolas Pitre wrote: > > > > The race condition you've pointed out exists today, with the git prune > > racing against the git fetch. > > Yes, however that race is trivial to fix when loose objects are used. We can use the same trivial fix with the cruft pack, by touching the mtime of the pack. Yes, it will extend the lifetime of the cruft pack by another 2 weeks, but again, let me remind you: 244k versus 4.5 megabytes. I can live with an extra 244k hanging around a wee bit longer. :-) There are other fixes we could do involving flock() and removing the cruft label once we add a reference to a cruft pack, which I don't think would be that complicated. - Ted ^ permalink raw reply [flat|nested] 48+ messages in thread
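Both of Ted's suggestions are cheap to express; the pack name and the ".cruft" marker below are hypothetical:

    touch .git/objects/pack/pack-1234.pack    # extend the whole pack's grace period
    rm -f .git/objects/pack/pack-1234.cruft   # "uncruftify" the pack once one of
                                              # its objects is referenced again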
* Re: Keeping unreachable objects in a separate pack instead of loose? 2012-06-12 18:25 ` Nicolas Pitre 2012-06-12 18:37 ` Ted Ts'o @ 2012-06-12 19:15 ` Jeff King 2012-06-13 18:17 ` Martin Fick 1 sibling, 1 reply; 48+ messages in thread From: Jeff King @ 2012-06-12 19:15 UTC (permalink / raw) To: Nicolas Pitre; +Cc: Ted Ts'o, Thomas Rast, Hallvard B Furuseth, git On Tue, Jun 12, 2012 at 02:25:47PM -0400, Nicolas Pitre wrote: > My feeling is that we should make a step backward and consider if this > is actually the right problem to solve. I don't remember why I might > have been opposed to a reflog for deleted branches as you say I did, but > that is certainly a feature that could prove to be useful. I think your argument was along the lines of "this information can be reconstructed from the HEAD reflog, anyway, so it is not worth the effort". My counter to that is that the HEAD reflog is useless on bare repositories (I have considered adding each pushed ref to a HEAD-like reflog with everything in it, but doing it without lock contention between pushes to different refs is tricky). But keep in mind that a deletion reflog does not make this problem go away. It might make it less likely, but there are still cases where the gc can create a much larger object db. > Then having a repository that can be used as an alternate for other > repositories without knowing about it is also a problem that needs > fixing and not only because of this object expiry issue. This is not > easy to fix though. Yeah, I think that is an open problem, because you do not necessarily have any write access at all to the alternates repository (however, that does not need to stop us from making it safer in the case that you _do_ have write access to the alternates repository). > Then, the creation of unreferenced objects from successive 'git add' > shouldn't create that many objects in the first place. They currently > never get the chance to be packed to start with. I don't think these objects are necessarily from successive "git add"s. That is one source, but they may also come from reflogs expiring. I guess in that case that they would typically be in an older pack, though. > So the problem is really about 'git gc' creating more data on disk which > is counter productive for a garbage collecting task. Maybe the trick is > simply not to delete any of the old pack which content was repacked into > a single new pack and let them age before deleting them, rather than > exploding a bunch of loose objects. But then we're back to the same > issue I wanted to get away from i.e. identifying real cruft packs and > making them safely deletable. That is satisfyingly simple, but the storage requirement is quite bad. The unreachable objects are very much in the minority, and an occasional duplication there is not a big deal; duplicating all of the reachable objects would double the object directory's size. -Peff ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Keeping unreachable objects in a separate pack instead of loose? 2012-06-12 19:15 ` Jeff King @ 2012-06-13 18:17 ` Martin Fick 2012-06-13 21:27 ` Johan Herland 0 siblings, 1 reply; 48+ messages in thread From: Martin Fick @ 2012-06-13 18:17 UTC (permalink / raw) To: git Jeff King <peff@peff.net> writes: > > Then, the creation of unreferenced objects from successive 'git add' > > shouldn't create that many objects in the first place. They currently > > never get the chance to be packed to start with. > > I don't think these objects are necessarily from successive "git add"s. > That is one source, but they may also come from reflogs expiring. I > guess in that case that they would typically be in an older pack, > though. ... > That is satisfyingly simple, but the storage requirement is quite bad. > The unreachable objects are very much in the minority, and an > occasional duplication there is not a big deal; duplicating all of the > reachable objects would double the object directory's size. ... (I don't think this is a valid generalization for servers) I am sorry to be coming a bit late into this discussion, but I think there is an even worse use case, one which does not seem to have been mentioned yet and which can cause much worse loose object explosions: "the server upload rejected case". For example, think of a client pushing a change from the wrong repository to a server. Since there will be no history in common, the client will push the entire repository, and if for some reason this gets rejected by the server (perhaps a pre-receive hook, or a gerrit server which says: "way too many new changes..."), then the pack file may stay abandoned on the server. When gc runs: boom, the entire history of that other project will explode but not get pruned, since the pack file may be fairly new! I believe that this has happened to us several times fairly recently. We have a tiny project which some people keep confusing with the kernel and they push a change destined for the kernel to it. Gerrit rejects it and their massive packfile (larger than the entire project) stays around. If gc runs, it almost becomes a DOS for us: the sheer number of loose object files makes the system crawl when accessing that repo, even on an SSD. We have been talking about moving to NFS soon (with packfiles git should still perform fairly well on NFS), but this explosion really scares me. It seems like the current design is a DOS just waiting to happen for servers. While I would love to eliminate the races discussed in this thread, I think I agree with Ted in that the first fix should just focus on never expanding objects into loose form for pruning (if certain objects simply don't do well in pack files and the local gc policy says they should be loose, go ahead: expand them, but that should be unrelated to pruning). People can DOS a server with unused packfiles too, but that will rarely have the same impact that loose objects would have. -Martin -- Employee of Qualcomm Innovation Center, Inc. which is a member of Code Aurora Forum ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Keeping unreachable objects in a separate pack instead of loose? 2012-06-13 18:17 ` Martin Fick @ 2012-06-13 21:27 ` Johan Herland 0 siblings, 0 replies; 48+ messages in thread From: Johan Herland @ 2012-06-13 21:27 UTC (permalink / raw) To: Martin Fick; +Cc: git On Wed, Jun 13, 2012 at 8:17 PM, Martin Fick <mfick@codeaurora.org> wrote: > Jeff King <peff@peff.net> writes: >> > Then, the creation of unreferenced objects from successive 'git add' >> > shouldn't create that many objects in the first place. They currently >> > never get the chance to be packed to start with. >> >> I don't think these objects are necessarily from successive "git add"s. >> That is one source, but they may also come from reflogs expiring. I >> guess in that case that they would typically be in an older pack, >> though. > ... >> That is satisfyingly simple, but the storage requirement is quite bad. >> The unreachable objects are very much in the minority, and an >> occasional duplication there is not a big deal; duplicating all of the >> reachable objects would double the object directory's size. > ... > (I don't think this is a valid generalization for servers) > > I am sorry to be coming a bit late into this discussion, but I think there > is an even worse use case, one which does not seem to have been mentioned > yet and which can cause much worse loose object explosions: "the > server upload rejected case". For example, think of a client pushing a > change from the wrong repository to a server. Since there will be no > history in common, the client will push the entire repository, and if for > some reason this gets rejected by the server (perhaps a pre-receive > hook, or a gerrit server which says: "way too many new changes..."), > then the pack file may stay abandoned on the server. When gc runs: > boom, the entire history of that other project will explode but not get > pruned, since the pack file may be fairly new! [...] Just a +1 from me. We had the same problem at my former $dayjob, and worked around it by running a "git gc --prune=now" in the server repo (which is a command you'd rather not run in a server repo) to remove exploded loose objects. ...Johan -- Johan Herland, <johan@herland.net> www.herland.net ^ permalink raw reply [flat|nested] 48+ messages in thread
* Re: Keeping unreachable objects in a separate pack instead of loose? 2012-06-10 12:31 Keeping unreachable objects in a separate pack instead of loose? Theodore Ts'o 2012-06-10 23:24 ` Hallvard B Furuseth @ 2012-06-11 15:40 ` Junio C Hamano 1 sibling, 0 replies; 48+ messages in thread From: Junio C Hamano @ 2012-06-11 15:40 UTC (permalink / raw) To: Theodore Ts'o; +Cc: git "Theodore Ts'o" <tytso@mit.edu> writes: > Which got me thinking.... the whole point of leaving the objects loose > is to make it easier to expire them, right? But given how expensive it > is to have loose objects lying around, why not: > > a) Have git-pack-objects have an option which writes the unreachable > objects into a separate pack file, instead of kicking them loose? > > b) Have git-prune delete a pack only if *all* of the objects in the > pack meet the expiry deadline? > > What would be the downsides of pursueing such a strategy? Is it worth > trying to implement as proof-of-concept? I do not offhand see a downside; as a matter of fact, there has already been the first-step change in that direction in v1.7.10.2 and newer that avoids exploding the unreachable ones into loose objects if we know they are going to be immediately pruned. ^ permalink raw reply [flat|nested] 48+ messages in thread