* Fetching everything in another bare repo
@ 2023-03-08 22:39 UTC
From: Paul Smith
To: git

Apologies if this is the wrong list.

I have a tool that needs to preserve every commit and never garbage
collect (there are references that must be maintained to older
commits/branches that have been deleted). This tool keeps its own bare
clone, and disables all GC and maintenance on it.

Unfortunately, a month or so ago, someone accidentally re-cloned the
primary copy of the repo (the one everyone else uses) over this bare
clone, which lost the old history.

The good news is that I have a copy of the original bare clone with
all the history intact. So now what I want to do is fetch the old data
into the current bare clone (since the old clone doesn't have the
newest stuff). And I need to be sure that all commits are pulled, and
kept, and nothing is cleaned up. I would also like any deleted
branches to re-appear, but I don't want to change the location of any
existing branches in the new repo.

Is it sufficient to run something like this:

  git fetch --no-auto-maintenance --no-auto-gc <path-to-old-clone>

Are there other options I should consider? Is it better to fetch the
NEW clone into the OLD clone, rather than to fetch the old clone into
the new clone (the new data is much more important to preserve)?

Since this is a one-off operation I don't care so much about making
the fetch fast.

Thanks!
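A throwaway experiment (all paths here are hypothetical) shows what a
plain local fetch does and does not transfer, which bears directly on
the question above: the fetch copies refs and the history reachable
from them, but not objects that no ref points to.

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"

# build an "old" bare repo holding one reachable and one unreachable commit
git init -q --bare old.git
git init -q -b main work
cd work
git -c user.name=t -c user.email=t@example.com commit -q --allow-empty -m one
git push -q ../old.git HEAD:refs/heads/keep
git -c user.name=t -c user.email=t@example.com commit -q --allow-empty -m two
git push -q ../old.git HEAD:refs/heads/keep
lost=$(git rev-parse HEAD)
# rewind the branch in old.git, making commit "two" unreachable there
git push -q -f ../old.git HEAD^:refs/heads/keep
cd ..

# old.git still stores the rewound commit (nothing has been pruned) ...
git -C old.git cat-file -e "$lost"

# ... but a plain fetch into a fresh bare repo only transfers what the
# refs can reach
git init -q --bare new.git
git -C new.git fetch -q "$tmp/old.git" 'refs/heads/*:refs/heads/*'
```

After this, `new.git` has `refs/heads/keep` and commit "one", but
`git -C new.git cat-file -e "$lost"` fails: the unreachable commit was
never sent.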
* Re: Fetching everything in another bare repo
@ 2023-03-09  6:41 UTC
From: Jeff King
To: Paul Smith; +Cc: git

On Wed, Mar 08, 2023 at 05:39:07PM -0500, Paul Smith wrote:

> I have a tool that wants to preserve every commit and never garbage
> collect (there are references that need to be maintained to older
> commits/branches that have been deleted). This tool keeps its own
> bare clone, and disables all GC and maintenance on it.

OK. It's not clear to me if this archive repo retains the old
references, or if it simply has a bunch of unreachable objects. That
distinction will matter below.

> Unfortunately a month or so ago, by accident someone re-cloned the
> primary copy of the repo that everyone else uses as this bare clone,
> which lost the old history.

Oops. I take it from this that the repository _doesn't_ have all of
the references. It just has unreachable objects. Which makes sense.
Git cannot store a ref "foo/bar" if a ref "foo" still exists, so you'd
eventually hit such a problem if you tried to keep all of the old
references.

> So now what I want to do is fetch the old data into the current bare
> clone (since the old clone doesn't have the newest stuff). And, I
> need to be sure that all commits are pulled, and kept, and nothing
> is cleaned up. I would also like any deleted branches to re-appear,
> but I don't want to change the location of any existing branches in
> the new repo.
>
> Is it sufficient to run something like this:
>
>   git fetch --no-auto-maintenance --no-auto-gc <path-to-old-clone>

That wouldn't grab the unreachable objects from the old clone, though
(again, assuming it has some that you care about).

I think you probably want to treat the objects and references
separately. It's safe to just copy all of the objects and packfiles
from the old clone into the new one.
You'll have duplicates, but you should be able to de-dup and get a
single packfile with:

  git repack -ad --keep-unreachable

And then you can do any ref updates in the new repository (since it
now has all objects from both). You might want something like:

  # get the list of refs in both repositories
  git -C old-repo for-each-ref --format='%(refname)' >old
  git -C new-repo for-each-ref --format='%(refname)' >new

  # now find the refs that are only in the old one; for-each-ref
  # output is sorted, so we can just use comm
  comm -23 old new >missing-refs

  # now generate and apply commands to update those refs. You could
  # probably also use fetch here, but this is faster and we know we
  # have all of the objects.
  xargs git -C old-repo \
      for-each-ref --format='create %(refname) %(objectname)' \
      <missing-refs |
  git -C new-repo update-ref --stdin

(caveat executor; I just typed this into my email and didn't test it,
so there may be typos or small issues.)

-Peff
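The untested sketch above can be exercised end to end on a pair of
throwaway repositories (repo names and branch names here are made up
for the demo). The only change needed is `-C new-repo` on the final
`update-ref`, so the script can run from anywhere:

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"

# old-repo has an extra branch that new-repo is missing
git init -q --bare old-repo
git init -q --bare new-repo
git init -q -b main work
cd work
git -c user.name=t -c user.email=t@example.com commit -q --allow-empty -m base
git push -q ../old-repo HEAD:refs/heads/main HEAD:refs/heads/deleted-branch
git push -q ../new-repo HEAD:refs/heads/main
cd ..

git -C old-repo for-each-ref --format='%(refname)' >old
git -C new-repo for-each-ref --format='%(refname)' >new
comm -23 old new >missing-refs   # refs only in old-repo

# missing-refs is non-empty here; note that with an empty list a bare
# xargs would run for-each-ref with no patterns and list everything
xargs git -C old-repo \
    for-each-ref --format='create %(refname) %(objectname)' \
    <missing-refs |
git -C new-repo update-ref --stdin
```

Afterwards `new-repo` has `refs/heads/deleted-branch` at the same
object as `old-repo`, and its existing `main` is untouched. Using
"create" (rather than "update") in the generated commands means
`update-ref --stdin` will refuse to move any ref that already exists,
which matches the "don't change existing branches" requirement.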
* Re: Fetching everything in another bare repo
@ 2023-03-09 13:55 UTC
From: Paul Smith
To: git

On Thu, 2023-03-09 at 01:41 -0500, Jeff King wrote:
> On Wed, Mar 08, 2023 at 05:39:07PM -0500, Paul Smith wrote:
>
> > I have a tool that wants to preserve every commit and never
> > garbage collect (there are references that need to be maintained
> > to older commits/branches that have been deleted). This tool keeps
> > its own bare clone, and disables all GC and maintenance on it.
>
> OK. It's not clear to me if this archive repo retains the old
> references, or if it simply has a bunch of unreachable objects.
> That distinction will matter below.

Sorry; I've been using Git for a long time but am still not totally
immersed in the terminology :)

Basically, these bare clones have "gc.pruneExpire=never" set, and have
never had any GC operations run, so all commits are still present
(when you say "unreachable" I assume you mean not reachable through
any reference). There is a separate database of information containing
SHAs for these commits, that is used to find them, but there is
nothing in Git itself that references them, so they are indeed
unreachable as far as Git is concerned.

> I think you probably want to treat the objects and references
> separately. It's safe to just copy all of the objects and packfiles
> from the old clone into the new one. You'll have duplicates, but you
> should be able to de-dup and get a single packfile with:
>
>   git repack -ad --keep-unreachable

Oh interesting. I did a quick verification and all of the objects /
packfiles in the old clone either don't exist in the new one, or are
identical. I'm sure you expected that but I needed to reassure myself
I wouldn't be overwriting anything :)
One question: is the objects/info/packs file anything to be concerned
about, or will git repack (or something) take care of handling it?

> And then you can do any ref updates in the new repository (since it
> now has all objects from both).

It's actually possible that I don't care about refs at all. I might
only care about objects. I'm not sure; I can check what exists in the
old clone. But if I need them I can deal with them as you suggest (or
something similar).

Thanks Jeff!
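The copy-then-repack step can be sanity-checked on a scratch
repository. This sketch assumes a reasonably modern git, where
`repack -ad --keep-unreachable` also sweeps unreachable *loose*
objects into the new pack instead of deleting them:

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"

# a bare repo with one reachable commit
git init -q --bare archive.git
git init -q -b main work
cd work
git -c user.name=t -c user.email=t@example.com commit -q --allow-empty -m base
git push -q ../archive.git HEAD:refs/heads/keep
cd ..

# stand in for a copied-over old object: a loose blob nothing points to
blob=$(echo precious | git -C archive.git hash-object -w --stdin)

# de-dup into a single pack without dropping unreachable objects
git -C archive.git repack -ad --keep-unreachable
```

After the repack, the unreachable blob is still retrievable with
`git cat-file`, and everything (reachable history plus the orphan
blob) lives in one packfile.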
* Re: Fetching everything in another bare repo
@ 2023-03-09 15:35 UTC
From: Jeff King
To: Paul Smith; +Cc: git

On Thu, Mar 09, 2023 at 08:55:27AM -0500, Paul Smith wrote:

> > OK. It's not clear to me if this archive repo retains the old
> > references, or if it simply has a bunch of unreachable objects.
> > That distinction will matter below.
>
> Sorry; I've been using Git for a long time but am still not totally
> immersed in the terminology :)
>
> Basically, these bare clones have "gc.pruneExpire=never" set, and
> have never had any GC operations run so all commits are still
> present (when you say "unreachable" I assume you mean, not reachable
> through any reference).

Right, that's what I mean by unreachable. And no, you didn't use any
terminology wrong. I was just not sure if you realized that running
"fetch" would not get the unreachable objects. :)

> There is a separate database of information containing SHAs for
> these commits, that is used to find them, but there is nothing in
> Git itself that references them so they are indeed unreachable as
> far as Git is concerned.

OK, that makes sense (and I've done something like that before, as
well).

> Oh interesting. I did a quick verification and all of the objects /
> packfiles in the old clone either don't exist in the new one, or are
> identical. I'm sure you expected that but I needed to reassure
> myself I wouldn't be overwriting anything :)

The files are named after the sha1 of their contents (and that goes
for both loose objects and packfiles). But certainly it's a good idea
to double-check that nothing funny is going on.

> One question: is the objects/info/packs file anything to be
> concerned about or will git repack (or something) take care of
> handling it?

You can ignore it.
It will be regenerated by git-repack. But also, it's pretty useless
these days. It's only used for "dumb" fetches (e.g., when you export a
repo via static http, but without using the git-aware CGI).

> > And then you can do any ref updates in the new repository (since
> > it now has all objects from both).
>
> It's actually possible that I don't care about refs at all. I might
> only care about objects. I'm not sure, I can check what exists in
> the old clone.

Yeah, if you have a separate database of branch tips, etc., then the
refs aren't necessary. As long as you are careful not to run "gc" or
repack without "-k".

You may want to try the "preciousObjects" repository extension, which
was designed to prevent accidents for a case like this. Something
like:

  [this will cause old versions of Git that don't understand
   extensions.* to bail on all commands for safety]
  $ git config core.repositoryformatversion 1

  [this will tell old versions of Git that don't understand this
   particular extension to bail on all commands for safety. But more
   importantly, it will tell recent versions (> 2.6.3) to allow most
   commands, but not ones that would delete unreachable objects]
  $ git config extensions.preciousObjects true

  [this is it in action]
  $ git repack -ad
  fatal: cannot delete packs in a precious-objects repo
  $ git prune
  fatal: cannot prune in a precious-objects repo

Sadly it's not quite smart enough to realize that "git repack -adk" is
safe. If you want to occasionally repack with that, you'd have to
manually disable the flag for a moment.

I will also say that while I implemented this extension a while back,
it never actually saw production use for my intended case. So I think
it's pretty good (and certainly safer than nothing), but it's not
thoroughly tested in the wild.

-Peff
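The two-config setup described above can be reproduced in a scratch
bare repository (the repo name is made up for the demo):

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q --bare precious.git

# bumping the format version makes pre-extensions Gits refuse the repo
# entirely; the extension then tells newer Gits to refuse commands
# that could delete unreachable objects
git -C precious.git config core.repositoryformatversion 1
git -C precious.git config extensions.preciousObjects true

# destructive commands are now refused ...
if git -C precious.git prune 2>/dev/null; then
    echo "prune unexpectedly succeeded"
fi

# ... while ordinary commands still work
git -C precious.git rev-parse --git-dir >/dev/null
```

In this state `git prune` and `git repack -ad` both die with a
"precious-objects repo" error, while reads, fetches, and pushes
proceed normally.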
* Re: Fetching everything in another bare repo
@ 2023-03-09 17:57 UTC
From: Konstantin Ryabitsev
To: Jeff King; +Cc: Paul Smith, git

On Thu, Mar 09, 2023 at 10:35:46AM -0500, Jeff King wrote:

> You may want to try the "preciousObjects" repository extension,
> which was designed to prevent accidents for a case like this.
> Something like:
>
>   [this will cause old versions of Git that don't understand
>    extensions.* to bail on all commands for safety]
>   $ git config core.repositoryformatversion 1
>
>   [this will tell old versions of Git that don't understand this
>    particular extension to bail on all commands for safety. But more
>    importantly, it will tell recent versions (> 2.6.3) to allow most
>    commands, but not ones that would delete unreachable objects]
>   $ git config extensions.preciousObjects true
>
>   [this is it in action]
>   $ git repack -ad
>   fatal: cannot delete packs in a precious-objects repo
>   $ git prune
>   fatal: cannot prune in a precious-objects repo
>
> Sadly it's not quite smart enough to realize that "git repack -adk"
> is safe. If you want to occasionally repack with that, you'd have to
> manually disable the flag for a moment.
>
> I will also say that while I implemented this extension a while
> back, it never actually saw production use for my intended case. So
> I think it's pretty good (and certainly safer than nothing), but
> it's not thoroughly tested in the wild.

We use it in grokmirror for objstore repositories [1] (the
super-parents of all forks), as a precautionary measure against a
sysadmin running any kind of manual operation that may result in loose
objects being deleted. I do believe it works well for that purpose.

-K

[1] https://github.com/mricon/grokmirror#object-storage-repositories
* Re: Fetching everything in another bare repo
@ 2023-03-10  9:04 UTC
From: Jeff King
To: Konstantin Ryabitsev; +Cc: Paul Smith, git

On Thu, Mar 09, 2023 at 12:57:18PM -0500, Konstantin Ryabitsev wrote:

> > I will also say that while I implemented this extension a while
> > back, it never actually saw production use for my intended case.
> > So I think it's pretty good (and certainly safer than nothing),
> > but it's not thoroughly tested in the wild.
>
> We use it in grokmirror for objstore repositories [1] (the
> super-parents of all forks), as a precautionary measure against a
> sysadmin running any kind of manual operation that may result in
> loose objects being deleted. I do believe it works well for that
> purpose.

Ah, cool, thanks for letting us know. That's pretty much the same case
I wrote it for (GitHub's shared-object-store fork repositories), but
deployment got hung up on compatibility issues (since we used libgit2,
as well). And then I never got around to it, and nobody seems to have
cared too much. It's a nice safety to have, but I don't recall a
single instance of an unintended naive gc ever destroying things. :)

-Peff
* Re: Fetching everything in another bare repo
@ 2023-03-09 18:15 UTC
From: Paul Smith
To: git

On Thu, 2023-03-09 at 10:35 -0500, Jeff King wrote:

> > Basically, these bare clones have "gc.pruneExpire=never" set, and
> > have never had any GC operations run so all commits are still
> > present (when you say "unreachable" I assume you mean, not
> > reachable through any reference).
>
> Right, that's what I mean by unreachable. And no, you didn't use any
> terminology wrong. I was just not sure if you realized that running
> "fetch" would not get the unreachable objects. :)

I definitely did not realize that, so good looking out :) Of course in
retrospect it makes perfect sense: why would you fetch unreachable
objects (normally)?

> > One question: is the objects/info/packs file anything to be
> > concerned about or will git repack (or something) take care of
> > handling it?
>
> You can ignore it.

OK, thx.

> Yeah, if you have a separate database of branch tips, etc, then the
> refs aren't necessary. As long as you are careful not to run "gc" or
> repack without "-k".

It's actually a code review facility, so it doesn't even care about
branches; it's basically just storing before/after SHAs of changes to
be reviewed. But the historical code reviews can sometimes be gold,
even if they're some years old, so I'd prefer to keep them available.

> You may want to try the "preciousObjects" repository extension,
> which was designed to prevent accidents for a case like this.

Oh interesting, I'll take a look. Cheers!
Thread overview: 7+ messages

  2023-03-08 22:39  Fetching everything in another bare repo -- Paul Smith
  2023-03-09  6:41  ` Jeff King
  2023-03-09 13:55  ` Paul Smith
  2023-03-09 15:35  ` Jeff King
  2023-03-09 17:57  ` Konstantin Ryabitsev
  2023-03-10  9:04  ` Jeff King
  2023-03-09 18:15  ` Paul Smith