* Question regarding quarantine environments @ 2018-08-02 17:58 Liam Decker 2018-08-02 18:39 ` Jeff King 0 siblings, 1 reply; 7+ messages in thread From: Liam Decker @ 2018-08-02 17:58 UTC (permalink / raw) To: git [-- Attachment #1: Type: text/plain, Size: 1265 bytes --] Hi all, I've been working on a git hook in golang recently. However, the library I was using did not support a possible quarantine directory, which I would use for my hook. I have been trying to find out how git finds this incoming directory in the objects folder, as their code simply assumed it resided in .git/objects/<1st byte>/<last 19 bytes> I read the documentation describing the git repository layout here [1] as well as the objects documentation here [2] and the git hooks documentation here [3] directed me to the receive-pack documentation here [4] I have also tried googling, but this is a pretty specific question The solution that I implemented was to check the objects directory for the object, and if it was not there, to look for a quarantine directory and try there. However, that feels fairly inefficient. For the curious, the library and solution I attempted are both here [5] If anyone could help direct me to find specifically how git looks for objects in the repository, I would be very grateful [1] https://git-scm.com/docs/gitrepository-layout [2] https://git-scm.com/book/en/v2/Git-Internals-Git-Objects [3] https://git-scm.com/docs/githooks [4] https://git-scm.com/docs/git-receive-pack [5] https://github.com/src-d/go-git/pull/887 [-- Attachment #2: Type: text/html, Size: 1790 bytes --] ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Question regarding quarantine environments 2018-08-02 17:58 Question regarding quarantine environments Liam Decker @ 2018-08-02 18:39 ` Jeff King 2018-08-03 2:49 ` Jonathan Nieder 2018-08-03 12:56 ` Ævar Arnfjörð Bjarmason 0 siblings, 2 replies; 7+ messages in thread From: Jeff King @ 2018-08-02 18:39 UTC (permalink / raw) To: Liam Decker; +Cc: git On Thu, Aug 02, 2018 at 12:58:52PM -0500, Liam Decker wrote: > I've been working on a git hook in golang recently. However, the library I > was using did not support a possible quarantine directory, which I would > use for my hook. > > I have been trying to find out how git finds this incoming directory in the > objects folder, as their code simply assumed it resided in > .git/objects/<1st byte>/<last 19 bytes> When you're running a hook inside the quarantine environment, then $GIT_OBJECT_DIRECTORY in the environment will be set to the quarantine directory, and $GIT_ALTERNATE_OBJECT_DIRECTORIES will point to the main repository object directory (possibly alongside other alternates, if there were any already set). Any Git commands you run should therefore find objects from either location, but any writes would go to the quarantine (most notably, Git's own index-pack/unpack-objects processes, which is the point of the quarantine in the first place). > The solution that I implemented was to check the objects directory for the > object, and if it was not there, to look for a quarantine directory and try > there. However, that feels fairly inefficient. That's more or less what Git will do under the hood (though in the opposite order). > For the curious, the library and solution I attempted are both here [5] Just skimming, but it sounds like go-git does not support the GIT_OBJECT_DIRECTORY environment variable. -Peff ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Question regarding quarantine environments 2018-08-02 18:39 ` Jeff King @ 2018-08-03 2:49 ` Jonathan Nieder 2018-08-03 12:56 ` Ævar Arnfjörð Bjarmason 1 sibling, 0 replies; 7+ messages in thread From: Jonathan Nieder @ 2018-08-03 2:49 UTC (permalink / raw) To: Jeff King; +Cc: Liam Decker, git Hi, Jeff King wrote: > On Thu, Aug 02, 2018 at 12:58:52PM -0500, Liam Decker wrote: >> The solution that I implemented was to check the objects directory for the >> object, and if it was not there, to look for a quarantine directory and try >> there. However, that feels fairly inefficient. > > That's more or less what Git will do under the hood (though in the > opposite order). > >> For the curious, the library and solution I attempted are both here [5] > > Just skimming, but it sounds like go-git does not support the > GIT_OBJECT_DIRECTORY environment variable. To be clear: we don't guarantee that the quarantine directory in the future will be where it is today. So as Peff hinted, supporting GIT_OBJECT_DIRECTORY in go-git is likely to be the best way forward for your tool. Thanks and hope that helps, Jonathan ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Question regarding quarantine environments 2018-08-02 18:39 ` Jeff King 2018-08-03 2:49 ` Jonathan Nieder @ 2018-08-03 12:56 ` Ævar Arnfjörð Bjarmason 2018-08-03 13:00 ` Jeff King 1 sibling, 1 reply; 7+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2018-08-03 12:56 UTC (permalink / raw) To: Jeff King; +Cc: Liam Decker, git On Thu, Aug 02 2018, Jeff King wrote: > On Thu, Aug 02, 2018 at 12:58:52PM -0500, Liam Decker wrote: > >> I've been working on a git hook in golang recently. However, the library I >> was using did not support a possible quarantine directory, which I would >> use for my hook. >> >> I have been trying to find out how git finds this incoming directory in the >> objects folder, as their code simply assumed it resided in >> .git/objects/<1st byte>/<last 19 bytes> > > When you're running a hook inside the quarantine environment, then > $GIT_OBJECT_DIRECTORY in the environment will be set to the quarantine > directory, and $GIT_ALTERNATE_OBJECT_DIRECTORIES will point to the main > repository object directory (possibly alongside other alternates, if > there were any already set). > > Any Git commands you run should therefore find objects from either > location, but any writes would go to the quarantine (most notably, Git's > own index-pack/unpack-objects processes, which is the point of the > quarantine in the first place). To add to this, one interesting thing that you can do with hooks because of this quarantine is to answer certain questions about the push that were prohibitively expensive before it existed, but there's no explicit documentation for this. E.g. for a hook that wants to ban big blobs in the repo, but wants to allow all existing blobs (you don't want to block e.g. a revert of a commit that removed it from the checkout), you can juggle these two env variables and hide the "main" object dir from the hook for some operations, so e.g. if a blob lookup succeeds in the alternate quarantine dir, but not the main object dir, you know it's new. >> The solution that I implemented was to check the objects directory for the >> object, and if it was not there, to look for a quarantine directory and try >> there. However, that feels fairly inefficient. > > That's more or less what Git will do under the hood (though in the > opposite order). > >> For the curious, the library and solution I attempted are both here [5] > > Just skimming, but it sounds like go-git does not support the > GIT_OBJECT_DIRECTORY environment variable. > > -Peff ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Question regarding quarantine environments 2018-08-03 12:56 ` Ævar Arnfjörð Bjarmason @ 2018-08-03 13:00 ` Jeff King 2018-08-03 13:25 ` Ævar Arnfjörð Bjarmason 0 siblings, 1 reply; 7+ messages in thread From: Jeff King @ 2018-08-03 13:00 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason; +Cc: Liam Decker, git On Fri, Aug 03, 2018 at 02:56:11PM +0200, Ævar Arnfjörð Bjarmason wrote: > > Any Git commands you run should therefore find objects from either > > location, but any writes would go to the quarantine (most notably, Git's > > own index-pack/unpack-objects processes, which is the point of the > > quarantine in the first place). > > To add to this, one interesting thing that you can do with hooks because > of this quarantine is to answer certain questions about the push that > were prohibitively expensive before it existed, but there's no explicit > documentation for this. > > E.g. for a hook that wants to ban big blobs in the repo, but wants to > allow all existing blobs (you don't want to block e.g. a revert of a > commit that removed it from the checkout), you can juggle these two env > variables and hide the "main" object dir from the hook for some > operations, so e.g. if a blob lookup succeeds in the alternate > quarantine dir, but not the main object dir, you know it's new. I'd be a bit careful with that, though, as the definition of "new" is vague there. For example, completing a thin pack may mean that the receiver creates a copy of a base object found in the main repo. That object isn't new as part of the push, nor was it even sent on the wire, but it will appear in the quarantine directory. But only sometimes, depending on whether we kept the sender's pack or exploded it to loose objects. -Peff ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Question regarding quarantine environments 2018-08-03 13:00 ` Jeff King @ 2018-08-03 13:25 ` Ævar Arnfjörð Bjarmason 2018-08-03 13:29 ` Jeff King 0 siblings, 1 reply; 7+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2018-08-03 13:25 UTC (permalink / raw) To: Jeff King; +Cc: Liam Decker, git On Fri, Aug 03 2018, Jeff King wrote: > On Fri, Aug 03, 2018 at 02:56:11PM +0200, Ævar Arnfjörð Bjarmason wrote: > >> > Any Git commands you run should therefore find objects from either >> > location, but any writes would go to the quarantine (most notably, Git's >> > own index-pack/unpack-objects processes, which is the point of the >> > quarantine in the first place). >> >> To add to this, one interesting thing that you can do with hooks because >> of this quarantine is to answer certain questions about the push that >> were prohibitively expensive before it existed, but there's no explicit >> documentation for this. >> >> E.g. for a hook that wants to ban big blobs in the repo, but wants to >> allow all existing blobs (you don't want to block e.g. a revert of a >> commit that removed it from the checkout), you can juggle these two env >> variables and hide the "main" object dir from the hook for some >> operations, so e.g. if a blob lookup succeeds in the alternate >> quarantine dir, but not the main object dir, you know it's new. > > I'd be a bit careful with that, though, as the definition of "new" is > vague there. > > For example, completing a thin pack may mean that the receiver creates a > copy of a base object found in the main repo. That object isn't new as > part of the push, nor was it even sent on the wire, but it will appear > in the quarantine directory. But only sometimes, depending on whether we > kept the sender's pack or exploded it to loose objects. Right, I mean: is_new = !in_quarantine() && in_main() Or: is_new = !in_main() Should work, in the latter case if the object really is missing from the quarnatine too, other fsck bits will stop the push. But as you point out: is_new = in_quarantine() Cannot be relied upon, although it'll be true most of the time. Perhaps I'm missing some edge case above, but I wanted to reword it to make sure I understood it correctly (and perhaps you have a correction). ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: Question regarding quarantine environments 2018-08-03 13:25 ` Ævar Arnfjörð Bjarmason @ 2018-08-03 13:29 ` Jeff King 0 siblings, 0 replies; 7+ messages in thread From: Jeff King @ 2018-08-03 13:29 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason; +Cc: Liam Decker, git On Fri, Aug 03, 2018 at 03:25:08PM +0200, Ævar Arnfjörð Bjarmason wrote: > > I'd be a bit careful with that, though, as the definition of "new" is > > vague there. > > > > For example, completing a thin pack may mean that the receiver creates a > > copy of a base object found in the main repo. That object isn't new as > > part of the push, nor was it even sent on the wire, but it will appear > > in the quarantine directory. But only sometimes, depending on whether we > > kept the sender's pack or exploded it to loose objects. > > Right, I mean: > > is_new = !in_quarantine() && in_main() > > Or: > > is_new = !in_main() > > Should work, in the latter case if the object really is missing from the > quarnatine too, other fsck bits will stop the push. Ah, OK. Yes, I agree that should work to cover new objects (including ones that the other side but aren't actually needed to update the refs, though hopefully that is rare). There may also be other object stores, if the main repository used alternates (or if somebody set GIT_ALTERNATE_OBJECT_DIRECTORIES). You can probably disregard that, though, as: 1. If you ignore the main repo, presumably you ignore its recursive info/alternates, too. 2. The easy mechanism for ignoring the main repo is to ignore GIT_ALTERNATE_OBJECT_DIRECTORIES, so you'd already be handling that. > But as you point out: > > is_new = in_quarantine() > > Cannot be relied upon, although it'll be true most of the time. > > Perhaps I'm missing some edge case above, but I wanted to reword it to > make sure I understood it correctly (and perhaps you have a correction). Nope, I just didn't think through what you were saying carefully enough. ;) -Peff ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2018-08-03 13:30 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2018-08-02 17:58 Question regarding quarantine environments Liam Decker 2018-08-02 18:39 ` Jeff King 2018-08-03 2:49 ` Jonathan Nieder 2018-08-03 12:56 ` Ævar Arnfjörð Bjarmason 2018-08-03 13:00 ` Jeff King 2018-08-03 13:25 ` Ævar Arnfjörð Bjarmason 2018-08-03 13:29 ` Jeff King
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).