From: Derrick Stolee <derrickstolee@github.com>
To: "Junio C Hamano" <gitster@pobox.com>,
"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Cc: Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com>,
git@vger.kernel.org, me@ttaylorr.com
Subject: Re: [PATCH] object-file: reprepare alternates when necessary
Date: Wed, 8 Mar 2023 08:29:18 -0500 [thread overview]
Message-ID: <64903e15-014f-a81e-26dc-83e16b632b2b@github.com> (raw)
In-Reply-To: <xmqqttyw1ndr.fsf@gitster.g>
On 3/7/2023 12:29 PM, Junio C Hamano wrote:
> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>
>> But in fact we've been doing the locking since 6c307626f1e (grep:
>> protect packed_git [re-]initialization, 2020-01-15). So the only thing
>> that really needs justification here is that putting the alternates
>> re-reading under the same lock
>>
>> There is a really interesting potential caveat here which you're not
>> discussing, which is...
>>> ...
>>> +void reprepare_alt_odb(struct repository *r)
>>> +{
>>> + r->objects->loaded_alternates = 0;
>>> + prepare_alt_odb(r);
>>> +}
>>> +
>>> ...
>> This seems reasonable, but wouldn't this do the same without introducing
>> an API function just for this one use-case?
>>
>> That's of course a nit, and you seem to have been adding this for
>> consistency with reprepare_packed_git(), but it already "owns" the
>> "approximate_object_count_valid" and "packed_git_initialized" flags in
>> "struct raw_object_store".
>>
>> So as we'll only need this from reprepare_packed_git() isn't it better
>> to declare that "loaded_alternates" is another such flag?
>
> I am not sure I got what you want to say 100%, but if you are saying
> that this "drop the 'loaded' flag and force prepare_*() function to
> redo its thing" must not be done only in reprepare_packed_git(), and
> that inlining the code there without introducing a helper function
> that anybody can casually call without thinking its consequenced
> through, then I tend to agree in principle. But it is just as easy
> to lift two lines of code from the rewritten/inlined code to a new
> place---to ensure people follow the obj_read_lock() rule, the
> comment before it may have to be a bit stronger, I wonder?
The fact that we do it in a lock in reprepare_packed_git() in the
only current caller does raise suspicion that someone could call it
later and not realize that the lock could be helpful. Inlining the
change within reprepare_packed_git() makes the most sense here
instead of mimicking the pattern.
>> Perhaps not, but as the resulting patch is much shorter it seems worth
>> considering.
The shortness of the patch is metric of quality, to me. The other
reason (we might introduce a footgun) is more interesting.
>> ...but to continue the above, the *really* important thing here (and
>> correct me if I'm wrong) is that we really need to *first* prepare the
>> alternates, and *then* do the rest, as our new alternates might point to
>> new loose objects and packs.
>
> Yes, and as Derrick explained in another message, we only have to
> worry about new ones getting added, not existing ones going away.
I'll be sure to clarify that in my next version.
>> So with both of the above (the same could be done with your new helper)
>> something like this would IMO make that much clearer:
>>
>> diff --git a/packfile.c b/packfile.c
>> index 79e21ab18e7..50cb46ca4b7 100644
>> --- a/packfile.c
>> +++ b/packfile.c
>> @@ -1008,6 +1008,13 @@ void reprepare_packed_git(struct repository *r)
>> struct object_directory *odb;
>>
>> obj_read_lock();
>> + /*
>> + * New alternates might point to new loose & pack dirs, so we
>> + * must first read those.
>> + */
>> + r->objects->loaded_alternates = 0;
>> + prepare_alt_odb(r);
>> +
>> for (odb = r->objects->odb; odb; odb = odb->next)
>> odb_clear_loose_cache(odb);
>>
>> And, I think this is an exsiting edge case, but we're only locking the
>> ODB of the "parent" repository in this case, so if we have alternates in
>> play aren't we going to racily compute the rest here, the loose objects
>> and packs of the alternates we're about to consult aren't under the same
>> lock?
I don't understand what you are saying here. odb_read_lock() does not
specify a repository and is instead a global lock on reading from any
object database.
Here is its implementation:
extern int obj_read_use_lock;
extern pthread_mutex_t obj_read_mutex;
static inline void obj_read_lock(void)
{
if(obj_read_use_lock)
pthread_mutex_lock(&obj_read_mutex);
}
static inline void obj_read_unlock(void)
{
if(obj_read_use_lock)
pthread_mutex_unlock(&obj_read_mutex);
}
So I don't believe that your proposed edge case exists.
Thanks,
-Stolee
next prev parent reply other threads:[~2023-03-08 13:31 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-03-06 20:59 [PATCH] object-file: reprepare alternates when necessary Derrick Stolee via GitGitGadget
2023-03-06 22:54 ` Junio C Hamano
2023-03-07 0:28 ` Taylor Blau
2023-03-07 14:52 ` Derrick Stolee
2023-03-07 17:16 ` Junio C Hamano
2023-03-08 15:55 ` Taylor Blau
2023-03-08 17:13 ` Derrick Stolee
2023-03-07 11:28 ` Ævar Arnfjörð Bjarmason
2023-03-07 17:29 ` Junio C Hamano
2023-03-07 18:18 ` Junio C Hamano
2023-03-08 13:29 ` Derrick Stolee [this message]
2023-03-08 18:47 ` [PATCH v2] " Derrick Stolee via GitGitGadget
2023-03-08 19:35 ` Junio C Hamano
2023-03-08 20:47 ` Taylor Blau
2023-03-09 7:24 ` Jeff King
2023-03-09 9:06 ` Eric Wong
2023-03-10 21:29 ` Jonathan Tan
2023-03-11 0:01 ` Junio C Hamano
2023-03-11 3:09 ` Jonathan Tan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=64903e15-014f-a81e-26dc-83e16b632b2b@github.com \
--to=derrickstolee@github.com \
--cc=avarab@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitgitgadget@gmail.com \
--cc=gitster@pobox.com \
--cc=me@ttaylorr.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.