Date: Wed, 25 Mar 2026 19:13:56 -0400
From: Taylor Blau
To: Patrick Steinhardt
Cc: git@vger.kernel.org, Junio C Hamano, Jeff King, Elijah Newren
Subject: Re: [PATCH 2/5] pack-objects: refactor `read_packs_list_from_stdin()` to use `strmap`
X-Mailing-List: git@vger.kernel.org

On Tue, Mar 24, 2026 at 08:39:28AM +0100, Patrick Steinhardt wrote:
> > Extract the logic for sorting packs by mtime and adding their objects
> > into a separate `stdin_packs_add_entries()` helper.
>
> Right, the ordering was my first question. Interestingly though, that
> function doesn't seem to be added in this commit... ah, it's called
> `stdin_packs_add_pack_entries()`.

Ah, good catch. I had originally called it `stdin_packs_add_entries()`
but renamed it before sending, apparently without adjusting the commit
message appropriately.

> > diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> > index 9a89bc5c4c9..72c9ddbed6b 100644
> > --- a/builtin/pack-objects.c
> > +++ b/builtin/pack-objects.c
> > @@ -3837,90 +3838,120 @@ static int pack_mtime_cmp(const void *_a, const void *_b)
> >  	return 0;
> >  }
> >
> > -static void read_packs_list_from_stdin(struct rev_info *revs)
> > +struct stdin_pack_info {
> > +	struct packed_git *p;
> > +	enum {
> > +		STDIN_PACK_INCLUDE = (1<<0),
> > +		STDIN_PACK_EXCLUDE_CLOSED = (1<<1),
>
> It might make sense to provide a sentence for each of the enums to
> explain what they do.

I'm not opposed, but I am not sure what information would be helpful to
add here, since these correspond one-to-one with the three possible
prefixes for packfile names we receive with --stdin-packs.

> > +static void stdin_packs_add_pack_entries(struct strmap *packs,
> > +					 struct rev_info *revs)
> > +{
> > +	struct string_list keys = STRING_LIST_INIT_NODUP;
> > +	struct string_list_item *item;
> > +	struct hashmap_iter iter;
> > +	struct strmap_entry *entry;
> > +
> > +	strmap_for_each_entry(packs, &iter, entry) {
> > +		struct stdin_pack_info *info = entry->value;
> > +		if (!info->p)
> > +			die(_("could not find pack '%s'"), entry->key);
> > +
> > +		string_list_append(&keys, entry->key)->util = info->p;
> > +	}
> > +
> > +	/*
> > +	 * Order packs by ascending mtime; use QSORT directly to access the
> > +	 * string_list_item's ->util pointer, which string_list_sort() does not
> > +	 * provide.
> > +	 */
> > +	QSORT(keys.items, keys.nr, pack_mtime_cmp);
>
> Okay. I was briefly wondering whether it would make more sense to use
> `string_list_sort()`, but I guess it doesn't buy us much.

Yeah. This is actually carried forward from the existing implementation,
and uses the separate QSORT() because `string_list_sort()` doesn't
provide access to the `util` field of the items, which we need to sort
by mtime.
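
As a rough illustration of the point above, here is a minimal,
standalone sketch of sorting key/payload items by a field that is only
reachable through the payload pointer. It uses the standard qsort()
rather than Git's QSORT() macro, and `struct sketch_pack`,
`struct sketch_item`, `sketch_mtime_cmp()` and `sketch_sort_by_mtime()`
are hypothetical names made up for this example, not Git's real types
or functions:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration; these are not Git's types. */
struct sketch_pack {
	const char *name;
	long mtime;
};

struct sketch_item {
	const char *string; /* key, e.g. a pack file name */
	void *util;         /* caller-owned payload, here a sketch_pack */
};

/* Compare two items by the mtime of the pack hanging off ->util. */
static int sketch_mtime_cmp(const void *va, const void *vb)
{
	const struct sketch_item *a = va;
	const struct sketch_item *b = vb;
	const struct sketch_pack *pa = a->util;
	const struct sketch_pack *pb = b->util;

	if (pa->mtime < pb->mtime)
		return -1;
	if (pa->mtime > pb->mtime)
		return 1;
	return 0;
}

/* Sort items in place, oldest pack first. */
static void sketch_sort_by_mtime(struct sketch_item *items, size_t nr)
{
	assert(items || !nr);
	if (nr > 1)
		qsort(items, nr, sizeof(*items), sketch_mtime_cmp);
}
```

The comparator receives pointers to whole items, so it can look at
->util; a comparison that only ever sees the key strings has no way to
reach the mtime.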

> > +	for_each_string_list_item(item, &keys) {
> > +		struct stdin_pack_info *info = strmap_get(packs, item->string);
>
> We could avoid this extra lookup if you instead were to store the pack
> info in the `item->util` field.

Good idea. Funnily enough, we already assign ->util = info->p in the
loop above, but never use it. Something like this on top should clean
things up nicely:

--- 8< ---
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 72c9ddbed6b..c9b33d1673d 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3859,7 +3859,7 @@ static void stdin_packs_add_pack_entries(struct strmap *packs,
 		if (!info->p)
 			die(_("could not find pack '%s'"), entry->key);
 
-		string_list_append(&keys, entry->key)->util = info->p;
+		string_list_append(&keys, entry->key)->util = info;
 	}
 
 	/*
@@ -3870,9 +3870,7 @@ static void stdin_packs_add_pack_entries(struct strmap *packs,
 	QSORT(keys.items, keys.nr, pack_mtime_cmp);
 
 	for_each_string_list_item(item, &keys) {
-		struct stdin_pack_info *info = strmap_get(packs, item->string);
-		if (!info->p)
-			die(_("could not find pack '%s'"), item->string);
+		struct stdin_pack_info *info = item->util;
 
 		if (info->kind & STDIN_PACK_INCLUDE)
 			for_each_object_in_pack(info->p,
--- >8 ---

> > +		if (!info->p)
> > +			die(_("could not find pack '%s'"), item->string);
>
> This case basically cannot happen as we already `die()` further up,
> right? Should we rather `BUG()` or drop the check completely?

I think we should drop the check completely here, there's no way that we
would have a NULL 'info->p' by this point with the check that exists a
few lines up.
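
To make the shape of that cleanup concrete, below is a small standalone
sketch (not Git code) of the "validate once, then carry the record in
->util" pattern. All `demo_*` names and `DEMO_*` flags are hypothetical.
One assumption worth calling out: if the sort comparator reads ->util
as the pack itself, as the sort-by-mtime comment suggests, then storing
the info record there means the comparator has to reach the mtime
through `info->p` instead, which is what `demo_mtime_cmp()` does here:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration only; not Git's real types. */
enum {
	DEMO_INCLUDE = 1 << 0,
	DEMO_EXCLUDE = 1 << 1,
};

struct demo_pack {
	long mtime;
};

struct demo_info {
	struct demo_pack *p; /* NULL until the pack has been found */
	unsigned kind;       /* bitwise OR of DEMO_* flags */
};

struct demo_item {
	const char *string;
	void *util; /* holds a struct demo_info *, not the pack itself */
};

/* With the info in ->util, the mtime is reached through info->p. */
static int demo_mtime_cmp(const void *va, const void *vb)
{
	const struct demo_info *a = ((const struct demo_item *)va)->util;
	const struct demo_info *b = ((const struct demo_item *)vb)->util;

	if (a->p->mtime < b->p->mtime)
		return -1;
	if (a->p->mtime > b->p->mtime)
		return 1;
	return 0;
}

/*
 * Sort items by ascending mtime and store the keys of included packs
 * in out, which must have room for nr entries. Returns the number of
 * keys stored, or -1 if any pack was never found.
 */
static int demo_collect_included(struct demo_item *items, size_t nr,
				 const char **out)
{
	size_t i;
	int n = 0;

	assert(out != NULL);

	/* Validate once, up front; both the sort and the loop rely on it. */
	for (i = 0; i < nr; i++) {
		const struct demo_info *info = items[i].util;
		if (!info->p)
			return -1;
	}

	if (nr > 1)
		qsort(items, nr, sizeof(*items), demo_mtime_cmp);

	for (i = 0; i < nr; i++) {
		/* No second map lookup and no second NULL check needed. */
		const struct demo_info *info = items[i].util;
		if (info->kind & DEMO_INCLUDE)
			out[n++] = items[i].string;
	}
	return n;
}
```

The early return stands in for die(); the point is only that the
second loop can trust the invariant established by the first.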

> > +		if (info->kind & STDIN_PACK_INCLUDE)
> > +			for_each_object_in_pack(info->p,
> > +						add_object_entry_from_pack,
> > +						revs,
> > +						ODB_FOR_EACH_OBJECT_PACK_ORDER);
> > +	}
> > +
> > +	string_list_clear(&keys, 0);
> > +}
> > +
> > +static void stdin_packs_read_input(struct rev_info *revs)
> >  {
> >  	struct strbuf buf = STRBUF_INIT;
> > -	struct string_list include_packs = STRING_LIST_INIT_DUP;
> > -	struct string_list exclude_packs = STRING_LIST_INIT_DUP;
> > -	struct string_list_item *item = NULL;
> > +	struct strmap packs = STRMAP_INIT;
> >  	struct packed_git *p;
> >
> >  	while (strbuf_getline(&buf, stdin) != EOF) {
> > -		if (!buf.len)
> > +		struct stdin_pack_info *info;
> > +		const char *key = buf.buf;
> > +
> > +		if (!key || !*key)
>
> The first case of `!key` cannot ever happen as strbufs always have `buf`
> set.

You're right, this is just muscle memory, but the left-hand side of the
condition is unnecessary. I'll remove it.

> >  			continue;
> >
> > +		if (*key == '^')
> > +			key++;
> > +
> > +		info = strmap_get(&packs, key);
> > +		if (!info) {
> > +			CALLOC_ARRAY(info, 1);
> > +			strmap_put(&packs, key, info);
> > +		}
> > +
> >  		if (*buf.buf == '^')
> > -			string_list_append(&exclude_packs, buf.buf + 1);
> > +			info->kind |= STDIN_PACK_EXCLUDE_CLOSED;
> >  		else
> > -			string_list_append(&include_packs, buf.buf);
> > +			info->kind |= STDIN_PACK_INCLUDE;
>
> I was briefly wondering whether we need error handling for the case
> where a pack is marked both as excluded and included. But we didn't have
> it beforehand, either.

Yeah, I think this is a consequence of 752b465c3c0 (pack-objects: fix
error when same packfile is included and excluded, 2023-04-14).

> > [snip]
> > +
> > +		/*
> > +		 * Arguments we got on stdin may not even be
> > +		 * packs. First check that to avoid segfaulting
> > +		 * later on in e.g. pack_mtime_cmp(), excluded
> > +		 * packs are handled below.
> > +		 */
> > +		if (!is_pack_valid(p))
> > +			die(_("packfile %s cannot be accessed"), p->pack_name);
>
> Hm. Doesn't this change behaviour though? Beforehand, we would have
> checked the packfile for every included pack. Now we only check the
> packfile for every included pack that was yielded by
> `repo_for_each_pack()`. So if an included pack wasn't yielded at all we
> wouldn't notice that it doesn't exist?
>
> I guess an easy fix would be to mark every pack that we have processed
> as seen in the pack info, and then loop over all pack infos a second
> time to verify that we've seen all that we expected to see.
>
> Which you in fact already do :) That post-processing happens in
> `stdin_packs_add_pack_entries()`, where you verify that the `p` pointer
> is set as expected. And if it's not we die with a message that the pack
> wasn't found. Good.

Thanks for double checking.

> This was a bit more demanding to review, but I very much like the
> outcome of this.

Yeah, I really struggled to try and find a productive way to break this
up into smaller changes. But in the end I couldn't find any good splits
that I liked, hence the larger-than-usual patch.

Thanks for reviewing it, I think that it makes the rest of the series a
little more palatable, and the resulting code is easier to reason about
IMHO.

Thanks,
Taylor