From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail-dl1-f44.google.com (mail-dl1-f44.google.com [74.125.82.44]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 099673B19D2 for ; Wed, 13 May 2026 21:19:05 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=74.125.82.44 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778707147; cv=none; b=MXAZB8VIzmt5tLnBTY1VND2OlGBBzBYQikU29yE6khhiou0Ei2DSiIGIk8j2V5dvpDCToShKZwKmD9yYuBjzpMMNgkSLMEOvk9Jfd57BVlfpfKKC04q5WNK4EgfWc9K8pmZuVcCnOMpqGky3un0swz9ZJ79VyIAY+DoK8jK88t0= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778707147; c=relaxed/simple; bh=3kPKw3Rk+xXdwDTx5X7fWm7+KiJklPi7bhfwxG2fXJU=; h=Message-Id:In-Reply-To:References:From:Date:Subject:Content-Type: MIME-Version:To:Cc; b=I7Lg6VVNHV2lZ0AHewq9TY5szwYglPFMrNANmvzfahELp/uIuNuppqLtVBsHir6Thn17bAzF8stL20UhJ7HnZhLeWxzuci1RK/Y17uAJhe7pi76sxJlYNT7rEIySFfSpXEa2DbwF33eX0A8oNhafiRX4keX0zfGISQHr8GqWJmw= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=gOy8jSVo; arc=none smtp.client-ip=74.125.82.44 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="gOy8jSVo" Received: by mail-dl1-f44.google.com with SMTP id a92af1059eb24-12db2e415a7so5265273c88.1 for ; Wed, 13 May 2026 14:19:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20251104; t=1778707145; x=1779311945; darn=vger.kernel.org; 
h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:from:to:cc:subject:date :message-id:reply-to; bh=AfFz+oBiHcd6CYn3ou64EwQvdh1J1Q8TD+BgSrpUxBw=; b=gOy8jSVo5Vs2D6rPlPmG+6VGH29aTVkCYdmlR9NvWTdZvfyZyPT8e10s54d4f8JvjV fc1jYigpN42FLfXBIiPmoTe30d+F/a/iSr0iYWtMq+xAg6OYatqJH9EeD0p7+WINRRAO S97enaQfLKuDEhA5wfqp36m728vXYro6vejtAj4/dzoXcXBq7CckDJUi4IKi+2u5wPRM SIww0h4xMksrH0NIy/id6dAhGzVOPjlGA3DBZJckPWpFOqLRfYzqtv6tPD1Qz7rG/d0l bPLiyvDSCU5GPrA74GKoFkGjWyKMf+v11cJGhQeqKReQjHnrYo4FYK30qDf+esi7jVCM DTLA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1778707145; x=1779311945; h=cc:to:mime-version:content-transfer-encoding:fcc:subject:date:from :references:in-reply-to:message-id:x-gm-gg:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=AfFz+oBiHcd6CYn3ou64EwQvdh1J1Q8TD+BgSrpUxBw=; b=XTLH9Tyy1Cnxsy3tBnWOXoiOKQ4Hyt24cIaF0xedgaw+7QEptbhsEZYniBKVO78AWz 9M9P86+/p3LbgxYiWgTPKVl5tXbSaSqbIzDJO/PjhKLgqwgTYexTHPPRokIjiJ0MgSW5 v8/qwtWG5uc+KmDzpHZwrdX6Kp+dhGS7fGeQgCiOVEaNXC0CtFocSSMzdyNkUn/Xc9zi +IE14Fj47DYrfquBZQ2huPo7TP1zvCjfRTGRLusoU7o3rZ/9mAjwwhIastsrwlCeYhc2 /l8Lz7unJG9FaXVQv2tvHD4R0KqfiSLye11ZkICIMrwD6Xlwdp1NWKrBawGuwv70/ztY yiMw== X-Gm-Message-State: AOJu0YxrWdjSZwAePjWMKgozUeutretEylq25tvvw+dcsXCHRdNi1KXR /v8T+B/Fr/ZYBm2YcUAEdQZO503mPUyfWD040bJ6VbsyaxuJiCHraeiS5AkcCQaK X-Gm-Gg: Acq92OGYYiNcWknDz8iTbS2KjyxTcDx5OZvrjCY4FS2Oq2chrFNdI2JrHDMjJvfQOVt 6iNIMBKS8hkvPSEE9H0mGK9IQWRufl+pD3Mi2Z4hMKLhQ26H3AyjpcptRNL2ZY5Bq+/ujPpchcD xVEVP6II5QIMOHZhq66MCea1imXkqc+PjbeCHUTV+etS0ixvjFB2fMQFbJhX1xu+kuatCgCHvTE g4AT3Iyp8JtH4J3UdNnws7MmMUQik95fPVL/ofoNFbVt1+W30/XN9XPGz5VYrkE23pmwEiyLBgq JAaoqjjvP+MHbPzXk9MrtEwhyY2JdFoSzW6ooempoSow4AMT5gOkEmcQcDsNpaQCMsN6/A1ssSb i3A3K/cNJ+J7jraHvXZ/CPy5wtwmNgW5WQV6YcIYUalJah9FP+hQzdX2ErXSuB2iAMHH46Lb1lq Cg6crStwWQ9XE/hCLen/CIaOiJ9HsBjuLfofxk X-Received: by 2002:a05:7300:fd15:b0:2d2:96e8:1bf5 with SMTP id 
5a478bee46e88-301541afb40mr2354402eec.3.1778707145084; Wed, 13 May 2026 14:19:05 -0700 (PDT)
Received: from [127.0.0.1] ([20.169.77.168]) by smtp.gmail.com with ESMTPSA id 5a478bee46e88-302978b3cb2sm704691eec.30.2026.05.13.14.19.04 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 13 May 2026 14:19:04 -0700 (PDT)
Message-Id:
In-Reply-To:
References:
From: "Derrick Stolee via GitGitGadget"
Date: Wed, 13 May 2026 21:18:46 +0000
Subject: [PATCH v4 04/13] path-walk: always emit directly-requested objects
Fcc: Sent
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Precedence: bulk
X-Mailing-List: git@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
To: git@vger.kernel.org
Cc: christian.couder@gmail.com, gitster@pobox.com,
    johannes.schindelin@gmx.de, johncai86@gmail.com,
    karthik.188@gmail.com, kristofferhaugsbakk@fastmail.com,
    me@ttaylorr.com, newren@gmail.com, peff@peff.net, ps@pks.im,
    Taylor Blau , Derrick Stolee , Derrick Stolee

From: Derrick Stolee

We are preparing to integrate the path-walk API with some --filter
options in 'git pack-objects', but combining the two reveals a subtle
issue when the test suite is run with GIT_TEST_PACK_PATH_WALK=1.

When a filter reduces the set of requested objects, it also filters out
directly-requested objects, such as the needed blobs downloaded in a
blobless partial clone. The root cause is that the scan of pending
objects in the path-walk API respects the filters set in the
path_walk_info instead of overriding them for pending objects.

We can tell that a path holds directly-requested objects if its path
name starts with '/' (other paths, including root trees, never start
with this character). Create a path_is_for_direct_objects() helper to
make this meaning clear, especially as we add more references to it in
the future while integrating the path-walk API with partial clone
filter options.

Signed-off-by: Derrick Stolee
---
 Documentation/technical/api-path-walk.adoc |  7 ++++
 path-walk.c                                | 42 ++++++++++++++--------
 path-walk.h                                |  5 +++
 3 files changed, 39 insertions(+), 15 deletions(-)

diff --git a/Documentation/technical/api-path-walk.adoc b/Documentation/technical/api-path-walk.adoc
index a67de1b143..6e17b13d61 100644
--- a/Documentation/technical/api-path-walk.adoc
+++ b/Documentation/technical/api-path-walk.adoc
@@ -48,6 +48,13 @@ commits.
 	applications could disable some options to make it simpler to walk
 	the objects or to have fewer calls to `path_fn`.
 +
+Note that objects directly requested as pending objects (such as targets
+of lightweight tags or other ref tips) are always emitted to `path_fn`,
+even when the corresponding type flag is disabled. Only objects
+discovered during the tree walk are subject to these type filters. This
+ensures that objects specifically requested through the revision input
+are never silently dropped.
++
 While it is possible to walk only commits in this way, consumers would be
 better off using the revision walk API instead.
 
diff --git a/path-walk.c b/path-walk.c
index 6e426af433..05bfc1c114 100644
--- a/path-walk.c
+++ b/path-walk.c
@@ -248,6 +248,17 @@ static int add_tree_entries(struct path_walk_context *ctx,
 	return 0;
 }
 
+/*
+ * Paths starting with '/' (e.g., "/tags", "/tagged-blobs") hold objects that
+ * were directly requested by 'pending' objects rather than discovered during
+ * tree traversal.
+ */
+static int path_is_for_direct_objects(const char *path)
+{
+	ASSERT(path);
+	return path[0] == '/';
+}
+
 /*
  * For each path in paths_to_explore, walk the trees another level
  * and add any found blobs to the batch (but only if they exist and
@@ -306,14 +317,19 @@ static int walk_path(struct path_walk_context *ctx,
 
 	if (list->type == OBJ_BLOB &&
 	    ctx->revs->prune_data.nr &&
+	    !path_is_for_direct_objects(path) &&
 	    !match_pathspec(ctx->repo->index, &ctx->revs->prune_data,
 			    path, strlen(path), 0,
 			    NULL, 0))
 		return 0;
 
-	/* Evaluate function pointer on this data, if requested. */
-	if ((list->type == OBJ_TREE && ctx->info->trees) ||
-	    (list->type == OBJ_BLOB && ctx->info->blobs) ||
+	/*
+	 * Evaluate function pointer on this data, if requested.
+	 * Ignore object type filters for tagged objects (path starts
+	 * with `/`).
+	 */
+	if ((list->type == OBJ_TREE && (ctx->info->trees || path_is_for_direct_objects(path))) ||
+	    (list->type == OBJ_BLOB && (ctx->info->blobs || path_is_for_direct_objects(path))) ||
 	    (list->type == OBJ_TAG && ctx->info->tags))
 		ret = ctx->info->path_fn(path, &list->oids, list->type,
 					ctx->info->path_fn_data);
@@ -374,10 +390,8 @@ static int setup_pending_objects(struct path_walk_info *info,
 
 	if (info->tags)
 		CALLOC_ARRAY(tags, 1);
-	if (info->blobs)
-		CALLOC_ARRAY(tagged_blobs, 1);
-	if (info->trees)
-		root_tree_list = strmap_get(&ctx->paths_to_lists, root_path);
+	CALLOC_ARRAY(tagged_blobs, 1);
+	root_tree_list = strmap_get(&ctx->paths_to_lists, root_path);
 
 	/*
 	 * Pending objects include:
@@ -421,8 +435,6 @@ static int setup_pending_objects(struct path_walk_info *info,
 
 		switch (obj->type) {
 		case OBJ_TREE:
-			if (!info->trees)
-				continue;
 			if (pending->path) {
 				char *path = *pending->path ?
 					     xstrfmt("%s/", pending->path) : xstrdup("");
@@ -435,8 +447,6 @@ static int setup_pending_objects(struct path_walk_info *info,
 			break;
 
 		case OBJ_BLOB:
-			if (!info->blobs)
-				continue;
 			if (pending->path)
 				add_path_to_list(ctx, pending->path, OBJ_BLOB, &obj->oid, 1);
 			else
@@ -532,15 +542,17 @@ int walk_objects_by_path(struct path_walk_info *info)
 	push_to_stack(&ctx, root_path);
 
 	/*
-	 * Set these values before preparing the walk to catch
-	 * lightweight tags pointing to non-commits and indexed objects.
+	 * Ensure that prepare_revision_walk() keeps all pending objects
+	 * even through an object type filter.
 	 */
-	info->revs->blob_objects = info->blobs;
-	info->revs->tree_objects = info->trees;
+	info->revs->blob_objects = info->revs->tree_objects = 1;
 
 	if (prepare_revision_walk(info->revs))
 		die(_("failed to setup revision walk"));
 
+	info->revs->blob_objects = info->blobs;
+	info->revs->tree_objects = info->trees;
+
 	/*
 	 * Walk trees to mark them as UNINTERESTING.
 	 * This is particularly important when 'edge_aggressive' is set.
diff --git a/path-walk.h b/path-walk.h
index 5ef5a8440e..657eeda8ec 100644
--- a/path-walk.h
+++ b/path-walk.h
@@ -36,6 +36,11 @@ struct path_walk_info {
 	/**
 	 * Initialize which object types the path_fn should be called on. This
 	 * could also limit the walk to skip blobs if not set.
+	 *
+	 * Note: even when 'blobs' or 'trees' is disabled, objects that are
+	 * directly requested as pending objects will still be emitted to
+	 * path_fn. Only objects discovered during the tree walk are filtered by
+	 * these flags.
 	 */
 	int commits;
 	int trees;
-- 
gitgitgadget
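
For readers who want to check the emission rule in isolation, here is a minimal standalone sketch of the decision that the patched condition in walk_path() appears to make. It is not code from the patch: every name prefixed with sketch_ or SKETCH_ is hypothetical, the flags struct is a stand-in for the 'trees', 'blobs' and 'tags' members of struct path_walk_info, and the pathspec check is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for git's object type constants. */
enum sketch_object_type {
	SKETCH_OBJ_TREE,
	SKETCH_OBJ_BLOB,
	SKETCH_OBJ_TAG
};

/* Hypothetical reduction of struct path_walk_info to its type flags. */
struct sketch_flags {
	int trees;
	int blobs;
	int tags;
};

/* Same test as the patch's helper: direct-object paths start with '/'. */
static int sketch_path_is_direct(const char *path)
{
	assert(path);
	return path[0] == '/';
}

/*
 * Return nonzero when a list of objects at 'path' would be passed to the
 * callback. Trees and blobs bypass their type flag when the path marks
 * directly-requested objects; tags still follow their flag.
 */
static int sketch_should_emit(enum sketch_object_type type, const char *path,
			      const struct sketch_flags *flags)
{
	int direct = sketch_path_is_direct(path);

	switch (type) {
	case SKETCH_OBJ_TREE:
		return flags->trees || direct;
	case SKETCH_OBJ_BLOB:
		return flags->blobs || direct;
	case SKETCH_OBJ_TAG:
		return flags->tags;
	}
	return 0;
}
```

Under these assumptions, a walk with blobs disabled would still emit a blob list stored under "/tagged-blobs", while a blob list found at an ordinary path such as "src/main.c" (a made-up example path) would be skipped.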