git.vger.kernel.org archive mirror
From: Patrick Steinhardt <ps@pks.im>
To: git@vger.kernel.org
Cc: Toon Claes <toon@iotcl.com>,
	Karthik Nayak <karthik.188@gmail.com>,
	 Taylor Blau <me@ttaylorr.com>,
	Junio C Hamano <gitster@pobox.com>
Subject: [PATCH v3 05/11] builtin/cat-file: support "blob:limit=" objects filter
Date: Wed, 02 Apr 2025 13:13:40 +0200
Message-ID: <20250402-pks-cat-file-object-type-filter-v3-5-4da6bb21871c@pks.im>
In-Reply-To: <20250402-pks-cat-file-object-type-filter-v3-0-4da6bb21871c@pks.im>

Implement support for the "blob:limit=" filter in git-cat-file(1), which
causes us to omit all blobs whose size is at least the given limit.
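
The filter semantics can be illustrated with a small standalone sketch. It
is not the code from this patch: `parse_limit()`, `excluded_by_blob_limit()`
and `enum obj_kind` are hypothetical names made up for the example, and the
parser only accepts the lowercase suffixes named in the documentation hunk
below. It shows how a "<n>[kmg]" value maps to bytes and that the comparison
is inclusive, so a blob whose size equals the limit is omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for Git's object types; only "blob or not" matters. */
enum obj_kind { KIND_BLOB, KIND_OTHER };

/*
 * Hypothetical helper: parse "<n>[kmg]" into a byte count.
 * Returns false on malformed input or overflow.
 */
static bool parse_limit(const char *spec, uint64_t *out)
{
	char *end;
	unsigned long long n;
	uint64_t factor = 1;

	assert(spec != NULL && out != NULL);
	if (*spec < '0' || *spec > '9')
		return false; /* reject empty strings, signs and whitespace */

	errno = 0;
	n = strtoull(spec, &end, 10);
	if (errno)
		return false;

	switch (*end) {
	case 'k': factor = 1024ULL; end++; break;
	case 'm': factor = 1024ULL * 1024; end++; break;
	case 'g': factor = 1024ULL * 1024 * 1024; end++; break;
	default: break;
	}
	if (*end != '\0')
		return false; /* trailing garbage after the number or suffix */
	if (n > UINT64_MAX / factor)
		return false; /* n * factor would overflow */

	*out = n * factor;
	return true;
}

/* A blob is omitted when its size is at least the limit; other types pass. */
static bool excluded_by_blob_limit(enum obj_kind kind, uint64_t size,
				   uint64_t limit)
{
	return kind == KIND_BLOB && size >= limit;
}
```

With these helpers, "1k" parses to 1024, a 1024-byte blob is excluded by
that limit while a 1023-byte blob is kept, and a limit of 0 excludes every
blob, including empty ones.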

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 Documentation/git-cat-file.adoc |  5 +++++
 builtin/cat-file.c              | 15 ++++++++++++++-
 t/t1006-cat-file.sh             | 18 +++++++++++++++---
 3 files changed, 34 insertions(+), 4 deletions(-)

diff --git a/Documentation/git-cat-file.adoc b/Documentation/git-cat-file.adoc
index afcdb0a4738..48e05e1af52 100644
--- a/Documentation/git-cat-file.adoc
+++ b/Documentation/git-cat-file.adoc
@@ -91,6 +91,11 @@ OPTIONS
 	printed at all. The '<filter-spec>' may be one of the following:
 +
 The form '--filter=blob:none' omits all blobs.
++
+The form '--filter=blob:limit=<n>[kmg]' omits blobs of size at least n
+bytes or units.  n may be zero.  The suffixes k, m, and g can be used to name
+units in KiB, MiB, or GiB.  For example, 'blob:limit=1k' is the same as
+'blob:limit=1024'.
 
 --path=<path>::
 	For use with `--textconv` or `--filters`, to allow specifying an object
diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index bcceb646f85..629c6cddcb2 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -483,8 +483,11 @@ static void batch_object_write(const char *obj_name,
 		int ret;
 
 		if (use_mailmap ||
-		    opt->objects_filter.choice == LOFC_BLOB_NONE)
+		    opt->objects_filter.choice == LOFC_BLOB_NONE ||
+		    opt->objects_filter.choice == LOFC_BLOB_LIMIT)
 			data->info.typep = &data->type;
+		if (opt->objects_filter.choice == LOFC_BLOB_LIMIT)
+			data->info.sizep = &data->size;
 
 		if (pack)
 			ret = packed_object_info(the_repository, pack, offset,
@@ -509,6 +512,15 @@ static void batch_object_write(const char *obj_name,
 				return;
 			}
 			break;
+		case LOFC_BLOB_LIMIT:
+			if (data->type == OBJ_BLOB &&
+			    data->size >= opt->objects_filter.blob_limit_value) {
+				if (!opt->all_objects)
+					report_object_status(opt, obj_name,
+							     &data->oid, "excluded");
+				return;
+			}
+			break;
 		default:
 			BUG("unsupported objects filter");
 		}
@@ -1049,6 +1061,7 @@ int cmd_cat_file(int argc,
 	case LOFC_DISABLED:
 		break;
 	case LOFC_BLOB_NONE:
+	case LOFC_BLOB_LIMIT:
 		if (!batch.enabled)
 			usage(_("objects filter only supported in batch mode"));
 		break;
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 7404c135b1e..4f14840b71a 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -1356,11 +1356,12 @@ test_expect_success PERL '--batch-command info is unbuffered by default' '
 test_expect_success 'setup for objects filter' '
 	git init repo &&
 	(
-		# Seed the repository with three different sets of objects:
+		# Seed the repository with four different sets of objects:
 		#
 		#   - The first set is fully packed and has a bitmap.
 		#   - The second set is packed, but has no bitmap.
 		#   - The third set is loose.
+		#   - The fourth set is loose and contains big objects.
 		#
 		# This ensures that we cover all these types as expected.
 		cd repo &&
@@ -1368,7 +1369,14 @@ test_expect_success 'setup for objects filter' '
 		git repack -Adb &&
 		test_commit second &&
 		git repack -d &&
-		test_commit third
+		test_commit third &&
+
+		for n in 1000 10000
+		do
+			printf "%"$n"s" X >large.$n || return 1
+		done &&
+		git add large.* &&
+		git commit -m fourth
 	)
 '
 
@@ -1380,7 +1388,7 @@ test_expect_success 'objects filter with unknown option' '
 	test_cmp expect err
 '
 
-for option in blob:limit=1 object:type=tag sparse:oid=1234 tree:1 sparse:path=x
+for option in object:type=tag sparse:oid=1234 tree:1 sparse:path=x
 do
 	test_expect_success "objects filter with unsupported option $option" '
 		case "$option" in
@@ -1435,5 +1443,9 @@ test_objects_filter () {
 }
 
 test_objects_filter "blob:none"
+test_objects_filter "blob:limit=1"
+test_objects_filter "blob:limit=500"
+test_objects_filter "blob:limit=1000"
+test_objects_filter "blob:limit=1k"
 
 test_done

-- 
2.49.0.604.gff1f9ca942.dirty
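
As a reading aid for the test changes above: the setup writes `large.1000`
and `large.10000` with `printf "%"$n"s" X`, which pads "X" to a field width
of n and so should produce files of exactly 1000 and 10000 bytes. The sketch
below reproduces that arithmetic in C and applies the same inclusive
comparison as the patch. `padded_size()` and `blob_excluded()` are
hypothetical helpers for illustration only, and the sketch assumes that the
shell's printf pads "%<n>s" the same way C's does.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Number of bytes produced by formatting "X" with a field width of
 * `width`: the string is right-aligned and padded with spaces.
 */
static size_t padded_size(int width)
{
	int n = snprintf(NULL, 0, "%*s", width, "X");
	return n < 0 ? 0 : (size_t)n;
}

/* Same inclusive comparison as the LOFC_BLOB_LIMIT case in the patch. */
static bool blob_excluded(size_t size, size_t limit)
{
	return size >= limit;
}
```

Under these assumptions, "blob:limit=1", "blob:limit=500" and
"blob:limit=1000" exclude both files, because the 1000-byte blob meets the
limit exactly, whereas "blob:limit=1k" (1024 bytes) keeps `large.1000` and
excludes only `large.10000`.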


