All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jeff Hostetler <git@jeffhostetler.com>
To: git@vger.kernel.org
Cc: gitster@pobox.com, peff@peff.net, ethomson@edwardthomson.com,
	jonathantanmy@google.com, jrnieder@gmail.com,
	jeffhost@microsoft.com
Subject: [PATCH v2 05/19] list-objects-filters: add omit-large-blobs filter
Date: Thu, 13 Jul 2017 17:34:45 +0000	[thread overview]
Message-ID: <20170713173459.3559-6-git@jeffhostetler.com> (raw)
In-Reply-To: <20170713173459.3559-1-git@jeffhostetler.com>

From: Jeff Hostetler <jeffhost@microsoft.com>

Create a filter for traverse_commit_list_filtered() to omit
blobs larger than a requested size from the result, but always
include ".git*" special files.

This filter will be used in a future commit by rev-list and
pack-objects for partial/narrow clone/fetch.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 list-objects-filters.c | 97 ++++++++++++++++++++++++++++++++++++++++++++++++++
 list-objects-filters.h | 12 +++++++
 2 files changed, 109 insertions(+)

diff --git a/list-objects-filters.c b/list-objects-filters.c
index f29d8bc..f04d70e 100644
--- a/list-objects-filters.c
+++ b/list-objects-filters.c
@@ -83,3 +83,100 @@ void traverse_commit_list_omit_all_blobs(
 
 	oidset2_clear(&d.omits);
 }
+
+/*
+ * A filter for list-objects to omit large blobs,
+ * but always include ".git*" special files.
+ */
+struct filter_omit_large_blobs_data {
+	struct oidset2 omits;
+	int64_t max_bytes;
+};
+
+static list_objects_filter_result filter_omit_large_blobs(
+	list_objects_filter_type filter_type,
+	struct object *obj,
+	const char *pathname,
+	const char *filename,
+	void *filter_data_)
+{
+	struct filter_omit_large_blobs_data *filter_data = filter_data_;
+	int64_t object_length = -1;
+	unsigned long s;
+	enum object_type t;
+
+	switch (filter_type) {
+	default:
+		die("unkown filter_type");
+		return LOFR_ZERO;
+
+	case LOFT_BEGIN_TREE:
+		assert(obj->type == OBJ_TREE);
+		/* always include all tree objects */
+		return LOFR_MARK_SEEN | LOFR_SHOW;
+
+	case LOFT_END_TREE:
+		assert(obj->type == OBJ_TREE);
+		return LOFR_ZERO;
+
+	case LOFT_BLOB:
+		assert(obj->type == OBJ_BLOB);
+		assert((obj->flags & SEEN) == 0);
+
+		/*
+		 * If previously provisionally omitted (because of size), see if the
+		 * current filename is special and force it to be included.
+		 */
+		if (oidset2_contains(&filter_data->omits, &obj->oid)) {
+			if ((strncmp(filename, ".git", 4) == 0) && filename[4]) {
+				oidset2_remove(&filter_data->omits, &obj->oid);
+				return LOFR_MARK_SEEN | LOFR_SHOW;
+			}
+			return LOFR_ZERO; /* continue provisionally omitting it */
+		}
+
+		t = sha1_object_info(obj->oid.hash, &s);
+		assert(t == OBJ_BLOB);
+		object_length = (int64_t)((uint64_t)(s));
+
+		if (object_length < filter_data->max_bytes)
+			return LOFR_MARK_SEEN | LOFR_SHOW;
+
+		/*
+		 * Provisionally omit it.  We've already established that this blob
+		 * is too big and doesn't have a special filename, so we WANT to
+		 * omit it.  However, there may be a special file elsewhere in the
+		 * tree that references this same blob, so we cannot reject it yet.
+		 * Leave the LOFR_ bits unset so that if the blob appears again in
+		 * the traversal, we will be asked again.
+		 *
+		 * No need for a pathname, since we only test for special filenames
+		 * above.
+		 */
+		oidset2_insert(&filter_data->omits, &obj->oid, object_length,
+			       NULL);
+		return LOFR_ZERO;
+	}
+}
+
+void traverse_commit_list_omit_large_blobs(
+	struct rev_info *revs,
+	show_commit_fn show_commit,
+	show_object_fn show_object,
+	oidset2_foreach_cb print_omitted_object,
+	void *ctx_data,
+	int64_t large_byte_limit)
+{
+	struct filter_omit_large_blobs_data d;
+
+	memset(&d, 0, sizeof(d));
+	d.max_bytes = large_byte_limit;
+
+	traverse_commit_list_filtered(revs, show_commit, show_object, ctx_data,
+				      filter_omit_large_blobs, &d);
+
+	if (print_omitted_object)
+		oidset2_foreach(&d.omits, print_omitted_object, ctx_data);
+
+	oidset2_clear(&d.omits);
+}
diff --git a/list-objects-filters.h b/list-objects-filters.h
index b981020..32b2833 100644
--- a/list-objects-filters.h
+++ b/list-objects-filters.h
@@ -14,4 +14,16 @@ void traverse_commit_list_omit_all_blobs(
 	oidset2_foreach_cb print_omitted_object,
 	void *ctx_data);
 
+/*
+ * A filter for list-objects to omit large blobs,
+ * but always include ".git*" special files.
+ */
+void traverse_commit_list_omit_large_blobs(
+	struct rev_info *revs,
+	show_commit_fn show_commit,
+	show_object_fn show_object,
+	oidset2_foreach_cb print_omitted_object,
+	void *ctx_data,
+	int64_t large_byte_limit);
+
 #endif /* LIST_OBJECTS_FILTERS_H */
-- 
2.9.3


  parent reply	other threads:[~2017-07-13 17:37 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-07-13 17:34 [PATCH v2 00/19] WIP object filtering for partial clone Jeff Hostetler
2017-07-13 17:34 ` [PATCH v2 01/19] dir: refactor add_excludes() Jeff Hostetler
2017-07-13 17:34 ` [PATCH v2 02/19] oidset2: create oidset subclass with object length and pathname Jeff Hostetler
2017-07-13 17:34 ` [PATCH v2 03/19] list-objects: filter objects in traverse_commit_list Jeff Hostetler
2017-07-13 17:34 ` [PATCH v2 04/19] list-objects-filters: add omit-all-blobs filter Jeff Hostetler
2017-07-13 17:34 ` Jeff Hostetler [this message]
2017-07-13 17:34 ` [PATCH v2 06/19] list-objects-filters: add use-sparse-checkout filter Jeff Hostetler
2017-07-13 17:34 ` [PATCH v2 07/19] object-filter: common declarations for object filtering Jeff Hostetler
2017-07-13 17:34 ` [PATCH v2 08/19] rev-list: add object filtering support Jeff Hostetler
2017-07-13 17:34 ` [PATCH v2 09/19] rev-list: add filtering help text Jeff Hostetler
2017-07-13 17:34 ` [PATCH v2 10/19] t6112: rev-list object filtering test Jeff Hostetler
2017-07-13 17:34 ` [PATCH v2 11/19] pack-objects: add object filtering support Jeff Hostetler
2017-07-13 17:34 ` [PATCH v2 12/19] pack-objects: add filtering help text Jeff Hostetler
2017-07-13 17:34 ` [PATCH v2 13/19] upload-pack: add filter-objects to protocol documentation Jeff Hostetler
2017-07-13 17:34 ` [PATCH v2 14/19] upload-pack: add object filtering Jeff Hostetler
2017-07-13 17:34 ` [PATCH v2 15/19] fetch-pack: add object filtering support Jeff Hostetler
2017-07-13 17:34 ` [PATCH v2 16/19] connected: add filter_allow_omitted option to API Jeff Hostetler
2017-07-13 17:34 ` [PATCH v2 17/19] clone: add filter arguments Jeff Hostetler
2017-07-13 17:34 ` [PATCH v2 18/19] index-pack: relax consistency checks for omitted objects Jeff Hostetler
2017-07-13 17:34 ` [PATCH v2 19/19] fetch: add object filtering to fetch Jeff Hostetler

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170713173459.3559-6-git@jeffhostetler.com \
    --to=git@jeffhostetler.com \
    --cc=ethomson@edwardthomson.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jeffhost@microsoft.com \
    --cc=jonathantanmy@google.com \
    --cc=jrnieder@gmail.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.