From: Jeff Hostetler <git@jeffhostetler.com>
To: git@vger.kernel.org
Cc: gitster@pobox.com, peff@peff.net, ethomson@edwardthomson.com,
 jonathantanmy@google.com, jrnieder@gmail.com,
 jeffhost@microsoft.com
Subject: [PATCH v2 06/19] list-objects-filters: add use-sparse-checkout filter
Date: Thu, 13 Jul 2017 17:34:46 +0000
Message-ID: <20170713173459.3559-7-git@jeffhostetler.com>
In-Reply-To: <20170713173459.3559-1-git@jeffhostetler.com>
From: Jeff Hostetler <jeffhost@microsoft.com>
Create a filter for traverse_commit_list_filtered() to omit the
blobs that would not be needed by a sparse checkout using the
given sparse-checkout spec.
This filter will be used in a future commit by rev-list and
pack-objects for partial/narrow clone/fetch.
A future enhancement should be able to also omit tree objects
not needed by such a sparse checkout, but that is not currently
supported.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
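Note on the per-directory frame stack: the filter keeps one frame per
open directory so that a path the sparse spec leaves undecided inherits
the verdict of its parent. The following is a minimal standalone sketch
of that bookkeeping, not the code in this patch. All names in it
(frame_sketch, frame_stack, stack_init, stack_enter, stack_leave, and
the match enum) are hypothetical, and plain realloc() stands in for
ALLOC_GROW(). It keeps the same convention as the filter, where `nr` is
the index of the innermost frame rather than a count, so a push needs
room for nr + 2 entries.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration; none of these exist in git. */
enum match { UNDECIDED = -1, NOT_MATCHED = 0, MATCHED = 1 };

struct frame_sketch {
	int defval;          /* verdict inherited by undecided children */
	int child_prov_omit; /* set if any descendant was provisionally omitted */
};

struct frame_stack {
	struct frame_sketch *frames;
	size_t nr;    /* index of the innermost frame, not a count */
	size_t alloc; /* number of allocated frames */
};

/* Create the root frame at index 0. Returns 0 on success, -1 on OOM. */
static int stack_init(struct frame_stack *st, int root_defval)
{
	st->alloc = 4;
	st->frames = calloc(st->alloc, sizeof(*st->frames));
	if (!st->frames)
		return -1;
	st->nr = 0;
	st->frames[0].defval = root_defval;
	st->frames[0].child_prov_omit = 0;
	return 0;
}

/* Enter a directory: an undecided verdict falls back to the parent's. */
static int stack_enter(struct frame_stack *st, int verdict)
{
	if (verdict == UNDECIDED)
		verdict = st->frames[st->nr].defval;

	/* The new frame lands at index nr + 1, so nr + 2 slots are needed. */
	if (st->nr + 2 > st->alloc) {
		size_t new_alloc = st->alloc * 2;
		struct frame_sketch *p =
			realloc(st->frames, new_alloc * sizeof(*p));
		if (!p)
			return -1;
		st->frames = p;
		st->alloc = new_alloc;
	}
	st->nr++;
	st->frames[st->nr].defval = verdict;
	st->frames[st->nr].child_prov_omit = 0;
	return 0;
}

/*
 * Leave a directory: pop its frame and tell the parent whether anything
 * below was provisionally omitted. Returns 1 when nothing was omitted,
 * which is the case where the tree could be marked as fully handled.
 */
static int stack_leave(struct frame_stack *st)
{
	struct frame_sketch *child;

	assert(st->nr > 0);
	child = &st->frames[st->nr];
	st->nr--;
	st->frames[st->nr].child_prov_omit |= child->child_prov_omit;
	return !child->child_prov_omit;
}
```

A caller would pair stack_enter() with the begin-tree event and
stack_leave() with the end-tree event, and set child_prov_omit on the
innermost frame whenever a blob is held back.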
list-objects-filters.c | 179 +++++++++++++++++++++++++++++++++++++++++++++++++
list-objects-filters.h | 16 +++++
2 files changed, 195 insertions(+)
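Note on provisional omission: the same blob can be reachable under
several pathnames, so a blob that fails the sparse spec under one path
cannot be rejected for good; it has to stay eligible in case a later
path matches. The sketch below models only that state machine, under
stated assumptions. The names (blob_sketch, visit_blob, and the SKETCH_*
constants) are hypothetical, and a single `omitted` field stands in for
membership in the omits set. Clearing that field on a later match is
this sketch's own choice: the hunk below records omits with
oidset2_insert() and does not show a matching removal.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flags and results for illustration; not git's values. */
#define SKETCH_SEEN    (1u << 0) /* emitted; will not be asked about again */
#define SKETCH_REVISIT (1u << 1) /* visited before, verdict still open */

#define SKETCH_ZERO 0 /* do not show, leave the object revisitable */
#define SKETCH_SHOW 1 /* show the object and settle it */

struct blob_sketch {
	unsigned flags;
	int omitted; /* stand-in for "is in the omits set" */
};

/*
 * Visit one (blob, pathname) pair. `in_sparse_spec` is the already
 * resolved answer to "would a sparse checkout populate this path?".
 */
static int visit_blob(struct blob_sketch *b, int in_sparse_spec)
{
	if (b->flags & SKETCH_SEEN)
		return SKETCH_ZERO; /* already emitted under another path */

	if (in_sparse_spec) {
		b->flags |= SKETCH_SEEN;
		b->omitted = 0; /* assumption: a later match cancels the omit */
		return SKETCH_SHOW;
	}

	/* Hold the blob back for now, but allow another path to claim it. */
	b->flags |= SKETCH_REVISIT;
	b->omitted = 1;
	return SKETCH_ZERO;
}
```

After the traversal, whatever is still marked omitted is the set a
caller would report through the print_omitted_object callback.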
diff --git a/list-objects-filters.c b/list-objects-filters.c
index f04d70e..cacf645 100644
--- a/list-objects-filters.c
+++ b/list-objects-filters.c
@@ -180,3 +180,182 @@ void traverse_commit_list_omit_large_blobs(
oidset2_clear(&d.omits);
}
+
+/*
+ * A filter driven by a sparse-checkout specification to only
+ * include blobs that a sparse checkout would populate.
+ *
+ * The sparse-checkout spec is loaded from the blob with the
+ * given OID (rather than .git/info/sparse-checkout) because
+ * the repo may be bare.
+ */
+struct frame {
+ int defval;
+ int child_prov_omit : 1;
+};
+
+struct filter_use_sparse_data {
+ struct oidset2 omits;
+ struct exclude_list el;
+
+ size_t nr, alloc;
+ struct frame *array_frame;
+};
+
+static list_objects_filter_result filter_use_sparse(
+ list_objects_filter_type filter_type,
+ struct object *obj,
+ const char *pathname,
+ const char *filename,
+ void *filter_data_)
+{
+ struct filter_use_sparse_data *filter_data = filter_data_;
+ int64_t object_length = -1;
+ int val, dtype;
+ unsigned long s;
+ enum object_type t;
+ struct frame *frame;
+
+ switch (filter_type) {
+ default:
+ die("unknown filter_type");
+ return LOFR_ZERO;
+
+ case LOFT_BEGIN_TREE:
+ assert(obj->type == OBJ_TREE);
+ dtype = DT_DIR;
+ val = is_excluded_from_list(pathname, strlen(pathname),
+ filename, &dtype, &filter_data->el);
+ if (val < 0)
+ val = filter_data->array_frame[filter_data->nr].defval;
+
+ ALLOC_GROW(filter_data->array_frame, filter_data->nr + 2,
+ filter_data->alloc);
+ filter_data->nr++;
+ filter_data->array_frame[filter_data->nr].defval = val;
+ filter_data->array_frame[filter_data->nr].child_prov_omit = 0;
+
+ /*
+ * A directory with this tree OID may appear in multiple
+ * places in the tree. (Think of a directory move, with
+ * no other changes.) And with a different pathname, the
+ * is_excluded...() results for this directory and items
+ * contained within it may be different. So we cannot
+ * mark it SEEN (yet), since that will prevent process_tree()
+ * from revisiting this tree object with other pathnames.
+ *
+ * Only SHOW the tree object the first time we visit this
+ * tree object.
+ *
+ * We always show all tree objects. A future optimization
+ * may want to attempt to narrow this.
+ */
+ if (obj->flags & FILTER_REVISIT)
+ return LOFR_ZERO;
+ obj->flags |= FILTER_REVISIT;
+ return LOFR_SHOW;
+
+ case LOFT_END_TREE:
+ assert(obj->type == OBJ_TREE);
+ assert(filter_data->nr > 0);
+
+ frame = &filter_data->array_frame[filter_data->nr];
+ filter_data->nr--;
+
+ /*
+ * Tell our parent directory if any of our children were
+ * provisionally omitted.
+ */
+ filter_data->array_frame[filter_data->nr].child_prov_omit |=
+ frame->child_prov_omit;
+
+ /*
+ * If there are NO provisionally omitted child objects (ALL child
+ * objects in this folder were INCLUDED), then we can mark the
+ * folder as SEEN (so we will not have to revisit it again).
+ */
+ if (!frame->child_prov_omit)
+ return LOFR_MARK_SEEN;
+ return LOFR_ZERO;
+
+ case LOFT_BLOB:
+ assert(obj->type == OBJ_BLOB);
+ assert((obj->flags & SEEN) == 0);
+
+ frame = &filter_data->array_frame[filter_data->nr];
+
+ /*
+ * If we previously provisionally omitted this blob because
+ * its pathname was not in the sparse-checkout AND this
+ * reference to the blob has the same pathname, we can avoid
+ * repeating the exclusion logic on this pathname and just
+ * continue to provisionally omit it.
+ */
+ if (obj->flags & FILTER_REVISIT) {
+ struct oidset2_entry *entry_prev;
+ entry_prev = oidset2_get(&filter_data->omits, &obj->oid);
+ if (entry_prev && !strcmp(pathname, entry_prev->pathname)) {
+ frame->child_prov_omit = 1;
+ return LOFR_ZERO;
+ }
+ }
+
+ dtype = DT_REG;
+ val = is_excluded_from_list(pathname, strlen(pathname),
+ filename, &dtype, &filter_data->el);
+ if (val < 0)
+ val = frame->defval;
+ if (val > 0)
+ return LOFR_MARK_SEEN | LOFR_SHOW;
+
+ t = sha1_object_info(obj->oid.hash, &s);
+ assert(t == OBJ_BLOB);
+ object_length = (int64_t)((uint64_t)(s));
+
+ /*
+ * Provisionally omit it. We've already established that
+ * this pathname is not in the sparse-checkout specification,
+ * so we WANT to omit this blob. However, a pathname elsewhere
+ * in the tree may also reference this same blob, so we cannot
+ * reject it yet. Leave the LOFR_ bits unset so that if the
+ * blob appears again in the traversal, we will be asked again.
+ *
+ * The pathname we associate with this omit is just the first
+ * one we saw for this blob. Other instances of this blob may
+ * have other pathnames and that is fine. We just use it for
+ * perf because most of the time, the blob will be in the same
+ * place as we walk the commits.
+ */
+ oidset2_insert(&filter_data->omits, &obj->oid, object_length,
+ pathname);
+ obj->flags |= FILTER_REVISIT;
+ frame->child_prov_omit = 1;
+ return LOFR_ZERO;
+ }
+}
+
+void traverse_commit_list_use_sparse(
+ struct rev_info *revs,
+ show_commit_fn show_commit,
+ show_object_fn show_object,
+ oidset2_foreach_cb print_omitted_object,
+ void *ctx_data,
+ struct object_id *oid)
+{
+ struct filter_use_sparse_data d;
+
+ memset(&d, 0, sizeof(d));
+ if (add_excludes_from_blob_to_list(oid, NULL, 0, &d.el) < 0)
+ die("filter_use_sparse could not load specification");
+ ALLOC_GROW(d.array_frame, d.nr + 1, d.alloc);
+ d.array_frame[d.nr].defval = 0; /* default to include */
+ d.array_frame[d.nr].child_prov_omit = 0;
+
+ traverse_commit_list_filtered(revs, show_commit, show_object, ctx_data,
+ filter_use_sparse, &d);
+
+ if (print_omitted_object)
+ oidset2_foreach(&d.omits, print_omitted_object, ctx_data);
+
+ oidset2_clear(&d.omits);
+}
diff --git a/list-objects-filters.h b/list-objects-filters.h
index 32b2833..52e507b 100644
--- a/list-objects-filters.h
+++ b/list-objects-filters.h
@@ -26,4 +26,20 @@ void traverse_commit_list_omit_large_blobs(
void *ctx_data,
int64_t large_byte_limit);
+/*
+ * A filter driven by a sparse-checkout specification to only
+ * include blobs that a sparse checkout would populate.
+ *
+ * The sparse-checkout spec is loaded from the blob with the
+ * given OID (rather than .git/info/sparse-checkout) because
+ * the repo may be bare.
+ */
+void traverse_commit_list_use_sparse(
+ struct rev_info *revs,
+ show_commit_fn show_commit,
+ show_object_fn show_object,
+ oidset2_foreach_cb print_omitted_object,
+ void *ctx_data,
+ struct object_id *oid);
+
#endif /* LIST_OBJECTS_FILTERS_H */
--
2.9.3