All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 00/13] WIP Partial clone part 1: object filtering
@ 2017-10-24 18:53 Jeff Hostetler
  2017-10-24 18:53 ` [PATCH 01/13] dir: allow exclusions from blob in addition to file Jeff Hostetler
                   ` (14 more replies)
  0 siblings, 15 replies; 38+ messages in thread
From: Jeff Hostetler @ 2017-10-24 18:53 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, jonathantanmy, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

I've been working with Jonathan Tan to combine our partial clone
proposals.  This patch series represents a first step in that effort
and introduces an object filtering mechanism to select unwanted
objects.

[1] traverse_commit_list and list-objects is extended to allow
    various filters.
[2] rev-list is extended to expose filtering.  This allows testing
    of the filtering options.  And can be used later to predict
    missing objects before commands like checkout or merge.
[3] pack-objects is extended to use filtering parameters and build
    packfiles that omit unwanted objects.

This patch series lays the ground work for subsequent parts which
will extend clone, fetch, fetch-pack, upload-pack, fsck, and etc.


Jeff Hostetler (13):
  dir: allow exclusions from blob in addition to file
  list-objects-filter-map: extend oidmap to collect omitted objects
  list-objects: filter objects in traverse_commit_list
  list-objects-filter-blobs-none: add filter to omit all blobs
  list-objects-filter-blobs-limit: add large blob filtering
  list-objects-filter-sparse: add sparse filter
  list-objects-filter-options: common argument parsing
  list-objects: add traverse_commit_list_filtered method
  extension.partialclone: introduce partial clone extension
  rev-list: add list-objects filtering support
  t6112: rev-list object filtering test
  pack-objects: add list-objects filtering
  t5317: pack-objects object filtering test

 Documentation/git-pack-objects.txt             |   8 +-
 Documentation/git-rev-list.txt                 |   5 +-
 Documentation/rev-list-options.txt             |  30 ++
 Documentation/technical/repository-version.txt |  22 ++
 Makefile                                       |   6 +
 builtin/pack-objects.c                         |  18 +-
 builtin/rev-list.c                             |  84 +++++-
 cache.h                                        |   4 +
 config.h                                       |   3 +
 dir.c                                          |  51 +++-
 dir.h                                          |   3 +
 environment.c                                  |   2 +
 list-objects-filter-blobs-limit.c              | 146 ++++++++++
 list-objects-filter-blobs-limit.h              |  18 ++
 list-objects-filter-blobs-none.c               |  83 ++++++
 list-objects-filter-blobs-none.h               |  18 ++
 list-objects-filter-map.c                      |  63 ++++
 list-objects-filter-map.h                      |  26 ++
 list-objects-filter-options.c                  | 101 +++++++
 list-objects-filter-options.h                  |  50 ++++
 list-objects-filter-sparse.c                   | 241 ++++++++++++++++
 list-objects-filter-sparse.h                   |  30 ++
 list-objects.c                                 | 111 +++++--
 list-objects.h                                 |  43 ++-
 partial-clone-utils.c                          |  99 +++++++
 partial-clone-utils.h                          |  34 +++
 setup.c                                        |  15 +
 t/t5317-pack-objects-filter-objects.sh         | 384 +++++++++++++++++++++++++
 t/t6112-rev-list-filters-objects.sh            | 223 ++++++++++++++
 29 files changed, 1897 insertions(+), 24 deletions(-)
 create mode 100644 list-objects-filter-blobs-limit.c
 create mode 100644 list-objects-filter-blobs-limit.h
 create mode 100644 list-objects-filter-blobs-none.c
 create mode 100644 list-objects-filter-blobs-none.h
 create mode 100644 list-objects-filter-map.c
 create mode 100644 list-objects-filter-map.h
 create mode 100644 list-objects-filter-options.c
 create mode 100644 list-objects-filter-options.h
 create mode 100644 list-objects-filter-sparse.c
 create mode 100644 list-objects-filter-sparse.h
 create mode 100644 partial-clone-utils.c
 create mode 100644 partial-clone-utils.h
 create mode 100755 t/t5317-pack-objects-filter-objects.sh
 create mode 100755 t/t6112-rev-list-filters-objects.sh

-- 
2.9.3


^ permalink raw reply	[flat|nested] 38+ messages in thread
* [PATCH 05/13] list-objects-filter-large: add large blob filter to list-objects
@ 2017-09-22 20:30 Jeff Hostetler
  2017-09-22 20:30 ` [PATCH 08/13] list-objects: add traverse_commit_list_filtered method Jeff Hostetler
  0 siblings, 1 reply; 38+ messages in thread
From: Jeff Hostetler @ 2017-09-22 20:30 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, jonathantanmy, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Create a filter for traverse_commit_list_worker() to omit blobs
larger than a requested size from the result, but always include
".git*" special files.

Signed-off-by: Jeff Hostetler <jeffhost>@microsoft.com>
---
 Makefile                    |   1 +
 list-objects-filter-large.c | 108 ++++++++++++++++++++++++++++++++++++++++++++
 list-objects-filter-large.h |  18 ++++++++
 3 files changed, 127 insertions(+)
 create mode 100644 list-objects-filter-large.c
 create mode 100644 list-objects-filter-large.h

diff --git a/Makefile b/Makefile
index b98e3dc..f1f3979 100644
--- a/Makefile
+++ b/Makefile
@@ -799,6 +799,7 @@ LIB_OBJS += line-log.o
 LIB_OBJS += line-range.o
 LIB_OBJS += list-objects.o
 LIB_OBJS += list-objects-filter-all.o
+LIB_OBJS += list-objects-filter-large.o
 LIB_OBJS += ll-merge.o
 LIB_OBJS += lockfile.o
 LIB_OBJS += log-tree.o
diff --git a/list-objects-filter-large.c b/list-objects-filter-large.c
new file mode 100644
index 0000000..1af39b6
--- /dev/null
+++ b/list-objects-filter-large.c
@@ -0,0 +1,108 @@
+#include "cache.h"
+#include "dir.h"
+#include "tag.h"
+#include "commit.h"
+#include "tree.h"
+#include "blob.h"
+#include "diff.h"
+#include "tree-walk.h"
+#include "revision.h"
+#include "list-objects.h"
+#include "list-objects-filter-large.h"
+
+/*
+ * A filter for list-objects to omit large blobs,
+ * but always include ".git*" special files.
+ */
+struct filter_omit_large_blobs_data {
+	struct oidset2 omits;
+	int64_t max_bytes;
+};
+
+static list_objects_filter_result filter_omit_large_blobs(
+	list_objects_filter_type filter_type,
+	struct object *obj,
+	const char *pathname,
+	const char *filename,
+	void *filter_data_)
+{
+	struct filter_omit_large_blobs_data *filter_data = filter_data_;
+	int64_t object_length = -1;
+	unsigned long s;
+	enum object_type t;
+
+	switch (filter_type) {
+	default:
+		die("unkown filter_type");
+		return LOFR_ZERO;
+
+	case LOFT_BEGIN_TREE:
+		assert(obj->type == OBJ_TREE);
+		/* always include all tree objects */
+		return LOFR_MARK_SEEN | LOFR_SHOW;
+
+	case LOFT_END_TREE:
+		assert(obj->type == OBJ_TREE);
+		return LOFR_ZERO;
+
+	case LOFT_BLOB:
+		assert(obj->type == OBJ_BLOB);
+		assert((obj->flags & SEEN) == 0);
+
+		/*
+		 * If previously provisionally omitted (because of size), see if the
+		 * current filename is special and force it to be included.
+		 */
+		if (oidset2_contains(&filter_data->omits, &obj->oid)) {
+			if ((strncmp(filename, ".git", 4) == 0) && filename[4]) {
+				oidset2_remove(&filter_data->omits, &obj->oid);
+				return LOFR_MARK_SEEN | LOFR_SHOW;
+			}
+			return LOFR_ZERO; /* continue provisionally omitting it */
+		}
+
+		t = sha1_object_info(obj->oid.hash, &s);
+		assert(t == OBJ_BLOB);
+		object_length = (int64_t)((uint64_t)(s));
+
+		if (object_length < filter_data->max_bytes)
+			return LOFR_MARK_SEEN | LOFR_SHOW;
+
+		/*
+		 * Provisionally omit it.  We've already established that this blob
+		 * is too big and doesn't have a special filename, so we WANT to
+		 * omit it.  However, there may be a special file elsewhere in the
+		 * tree that references this same blob, so we cannot reject it yet.
+		 * Leave the LOFR_ bits unset so that if the blob appears again in
+		 * the traversal, we will be asked again.
+		 *
+		 * No need for a pathname, since we only test for special filenames
+		 * above.
+		 */
+		oidset2_insert(&filter_data->omits, &obj->oid, t, object_length,
+			       NULL);
+		return LOFR_ZERO;
+	}
+}
+
+void traverse_commit_list_omit_large_blobs(
+	struct rev_info *revs,
+	show_commit_fn show_commit,
+	show_object_fn show_object,
+	oidset2_foreach_cb print_omitted_object,
+	void *ctx_data,
+	int64_t large_byte_limit)
+{
+	struct filter_omit_large_blobs_data d;
+
+	memset(&d, 0, sizeof(d));
+	d.max_bytes = large_byte_limit;
+
+	traverse_commit_list_worker(revs, show_commit, show_object, ctx_data,
+				    filter_omit_large_blobs, &d);
+
+	if (print_omitted_object)
+		oidset2_foreach(&d.omits, print_omitted_object, ctx_data);
+
+	oidset2_clear(&d.omits);
+}
diff --git a/list-objects-filter-large.h b/list-objects-filter-large.h
new file mode 100644
index 0000000..4a5c772
--- /dev/null
+++ b/list-objects-filter-large.h
@@ -0,0 +1,18 @@
+#ifndef LIST_OBJECTS_FILTER_LARGE_H
+#define LIST_OBJECTS_FILTER_LARGE_H
+
+#include "oidset2.h"
+
+/*
+ * A filter for list-objects to omit large blobs,
+ * but always include ".git*" special files.
+ */
+void traverse_commit_list_omit_large_blobs(
+	struct rev_info *revs,
+	show_commit_fn show_commit,
+	show_object_fn show_object,
+	oidset2_foreach_cb print_omitted_object,
+	void *ctx_data,
+	int64_t large_byte_limit);
+
+#endif /* LIST_OBJECTS_FILTER_LARGE_H */
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 38+ messages in thread

end of thread, other threads:[~2017-10-30 22:27 UTC | newest]

Thread overview: 38+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-10-24 18:53 [PATCH 00/13] WIP Partial clone part 1: object filtering Jeff Hostetler
2017-10-24 18:53 ` [PATCH 01/13] dir: allow exclusions from blob in addition to file Jeff Hostetler
2017-10-25  4:05   ` Eric Sunshine
2017-10-25  6:43   ` Junio C Hamano
2017-10-25 14:54     ` Jeff Hostetler
2017-10-26  3:47       ` Junio C Hamano
2017-10-26 18:11         ` Jeff Hostetler
2017-10-24 18:53 ` [PATCH 02/13] list-objects-filter-map: extend oidmap to collect omitted objects Jeff Hostetler
2017-10-25  7:10   ` Junio C Hamano
2017-10-25 19:22     ` Jeff Hostetler
2017-10-26  4:12       ` Junio C Hamano
2017-10-24 18:53 ` [PATCH 03/13] list-objects: filter objects in traverse_commit_list Jeff Hostetler
2017-10-25  4:05   ` Jonathan Tan
2017-10-25 19:25     ` Jeff Hostetler
2017-10-24 18:53 ` [PATCH 04/13] list-objects-filter-blobs-none: add filter to omit all blobs Jeff Hostetler
2017-10-24 18:53 ` [PATCH 05/13] list-objects-filter-blobs-limit: add large blob filtering Jeff Hostetler
2017-10-24 18:53 ` [PATCH 06/13] list-objects-filter-sparse: add sparse filter Jeff Hostetler
2017-10-24 18:53 ` [PATCH 07/13] list-objects-filter-options: common argument parsing Jeff Hostetler
2017-10-25  4:14   ` Jonathan Tan
2017-10-25 19:28     ` Jeff Hostetler
2017-10-24 18:53 ` [PATCH 08/13] list-objects: add traverse_commit_list_filtered method Jeff Hostetler
2017-10-25  4:24   ` Jonathan Tan
2017-10-25 19:29     ` Jeff Hostetler
2017-10-24 18:53 ` [PATCH 09/13] extension.partialclone: introduce partial clone extension Jeff Hostetler
2017-10-24 18:53 ` [PATCH 10/13] rev-list: add list-objects filtering support Jeff Hostetler
2017-10-25  4:41   ` Jonathan Tan
2017-10-25 19:37     ` Jeff Hostetler
2017-10-24 18:53 ` [PATCH 11/13] t6112: rev-list object filtering test Jeff Hostetler
2017-10-24 18:53 ` [PATCH 12/13] pack-objects: add list-objects filtering Jeff Hostetler
2017-10-24 18:53 ` [PATCH 13/13] t5317: pack-objects object filtering test Jeff Hostetler
2017-10-25  4:57 ` [PATCH 00/13] WIP Partial clone part 1: object filtering Jonathan Tan
2017-10-25  5:00 ` Junio C Hamano
2017-10-25  6:46   ` Jonathan Tan
2017-10-25 15:39     ` Jeff Hostetler
2017-10-26  2:09       ` Junio C Hamano
2017-10-26  2:01     ` Junio C Hamano
2017-10-30 22:27     ` Jonathan Nieder
  -- strict thread matches above, loose matches on Subject: below --
2017-09-22 20:30 [PATCH 05/13] list-objects-filter-large: add large blob filter to list-objects Jeff Hostetler
2017-09-22 20:30 ` [PATCH 08/13] list-objects: add traverse_commit_list_filtered method Jeff Hostetler

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.