From: Jeff Hostetler <git@jeffhostetler.com>
To: git@vger.kernel.org
Cc: gitster@pobox.com, peff@peff.net, ethomson@edwardthomson.com,
jonathantanmy@google.com, jrnieder@gmail.com,
jeffhost@microsoft.com
Subject: [PATCH v2 00/19] WIP object filtering for partial clone
Date: Thu, 13 Jul 2017 17:34:40 +0000 [thread overview]
Message-ID: <20170713173459.3559-1-git@jeffhostetler.com> (raw)
From: Jeff Hostetler <jeffhost@microsoft.com>
This WIP is a follow up to my earlier patch series to teach
pack-objects to omit large blobs from packfiles. [1]
Like the previous version, this version builds upon a suggestion from
Peff [2] to use the traverse_commit_list() machinery to allow custom
object filtering using a filter callback. This hides the filtering
logic in list-objects.c and list-objects-filters.c and minimizes the
changes to actual commands, such as pack-objects.
This version adds that same filtering capability to rev-list allowing
filtering to be demonstrated without building a packfile. Filtered
blobs are printed with a leading "~" (along with their sizes).
$ ./git rev-list --objects HEAD~1..HEAD
74f806c70507317b8bdbcf3b08459c7c83906bee
818617707aac81ae4620239182b514f65638e37e
d21329bffeb9801682d8d6d6acedc2958d17f4e0 builtin
306c16551e548ace12c709a332bfea22adcc395f builtin/fetch.c
$ ./git rev-list --objects --filter-omit-all-blobs --filter-print-manifest HEAD~1..HEAD
74f806c70507317b8bdbcf3b08459c7c83906bee
818617707aac81ae4620239182b514f65638e37e
d21329bffeb9801682d8d6d6acedc2958d17f4e0 builtin
~306c16551e548ace12c709a332bfea22adcc395f 40732
$ ./git rev-list --objects --filter-omit-all-blobs --filter-print-manifest --quiet HEAD~1..HEAD
~306c16551e548ace12c709a332bfea22adcc395f 40732
This version contains 3 filters:
1. filter-omit-all-blobs to exclude all blobs (trees and commits only).
2. filter-omit-large-blobs=<n>[kmg] to exclude blobs larger than <n>
(but always including ".git*" special files).
3. filter-use-sparse=<blob-ish> to exclude blobs not needed by the
corresponding sparse-checkout.
Sparse-checkout filtering is currently limited to filtering unneeded blobs.
A later enhancement should be able to also filter unneeded tree objects.
This version updates clone, fetch, fetch-pack, and upload-pack commands
to pass the additional object-filter parameters.
As a (possibly) temporary measure, some commands have been updated to
relax missing blob errors during consistency checks. Maintining info
on missing blobs is currently being discussed in [3].
TODO
1. Incorporate with a patch series like [4] to dynamically fetch a
missing blob from the server in read_object on demand.
2. Resolve missing blob consistency check issue.
3. Store filter options from clone in config or .git/info and default
to them in subsequent fetches.
4. fsck, gc, and assorted commands.
5. testing.
[1] https://public-inbox.org/git/20170622203615.34135-1-git@jeffhostetler.com/
[2] https://public-inbox.org/git/20170309073117.g3br5btsfwntcdpe@sigill.intra.peff.net/
[3] https://public-inbox.org/git/cover.1499800530.git.jonathantanmy@google.com/
[4] https://public-inbox.org/git/20170505152802.6724-1-benpeart@microsoft.com/
Jeff Hostetler (19):
dir: refactor add_excludes()
oidset2: create oidset subclass with object length and pathname
list-objects: filter objects in traverse_commit_list
list-objects-filters: add omit-all-blobs filter
list-objects-filters: add omit-large-blobs filter
list-objects-filters: add use-sparse-checkout filter
object-filter: common declarations for object filtering
rev-list: add object filtering support
rev-list: add filtering help text
t6112: rev-list object filtering test
pack-objects: add object filtering support
pack-objects: add filtering help text
upload-pack: add filter-objects to protocol documentation
upload-pack: add object filtering
fetch-pack: add object filtering support
connected: add filter_allow_omitted option to API
clone: add filter arguments
index-pack: relax consistency checks for omitted objects
fetch: add object filtering to fetch
Documentation/git-pack-objects.txt | 14 +
Documentation/git-rev-list.txt | 7 +-
Documentation/rev-list-options.txt | 26 ++
Documentation/technical/pack-protocol.txt | 16 +
Documentation/technical/protocol-capabilities.txt | 7 +
Makefile | 3 +
builtin/clone.c | 28 ++
builtin/fetch-pack.c | 3 +
builtin/fetch.c | 27 +-
builtin/index-pack.c | 15 +
builtin/pack-objects.c | 33 +-
builtin/rev-list.c | 58 +++-
connected.c | 3 +
connected.h | 6 +
dir.c | 53 +++-
dir.h | 4 +
fetch-pack.c | 28 ++
fetch-pack.h | 2 +
list-objects-filters.c | 361 ++++++++++++++++++++++
list-objects-filters.h | 45 +++
list-objects.c | 66 +++-
list-objects.h | 30 ++
object-filter.c | 201 ++++++++++++
object-filter.h | 145 +++++++++
oidset2.c | 101 ++++++
oidset2.h | 56 ++++
t/t6112-rev-list-filters-objects.sh | 37 +++
transport.c | 27 ++
transport.h | 8 +
upload-pack.c | 39 ++-
30 files changed, 1425 insertions(+), 24 deletions(-)
create mode 100644 list-objects-filters.c
create mode 100644 list-objects-filters.h
create mode 100644 object-filter.c
create mode 100644 object-filter.h
create mode 100644 oidset2.c
create mode 100644 oidset2.h
create mode 100644 t/t6112-rev-list-filters-objects.sh
--
2.9.3
next reply other threads:[~2017-07-13 17:35 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-07-13 17:34 Jeff Hostetler [this message]
2017-07-13 17:34 ` [PATCH v2 01/19] dir: refactor add_excludes() Jeff Hostetler
2017-07-13 17:34 ` [PATCH v2 02/19] oidset2: create oidset subclass with object length and pathname Jeff Hostetler
2017-07-13 17:34 ` [PATCH v2 03/19] list-objects: filter objects in traverse_commit_list Jeff Hostetler
2017-07-13 17:34 ` [PATCH v2 04/19] list-objects-filters: add omit-all-blobs filter Jeff Hostetler
2017-07-13 17:34 ` [PATCH v2 05/19] list-objects-filters: add omit-large-blobs filter Jeff Hostetler
2017-07-13 17:34 ` [PATCH v2 06/19] list-objects-filters: add use-sparse-checkout filter Jeff Hostetler
2017-07-13 17:34 ` [PATCH v2 07/19] object-filter: common declarations for object filtering Jeff Hostetler
2017-07-13 17:34 ` [PATCH v2 08/19] rev-list: add object filtering support Jeff Hostetler
2017-07-13 17:34 ` [PATCH v2 09/19] rev-list: add filtering help text Jeff Hostetler
2017-07-13 17:34 ` [PATCH v2 10/19] t6112: rev-list object filtering test Jeff Hostetler
2017-07-13 17:34 ` [PATCH v2 11/19] pack-objects: add object filtering support Jeff Hostetler
2017-07-13 17:34 ` [PATCH v2 12/19] pack-objects: add filtering help text Jeff Hostetler
2017-07-13 17:34 ` [PATCH v2 13/19] upload-pack: add filter-objects to protocol documentation Jeff Hostetler
2017-07-13 17:34 ` [PATCH v2 14/19] upload-pack: add object filtering Jeff Hostetler
2017-07-13 17:34 ` [PATCH v2 15/19] fetch-pack: add object filtering support Jeff Hostetler
2017-07-13 17:34 ` [PATCH v2 16/19] connected: add filter_allow_omitted option to API Jeff Hostetler
2017-07-13 17:34 ` [PATCH v2 17/19] clone: add filter arguments Jeff Hostetler
2017-07-13 17:34 ` [PATCH v2 18/19] index-pack: relax consistency checks for omitted objects Jeff Hostetler
2017-07-13 17:34 ` [PATCH v2 19/19] fetch: add object filtering to fetch Jeff Hostetler
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170713173459.3559-1-git@jeffhostetler.com \
--to=git@jeffhostetler.com \
--cc=ethomson@edwardthomson.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=jeffhost@microsoft.com \
--cc=jonathantanmy@google.com \
--cc=jrnieder@gmail.com \
--cc=peff@peff.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.