From: Jeff Hostetler <git@jeffhostetler.com>
To: git@vger.kernel.org
Cc: gitster@pobox.com, peff@peff.net, ethomson@edwardthomson.com,
	jonathantanmy@google.com, jrnieder@gmail.com,
	jeffhost@microsoft.com
Subject: [PATCH v2 00/19] WIP object filtering for partial clone
Date: Thu, 13 Jul 2017 17:34:40 +0000
Message-ID: <20170713173459.3559-1-git@jeffhostetler.com>

From: Jeff Hostetler <jeffhost@microsoft.com>

This WIP is a follow-up to my earlier patch series to teach
pack-objects to omit large blobs from packfiles. [1]

Like the previous version, this version builds upon a suggestion from
Peff [2] to use the traverse_commit_list() machinery to allow custom
object filtering using a filter callback.  This hides the filtering
logic in list-objects.c and list-objects-filters.c and minimizes the
changes to actual commands, such as pack-objects.
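
As an illustration of the callback idea only, here is a minimal Python
sketch. It is not the C code in this series: the names Obj, Decision,
traverse and omit_all_blobs are hypothetical, and the real
traverse_commit_list() walks the commit graph rather than a flat list.
The point it shows is that the traversal asks a filter callback about
each object, so the calling command needs no filtering logic of its own.

```python
from enum import Enum
from typing import Callable, Iterable, List, NamedTuple


class Obj(NamedTuple):
    """Hypothetical stand-in for an object seen during traversal."""
    oid: str
    kind: str   # "commit", "tree", or "blob"
    path: str
    size: int


class Decision(Enum):
    SHOW = "show"
    OMIT = "omit"


FilterFn = Callable[[Obj], Decision]


def traverse(objects: Iterable[Obj],
             filter_fn: FilterFn,
             show: Callable[[Obj], None]) -> List[Obj]:
    """Call show() for each object the filter accepts; return the rest."""
    omitted = []
    for obj in objects:
        if filter_fn(obj) is Decision.SHOW:
            show(obj)
        else:
            omitted.append(obj)
    return omitted


def omit_all_blobs(obj: Obj) -> Decision:
    """Filter callback: keep commits and trees, omit every blob."""
    return Decision.OMIT if obj.kind == "blob" else Decision.SHOW
```

A caller such as rev-list or pack-objects would only pick the callback
and supply its own show() function.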

This version adds that same filtering capability to rev-list allowing
filtering to be demonstrated without building a packfile.  Filtered
blobs are printed with a leading "~" (along with their sizes).

    $ ./git rev-list --objects HEAD~1..HEAD
    74f806c70507317b8bdbcf3b08459c7c83906bee
    818617707aac81ae4620239182b514f65638e37e 
    d21329bffeb9801682d8d6d6acedc2958d17f4e0 builtin
    306c16551e548ace12c709a332bfea22adcc395f builtin/fetch.c

    $ ./git rev-list --objects --filter-omit-all-blobs --filter-print-manifest HEAD~1..HEAD
    74f806c70507317b8bdbcf3b08459c7c83906bee
    818617707aac81ae4620239182b514f65638e37e 
    d21329bffeb9801682d8d6d6acedc2958d17f4e0 builtin
    ~306c16551e548ace12c709a332bfea22adcc395f 40732

    $ ./git rev-list --objects --filter-omit-all-blobs --filter-print-manifest --quiet HEAD~1..HEAD
    ~306c16551e548ace12c709a332bfea22adcc395f 40732
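
A script consuming this output could separate the two kinds of lines
roughly as follows. This is a sketch that assumes the format is exactly
what the sample above shows ("<oid> [<path>]" for listed objects,
"~<oid> <size>" for omitted blobs). The function name
parse_filtered_rev_list is hypothetical, and the format may change
while this is WIP.

```python
from typing import Dict, List, Tuple


def parse_filtered_rev_list(output: str) -> Tuple[List[Tuple[str, str]], Dict[str, int]]:
    """Split rev-list output into listed objects and omitted blobs.

    Returns (shown, omitted): shown is a list of (oid, path) pairs with
    an empty path for commits and the root tree; omitted maps the oid
    of each "~"-prefixed blob to its size in bytes.
    """
    shown: List[Tuple[str, str]] = []
    omitted: Dict[str, int] = {}
    for raw in output.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("~"):
            # Omitted blob: "~<oid> <size>"
            oid, size = line[1:].split()
            omitted[oid] = int(size)
        else:
            # Listed object: "<oid>" or "<oid> <path>"
            oid, _, path = line.partition(" ")
            shown.append((oid, path))
    return shown, omitted
```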

This version contains 3 filters:
1. filter-omit-all-blobs to exclude all blobs (trees and commits only).

2. filter-omit-large-blobs=<n>[kmg] to exclude blobs larger than <n>
   (but always including ".git*" special files).

3. filter-use-sparse=<blob-ish> to exclude blobs not needed by the
   corresponding sparse-checkout.
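
The decision made by the second filter can be sketched in Python as
below. Several details are assumptions not stated above: the k/m/g
suffixes are taken as powers of 1024, "larger than" is read as a strict
comparison, and ".git*" is matched against the file's basename. The
names parse_size and omit_large_blob are hypothetical and do not
correspond to functions in the series.

```python
import fnmatch
import posixpath

# Assumed multipliers for the optional k/m/g suffix.
_UNITS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_size(text: str) -> int:
    """Parse "<n>[kmg]" into a byte count, e.g. "10k" -> 10240."""
    text = text.strip().lower()
    if text and text[-1] in _UNITS:
        return int(text[:-1]) * _UNITS[text[-1]]
    return int(text)


def omit_large_blob(path: str, size: int, limit: int) -> bool:
    """Return True if a blob at `path` of `size` bytes should be omitted."""
    # Special ".git*" files (.gitignore, .gitattributes, ...) are
    # always included, whatever their size.
    if fnmatch.fnmatchcase(posixpath.basename(path), ".git*"):
        return False
    return size > limit
```

With a limit of "10k", the 40732-byte builtin/fetch.c from the example
above would be omitted, while a .gitattributes file of any size would
still be sent.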

Sparse-checkout filtering is currently limited to filtering unneeded blobs.
A later enhancement should be able to also filter unneeded tree objects.

This version updates clone, fetch, fetch-pack, and upload-pack commands
to pass the additional object-filter parameters.

As a (possibly) temporary measure, some commands have been updated to
relax missing blob errors during consistency checks.  Maintaining info
on missing blobs is currently being discussed in [3].

TODO
1. Incorporate with a patch series like [4] to dynamically fetch a
   missing blob from the server in read_object on demand.
2. Resolve missing blob consistency check issue.
3. Store filter options from clone in config or .git/info and default
   to them in subsequent fetches.
4. fsck, gc, and assorted commands.
5. testing.


[1] https://public-inbox.org/git/20170622203615.34135-1-git@jeffhostetler.com/
[2] https://public-inbox.org/git/20170309073117.g3br5btsfwntcdpe@sigill.intra.peff.net/
[3] https://public-inbox.org/git/cover.1499800530.git.jonathantanmy@google.com/
[4] https://public-inbox.org/git/20170505152802.6724-1-benpeart@microsoft.com/


Jeff Hostetler (19):
  dir: refactor add_excludes()
  oidset2: create oidset subclass with object length and pathname
  list-objects: filter objects in traverse_commit_list
  list-objects-filters: add omit-all-blobs filter
  list-objects-filters: add omit-large-blobs filter
  list-objects-filters: add use-sparse-checkout filter
  object-filter: common declarations for object filtering
  rev-list: add object filtering support
  rev-list: add filtering help text
  t6112: rev-list object filtering test
  pack-objects: add object filtering support
  pack-objects: add filtering help text
  upload-pack: add filter-objects to protocol documentation
  upload-pack: add object filtering
  fetch-pack: add object filtering support
  connected: add filter_allow_omitted option to API
  clone: add filter arguments
  index-pack: relax consistency checks for omitted objects
  fetch: add object filtering to fetch

 Documentation/git-pack-objects.txt                |  14 +
 Documentation/git-rev-list.txt                    |   7 +-
 Documentation/rev-list-options.txt                |  26 ++
 Documentation/technical/pack-protocol.txt         |  16 +
 Documentation/technical/protocol-capabilities.txt |   7 +
 Makefile                                          |   3 +
 builtin/clone.c                                   |  28 ++
 builtin/fetch-pack.c                              |   3 +
 builtin/fetch.c                                   |  27 +-
 builtin/index-pack.c                              |  15 +
 builtin/pack-objects.c                            |  33 +-
 builtin/rev-list.c                                |  58 +++-
 connected.c                                       |   3 +
 connected.h                                       |   6 +
 dir.c                                             |  53 +++-
 dir.h                                             |   4 +
 fetch-pack.c                                      |  28 ++
 fetch-pack.h                                      |   2 +
 list-objects-filters.c                            | 361 ++++++++++++++++++++++
 list-objects-filters.h                            |  45 +++
 list-objects.c                                    |  66 +++-
 list-objects.h                                    |  30 ++
 object-filter.c                                   | 201 ++++++++++++
 object-filter.h                                   | 145 +++++++++
 oidset2.c                                         | 101 ++++++
 oidset2.h                                         |  56 ++++
 t/t6112-rev-list-filters-objects.sh               |  37 +++
 transport.c                                       |  27 ++
 transport.h                                       |   8 +
 upload-pack.c                                     |  39 ++-
 30 files changed, 1425 insertions(+), 24 deletions(-)
 create mode 100644 list-objects-filters.c
 create mode 100644 list-objects-filters.h
 create mode 100644 object-filter.c
 create mode 100644 object-filter.h
 create mode 100644 oidset2.c
 create mode 100644 oidset2.h
 create mode 100644 t/t6112-rev-list-filters-objects.sh

-- 
2.9.3

