From: Junio C Hamano <gitster@pobox.com>
To: Josh Steadmon <steadmon@google.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH] clone, submodule: pass partial clone filters to submodules
Date: Fri, 21 Jan 2022 17:49:01 -0800
Message-ID: <xmqqsftgbkvm.fsf@gitster.g>
In-Reply-To: <50ebf7bd39adf34fa4ada27cd433d81b5381abe5.1642735881.git.steadmon@google.com> (Josh Steadmon's message of "Thu, 20 Jan 2022 19:32:48 -0800")

Josh Steadmon <steadmon@google.com> writes:

> When cloning a repo with a --filter and with --recurse-submodules
> enabled, the partial clone filter only applies to
> the top-level repo. This can lead to unexpected bandwidth and disk
> usage for projects which include large submodules. For example, a user
> might wish to make a partial clone of Gerrit and would run:
> `git clone --recurse-submodules --filter=blob:limit=5k
> https://gerrit.googlesource.com/gerrit`. However, only the superproject
> would be a partial clone; all the submodules would have all blobs
> downloaded regardless of their size. With this change, the same filter
> applies to submodules, meaning the expected bandwidth and disk savings
> apply consistently.
>
> Plumb the --filter argument from git-clone through git-submodule and
> git-submodule--helper, such that submodule clones also have the filter
> applied.
>
> This applies the same filter to the superproject and all submodules.
> Users who prefer the current behavior (i.e., a filter only on the
> superproject) would need to clone with `--no-recurse-submodules` and
> then manually initialize each submodule.

Two concerns (I do not say "issues", because I honestly do not know
how much this will hurt in the future).

 - Obviously, this changes the end user experience.  For users in the
   scenario that motivated this change (described above), it is a
   change for the better, but I wonder if there are workflows that
   are hurt and would actually have to resort to the workaround to
   preserve the current behaviour.

 - Passing the filter down to submodules means that the filter
   settings are universal across projects.  With the current set of
   filters, I do not think such an assumption is too bad.  If a 5k
   blob is too large for the top-level superproject, it is OK for
   the superproject to dictate that a 5k blob is too large for any
   of the submodules the superproject uses.  But can we forever limit
   the filter vocabulary to the ones that can sensibly be applied
   recursively?  If we had a filter that goes with pathnames
   (e.g. "I only want src/ and test/ directories and nothing else
   initially"), such a set of pathnames appropriate in the context
   of the superproject is unlikely to apply to its submodules.  Even
   the existing "depth" filter is iffy, if a toplevel superproject
   is fairly flat and one of the submodules has a directory
   hierarchy that is ultra deep.
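
To make the second concern concrete, here is a rough toy model in
Python.  It is only a sketch under stated assumptions: the Repo
class, the helpers keep_by_prefix() and keep_by_depth(), and the
sample paths are all made up for illustration, and neither helper
reproduces how Git actually evaluates a filter spec.  It merely
shows that a size-independent rule chosen with the superproject's
layout in mind can keep almost nothing useful in a submodule that
is laid out differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Repo:
    """Toy stand-in for a repository: a name and its tracked paths."""
    name: str
    paths: tuple


def keep_by_prefix(repo, prefixes):
    """Hypothetical path filter: keep paths under any given prefix."""
    return [p for p in repo.paths if p.startswith(tuple(prefixes))]


def keep_by_depth(repo, max_depth):
    """Toy depth limit: keep paths with at most max_depth components.

    This is NOT the semantics of any real Git filter; it only models
    "do not descend further than N levels".
    """
    return [p for p in repo.paths if len(p.split("/")) <= max_depth]


# Made-up layouts: a flat superproject and a deeply nested submodule.
superproject = Repo(
    "super",
    ("README", "src/main.c", "test/t0001.sh"),
)
submodule = Repo(
    "deep-lib",
    ("LICENSE", "lib/core/io/unix/read.c", "lib/core/io/unix/write.c"),
)

# Apply the same two rules to both repositories.
for repo in (superproject, submodule):
    by_path = keep_by_prefix(repo, ["src/", "test/"])
    by_depth = keep_by_depth(repo, 2)
    print(f"{repo.name}: paths kept {by_path}, depth<=2 kept {by_depth}")
```

In this model the superproject keeps its sources under both rules,
while the submodule keeps nothing under the path rule and only its
LICENSE file under the depth rule, which is the kind of mismatch
described above.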

Will queue and wait for others to comment.

Thanks.

Thread overview: 14+ messages
2022-01-21  3:32 [PATCH] clone, submodule: pass partial clone filters to submodules Josh Steadmon
2022-01-22  1:49 ` Junio C Hamano [this message]
2022-01-25 21:00   ` Elijah Newren
2022-01-26  6:03     ` Junio C Hamano
2022-02-01 21:33       ` Josh Steadmon
2022-01-25 21:08 ` Elijah Newren
2022-02-01 21:34   ` Josh Steadmon
2022-02-05  0:40 ` [PATCH v2] " Josh Steadmon
2022-02-05  0:54   ` Josh Steadmon
2022-02-05  1:00     ` Josh Steadmon
2022-02-05  5:00 ` [PATCH v3] " Josh Steadmon
2022-02-09 22:44   ` Jonathan Tan
2022-02-09 23:37     ` Junio C Hamano
2022-02-19 17:30   ` [PATCH] t5617,t7814: remove unnecessary 'uploadpack.allowanysha1inwant' Philippe Blain
