From: Ian Campbell <ijc@hellion.org.uk>
To: gitster@pobox.com
Cc: git@vger.kernel.org
Subject: Re: [PATCH v2 4/4] Subject: filter-branch: stash away ref map in a branch
Date: Sun, 17 Sep 2017 10:43:23 +0100
Message-ID: <1505641403.22447.6.camel@hellion.org.uk>
In-Reply-To: <20170917073657.31193-4-ijc@hellion.org.uk>
On Sun, 2017-09-17 at 08:36 +0100, Ian Campbell wrote:
> +if test -n "$state_branch"
> +then
> +	echo "Saving rewrite state to $state_branch" 1>&2
> +	state_blob=$(
> +		perl -e'opendir D, "../map" or die;
> +			open H, "|-", "git hash-object -w --stdin" or die;
> +			foreach (sort readdir(D)) {
> +				next if m/^\.\.?$/;
> +				open F, "<../map/$_" or die;
> +				chomp($f = <F>);
> +				print H "$_:$f\n" or die;
> +			}
> +			close(H) or die;' || die "Unable to save state")

One thing I've noticed is that for a full Linux tree history the
filter.map file is 50M+, which causes GitHub to complain:

    remote: warning: File filter.map is 54.40 MB; this is larger than GitHub's recommended maximum file size of 50.00 MB

(You can simulate this with `git log --pretty=format:"%H:%H"
upstream/master`.) I suppose that's not a bad recommendation for any
infra, not just GH's.
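
For a quick sanity check of that size, here is a back-of-the-envelope
sketch. It assumes 40-character SHA-1 names, so each "old:new" line written
by the perl snippet above is 40 + 1 + 40 + 1 = 82 bytes. The function name
estimate_map_bytes is made up for illustration.

```shell
# Estimate the size in bytes of a filter.map-style blob holding COUNT entries.
# Assumes SHA-1 object names: "<40 hex>:<40 hex>\n" is 82 bytes per line.
estimate_map_bytes () {
	echo $(( $1 * 82 ))
}

# Hypothetical usage inside a repository (needs a ref named upstream/master):
#   estimate_map_bytes "$(git rev-list --count upstream/master)"
```

The blob grows linearly with the number of rewritten commits, so any fixed
size limit will eventually be hit by a large enough history.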
The blob is compressed in the object store so there isn't _much_ point
in compressing the map (also, it only goes down to ~30MB anyway so we
aren't buying all that much time), but I'm wondering if perhaps I
should look into a more intelligent representation, perhaps hashed by
the first two characters (as .git/objects is) to divide into several
blobs and have two levels.
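
A minimal sketch of what such a two-level split could look like, under the
assumption that the map is available as "old:new" lines on stdin. The
function name shard_map and the on-disk layout are hypothetical, not part of
the patch.

```shell
# shard_map OUTDIR: read "old:new" lines on stdin and append each entry to
# OUTDIR/<first two characters of old>, storing only the remaining
# characters of the old name, much as .git/objects does.
shard_map () {
	outdir=$1
	mkdir -p "$outdir" || return 1
	while IFS=: read -r old new
	do
		rest=${old#??}          # everything after the first two characters
		prefix=${old%"$rest"}   # the first two characters
		printf '%s:%s\n' "$rest" "$new" >>"$outdir/$prefix" || return 1
	done
}
```

With hex object names this gives at most 256 shards, so a 54 MB map would
come out at roughly 200 kB per shard. Each shard could then be written with
`git hash-object -w` and collected into a tree, though that part is not
shown. Reopening the output file for every line is slow; this is only meant
to illustrate the layout.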

I'm also wondering if the .git-rewrite/map directory, which will have
70k+ (and growing) directory entries for a modern Linux tree, would
benefit from the same sort of thing. OTOH in this case the extra shell
machinations to turn abcdef123 into ab/cdef123 might overwhelm the
savings in directory lookup time (unless there is a helper already for
that). That assumes that directory lookup is even a bottleneck. I've not
measured, but anecdotally/gut-feeling the commits-per-second does seem
to be decreasing over the course of the filtering process.

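
On the cost of those shell machinations: the split can be done with POSIX
parameter expansion alone, which avoids spawning a process per lookup. A
sketch, where the function name split_sha and the variable names it sets are
made up for illustration:

```shell
# split_sha NAME: set $sharded to NAME with a "/" after its first two
# characters (abcdef123 -> ab/cdef123). Uses only parameter expansion and
# returns the result in a variable, so no fork or command substitution is
# needed in the caller.
split_sha () {
	sha_rest=${1#??}            # strip the first two characters
	sha_dir=${1%"$sha_rest"}    # what was stripped: the first two characters
	sharded=$sha_dir/$sha_rest
}

# Hypothetical usage against a sharded map directory:
#   split_sha "$commit" && test -r "../map/$sharded"
```

Whether this actually beats a single flat directory would still need
measuring.
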
Ian.