Re: Rebase max-pack-size?

git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: "Dana How" <danahow@gmail.com>
To: "Junio C Hamano" <junkio@cox.net>
Cc: "Git Mailing List" <git@vger.kernel.org>, danahow@gmail.com
Subject: Re: Rebase max-pack-size?
Date: Tue, 22 May 2007 23:33:27 -0700	[thread overview]
Message-ID: <56b7f5510705222333l6e285e67l1dc327322a2ab250@mail.gmail.com> (raw)
In-Reply-To: <7virak5cr5.fsf@assigned-by-dhcp.cox.net>

On 5/22/07, Junio C Hamano <junkio@cox.net> wrote:
> Dana How <danahow@gmail.com> writes:
> ----------------------------------------------------------------
> > Subject: [PATCH] Alter sha1close() 3rd argument to request flush only
> >
> > update=0 suppressed writing the final SHA-1 but was not used.
> > Now final=0 suppresses SHA-1 finalization, SHA-1 writing,
> > and closing -- in other words,  only flush the buffer.
>
> What it does is understandable but it somehow feels funny that
> "sha1close(file, hashresult, 0)" does _not_ close it (and does
> not hash either for obvious reasons ;-).  I would say we should
> let it pass this round, but might want to separate the first
> part out into a separate "update hash and flush" function if we
> get more callers.
OK,  see the list at the end.

> ----------------------------------------------------------------
> > Subject: [PATCH 1/4] git-repack --max-pack-size: new file statics and code restructuring
> >
> > Add "pack_size_limit", the limit specified by --max-pack-size,
> > "written_list", the list of objects written to the current pack,
> > and "nr_written", the number of objects in written_list.
> > Put "base_name" at file scope again and add forward declarations.
> > Move write_index_file() call from cnd_pack_objects() to
> > write_pack_file() since only the latter will know how
> > many times to call write_index_file().
>
> I would have split this part a bit differently.
>
> This is mostly about restructuring the code so that
> write_index_file() is called from write_pack_file(), which by
> itself is a very good change (but then we might have been better
> off passing basename as a parameter).
That would have worked too.  I made base_name file scope
b/c that's how it was before NP's immediatley previous patches.

> You are not using written_list nor limit yet but are introducing
> them in this step, which feels not quite right.  I usually compile stuff
> with -Werror, and if I ever have to bisect the series, this would bomb
> out for these unused variables.  Not nice.
Very good point -- I did not think of that.
I will make sure in the future
each patch in a patchset has no warnings.

> ----------------------------------------------------------------
> > Subject: [PATCH 2/4] git-repack --max-pack-size: write_{object,one}() respect pack limit
> >
> > With --max-pack-size,  generate the appropriate write limit
> > for each object and check against it before each group of writes.
> > Update delta usability rules to handle base being in a previously-
> > written pack.  Inline sha1write_compress() so we know the
> > exact size of the written data when it needs to be compressed.
> > Detect and return write "failure".
>
> Again, this is split somewhat wrongly, as you do not have a way
> to set max size from the caller, but more importantly, even
> though write_one and write_object knows to obey the limit
> (perhaps somebody bisecting this series later may set the limit
> under the debugger, to work around the lack of option parser),
> write_pack_file() does not notice zero return from write_one();
> I would have added a check there to do:
>
>         die("sorry, limit reached and we do not have code to split the pack yet.")
>
> which obviously can be updated in the next patch.
This makes sense to me.  In a previous version of the patchset,
I had some temporary code like you say (just some extra
arguments, not any checking or die() calls) which was
immediately changed/replaced in the next patch.
Shawn or Nicolas didn't like that,
so I migrated to the split I used here.

I didn't really intend bisecting this patchset to do
anything useful *with respect to max-pack-size functionality*,
but bisecting it would be useful to detect if these patches
had broken something else.  But I can rethink that in
the future.

> ----------------------------------------------------------------
> > Subject: [PATCH 3/4] git-repack --max-pack-size: split packs as asked by write_{object,one}()
> >
> > Rewrite write_pack_file() to break to a new packfile
> > whenever write_object/write_one request it,  and
> > correct the header's object count in the previous packfile.
> > Change write_index_file() to write an index
> > for just the objects in the most recent packfile.
> >
> > diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c
> > --- a/builtin-pack-objects.c
> > +++ b/builtin-pack-objects.c
> > @@ -652,7 +663,26 @@ static void write_pack_file(void)
> > +     } while (nr_remaining && i < nr_objects);
> > +
> > +     for (j = 0; i < nr_objects; i++) {
> > +             struct object_entry *e = objects + i;
> > +             j += !e->offset && !e->preferred_base;
> > +     }
> > +     if (j)
> > +             die("wrote %u objects as expected but %u unwritten", written, j);
> I am a bit confused by this loop.  Don't you have to start with i=0
> for this check to be meaningful?
The previous do-while loop whose last line is shown
could end with i < nr_objects due to nr_remaining becoming
0.  This means the objects [i ... nr_objects) have not been
inspected.  They should all be either already written
or non-writable,  which is what the two terms in the && expression
are testing.  See list at end.

> ----------------------------------------------------------------
> > Subject: [PATCH 4/4] git-repack --max-pack-size: add option parsing to enable feature
> >
> > Add --max-pack-size parsing and usage messages.
> > Upgrade git-repack.sh to handle multiple packfile names,
> > and build packfiles in GIT_OBJECT_DIRECTORY not GIT_DIR.
> > Update documentation.
> >
> > diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c
> > --- a/builtin-pack-objects.c
> > +++ b/builtin-pack-objects.c
> > @@ -1713,6 +1713,13 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
> > +             if (!prefixcmp(arg, "--max-pack-size=")) {
> > +                     char *end;
> > +                     pack_size_limit = strtoul(arg+16, &end, 0) * 1024 * 1024;
> > +                     if (!arg[16] || *end)
> > +                             usage(pack_usage);
> > +                     continue;
> > +             }
>
> Hmmm.  I was almost going to suggest to have this spelled in
> bytes, with suffixes like k/m/g.  However, because wanting to
> limit a pack under 1.4MB does not make much sense these days,
> and because having to spell "up to 2GB" as 2047 is not too much
> trouble, I think this is Ok.
I thought about suffixes too but I chose this way
b/c that's how git fast-import does it.

> Shouldn't we have a safety to error out when --stdout and
> --max-pack-size are both given?  Currently it silently ignores
> the limit, doesn't it?
Yes & yes.  Previously there was disagreement on this point.
One preference was to disallow the combination,
another was that it was useful and should be supported.
So I left the code in a state where a small follow-on patch
could make the decision.  The current behavior is not
to disallow the combination,  and it is well-defined
(you get a sequence of packs on stdout,  concatenated,
whose headers indicate how many objects there are
in the current pack and all following).

I do not think --stdout && --max-pack-size is currently
useful,  and a follow-on patch should complain.
See list at end.

> > diff --git a/git-repack.sh b/git-repack.sh
> > --- a/git-repack.sh
> > +++ b/git-repack.sh
> > @@ -35,7 +36,7 @@ true)
> > -PACKTMP="$GIT_DIR/.tmp-$$-pack"
> > +PACKTMP="$GIT_OBJECT_DIRECTORY/.tmp-$$-pack"
>
> Although this is a good change, this hunk does not belong to
> this.
OK.
If you decide not to keep this change here for historical clarity,
I can add it back later.

> Overall everything looks good, except some minor details noted
> above.  Separation of the commits into logical steps does not
> need to be fixed up (they are already in 'next'), but follow-up
> patches might be needed.
OK,  I will not edit and re-submit these patches.
But I will submit two follow-on patches:

Patch 1:
* Pull "sha1flush()" or similar out of sha1close() inside csum-file.c.
  This will require some edits to callers.

Patch 2:
* Add a comment to "confusing loop" to explain what it's checking.
* Complain about --stdout && --max-pack-size combination.

This will have to happen tomorrow.

> And I have to agree with Linus; responding this way was more
> cumbersome than it should have been.
Understood.

Thanks,
-- 
Dana L. How  danahow@gmail.com  +1 650 804 5991 cell

next prev parent reply	other threads:[~2007-05-23  6:33 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-05-14 15:47 Rebase max-pack-size? Dana How
2007-05-23  5:29 ` Junio C Hamano
2007-05-23  6:33   ` Dana How [this message]
2007-05-23  7:23     ` Junio C Hamano
     [not found] <56b7f5510705121325h65c62147h2b633fdddaece0be@mail.gmail.com>
2007-05-12 22:57 ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=56b7f5510705222333l6e285e67l1dc327322a2ab250@mail.gmail.com \
    --to=danahow@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=junkio@cox.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).