git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jeff King <peff@peff.net>
To: Junio C Hamano <gitster@pobox.com>
Cc: Michael Haggerty <mhagger@alum.mit.edu>,
	Christian Couder <chriscool@tuxfamily.org>,
	git@vger.kernel.org
Subject: Re: [RFC/PATCH] replace: add --graft option
Date: Mon, 19 May 2014 13:35:05 -0400	[thread overview]
Message-ID: <20140519173505.GA28673@sigill.intra.peff.net> (raw)
In-Reply-To: <xmqqvbt11y15.fsf@gitster.dls.corp.google.com>

On Mon, May 19, 2014 at 10:25:10AM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > It might make sense to just teach parse_commit_header to be a little
> > more thorough; it has to read past those lines anyway to find the author
> > and committer lines, so it would not be much more expensive to note
> > them.  And then of course the code needs to be pulled out of the
> > pretty-printer and made generally accessible.
> 
> I notice that you said "might" above.

Yeah. I think having a reusable parser is definitely a good thing. I'm
not sure if it's worth the massive amount of refactoring that would be
required for this particular use case.

> > That's more or less what Christian's function is doing, though it
> > assumes things like that the parents come immediately after the tree,
> > and that they are all in a group. Those are all true for objects created
> > by git, but I think we can be a little flexible.
> 
> The headers up to committer are cast in stone in their ordering, and
> I do not immediately see how loosening it would be beneficial.
> 
> Unless you are trying to give users a new way to record exactly the
> same commit in twenty-four (or more) ways with their own object
> names, that is ;-)

Sorry, I didn't mean to imply that people can do what they want with
that ordering. Implementations that reorder the headers are stupid and
wrong, and should be fixed.

BUT. I really don't like making these assumptions or doing ad-hoc
parsing all through the code. We _do_ see quirky, wrong objects, and
want to handle them sanely and consistently. That's hard to do when
there are parsers sprinkled throughout the code which handle each case
slightly differently. I don't recall seeing this with header ordering,
but I know that broken ident lines have caused us headaches in the past,
and I'm happy that we've (mostly) settled on the code in
split_ident_line to handle this.

Things like:

  +       /* find existing parents */
  +       parent_start = buf.buf;
  +       parent_start += 46; /* "tree " + "hex sha1" + "\n" */
  +       parent_end = parent_start;

scare me. Is buf actually 46 bytes long? If not, we read past the end of
the buffer. What if it contains something besides "tree <sha1>\n" at the
beginning? We don't even notice!

The version I posted elsewhere in the thread at least operates on a
line-by-line basis, and tries to verify its assumptions (and barfs if
not). It's still doing its own parsing, but at least it's keeping its
assumptions on the object format to a minimum ("drop old parent lines",
"add new ones after tree, and barf if there's not exactly one tree
line").

-Peff

  reply	other threads:[~2014-05-19 17:35 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-05-18 18:29 [RFC/PATCH] replace: add --graft option Christian Couder
2014-05-19  9:42 ` Michael Haggerty
2014-05-19 11:19   ` Jeff King
2014-05-19 17:25     ` Junio C Hamano
2014-05-19 17:35       ` Jeff King [this message]
2014-05-19 18:34         ` Junio C Hamano
2014-05-23  6:39   ` Christian Couder
2014-05-19 11:21 ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140519173505.GA28673@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=chriscool@tuxfamily.org \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=mhagger@alum.mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).