git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jeff King <peff@peff.net>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org, Eric Sunshine <sunshine@sunshineco.com>,
	Erik Faye-Lund <kusmabite@gmail.com>
Subject: Re: [PATCH v2 1/6] commit: provide a function to find a header in a buffer
Date: Wed, 27 Aug 2014 14:16:06 -0400	[thread overview]
Message-ID: <20140827181606.GA6590@peff.net> (raw)
In-Reply-To: <20140827180016.GA6269@peff.net>

On Wed, Aug 27, 2014 at 02:00:16PM -0400, Jeff King wrote:

> That may be something some callers want, but they should build it
> separately around this find_commit_header, so that callers that want a
> single line (like "encoding" or "author") do not have to pay the price.
> I didn't bother building it out here since there are no callers which
> want it yet (though I did not look at the mergetag code, which could
> possibly be converted).

I just peeked at the mergetag code. It is all built around
read_commit_extra_headers, which is a different approach (it is "copy
out non-standard things", not "find this one thing I am looking for").

The former is more efficient if we are looking for lots of things, since
we'd only have to parse once. But we don't use it that way (we parse the
whole thing and then see if we have any "mergetag" headers).

The most efficient and convenient thing IMHO would be a progressive
parser that keeps a partially-parsed state and advances the parser
on-demand. So if I ask it for header "foo", it would start at the
beginning and parse until it finds "foo", marking the location of
anything along the way. If I then ask for "bar", it would keep going
from the end of "bar", and so forth.

I do not know if that is even worth the effort, though. I do not think
commit-parsing is a major hotspot for most operations (it might be for a
traversal, but we already use a minimalistic parser there that only
grabs the items that "struct commit" cares about). And we already
zlib-inflate the whole commit object in the first place, so it's not
like we haven't touched all these bytes anyway[1].

-Peff

[1] A long time ago I experimented with having parse_commit do a partial
    inflate, just up to the empty-line delimiter. I don't have the
    numbers handy, but I recall that it did not make a measurable
    improvement in rev-list speeds.

  reply	other threads:[~2014-08-27 18:16 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-08-27  7:55 [PATCH v2 0/6] clean up author parsing Jeff King
2014-08-27  7:56 ` [PATCH v2 1/6] commit: provide a function to find a header in a buffer Jeff King
2014-08-27 17:30   ` Junio C Hamano
2014-08-27 18:00     ` Jeff King
2014-08-27 18:16       ` Jeff King [this message]
2014-08-27 19:05       ` Junio C Hamano
2014-08-27 19:14         ` Jeff King
2014-08-27 19:26           ` Junio C Hamano
2014-08-27 19:38             ` Jeff King
2014-08-27 19:41               ` Junio C Hamano
2014-08-27  7:56 ` [PATCH v2 2/6] record_author_info: fix memory leak on malformed commit Jeff King
2014-08-27  7:56 ` [PATCH v2 3/6] record_author_info: use find_commit_header Jeff King
2014-08-27  7:57 ` [PATCH v2 4/6] use strbufs in date functions Jeff King
2014-08-27  7:57 ` [PATCH v2 5/6] determine_author_info: reuse parsing functions Jeff King
2014-08-27  7:57 ` [PATCH v2 6/6] determine_author_info: copy getenv output Jeff King
2014-08-27  9:06 ` [PATCH v2 0/6] clean up author parsing Christian Couder
2014-08-27 14:18   ` Jeff King
2014-08-27 17:36 ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140827181606.GA6590@peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=kusmabite@gmail.com \
    --cc=sunshine@sunshineco.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).