git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: Jeff King <peff@peff.net>
Cc: "Jonathon Mah" <jmah@me.com>,
	"Jonathan Nieder" <jrnieder@gmail.com>,
	"Duy Nguyen" <pclouds@gmail.com>,
	"Stefan Näwe" <stefan.naewe@atlas-elektronik.com>,
	Armin <netzverweigerer@gmail.com>,
	"git@vger.kernel.org" <git@vger.kernel.org>
Subject: Re: [PATCH 0/3] lazily load commit->buffer
Date: Sat, 26 Jan 2013 13:26:53 -0800	[thread overview]
Message-ID: <7v8v7f1vqa.fsf@alter.siamese.dyndns.org> (raw)
In-Reply-To: <20130126094026.GA9646@sigill.intra.peff.net> (Jeff King's message of "Sat, 26 Jan 2013 04:40:28 -0500")

Jeff King <peff@peff.net> writes:

> Yeah, agreed. I started to fix this up with a use/unuse pattern and
> realized something: all of the call sites are calling logmsg_reencode
> anyway, because that is the next logical step in doing anything with the
> buffer that is not just parsing out the parent/timestamp/tree info. And
> since that function already might allocate (for the re-encoded copy),
> callers have to handle the maybe-borrowed-maybe-free situation already.
>
> So I came up with this patch series, which I think should fix the
> problem, and actually makes the call-sites easier to read, rather than
> harder.
>
>   [1/3]: commit: drop useless xstrdup of commit message
>   [2/3]: logmsg_reencode: never return NULL
>   [3/3]: logmsg_reencode: lazily load missing commit buffers
>
> Here's the diffstat:
>
>  builtin/blame.c                  | 22 ++-------
>  builtin/commit.c                 | 14 +-----
>  commit.h                         |  1 +
>  pretty.c                         | 93 ++++++++++++++++++++++++++---------
>  t/t4042-diff-textconv-caching.sh |  8 +++
>  5 files changed, 85 insertions(+), 53 deletions(-)
>
> Not too bad, and 27 of the lines added in pretty.c are new comments
> explaining the flow of logmsg_reencode. So even if this doesn't get
> every case, I think it's a nice cleanup.

This looks very good.

I wonder if this lets us get rid of the hack in cmd_log_walk() that
does this:

        while ((commit = get_revision(rev)) != NULL) {
                if (!log_tree_commit(rev, commit) &&
                    rev->max_count >= 0)
                        rev->max_count++;
!               if (!rev->reflog_info) {
!                       /* we allow cycles in reflog ancestry */
                        free(commit->buffer);
                        commit->buffer = NULL;
!               }
                free_commit_list(commit->parents);
                commit->parents = NULL;

After log_tree_commit() handles the commit, using the buffer, we
discard the memory associated to it because we know we no longer
will use it in normal cases.

The "do not do that if rev->reflog_info is true" was added in
a6c7306 (--walk-reflogs: do not crash with cyclic reflog ancestry,
2007-01-20) because the second and subsequent display of "commit"
(which happens to occur only when walking reflogs) needs to look at
commit->buffer again, and this hack forces us to retain the buffer
for _all_ commit objects.

But your patches could be seen as a different (and more correct) way
to fix the same issue.  Once the display side learns how to re-read
the log text of the commit object, the above becomes unnecessary, no?

We may still be helped if majority of commit objects that appear in
the reflog appear more than once, in which case retaining the buffer
for _all_ commits could be an overall win.  Not having to read the
buffer for the same commit each time it is shown for majority of
multiply-appearing commits, in exchange for having to keep the
buffer for commits that appears only once that are minority and
suffering increasted page cache pressure.  That could be seen as an
optimization.

But that is a performance thing, not a correctness issue, so "we
allow cycles" implying "therefore if we discard the buffer, we will
show wrong output" becomes an incorrect justification.

I happen to have HEAD reflog that is 30k entries long; more than 26k
represent a checkout of unique commit.  So I suspect that the above
hack to excessive retain commit->buffer for already used commits will
not help us performance-wise, either.

  parent reply	other threads:[~2013-01-26 21:27 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-01-23 14:38 segmentation fault (nullpointer) with git log --submodule -p Armin
2013-01-23 20:02 ` Jeff King
2013-01-24 12:11   ` Stefan Näwe
2013-01-24 13:40     ` Duy Nguyen
2013-01-24 14:06       ` Stefan Näwe
2013-01-24 14:14         ` Duy Nguyen
2013-01-24 23:27           ` Jeff King
2013-01-24 23:56             ` Junio C Hamano
2013-01-25  0:55               ` Jeff King
2013-01-25  2:05                 ` Duy Nguyen
2013-01-25  3:59               ` Junio C Hamano
2013-01-25  4:08                 ` Jeff King
2013-01-25  4:21                   ` Junio C Hamano
2013-01-25  5:53                 ` Jonathan Nieder
2013-01-25  7:27                   ` Junio C Hamano
2013-01-25  7:32                   ` Jonathon Mah
2013-01-25 15:36                     ` Junio C Hamano
2013-01-26  9:40                       ` [PATCH 0/3] lazily load commit->buffer Jeff King
2013-01-26  9:42                         ` [PATCH 1/3] commit: drop useless xstrdup of commit message Jeff King
2013-01-26  9:44                         ` [PATCH 2/3] logmsg_reencode: never return NULL Jeff King
2013-01-26  9:44                         ` [PATCH 3/3] logmsg_reencode: lazily load missing commit buffers Jeff King
2013-01-26 21:26                         ` Junio C Hamano [this message]
2013-01-26 22:14                           ` [PATCH 0/3] lazily load commit->buffer Jeff King
2013-01-27  5:32                             ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7v8v7f1vqa.fsf@alter.siamese.dyndns.org \
    --to=gitster@pobox.com \
    --cc=git@vger.kernel.org \
    --cc=jmah@me.com \
    --cc=jrnieder@gmail.com \
    --cc=netzverweigerer@gmail.com \
    --cc=pclouds@gmail.com \
    --cc=peff@peff.net \
    --cc=stefan.naewe@atlas-elektronik.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).