git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Johannes Schindelin <Johannes.Schindelin@gmx.de>
To: David Kastrup <dak@gnu.org>
Cc: git@vger.kernel.org, gitster@pobox.com
Subject: Re: [PATCH] blame.c: don't drop origin blobs as eagerly
Date: Fri, 27 May 2016 17:00:25 +0200 (CEST)	[thread overview]
Message-ID: <alpine.DEB.2.20.1605271633230.4449@virtualbox> (raw)
In-Reply-To: <1464356141-3797-1-git-send-email-dak@gnu.org>

Hi David,

it is good practice to Cc: the original author of the code in question, in
this case Junio. I guess he sees this anyway, but that is really just an
assumption.

On Fri, 27 May 2016, David Kastrup wrote:

> When a parent blob already has chunks queued up for blaming, dropping
> the blob at the end of one blame step will cause it to get reloaded
> right away, doubling the amount of I/O and unpacking when processing a
> linear history.

It is obvious from your commit message that you have studied the code
quite deeply. To make it easier for the reader (which might be your future
self), it is advisable to give at least a little bit of introduction, e.g.
what the "parent blob" is.

I would *guess* that it is the blob corresponding to the same path in the
parent of the current revision, but that should be spelled out explicitly.

> Keeping such parent blobs in memory seems like a reasonable
> optimization.  It's conceivable that this may incur additional memory

This sentence would be easier to read if "It's conceivable that" was
simply deleted.

> pressure particularly when the history contains lots of merges from
> long-diverged branches.  In practice, this optimization appears to
> behave quite benignly,

Why not just stop here? I say that because...

> and a viable strategy for limiting the total amount of cached blobs in a
> useful manner seems rather hard to implement.

... this sounds awfully handwaving. Since we already have reference
counting, it sounds fishy to claim that simply piggybacking a global
counter on top of it would be "rather hard".

> In addition, calling git-blame with -C leads to similar memory retention
> patterns.

This is a red herring. Just delete it. I, for one, being a heavy user of
`git blame`, could count the number of times I used blame's -C option
without any remaining hands. Zero times.

Besides, -C is *supposed* to look harder. By that argument, you could read
all blobs in rev-list even when the user did not specify --objects
"because --objects leads to similar memory retention patterns". So: let's
just forget about that statement.

The commit message is missing your sign-off.

Also: is there an easy way to reproduce your claims of better I/O
characteristics? Something like a command-line, ideally with a file in
git.git's own history, that demonstrates the I/O before and after the
patch, would be an excellent addition to the commit message.

Further: I would have at least expected some rudimentary discussion why
this patch -- which seems to at least partially contradict 7c3c796 (blame:
drop blob data after passing blame to the parent, 2007-12-11) -- is not
regressing on the intent of said commit.

> diff --git a/builtin/blame.c b/builtin/blame.c
> index 21f42b0..2596fbc 100644
> --- a/builtin/blame.c
> +++ b/builtin/blame.c
> @@ -1556,7 +1556,8 @@ finish:
>  	}
>  	for (i = 0; i < num_sg; i++) {
>  		if (sg_origin[i]) {
> -			drop_origin_blob(sg_origin[i]);
> +			if (!sg_origin[i]->suspects)
> +				drop_origin_blob(sg_origin[i]);
>  			origin_decref(sg_origin[i]);
>  		}

It would be good to mention in the commit message that this patch does not
change anything for blobs with only one remaining reference (the current
one) because origin_decref() would do the same job as drop_origin_blob
when decrementing the reference counter to 0.

In fact, I suspect that simply removing the drop_origin_blob() call might
result in the exact same I/O pattern.

Ciao,
Johannes

  reply	other threads:[~2016-05-27 15:00 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-05-27 13:35 [PATCH] blame.c: don't drop origin blobs as eagerly David Kastrup
2016-05-27 15:00 ` Johannes Schindelin [this message]
2016-05-27 15:41   ` David Kastrup
2016-05-28  6:37     ` Johannes Schindelin
2016-05-28  8:29       ` David Kastrup
2016-05-28 12:34         ` Johannes Schindelin
2016-05-28 14:00           ` David Kastrup
  -- strict thread matches above, loose matches on Subject: below --
2019-04-02 11:56 David Kastrup
2019-04-03  7:45 ` Junio C Hamano
2019-04-03  9:32   ` Duy Nguyen
2019-04-03 11:36     ` Jeff King
2019-04-03 12:06       ` Duy Nguyen
2019-04-03 12:19         ` Jeff King
2019-04-03 12:32           ` David Kastrup
2019-04-03 11:08   ` David Kastrup

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=alpine.DEB.2.20.1605271633230.4449@virtualbox \
    --to=johannes.schindelin@gmx.de \
    --cc=dak@gnu.org \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).