All of lore.kernel.org
 help / color / mirror / Atom feed
From: David Kastrup <dak@gnu.org>
To: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Cc: git@vger.kernel.org, gitster@pobox.com
Subject: Re: [PATCH] blame.c: don't drop origin blobs as eagerly
Date: Fri, 27 May 2016 17:41:09 +0200	[thread overview]
Message-ID: <87d1o7pkyy.fsf@fencepost.gnu.org> (raw)
In-Reply-To: <alpine.DEB.2.20.1605271633230.4449@virtualbox> (Johannes Schindelin's message of "Fri, 27 May 2016 17:00:25 +0200 (CEST)")

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> On Fri, 27 May 2016, David Kastrup wrote:
>
>> pressure particularly when the history contains lots of merges from
>> long-diverged branches.  In practice, this optimization appears to
>> behave quite benignly,
>
> Why not just stop here?

Because there is a caveat.

> I say that because...
>
>> and a viable strategy for limiting the total amount of cached blobs in a
>> useful manner seems rather hard to implement.
>
> ... this sounds awfully handwaving.

Because it is.

> Since we already have reference counting, it sounds fishy to claim
> that simply piggybacking a global counter on top of it would be
> "rather hard".

You'll see that the patch is from 2014.  When I actively worked on it,
I found no convincing/feasible way to enforce a reasonable hard limit.
I am not picking up work on this again but am merely flushing my queue
so that the patch going to waste is not on my conscience.

>> In addition, calling git-blame with -C leads to similar memory retention
>> patterns.
>
> This is a red herring. Just delete it. I, for one, being a heavy user of
> `git blame`, could count the number of times I used blame's -C option
> without any remaining hands. Zero times.
>
> Besides, -C is *supposed* to look harder.

We are not talking about "looking harder" but "taking more memory than
the set limit".

> Also: is there an easy way to reproduce your claims of better I/O
> characteristics? Something like a command-line, ideally with a file in
> git.git's own history, that demonstrates the I/O before and after the
> patch, would be an excellent addition to the commit message.

I've used it on the wortliste repository and system time goes down from
about 70 seconds to 50 seconds (this is a flash drive).  User time from
about 4:20 to 4:00.  It is a rather degenerate repository (predominantly
small changes in one humongous text file) so savings for more typical
cases might end up less than that.  But then it is degenerate
repositories that are most costly to blame.

> Further: I would have at least expected some rudimentary discussion
> why this patch -- which seems to at least partially contradict 7c3c796
> (blame: drop blob data after passing blame to the parent, 2007-12-11)
> -- is not regressing on the intent of said commit.

It is regressing on the intent of said commit by _retaining_ blob data
that it is _sure_ to look at again because of already having this data
asked for again in the priority queue.  That's the point.  It only drops
the blob data for which it has no request queued yet.  But there is no
handle on when the request is going to pop up until it actually leaves
the priority queue: the priority queue is a heap IIRC and thus only
provides partial orderings.

>> diff --git a/builtin/blame.c b/builtin/blame.c
>> index 21f42b0..2596fbc 100644
>> --- a/builtin/blame.c
>> +++ b/builtin/blame.c
>> @@ -1556,7 +1556,8 @@ finish:
>>  	}
>>  	for (i = 0; i < num_sg; i++) {
>>  		if (sg_origin[i]) {
>> -			drop_origin_blob(sg_origin[i]);
>> +			if (!sg_origin[i]->suspects)
>> +				drop_origin_blob(sg_origin[i]);
>>  			origin_decref(sg_origin[i]);
>>  		}
>
> It would be good to mention in the commit message that this patch does not
> change anything for blobs with only one remaining reference (the current
> one) because origin_decref() would do the same job as drop_origin_blob
> when decrementing the reference counter to 0.

A sizable number of references but not blobs are retained and needed for
producing the information in the final printed output (at least when
using the default sequential output instead of --incremental or
--porcelaine or similar).

> In fact, I suspect that simply removing the drop_origin_blob() call
> might result in the exact same I/O pattern.

It's been years since I actually worked on the code but I'm still pretty
sure you are wrong.

-- 
David Kastrup

  reply	other threads:[~2016-05-27 15:42 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-05-27 13:35 [PATCH] blame.c: don't drop origin blobs as eagerly David Kastrup
2016-05-27 15:00 ` Johannes Schindelin
2016-05-27 15:41   ` David Kastrup [this message]
2016-05-28  6:37     ` Johannes Schindelin
2016-05-28  8:29       ` David Kastrup
2016-05-28 12:34         ` Johannes Schindelin
2016-05-28 14:00           ` David Kastrup
  -- strict thread matches above, loose matches on Subject: below --
2019-04-02 11:56 David Kastrup
2019-04-03  7:45 ` Junio C Hamano
2019-04-03  9:32   ` Duy Nguyen
2019-04-03 11:36     ` Jeff King
2019-04-03 12:06       ` Duy Nguyen
2019-04-03 12:19         ` Jeff King
2019-04-03 12:32           ` David Kastrup
2019-04-03 11:08   ` David Kastrup

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87d1o7pkyy.fsf@fencepost.gnu.org \
    --to=dak@gnu.org \
    --cc=Johannes.Schindelin@gmx.de \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.