git.vger.kernel.org archive mirror
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Ulrich Windl <Ulrich.Windl@rz.uni-regensburg.de>
Cc: peff@peff.net, git@vger.kernel.org
Subject: Re: Antw: Re: non-smooth progress  indication for git fsck and git gc
Date: Mon, 20 Aug 2018 10:57:13 +0200
Message-ID: <87woslpg9i.fsf@evledraar.gmail.com>
In-Reply-To: <5B7A7CDC020000A10002CDCF@gwsmtp1.uni-regensburg.de>


On Mon, Aug 20 2018, Ulrich Windl wrote:

>>>> Jeff King <peff@peff.net> wrote on 16.08.2018 at 22:55 in message
> <20180816205556.GA8257@sigill.intra.peff.net>:
>> On Thu, Aug 16, 2018 at 10:35:53PM +0200, Ævar Arnfjörð Bjarmason wrote:
>>
>>> This is all interesting, but I think unrelated to what Ulrich is talking
>>> about. Quote:
>>>
>>>     Between the two phases of "git fsck" (checking directories and
>>>     checking objects) there was a break of several seconds where no
>>>     progress was indicated
>>>
>>> I.e. it's not about the pause you get with your testcase (which is
>>> certainly another issue) but the break between the two progress bars.
>>
>> I think he's talking about both. What I said responds to this:
>
> Hi guys!
>
> Yes, I was wondering what git does between the two visible phases, and between
> the lines I was suggesting another progress message between those phases. At
> least the maximally unspecific three-dot message "Thinking..." could be displayed
> ;-) Of course anything more appropriate would be welcome.
> Also that message should only be displayed if it's foreseeable that the
> operation will take significant time. In my case (I just repeated it a few
> minutes ago) the delay is significant (at least 10 seconds). As noted earlier I
> was hoping to capture the timing in a screencast, but it seems all the delays
> were just optimized away in the recording.
>
>>
>>> >> During "git gc" the writing objects phase did not update for some
>>> >> seconds, but then the percentage counter jumped like from 15% to 42%.
>>
>> But yeah, I missed that the fsck thing was specifically about a break
>> between two meters. That's a separate problem, but also worth
>> discussing (and hopefully much easier to address).
>>
>>> If you fsck this repository it'll take around (on my spinning rust
>>> server) 30 seconds between 100% of "Checking object directories" before
>>> you get any output from "Checking objects".
>>>
>>> The breakdown of that is (this is from approximate eyeballing):
>>>
>>>  * We spend 1-3 seconds just on this:
>>>
>>> https://github.com/git/git/blob/63749b2dea5d1501ff85bab7b8a7f64911d21dea/pack-check.c#L181
>>
>> OK, so that's checking the sha1 over the .idx file. We could put a meter
>> on that. I wouldn't expect it to generally be all that slow outside of
>> pathological cases, since it scales with the number of objects (and 1s
>> is our minimum update anyway, so that might be OK as-is). Your case has
>> 13M objects, which is quite large.
>
> Sometimes an oldish CPU could bring performance surprises, maybe. Anyway my
> CPU in question is an AMD Phenom2 quad-core with 3.2GHz nominal, and there is a
> classic spinning disk with 5400RPM built in...
>
>>
>>>  * We spend the majority of the ~30s on this:
>>>
>>> https://github.com/git/git/blob/63749b2dea5d1501ff85bab7b8a7f64911d21dea/pack-check.c#L70-L79
>>
>> This is hashing the actual packfile. This is potentially quite long,
>> especially if you have a ton of big objects.
>
> That seems to apply. BTW: Is there a way to get some repository statistics
> like a histogram of object sizes (or whatever else might be useful to help
> make decisions)?

The git-sizer program is really helpful in this regard:
https://github.com/github/git-sizer

>>
>> I wonder if we need to do this as a separate step anyway, though. Our
>> verification is based on index-pack these days, which means it's going
>> to walk over the whole content as part of the "Indexing objects" step to
>> expand base objects and mark deltas for later. Could we feed this hash
>> as part of that walk over the data? It's not going to save us 30s, but
>> it's likely to be more efficient. And it would fold the effort naturally
>> into the existing progress meter.
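As a rough illustration of putting a meter on the packfile hashing step discussed above, here is a minimal sketch that hashes a stream in fixed-size chunks and reports the running byte count once per chunk. Everything in it is an assumption made for illustration: hash_stream(), progress_fn, store_bytes_done() and CHUNK_SIZE are hypothetical names, not anything in git, and FNV-1a stands in for SHA-1 only so that the example needs no crypto library.

```c
#include <stdint.h>
#include <stdio.h>

#define CHUNK_SIZE 8192
#define FNV1A_INIT 0xcbf29ce484222325ULL /* 64-bit FNV offset basis */

/* Stand-in hash (FNV-1a, 64-bit); a real check would use SHA-1 here. */
uint64_t fnv1a_update(uint64_t h, const unsigned char *buf, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++) {
        h ^= buf[i];
        h *= 0x100000001b3ULL; /* 64-bit FNV prime */
    }
    return h;
}

/* Hypothetical callback type: receives the number of bytes hashed so far. */
typedef void (*progress_fn)(uint64_t done, void *data);

/* Example callback that only records the latest byte count. */
void store_bytes_done(uint64_t done, void *data)
{
    *(uint64_t *)data = done;
}

/*
 * Hash everything readable from `in`, calling `report` once per chunk
 * (not once per byte) so the meter adds little overhead.
 */
uint64_t hash_stream(FILE *in, progress_fn report, void *data)
{
    unsigned char buf[CHUNK_SIZE];
    uint64_t h = FNV1A_INIT;
    uint64_t done = 0;
    size_t n;

    while ((n = fread(buf, 1, sizeof(buf), in)) > 0) {
        h = fnv1a_update(h, buf, n);
        done += n;
        if (report)
            report(done, data);
    }
    return h;
}
```

Because the total size of a packfile is known up front, a callback like this could turn the byte count into a percentage; the sketch leaves that to the caller.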
>>
>>>  * We spend another 3-5 seconds on this QSORT:
>>>
>>> https://github.com/git/git/blob/63749b2dea5d1501ff85bab7b8a7f64911d21dea/pack-check.c#L105
>>
>> That's a tough one. I'm not sure how we'd count it (how many compares we
>> do?). And each item is doing so little work that hitting the progress
>> code may make things noticeably slower.
>
> If it's sorting, maybe add some code like (wild guess):
>
> if (objects_to_sort > magic_number)
>    message("Sorting something...");

I think a good solution to these cases is to just teach the progress.c
code to display a counter when we don't know what the end-state will
be. Something like your proposed magic_number is then covered by the
more general rule that we don't show the progress bar unless 1 second
has passed (which I believe is the default).
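
To make the idea concrete, here is a minimal sketch of such a counter: it has no total, stays silent until a delay has passed, and then prints at most once per second. This is not the progress.c API; struct counter_progress and the counter_*() functions are hypothetical names invented for this illustration, and the 1-second delay is simply passed in by the caller.

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical counter for work whose total is unknown. */
struct counter_progress {
    const char *title;
    time_t start;       /* when the operation began */
    time_t last_update; /* when a line was last printed */
    double delay;       /* seconds to stay silent at the start */
    int shown;          /* nonzero once anything was printed */
};

void counter_start(struct counter_progress *p, const char *title, double delay)
{
    p->title = title;
    p->start = time(NULL);
    p->last_update = p->start;
    p->delay = delay;
    p->shown = 0;
}

/* Decide whether a line should be printed at time `now`. */
int counter_due(const struct counter_progress *p, time_t now)
{
    if (difftime(now, p->start) < p->delay)
        return 0; /* fast operations never print anything */
    return !p->shown || difftime(now, p->last_update) >= 1.0;
}

/* Print "<title>: <n>" on stderr if an update is due. */
void counter_display(struct counter_progress *p, unsigned long n)
{
    time_t now = time(NULL);

    if (!counter_due(p, now))
        return;
    fprintf(stderr, "%s: %lu\r", p->title, n);
    p->last_update = now;
    p->shown = 1;
}

/* Finish the line, but only if the counter ever became visible. */
void counter_stop(struct counter_progress *p, unsigned long n)
{
    if (p->shown)
        fprintf(stderr, "%s: %lu, done.\n", p->title, n);
}
```

A caller would run counter_start(&p, "Sorting objects", 1.0), call counter_display(&p, n) as items are processed, and end with counter_stop(&p, n). Calling time() for every item is exactly the kind of overhead that could slow down cheap per-item work, so a real version would want to check the clock far less often.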

>>
>> Again, your case is pretty big. Just based on the number of objects,
>> linux.git should be 1.5-2.5 seconds on your machine for the same
>> operation. Which I think may be small enough to ignore (or even just
>> print a generic before/after). It's really the 30s packfile hash that's
>> making the whole thing so terrible.
>>
>> -Peff

