Re: [PATCH 2/2] read-cache: fix incorrect count and progress bar stalling

All of lore.kernel.org
 help / color / mirror / Atom feed

From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: "René Scharfe" <l.s.r@web.de>
Cc: "Junio C Hamano" <gitster@pobox.com>,
	"Derrick Stolee" <stolee@gmail.com>,
	git@vger.kernel.org, "Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>
Subject: Re: [PATCH 2/2] read-cache: fix incorrect count and progress bar stalling
Date: Tue, 15 Jun 2021 18:46:17 +0200	[thread overview]
Message-ID: <877divystm.fsf@evledraar.gmail.com> (raw)
In-Reply-To: <34d5febf-508c-52b8-a04b-98298d75bd8d@web.de>


On Tue, Jun 15 2021, René Scharfe wrote:

> Am 14.06.21 um 21:08 schrieb Ævar Arnfjörð Bjarmason:
>>
>> On Mon, Jun 14 2021, René Scharfe wrote:
>>
>>> Am 14.06.21 um 13:07 schrieb Ævar Arnfjörð Bjarmason:
>>>>
>>>> On Thu, Jun 10 2021, René Scharfe wrote:
>>>>
>>>>> Am 09.06.21 um 00:12 schrieb Ævar Arnfjörð Bjarmason:
>>>>>>
>>>>>> On Tue, Jun 08 2021, René Scharfe wrote:
>>>>>>
>>>>>>> I wonder (only in a semi-curious way, though) if we can detect
>>>>>>> off-by-one errors by adding an assertion to display_progress() that
>>>>>>> requires the first update to have the value 0, and in stop_progress()
>>>>>>> one that requires the previous display_progress() call to have a value
>>>>>>> equal to the total number of work items.  Not sure it'd be worth the
>>>>>>> hassle..
>>>>>>
>>>>>> That's intentional. We started eating 3 apples, got to one, but now our
>>>>>> house is on fire and we're eating no more apples today, even if we
>>>>>> planned to eat 3 when we sat down.
>>>>>>
>>>>>> The progress bar reflects this unexpected but recoverable state:
>>>>>>
>>>>>>     $ perl -wE 'for (0..1) { say "update"; say "progress $_" }' |
>>>>>>       ./helper/test-tool progress --total=3 Apples 2>&1 |
>>>>>>       cat -v | perl -pe 's/\^M\K/\n/g'
>>>>>>     Apples:   0% (0/3)^M
>>>>>>     Apples:  33% (1/3)^M
>>>>>>     Apples:  33% (1/3), done.
>>>>>>
>>>>>> We're at 1/3, but we're done. No more apples.
>>>>>>
>>>>>> This isn't just some hypothetical, e.g. consider neeing to unlink() or
>>>>>> remove files/directories one at a time in a directory and getting the
>>>>>> estimated number from st_nlink (yeah yeah, unportable, but it was the
>>>>>> first thing I thought of).
>>>>>>
>>>>>> We might think we're processing 10 entries, but another other processes
>>>>>> might make our progress bar end at more or less than the 100% we
>>>>>> expected. That's OK, not something we should invoke BUG() about.
>>>>>
>>>>> It doesn't have to be a BUG; a warning would suffice.  And I hope not
>>>>> finishing the expected number of items due to a catastrophic event is
>>>>> rare enough that an additional warning wouldn't cause too much pain.
>>>>
>>>> It's not a catastrophic event, just a run of the mill race condition
>>>> we'll expect if we're dealing with the real world.
>>>>
>>>> E.g. you asked to unlink 1000 files, we do so, we find 10 are unlinked
>>>> already, or the command is asked to recursively unlink all files in a
>>>> directory tree, and new ones have showed up.
>>>>
>>>> In those cases we should just just shrug and move on, no need for a
>>>> warning. We just don't always have perfect information about future
>>>> state at the start of the loop.
>>>
>>> If a planned work item is cancelled then it can still be counted as
>>> done.  Or the total could be adjusted, but that might look awkward.
>>>
>>>>> Loops that *regularly* end early are not a good fit for progress
>>>>> percentages, I think.
>>>>
>>>> Arguably yes, but in these fuzzy cases not providing a "total" means
>>>> showing no progress at all, just a counter. Perhaps we should have some
>>>> other "provide total, and it may be fuzzy" flag. Not providing it might
>>>> run into your proposed BUG(), my point was that the current API
>>>> providing this flexibility is intentional.
>>>
>>> Your patch turns a loop that doesn't immediately report skipped items
>>> into one with contiguous progress updates.  That's a good way to deal
>>> with the imagined restrictions for error detection.  Another would be
>>> to make the warnings optional.
>>
>> I don't see how there's anything wrong with the API use, how it needs a
>> warning etc.
>
> You pointed out that many callsites do:
>
> 	for (i = 0; i < large_number; i++) {
> 		display_progress(p, i + 1);
> 		/* work work work */
> 	}
>
> This is an off-by-one error because a finished item is reported before
> work on it starts.  Adding a warning can help find these cases reliably.

I understand that we're respectfully disagreeing and that's what you
think, but I really don't think it helps anyone if we just repeat our
respective points.

I don't think it's off-by-one, but you do.

Yes, I understand that you think that progress bars should absolutely
never ever show 1/5 if the first item is not finished. I disagree and
think that's not intuitive, per my "eating an Apple" example
upthread.

We disagree, and I for one think I understand what you mean, perhaps you
don't understand what I mean, but let's try to move on.

>>>>>> Similarly, the n=0 being distinguishable from the first
>>>>>> display_progress() is actually useful in practice. It's something I've
>>>>>> seen git.git emit (not recently, I patched the relevant code to emit
>>>>>> more granular progress).
>>>>>>
>>>>>> It's useful to know that we're stalling on the setup code before the
>>>>>> for-loop, not on the first item.
>>>>>
>>>>> Hmm, preparations that take a noticeable time might deserve their own
>>>>> progress line.
>>>>
>>>> Sure, and I've split some of those up in the past, but this seems like
>>>> ducking/not addressing the point that the API use we disagree on has
>>>> your preferred use conflating these conditions, but mine does not...
>>>
>>> Subtle.  If preparation takes a long time and each item less than that
>>> then the progress display is likely to jump from "0/n" to "i/n", where
>>> i > 1, and the meaning of "1/n" becomes moot.
>>
>> In practice we're making humongous jumps all over the place, we don't
>> write to the terminal for every item processed, and if we did it would
>> be too fast to be perceptable to the user.
>>
>> So I don't think this is an issue in the first place, as noted upthread
>> in <8735tt4fhx.fsf@evledraar.gmail.com>. Regardless of what we think of
>> the supposed off-by-one issue you seemed to think that it was enough of
>> an issue to justify complexity at the API use level (needing to think
>> about "continue" statements in loops, etc.), but now you think it's
>> moot?
>
> I don't understand your question.  Let me rephrase what I find moot:
>
> You wrote that the first display_progress() call being made with n>0
> would be useful to you to see long-running preparations.  If items are
> processed quicker than one per second, then whatever number the first
> display_progress() call writes to the screen will be overwritten within
> a second.  So the value of the first update shouldn't actually matter
> much for your use case -- unless items takes a long time to process.

I think it would be better if you replied specifically to the comments I
had later about throughput progress bars, i.e.:

    How does the idea that we show "has been done" make sense when you
    combine the progress.c API with the display_throughput(). I.e. output
    like[...]

Anyway, in this case I understood you to mean that you thought the
off-by-one wasn't a big deal in practice most of the time, I don't think
so either for e.g. counting objects in pack files.

I do think it's useful to be consistent though, and for e.g. cases of
downloading 5 files it makes sense to show 1/5 if we are currently in
the process of downloading files 1 out of 5, not 0/5 or whatever.

next prev parent reply	other threads:[~2021-06-15 16:51 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-07 14:43 [PATCH 0/2] trivial progress.c API usage fixes Ævar Arnfjörð Bjarmason
2021-06-07 14:43 ` [PATCH 1/2] read-cache.c: don't guard calls to progress.c API Ævar Arnfjörð Bjarmason
2021-06-07 15:28   ` Derrick Stolee
2021-06-07 15:52     ` Ævar Arnfjörð Bjarmason
2021-06-07 16:11       ` Derrick Stolee
2021-06-07 14:43 ` [PATCH 2/2] read-cache: fix incorrect count and progress bar stalling Ævar Arnfjörð Bjarmason
2021-06-07 15:31   ` Derrick Stolee
2021-06-07 15:58     ` Ævar Arnfjörð Bjarmason
2021-06-07 19:20       ` René Scharfe
2021-06-07 19:49         ` Ævar Arnfjörð Bjarmason
2021-06-07 23:41           ` Junio C Hamano
2021-06-08 10:58             ` Ævar Arnfjörð Bjarmason
2021-06-08 16:14               ` René Scharfe
2021-06-08 22:12                 ` Ævar Arnfjörð Bjarmason
2021-06-10  5:30                   ` Junio C Hamano
2021-06-10 15:14                     ` René Scharfe
2021-06-10 15:14                   ` René Scharfe
2021-06-14 11:07                     ` Ævar Arnfjörð Bjarmason
2021-06-14 17:18                       ` René Scharfe
2021-06-14 19:08                         ` Ævar Arnfjörð Bjarmason
2021-06-15  2:32                           ` Junio C Hamano
2021-06-15 15:14                           ` René Scharfe
2021-06-15 16:46                             ` Ævar Arnfjörð Bjarmason [this message]
2021-06-20 12:53                               ` René Scharfe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=877divystm.fsf@evledraar.gmail.com \
    --to=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=l.s.r@web.de \
    --cc=pclouds@gmail.com \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.