* Abuse of warnings for unhandled errors and programming errors
@ 2025-08-08  9:30 Markus Armbruster
  2025-08-19 12:30 ` Daniel P. Berrangé
  0 siblings, 1 reply; 5+ messages in thread
From: Markus Armbruster @ 2025-08-08  9:30 UTC (permalink / raw)
  To: qemu-devel
In "[PATCH 00/12] Error reporting cleanup, a fix, and &error_warn
removal", I challenged the use of warnings in a few places.  I think the
topic deserves a wider audience than the one a rather pedestrian cleanup
series draws.
To make my case, I need to start with errors.  We distinguish between
ordinary errors (for lack of a better word) and programming errors.
Ordinary errors are things like nonsensical user requests, unavailable
resources, and so forth.  A correct program is prepared for such
failures, detects them, and reports them to the user.  The user can then
fix their request, try again when resources are available, and so forth.
Tools for reporting ordinary errors are error_report(),
error_report_err(), &error_fatal, and friends.
Programming errors are bugs.  A developer needs to fix the program.
Unlike ordinary errors, programming errors are *unexpected*.
Programming errors are commonly not recoverable.  The proper tool for
unrecoverable ones is assertions.  &error_abort can be a convenient way
to assert "this can't fail".
On to warnings.
When some failure doesn't prevent satisfying some request, an ordinary
error can be misleading.  We make it a warning instead then.
What if it's a programming error we recover from?
Aside: trying to recover in a buggy program is risky, but that's not the
debate I want to have here.
How do we want such recoverable programming errors reported?
Warning?  We seem to be abusing warnings this way, and I hate it.  What
we have to report is a *bug*, and we should make that crystal clear.
"warning: FunctionYouNeverHeardAbout() failed" does not.  It could be
anything, and you likely need to look at the source to find out.
Ordinary error reporting with "internal error: " prefix, so the user
understands this is a bug, and all they can do about it is report it?
Log the bug somehow?
Thoughts?
Regardless of all that, beware of "this shouldn't fail, and if it does,
I don't know how to handle it / can't be bothered to handle it, so I
just spit out a warning and hope for the best."  Stop and *think*
whether it is a programming error or an ordinary error that is difficult
/ inconvenient / impossible to handle.  Then do the right thing if
practical.  If not, you're accepting a defect.  Make it locally obvious
that it's a defect, so it's at least a *known* defect, and the next guy
coming along won't have to guess.
^ permalink raw reply	[flat|nested] 5+ messages in thread
* Re: Abuse of warnings for unhandled errors and programming errors
  2025-08-08  9:30 Abuse of warnings for unhandled errors and programming errors Markus Armbruster
@ 2025-08-19 12:30 ` Daniel P. Berrangé
  2025-09-10 11:05   ` Markus Armbruster
  0 siblings, 1 reply; 5+ messages in thread
From: Daniel P. Berrangé @ 2025-08-19 12:30 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel
On Fri, Aug 08, 2025 at 11:30:32AM +0200, Markus Armbruster wrote:
> In "[PATCH 00/12] Error reporting cleanup, a fix, and &error_warn
> removal", I challenged the use of warnings in a few places.  I think the
> topic deserves a wider audience than the one a rather pedestrian cleanup
> series draws.
> 
> 
> To make my case, I need to start with errors.  We distinguish between
> ordinary errors (for lack of a better word) and programming errors.
> 
> Ordinary errors are things like nonsensical user requests, unavailable
> resources, and so forth.  A correct program is prepared for such
> failures, detects them, and reports them to the user.  The user can then
> fix their request, try again when resources are available, and so forth.
> 
> Tools for reporting ordinary errors are error_report(),
> error_report_err(), &error_fatal, and friends.
The thing about nonsense user rquests / unavailable resources , etc
is that almost none of them should imply exiting QEMU, except if they
occur in the context of system startup before the VM starts executing.
Once running we should do everything in our power to not let the users
workload die.
From that POV, I tend to wish that error_fatal did not exist and that
we instead propagated all fatal errors up until reaching main(), so
we were not at risk of using error_fatal in runtime scenarios. We're
largely stuck with what we've got though, due to our need to retrofit
error reporting in to our existing codebase design.
I do try to push back in review any time we introduce new code that
doesn't propagate errors as high up the stack as possible/practcal.
> Programming errors are bugs.  A developer needs to fix the program.
> Unlike ordinary errors, programming errors are *unexpected*.
> 
> Programming errors are commonly not recoverable.  The proper tool for
> unrecoverable ones is assertions.  &error_abort can be a convenient way
> to assert "this can't fail".
We could have called it &error_assert but that's bike shed colouring :-)
> On to warnings.
> 
> When some failure doesn't prevent satisfying some request, an ordinary
> error can be misleading.  We make it a warning instead then.
> 
> What if it's a programming error we recover from?
> 
> Aside: trying to recover in a buggy program is risky, but that's not the
> debate I want to have here.
> 
> How do we want such recoverable programming errors reported?
> 
> Warning?  We seem to be abusing warnings this way, and I hate it.  What
> we have to report is a *bug*, and we should make that crystal clear.
> "warning: FunctionYouNeverHeardAbout() failed" does not.  It could be
> anything, and you likely need to look at the source to find out.
> 
> Ordinary error reporting with "internal error: " prefix, so the user
> understands this is a bug, and all they can do about it is report it?
> 
> Log the bug somehow?
> 
> Thoughts?
I don't see 'warnings' as something directly actionable for a user.
Rather they are messages that I would want to see included in a log
file that a user attaches to a bug report if they find some behavioural
problem. If the user understands the warning great, but that isn't a
requirement.
IOW, while informative warnings is of course better than not, as long
as the warning message contains sufficient info for the maintainer to
understand what happened the minimum quality bar is satisfied IMHO.
With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
^ permalink raw reply	[flat|nested] 5+ messages in thread
* Re: Abuse of warnings for unhandled errors and programming errors
  2025-08-19 12:30 ` Daniel P. Berrangé
@ 2025-09-10 11:05   ` Markus Armbruster
  2025-09-15 18:02     ` Daniel P. Berrangé
  0 siblings, 1 reply; 5+ messages in thread
From: Markus Armbruster @ 2025-09-10 11:05 UTC (permalink / raw)
  To: Daniel P. Berrangé; +Cc: qemu-devel
Daniel P. Berrangé <berrange@redhat.com> writes:
> On Fri, Aug 08, 2025 at 11:30:32AM +0200, Markus Armbruster wrote:
>> In "[PATCH 00/12] Error reporting cleanup, a fix, and &error_warn
>> removal", I challenged the use of warnings in a few places.  I think the
>> topic deserves a wider audience than the one a rather pedestrian cleanup
>> series draws.
>> 
>> 
>> To make my case, I need to start with errors.  We distinguish between
>> ordinary errors (for lack of a better word) and programming errors.
>> 
>> Ordinary errors are things like nonsensical user requests, unavailable
>> resources, and so forth.  A correct program is prepared for such
>> failures, detects them, and reports them to the user.  The user can then
>> fix their request, try again when resources are available, and so forth.
>> 
>> Tools for reporting ordinary errors are error_report(),
>> error_report_err(), &error_fatal, and friends.
>
> The thing about nonsense user rquests / unavailable resources , etc
> is that almost none of them should imply exiting QEMU, except if they
> occur in the context of system startup before the VM starts executing.
> Once running we should do everything in our power to not let the users
> workload die.
>
> From that POV, I tend to wish that error_fatal did not exist and that
> we instead propagated all fatal errors up until reaching main(), so
> we were not at risk of using error_fatal in runtime scenarios. We're
> largely stuck with what we've got though, due to our need to retrofit
> error reporting in to our existing codebase design.
Yes, &error_fatal is almost always wrong after the guest starts.
When it isn't wrong, it's quite convenient.
> I do try to push back in review any time we introduce new code that
> doesn't propagate errors as high up the stack as possible/practcal.
>
>> Programming errors are bugs.  A developer needs to fix the program.
>> Unlike ordinary errors, programming errors are *unexpected*.
>> 
>> Programming errors are commonly not recoverable.  The proper tool for
>> unrecoverable ones is assertions.  &error_abort can be a convenient way
>> to assert "this can't fail".
>
> We could have called it &error_assert but that's bike shed colouring :-)
Yup :)
>> On to warnings.
>> 
>> When some failure doesn't prevent satisfying some request, an ordinary
>> error can be misleading.  We make it a warning instead then.
>> 
>> What if it's a programming error we recover from?
>> 
>> Aside: trying to recover in a buggy program is risky, but that's not the
>> debate I want to have here.
>> 
>> How do we want such recoverable programming errors reported?
>> 
>> Warning?  We seem to be abusing warnings this way, and I hate it.  What
>> we have to report is a *bug*, and we should make that crystal clear.
>> "warning: FunctionYouNeverHeardAbout() failed" does not.  It could be
>> anything, and you likely need to look at the source to find out.
>> 
>> Ordinary error reporting with "internal error: " prefix, so the user
>> understands this is a bug, and all they can do about it is report it?
>> 
>> Log the bug somehow?
>> 
>> Thoughts?
>
> I don't see 'warnings' as something directly actionable for a user.
> Rather they are messages that I would want to see included in a log
> file that a user attaches to a bug report if they find some behavioural
> problem. If the user understands the warning great, but that isn't a
> requirement.
>
> IOW, while informative warnings is of course better than not, as long
> as the warning message contains sufficient info for the maintainer to
> understand what happened the minimum quality bar is satisfied IMHO.
That's a low quality bar indeed.  Here's mine:
1. A warning should make perfectly clear whether this is a bug that
should be reported, or an issue with usage, resources, etc. that can be
ignored, but may help understand future trouble, if any (typically an
ordinary error that wasn't fully handled).  Our errors make bug
vs. ordinary error clear.
2. A warning of the former kind (bug) should provide information
developers need to start debugging.  For errors, we give them a core
dump, source file and line number.  For warnings, we currently give them
grep and warm wishes.
3. A warning of the latter kind (not a bug) should at least try to
provide hints that help users diagnose and correct / work around what's
wrong.  "warning: failed to WSAEventSelect()" doesn't.  "warning:
trouble initializing slirp for user mode networking" might.
^ permalink raw reply	[flat|nested] 5+ messages in thread
* Re: Abuse of warnings for unhandled errors and programming errors
  2025-09-10 11:05   ` Markus Armbruster
@ 2025-09-15 18:02     ` Daniel P. Berrangé
  2025-09-16 13:25       ` Markus Armbruster
  0 siblings, 1 reply; 5+ messages in thread
From: Daniel P. Berrangé @ 2025-09-15 18:02 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel
On Wed, Sep 10, 2025 at 01:05:40PM +0200, Markus Armbruster wrote:
> Daniel P. Berrangé <berrange@redhat.com> writes:
> > On Fri, Aug 08, 2025 at 11:30:32AM +0200, Markus Armbruster wrote:
> >
> > I don't see 'warnings' as something directly actionable for a user.
> > Rather they are messages that I would want to see included in a log
> > file that a user attaches to a bug report if they find some behavioural
> > problem. If the user understands the warning great, but that isn't a
> > requirement.
> >
> > IOW, while informative warnings is of course better than not, as long
> > as the warning message contains sufficient info for the maintainer to
> > understand what happened the minimum quality bar is satisfied IMHO.
> 
> That's a low quality bar indeed.  Here's mine:
> 
> 1. A warning should make perfectly clear whether this is a bug that
> should be reported, or an issue with usage, resources, etc. that can be
> ignored, but may help understand future trouble, if any (typically an
> ordinary error that wasn't fully handled).  Our errors make bug
> vs. ordinary error clear.
> 
> 2. A warning of the former kind (bug) should provide information
> developers need to start debugging.  For errors, we give them a core
> dump, source file and line number.  For warnings, we currently give them
> grep and warm wishes.
Don't understate the value of "grep"  :-)
It is my #1 tool when I see a string mentioned in bug report, whether
that's from error_report or warn_report.
What annoys/frustrates me is when the message (whether an eror or
warning) lacks the technical details of exactly what failed.
ie if the message doesn't substitute in either the errno value, or the
strerror string, or equivalent, then it massively reduces the chances
of diagnosis unless the bug is easily reproducible.
> 3. A warning of the latter kind (not a bug) should at least try to
> provide hints that help users diagnose and correct / work around what's
> wrong.  "warning: failed to WSAEventSelect()" doesn't.  "warning:
> trouble initializing slirp for user mode networking" might.
I agree that the latter is more informative for users, though for
developers it might be worse as such a generic message may provide
cover for several different low level problems. 
Neither help me diagnose the problem though if I see them in a bug
report though, because neither include the "WSAGetLastError()" value
(or its string equivalent).
Probably we deserve both. In the case of "Error" objects, we have
error_prepend so code can do
  error_setg(errp, "failed to WSAEventSelect: %s", WSAGetLastError())
and then the caller one or more levels up the stack can do
  error_prepend(errp, "unable to initialize slirp user mode networking")
which becomes useful when we see the final error_report() or equiv.
We don't have an error_prepend concept for warnings 
With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
^ permalink raw reply	[flat|nested] 5+ messages in thread
* Re: Abuse of warnings for unhandled errors and programming errors
  2025-09-15 18:02     ` Daniel P. Berrangé
@ 2025-09-16 13:25       ` Markus Armbruster
  0 siblings, 0 replies; 5+ messages in thread
From: Markus Armbruster @ 2025-09-16 13:25 UTC (permalink / raw)
  To: Daniel P. Berrangé; +Cc: qemu-devel
Daniel P. Berrangé <berrange@redhat.com> writes:
> On Wed, Sep 10, 2025 at 01:05:40PM +0200, Markus Armbruster wrote:
>> Daniel P. Berrangé <berrange@redhat.com> writes:
>> > On Fri, Aug 08, 2025 at 11:30:32AM +0200, Markus Armbruster wrote:
>> >
>> > I don't see 'warnings' as something directly actionable for a user.
>> > Rather they are messages that I would want to see included in a log
>> > file that a user attaches to a bug report if they find some behavioural
>> > problem. If the user understands the warning great, but that isn't a
>> > requirement.
>> >
>> > IOW, while informative warnings is of course better than not, as long
>> > as the warning message contains sufficient info for the maintainer to
>> > understand what happened the minimum quality bar is satisfied IMHO.
>> 
>> That's a low quality bar indeed.  Here's mine:
>> 
>> 1. A warning should make perfectly clear whether this is a bug that
>> should be reported, or an issue with usage, resources, etc. that can be
>> ignored, but may help understand future trouble, if any (typically an
>> ordinary error that wasn't fully handled).  Our errors make bug
>> vs. ordinary error clear.
>> 
>> 2. A warning of the former kind (bug) should provide information
>> developers need to start debugging.  For errors, we give them a core
>> dump, source file and line number.  For warnings, we currently give them
>> grep and warm wishes.
>
> Don't understate the value of "grep"  :-)
>
> It is my #1 tool when I see a string mentioned in bug report, whether
> that's from error_report or warn_report.
>
> What annoys/frustrates me is when the message (whether an eror or
> warning) lacks the technical details of exactly what failed.
>
> ie if the message doesn't substitute in either the errno value, or the
> strerror string, or equivalent, then it massively reduces the chances
> of diagnosis unless the bug is easily reproducible.
True.
>> 3. A warning of the latter kind (not a bug) should at least try to
>> provide hints that help users diagnose and correct / work around what's
>> wrong.  "warning: failed to WSAEventSelect()" doesn't.  "warning:
>> trouble initializing slirp for user mode networking" might.
>
> I agree that the latter is more informative for users, though for
> developers it might be worse as such a generic message may provide
> cover for several different low level problems. 
My two example messages are differently generic :)
The first tells a user nothing, and a developer which function failed.
It doesn't tell what failed.  If we're lucky, only one thing can fail
this way, so finding it is easy enough.  Are we feeling lucky?
The second tells both of them what failed, but doesn't tell the
developer how exactly it failed.
> Neither help me diagnose the problem though if I see them in a bug
> report though, because neither include the "WSAGetLastError()" value
> (or its string equivalent).
>
> Probably we deserve both.
No objection at all as long as the message remains useful to the users.
Remember, this kind of warning is for users first, developers second.
Warnings about bugs are the other way round.
>                           In the case of "Error" objects, we have
> error_prepend so code can do
>
>   error_setg(errp, "failed to WSAEventSelect: %s", WSAGetLastError())
>
> and then the caller one or more levels up the stack can do
>
>   error_prepend(errp, "unable to initialize slirp user mode networking")
>
> which becomes useful when we see the final error_report() or equiv.
>
> We don't have an error_prepend concept for warnings 
I'm not following.  error_prepend() doesn't care how the error it
modifies will be consumed.  The following should just work:
    error_setg(errp, "failed to WSAEventSelect: %s", WSAGetLastError())
    ... up the stack some ...
    error_prepend(errp, "unable to initialize slirp user mode networking")
    ... up the stack some more ...
    warn_report_err(err);
We even have a convenience function to warn with prepended text:
    error_setg(errp, "failed to WSAEventSelect: %s", WSAGetLastError())
    ... up the stack some ...
    warn_reportf_err(err, "unable to initialize slirp user mode networking: ")
^ permalink raw reply	[flat|nested] 5+ messages in thread
end of thread, other threads:[~2025-09-16 13:26 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-08-08  9:30 Abuse of warnings for unhandled errors and programming errors Markus Armbruster
2025-08-19 12:30 ` Daniel P. Berrangé
2025-09-10 11:05   ` Markus Armbruster
2025-09-15 18:02     ` Daniel P. Berrangé
2025-09-16 13:25       ` Markus Armbruster
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).