Re: [Qemu-devel] [PATCH 00/10]: QError v4

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

From: Anthony Liguori <aliguori@linux.vnet.ibm.com>
To: Markus Armbruster <armbru@redhat.com>
Cc: kraxel@redhat.com, qemu-devel@nongnu.org,
	Luiz Capitulino <lcapitulino@redhat.com>
Subject: Re: [Qemu-devel] [PATCH 00/10]: QError v4
Date: Wed, 18 Nov 2009 12:08:58 -0600	[thread overview]
Message-ID: <4B04383A.9050101@linux.vnet.ibm.com> (raw)
In-Reply-To: <m33a4bes1c.fsf@crossbow.pond.sub.org>

Markus Armbruster wrote:
> 1. QError feels overengineered, particularly for a first iteration of a
>    protocol.
>
>    We go to great lengths to build highly structured error objects.
>    There is only one sane reason for doing that: a demonstrated client
>    need.  I can't see that need.
>   

A GUI wants to present the user a mechanism to reduce the memory 
consumption of a VM.  There are three reasons why this would not work.  
The first is that ballooning was not configured for this guest.  The 
second is that ballooning was configured but the version of the KVM 
kernel module does not have a sync mmu.  The third is that an 
appropriate guest driver has not been loaded for the device.

If we don't specify a type of error, all the GUI can do is fail right 
before trying to balloon the guest.  That offers the user no insight why 
it failed other than popping up a message box with an error message 
(that may not even be in their locale).  For a non-english speaker, it's 
gibberish.

If you use three types of errors, then a GUI can differentiate these 
cases and presents the user different feedback.  For the first case, it 
can simply gray out the menu item.  For the second can, it can also gray 
out the menu item but it can provide a one time indication (in say, a 
status bar), that ballooning is unavailable because the kernel version 
does not support it.  For the third error, you would want to prompt the 
user to indicate the driver is not loaded in the guest.

It's highly likely that for this last case, you'd want generic code to 
generate this error.  Further more, in order to generate the error 
message for a user, you need to know what device does not have a 
functioning driver.  You may say it's obvious for something like info 
balloon but it's not so obvious if you assume we support things other 
than just virtio-pci-balloon.  For instance, with s390, it's a different 
balloon driver.  For xenner/xenpv, it's a different driver.  It really 
suggests we should send the qdev device name for the current balloon driver.

> 2. It falls short of the requirement that clients can reasonably
>    classify errors they don't know.
>   

I think this is wishful thinking.  I strongly suspect that you can't 
find a reasonably popular web browser that doesn't know all of the error 
codes that web servers generate.

We should document all of the errors that each QMP function can return.

That said, if you thought there was 4-5 common classes of errors, adding 
an additional 'base_class' field could be a completely reasonable thing 
to do.  As long as you're adding information vs. taking it away, I'm 
quite happy.

> 3. It falls short of the requirement that clients can easily present a
>    human-readable error description to their human users, regardless of
>    whether they know the error or not.
>   

That's just incorrect.  We provide an example conversion table that's 
akin to strerror() or a __repr__ for an exception in Python.

> In more detail[*]:
>
> There are more than one way to screw up here.  One is certainly to fall
> short of client requirements in a way that is costly to fix (think
> incompatible protocol revision).  Another is to overshoot them in a way
> that is costly to maintain.  A third one is to spend too much time on
> figuring out the perfect solution.
>
> I believe our true problem is that we're still confused and/or
> disagreeing on client requirements, and this has led to a design that
> tries to cover all the conceivable bases, and feels overengineered to
> me.
>   

I've described my requirements for what a client can do.  I'd like to 
understand how you disagree.

> There are only so many complexity credits you can spend in a program,
> both globally and individually.  I'm very, very wary of making error
> reporting more complex than absolutely, desperately necessary.
>   

We're following a very well established model of error reporting 
(object-based exceptions).  This is well established and well understood.

> Reporting errors should remain as easy as we can make it.  The more
> cumbersome it is to report an error, the less it is done, and the more
> vaguely it is done.  If you have to edit more than the error site to
> report an error accurately, then chances skyrocket that it won't be
> reported, or it'll be reported inaccurately. 

I don't buy these sort of arguments.  We are not designing a library 
that is open to abuse.  If our developers are fall victim to this, then 
we need to retrain them to be less lazy by not accepting patches that do 
a poor job reporting errors.

>  And not because coders are
> stupid or lazy, but because they make sensible use of their very limited
> complexity credits: if you can either get the job done with lousy error
> messages, or not get it done at all, guess what the smart choice is.
>   

Error reporting is extremely important in a management scenario.  You 
are severely limited by what actions you can take based on the amount of 
error information returned.  This is particularly when dealing with a 
multi-client management mechanism.  IMHO, this is a sensible place to 
spend complexity credits.

> If I understand Dan correctly, machine-readable error code +
> human-readable description is just fine, as long as the error code is
> reasonably specific and the description is precise and complete.  Have
> we heard anything else from client developers?
>   

There are no client developers because QMP doesn't exist.  Historically, 
we haven't provided high quality error messages so of course libvirt 
makes due without them today.

But I can answer on behalf of the management work we've done based on 
libvirt.

If you attempt to do a virDomainSetMemory() to a qemu domain, you only 
get an error if you're using a version of qemu that doesn't support the 
balloon command.  You do not get an error at all for the case where the 
kernel module doesn't support ballooning or a guest driver isn't loaded.

The concrete use case here is building an autoballoon mechanism part of 
a larger management suite.  If you're in an environment where memory 
overcommit is important, then you really do want to know whether 
ballooning isn't functioning because appropriate guest drivers aren't 
loaded.

> I'm not a client developer, but let me make a few propositions on client
> needs anyway:
>
> * Clients are just as complexity-bound as the server.  They prefer their
>   errors as simple as possible, but no simpler.
>   

I agree but the problem is that every client has a different definition 
of "simple as possible".  My client doesn't necessarily care about PCI 
hotplug error messages because it's not a function that we use.   
However, ballooning is tremendously important and I want to know 
everything I can about why it failed.

> * The crucial question for the client isn't "what exactly went wrong".
>   It is "how should I handle this error".  Answering that question
>   should be easy (say, check the error code).  Figuring out what went
>   wrong should still be possible for a human operator of the client.
>   

I disagree.  The client needs to know what went wrong if they are going 
to take a corrective action.

> * Clients don't want to be tightly coupled to the server.
>   

I don't follow this.

> * No matter how smart and up-to-date the client is, there will always be
>   errors it doesn't know.  And it needs to answer the "how to handle"
>   question whether it knows the error code or not!  That's why protocols
>   like HTTP have simple rules to classify error codes.
>   

HTTP is the exception and the HTTP error classes really leave an awful 
lot to be desired.  If you feel it's important to add error classes, I'd 
be happy to consider those patches.

>   Likewise, it needs to be able to give a human operator enough
>   information to figure out what went wrong whether it knows the error
>   or not.  How do you expect clients to format a structured error object
>   for an error they don't know into human-readable text?  Isn't it much
>   easier and more robust to cut out the formatting middle-man and send
>   the text along with the error?
>   

We do that by providing a conversion table.  The fundamental problem 
here is localization.

> * Clients need to present a human-readable error description to their
>   human users, regardless of whether they know the error or not.
>   

I would never pass an error string from a third party directly to a 
user.  I doubt you'll find a lot of good applications that do.  From a 
usability perspective, you need to be in control of your interactions 
with the user.  They grammar needs to be consistent and it has to be 
localized.  The best you would do with a third party string is log it to 
some log file for later examination by support.  In that case, dumping 
the class code and the supporting information in JSON form is just as 
good as a text description.

> There's a general rule of programming that I've found quite hard to
> learn and quite painful to disobey: always try the stupidest solution
> that could possibly work first.
>
> Based on what I've learned about client requirements so far, I figure
> that solution is "easily classified error code + human-readable
> description".
>   

I appreciate you feel strongly that this is the right thing to do.  That 
said, I disagree and I have specific use cases in mind that are not 
satisfied by the above mechanism.  In a situation like this, I tend to 
think the best thing to do is provide more information such that 
everyone can get the function they want.

If we add a base_class member to the error structure and we have the 
table for formatting error messages, then that should satisfy your 
requirements.  If you're concerned about a client having to have an 
up-to-date version of the table, we can introduce a monitor command to 
pretty print a json error object which would address that concern.

Then I think your argument boils down to, you think the remaining 
infrastructure is unnecessary for what you anticipate the uses to be.  
But hopefully, you'll at least concede that there are valid use cases 
beyond what you anticipate to be the normal case that do need this 
information.

> If we later realize that this solution was *too* stupid, we can simply
> add a data member to the error object.
>   

It's never that easy because a management tool has to support a least 
common denominator.

-- 
Regards,

Anthony Liguori

next prev parent reply	other threads:[~2009-11-18 18:09 UTC|newest]

Thread overview: 55+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-11-17 19:43 [Qemu-devel] [PATCH 00/10]: QError v4 Luiz Capitulino
2009-11-17 19:43 ` [Qemu-devel] [PATCH 01/10] QJSON: Introduce qobject_from_jsonv() Luiz Capitulino
2009-11-17 19:43 ` [Qemu-devel] [PATCH 02/10] QString: Introduce qstring_append_chr() Luiz Capitulino
2009-11-17 19:43 ` [Qemu-devel] [PATCH 03/10] QString: Introduce qstring_append_int() Luiz Capitulino
2009-11-17 19:43 ` [Qemu-devel] [PATCH 04/10] QString: Introduce qstring_from_substr() Luiz Capitulino
2009-11-17 19:43 ` [Qemu-devel] [PATCH 05/10] utests: Add qstring_append_chr() unit-test Luiz Capitulino
2009-11-17 19:43 ` [Qemu-devel] [PATCH 06/10] utests: Add qstring_from_substr() unit-test Luiz Capitulino
2009-11-17 19:43 ` [Qemu-devel] [PATCH 07/10] Introduce QError Luiz Capitulino
2009-11-18 15:16   ` Markus Armbruster
2009-11-18 17:23     ` Luiz Capitulino
2009-11-19  8:42       ` Markus Armbruster
2009-11-19 12:59         ` [Qemu-devel] " Paolo Bonzini
2009-11-18 18:14   ` [Qemu-devel] " Daniel P. Berrange
2009-11-18 19:58     ` Anthony Liguori
2009-11-18 20:13       ` Luiz Capitulino
2009-11-17 19:43 ` [Qemu-devel] [PATCH 08/10] monitor: QError support Luiz Capitulino
2009-11-18 15:16   ` Markus Armbruster
2009-11-18 17:29     ` Luiz Capitulino
2009-11-18 18:16   ` Daniel P. Berrange
2009-11-17 19:43 ` [Qemu-devel] [PATCH 09/10] qdev: Use QError for 'device not found' error Luiz Capitulino
2009-11-18 15:17   ` Markus Armbruster
2009-11-18 17:32     ` Luiz Capitulino
2009-11-20  7:23     ` Amit Shah
2009-11-17 19:43 ` [Qemu-devel] [PATCH 10/10] monitor: do_info_balloon(): use QError Luiz Capitulino
2009-11-18 15:17   ` Markus Armbruster
2009-11-18 15:58     ` Anthony Liguori
2009-11-18 18:10       ` Luiz Capitulino
2009-11-18 16:06 ` [Qemu-devel] [PATCH 00/10]: QError v4 Markus Armbruster
2009-11-18 18:08   ` Anthony Liguori [this message]
2009-11-19  2:36     ` Jamie Lokier
2009-11-20 15:56       ` Anthony Liguori
2009-11-20 16:20         ` Luiz Capitulino
2009-11-20 16:27           ` Anthony Liguori
2009-11-20 17:57             ` Markus Armbruster
2009-11-20 17:29         ` Markus Armbruster
2009-11-20 17:37           ` Anthony Liguori
2009-11-19 10:11     ` Markus Armbruster
2009-11-20 16:13       ` Anthony Liguori
2009-11-20 18:47         ` Markus Armbruster
2009-11-20 19:04           ` Anthony Liguori
2009-11-21 10:02             ` Markus Armbruster
2009-11-22 16:08               ` Anthony Liguori
2009-11-23 13:06                 ` Luiz Capitulino
2009-11-23 13:11                   ` Anthony Liguori
2009-11-23 13:34                     ` Luiz Capitulino
2009-11-23 13:50                       ` Alexander Graf
2009-11-24 11:55                         ` Luiz Capitulino
2009-11-24 12:13                           ` Alexander Graf
2009-11-23 16:08                 ` Markus Armbruster
2009-11-23 12:42               ` Luiz Capitulino
2009-11-23 16:15                 ` Markus Armbruster
2009-11-18 18:13   ` [Qemu-devel] " Paolo Bonzini
2009-11-19 10:25     ` Markus Armbruster
2009-11-19 13:01       ` Paolo Bonzini
2009-11-19 14:11         ` Markus Armbruster

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4B04383A.9050101@linux.vnet.ibm.com \
    --to=aliguori@linux.vnet.ibm.com \
    --cc=armbru@redhat.com \
    --cc=kraxel@redhat.com \
    --cc=lcapitulino@redhat.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).