From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:50823)
	by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1WQBbt-0005eV-OS for qemu-devel@nongnu.org;
	Wed, 19 Mar 2014 04:12:30 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from ) id 1WQBbo-0004le-Hd for qemu-devel@nongnu.org;
	Wed, 19 Mar 2014 04:12:25 -0400
Received: from mx1.redhat.com ([209.132.183.28]:6157)
	by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1WQBbo-0004lV-9F for qemu-devel@nongnu.org;
	Wed, 19 Mar 2014 04:12:20 -0400
From: Markus Armbruster
References: <87lhw7rppw.fsf@rustcorp.com.au>
Date: Wed, 19 Mar 2014 09:12:15 +0100
In-Reply-To: <87lhw7rppw.fsf@rustcorp.com.au> (Rusty Russell's message of
	"Wed, 19 Mar 2014 11:04:19 +1030")
Message-ID: <87vbva6200.fsf@blackfin.pond.sub.org>
MIME-Version: 1.0
Content-Type: text/plain
Subject: Re: [Qemu-devel] virtio device error reporting best practice?
To: Rusty Russell
Cc: Dave Airlie, "qemu-devel@nongnu.org"

Rusty Russell writes:

> Dave Airlie writes:
>> So I'm looking at how best to do virtio gpu device error reporting,
>> and how to deal with illegal stuff,
>>
>> I've two levels of errors I want to support,
>>
>> a) unrecoverable or bad guest kernel programming errors,
>
> The QEMU standard approach is to exit at this point.  No, really.
>
>> b) per 3D context errors from the renderer backend,
>>
>> (b) I can easily report in an event queue and the guest kernel can in
>> theory blow away the offenders, this is how GL works with some
>> extensions,
>
> That's probably sanest.
>
>> For (a) I can expect a response from every command I put into the main
>> GPU control queue, the response should always be no error, but in some
>> cases it will be because the guest hit some host resource error, or
>> asked for something insane, (guest kernel drivers would be broken in
>> most of these cases).
>>
>> Alternately I can use the separate event queue to send async errors
>> when the guest does something bad,
>>
>> I'm also considering adding some sort of flag in config space saying
>> the device needs a reset before it will continue doing anything,
>
> I generally dislike error codes which Never Happen; it's like making
> every void function return int just in case: the caller has no idea what
> to do if it fails.
>
> The litmus test: does *your* guest handle failures other than by giving
> up on the device?  If so, sure, you need to have a sane error-reporting
> strategy.

Err, isn't this a circular argument?  No need for QEMU to report the
failure, because the guest won't handle it; no need to handle the
failure, because QEMU won't report it.

What about this: would you make your guest handle failures if they were
reported?

>> The main reason I'm considering this stuff is for security reasons if
>> the guest asks for something really illegal or crazy what should the
>> expected behaviour of the host be? (at least secure I know that).
>
> If the guest userspace can do it, don't exit.  If the kernel only, and
> it's should have known better, abort is OK.
>
> Sure that doesn't help much!

Immediate exit() or abort() denies the guest the ability to degrade
service gracefully (disable the device, cry for help and try to hobble
on), or report its brokenness ungracefully (kernel panic, crash dump).
I doubt denying that is okay unless the device is so important that
without it you can't even hope to panic.
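
To make the alternative to exit() concrete, here is a rough sketch of the
"needs a reset" flag Dave mentions.  Everything in it is hypothetical and
for illustration only: struct gpu_dev, gpu_handle_cmd(), gpu_reset() and
the opcode ranges are made up, and none of this is actual QEMU or virtio
code.  The idea is simply that a class (a) error latches the device into
a refusing state instead of killing the VM, while a class (b) error is
reported and the device carries on.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical names throughout; not QEMU's or virtio's actual API. */
enum gpu_cmd_result {
    GPU_CMD_OK,
    GPU_CMD_ERR_CONTEXT,     /* class (b): per-context error, device carries on */
    GPU_CMD_ERR_FATAL,       /* class (a): guest kernel driver did something insane */
    GPU_CMD_ERR_NEEDS_RESET  /* command refused: device is waiting for a reset */
};

struct gpu_dev {
    bool needs_reset;        /* stand-in for the proposed config space flag */
    unsigned ctx_errors;     /* count of class (b) errors reported so far */
};

/*
 * Stand-in for real command validation (an assumption for this sketch):
 * opcodes 0x80..0xff fail in the renderer, opcodes >= 0x100 are nonsense.
 */
static enum gpu_cmd_result gpu_classify(uint32_t opcode)
{
    if (opcode >= 0x100) {
        return GPU_CMD_ERR_FATAL;
    }
    if (opcode >= 0x80) {
        return GPU_CMD_ERR_CONTEXT;
    }
    return GPU_CMD_OK;
}

/* Handle one command without ever calling exit() or abort(). */
enum gpu_cmd_result gpu_handle_cmd(struct gpu_dev *dev, uint32_t opcode)
{
    enum gpu_cmd_result res;

    if (dev->needs_reset) {
        /* Latched: refuse everything until the guest resets the device. */
        return GPU_CMD_ERR_NEEDS_RESET;
    }

    res = gpu_classify(opcode);
    if (res == GPU_CMD_ERR_CONTEXT) {
        dev->ctx_errors++;
    } else if (res == GPU_CMD_ERR_FATAL) {
        fprintf(stderr, "gpu: bad command 0x%x, device needs reset\n",
                (unsigned)opcode);
        dev->needs_reset = true;
    }
    return res;
}

/* Guest-initiated reset clears the latch and the error count. */
void gpu_reset(struct gpu_dev *dev)
{
    dev->needs_reset = false;
    dev->ctx_errors = 0;
}
```

With something along these lines the guest still gets to choose: reset
the device and try again, disable it and hobble on, or panic and leave a
crash dump.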