From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:36707)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <rusty@ozlabs.org>) id 1WQ4p8-0001XV-Ba
	for qemu-devel@nongnu.org; Tue, 18 Mar 2014 20:57:44 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <rusty@ozlabs.org>) id 1WQ4p2-00029d-Bm
	for qemu-devel@nongnu.org; Tue, 18 Mar 2014 20:57:38 -0400
Received: from ozlabs.org ([203.10.76.45]:60307)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <rusty@ozlabs.org>) id 1WQ4p2-00029N-0w
	for qemu-devel@nongnu.org; Tue, 18 Mar 2014 20:57:32 -0400
From: Rusty Russell <rusty@rustcorp.com.au>
In-Reply-To: <CAPM=9twJX3F+as1TuoerW1Yt-b0xw8YEf1YHa0B+MLMJBd0i_w@mail.gmail.com>
References: <CAPM=9twJX3F+as1TuoerW1Yt-b0xw8YEf1YHa0B+MLMJBd0i_w@mail.gmail.com>
Date: Wed, 19 Mar 2014 11:04:19 +1030
Message-ID: <87lhw7rppw.fsf@rustcorp.com.au>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Subject: Re: [Qemu-devel] virtio device error reporting best practice?
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Dave Airlie <airlied@gmail.com>, "qemu-devel@nongnu.org" <qemu-devel@nongnu.org>

Dave Airlie <airlied@gmail.com> writes:
> So I'm looking at how best to do virtio gpu device error reporting,
> and how to deal with illegal stuff,
>
> I've two levels of errors I want to support,
>
> a) unrecoverable or bad guest kernel programming errors,

The QEMU standard approach is to exit at this point.  No, really.

> b) per 3D context errors from the renderer backend,
>
> (b) I can easily report in an event queue and the guest kernel can in
> theory blow away the offenders, this is how GL works with some
> extensions,

That's probably sanest.

> For (a) I can expect a response from every command I put into the main
> GPU control queue, the response should always be no error, but in some
> cases it will be because the guest hit some host resource error, or
> asked for something insane, (guest kernel drivers would be broken in
> most of these cases).
>
> Alternately I can use the separate event queue to send async errors
> when the guest does something bad,
>
> I'm also considering adding some sort of flag in config space saying
> the device needs a reset before it will continue doing anything,

I generally dislike error codes which Never Happen; it's like making
every void function return int just in case: the caller has no idea what
to do if it fails.

The litmus test: does *your* guest handle failures other than by giving
up on the device?  If so, sure, you need to have a sane error-reporting
strategy.

> The main reason I'm considering this stuff is for security reasons if
> the guest asks for something really illegal or crazy what should the
> expected behaviour of the host be? (at least secure I know that).

If the guest userspace can do it, don't exit.  If only the kernel can
do it, and it should have known better, abort is OK.
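
As a rough sketch of that rule in C (every name below is hypothetical,
not an existing QEMU or virtio API, and the fall-through case for
kernel-only requests that are not clearly driver bugs is an assumption,
since the rule above does not cover it):

```c
#include <stdbool.h>

/* Hypothetical types for illustration only; nothing here is QEMU code. */
enum guest_origin {
    ORIGIN_GUEST_USERSPACE,  /* unprivileged guest code can trigger it */
    ORIGIN_GUEST_KERNEL      /* only the guest kernel driver can trigger it */
};

enum host_action {
    ACTION_REPORT_ERROR,     /* report to the guest and keep running */
    ACTION_ABORT             /* give up on the guest: exit or abort */
};

/*
 * Decide how the host reacts to a bad request.
 *
 * - Anything guest userspace can provoke must never take the VM down.
 * - A kernel-only request that the driver should have known better
 *   than to issue may abort.
 * - Assumption: kernel-only requests that are not clearly driver bugs
 *   (for example, a host resource running out) are reported instead.
 */
enum host_action choose_action(enum guest_origin origin,
                               bool driver_should_have_known_better)
{
    if (origin == ORIGIN_GUEST_USERSPACE) {
        return ACTION_REPORT_ERROR;
    }
    if (driver_should_have_known_better) {
        return ACTION_ABORT;
    }
    return ACTION_REPORT_ERROR;
}
```

The point of keeping the userspace check first is that it wins
regardless of how broken the request looks: an unprivileged process
must not be able to kill the whole guest.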

Sure that doesn't help much!
Rusty.