From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Jan Beulich"
Subject: Re: [PATCH 0/5] x86: properly propagate errors to hypercall callee
Date: Wed, 09 Mar 2011 14:20:14 +0000
Message-ID: <4D779AAE02000078000358FD@vpn.id2.novell.com>
References: <4D7770D802000078000357C9@vpn.id2.novell.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: quoted-printable
Content-Disposition: inline
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: Keir Fraser
Cc: "xen-devel@lists.xensource.com"
List-Id: xen-devel@lists.xenproject.org

>>> On 09.03.11 at 14:44, Keir Fraser wrote:
> I wonder what the scope of the problem really is. Mostly this cacheattr
> stuff applies to memory allocated by a graphics driver I suppose, and
> probably at boot time in dom0. I wonder how the bug was observed during dom0
> boot given that Xen chooses a default dom0 memory allocation that leaves
> enough memory free for a decent-sized dom0 SWIOTLB plus some extra slack on
> top of that. Any idea how the Xen memory pool happened to be entirely empty
> at the time radeon drm driver caused the superpage shattering to occur?

This isn't a boot time problem, it's a run time one (and was reported
to us as such). The driver does allocations (and cache attribute
changes) based on user mode (X) demands.

> I'm not against turning the host crash into a guest crash (which I think is
> typically what is going to happen, although I suppose at least some Linux
> driver-related mapping/remapping functions can handle failure) as this might
> be an improvement when starting up non-dom0 driver domains for example.

But I'm afraid that's not only a question of driver domains doing such.
With the addition of !is_hvm_domain() to l1_disallow_mask(), any page
in a HVM guest that its kernel chooses to make non-WB can trigger the
BUG() currently.
And, noting just now, there's then a potential collision between the
kernel and tools/stubdom (qemu-dm) mapping the page - the latter,
mapping a page WB, would undo what the guest itself may have requested
earlier - imo the cache attr adjustment shouldn't be done if it's not
the owner of the page that's doing the mapping (and quite probably
the cache attr should be inherited by the non-owner, though that
raises the problem of updating mappings that the non-owner may have
established before the owner assigned the non-default attr).

> I think we should consider punting a resource error up to the guest as a
> very bad thing and still WARN_ON or otherwise log the situation to Xen's own
> console.

Hmm, possibly.

Jan