From mboxrd@z Thu Jan 1 00:00:00 1970 From: catalin.marinas@arm.com (Catalin Marinas) Date: Tue, 3 Mar 2015 10:57:05 +0000 Subject: EDAC on arm64 In-Reply-To: <2550695.nbkPi0RDF3@wuerfel> References: <54F11133.70909@redhat.com> <2937202.iu6lrkO1gm@wuerfel> <20150302222502.GA13277@MBP.local> <2550695.nbkPi0RDF3@wuerfel> Message-ID: <20150303105705.GF28951@e104818-lin.cambridge.arm.com> To: linux-arm-kernel@lists.infradead.org List-Id: linux-arm-kernel.lists.infradead.org On Tue, Mar 03, 2015 at 10:23:06AM +0100, Arnd Bergmann wrote: > On Monday 02 March 2015 22:25:16 Catalin Marinas wrote: > > On Mon, Mar 02, 2015 at 08:40:16PM +0100, Arnd Bergmann wrote: > > > On Monday 02 March 2015 14:58:41 Catalin Marinas wrote: > > > > On Mon, Mar 02, 2015 at 10:59:32AM +0000, Will Deacon wrote: > > > > > On Sat, Feb 28, 2015 at 12:52:03AM +0000, Jon Masters wrote: > > > > > > Have you considered reviving the patch you posted previously for EDAC > > > > > > support (the atomic_scrub read/write test piece dependency)? > > > > > > > > > > > > http://lists.infradead.org/pipermail/linux-arm-kernel/2014-April/249039.html > > > > > > > > > > Well, we'd need a way to handle the non-coherent DMA case and it's really > > > > > not clear how to fix that. > > > > > > > > I agree, that's where the discussions stopped. Basically the EDAC memory > > > > writing is racy with any non-cacheable memory accesses (by CPU or > > > > device). The only way we could safely use this is only if all the > > > > devices are coherent *and* KVM is disabled. With KVM, guests may access > > > > the memory uncached, so we hit the same problem. > > > > > > Is this a setting of the host, or does the guest always have this capability? > > > > The guest can always make it stricter than what the host set in stage 2 > > (i.e. from Normal Cacheable -> NonCacheable -> Device) but never in the > > other direction. > > Do you have an idea what the purpose of this is? 
> Why would a guest even want to mark pages as noncachable that are
> mapped into the host as cachable and that might have dirty cache lines?

The stage 1 / stage 2 attribute combination works such that the hypervisor can impose stricter attributes or none at all (in which case it is up to the guest to decide what it needs). For example, devices mapped into the guest address space (e.g. the GIC) are marked as Device memory in stage 2 so that the guest can never map them as Normal Cacheable memory (which would have bad consequences).

In the other direction, the guest may want to create a stricter mapping than what the host provides. A possible reason is a frame buffer, or anything else where the guest assumes that by creating a non-cacheable mapping it won't need to do cache maintenance. That's why, when KVM maps a page into the guest address space, it flushes the cache so there are no dirty lines. Since the host would not write to such a page again, it won't dirty the cache (and if the guest does, it needs to deal with it itself).

There are some scenarios where this does not work well, e.g. a virtual frame buffer emulated by Qemu, where the guest thinks it is non-cacheable while Qemu maps it as cacheable. The only sane solution here is to tell the guest that the (virtual) frame buffer device is DMA coherent and that it should use a cacheable mapping. But I don't think the host should somehow (transparently) upgrade the cacheability that the guest thinks it has.

> > > If a guest can influence the caching of a page it has access to, I can
> > > imagine all sorts of security problems with malicious guests regardless
> > > of EDAC.
> >
> > Not as long as the host is aware of this. Basically it needs to flush
> > the cache on a page when it is mapped into the guest address space (IPA)
> > and flush it again when reading a page from guest.
>
> You have to flush and invalidate the cache line, but of course nobody
> wants to do that because it totally destroys performance.
There are two cases when the cache needs flushing: (1) when a page is mapped into the guest address space (done lazily via the page faulting mechanism) and (2) when the host reads a page already mapped into the guest address space (e.g. when swapping it out). Both indeed take some time, but neither is on a critical path.

(and maybe at some point we'll get fully transparent caches on ARM as well, so we don't have to worry about cache maintenance)

-- 
Catalin