From: Thomas Huth
Date: Tue, 3 Feb 2015 13:11:37 +0100
Message-Id: <1422965498-11500-1-git-send-email-thuth@linux.vnet.ibm.com>
Subject: [Qemu-devel] [PATCH RFC 0/1] KVM: ioctl for reading/writing guest memory
To: kvm@vger.kernel.org, qemu-devel@nongnu.org
Cc: cornelia.huck@de.ibm.com, pbonzini@redhat.com, Thomas Huth,
    agraf@suse.de, borntraeger@de.ibm.com

tl;dr: This patch adds a new ioctl to KVM on s390x for reading and
writing from/to virtual guest memory, to take account of the so-called
IPTE-lock on
s390x (a locking mechanism for the host to walk MMU tables of the guest).

Long story:

Certain instruction interception handlers in QEMU have to access the
memory of the guest, either to retrieve additional parameters/data or to
supply results to the guest. On s390x, some of them (e.g. MSCH, SSCH,
STSCH, ...) are specified to use logical (i.e. virtual) addresses in
memory, i.e. the addresses are subject to MMU translation. The current
handlers in target-s390x/ioinst.c just work "by accident" since the
Linux kernel on s390x uses a 1:1 MMU mapping for kernel memory, but for
correct behaviour we have to do an MMU page table walk in these handlers
first.

Now on s390x, there's another specialty for the case where the host has
to walk the MMU tables of the guest: While doing the page table walk (or
while accessing the memory of the guest in bigger, non-atomic chunks
spanning multiple pages), there is a small chance that another CPU might
zap or change the MMU mappings in between, in which case unexpected or
undefined behaviour might occur. To avoid such problems, the SIE
facility features a locking mechanism, the so-called IPTE-lock, which
prevents other virtual CPUs from issuing the IPTE (invalidate page table
entry) or similar instructions. While the lock is held, these
instructions are intercepted, so that their execution can be delayed
until the page table walk / memory operation has finished on the locking
CPU.

The kernel part of KVM on s390x already uses this locking mechanism for
the interception handlers in the kernel (e.g. in the read_guest() and
write_guest() functions). For proper MMU page table walk support in
QEMU, the IPTE-lock now somehow has to be provided to userspace, too.
However, providing this lock directly to userspace would be quite ugly,
since we would then need to deal with a lot of cumbersome conditions
(how should the kernel behave if userspace holds the lock for too long
or forgets to release it again, etc.).
Additionally, there is another specialty of s390x pending - proper
handling of the so-called storage keys when accessing the guest memory -
which is also best done in kernel space instead of user space (I can
elaborate more on that topic on request).

So I decided to introduce a simple ioctl for reading and writing from/to
guest memory instead of exporting the lock itself to userspace. The
userspace (QEMU) can then simply call this ioctl when it wants to read
or write from/to virtual guest memory. The kernel then takes the
IPTE-lock, walks the MMU table of the guest to find out the physical
address that corresponds to the virtual address, copies the requested
number of bytes from the userspace buffer to guest memory or the other
way round, and finally releases the IPTE-lock again.

Does that sound like a viable solution (IMHO it does ;-))? Or should I
maybe try to pursue another approach?

Thomas Huth (1):
  KVM: s390: Add MEMOP ioctls for reading/writing guest memory

 Documentation/virtual/kvm/api.txt |   44 +++++++++++++++++++++++++
 arch/s390/kvm/gaccess.c           |   22 +++++++++++++
 arch/s390/kvm/gaccess.h           |    2 +
 arch/s390/kvm/kvm-s390.c          |   63 +++++++++++++++++++++++++++++++++++++
 include/uapi/linux/kvm.h          |   21 ++++++++++++
 5 files changed, 152 insertions(+), 0 deletions(-)
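
To illustrate the flow described above (take the lock, walk the guest's
MMU table, copy between the userspace buffer and guest memory, release
the lock), here is a small userspace model in C. It is only a sketch and
not the code from the patch: all names (toy_guest, toy_memop,
toy_invalidate, the TOY_* constants) are hypothetical, a pthread mutex
stands in for the IPTE-lock, and a flat one-level table stands in for
the real s390x translation tables. Storage keys and access exceptions
are not modelled.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096u
#define TOY_NUM_PAGES 4u
#define TOY_INVALID   UINT32_MAX          /* marks an unmapped virtual page */

enum toy_op { TOY_READ, TOY_WRITE };      /* hypothetical operation codes */

/* Hypothetical request layout; NOT the structure introduced by the patch. */
struct toy_memop {
    uint64_t gaddr;                       /* guest virtual (logical) address */
    uint32_t size;                        /* number of bytes to copy */
    enum toy_op op;                       /* direction of the copy */
    void *buf;                            /* userspace buffer */
};

struct toy_guest {
    pthread_mutex_t ipte_lock;            /* stand-in for the IPTE-lock */
    uint32_t page_table[TOY_NUM_PAGES];   /* virtual page -> physical frame */
    uint8_t phys[TOY_NUM_PAGES][TOY_PAGE_SIZE];
};

static void toy_guest_init(struct toy_guest *g)
{
    pthread_mutex_init(&g->ipte_lock, NULL);
    memset(g->phys, 0, sizeof(g->phys));
    for (uint32_t i = 0; i < TOY_NUM_PAGES; i++)
        g->page_table[i] = TOY_INVALID;
}

/*
 * Copy req->size bytes between req->buf and guest virtual memory.
 * The lock is held across the whole translate-and-copy sequence, so the
 * mappings cannot change in the middle of a multi-page access.
 * Returns 0 on success, -1 if a page is unmapped (earlier pages may
 * already have been copied in that case).
 */
static int toy_memop(struct toy_guest *g, const struct toy_memop *req)
{
    uint8_t *ubuf = req->buf;
    uint64_t gaddr = req->gaddr;
    uint32_t left = req->size;
    int rc = 0;

    assert(ubuf != NULL);
    pthread_mutex_lock(&g->ipte_lock);
    while (left > 0) {
        uint64_t vpn = gaddr / TOY_PAGE_SIZE;
        uint32_t off = (uint32_t)(gaddr % TOY_PAGE_SIZE);
        uint32_t chunk = TOY_PAGE_SIZE - off;   /* bytes left in this page */
        uint32_t frame;

        if (chunk > left)
            chunk = left;
        /* "Page table walk": look up the frame for this virtual page. */
        frame = vpn < TOY_NUM_PAGES ? g->page_table[vpn] : TOY_INVALID;
        if (frame >= TOY_NUM_PAGES) {
            rc = -1;
            break;
        }
        if (req->op == TOY_WRITE)
            memcpy(&g->phys[frame][off], ubuf, chunk);
        else
            memcpy(ubuf, &g->phys[frame][off], chunk);
        gaddr += chunk;
        ubuf += chunk;
        left -= chunk;
    }
    pthread_mutex_unlock(&g->ipte_lock);
    return rc;
}

/* Models an IPTE-like invalidation issued by another virtual CPU: it has
 * to wait for the lock, so it cannot interleave with a running toy_memop. */
static void toy_invalidate(struct toy_guest *g, uint32_t vpn)
{
    pthread_mutex_lock(&g->ipte_lock);
    if (vpn < TOY_NUM_PAGES)
        g->page_table[vpn] = TOY_INVALID;
    pthread_mutex_unlock(&g->ipte_lock);
}
```

In this model, a caller maps a few pages, issues a TOY_WRITE that
crosses a page boundary, and reads the data back with TOY_READ. Because
toy_invalidate() needs the same lock, it can only take effect before or
after such an access, never in the middle of one, which is the property
the IPTE-lock is meant to provide.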