Date: Fri, 24 Feb 2017 12:34:07 +0100
From: Christoffer Dall
To: Radim Krčmář
Cc: David Hildenbrand, Christian Borntraeger, linux-kernel@vger.kernel.org,
    kvm@vger.kernel.org, Paolo Bonzini, Andrew Jones, Marc Zyngier,
    Cornelia Huck, James Hogan, Paul Mackerras, Christoffer Dall
Subject: Re: [PATCH 4/5] KVM: add __kvm_request_needs_mb
Message-ID: <20170224113407.GH1409@cbox>
References: <20170216160449.13094-1-rkrcmar@redhat.com>
    <20170216160449.13094-5-rkrcmar@redhat.com>
    <865e0ec3-6918-5372-0c85-af2181209749@redhat.com>
    <35bdb122-3783-91b1-56b5-27bda4c923ad@de.ibm.com>
    <7f521412-1e8f-e519-1274-5db3ec7d36b8@redhat.com>
    <20170222151704.GA3856@potion>
In-Reply-To: <20170222151704.GA3856@potion>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Feb 22, 2017 at 04:17:05PM +0100, Radim Krčmář wrote:
> [Oops, the end of this thread got dragged into a mark-as-read spree ...]
>
> 2017-02-17 11:13+0100, David Hildenbrand:
> >>> This is really complicated stuff, and the basic reason for it (if I
> >>> remember correctly) is that s390x does reenable all interrupts when
> >>> entering the sie (see kvm-s390.c:__vcpu_run()). So the fancy smp-based
> >>> kicks don't work (as it is otherwise just racy), and if I remember
> >>> correctly, SMP reschedule signals (s390x external calls) would be
> >>> slower.
> >>> (Christian, please correct me if I'm wrong)
> >>
> >> No, the reason was that there are some requests that need to be handled
> >> outside run SIE. For example one reason was the guest prefix page.
> >> This must be mapped read/write ALL THE TIME when a guest is running,
> >> otherwise the host might crash. So we have to exit SIE and make sure that
> >> it does not reenter, therefore we use the RELOAD_MMU request from a notifier
> >> that is called from page table functions, whenever memory management decides
> >> to unmap/write protect (dirty pages tracking, reference tracking, page migration
> >> or compaction...)
> >>
> >> SMP-based requests will kick out the guest, but for something like the
> >> one above it will be too late.
> >
> > While what you said is 100% correct, I had something else in mind that
> > hindered using vcpu_kick() and especially kvm_make_all_cpus_request().
> > And I remember that being related to how preemption and
> > OUTSIDE_GUEST_MODE is handled. I think this boils down to what would
> > have to be implemented in kvm_arch_vcpu_should_kick().
> >
> > x86 can track the guest state using vcpu->mode, because they can be sure
> > that the guest can't reschedule while in the critical guest entry/exit
> > section. This is not true for s390x, as preemption is enabled. That's
> > why vcpu->mode cannot be used in its current form to track if a VCPU is
> > in/outside/exiting guest mode. And kvm_make_all_cpus_request() currently
> > relies on this setting.
> >
> > For now, calling vcpu_kick() on s390x will result in a BUG().
> >
> >
> > On s390x, there are 3 use cases I see for requests:
> >
> > 1. Remote requests that need a sync
> >
> > Make a request, wait until SIE has been left and make sure the request
> > will be processed before re-entering the SIE. e.g. KVM_REQ_RELOAD_MMU
> > notifier in mmu notifier you mentioned. Also KVM_REQ_DISABLE_IBS is a
> > candidate.
>
> Btw. aren't those requests racy?
>
>   void exit_sie(struct kvm_vcpu *vcpu)
>   {
>           atomic_or(CPUSTAT_STOP_INT, &vcpu->arch.sie_block->cpuflags);
>
> If you get stalled here and the target VCPU handles the request and
> reenters SIE in the meantime, then you'll wait until its next exit.
> (And miss an unbounded amount of exits in the worst case.)
>
>           while (vcpu->arch.sie_block->prog0c & PROG_IN_SIE)
>                   cpu_relax();
>   }
>
> And out of curiosity -- how many cycles does this loop usually take?
>
> > 2. Remote requests that don't need a sync
> >
> > E.g. KVM_REQ_ENABLE_IBS doesn't strictly need it, while
> > KVM_REQ_DISABLE_IBS does.
>
> A usual KVM request would kick the VCPU out of nested virt as well.
> Shouldn't it be done for these as well?
>
> > 3. local requests
> >
> > E.g. KVM_REQ_TLB_FLUSH from kvm_s390_set_prefix()
> >
> >
> > Of course, having a unified interface would be better.
> >
> > /* set the request and kick the CPU out of guest mode */
> > kvm_set_request(req, vcpu);
> >
> > /* set the request, kick the CPU out of guest mode, wait until guest
> > mode has been left and make sure the request will be handled before
> > reentering guest mode */
> > kvm_set_sync_request(req, vcpu);
>
> Sounds good, I'll also add
>
> kvm_set_self_request(req, vcpu);
>
> > Same maybe even for multiple VCPUs (as there are then ways to speed it
> > up, e.g. first kick all, then wait for all)
> >
> > This would require arch specific callbacks to
> > 1. pre announce the request (e.g. set PROG_REQUEST on s390x)
> > 2. kick the cpu (e.g. CPUSTAT_STOP_INT and later
> > kvm_s390_vsie_kick(vcpu) on s390x)
> > 3. check if still executing the guest (e.g. PROG_IN_SIE on s390x)
> >
> > This would only make sense if there are other use cases for sync
> > requests. At least I remember that Power also has a faster way for
> > kicking VCPUs, not involving SMP rescheds. I can't judge if this is a
> > s390x only thing and is better be left as is :)
> >
> > At least vcpu_kick() could be quite easily made to work on s390x.
> >
> > Radim, are there also other users that need something like sync requests?
>
> I think that ARM has a similar need when updating vgic, but relies on an
> assumption that VCPUs are going to be out after kicking them with
> kvm_make_all_cpus_request().
> (vgic_change_active_prepare in virt/kvm/arm/vgic/vgic-mmio.c)

Yes, we have similar needs. We don't actually use the requests
infrastructure at the moment (although I have plans to move to that
following a long series of optimization patches I have stashed on my
machine), but we reuse the kvm_make_all_cpus_request function to figure
out which CPUs need a kick, and which don't, instead of duplicating this
logic in the ARM tree.

>
> Having synchronous requests in a common API should probably wait for the
> completion of the request, not just for the kick, which would make race
> handling simpler.
>
> I'm not going to worry about them in this pass, though.
>

I'll be happy to help work on this, or at least review stuff, to move
our home-baked "stop all VCPUs and wait for something before entering
the guest again" functionality to common functionality that uses
requests.

Thanks,
-Christoffer
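
For readers following the thread, here is a minimal userspace sketch of
the three arch callbacks listed above (pre-announce, kick, check whether
the guest is still executing) and of the re-entry check that keeps a
sync request from being missed. It is only an illustration under stated
assumptions: every model_* name is hypothetical and is not a kernel API,
the three fields merely mimic a request bitmap, CPUSTAT_STOP_INT and
PROG_IN_SIE, and C11 atomics stand in for the kernel's atomic ops. The
demo is single-threaded, so it shows the ordering of the steps rather
than real concurrency.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace model of a VCPU; not kernel code. */
struct model_vcpu {
	atomic_ulong requests;	/* pending request bits */
	atomic_bool stop_int;	/* mimics CPUSTAT_STOP_INT: asks the guest to exit */
	atomic_bool in_guest;	/* mimics PROG_IN_SIE: guest is executing */
};

/* Step 1: pre-announce the request so that guest re-entry can see it. */
static void model_announce(struct model_vcpu *v, unsigned int req)
{
	atomic_fetch_or(&v->requests, 1UL << req);
}

/* Step 2: kick the VCPU out of guest mode. */
static void model_kick(struct model_vcpu *v)
{
	atomic_store(&v->stop_int, true);
}

/* Step 3: check whether the VCPU is still executing the guest. */
static bool model_in_guest(struct model_vcpu *v)
{
	return atomic_load(&v->in_guest);
}

/*
 * VCPU side: publish in_guest first, then look at the requests. With
 * sequentially consistent atomics, either the requester sees in_guest
 * set, or the VCPU sees the announced request and backs out.
 */
static bool model_try_enter_guest(struct model_vcpu *v)
{
	atomic_store(&v->in_guest, true);
	if (atomic_load(&v->requests) != 0) {
		atomic_store(&v->in_guest, false);
		return false;
	}
	return true;
}

/* VCPU side: leave guest mode (e.g. after noticing stop_int). */
static void model_exit_guest(struct model_vcpu *v)
{
	atomic_store(&v->in_guest, false);
}

/* VCPU side: take all pending requests and clear the kick flag. */
static unsigned long model_handle_requests(struct model_vcpu *v)
{
	atomic_store(&v->stop_int, false);
	return atomic_exchange(&v->requests, 0UL);
}

/* Requester side: announce, kick, then spin until the guest has been left. */
static void model_set_sync_request(struct model_vcpu *v, unsigned int req)
{
	model_announce(v, req);
	model_kick(v);
	while (model_in_guest(v))
		;	/* the kernel loop quoted above uses cpu_relax() here */
}

static void model_demo(void)
{
	struct model_vcpu v = {0};

	assert(model_try_enter_guest(&v));	/* VCPU runs the guest */
	model_announce(&v, 3);			/* requester: step 1 */
	model_kick(&v);				/* requester: step 2 */
	assert(model_in_guest(&v));		/* step 3: keep waiting */

	model_exit_guest(&v);			/* VCPU reacts to the kick */
	assert(!model_in_guest(&v));		/* requester may stop waiting */
	assert(!model_try_enter_guest(&v));	/* re-entry refused: request pending */
	assert(model_handle_requests(&v) == (1UL << 3));

	model_set_sync_request(&v, 5);		/* VCPU outside guest: returns at once */
	assert(!model_try_enter_guest(&v));
	assert(model_handle_requests(&v) == (1UL << 5));
	assert(model_try_enter_guest(&v));	/* nothing pending, guest may run */
}
```

Calling model_demo() from a main() walks through both cases. Because the
request is announced before the kick and checked on every entry attempt,
the requester in this model only has to wait for the guest to be left
once, and cannot miss the request even if the VCPU exits and tries to
re-enter in between.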