From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([208.118.235.92]:45273)
	by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1TLajt-00077U-Uk for qemu-devel@nongnu.org;
	Tue, 09 Oct 2012 10:24:59 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from ) id 1TLajo-0005rI-Vt for qemu-devel@nongnu.org;
	Tue, 09 Oct 2012 10:24:53 -0400
Received: from mx1.redhat.com ([209.132.183.28]:3121)
	by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1TLajo-0005r4-MY for qemu-devel@nongnu.org;
	Tue, 09 Oct 2012 10:24:48 -0400
Message-ID: <50743390.3@redhat.com>
Date: Tue, 09 Oct 2012 16:24:16 +0200
From: Avi Kivity
MIME-Version: 1.0
References: <1348577763-12920-1-git-send-email-pbonzini@redhat.com>
	<20121008113932.GB16332@stefanha-thinkpad.redhat.com>
	<5072CE54.8020208@redhat.com>
	<20121009090811.GB13775@stefanha-thinkpad.redhat.com>
	<5073EDB3.3020804@redhat.com> <5073FE3A.1090903@redhat.com>
	<507401D8.8090203@redhat.com> <507405B5.4060108@redhat.com>
	<507410BD.6050901@redhat.com> <50741218.90000@redhat.com>
	<5074171A.2030904@redhat.com> <5074226A.3030907@redhat.com>
	<507424E5.4060705@redhat.com> <50742B97.2060608@redhat.com>
In-Reply-To: <50742B97.2060608@redhat.com>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] Block I/O outside the QEMU global mutex was "Re:
	[RFC PATCH 00/17] Support for multiple "AIO contexts""
To: Paolo Bonzini
Cc: Kevin Wolf, Anthony Liguori, Ping Fan Liu, Stefan Hajnoczi,
	qemu-devel@nongnu.org, Jan Kiszka

On 10/09/2012 03:50 PM, Paolo Bonzini wrote:
> On 09/10/2012 15:21, Avi Kivity wrote:
>> On 10/09/2012 03:11 PM, Paolo Bonzini wrote:
>>>> But no, it's actually impossible.  Hotplug may be triggered from a
>>>> vcpu thread, which clearly can't be stopped.
>>>
>>> Hotplug should always be asynchronous (because that's how hardware
>>> works), so it should always be possible to delegate the actual work
>>> to a non-VCPU thread.  Or not?
>>
>> The actual device deletion can happen from a different thread, as
>> long as you isolate the device before.  That's part of the garbage
>> collector idea.
>>
>> vcpu thread:
>>   rcu_read_lock
>>   lookup
>>   dispatch
>>     mmio handler
>>       isolate
>>       queue(delete_work)
>>   rcu_read_unlock
>>
>> worker thread:
>>   process queue
>>     delete_work
>>       synchronize_rcu() / stop_machine()
>>       acquire qemu lock
>>       delete object
>>       drop qemu lock
>>
>> Compared to the garbage collector idea, this drops fine-grained
>> locking for the qdev tree, a significant advantage.  But it still
>> suffers from dispatching inside the rcu critical section, which is
>> something we want to avoid.
>
> But we are not Linux, and I think the tradeoffs are different for RCU
> in Linux vs. QEMU.
>
> For CPUs in the kernel, running user code is just one way to get
> things done; QEMU threads are much more event driven, and their whole
> purpose is to either run the guest or sleep, until "something happens"
> (VCPU exit or readable fd).  In other words, QEMU threads should be
> able to stay most of the time in KVM_RUN or select() for any workload
> (to some approximation).

If you're streaming data (the saturated iothread from that other
thread), live migrating, or running a block job on fast storage, this
isn't necessarily true.  You could, however, make sure each thread
polls the rcu state periodically.

> Not just that: we do not need to minimize RCU critical sections,
> because anyway we want to minimize the time spent in QEMU, period.
>
> So I believe that to some approximation, in QEMU we can completely
> ignore everything else, and behave as if threads were always under
> rcu_read_lock(), except if in KVM_RUN/select.
> KVM_RUN and select are
> what Paul McKenney calls extended quiescent states, and in fact the
> following mapping works:
>
>   rcu_extended_quiesce_start() -> rcu_read_unlock();
>   rcu_extended_quiesce_end()   -> rcu_read_lock();
>   rcu_read_lock/unlock()       -> nop
>
> This in turn means that dispatching inside the RCU critical section is
> not really bad.

I believe you still cannot synchronize_rcu() while in an rcu critical
section, per the rcu documentation, even when lock/unlock map to nops.
Of course we could violate that and it wouldn't know a thing, but I
prefer to stick to the established pattern.

-- 
error compiling committee.c: too many arguments to function