From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([208.118.235.92]:43349) by lists.gnu.org with esmtp (Exim 4.71) id 1TRlUZ-0002DW-Cg for qemu-devel@nongnu.org; Fri, 26 Oct 2012 11:06:36 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) id 1TRlUR-0005TJ-Cu for qemu-devel@nongnu.org; Fri, 26 Oct 2012 11:06:35 -0400
Received: from mx4-phx2.redhat.com ([209.132.183.25]:60994) by eggs.gnu.org with esmtp (Exim 4.71) id 1TRlUR-0005TD-4a for qemu-devel@nongnu.org; Fri, 26 Oct 2012 11:06:27 -0400
Date: Fri, 26 Oct 2012 11:05:21 -0400 (EDT)
From: Paolo Bonzini
Message-ID: <1448235187.2276012.1351263921287.JavaMail.root@redhat.com>
In-Reply-To: <508968AB.2040704@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
Subject: Re: [Qemu-devel] [patch v4 05/16] memory: introduce ref, unref interface for MemoryRegionOps
To: Avi Kivity
Cc: Liu Ping Fan, Stefan Hajnoczi, Marcelo Tosatti, qemu-devel@nongnu.org, Anthony Liguori, Jan Kiszka

----- Original Message -----
> From: "Avi Kivity"
> To: "Paolo Bonzini"
> Cc: "Liu Ping Fan", qemu-devel@nongnu.org, "Anthony Liguori",
> "Marcelo Tosatti", "Jan Kiszka", "Stefan Hajnoczi"
> Sent: Thursday, 25 October 2012 18:28:27
> Subject: Re: [patch v4 05/16] memory: introduce ref,unref interface for MemoryRegionOps
>
> On 10/24/2012 09:29 AM, Paolo Bonzini wrote:
> > On 23/10/2012 18:09, Avi Kivity wrote:
> >>> But our interfaces had better support asynchronicity, and indeed
> >>> they do: after you write to the "eject" register, the "up" will
> >>> show the device as present until after destroy is done.  This can
> >>> be changed to show the device as present only until after step 4
> >>> is done.
> >>
> >> Let's say we want to eject the hotplug hardware itself (just as an
> >> example).  With refcounts, the callback that updates "up" will hold
> >> on to it via refcounts.  With stop_machine(), you need to cancel
> >> that callback, or wait for it somehow, or it can arrive after the
> >> stop_machine() and bite you.
> >
> > The callback that updates "up" is for the parent of the hotplug
> > hardware.  There is nothing that has to be updated in the hotplug
> > hardware itself.
>
> I meant, as an unrealistic example, hot-unplugging the bridge itself.
> So we have a callback that updates information in the bridge (up
> register state) being called asynchronously.
>
> A more realistic example would be hot-unplug of an HBA, then the block
> layer callback comes back to update the device.  So stop_machine()
> would need to cancel all I/O and wait for I/O that cannot be cancelled.

Cancellation+wait would be triggered by isolate (4a) and it would run
outside stop_machine().  We know that stop_machine() will eventually run
because the guest cannot place more requests for the devices to process.

At this point we're here:

> > 4a. close all backends (also cancel or complete all pending I/O)
>
>  ^ long latency
>

but none of this is done in stop_machine().  Once cancellation/wait
finishes, the HBA gives the green light to the parent, which proceeds
as follows:

> > 4b. notify parent that we're done
> > 4ba. parent removes device from its bus
> > 4bb. parent notifies guest
> > 4bc. parent schedules stop_machine(qdev_free(child))
> > 5. a bottom half calls stop_machine(qdev_free(child))

All we're doing in stop_machine() is really calling the destructor,
which---in an isolate-enabled device---only includes calls to
qemu_del_timer, drive_put_ref, memory_region_destroy and the like.

> Maybe my worry about long stop_machine latencies is premature.
> Everyone in the kernel hates it, but the kernel scales a lot more
> than qemu and is in a much better place wrt threading.

stop_machine may indeed require (or at least strongly suggest)
converting storage devices to isolate, in order to reduce the latency
of the destructor.  We do not have that many of them, though (the IDE
and SCSI buses, and virtio-blk).

Paolo
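
For illustration only, the ordering of steps 4a to 5 discussed above
can be modelled in a few lines of C.  This is a sketch under loud
assumptions, not QEMU code: Device, device_isolate(),
parent_child_done(), stop_machine_sketch() and the event log are all
hypothetical names, and the stop_machine stand-in simply calls the
function instead of quiescing any threads.  The only point it tries to
show is that the potentially slow I/O draining happens in isolate,
before and outside stop_machine, so the destructor has nothing left to
wait for.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical model of the unplug sequence; none of these are QEMU APIs. */

#define MAX_EVENTS 16

static const char *event_log[MAX_EVENTS];
static int n_events;

static void log_event(const char *what)
{
    assert(n_events < MAX_EVENTS);
    event_log[n_events++] = what;
}

typedef struct Device {
    int pending_io;  /* requests still in flight at the backend */
    int on_bus;      /* nonzero while the parent still exposes the device */
    int freed;       /* set by the destructor */
} Device;

typedef void (*BottomHalf)(Device *);
static BottomHalf scheduled_bh;  /* at most one pending bottom half */
static Device *scheduled_dev;

/* Stand-in for stop_machine(): a real one would stop all other threads. */
static void stop_machine_sketch(void (*fn)(Device *), Device *dev)
{
    log_event("stop_machine: enter");
    fn(dev);
    log_event("stop_machine: leave");
}

/* Step 5: the destructor must find no I/O left to wait for. */
static void device_destroy(Device *dev)
{
    assert(dev->pending_io == 0);
    assert(!dev->on_bus);
    dev->freed = 1;
    log_event("5: destructor");
}

static void destroy_bh(Device *dev)
{
    stop_machine_sketch(device_destroy, dev);
}

/* Steps 4b to 4bc: the parent reacts to the child's "done" notification. */
static void parent_child_done(Device *dev)
{
    dev->on_bus = 0;
    log_event("4ba: removed from bus");
    log_event("4bb: guest notified");
    scheduled_bh = destroy_bh;
    scheduled_dev = dev;
    log_event("4bc: destructor scheduled");
}

/* Step 4a: the long-latency part, run outside stop_machine. */
static void device_isolate(Device *dev)
{
    while (dev->pending_io > 0) {
        dev->pending_io--;  /* cancel or complete one request */
    }
    log_event("4a: backends closed");
    parent_child_done(dev);
}

static void run_bottom_halves(void)
{
    if (scheduled_bh) {
        BottomHalf bh = scheduled_bh;
        scheduled_bh = NULL;
        bh(scheduled_dev);
    }
}

/* Runs the whole sequence once; returns 1 if the device was destroyed. */
int unplug_demo(void)
{
    Device hba = { .pending_io = 3, .on_bus = 1, .freed = 0 };
    int i;

    n_events = 0;
    device_isolate(&hba);
    run_bottom_halves();
    for (i = 0; i < n_events; i++) {
        puts(event_log[i]);
    }
    return hba.freed;
}
```

Calling unplug_demo() from a main() prints the steps in the order they
ran.  The assertions in device_destroy() encode the claim made above:
by the time the destructor runs inside stop_machine, isolate has
already drained all I/O and the parent has already removed the device
from its bus.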