From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from [140.186.70.92] (port=38226 helo=eggs.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1PxYcp-0002Io-Kv for qemu-devel@nongnu.org; Thu, 10 Mar 2011 00:41:28 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1PxYco-0002mW-2T for qemu-devel@nongnu.org; Thu, 10 Mar 2011 00:41:27 -0500
Received: from [222.73.24.84] (port=51459 helo=song.cn.fujitsu.com) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1PxYcn-0002l1-J8 for qemu-devel@nongnu.org; Thu, 10 Mar 2011 00:41:26 -0500
Message-ID: <4D78617A.8010402@cn.fujitsu.com>
Date: Thu, 10 Mar 2011 13:28:26 +0800
From: Wen Congyang
MIME-Version: 1.0
Subject: Re: [Qemu-devel] [PATCH] `qdev_free` when unplug a pci device
References: <1298396180-23734-1-git-send-email-wdauchy@gmail.com> <20110223025001.GC19727@valinux.co.jp> <4D6B0DF8.5000407@cn.fujitsu.com> <20110309040814.GM23238@us.ibm.com> <4D770A51.6050509@cn.fujitsu.com> <20110309061230.GP23238@us.ibm.com> <4D7729E5.8010600@cn.fujitsu.com> <20110310043123.GG23238@us.ibm.com>
In-Reply-To: <20110310043123.GG23238@us.ibm.com>
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset=ISO-8859-1
List-Id: qemu-devel.nongnu.org
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
To: Ryan Harper
Cc: qemu-devel@nongnu.org, Isaku Yamahata , Gerd Hoffmann , William Dauchy , Markus Armbruster

At 03/10/2011 12:31 PM, Ryan Harper Write:
> * Wen Congyang [2011-03-09 01:21]:
>> At 03/09/2011 02:12 PM, Ryan Harper Write:
>>> * Wen Congyang [2011-03-08 23:09]:
>>>> At 03/09/2011 12:08 PM, Ryan Harper Write:
>>>>> * Wen Congyang [2011-02-27 20:56]:
>>>>>> Hi Markus Armbruster
>>>>>>
>>>>>> At 02/23/2011 04:30 PM, Markus Armbruster Write:
>>>>>>> Isaku Yamahata writes:
>>>>>>>
>>>>>>> I don't think this patch is correct. Let me explain.
>>>>>>>
>>>>>>> Device hot unplug is *not* guaranteed to succeed.
>>>>>>>
>>>>>>> For some buses, such as USB, it always succeeds immediately, i.e. when
>>>>>>> the device_del monitor command finishes, the device is gone. Life is
>>>>>>> good.
>>>>>>>
>>>>>>> But for PCI, device_del merely initiates the ACPI unplug rain dance. It
>>>>>>> doesn't wait for the dance to complete. Why? The dance can take an
>>>>>>> unpredictable amount of time, including forever.
>>>>>>>
>>>>>>> Problem: Subsequent device_add can fail if it reuses the qdev ID or PCI
>>>>>>> slot, and the unplug has not yet completed (race condition), or it
>>>>>>> failed. Yes, Virginia, PCI hotplug *can* fail.
>>>>>>>
>>>>>>> When unplug succeeds, the qdev is automatically destroyed.
>>>>>>> pciej_write() does that for PIIX4. Looks like pcie_cap_slot_event()
>>>>>>> does it for PCIE.
>>>>>>
>>>>>> I got a similar problem. When I unplug a pci device by hand, it works
>>>>>> as expected, and I can hotplug it again. But when I use a script to
>>>>>> do the same thing, sometimes it fails. I think I may have found another bug.
>>>>>>
>>>>>> Steps to reproduce this bug:
>>>>>> 1. cat ./test-e1000.sh # RHEL6RC is domain name
>>>>>> #! /bin/bash
>>>>>>
>>>>>> while true; do
>>>>>>     virsh attach-interface RHEL6RC network default --mac 52:54:00:1f:db:c7 --model e1000
>>>>>>     if [[ $? -ne 0 ]]; then
>>>>>>         break
>>>>>>     fi
>>>>>>     virsh detach-interface RHEL6RC network --mac 52:54:00:1f:db:c7
>>>>>>     if [[ $? -ne 0 ]]; then
>>>>>>         break
>>>>>>     fi
>>>>>>     sleep 5
>>>>>
>>>>> How do you know that the guest has responded at this point before you
>>>>> attempt to attach again at the top of the loop? Any attach/detach
>>>>> requires the guest to respond to the request and it may not respond at
>>>>> all.
>>>>
>>>> When I attach/detach interface by hand, it works fine: I can see the new interface
>>>> when I attach it, and it disappears when I detach it.
>>>
>>> The point is that since the attach and detach require guest
>>> participation, this interface isn't reliable.
>>> You have a sleep 5 in
>>> your loop, hoping to wait long enough for the guest to respond, but
>>> after a number of iterations in your loop it fails; you can bump the
>>> sleep to 3600 seconds and the guest *still* might not respond...
>>
>> We use the SCI interrupt to tell the guest that a device has been attached/detached.
>> But the SCI interrupt is *lost* in qemu, so the guest does not know a device has
>> been attached/detached, and does not respond to it.
>>
>> If the SCI interrupt is not lost, the guest can respond to it.
>
> *can* is the important word. Even if the interrupt isn't lost, you have
> no way to guarantee that the guest will respond at all. That's not to
> say there isn't a bug around the lost interrupt; but rather a more
> general point about hotplug's current architecture.

I don't know whether real hardware behaves the same way. Should we make sure
the SCI interrupt is not lost?
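
The point about the fixed "sleep 5" in the test script can be illustrated with a bounded wait that reports a timeout instead of assuming the guest has responded. This is only a minimal sketch and is not taken from the thread: wait_until is a hypothetical helper name, and iface_gone (mentioned only in a comment) stands for whatever check the tester would have to supply to decide that the unplug has really completed.

```bash
#!/bin/bash
# wait_until TIMEOUT_SECONDS COMMAND [ARGS...]
# Hypothetical helper (not from the thread): re-run COMMAND about once per
# second until it succeeds or TIMEOUT_SECONDS have elapsed.
# Returns 0 if COMMAND succeeded, 1 if the timeout expired first.
wait_until() {
    local timeout=$1
    shift
    # SECONDS is a bash builtin counting seconds since the shell started.
    local deadline=$(( SECONDS + timeout ))
    while true; do
        if "$@"; then
            return 0
        fi
        if (( SECONDS >= deadline )); then
            return 1
        fi
        sleep 1
    done
}

# Assumed usage inside the test loop, in place of "sleep 5"
# (iface_gone is a hypothetical check, left to the tester):
#
#   if ! wait_until 60 iface_gone; then
#       echo "guest did not complete the unplug within 60s" >&2
#       break
#   fi
```

A timeout here does not make the unplug reliable; it only turns "the guest never responded" into an explicit failure rather than a confusing error from the next attach.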