qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Wen Congyang <wency@cn.fujitsu.com>
To: Isaku Yamahata <yamahata@valinux.co.jp>
Cc: Gerd Hoffmann <kraxel@redhat.com>,
	Markus Armbruster <armbru@redhat.com>,
	William Dauchy <wdauchy@gmail.com>,
	qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PATCH] `qdev_free` when unplug a pci device
Date: Tue, 01 Mar 2011 15:32:08 +0800	[thread overview]
Message-ID: <4D6CA0F8.8040905@cn.fujitsu.com> (raw)
In-Reply-To: <20110301071358.GJ14584@valinux.co.jp>

At 03/01/2011 03:13 PM, Isaku Yamahata Write:
> On Tue, Mar 01, 2011 at 02:58:44PM +0800, Wen Congyang wrote:
>> At 03/01/2011 12:11 PM, Isaku Yamahata Write:
>>> Hi.
>>>
>>> I don't suppose that just introducing pending bits solve the issue.
>>> Your test only use single hot plug/unplug. How about mixing of multiple
>>> hot plug/unplug with different slots.
>>
>> The qemu uses the same thread to deal with monitor command and I/O request
>> from guest OS. So I think mixing of multiple hot plug/unplug with different
>> slots can also work fine.
>>
>>> Zeroing up/down on piix4_device_hotplug() is the culprit.
>>>
>>> State machine of (up, down) would be needed.
>>> (up, down) of each slots should be changed following
>>> something like
>>
>> Why we need this feature? We access s->pci0_status only in main thread and
>> do not accesss it when we deal with signal, so there is no competition to
>> access it.
> 
> We are talking about losing interrupt and events, aren't we?
> Not race caused by threads.

I am sorry, I do not understand what you are talk about.

> The issue is, Qemu injected sci interrupt into guest,
> but before the guest completes to handled it,
> users/qemu can start to inject the next sci event triggered by hot plug/unplug.
> Thus qemu loses sci interrupt and up/down status.

Yes, you are right. But I still do not understand why introducing pending bits
can not solve the issue?

Thanks
Wen Congyang

> 
> thanks
>>
>>>
>>> - on power-on  (0, 0)
>>> - on hot plug  (0, 0) -> (1, 0)
>>>                if other combination -> error
>>> - on hot unpug (1, 0) or (0, 0) -> (0, 1)
>>>                (0, 0) is for cold plugged devices.
>>>                (1, 0) is for hot plugged devices.
>>>                if other combination -> error
>>> - on pciej_write(write on PCI_EJ_BASE)
>>>                (0, 1) -> (0, 0) and qdev_free()
>>>                if other combination -> ignore
>>>
>>> When enabling sci, those bits are checked and raise sci if necessary.
>>>
>>> Any comments on this?
>>> Anyway the current pci hotplug-related commands are somewhat broken,
>>> and needs clean up with multifunction hotplug support which is
>>> on my todo list.
>>>
>>> thanks
>>>
>>> On Mon, Feb 28, 2011 at 10:52:40AM +0800, Wen Congyang wrote:
>>>> Hi Markus Armbruster
>>>>
>>>> At 02/23/2011 04:30 PM, Markus Armbruster Write:
>>>>> Isaku Yamahata <yamahata@valinux.co.jp> writes:
>>>>>
>>>>
>>>> <snip>
>>>>
>>>>>
>>>>> I don't think this patch is correct.  Let me explain.
>>>>>
>>>>> Device hot unplug is *not* guaranteed to succeed.
>>>>>
>>>>> For some buses, such as USB, it always succeeds immediately, i.e. when
>>>>> the device_del monitor command finishes, the device is gone.  Live is
>>>>> good.
>>>>>
>>>>> But for PCI, device_del merely initiates the ACPI unplug rain dance.  It
>>>>> doesn't wait for the dance to complete.  Why?  The dance can take an
>>>>> unpredictable amount of time, including forever.
>>>>>
>>>>> Problem: Subsequent device_add can fail if it reuses the qdev ID or PCI
>>>>> slot, and the unplug has not yet completed (race condition), or it
>>>>> failed.  Yes, Virginia, PCI hotplug *can* fail.
>>>>>
>>>>> When unplug succeeds, the qdev is automatically destroyed.
>>>>> pciej_write() does that for PIIX4.  Looks like pcie_cap_slot_event()
>>>>> does it for PCIE.
>>>>
>>>> I got a similar problem.  When I unplug a pci device by hand, it works
>>>> as expected, and I can hotplug it again. But when I use a srcipt to
>>>> do the same thing, sometimes it failed. I think I may find another bug.
>>>>
>>>> Steps to reproduce this bug:
>>>> 1. cat ./test-e1000.sh # RHEL6RC is domain name
>>>>    #! /bin/bash
>>>>
>>>>    while true; do
>>>>            virsh attach-interface RHEL6RC network default --mac 52:54:00:1f:db:c7 --model e1000
>>>>            if [[ $? -ne 0 ]]; then
>>>>                    break
>>>>            fi
>>>>            virsh detach-interface RHEL6RC network --mac 52:54:00:1f:db:c7
>>>>            if [[ $? -ne 0 ]]; then
>>>>                    break
>>>>            fi
>>>>            sleep 5
>>>>    done
>>>>
>>>> 2. ./test-e1000.sh
>>>>    Interface attached successfully
>>>>
>>>>    Interface detached successfully
>>>>
>>>>    Interface attached successfully
>>>>
>>>>    Interface detached successfully
>>>>
>>>>    error: Failed to attach interface
>>>>    error: operation failed: adding e1000,netdev=hostnet1,id=net1,mac=52:54:00:1f:db:c7,bus=pci.0,addr=0x8 device failed: Duplicate ID 'net1' for device
>>>>
>>>>
>>>> I use ftrace to trace whether sci interrupt is passed to guest OS, here is log:
>>>> # cat trace | grep 'irq=9'
>>>>           <idle>-0     [000] 118342.634772: irq_handler_entry: irq=9 name=acpi
>>>>           <idle>-0     [000] 118342.634831: irq_handler_exit: irq=9 ret=handled
>>>>           <idle>-0     [000] 118342.693216: irq_handler_entry: irq=9 name=acpi
>>>>           <idle>-0     [000] 118342.693254: irq_handler_exit: irq=9 ret=handled
>>>>           <idle>-0     [000] 118347.737689: irq_handler_entry: irq=9 name=acpi
>>>>           <idle>-0     [000] 118347.737738: irq_handler_exit: irq=9 ret=handled
>>>> According to the log, we only receive 3 sci interrupt, and one is lost.
>>>>
>>>> I enable piix4's debug and 1 line printf into piix4_device_hotplug:
>>>>     printf("slot: %d, up: %d, down: %d, state: %d\n", slot, s->pci0_status.up, s->pci0_status.down, state);
>>>> here is the log:
>>>> ========
>>>> PM: mapping to 0xb000
>>>> PM readw port=0x0004 val=0x0000
>>>> ...
>>>> gpe write afe2 <== 0
>>>> gpe write afe0 <== 255
>>>> gpe write afe3 <== 0
>>>> gpe write afe1 <== 255
>>>> PM readw port=0x0000 val=0x0001
>>>> PM readw port=0x0002 val=0x0000
>>>> gpe read afe0 == 0
>>>> gpe read afe2 == 0
>>>> gpe read afe1 == 0
>>>> gpe read afe3 == 0
>>>> PM writew port=0x0000 val=0x0020
>>>> PM readw port=0x0002 val=0x0000
>>>> PM writew port=0x0002 val=0x0020
>>>> PM readw port=0x0002 val=0x0020
>>>> gpe write afe2 <== 255
>>>> gpe write afe3 <== 255
>>>> ...
>>>> slot: 6, up: 0, down: 0, state: 1   <==== we attach interface the first time
>>>> PM readw port=0x0000 val=0x0001
>>>> PM readw port=0x0002 val=0x0120
>>>> gpe read afe0 == 2
>>>> gpe read afe2 == ff
>>>> gpe read afe2 == ff
>>>> gpe write afe2 <== 253
>>>> gpe read afe1 == 0
>>>> gpe read afe3 == ff
>>>> pcihotplug read ae00 == 40
>>>> pcihotplug read ae04 == 0
>>>> ...
>>>> gpe write afe0 <== 2
>>>> gpe write afe2 <== 255
>>>> slot: 6, up: 64, down: 0, state: 0  <===== we detach interface the first time
>>>> PM readw port=0x0000 val=0x0001
>>>> PM readw port=0x0002 val=0x0120
>>>> gpe read afe0 == 2
>>>> gpe read afe2 == ff
>>>> gpe read afe2 == ff
>>>> gpe write afe2 <== 253
>>>> gpe read afe1 == 0
>>>> gpe read afe3 == ff
>>>> pcihotplug read ae00 == 0
>>>> pcihotplug read ae04 == 40
>>>> ...
>>>> pciej write ae08 <== 64		<=== we will call qdev_free()
>>>> pciej write ae08 <== 64
>>>> gpe write afe0 <== 2
>>>> gpe write afe2 <== 255
>>>> slot: 7, up: 0, down: 64, state: 1  <=== we attach interface the second time.
>>>> PM readw port=0x0000 val=0x0001
>>>> PM readw port=0x0002 val=0x0120
>>>> gpe read afe0 == 2
>>>> gpe read afe2 == ff
>>>> gpe read afe2 == ff
>>>> gpe write afe2 <== 253
>>>> gpe read afe1 == 0
>>>> gpe read afe3 == ff
>>>> pcihotplug read ae00 == 80
>>>> pcihotplug read ae04 == 0
>>>> pcihotplug read ae00 == 80
>>>> pcihotplug read ae04 == 0
>>>> pcihotplug read ae00 == 80
>>>> pcihotplug read ae04 == 0
>>>> pcihotplug read ae00 == 80
>>>> pcihotplug read ae04 == 0
>>>> pcihotplug read ae00 == 80
>>>> pcihotplug read ae04 == 0
>>>> pcihotplug read ae00 == 80
>>>> pcihotplug read ae04 == 0
>>>> pcihotplug read ae00 == 80
>>>> pcihotplug read ae04 == 0
>>>> pcihotplug read ae00 == 80
>>>> pcihotplug read ae04 == 0
>>>> pcihotplug read ae00 == 80
>>>> pcihotplug read ae04 == 0
>>>> pcihotplug read ae00 == 80
>>>> pcihotplug read ae04 == 0
>>>> pcihotplug read ae00 == 80
>>>> pcihotplug read ae04 == 0
>>>> pcihotplug read ae00 == 80
>>>> pcihotplug read ae04 == 0
>>>> pcihotplug read ae00 == 80
>>>> pcihotplug read ae04 == 0
>>>> pcihotplug read ae00 == 80
>>>> pcihotplug read ae04 == 0
>>>> pcihotplug read ae00 == 80
>>>> pcihotplug read ae04 == 0
>>>> pcihotplug read ae00 == 80
>>>> pcihotplug read ae04 == 0
>>>> pcihotplug read ae00 == 80
>>>> pcihotplug read ae04 == 0
>>>> pcihotplug read ae00 == 80
>>>> pcihotplug read ae04 == 0
>>>> pcihotplug read ae00 == 80
>>>> pcihotplug read ae04 == 0
>>>> pcihotplug read ae00 == 80
>>>> pcihotplug read ae04 == 0
>>>> pcihotplug read ae00 == 80
>>>> pcihotplug read ae04 == 0
>>>> pcihotplug read ae00 == 80
>>>> pcihotplug read ae04 == 0
>>>> pcihotplug read ae00 == 80
>>>> pcihotplug read ae04 == 0
>>>> pcihotplug read ae00 == 80
>>>> pcihotplug read ae04 == 0
>>>> pcihotplug read ae00 == 80
>>>> pcihotplug read ae04 == 0
>>>> pcihotplug read ae00 == 80
>>>> pcihotplug read ae04 == 0
>>>> pcihotplug read ae00 == 80
>>>> pcihotplug read ae04 == 0
>>>> pcihotplug read ae00 == 80
>>>> pcihotplug read ae04 == 0
>>>> pcihotplug read ae00 == 80
>>>> pcihotplug read ae04 == 0
>>>> pcihotplug read ae00 == 80
>>>> pcihotplug read ae04 == 0
>>>> pcihotplug read ae00 == 80
>>>> pcihotplug read ae04 == 0
>>>> slot: 7, up: 128, down: 0, state: 0  <=== we detach interface the second time
>>>> gpe write afe0 <== 2		<=== write 2 to afe0 will cause sci interupt to be lost
>>>> gpe write afe2 <== 255
>>>> ===========
>>>>
>>>> Here is short log, and show the different between the first time and second time:
>>>> ===========
>>>> gpe write afe0 <== 2
>>>> gpe write afe2 <== 255
>>>> slot: 6, up: 64, down: 0, state: 0  <===== we detach interface the first time
>>>> ....
>>>> slot: 7, up: 128, down: 0, state: 0  <=== we detach interface the second time
>>>> gpe write afe0 <== 2		<=== write 2 to afe0 will cause sci interupt to be lost
>>>> gpe write afe2 <== 255
>>>>
>>>> ===========
>>>>
>>>> Does this behavior is a normal behavior?
>>>>
>>>> The following patch can fix this problem.
>>>>
>>>> From 3c149575b52261b8da9a73d5c4ebdfedce9866c1 Mon Sep 17 00:00:00 2001
>>>> From: Wen Congyang <wency@cn.fujitsu.com>
>>>> Date: Mon, 28 Feb 2011 10:33:45 +0800
>>>> Subject: [PATCH] pend hotplug event until last event is handled by guest OS
>>>>
>>>> ---
>>>>  hw/acpi_piix4.c |   51 ++++++++++++++++++++++++++++++++++++++++++++-------
>>>>  1 files changed, 44 insertions(+), 7 deletions(-)
>>>>
>>>> diff --git a/hw/acpi_piix4.c b/hw/acpi_piix4.c
>>>> index b5a2762..4ff3475 100644
>>>> --- a/hw/acpi_piix4.c
>>>> +++ b/hw/acpi_piix4.c
>>>> @@ -74,6 +74,7 @@ typedef struct PIIX4PMState {
>>>>      /* for pci hotplug */
>>>>      struct gpe_regs gpe;
>>>>      struct pci_status pci0_status;
>>>> +    struct pci_status pci0_status_pending;
>>>>      uint32_t pci0_hotplug_enable;
>>>>  } PIIX4PMState;
>>>>  
>>>> @@ -518,6 +519,19 @@ static void gpe_reset_val(uint16_t *cur, int addr, uint32_t val)
>>>>      *cur = (*cur & (0xff << (8 - shift))) | (x1 << shift);
>>>>  }
>>>>  
>>>> +static void raise_pending_hotplug(PIIX4PMState *s)
>>>> +{
>>>> +    if (s->pci0_status_pending.up || s->pci0_status_pending.down) {
>>>> +        s->gpe.sts |= PIIX4_PCI_HOTPLUG_STATUS;
>>>> +        s->pci0_status.up = s->pci0_status_pending.up;
>>>> +        s->pci0_status.down = s->pci0_status_pending.down;
>>>> +        s->pci0_status_pending.up = 0;
>>>> +        s->pci0_status_pending.down = 0;
>>>> +
>>>> +        pm_update_sci(s);
>>>> +    }
>>>> +}
>>>> +
>>>>  static void gpe_writeb(void *opaque, uint32_t addr, uint32_t val)
>>>>  {
>>>>      PIIX4PMState *s = opaque;
>>>> @@ -539,6 +553,11 @@ static void gpe_writeb(void *opaque, uint32_t addr, uint32_t val)
>>>>      pm_update_sci(s);
>>>>  
>>>>      PIIX4_DPRINTF("gpe write %x <== %d\n", addr, val);
>>>> +
>>>> +    if (!(g->sts & PIIX4_PCI_HOTPLUG_STATUS) && g->en & PIIX4_PCI_HOTPLUG_STATUS) {
>>>> +        /* check and reraise the pending hotplug event */
>>>> +        raise_pending_hotplug(s);
>>>> +    }
>>>>  }
>>>>  
>>>>  static uint32_t pcihotplug_read(void *opaque, uint32_t addr)
>>>> @@ -639,12 +658,22 @@ static void enable_device(PIIX4PMState *s, int slot)
>>>>      s->pci0_status.up |= (1 << slot);
>>>>  }
>>>>  
>>>> +static void pend_enable_device(PIIX4PMState *s, int slot)
>>>> +{
>>>> +    s->pci0_status_pending.up |= (1 << slot);
>>>> +}
>>>> +
>>>>  static void disable_device(PIIX4PMState *s, int slot)
>>>>  {
>>>>      s->gpe.sts |= PIIX4_PCI_HOTPLUG_STATUS;
>>>>      s->pci0_status.down |= (1 << slot);
>>>>  }
>>>>  
>>>> +static void pend_disable_device(PIIX4PMState *s, int slot)
>>>> +{
>>>> +    s->pci0_status_pending.down |= (1 << slot);
>>>> +}
>>>> +
>>>>  static int piix4_device_hotplug(DeviceState *qdev, PCIDevice *dev,
>>>>  				PCIHotplugState state)
>>>>  {
>>>> @@ -659,15 +688,23 @@ static int piix4_device_hotplug(DeviceState *qdev, PCIDevice *dev,
>>>>          return 0;
>>>>      }
>>>>  
>>>> -    s->pci0_status.up = 0;
>>>> -    s->pci0_status.down = 0;
>>>> -    if (state == PCI_HOTPLUG_ENABLED) {
>>>> -        enable_device(s, slot);
>>>> +    if (s->gpe.sts & PIIX4_PCI_HOTPLUG_STATUS) {
>>>> +        if (state == PCI_HOTPLUG_ENABLED) {
>>>> +            pend_enable_device(s, slot);
>>>> +        } else {
>>>> +            pend_disable_device(s, slot);
>>>> +        }
>>>>      } else {
>>>> -        disable_device(s, slot);
>>>> -    }
>>>> +        s->pci0_status.up = 0;
>>>> +        s->pci0_status.down = 0;
>>>> +        if (state == PCI_HOTPLUG_ENABLED) {
>>>> +            enable_device(s, slot);
>>>> +        } else {
>>>> +            disable_device(s, slot);
>>>> +        }
>>>>  
>>>> -    pm_update_sci(s);
>>>> +        pm_update_sci(s);
>>>> +    }
>>>>  
>>>>      return 0;
>>>>  }
>>>> -- 
>>>> 1.7.1
>>>>
>>>>>
>>>>> Your patch adds a *second* qdev_free().  No good.
>>>>>
>>>>>
>>>>
>>>>
>>>
>>
> 

  reply	other threads:[~2011-03-01  7:33 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-02-22 17:36 [Qemu-devel] [PATCH] `qdev_free` when unplug a pci device William Dauchy
2011-02-23  2:50 ` Isaku Yamahata
2011-02-23  8:30   ` Markus Armbruster
2011-02-23  9:32     ` William Dauchy
2011-02-28  2:52     ` Wen Congyang
2011-03-01  4:11       ` Isaku Yamahata
2011-03-01  6:58         ` Wen Congyang
2011-03-01  7:13           ` Isaku Yamahata
2011-03-01  7:32             ` Wen Congyang [this message]
2011-03-01  9:49               ` Isaku Yamahata
2011-03-09  4:08       ` Ryan Harper
2011-03-09  5:04         ` Wen Congyang
2011-03-09  6:12           ` Ryan Harper
2011-03-09  7:19             ` Wen Congyang
2011-03-10  4:31               ` Ryan Harper
2011-03-10  5:28                 ` Wen Congyang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4D6CA0F8.8040905@cn.fujitsu.com \
    --to=wency@cn.fujitsu.com \
    --cc=armbru@redhat.com \
    --cc=kraxel@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=wdauchy@gmail.com \
    --cc=yamahata@valinux.co.jp \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).