qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: "Michael S. Tsirkin" <mst@redhat.com>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: Anthony Liguori <aliguori@us.ibm.com>,
	gleb@redhat.com, qemu-devel@nongnu.org,
	Isaku Yamahata <yamahata@valinux.co.jp>,
	Gerd Hoffmann <kraxel@redhat.com>,
	Aurelien Jarno <aurelien@aurel32.net>
Subject: Re: [Qemu-devel] [PATCHv2] piix: fix up/down races + document
Date: Fri, 30 Mar 2012 15:19:33 +0300	[thread overview]
Message-ID: <20120330121933.GA9120@redhat.com> (raw)
In-Reply-To: <1333050751.29235.6.camel@ul30vt>

On Thu, Mar 29, 2012 at 01:52:31PM -0600, Alex Williamson wrote:
> On Thu, 2012-03-29 at 21:38 +0200, Michael S. Tsirkin wrote:
> > On Thu, Mar 29, 2012 at 10:53:38AM -0600, Alex Williamson wrote:
> > > On Thu, 2012-03-29 at 14:51 +0200, Michael S. Tsirkin wrote:
> > > > piix acpi interface suffers from the following 2 issues:
> > > > 
> > > > 1.
> > > > - delete device a
> > > > - quickly add device b in another slot
> > > > 
> > > > if we do this before guest reads the down register,
> > > > the down event is discarded and device will never
> > > > be deleted.
> > > > 
> > > > 2.
> > > > - delete device a
> > > > - quickly reset before guest can respond
> > > > 
> > > > interrupt is reset and guest will never eject the device.
> > > > 
> > > > To fix this, we implement two changes:
> > > > 
> > > > 1. Add two new registers:
> > > > CLR_UP
> > > > CLR_DOWN
> > > > bios will now write to these the value it read from UP/DOWN
> > > > 
> > > > 2. on reset, remove all devices which have DOWN bit set
> > > > 
> > > > For compatibility with old guests, we also clear
> > > > the DOWN bit on write to EJ0 for a device.
> > > > 
> > > > This patch also adds more detailed documentation
> > > > for the ACPI interface, including the semantics
> > > > of both old and new registers.
> > > > 
> > > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > > > ---
> > > > 
> > > > Changes from v1:
> > > > - tweaked a comment about clearing down register on reset
> > > > - documentation update
> > > > 
> > > >  docs/specs/acpi_pci_hotplug.txt |   68 +++++++++++++++++++++++++++++++----
> > > >  hw/acpi_piix4.c                 |   75 +++++++++++++++++++++++++++++++-------
> > > >  2 files changed, 121 insertions(+), 22 deletions(-)
> > > > 
> > > > diff --git a/docs/specs/acpi_pci_hotplug.txt b/docs/specs/acpi_pci_hotplug.txt
> > > > index f0f74a7..a8398f9 100644
> > > > --- a/docs/specs/acpi_pci_hotplug.txt
> > > > +++ b/docs/specs/acpi_pci_hotplug.txt
> > > > @@ -7,31 +7,83 @@ describes the interface between QEMU and the ACPI BIOS.
> > > >  ACPI GPE block (IO ports 0xafe0-0xafe3, byte access):
> > > >  -----------------------------------------
> > > >  
> > > > -Generic ACPI GPE block. Bit 1 (GPE.1) used to notify PCI hotplug/eject
> > > > +Generic ACPI GPE block. Bit 1 (GPE.1) is used to notify PCI hotplug/hotunplug
> > > >  event to ACPI BIOS, via SCI interrupt.
> > > >  
> > > > -PCI slot injection notification pending (IO port 0xae00-0xae03, 4-byte access):
> > > > +Semantics: this event occurs each time a bit in either Pending PCI slot
> > > > +injection notification or Pending PCI slot removal notification registers
> > > > +changes status from 0 to 1.
> > > > +
> > > > +Pending PCI slot injection notification (IO port 0xae00-0xae03, 4-byte access):
> > > >  ---------------------------------------------------------------
> > > > -Slot injection notification pending. One bit per slot.
> > > > +Slot injection notification is pending. One bit per slot.
> > > > +Guests should not write this register, in practice the value
> > > > +written overwrites the register.
> > > 
> > > Sorry if I'm behind on the discussion, but why do we need to define
> > > 0xae0c & 0xae10 below for write-only access when these registers (0xae00
> > > & 0xae04) are effectively read-only (yes, write is there, but it should
> > > *never* be used in the current implementation)?  Thanks,
> > > 
> > > Alex
> > 
> > Because otherwise a new bios will be broken on an old qemu.
> > We should have made the registers read-only in the old qemu
> > then we won't have the problem now.
> 
> If you think there might be rouge bioses out there that ever wrote to
> these registers, but I doubt that's the case.

As I understand it, the issue is not rouge bioses,
it's the old qemus that make these writeable, and we want the new bios
to work there. Therefore new bios should avoid writing these
registers.

> > Yes, it's probably not a big deal, but it's just as easy
> > to create a new register as it is to redefine the old one.
> 
> It could just as easily be called a spec clarification that writes were
> never intended previously.

As I see it, the issue is with old implementations, not the spec.

>  It's easy to add two more registers... oh
> wait, we only have 16bits of ioport space...  Thanks,
> 
> Alex

8 bytes is still cheap ...

> > > >  Read by ACPI BIOS GPE.1 handler to notify OS of injection
> > > >  events.
> > > >  
> > > > -PCI slot removal notification (IO port 0xae04-0xae07, 4-byte access):
> > > > +Semantics: after a slot is populated by hotplug, it is immediately enabled
> > > > +and a bit in this register is set.
> > > > +Reset value: 0
> > > > +
> > > > +Pending PCI slot removal notification (IO port 0xae04-0xae07, 4-byte access):
> > > >  -----------------------------------------------------
> > > > -Slot removal notification pending. One bit per slot.
> > > > +Slot removal notification is pending. One bit per slot.
> > > > +Guests should not write this register, in practice the value
> > > > +written overwrites the register.
> > > >  
> > > >  Read by ACPI BIOS GPE.1 handler to notify OS of removal
> > > >  events.
> > > >  
> > > > +Semantics: upon hotunplug request, a bit in this register is set
> > > > +and the OSPM is notified about hotunplug request, but a slot remains enabled
> > > > +until eject.
> > > > +Reset value: 0
> > > > +
> > > >  PCI device eject (IO port 0xae08-0xae0b, 4-byte access):
> > > >  ----------------------------------------
> > > >  
> > > >  Used by ACPI BIOS _EJ0 method to request device removal. One bit per slot.
> > > > -Reads return 0.
> > > > +Guests should not read this register. In practice reads return 0.
> > > > +
> > > > +Semantics: selected hotunpluggable slots are disabled and powered off.
> > > > +Has no effect on non-hotunpluggable slots.
> > > > +
> > > > +For compatibility with old BIOS, writes to this register
> > > > +clear bits in pending slot injection notification register.
> > > >  
> > > >  PCI removability status (IO port 0xae0c-0xae0f, 4-byte access):
> > > >  -----------------------------------------------
> > > >  
> > > > -Used by ACPI BIOS _RMV method to indicate removability status to OS. One
> > > > -bit per slot.
> > > > +Used by ACPI BIOS _RMV method to detect removability status
> > > > +and report to OS. One bit per slot.
> > > > +Guests should not write this register, in practice the value
> > > > +written is discarded.
> > > > +
> > > > +Semantics: selected slots support hotplug. Contents of this register
> > > > +do not change while guest is running.
> > > > +
> > > > +Clear PCI slot injection notification (IO port 0xae10-0xae14, 4-byte access):
> > > > +---------------------------------------------------------------
> > > > +Clears bits in pending slot injection notification register. One bit per slot.
> > > > +Guests should not read this register. In practice reads return 0.
> > > > +
> > > > +Written to by ACPI BIOS GPE.1 handler after reading the
> > > > +pending slot injection notification register and before
> > > > +notifying OS of injection events.
> > > > +
> > > > +Semantics: used to acknowledge the injection notification.
> > > > +Does not affect slot status.
> > > > +
> > > > +Clear PCI slot removal notification (IO port 0xae04-0xae07, 4-byte access):
> > > > +-----------------------------------------------------
> > > > +Clears bits in slot removal notification pending register. One bit per slot.
> > > > +Guests should not read this register. In practice reads return 0.
> > > > +
> > > > +Written to by ACPI BIOS GPE.1 handler after reading the
> > > > +pending slot removal notification register and before
> > > > +notifying OS of removal events.
> > > > +
> > > > +Semantics: used to acknowledge the hotunplug request.
> > > > +Does not affect slot status.
> > > > diff --git a/hw/acpi_piix4.c b/hw/acpi_piix4.c
> > > > index 797ed24..50553fd 100644
> > > > --- a/hw/acpi_piix4.c
> > > > +++ b/hw/acpi_piix4.c
> > > > @@ -43,6 +43,8 @@
> > > >  #define PCI_BASE 0xae00
> > > >  #define PCI_EJ_BASE 0xae08
> > > >  #define PCI_RMV_BASE 0xae0c
> > > > +#define PCI_CLR_UP_BASE 0xae10
> > > > +#define PCI_CLR_DOWN_BASE 0xae14
> > > >  
> > > >  #define PIIX4_PCI_HOTPLUG_STATUS 2
> > > >  
> > > > @@ -287,6 +289,28 @@ static void piix4_update_hotplug(PIIX4PMState *s)
> > > >      }
> > > >  }
> > > >  
> > > > +static void acpi_piix_eject_slot(PIIX4PMState *s, unsigned slots)
> > > > +{
> > > > +    DeviceState *qdev, *next;
> > > > +    BusState *bus = qdev_get_parent_bus(&s->dev.qdev);
> > > > +    int slot = ffs(slots) - 1;
> > > > +
> > > > +    /*
> > > > +     * Clear DOWN register - this is good for a case where guest can't
> > > > +     * write to CLR_DOWN because of VM reset, and for compatibility with
> > > > +     * old guests which do not have CLR_DOWN.
> > > > +     */
> > > > +    s->pci0_status.down &= ~(1U << slot);
> > > > +
> > > > +    QTAILQ_FOREACH_SAFE(qdev, &bus->children, sibling, next) {
> > > > +        PCIDevice *dev = PCI_DEVICE(qdev);
> > > > +        PCIDeviceClass *pc = PCI_DEVICE_GET_CLASS(dev);
> > > > +        if (PCI_SLOT(dev->devfn) == slot && !pc->no_hotplug) {
> > > > +            qdev_free(qdev);
> > > > +        }
> > > > +    }
> > > > +}
> > > > +
> > > >  static void piix4_reset(void *opaque)
> > > >  {
> > > >      PIIX4PMState *s = opaque;
> > > > @@ -302,6 +326,19 @@ static void piix4_reset(void *opaque)
> > > >          pci_conf[0x5B] = 0x02;
> > > >      }
> > > >      piix4_update_hotplug(s);
> > > > +    /*
> > > > +     * Guest lost remove events if any.
> > > > +     * As it's being reset it's safe to remove the device now.
> > > > +     */
> > > > +    while (s->pci0_status.down) {
> > > > +        acpi_piix_eject_slot(s, s->pci0_status.down);
> > > > +    }
> > > > +    /*
> > > > +     * Guest lost add events if any.
> > > > +     * As it's being reset and will rescan the bus we can discard
> > > > +     * past events now.
> > > > +     */
> > > > +    s->pci0_status.up = 0;
> > > >  }
> > > >  
> > > >  static void piix4_powerdown(void *opaque, int irq, int power_failing)
> > > > @@ -490,22 +527,31 @@ static uint32_t pciej_read(void *opaque, uint32_t addr)
> > > >  
> > > >  static void pciej_write(void *opaque, uint32_t addr, uint32_t val)
> > > >  {
> > > > -    BusState *bus = opaque;
> > > > -    DeviceState *qdev, *next;
> > > > -    int slot = ffs(val) - 1;
> > > > +    PIIX4PMState *s = opaque;
> > > >  
> > > > -    QTAILQ_FOREACH_SAFE(qdev, &bus->children, sibling, next) {
> > > > -        PCIDevice *dev = PCI_DEVICE(qdev);
> > > > -        PCIDeviceClass *pc = PCI_DEVICE_GET_CLASS(dev);
> > > > -        if (PCI_SLOT(dev->devfn) == slot && !pc->no_hotplug) {
> > > > -            qdev_free(qdev);
> > > > -        }
> > > > +    if (val) {
> > > > +        acpi_piix_eject_slot(s, val);
> > > >      }
> > > >  
> > > > -
> > > >      PIIX4_DPRINTF("pciej write %x <== %d\n", addr, val);
> > > >  }
> > > >  
> > > > +static void pci_clr_up_write(void *opaque, uint32_t addr, uint32_t val)
> > > > +{
> > > > +    PIIX4PMState *s = opaque;
> > > > +    s->pci0_status.up &= ~val;
> > > > +
> > > > +    PIIX4_DPRINTF("pci_clr_up write %x <== %d\n", addr, val);
> > > > +}
> > > > +
> > > > +static void pci_clr_down_write(void *opaque, uint32_t addr, uint32_t val)
> > > > +{
> > > > +    PIIX4PMState *s = opaque;
> > > > +    s->pci0_status.down &= ~val;
> > > > +
> > > > +    PIIX4_DPRINTF("pci_clr_down write %x <== %d\n", addr, val);
> > > > +}
> > > > +
> > > >  static uint32_t pcirmv_read(void *opaque, uint32_t addr)
> > > >  {
> > > >      PIIX4PMState *s = opaque;
> > > > @@ -532,12 +578,15 @@ static void piix4_acpi_system_hot_add_init(PCIBus *bus, PIIX4PMState *s)
> > > >      register_ioport_write(PCI_BASE, 8, 4, pcihotplug_write, pci0_status);
> > > >      register_ioport_read(PCI_BASE, 8, 4,  pcihotplug_read, pci0_status);
> > > >  
> > > > -    register_ioport_write(PCI_EJ_BASE, 4, 4, pciej_write, bus);
> > > > -    register_ioport_read(PCI_EJ_BASE, 4, 4,  pciej_read, bus);
> > > > +    register_ioport_write(PCI_EJ_BASE, 4, 4, pciej_write, s);
> > > > +    register_ioport_read(PCI_EJ_BASE, 4, 4,  pciej_read, s);
> > > >  
> > > >      register_ioport_write(PCI_RMV_BASE, 4, 4, pcirmv_write, s);
> > > >      register_ioport_read(PCI_RMV_BASE, 4, 4,  pcirmv_read, s);
> > > >  
> > > > +    register_ioport_write(PCI_CLR_UP_BASE, 4, 4, pci_clr_up_write, s);
> > > > +    register_ioport_write(PCI_CLR_DOWN_BASE, 4, 4, pci_clr_down_write, s);
> > > > +
> > > >      pci_bus_hotplug(bus, piix4_device_hotplug, &s->dev.qdev);
> > > >  }
> > > >  
> > > > @@ -567,8 +616,6 @@ static int piix4_device_hotplug(DeviceState *qdev, PCIDevice *dev,
> > > >          return 0;
> > > >      }
> > > >  
> > > > -    s->pci0_status.up = 0;
> > > > -    s->pci0_status.down = 0;
> > > >      if (state == PCI_HOTPLUG_ENABLED) {
> > > >          enable_device(s, slot);
> > > >      } else {
> > > 
> > > 
> 
> 

  reply	other threads:[~2012-03-30 12:19 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-03-29 12:51 [Qemu-devel] [PATCHv2] piix: fix up/down races + document Michael S. Tsirkin
2012-03-29 13:04 ` Gleb Natapov
2012-03-29 13:30   ` Michael S. Tsirkin
2012-03-29 16:53 ` Alex Williamson
2012-03-29 19:38   ` Michael S. Tsirkin
2012-03-29 19:52     ` Alex Williamson
2012-03-30 12:19       ` Michael S. Tsirkin [this message]
2012-03-30 16:05 ` Igor Mammedov
2012-04-01  8:22   ` Michael S. Tsirkin
2012-04-01  8:26     ` Gleb Natapov
2012-04-01  9:06       ` Michael S. Tsirkin
2012-04-01  9:13         ` Gleb Natapov
2012-04-01  9:19           ` Gleb Natapov
2012-04-01  9:24 ` Gleb Natapov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120330121933.GA9120@redhat.com \
    --to=mst@redhat.com \
    --cc=alex.williamson@redhat.com \
    --cc=aliguori@us.ibm.com \
    --cc=aurelien@aurel32.net \
    --cc=gleb@redhat.com \
    --cc=kraxel@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=yamahata@valinux.co.jp \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).