From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([208.118.235.92]:37226)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <jan.kiszka@web.de>) id 1T4pMp-0005tU-1X
	for qemu-devel@nongnu.org; Fri, 24 Aug 2012 04:35:53 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <jan.kiszka@web.de>) id 1T4pMk-0005lU-80
	for qemu-devel@nongnu.org; Fri, 24 Aug 2012 04:35:46 -0400
Received: from mout.web.de ([212.227.15.3]:58180)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <jan.kiszka@web.de>) id 1T4pMj-0005lK-QF
	for qemu-devel@nongnu.org; Fri, 24 Aug 2012 04:35:42 -0400
Message-ID: <50373CD6.3070002@web.de>
Date: Fri, 24 Aug 2012 10:35:34 +0200
From: Jan Kiszka <jan.kiszka@web.de>
MIME-Version: 1.0
References: <50371D06.3010706@web.de> <20120824082048.GC7830@redhat.com>
	<50373998.3030708@web.de>
In-Reply-To: <50373998.3030708@web.de>
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature";
	boundary="------------enigA2E0A5982052F199D8E08CC2"
Subject: Re: [Qemu-devel] [PATCH v2][for 1.2?] msix: Drop tracking of used
	vectors
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Cam Macdonell <cam@cs.ualberta.ca>, qemu-devel <qemu-devel@nongnu.org>, Hannes Reinecke <hare@suse.de>

This is an OpenPGP/MIME signed message (RFC 2440 and 3156)
--------------enigA2E0A5982052F199D8E08CC2
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

On 2012-08-24 10:21, Jan Kiszka wrote:
> On 2012-08-24 10:20, Michael S. Tsirkin wrote:
>> On Fri, Aug 24, 2012 at 08:19:50AM +0200, Jan Kiszka wrote:
>>> From: Jan Kiszka <jan.kiszka@siemens.com>
>>>
>>> This optimization was once used in qemu-kvm to keep KVM route usage l=
ow.
>>> But now we solved that problem via lazy updates. It also tried to han=
dle
>>> the case of vectors shared between different sources of the same devi=
ce.
>>> However, this never really worked and will have to be addressed
>>> differently anyway. So drop this obsolete interface.
>>>
>>> We still need interfaces to clear pending vectors though. Provide
>>> msix_clear_vector and msix_clear_all_vectors for this.
>>>
>>> This also unbreaks MSI-X after reset for ivshmem and megasas as devic=
e
>>> models can't easily mark their vectors used afterward (megasas didn't=

>>> even try).
>>>
>>> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
>>> ---
>>>
>>> This patch has been posted some moons again, and we had a discussion
>>> about the impact on the existing users. At that time, MSI-X refactori=
ng
>>> for KVM support was not yet merged. Now it is, and it should be bette=
r
>>> much clearer that vector usage tracking has no business with that
>>> feature.
>>>
>>>  hw/ivshmem.c    |   20 -------------------
>>>  hw/megasas.c    |    4 ---
>>>  hw/msix.c       |   57 ++++++++++++---------------------------------=
---------
>>>  hw/msix.h       |    5 +--
>>>  hw/pci.h        |    2 -
>>>  hw/virtio-pci.c |   20 +++++++-----------
>>>  6 files changed, 23 insertions(+), 85 deletions(-)
>>>
>>> diff --git a/hw/ivshmem.c b/hw/ivshmem.c
>>> index 47f2a16..5ffff48 100644
>>> --- a/hw/ivshmem.c
>>> +++ b/hw/ivshmem.c
>>> @@ -523,28 +523,11 @@ static void ivshmem_read(void *opaque, const ui=
nt8_t * buf, int flags)
>>>      return;
>>>  }
>>> =20
>>> -/* Select the MSI-X vectors used by device.
>>> - * ivshmem maps events to vectors statically, so
>>> - * we just enable all vectors on init and after reset. */
>>> -static void ivshmem_use_msix(IVShmemState * s)
>>> -{
>>> -    int i;
>>> -
>>> -    if (!msix_present(&s->dev)) {
>>> -        return;
>>> -    }
>>> -
>>> -    for (i =3D 0; i < s->vectors; i++) {
>>> -        msix_vector_use(&s->dev, i);
>>> -    }
>>> -}
>>> -
>>>  static void ivshmem_reset(DeviceState *d)
>>>  {
>>>      IVShmemState *s =3D DO_UPCAST(IVShmemState, dev.qdev, d);
>>> =20
>>>      s->intrstatus =3D 0;
>>> -    ivshmem_use_msix(s);
>>>      return;
>>>  }
>>> =20
>>> @@ -586,8 +569,6 @@ static void ivshmem_setup_msi(IVShmemState * s)
>>> =20
>>>      /* allocate QEMU char devices for receiving interrupts */
>>>      s->eventfd_table =3D g_malloc0(s->vectors * sizeof(EventfdEntry)=
);
>>> -
>>> -    ivshmem_use_msix(s);
>>>  }
>>> =20
>>>  static void ivshmem_save(QEMUFile* f, void *opaque)
>>> @@ -629,7 +610,6 @@ static int ivshmem_load(QEMUFile* f, void *opaque=
, int version_id)
>>> =20
>>>      if (ivshmem_has_feature(proxy, IVSHMEM_MSI)) {
>>>          msix_load(&proxy->dev, f);
>>> -	ivshmem_use_msix(proxy);
>>>      } else {
>>>          proxy->intrstatus =3D qemu_get_be32(f);
>>>          proxy->intrmask =3D qemu_get_be32(f);
>>> diff --git a/hw/megasas.c b/hw/megasas.c
>>> index c35a15d..4766117 100644
>>> --- a/hw/megasas.c
>>> +++ b/hw/megasas.c
>>> @@ -2121,10 +2121,6 @@ static int megasas_scsi_init(PCIDevice *dev)
>>>      pci_register_bar(&s->dev, 2, PCI_BASE_ADDRESS_SPACE_IO, &s->port=
_io);
>>>      pci_register_bar(&s->dev, 3, bar_type, &s->queue_io);
>>> =20
>>> -    if (megasas_use_msix(s)) {
>>> -        msix_vector_use(&s->dev, 0);
>>> -    }
>>> -
>>>      if (!s->sas_addr) {
>>>          s->sas_addr =3D ((NAA_LOCALLY_ASSIGNED_ID << 24) |
>>>                         IEEE_COMPANY_LOCALLY_ASSIGNED) << 36;
>>> diff --git a/hw/msix.c b/hw/msix.c
>>> index aea340b..163ce4c 100644
>>> --- a/hw/msix.c
>>> +++ b/hw/msix.c
>>> @@ -273,7 +273,6 @@ int msix_init(struct PCIDevice *dev, unsigned sho=
rt nentries,
>>> =20
>>>      dev->msix_table =3D g_malloc0(table_size);
>>>      dev->msix_pba =3D g_malloc0(pba_size);
>>> -    dev->msix_entry_used =3D g_malloc0(nentries * sizeof *dev->msix_=
entry_used);
>>> =20
>>>      msix_mask_all(dev, nentries);
>>> =20
>>> @@ -326,16 +325,6 @@ int msix_init_exclusive_bar(PCIDevice *dev, unsi=
gned short nentries,
>>>      return 0;
>>>  }
>>> =20
>>> -static void msix_free_irq_entries(PCIDevice *dev)
>>> -{
>>> -    int vector;
>>> -
>>> -    for (vector =3D 0; vector < dev->msix_entries_nr; ++vector) {
>>> -        dev->msix_entry_used[vector] =3D 0;
>>> -        msix_clr_pending(dev, vector);
>>> -    }
>>> -}
>>> -
>>>  /* Clean up resources for the device. */
>>>  void msix_uninit(PCIDevice *dev, MemoryRegion *table_bar, MemoryRegi=
on *pba_bar)
>>>  {
>>> @@ -344,7 +333,6 @@ void msix_uninit(PCIDevice *dev, MemoryRegion *ta=
ble_bar, MemoryRegion *pba_bar)
>>>      }
>>>      pci_del_capability(dev, PCI_CAP_ID_MSIX, MSIX_CAP_LENGTH);
>>>      dev->msix_cap =3D 0;
>>> -    msix_free_irq_entries(dev);
>>>      dev->msix_entries_nr =3D 0;
>>>      memory_region_del_subregion(pba_bar, &dev->msix_pba_mmio);
>>>      memory_region_destroy(&dev->msix_pba_mmio);
>>> @@ -354,8 +342,6 @@ void msix_uninit(PCIDevice *dev, MemoryRegion *ta=
ble_bar, MemoryRegion *pba_bar)
>>>      memory_region_destroy(&dev->msix_table_mmio);
>>>      g_free(dev->msix_table);
>>>      dev->msix_table =3D NULL;
>>> -    g_free(dev->msix_entry_used);
>>> -    dev->msix_entry_used =3D NULL;
>>>      dev->cap_present &=3D ~QEMU_PCI_CAP_MSIX;
>>>      return;
>>>  }
>>> @@ -390,7 +376,6 @@ void msix_load(PCIDevice *dev, QEMUFile *f)
>>>          return;
>>>      }
>>> =20
>>> -    msix_free_irq_entries(dev);
>>>      qemu_get_buffer(f, dev->msix_table, n * PCI_MSIX_ENTRY_SIZE);
>>>      qemu_get_buffer(f, dev->msix_pba, (n + 7) / 8);
>>>      msix_update_function_masked(dev);
>>> @@ -419,8 +404,9 @@ void msix_notify(PCIDevice *dev, unsigned vector)=

>>>  {
>>>      MSIMessage msg;
>>> =20
>>> -    if (vector >=3D dev->msix_entries_nr || !dev->msix_entry_used[ve=
ctor])
>>> +    if (vector >=3D dev->msix_entries_nr) {
>>>          return;
>>> +    }
>>>      if (msix_is_masked(dev, vector)) {
>>>          msix_set_pending(dev, vector);
>>>          return;
>>> @@ -436,7 +422,7 @@ void msix_reset(PCIDevice *dev)
>>>      if (!msix_present(dev)) {
>>>          return;
>>>      }
>>> -    msix_free_irq_entries(dev);
>>> +    msix_clear_all_vectors(dev);
>>>      dev->config[dev->msix_cap + MSIX_CONTROL_OFFSET] &=3D
>>>  	    ~dev->wmask[dev->msix_cap + MSIX_CONTROL_OFFSET];
>>>      memset(dev->msix_table, 0, dev->msix_entries_nr * PCI_MSIX_ENTRY=
_SIZE);
>>> @@ -444,41 +430,24 @@ void msix_reset(PCIDevice *dev)
>>>      msix_mask_all(dev, dev->msix_entries_nr);
>>>  }
>>> =20
>>> -/* PCI spec suggests that devices make it possible for software to c=
onfigure
>>> - * less vectors than supported by the device, but does not specify a=
 standard
>>> - * mechanism for devices to do so.
>>> - *
>>> - * We support this by asking devices to declare vectors software is =
going to
>>> - * actually use, and checking this on the notification path. Devices=
 that
>>> - * don't want to follow the spec suggestion can declare all vectors =
as used. */
>>> -
>>> -/* Mark vector as used. */
>>> -int msix_vector_use(PCIDevice *dev, unsigned vector)
>>> +/* Clear pending vector. */
>>> +void msix_clear_vector(PCIDevice *dev, unsigned vector)
>>>  {
>>> -    if (vector >=3D dev->msix_entries_nr)
>>> -        return -EINVAL;
>>> -    dev->msix_entry_used[vector]++;
>>> -    return 0;
>>> -}
>>> -
>>
>> I keep thinking we should instead call vector notifier
>> here, so that virtio can detect and handle cases where
>> kvm_virtio_pci_vq_vector_use fails.
>> ATM it just asserts.
>>
>> Maybe I am wrong.
>>
>> Let's delay this decision until 1.3.
>=20
> Yep, that feature can be added on top of this patch in 1.3, but we
> should still remove the usage tracking to fix its issues. There is no
> feature regression related to this.

OK, to address that virtio requirement (in 1.3) we can simply
preallocate an MSI route on VIRTIO_MSI_CONFIG/QUEUE_VECTOR and update
that route (I have a patch for kvm_irqchip_update_msi_route here
already) on vector_use.

The point is: That all is pure virtio business, nothing the generic
MSI-X layer nor any normal MSI-X device or even pci-assign/vfio have to
deal with.

Jan


--------------enigA2E0A5982052F199D8E08CC2
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: OpenPGP digital signature
Content-Disposition: attachment; filename="signature.asc"

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.16 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAlA3PNYACgkQitSsb3rl5xRXYQCgq8k7y44UnB8vX70LAhrBK994
1iQAoL+N1jw2YVRUFZobVI9HnuacyQYp
=TcUv
-----END PGP SIGNATURE-----

--------------enigA2E0A5982052F199D8E08CC2--