From: Alexey Kardashevskiy <aik@ozlabs.ru>
To: Alexander Graf <agraf@suse.de>, qemu-devel@nongnu.org
Cc: qemu-ppc@nongnu.org, Juan Quintela <quintela@redhat.com>
Subject: Re: [Qemu-devel] [PATCH v2 8/8] spapr_pci: Use XICS interrupt allocator and do not cache interrupts in PHB
Date: Fri, 23 May 2014 00:25:50 +1000 [thread overview]
Message-ID: <537E08EE.1050501@ozlabs.ru> (raw)
In-Reply-To: <537DD82F.4020402@suse.de>
On 05/22/2014 08:57 PM, Alexander Graf wrote:
>
> On 22.05.14 12:53, Alexey Kardashevskiy wrote:
>> On 05/22/2014 05:16 PM, Alexander Graf wrote:>
>>>> Am 22.05.2014 um 08:53 schrieb Alexey Kardashevskiy <aik@ozlabs.ru>:
>>>>
>>>>> On 05/21/2014 10:42 PM, Alexey Kardashevskiy wrote:
>>>>>> On 05/21/2014 08:35 PM, Alexander Graf wrote:
>>>>>>
>>>>>>> On 21.05.14 12:13, Alexey Kardashevskiy wrote:
>>>>>>>> On 05/21/2014 07:50 PM, Alexander Graf wrote:
>>>>>>>>> On 21.05.14 11:33, Alexey Kardashevskiy wrote:
>>>>>>>>>> On 05/21/2014 07:13 PM, Alexander Graf wrote:
>>>>>>>>>>> On 21.05.14 11:11, Michael S. Tsirkin wrote:
>>>>>>>>>>>> On Wed, May 21, 2014 at 11:06:09AM +0200, Alexander Graf wrote:
>>>>>>>>>>>>> On 21.05.14 10:52, Alexey Kardashevskiy wrote:
>>>>>>>>>>>>>> On 05/21/2014 06:40 PM, Alexander Graf wrote:
>>>>>>>>>>>>>>> On 15.05.14 11:59, Alexey Kardashevskiy wrote:
>>>>>>>>>>>>>>> Currently SPAPR PHB keeps track of all allocated MSI/MISX
>>>>>>>>>>>>>>> interrupt as
>>>>>>>>>>>>>>> XICS used to be unable to reuse interrupts which becomes a
>>>>>>>>>>>>>>> problem for
>>>>>>>>>>>>>>> dynamic MSI reconfiguration which is happening on guest driver
>>>>>>>>>>>>>>> reload or
>>>>>>>>>>>>>>> PCI hot (un)plug. Another problem is that PHB has a limit of
>>>>>>>>>>>>>>> devices
>>>>>>>>>>>>>>> supporting MSI/MSIX (SPAPR_MSIX_MAX_DEVS=32) and there is no
>>>>>>>>>>>>>>> good
>>>>>>>>>>>>>>> reason
>>>>>>>>>>>>>>> for that.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This makes use of new XICS ability to reuse interrupts.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This removes cached MSI configuration from SPAPR PHB so the
>>>>>>>>>>>>>>> first
>>>>>>>>>>>>>>> IRQ
>>>>>>>>>>>>>>> number
>>>>>>>>>>>>>>> of a device is stored in MSI/MSIX config space so there is no
>>>>>>>>>>>>>>> need to
>>>>>>>>>>>>>>> store
>>>>>>>>>>>>>>> this anywhere else. From now on, SPAPR PHB only keeps flags
>>>>>>>>>>>>>>> telling
>>>>>>>>>>>>>>> what
>>>>>>>>>>>>>>> type
>>>>>>>>>>>>>>> of interrupt for which device it has configured in order to
>>>>>>>>>>>>>>> return
>>>>>>>>>>>>>>> error if
>>>>>>>>>>>>>>> (for example) MSIX was enabled and the guest is trying to
>>>>>>>>>>>>>>> disable
>>>>>>>>>>>>>>> MSI
>>>>>>>>>>>>>>> which
>>>>>>>>>>>>>>> it has not enabled.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This removes a limit for the maximum number of MSIX-enabled
>>>>>>>>>>>>>>> devices
>>>>>>>>>>>>>>> per PHB,
>>>>>>>>>>>>>>> now XICS and PCI bus capacity are the only limitation.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This changes migration stream as it fixes
>>>>>>>>>>>>>>> vmstate_spapr_pci_msi::name
>>>>>>>>>>>>>>> which was
>>>>>>>>>>>>>>> wrong since the beginning.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This fixed traces to be more informative.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> In reality either MSIX or MSI is enabled, never both. So I
>>>>>>>>>>>>>>> could
>>>>>>>>>>>>>>> remove
>>>>>>>>>>>>>>> msi/msix
>>>>>>>>>>>>>>> bitmaps from this patch, would it make sense?
>>>>>>>>>>>>>> Is this a hard requirement? Does a device have to choose between
>>>>>>>>>>>>>> MSIX and
>>>>>>>>>>>>>> MSI or could it theoretically have both enabled? Is this a PCI
>>>>>>>>>>>>>> limitation,
>>>>>>>>>>>>>> a PAPR/XICS limitation or just a limitation of your
>>>>>>>>>>>>>> implementation?
>>>>>>>>>>>>> My implementation does not have this limitation, I asked if I can
>>>>>>>>>>>>> simplify
>>>>>>>>>>>>> code by introducing one :)
>>>>>>>>>>>>>
>>>>>>>>>>>>> I cannot see any reason why PCI cannot have both MSI and MSIX
>>>>>>>>>>>>> enabled
>>>>>>>>>>>>> but
>>>>>>>>>>>>> it does not seem to be used by anyone => cannot debug and
>>>>>>>>>>>>> confirm.
>>>>>>>>>>>>>
>>>>>>>>>>>>> PAPR spec assumes that if the guest tries enabling MSIX when
>>>>>>>>>>>>> MSI is
>>>>>>>>>>>>> already
>>>>>>>>>>>>> enabled, this is a "change", not enabling both types. But it also
>>>>>>>>>>>>> says MSI
>>>>>>>>>>>>> and MSIX vector numbers are not shared. Hm.
>>>>>>>>>>>> Yeah, I'm not aware of any limitation on hardware here and I'd
>>>>>>>>>>>> rather not impose one.
>>>>>>>>>>>>
>>>>>>>>>>>> Michael, do you know of any hardware that uses MSI *and* MSI-X at
>>>>>>>>>>>> the same time?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Alex
>>>>>>>>>>> No, and the PCI spec says:
>>>>>>>>>>> A function is permitted to implement both MSI and MSI-X, but
>>>>>>>>>>> system
>>>>>>>>>>> software is
>>>>>>>>>>> prohibited from enabling both at the same time. If system
>>>>>>>>>>> software
>>>>>>>>>>> enables both at the same time, the result is undefined.
>>>>>>>>>> Ah, cool. So yes Alexey, feel free to impose it :).
>>>>>>>>> Heh. This solves just half of the problem - I still have to keep
>>>>>>>>> track of
>>>>>>>>> what device got MSI/MSIX configured via that ibm,change-msi
>>>>>>>>> interface. I
>>>>>>>>> was hoping I can store such flag somewhere in a device PCI config
>>>>>>>>> space
>>>>>>>>> but
>>>>>>>>> MSI/MSIX enable bit is not good as it is not set when those calls are
>>>>>>>>> made.
>>>>>>>>> And I cannot rely on address/data fields much as the guest can
>>>>>>>>> change them
>>>>>>>>> (I already use them to store IRQ numbers and btw it is missing
>>>>>>>>> checks when
>>>>>>>>> I read them back for disposal, I'll fix in next round).
>>>>>>>>>
>>>>>>>>> Or on "enable" event I could put IRQ numbers to .data of MSI
>>>>>>>>> config space
>>>>>>>>> and on "disable" check if it is not zero, then configuration took
>>>>>>>>> place,
>>>>>>>>> then I can remove my msi[]/msix[] flag arrays. If the guest did
>>>>>>>>> any change
>>>>>>>>> to MSI/MSIX config space (it does not on SPAPR except weird selftest
>>>>>>>>> cases), I compare .data with what ICS can possibly have and either
>>>>>>>>> reject
>>>>>>>>> "disable" or handle it and if it breaks XICS - that's too bad for the
>>>>>>>>> stupid guest. Would that be acceptable?
>>>>>>>> Can't you prohibit the guest from writing to the MSI configuration
>>>>>>>> registers itself? Then you don't need to do sanity checks.
>>>>>>> I could for emulated devices but VFIO uses the same code. For example,
>>>>>>> there is an IBM SCSI IPR card which does a "self test". For that, it
>>>>>>> saves
>>>>>>> MSIX BAR content, does reboot via some backdoor interface and
>>>>>>> restores MSIX
>>>>>>> BAR. It has been solved for VFIO in the host kernel by restoring
>>>>>>> MSIX data
>>>>>>> from cached values when guest is trying to restore it with what it
>>>>>>> thinks
>>>>>>> is actual MSIX data (it is virtualized because of x86). But there is
>>>>>>> cache
>>>>>> We already have a cache because we don't access the real PCI
>>>>>> registers with
>>>>>> msi_set_message(), no?
>>>>>
>>>>> For emulated devices there is no cache. And in any case the guest is
>>>>> allowed to write to it... Who knows what AIX does? I do not.
>>>>
>>>> Tried GHashTable for keeping bus:dev.fn <-> (irq, num), more or less ok
>>>> but
>>>> how to migrate such thing? Temporary cache somewhere and then unpack
>>>> it? Or
>>>> use old style migration callbacks?
>>> Could you try to introduce a new vmstate type that just serializes and
>>> deserializes hash tables? Maybe there is already a serialization
>>> function for it in glib?
>> I have not found any (most likely I do not know how to search there),
>> I added mine, here are VMSTATE_HASH + its use for SPAPR.
>>
>>
>> Is this a movement to right direction? I need to put key/value sizes
>> into VMSTATE definition somehow but do not really want to touch
>> VMStateField.
>
> I'm not sure. Juan?
I also tried to simplify this particular thing more by assuming that the
key is always "int" and put size of value to VMStateField::size. But there
is no way to get the size in VMStateInfo::get(). Or I could do a
"subsection" and try implementing has as an array (and have get/put defined
for items, should work) but in this case I'll need to cache number of
elements of the hash table somewhere so I'll need a wrapper around GHashTable.
Making generalized version is not obvious for me :(
--
Alexey
prev parent reply other threads:[~2014-05-22 14:26 UTC|newest]
Thread overview: 35+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-05-15 9:59 [Qemu-devel] [PATCH v2 0/8] Move interrupts from spapr to xics Alexey Kardashevskiy
2014-05-15 9:59 ` [Qemu-devel] [PATCH v2 1/8] xics: Add flags for interrupts Alexey Kardashevskiy
2014-05-21 6:30 ` Alexey Kardashevskiy
2014-05-21 8:40 ` Alexander Graf
2014-05-15 9:59 ` [Qemu-devel] [PATCH v2 2/8] xics: Add xics_find_source() Alexey Kardashevskiy
2014-05-15 9:59 ` [Qemu-devel] [PATCH v2 3/8] xics: Disable flags reset on xics reset Alexey Kardashevskiy
2014-05-15 9:59 ` [Qemu-devel] [PATCH v2 4/8] spapr: Move interrupt allocator to xics Alexey Kardashevskiy
2014-05-21 8:34 ` Alexander Graf
2014-05-21 8:46 ` Alexey Kardashevskiy
2014-05-21 8:47 ` Alexander Graf
2014-05-15 9:59 ` [Qemu-devel] [PATCH v2 5/8] xics: Remove obsolete xics_set_irq_type() Alexey Kardashevskiy
2014-05-15 9:59 ` [Qemu-devel] [PATCH v2 6/8] spapr: Remove @next_irq Alexey Kardashevskiy
2014-05-15 9:59 ` [Qemu-devel] [PATCH v2 7/8] xics: Implement xics_ics_free() Alexey Kardashevskiy
2014-05-15 9:59 ` [Qemu-devel] [PATCH v2 8/8] spapr_pci: Use XICS interrupt allocator and do not cache interrupts in PHB Alexey Kardashevskiy
2014-05-21 8:40 ` Alexander Graf
2014-05-21 8:52 ` Alexey Kardashevskiy
2014-05-21 9:06 ` Alexander Graf
2014-05-21 9:11 ` Michael S. Tsirkin
2014-05-21 9:13 ` Alexander Graf
2014-05-21 9:33 ` Alexey Kardashevskiy
2014-05-21 9:38 ` Michael S. Tsirkin
2014-05-21 10:03 ` Alexey Kardashevskiy
2014-05-21 9:50 ` Alexander Graf
2014-05-21 10:13 ` Alexey Kardashevskiy
2014-05-21 10:35 ` Alexander Graf
2014-05-21 12:42 ` Alexey Kardashevskiy
2014-05-22 6:53 ` Alexey Kardashevskiy
2014-05-22 7:16 ` Alexander Graf
2014-05-22 10:53 ` Alexey Kardashevskiy
2014-05-22 10:53 ` [Qemu-devel] [PATCH 1/2] vmstate: Add helper to enable GHashTable migration Alexey Kardashevskiy
2014-05-22 10:57 ` Alexander Graf
2014-05-22 11:04 ` Alexey Kardashevskiy
2014-05-22 10:53 ` [Qemu-devel] [PATCH 2/2] spapr_pci: Use XICS interrupt allocator and do not cache interrupts in PHB Alexey Kardashevskiy
2014-05-22 10:57 ` [Qemu-devel] [PATCH v2 8/8] " Alexander Graf
2014-05-22 14:25 ` Alexey Kardashevskiy [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=537E08EE.1050501@ozlabs.ru \
--to=aik@ozlabs.ru \
--cc=agraf@suse.de \
--cc=qemu-devel@nongnu.org \
--cc=qemu-ppc@nongnu.org \
--cc=quintela@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.