From: Alexey Kardashevskiy <aik@ozlabs.ru>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: "qemu-devel@nongnu.org" <qemu-devel@nongnu.org>
Subject: Re: [Qemu-devel] [PATCH] msi/msix: added API to set MSI message address and data
Date: Fri, 20 Jul 2012 00:24:05 +1000
Message-ID: <50081885.2090906@ozlabs.ru>
In-Reply-To: <20120719092705.GF20120@redhat.com>
One comment below.
On 19/07/12 19:27, Michael S. Tsirkin wrote:
> On Thu, Jul 19, 2012 at 10:32:40AM +1000, Alexey Kardashevskiy wrote:
>> On 19/07/12 01:23, Michael S. Tsirkin wrote:
>>> On Wed, Jul 18, 2012 at 11:17:12PM +1000, Alexey Kardashevskiy wrote:
>>>> On 18/07/12 22:43, Michael S. Tsirkin wrote:
>>>>> On Thu, Jun 21, 2012 at 09:39:10PM +1000, Alexey Kardashevskiy wrote:
>>>>>> Added (msi|msix)_set_message() functions.
>>>>>>
>>>>>> Currently msi_notify()/msix_notify() write to these vectors to
>>>>>> signal the guest about an interrupt so the correct values have to
>>>>>> be written there by the guest or QEMU.
>>>>>>
>>>>>> For example, POWER guest never initializes MSI/MSIX vectors, instead
>>>>>> it uses RTAS hypercalls. So in order to support MSIX for virtio-pci on
>>>>>> POWER we have to initialize MSI/MSIX message from QEMU.
>>>>>>
>>>>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
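To illustrate why QEMU has to fill the vectors in when the guest does not, here is a toy model in C. All names (ToyDevice, ToyMSIMessage, toy_notify, toy_set_message, toy_bus_write) are hypothetical and are not QEMU's API; the only assumption is that a notify routine reads the stored (address, data) pair and turns it into a memory write, so an unprogrammed message has no valid target.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical toy model, not QEMU code: an MSI "vector" is just an
 * (address, data) pair that somebody has to program. */
typedef struct {
    uint64_t address; /* bus address the device writes to */
    uint32_t data;    /* payload written on each interrupt */
} ToyMSIMessage;

typedef struct {
    ToyMSIMessage msg; /* zero until the guest or QEMU programs it */
    int enabled;       /* set when the guest enables MSI */
} ToyDevice;

/* Stand-in for the bus write that delivers an MSI. */
static int toy_bus_write(uint64_t address, uint32_t data)
{
    if (address == 0) {
        /* Nobody programmed the vector, so there is no valid target. */
        return -1;
    }
    printf("MSI write: 0x%08x -> 0x%016llx\n",
           (unsigned)data, (unsigned long long)address);
    return 0;
}

/* Rough analogue of msi_notify(): it only reads the stored message. */
static int toy_notify(const ToyDevice *dev)
{
    if (!dev->enabled) {
        return -1;
    }
    return toy_bus_write(dev->msg.address, dev->msg.data);
}

/* Rough analogue of the proposed msi_set_message(): platform code (on
 * POWER, the RTAS handler in QEMU) programs the message for the guest. */
static void toy_set_message(ToyDevice *dev, ToyMSIMessage msg)
{
    assert(dev != NULL);
    dev->msg = msg;
}
```

Until toy_set_message() is called, toy_notify() has nowhere to deliver the interrupt, which is the situation a POWER guest leaves QEMU in when it configures MSI through RTAS instead of writing the vectors itself.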
>>>>>
>>>>> So guests do enable MSI through config space, but do
>>>>> not fill in vectors?
>>>>
>>>> Yes. msix_capability_init() calls arch_setup_msi_irqs(), which does everything it needs to do (i.e. calls the hypervisor) before msix_capability_init() writes PCI_MSIX_FLAGS_ENABLE to the PCI_MSIX_FLAGS register.
>>>>
>>>> These vectors are PCI bus addresses; the way they are set is specific to the PCI host controller, so I do not see why the current scheme is a bug.
>>>
>>> It won't work with any real PCI device, will it? Real PCI devices expect
>>> vectors to be written into their memory.
>>
>>
>> Yes. And the hypervisor does this. On POWER (at least book3s, i.e. server powerpc), the whole config space kitchen is hidden behind RTAS (a kind of BIOS). For the guest, RTAS is implemented in the hypervisor; for the host, in the system firmware. So powerpc Linux does not need PHB drivers. Kinda cool.
>>
>> A typical powerpc server runs without a host Linux at all; it runs a hypervisor called pHyp. Every guest knows that it is a guest: there is no full machine emulation, it is para-virtualization. In power-kvm, we replace pHyp with host Linux, and QEMU now plays the hypervisor role. Some day we may move the hypervisor into the host kernel completely (?), but for now it is in QEMU.
>
> Okay. So it is a POWER-specific weirdness, as I suspected.
> Sure, if this is what real hardware does we pretty much have to
> emulate this.
>
>>>>> Very strange. Are you sure it's not
>>>>> just a guest bug? How does it work for other PCI devices?
>>>>
>>>> Did not get the question. It works the same for every PCI device under POWER guest.
>>>
>>> I mean for real PCI devices.
>>>
>>>>> Can't we just fix guest drivers to program the vectors properly?
>>>>>
>>>>> Also pls address the comment below.
>>>>
>>>> Comment below.
>>>>
>>>>> Thanks!
>>>>>
>>>>>> ---
>>>>>> hw/msi.c | 13 +++++++++++++
>>>>>> hw/msi.h | 1 +
>>>>>> hw/msix.c | 9 +++++++++
>>>>>> hw/msix.h | 2 ++
>>>>>> 4 files changed, 25 insertions(+)
>>>>>>
>>>>>> diff --git a/hw/msi.c b/hw/msi.c
>>>>>> index 5233204..cc6102f 100644
>>>>>> --- a/hw/msi.c
>>>>>> +++ b/hw/msi.c
>>>>>> @@ -105,6 +105,19 @@ static inline uint8_t msi_pending_off(const PCIDevice* dev, bool msi64bit)
>>>>>> return dev->msi_cap + (msi64bit ? PCI_MSI_PENDING_64 : PCI_MSI_PENDING_32);
>>>>>> }
>>>>>>
>>>>>> +void msi_set_message(PCIDevice *dev, MSIMessage msg)
>>>>>> +{
>>>>>> + uint16_t flags = pci_get_word(dev->config + msi_flags_off(dev));
>>>>>> + bool msi64bit = flags & PCI_MSI_FLAGS_64BIT;
>>>>>> +
>>>>>> + if (msi64bit) {
>>>>>> + pci_set_quad(dev->config + msi_address_lo_off(dev), msg.address);
>>>>>> + } else {
>>>>>> + pci_set_long(dev->config + msi_address_lo_off(dev), msg.address);
>>>>>> + }
>>>>>> + pci_set_word(dev->config + msi_data_off(dev, msi64bit), msg.data);
>>>>>> +}
>>>>>> +
>>>>>
>>>>> Please add documentation. Something like
>>>>>
>>>>> /*
>>>>> * Special API for POWER to configure the vectors through
>>>>> * a side channel. Should never be used by devices.
>>>>> */
>>>>
>>>>
>>>> It is useful for any para-virtualized environment, I believe, isn't it?
>>>> For s390 as well, for example, provided it supports PCI, which I am not sure it does :)
>>>
>>> I expect the normal guest to program the address into MSI register using
>>> config accesses, same way that it enables MSI/MSIX.
>>> Why POWER does it differently I did not yet figure out but I hope
>>> this weirdness is not so widespread.
>>
>>
>> In para-virt I would expect the guest not to touch config space at all. At least it should use one interface rather than two, but this is how it is.
>
> It's not new that firmware developers consistently make inconsistent
> design decisions :)
It depends on how you look at it. Enabling MSI via the config space is also done through a special set of hypervisor calls (common and IBM-specific), so it is all hidden in one place, the system firmware, which is cool: no PHB drivers in the guest. MSI alone would not need an additional hypercall to initialize vectors (everything can be done via config space), but MSI-X stores its vectors in a BAR, and there is no hypercall for BARs since they are simply memory-mapped. I think this is why the firmware people (or pHyp, but that is probably the same) added the IBM-specific MSI/MSI-X config hypercalls.
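For reference, a sketch of what a helper like msix_set_message() has to do to one MSI-X table entry in the BAR. The offsets follow the 16-byte entry layout from the PCI specification (the same values as the PCI_MSIX_ENTRY_* constants), but they are redefined under local names here, and sketch_msix_set_entry() is a hypothetical standalone function, not QEMU code. Unlike the patch, which clears the mask bit in the first byte of the Vector Control word, it reads and rewrites the whole 32-bit little-endian word; since the mask bit is bit 0, the effect is the same.

```c
#include <assert.h>
#include <stdint.h>

/* MSI-X table entry layout from the PCI specification; same values as
 * the PCI_MSIX_ENTRY_* constants, redefined so the sketch stands alone. */
#define MSIX_ENTRY_SIZE          16
#define MSIX_ENTRY_LOWER_ADDR     0
#define MSIX_ENTRY_UPPER_ADDR     4
#define MSIX_ENTRY_DATA           8
#define MSIX_ENTRY_VECTOR_CTRL   12
#define MSIX_ENTRY_CTRL_MASKBIT  0x1u

/* PCI structures are little-endian regardless of host byte order. */
static void put_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

static uint32_t get_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: program entry `vector` of an MSI-X table and
 * unmask it, as msix_set_message() in the patch does. */
static void sketch_msix_set_entry(uint8_t *table, unsigned nentries,
                                  unsigned vector, uint64_t address,
                                  uint32_t data)
{
    uint8_t *entry;
    uint32_t ctrl;

    assert(vector < nentries);
    entry = table + vector * MSIX_ENTRY_SIZE;

    put_le32(entry + MSIX_ENTRY_LOWER_ADDR, (uint32_t)address);
    put_le32(entry + MSIX_ENTRY_UPPER_ADDR, (uint32_t)(address >> 32));
    put_le32(entry + MSIX_ENTRY_DATA, data);

    /* Clear only the per-vector mask bit; other bits are preserved. */
    ctrl = get_le32(entry + MSIX_ENTRY_VECTOR_CTRL);
    put_le32(entry + MSIX_ENTRY_VECTOR_CTRL, ctrl & ~MSIX_ENTRY_CTRL_MASKBIT);
}
```

Because the table lives in a BAR, the guest would normally reach these 16 bytes with plain MMIO stores; there is no config-space path to them, which is the gap the hypercalls fill.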
And I do not quite understand why the MSI-X designers could not use the extended PCI config space, which is 4096 bytes: quite a lot, enough for 256 vectors (I have not seen a card that asked for more than 9 _per function_). If somebody really needs 2048, they may want 16384 as well (or any other crazy number), so why put such a limit at all when it is a BAR, and BARs are huge? :) Ah, off-topic anyway.
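The sizes mentioned above can be checked with a little arithmetic, assuming 16 bytes per MSI-X table entry and the 11-bit Table Size field (encoded as N - 1) from the PCI specification. Note that the first 256 bytes of the 4096-byte extended config space are the conventional config space, so only 240 entries would fit above offset 0x100. The helper names are hypothetical.

```c
#include <assert.h>

/* Values from the PCI / PCI Express specifications. */
#define CONVENTIONAL_CONFIG_SIZE 256u   /* classic PCI config space */
#define EXTENDED_CONFIG_SIZE     4096u  /* PCIe extended config space */
#define MSIX_ENTRY_SIZE          16u    /* bytes per MSI-X table entry */
#define MSIX_TABLE_SIZE_MASK     0x7ffu /* 11-bit Table Size field, N - 1 */

/* Largest table the 11-bit field can describe. */
static unsigned msix_max_entries(void)
{
    return MSIX_TABLE_SIZE_MASK + 1;
}

/* Bytes of BAR space needed for a table with nentries vectors. */
static unsigned msix_table_bytes(unsigned nentries)
{
    assert(nentries >= 1 && nentries <= msix_max_entries());
    return nentries * MSIX_ENTRY_SIZE;
}

/* How many table entries would fit into a region of the given size. */
static unsigned msix_entries_that_fit(unsigned bytes)
{
    return bytes / MSIX_ENTRY_SIZE;
}
```

So a full 2048-entry table needs 32 KiB of BAR space, eight times the whole extended config space, while the 9 vectors mentioned above need only 144 bytes.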
>>>>>> bool msi_enabled(const PCIDevice *dev)
>>>>>> {
>>>>>> return msi_present(dev) &&
>>>>>> diff --git a/hw/msi.h b/hw/msi.h
>>>>>> index 75747ab..6ec1f99 100644
>>>>>> --- a/hw/msi.h
>>>>>> +++ b/hw/msi.h
>>>>>> @@ -31,6 +31,7 @@ struct MSIMessage {
>>>>>>
>>>>>> extern bool msi_supported;
>>>>>>
>>>>>> +void msi_set_message(PCIDevice *dev, MSIMessage msg);
>>>>>> bool msi_enabled(const PCIDevice *dev);
>>>>>> int msi_init(struct PCIDevice *dev, uint8_t offset,
>>>>>> unsigned int nr_vectors, bool msi64bit, bool msi_per_vector_mask);
>>>>>> diff --git a/hw/msix.c b/hw/msix.c
>>>>>> index ded3c55..5f7d6d3 100644
>>>>>> --- a/hw/msix.c
>>>>>> +++ b/hw/msix.c
>>>>>> @@ -45,6 +45,15 @@ static MSIMessage msix_get_message(PCIDevice *dev, unsigned vector)
>>>>>> return msg;
>>>>>> }
>>>>>>
>>>>>> +void msix_set_message(PCIDevice *dev, int vector, struct MSIMessage msg)
>>>>>> +{
>>>>>> + uint8_t *table_entry = dev->msix_table_page + vector * PCI_MSIX_ENTRY_SIZE;
>>>>>> +
>>>>>> + pci_set_quad(table_entry + PCI_MSIX_ENTRY_LOWER_ADDR, msg.address);
>>>>>> + pci_set_long(table_entry + PCI_MSIX_ENTRY_DATA, msg.data);
>>>>>> + table_entry[PCI_MSIX_ENTRY_VECTOR_CTRL] &= ~PCI_MSIX_ENTRY_CTRL_MASKBIT;
>>>>>> +}
>>>>>> +
>>>>>> /* Add MSI-X capability to the config space for the device. */
>>>>>> /* Given a bar and its size, add MSI-X table on top of it
>>>>>> * and fill MSI-X capability in the config space.
>>>>>> diff --git a/hw/msix.h b/hw/msix.h
>>>>>> index 50aee82..26a437e 100644
>>>>>> --- a/hw/msix.h
>>>>>> +++ b/hw/msix.h
>>>>>> @@ -4,6 +4,8 @@
>>>>>> #include "qemu-common.h"
>>>>>> #include "pci.h"
>>>>>>
>>>>>> +void msix_set_message(PCIDevice *dev, int vector, MSIMessage msg);
>>>>>> +
>>>>>> int msix_init(PCIDevice *pdev, unsigned short nentries,
>>>>>> MemoryRegion *bar,
>>>>>> unsigned bar_nr, unsigned bar_size);
>>>>>> --
>>>>>> 1.7.10
>>>>>>
>>>>>> P.S. A double '-' followed by the git version is an end-of-patch scissor line, as I read somewhere; I cannot recall where exactly :)
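A standalone sketch of the MSI capability write done by msi_set_message() in the patch above, using the register offsets from the PCI specification (as in Linux's pci_regs.h: address at +4, data at +8 for 32-bit and +12 for 64-bit capabilities). It shows why the 32-bit branch writes only a long: in that layout the data register sits at +8, where a 64-bit capability keeps its upper address, so a quad write there would spill into the data register. The function and constant names are hypothetical, and unlike the patch the sketch asserts instead of silently truncating a 64-bit address on a 32-bit capability.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Offsets within the MSI capability, as in Linux's pci_regs.h. */
#define MSI_FLAGS        2
#define MSI_FLAGS_64BIT  0x0080u
#define MSI_ADDRESS_LO   4
#define MSI_ADDRESS_HI   8   /* 64-bit capabilities only */
#define MSI_DATA_32      8
#define MSI_DATA_64      12

static uint16_t get_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static void put_le16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
}

static void put_le32(uint8_t *p, uint32_t v)
{
    put_le16(p, (uint16_t)v);
    put_le16(p + 2, (uint16_t)(v >> 16));
}

/* Hypothetical standalone version of msi_set_message(): `config` is the
 * device's config space and `cap` the offset of its MSI capability. */
static void sketch_msi_set_message(uint8_t *config, unsigned cap,
                                   uint64_t address, uint16_t data)
{
    uint8_t *msi = config + cap;
    bool msi64bit = (get_le16(msi + MSI_FLAGS) & MSI_FLAGS_64BIT) != 0;

    put_le32(msi + MSI_ADDRESS_LO, (uint32_t)address);
    if (msi64bit) {
        put_le32(msi + MSI_ADDRESS_HI, (uint32_t)(address >> 32));
    } else {
        /* No upper address register: +8 is already the data register. */
        assert((address >> 32) == 0);
    }
    put_le16(msi + (msi64bit ? MSI_DATA_64 : MSI_DATA_32), data);
}
```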
>>>>>>
>>>>>> On 21/06/12 20:56, Jan Kiszka wrote:
>>>>>>> On 2012-06-21 12:50, Alexey Kardashevskiy wrote:
>>>>>>>> On 21/06/12 20:38, Jan Kiszka wrote:
>>>>>>>>> On 2012-06-21 12:28, Alexey Kardashevskiy wrote:
>>>>>>>>>> On 21/06/12 17:39, Jan Kiszka wrote:
>>>>>>>>>>> On 2012-06-21 09:18, Alexey Kardashevskiy wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> agrhhh. sha1 of the patch changed after rebasing :)
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Added (msi|msix)_(set|get)_message() function for whoever might
>>>>>>>>>>>> want to use them.
>>>>>>>>>>>>
>>>>>>>>>>>> Currently msi_notify()/msix_notify() write to these vectors to
>>>>>>>>>>>> signal the guest about an interrupt so the correct values have to
>>>>>>>>>>>> be written there by the guest or QEMU.
>>>>>>>>>>>>
>>>>>>>>>>>> For example, POWER guest never initializes MSI/MSIX vectors, instead
>>>>>>>>>>>> it uses RTAS hypercalls. So in order to support MSIX for virtio-pci on
>>>>>>>>>>>> POWER we have to initialize MSI/MSIX message from QEMU.
>>>>>>>>>>>>
>>>>>>>>>>>> As only the set* functions are required for now, the "get" functions were added
>>>>>>>>>>>> or made public for symmetry.
>>>>>>>>>>>>
>>>>>>>>>>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>>>>>>>>>>> ---
>>>>>>>>>>>> hw/msi.c | 29 +++++++++++++++++++++++++++++
>>>>>>>>>>>> hw/msi.h | 2 ++
>>>>>>>>>>>> hw/msix.c | 11 ++++++++++-
>>>>>>>>>>>> hw/msix.h | 3 +++
>>>>>>>>>>>> 4 files changed, 44 insertions(+), 1 deletion(-)
>>>>>>>>>>>>
>>>>>>>>>>>> diff --git a/hw/msi.c b/hw/msi.c
>>>>>>>>>>>> index 5233204..9ad84a4 100644
>>>>>>>>>>>> --- a/hw/msi.c
>>>>>>>>>>>> +++ b/hw/msi.c
>>>>>>>>>>>> @@ -105,6 +105,35 @@ static inline uint8_t msi_pending_off(const PCIDevice* dev, bool msi64bit)
>>>>>>>>>>>> return dev->msi_cap + (msi64bit ? PCI_MSI_PENDING_64 : PCI_MSI_PENDING_32);
>>>>>>>>>>>> }
>>>>>>>>>>>>
>>>>>>>>>>>> +MSIMessage msi_get_message(PCIDevice *dev)
>>>>>>>>>>>
>>>>>>>>>>> MSIMessage msi_get_message(PCIDevice *dev, unsigned vector)
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Who/how/why is going to calculate the vector here?
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> +{
>>>>>>>>>>>> + uint16_t flags = pci_get_word(dev->config + msi_flags_off(dev));
>>>>>>>>>>>> + bool msi64bit = flags & PCI_MSI_FLAGS_64BIT;
>>>>>>>>>>>> + MSIMessage msg;
>>>>>>>>>>>> +
>>>>>>>>>>>> + if (msi64bit) {
>>>>>>>>>>>> + msg.address = pci_get_quad(dev->config + msi_address_lo_off(dev));
>>>>>>>>>>>> + } else {
>>>>>>>>>>>> + msg.address = pci_get_long(dev->config + msi_address_lo_off(dev));
>>>>>>>>>>>> + }
>>>>>>>>>>>> + msg.data = pci_get_word(dev->config + msi_data_off(dev, msi64bit));
>>>>>>>>>>>
>>>>>>>>>>> And I have this here in addition:
>>>>>>>>>>>
>>>>>>>>>>> unsigned int nr_vectors = msi_nr_vectors(flags);
>>>>>>>>>>> ...
>>>>>>>>>>>
>>>>>>>>>>> if (nr_vectors > 1) {
>>>>>>>>>>> msg.data &= ~(nr_vectors - 1);
>>>>>>>>>>> msg.data |= vector;
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>> See PCI spec and existing code.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> What for? I really do not get why someone might want to read something other than the real value.
>>>>>>>>>> Which PCI code should I look at?
>>>>>>>>>
>>>>>>>>> I'm not sure what your use case for reading the message is. For KVM
>>>>>>>>> device assignment it is preparing an alternative message delivery path
>>>>>>>>> for MSI vectors. And for this we will need vector notifier support for
>>>>>>>>> MSI as well. You can check the MSI-X code for corresponding use cases of
>>>>>>>>> msix_get_message.
>>>>>>>>
>>>>>>>>> And when we already have msi_get_message, another logical use case is
>>>>>>>>> msi_notify. See msix.c again.
>>>>>>>>
>>>>>>>> Aaaa.
>>>>>>>>
>>>>>>>> I have no use case for reading the message. All I need is writing. And I want it public because I want to use
>>>>>>>> it from hw/spapr_pci.c. You suggested adding reading, so I added "get" to be _symmetric_ with "set"
>>>>>>>> ("get" returns what "set" wrote). You want a different thing which I can do but it is not
>>>>>>>> msi_get_message(), it is something like msi_prepare_message(MSImessage msg) or
>>>>>>>> msi_set_vector(uint16_t data) or simply internal kitchen of msi_notify().
>>>>>>>>
>>>>>>>> Still can do what you suggested, it just does not seem right.
>>>>>>>
>>>>>>> It is right - when looking at it from a different angle. ;)
>>>>>>>
>>>>>>> I don't mind if you add msi_get_message now or leave this to me. Likely
>>>>>>> the latter is better as you have no use case for msi_get_message (and
>>>>>>> also msix_get_message!) outside of their modules, thus we should not
>>>>>>> export those functions anyway.
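Jan's point earlier in the thread about multi-message MSI can be sketched as follows: the Multiple Message Enable field (bits 6:4 of the MSI flags) holds log2 of the number of allocated vectors, and the device signals vector v by replacing the low bits of the programmed data with v. This mirrors the snippet Jan quoted (msi_nr_vectors() and the data masking), but as hypothetical standalone functions, not QEMU's actual code.

```c
#include <assert.h>
#include <stdint.h>

#define MSI_FLAGS_QSIZE 0x0070u /* Multiple Message Enable, bits 6:4 */

/* Number of vectors the guest enabled: 2^(Multiple Message Enable). */
static unsigned sketch_msi_nr_vectors(uint16_t flags)
{
    return 1u << ((flags & MSI_FLAGS_QSIZE) >> 4);
}

/* Data value the device writes to signal `vector`, given the base data
 * programmed into the capability. */
static uint16_t sketch_msi_vector_data(uint16_t flags, uint16_t base_data,
                                       unsigned vector)
{
    unsigned nr_vectors = sketch_msi_nr_vectors(flags);
    uint16_t data = base_data;

    assert(vector < nr_vectors);
    if (nr_vectors > 1) {
        /* Low log2(nr_vectors) bits carry the vector number. */
        data = (uint16_t)(data & ~(nr_vectors - 1));
        data = (uint16_t)(data | vector);
    }
    return data;
}
```

With 8 vectors enabled (field value 3) and programmed data 0x4b, vector 5 would be signalled with data 0x4d. This is the computation a "get" that takes a vector argument would perform, as opposed to simply returning what "set" wrote.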
--
Alexey