From: Gleb Natapov <gleb@redhat.com>
To: Anthony Liguori <anthony@codemonkey.ws>
Cc: "Michael S . Tsirkin" <mst@redhat.com>,
	Alexey Kardashevskiy <aik@ozlabs.ru>,
	Alexander Graf <agraf@suse.de>,
	qemu-devel <qemu-devel@nongnu.org>,
	qemu-trivial@nongnu.org,
	Alex Williamson <alex.williamson@redhat.com>,
	qemu-ppc@nongnu.org, Paolo Bonzini <pbonzini@redhat.com>,
	Paul Mackerras <paulus@samba.org>,
	David Gibson <david@gibson.dropbear.id.au>
Subject: Re: [Qemu-trivial] [Qemu-devel] [PATCH] RFC kvm irqfd: add directly mapped MSI IRQ support
Date: Mon, 24 Jun 2013 16:06:42 +0300
Message-ID: <20130624130642.GJ18508@redhat.com>
In-Reply-To: <87sj07bsf3.fsf@codemonkey.ws>

On Mon, Jun 24, 2013 at 07:32:32AM -0500, Anthony Liguori wrote:
> Gleb Natapov <gleb@redhat.com> writes:
> 
> > On Sun, Jun 23, 2013 at 10:06:05AM -0500, Anthony Liguori wrote:
> >> On Thu, Jun 20, 2013 at 11:46 PM, Alex Williamson
> >> <alex.williamson@redhat.com> wrote:
> >> > On Fri, 2013-06-21 at 12:49 +1000, Alexey Kardashevskiy wrote:
> >> >> On 06/21/2013 12:34 PM, Alex Williamson wrote:
> >> >>
> >> >>
> >> >> Do not follow you, sorry. For x86, is it that MSI routing table which is
> >> >> updated via KVM_SET_GSI_ROUTING in KVM? When there is no KVM, what piece of
> >> >> code responds on msi_notify() in qemu-x86 and does qemu_irq_pulse()?
> >> >
> >> > vfio_msi_interrupt->msi[x]_notify->stl_le_phys(msg.address, msg.data)
> >> >
> >> > This writes directly to the interrupt block on the vCPU.  With KVM, the
> >> > in-kernel APIC does the same write, where the pin to MSIMessage is setup
> >> > by kvm_irqchip_add_msi_route and the pin is pulled by an irqfd.
> >> 
> >> What is this "interrupt block on the vCPU" you speak of?  I reviewed
> > FEE00000H address as seen from PCI bus is a special address range (see
> > 10.11.1 in SDM).
> 
> Ack.
> 
> > Any write by a PCI device to that address range is
> > interpreted as MSI. We do not model this correctly in QEMU yet since
> > all devices, including vcpus, see exactly same memory map.
> 
> This should be a per-device mapping, yes.  But I'm not sure that VCPUs
> should even see anything.  I don't think a VCPU can generate an MSI
> interrupt by writing to this location.
> 
No, and the lower 4k of this space is where the APIC is mapped as seen
from the CPU.

> >> the SDM and see nothing in the APIC protocol or the brief description
> >> of MSI as a PCI concept that would indicate anything except that the
> >> PHB handles MSI writes and feeds them to the I/O APIC.
> >> 
> > I/O APIC? Did you mean APIC? Even that will probably be incorrect.
> > I'd say it translates the data to an APIC bus message. And with interrupt
> > remapping, more magic happens between MSI and the APIC bus.
> 
> I think the wording in the SDM allows either.
> 
SDM says nothing about it, but since we are guessing anyway, my last guess
would be the I/O APIC.  The I/O APIC has a well-defined role: it detects
level/edge interrupts on its input pins and sends a preconfigured APIC
message to the APIC bus when one is detected. In fact, it's even mapped at
a different address, 0FEC00000H. Of course, since the I/O APIC and the
logic that maps MSI writes to APIC bus messages are probably on the same
chip anyway, you can call this logic a part of the I/O APIC, but this is
stretching it too much IMO :)
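
To make the address-space point concrete, here is a minimal sketch of how
the same write could be routed differently depending on who issues it.
It is an illustration only, not QEMU or KVM code: the enum and function
names are hypothetical, the addresses are the architectural defaults
mentioned above, and the 4k size assumed for the I/O APIC window is a
simplification.

```c
#include <stdint.h>

/* Hypothetical types for illustration; not QEMU or KVM definitions. */
enum initiator { INIT_CPU, INIT_PCI_DEVICE };
enum target { TGT_LOCAL_APIC, TGT_IOAPIC, TGT_MSI, TGT_OTHER };

#define MSI_BASE    0xFEE00000ull /* MSI window as seen from the PCI bus */
#define MSI_SIZE    0x00100000ull /* address bits 31:20 == 0xFEE */
#define LAPIC_SIZE  0x00001000ull /* lower 4k, as seen from the CPU */
#define IOAPIC_BASE 0xFEC00000ull /* default I/O APIC address */
#define IOAPIC_SIZE 0x00001000ull /* assumed window size for this sketch */

static int in_range(uint64_t addr, uint64_t base, uint64_t size)
{
    return addr >= base && addr - base < size;
}

/* Classify a write by (initiator, address) rather than by address alone. */
static enum target classify_write(enum initiator who, uint64_t addr)
{
    if (who == INIT_PCI_DEVICE) {
        /* Any device write into the window is interpreted as an MSI. */
        return in_range(addr, MSI_BASE, MSI_SIZE) ? TGT_MSI : TGT_OTHER;
    }
    /* CPU view: the local APIC registers live in the lower 4k. */
    if (in_range(addr, MSI_BASE, LAPIC_SIZE)) {
        return TGT_LOCAL_APIC;
    }
    /* The I/O APIC is a separate device at a different address. */
    if (in_range(addr, IOAPIC_BASE, IOAPIC_SIZE)) {
        return TGT_IOAPIC;
    }
    return TGT_OTHER;
}
```

With a single shared memory map, the first argument does not exist, which
is why the CPU and device views cannot be told apart.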

> >> In fact, the wikipedia article on MSI has:
> >> 
> >> "A common misconception with Message Signaled Interrupts is that they
> >> allow the device to send data to a processor as part of the interrupt.
> >> The data that is sent as part of the write is used by the chipset to
> >> determine which interrupt to trigger on which processor; it is not
> >> available for the device to communicate additional information to the
> >> interrupt handler."
> >> 
> > Not sure who claimed otherwise.
> 
> So to summarize:
> 
> 1) MSI writes are intercepted by the PHB, which generates an
>    appropriate IRQ.
> 
> 2) The PHB has a tuple of (src device, address, data) plus whatever
>    information it maintains to do the translation.
> 
> 3) On Power, we can have multiple PHBs.
> 
Looks like that means we need to put more information into the kernel,
not less.

> 4) The kernel interface assumes a single flat table mapping (address,
>    data) to interrupts.  We try to keep that table up-to-date in QEMU.
> 
> 5) The reason the kernel has MSI info at all is to allow for IRQFDs to
>    generate MSI interrupts.
> 
> Is there anything that prevents us from using IRQFDs corresponding to
> the target of an MSI mapping and get rid of the MSI info in the kernel?
> 
Again, you assume that x86 has some pin that MSI triggers. This is not
the case; address/data is the minimum that is needed to inject an
interrupt there (the alternative is moving the APIC into userspace, since
this is where the "translation" is happening).
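
As a rough illustration of why address/data is the minimum on x86, the
sketch below decodes an MSI address/data pair into the fields of the
resulting APIC message, following the compatibility-format layout in the
SDM section referenced earlier (no interrupt remapping). The struct and
function names are hypothetical, not taken from QEMU or the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical struct for illustration; not a QEMU or KVM type. */
struct msi_decoded {
    uint8_t dest_id;       /* address bits 19:12: destination APIC ID */
    bool    redir_hint;    /* address bit 3: redirection hint */
    bool    logical_dest;  /* address bit 2: 0 = physical, 1 = logical */
    uint8_t vector;        /* data bits 7:0: interrupt vector */
    uint8_t delivery_mode; /* data bits 10:8: 0 = fixed, 1 = lowest prio */
    bool    level_trigger; /* data bit 15: 0 = edge, 1 = level */
};

/* An x86 MSI address has bits 31:20 equal to 0xFEE. */
static bool msi_addr_in_range(uint64_t addr)
{
    return (addr >> 20) == 0xFEE;
}

/* There is no pin number anywhere in here, only a destination and vector. */
static struct msi_decoded msi_decode(uint64_t addr, uint32_t data)
{
    struct msi_decoded d;

    assert(msi_addr_in_range(addr));
    d.dest_id       = (addr >> 12) & 0xFF;
    d.redir_hint    = (addr >> 3) & 1;
    d.logical_dest  = (addr >> 2) & 1;
    d.vector        = data & 0xFF;
    d.delivery_mode = (data >> 8) & 0x7;
    d.level_trigger = (data >> 15) & 1;
    return d;
}
```

Everything needed to deliver the interrupt comes out of the two values
the device writes, so a pin-indexed interface would have to carry the
same information in some other form.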

> It seems like the only sane way to actually support (2) and (3).
> 
> Regards,
> 
> Anthony Liguori

--
			Gleb.

