qemu-devel.nongnu.org archive mirror
From: Gleb Natapov <gleb@redhat.com>
To: Anthony Liguori <anthony@codemonkey.ws>
Cc: "Michael S . Tsirkin" <mst@redhat.com>,
	Alexey Kardashevskiy <aik@ozlabs.ru>,
	Alexander Graf <agraf@suse.de>,
	qemu-devel <qemu-devel@nongnu.org>,
	qemu-trivial@nongnu.org,
	Alex Williamson <alex.williamson@redhat.com>,
	qemu-ppc@nongnu.org, Paolo Bonzini <pbonzini@redhat.com>,
	Paul Mackerras <paulus@samba.org>,
	David Gibson <david@gibson.dropbear.id.au>
Subject: Re: [Qemu-devel] [PATCH] RFC kvm irqfd: add directly mapped MSI IRQ support
Date: Mon, 24 Jun 2013 16:06:42 +0300
Message-ID: <20130624130642.GJ18508@redhat.com>
In-Reply-To: <87sj07bsf3.fsf@codemonkey.ws>

On Mon, Jun 24, 2013 at 07:32:32AM -0500, Anthony Liguori wrote:
> Gleb Natapov <gleb@redhat.com> writes:
> 
> > On Sun, Jun 23, 2013 at 10:06:05AM -0500, Anthony Liguori wrote:
> >> On Thu, Jun 20, 2013 at 11:46 PM, Alex Williamson
> >> <alex.williamson@redhat.com> wrote:
> >> > On Fri, 2013-06-21 at 12:49 +1000, Alexey Kardashevskiy wrote:
> >> >> On 06/21/2013 12:34 PM, Alex Williamson wrote:
> >> >>
> >> >>
> >> >> I do not follow you, sorry. For x86, is it the MSI routing table that is
> >> >> updated via KVM_SET_GSI_ROUTING in KVM? When there is no KVM, what piece of
> >> >> code responds to msi_notify() in qemu-x86 and does qemu_irq_pulse()?
> >> >
> >> > vfio_msi_interrupt->msi[x]_notify->stl_le_phys(msg.address, msg.data)
> >> >
> >> > This writes directly to the interrupt block on the vCPU.  With KVM, the
> >> > in-kernel APIC does the same write, where the pin-to-MSIMessage mapping is
> >> > set up by kvm_irqchip_add_msi_route and the pin is pulled by an irqfd.
> >> 
> >> What is this "interrupt block on the vCPU" you speak of?  I reviewed
> > The FEE00000H address as seen from the PCI bus is a special address range
> > (see 10.11.1 in the SDM).
> 
> Ack.
> 
> > Any write by a PCI device to that address range is
> > interpreted as an MSI. We do not model this correctly in QEMU yet, since
> > all devices, including vcpus, see exactly the same memory map.
> 
> This should be a per-device mapping, yes.  But I'm not sure that VCPUs
> should even see anything.  I don't think a VCPU can generate an MSI
> interrupt by writing to this location.
> 
No, and the lower 4K of this range is where the APIC is mapped as seen from
the CPU.
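
To make the initiator-dependent view concrete, here is a minimal sketch in C.
It assumes the default local APIC base (the base can be relocated) and uses
hypothetical names (classify_write, enum initiator, enum write_kind) that are
not QEMU types. What a CPU write to the rest of the window does is left
unmodelled.

```c
#include <stdint.h>

/* Address bits 31:20 == 0xFEE select the 1 MiB window discussed above. */
#define MSI_RANGE_BASE  0xFEE00000u
#define MSI_RANGE_MASK  0xFFF00000u
/* Lower 4K of the window: local APIC registers at the default base. */
#define APIC_MMIO_SIZE  0x1000u

/* Hypothetical names for illustration only; these are not QEMU types. */
enum initiator  { INITIATOR_CPU, INITIATOR_PCI_DEVICE };
enum write_kind { WRITE_ORDINARY, WRITE_LOCAL_APIC_MMIO, WRITE_MSI };

/* Classify a 32-bit write according to who issued it. */
enum write_kind classify_write(enum initiator who, uint32_t addr)
{
    if ((addr & MSI_RANGE_MASK) != MSI_RANGE_BASE) {
        return WRITE_ORDINARY;          /* outside the special window */
    }
    if (who == INITIATOR_PCI_DEVICE) {
        return WRITE_MSI;               /* any device write in the window */
    }
    if (addr - MSI_RANGE_BASE < APIC_MMIO_SIZE) {
        return WRITE_LOCAL_APIC_MMIO;   /* CPU sees its APIC registers here */
    }
    return WRITE_ORDINARY;              /* not modelled in this sketch */
}
```

The same address gives a different answer depending on the initiator, which
is exactly what a single shared memory map cannot express.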

> >> the SDM and see nothing in the APIC protocol or the brief description
> >> of MSI as a PCI concept that would indicate anything except that the
> >> PHB handles MSI writes and feeds them to the I/O APIC.
> >> 
> > I/O APIC? Did you mean APIC? Even that would probably be incorrect.
> > I'd say it translates the data into an APIC bus message. And with interrupt
> > remapping, more magic happens between the MSI and the APIC bus.
> 
> I think the wording in the SDM allows either.
> 
The SDM says nothing about it, but since we are guessing anyway, the I/O APIC
would be my last guess.  The I/O APIC has a well-defined role: it detects
level/edge interrupts on its input pins and sends a preconfigured APIC
message to the APIC bus when one is detected. In fact, it is even mapped at a
different address, 0FEC00000H. Of course, since the I/O APIC and the logic
that maps MSI writes to APIC bus messages are probably on the same chip
anyway, you can call this logic a part of the I/O APIC, but that is
stretching it too much IMO :)
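
As a rough illustration of "the logic that maps MSI writes to an APIC bus
message", here is a sketch in C that decodes the x86 MSI address and data
fields as laid out in the SDM message address/data register format. The
struct and function names are hypothetical, and interrupt remapping is
ignored entirely.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the decoded fields; not a QEMU or KVM type. */
struct apic_bus_msg {
    uint8_t dest_id;          /* address bits 19:12 */
    bool    redirection_hint; /* address bit 3 */
    bool    logical_dest;     /* address bit 2 (destination mode) */
    uint8_t vector;           /* data bits 7:0 */
    uint8_t delivery_mode;    /* data bits 10:8 */
    bool    level_triggered;  /* data bit 15 (trigger mode) */
};

/*
 * Decode an MSI write. Returns false if the address is outside the
 * FEExxxxxH window, in which case *out is left untouched.
 */
bool msi_to_apic_msg(uint32_t addr, uint32_t data, struct apic_bus_msg *out)
{
    if ((addr & 0xFFF00000u) != 0xFEE00000u) {
        return false;
    }
    out->dest_id          = (addr >> 12) & 0xFF;
    out->redirection_hint = (addr >> 3) & 1;
    out->logical_dest     = (addr >> 2) & 1;
    out->vector           = data & 0xFF;
    out->delivery_mode    = (data >> 8) & 0x7;
    out->level_triggered  = (data >> 15) & 1;
    return true;
}
```

Note that nothing here involves an input pin: the message is derived purely
from the address and data of the write.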

> >> In fact, the wikipedia article on MSI has:
> >> 
> >> "A common misconception with Message Signaled Interrupts is that they
> >> allow the device to send data to a processor as part of the interrupt.
> >> The data that is sent as part of the write is used by the chipset to
> >> determine which interrupt to trigger on which processor; it is not
> >> available for the device to communicate additional information to the
> >> interrupt handler."
> >> 
> > Not sure who claimed otherwise.
> 
> So to summarize:
> 
> 1) MSI writes are intercepted by the PHB, which generates an appropriate
>    IRQ.
> 
> 2) The PHB has a tuple of (src device, address, data) plus whatever
>    information it maintains to do the translation.
> 
> 3) On Power, we can have multiple PHBs.
> 
Looks like that means we need to put more information into the kernel,
not less.

> 4) The kernel interface assumes a single flat table mapping (address,
>    data) to interrupts.  We try to keep that table up-to-date in QEMU.
> 
> 5) The reason the kernel has MSI info at all is to allow for IRQFDs to
>    generate MSI interrupts.
> 
> Is there anything that prevents us from using IRQFDs corresponding to
> the target of an MSI mapping and getting rid of the MSI info in the kernel?
> 
Again, you assume that x86 has some pin that an MSI triggers. This is not
the case; address/data is the minimum needed to inject an interrupt
there (the alternative is moving the APIC into userspace, since this is
where the "translation" happens).
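
A toy model of the flat table from point 4, only to show why each entry has
to carry address/data. All names here (struct msi_route, add_msi_route,
lookup_route) are hypothetical and this is not the KVM interface. The number
an irqfd would be bound to is just an index into the table, with no wire
behind it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical route entry: a table index plus the MSI message itself. */
struct msi_route {
    uint32_t gsi;      /* table key an irqfd would be bound to */
    uint64_t address;  /* MSI address, needed to derive the destination */
    uint32_t data;     /* MSI data, needed to derive the vector and mode */
};

#define MAX_ROUTES 8
static struct msi_route routes[MAX_ROUTES];
static size_t nr_routes;

/* Append a route; returns 0 on success, -1 if the table is full. */
int add_msi_route(uint32_t gsi, uint64_t address, uint32_t data)
{
    if (nr_routes == MAX_ROUTES) {
        return -1;
    }
    routes[nr_routes++] = (struct msi_route){ gsi, address, data };
    return 0;
}

/* What signalling an irqfd resolves to: the stored message, or NULL. */
const struct msi_route *lookup_route(uint32_t gsi)
{
    for (size_t i = 0; i < nr_routes; i++) {
        if (routes[i].gsi == gsi) {
            return &routes[i];
        }
    }
    return NULL;
}
```

Dropping address/data from the entry would leave only the key, and on x86
there is nothing that key alone could be delivered to.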

> It seems like the only sane way to actually support (2) and (3).
> 
> Regards,
> 
> Anthony Liguori

--
			Gleb.
