linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Gleb Natapov <gleb@redhat.com>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Avi Kivity <avi@redhat.com>,
	Marcelo Tosatti <mtosatti@redhat.com>,
	kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH RFC] kvm: enable irq injection from interrupt context
Date: Thu, 16 Sep 2010 16:51:17 +0200	[thread overview]
Message-ID: <20100916145117.GK3008@redhat.com> (raw)
In-Reply-To: <20100916142335.GC24850@redhat.com>

On Thu, Sep 16, 2010 at 04:23:35PM +0200, Michael S. Tsirkin wrote:
> > > > > There is no such thing as device's line status on real hardware, either.
> > > > > Devices do not drive INT# high: they drive it low (all the time)
> > > > > or do not drive it at all.
> > > > Same thing, other naming. Device either drive it low (irq_set(1)) or
> > > > not (irq_set(0)).
> > > 
> > > Well, if I drive it low any number of times it should hae no effect.
> > > 
> > There is no meaning of "driving low the line multiple times" on real HW.
> > You either drive it low or not. We try to emulate this with individual
> > "drive low/high" events in software.
> > > > > 
> > > > > Or consider express, the spec explicitly says:
> > > > > "Note: Duplicate Assert_INTx/Deassert_INTx Messages have no effect, but
> 
> Express has assert and deassert messages. It might be easier for
> you to think in these terms. level 1: assert, level 0: deassert.
> Seems a simple model and what we do models this pretty well.
> 
They are between device and pci controller. I am talking about what
happens between pci controller and irq chip. We are talking about
different things really.

> > > > >  are not errors."
> > > > > 
> > > > > > > 
> > > > > > > > > Another application is out of process virtio (sandboxing!).
> > > > > > > > It will still assert and de-assert irq at the same code, so it will be
> > > > > > > > able to track irq line status.
> > > > > > > > 
> > > > > > > > > Again, pci stuff needs to stay in qemu.
> > > > > > > > > 
> > > > > > > > 
> > > > > > > > Nothing to do with PCI whatsoever.
> > > > > > > > 
> > > > > > > > [0] most qemu devices behave incorrectly and trigger level irq more then
> > > > > > > >     needed.
> > > > > > > 
> > > > > > > Which devices?
> > > > > > Most of them. They just call update_irq_status() or something and
> > > > > > re-assert interrupt regardless of what previous status was.
> > > > > 
> > > > > At least for PCI devices, these calls do nothing if status does not change.
> > > > I am not sure what code are you locking at. e1000 device emulation
> > > > doesn't check previous line status before calling qemu_set_irq().
> > > 
> > > Right. If you dig through useless levels of indirection, you will
> > > see that each PCI device has 4 pin levels, when one of these
> > > changes this makes it up level to the parent bus, and so on.
> > Yes. Qemu PCI level does it right. Ideally device would not even invoke 
> > this logic if interrupt status haven't changed.
> 
> It needs to call *some* function to check status
> and assert, right? qemu_set_irq is that function.
qemu_set_irq does not check previous level and calls a callback
unconditionally.

> 
> > > 
> > > > > 
> > > > > > > pci core tracks line status and will never assert the same
> > > > > > > line multiple times.
> > > > > > That's good if pci core does this, but device shouldn't even try it.
> > > > > 
> > > > > I disagree. We don't want to duplicate a ton of code all over
> > > > > the codebase.
> > > > > 
> > > > So abstract it into a function. It shouldn't be part of PCI emulation.
> > > 
> > > I don't know what you mean by this, send a patch and we can discuss?
> > I don't care enough to send patch.  Just remember previous irq status
> > and do not call qemu_set_irq() if it doesn't change. Three lines of
> > code.
> 
> Heh, we have a ton of devices to support.
So?

> And then we need to migrate this extra status, and make sure it's in
> sync with PCI code.  We'll end up with much more more than 3 lines all
> of it in a very sensitive and hard to test parts code.
> 
You should be able to reconstruct it from device state. What should be
in sync with PCI code? 

> > > Note that when I patches PCI interrupt handling for compliance
> > > I made it mimic hardware as closely as possible: devices
> > > can send any # of assert/deassert messages, bus discards duplicates.
> > > 
> > Qemu PCI code is correct as far as I can see. Not all devices are connected
> > via PCI and there is not need to go through couple of layer of
> > indirection to figure out that nothing should be done.
> > 
> 
> If we want to remove the indirection I would be much more
> interested to remove it for all cases, not just when
> nothing should be done.
I don't care. This indirection may be justified for all I know. You try to
shift this discussion to areas I am not interested to look into :) All I
am saying is that each device is capable of knowing its current irq line
state and optimize out function call + additional logic. Whether upper
layer should handle two asserts without de-assert in between is different
point and I think we agree on it.

> 
> > > > > > > 
> > > > > > > > [1] this is how correct PCI device should behave but we override
> > > > > > > >     polarity in ACPI, but now incorrect behaviour is deeply designed
> > > > > > > >     into vhost-net.
> > > > > > > 
> > > > > > > Not really, vhost net signals an eventfd. What happens then is
> > > > > > > up to kvm.
> > > > > > > 
> > > > > > That is what current broken design does and it works, but if you want to
> > > > > > save unneeded calls into kvm fix design.
> > > > > 
> > > > > The interface seems clean enough: vhost handles virtio ring, qemu/kvm handle pci.
> > > > > Making vhost aware of pci breaks this, I would not call that fixing the
> > > > > design.
> > > > > 
> > > > Once again. Nothing to do with PCI, everything to do with device
> > > > emulation. And I propose to abstract interrupt assertion part into
> > > > irqfd, not into vhost.
> > > > 
> > > > --
> > > > 			Gleb.
> > > 
> > > This could work. KVM would need to find all irqfd
> > > objects mapped to gsi and notify them on deassert.
> > > No idea how hard this is.
> > > 
> > What for? Device emulation should do de-assert.
> 
> Sorry, but at this point I have no idea what you call device emulation.
The same thing everyone calls device emulation. In case of virtio-net it
is in hw/virtio-net.c. If vhost-net is in use device emulation is split
between userspace and kernel, but it is still just device emulation.

> qemu has code to de-assert. vhost has code to assert.
Good. So qemu will de-assert. So what do you mean by
"KVM would need to find all irqfd objects mapped to gsi and notify
them on deassert"

> I would like to optimize level interrupts and stop driving
> scheduler insane if at all possible.
> 
Worthy goal. Do it in irqfd. Irqfd shouldn't call kvm_set_irq() if irq
level hasn't changed.

--
			Gleb.

  reply	other threads:[~2010-09-16 14:51 UTC|newest]

Thread overview: 37+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-09-15 18:54 [PATCH RFC] kvm: enable irq injection from interrupt context Michael S. Tsirkin
2010-09-16  9:02 ` Gleb Natapov
2010-09-16  9:10   ` Michael S. Tsirkin
2010-09-16  9:25     ` Gleb Natapov
2010-09-16  9:43       ` Michael S. Tsirkin
2010-09-16  9:46       ` Avi Kivity
2010-09-16  9:53         ` Michael S. Tsirkin
2010-09-16 10:13           ` Gleb Natapov
2010-09-16 10:13             ` Michael S. Tsirkin
2010-09-16 10:20               ` Gleb Natapov
2010-09-16 10:44                 ` Michael S. Tsirkin
2010-09-16 10:54                   ` Gleb Natapov
2010-09-16 10:53                     ` Michael S. Tsirkin
2010-09-16 11:17                       ` Gleb Natapov
2010-09-16 12:13                         ` Michael S. Tsirkin
2010-09-16 12:33                           ` Gleb Natapov
2010-09-16 12:57                             ` Michael S. Tsirkin
2010-09-16 13:14                               ` Avi Kivity
2010-09-16 13:38                                 ` Michael S. Tsirkin
2010-09-16 13:50                                   ` Avi Kivity
2010-09-16 13:55                                   ` Gleb Natapov
2010-09-16 13:18                               ` Gleb Natapov
2010-09-16 13:51                                 ` Michael S. Tsirkin
2010-09-16 14:06                                   ` Gleb Natapov
2010-09-16 14:23                                     ` Michael S. Tsirkin
2010-09-16 14:51                                       ` Gleb Natapov [this message]
2010-09-16 15:24                                         ` Michael S. Tsirkin
2010-09-16 15:43                                           ` Gleb Natapov
2010-09-16 22:07                                             ` Michael S. Tsirkin
2010-09-17  7:59                                               ` Gleb Natapov
2010-09-19 10:45                                                 ` Michael S. Tsirkin
2010-09-19 10:56                                                   ` Avi Kivity
2010-09-19 10:55                                                     ` Michael S. Tsirkin
2010-09-19 11:05                                                       ` Avi Kivity
2010-09-19 11:23                                                         ` Michael S. Tsirkin
2010-09-19 11:18                                                       ` Gleb Natapov
2010-09-19 11:21                                                         ` Michael S. Tsirkin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100916145117.GK3008@redhat.com \
    --to=gleb@redhat.com \
    --cc=avi@redhat.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mst@redhat.com \
    --cc=mtosatti@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).