public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed
From: Thomas Gleixner <tglx@linutronix.de>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: Jan Kiszka <jan.kiszka@siemens.com>,
	Richard Weinberger <richard@nod.at>,
	kvm@vger.kernel.org, avi@redhat.com,
	Marcelo Tosatti <mtosatti@redhat.com>,
	Bjorn Helgaas <bhelgaas@google.com>,
	"Michael S. Tsirkin" <mst@redhat.com>
Subject: Re: [PATCH 2/2] Device assignment: Fix MSI IRQ affinity setting
Date: Fri, 25 May 2012 01:01:46 +0200 (CEST)	[thread overview]
Message-ID: <alpine.LFD.2.02.1205250018550.3231@ionos> (raw)
In-Reply-To: <1337897506.4714.55.camel@ul30vt>

On Thu, 24 May 2012, Alex Williamson wrote:
> On Thu, 2012-05-24 at 18:53 -0300, Jan Kiszka wrote:
> > On 2012-05-24 18:39, Thomas Gleixner wrote:
> > > On Thu, 24 May 2012, Alex Williamson wrote:
> > >> On Thu, 2012-05-24 at 18:02 +0100, Richard Weinberger wrote:
> > >>> +            if (address == msi_start + PCI_MSI_DATA_32)
> > >>> +                handle_cfg_write_msi(pci_dev, assigned_dev);
> > >>
> > >> Why didn't we just use range_covers_byte(address, len, pci_dev->msi_cap
> > >> + PCI_MSI_DATA_32) to start with?  But how does this handle the enable
> > >> bit?
> > > 
> > > The problem with the current implementation is that it only changes
> > > the routing if the msi entry goes from masked to unmasked state.
> > > 
> > > Linux does not mask the entries on affinity changes and never did,
> > > neither for MSI nor for MSI-X.
> > > 
> > > I know it's probably not according to the spec, but we can't fix that
> > > retroactively.
> > 
> > For MSI, this is allowed. For MSI-X, this would clearly be a Linux bug,
> > waiting for hardware to dislike this spec violation.
> > 
> > However, if this is the current behavior of such a prominent guest, I
> > guess we have to stop optimizing the QEMU MSI-X code that it only
> > updates routings on mask changes. Possibly other OSes get this wrong too...
> > 
> > Thanks, for the clarification. Should go into the changelog.
> 
> Hmm, if Linux didn't mask MSIX before updating vectors it'd not only be
> a spec violation, but my testing of the recent changes to fix MSIX
> vector updates for exactly this would have failed...
> 
>         } else if (msix_masked(&orig) && !msix_masked(entry)) {
>             ... update vector...
> 
> So I'm not entirely sure I believe that.  Thanks,

What happens is:

A write to /proc/irq/$N/smp_affinity calls into irq_set_affinity()
which does:

	if (irq_can_move_pcntxt(data)) {
		ret = chip->irq_set_affinity(data, mask, false);
	} else {
		irqd_set_move_pending(data);
		irq_copy_pending(desc, mask);
	}

MSI and MSI-X fall into the !irq_can_move_pcntxt() code path unless
the irq is remapped, which is not the case in a guest. That means that
we merily copy the new mask and set the move pending bit. 

MSI/MSI-X use the edge handler so on the next incoming interrupt, we
do

  irq_desc->chip->irq_ack()

which ends up in ack_apic_edge() which does:

static void ack_apic_edge(struct irq_data *data)
{
	irq_complete_move(data->chip_data);
	irq_move_irq(data);
	ack_APIC_irq();
}

irq_move_irq() is the interesting function. And that does

  irq_desc->chip->irq_mask()

before calling the irq_set_affinity() function which actually changes
the masks.
	      
->irq_mask() ends up in mask_msi_irq(). 

Now that calls msi_set_mask_bit() and for MSI-X that actually masks
the irq. So ignore my MSI-X comment.

But for MSI this ends up in msi_mask_irq() which does:

        if (!desc->msi_attrib.maskbit)
	   return 0;

So in case desc->msi_attrib.maskbit is 0 we do not write anything out
and then the masked/unmasked logic in qemu fails.

Sorry, that I did not decode that down to this level before, but I was
in a hurry and assumed correctly that qemu is doing something
wrong. Not being familiar with that code did not help either :)

So the proper fix is that qemu tells the guest that mask bit is
supported and catches the mask bit toggling before writing it out to
the hardware for those devices which do not support it.

We'll have another look.

Thanks,

	tglx

  reply	other threads:[~2012-05-24 23:01 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-05-24 17:02 [PATCH 1/2] Remove kvm_commit_irq_routes from error messages Richard Weinberger
2012-05-24 17:02 ` [PATCH 2/2] Device assignment: Fix MSI IRQ affinity setting Richard Weinberger
2012-05-24 18:20   ` Alex Williamson
     [not found]     ` <CAEMbtc+ycsC6u=CZ_Yg6C=WV=VqjA2uEDM5KWPM_7n3sZh_9Pw@mail.gmail.com>
2012-05-24 19:27       ` Richard Weinberger
2012-05-24 21:39     ` Thomas Gleixner
2012-05-24 21:53       ` Jan Kiszka
2012-05-24 22:11         ` Alex Williamson
2012-05-24 23:01           ` Thomas Gleixner [this message]
2012-05-24 23:23             ` Alex Williamson
2012-05-24 23:56               ` Thomas Gleixner
2012-05-25  2:37                 ` Jan Kiszka
2012-05-24 22:17         ` Michael S. Tsirkin
2012-05-24 23:06           ` Thomas Gleixner
2012-05-24 23:19             ` Thomas Gleixner
2012-05-24 22:05       ` Alex Williamson
2012-05-24 20:47   ` Jan Kiszka

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=alpine.LFD.2.02.1205250018550.3231@ionos \
    --to=tglx@linutronix.de \
    --cc=alex.williamson@redhat.com \
    --cc=avi@redhat.com \
    --cc=bhelgaas@google.com \
    --cc=jan.kiszka@siemens.com \
    --cc=kvm@vger.kernel.org \
    --cc=mst@redhat.com \
    --cc=mtosatti@redhat.com \
    --cc=richard@nod.at \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox