* Re: pci_enable_msix() fails with ENOMEM/EINVAL
[not found] ` <1353960253.1809.128.camel@bling.home>
@ 2012-11-29 8:42 ` Alex Lyakas
2012-11-29 15:56 ` Alex Williamson
0 siblings, 1 reply; 8+ messages in thread
From: Alex Lyakas @ 2012-11-29 8:42 UTC (permalink / raw)
To: Alex Williamson; +Cc: kvm, linux-pci, Yair Hershko, Shyam Kaushik
Hi Alex,
did I understand correctly that "vector" value with which the call to
request_threaded_irq() is made is *not* supposed to be zero? Because in my
case, it is always zero, and still the failure I observe is not happening
always. Usually, 3 unique (non-zero) IRQ numbers are assigned to each
attached PCI device of each KVM VM.
I will try to repro this like you suggested and let you know.
Thanks for your help,
Alex.
-----Original Message-----
From: Alex Williamson
Sent: 26 November, 2012 10:04 PM
To: Alex Lyakas
Cc: kvm@vger.kernel.org
Subject: Re: pci_enable_msix() fails with ENOMEM/EINVAL
On Thu, 2012-11-22 at 10:52 +0200, Alex Lyakas wrote:
> Hi Alex,
> thanks for your response.
>
> I printed out the "vector" and "entry" values of dev->host_msix_entries[i]
> within assigned_device_enable_host_msix() before call to
> request_threaded_irq(). I see that they are all 0s:
> kernel: [ 3332.610980] kvm-8095: KVM_ASSIGN_DEV_IRQ assigned_dev_id=924
> kernel: [ 3332.610985] kvm-8095: assigned_device_enable_host_msix()
> assigned_dev_id=924 #0: [v=0 e=0]
> kernel: [ 3332.610989] kvm-8095: assigned_device_enable_host_msix()
> assigned_dev_id=924 #1: [v=0 e=1]
> kernel: [ 3332.610992] kvm-8095: assigned_device_enable_host_msix()
> assigned_dev_id=924 #2: [v=0 e=2]
>
> So I don't really understand how they all ask for irq=0; I must be missing
> something. Is there any other explanation of request_threaded_irq() to
> return EBUSY? From the code I don't see that there is.
The vectors all being zero sounds like an indication that
pci_enable_msix() didn't actually work. Each of those should be a
unique vector. Does booting the host with "nointremap" perhaps make a
difference? Maybe we can isolate the problem to the interrupt remapper
code.
> This issue is reproducible and is not going to go away by itself. Working
> around it is also problematic. We thought to check whether all IRQs are
> properly attached after QEMU sets the vm state to "running". However, vm
> state is set to "running" before IRQ attachments are performed; we
> debugged
> this and found out that they are done from a different thread, from a
> stack
> trace like this:
> kvm_assign_irq()
> assigned_dev_update_msix()
> assigned_dev_pci_write_config()
> pci_host_config_write_common()
> pci_data_write()
> pci_host_data_write()
> memory_region_write_accessor()
> access_with_adjusted_size()
> memory_region_iorange_write()
> ioport_writew_thunk()
> ioport_write()
> cpu_outw()
> kvm_handle_io()
> kvm_cpu_exec()
> qemu_kvm_cpu_thread_fn()
>
> So looks like this is performed on-demand (on first IO), so no reliable
> point to check that IRQs are attached properly.
Correct, MSI-X is setup when the guest enables MSI-X on the device,
which is likely a long way into guest boot. There's no guarantee that
the guest ever enables MSI-X, so there's no association to whether the
guest is "running".
> Another issue that in KVM
> code the return value of pci_host_config_write_common() is not checked, so
> there is no way to report a failure.
A common problem in qemu, imho
> Is there any way you think you can help me debug this further?
It seems like pci_enable_msix is still failing, but perhaps silently
without irqbalance. We need to figure out where and why. Isolating it
to the interrupt remapper with "nointremap" might give us some clues
(this is an Intel VT-d system, right?). Thanks,
Alex
> -----Original Message-----
> From: Alex Williamson
> Sent: 22 November, 2012 12:25 AM
> To: Alex Lyakas
> Cc: kvm@vger.kernel.org
> Subject: Re: pci_enable_msix() fails with ENOMEM/EINVAL
>
> On Wed, 2012-11-21 at 16:19 +0200, Alex Lyakas wrote:
> > Hi,
> > I was advised to turn off irqbalance and reproduced this issue, but
> > the failure is in a different place now. Now request_threaded_irq()
> > fails with EBUSY.
> > According to the code, this can only happen on the path:
> > request_threaded_irq() -> __setup_irq()
> > Now in setup irq, the only place where EBUSY can show up for us is here:
> > ...
> > raw_spin_lock_irqsave(&desc->lock, flags);
> > old_ptr = &desc->action;
> > old = *old_ptr;
> > if (old) {
> > /*
> > * Can't share interrupts unless both agree to and are
> > * the same type (level, edge, polarity). So both flag
> > * fields must have IRQF_SHARED set and the bits which
> > * set the trigger type must match. Also all must
> > * agree on ONESHOT.
> > */
> > if (!((old->flags & new->flags) & IRQF_SHARED) ||
> > ((old->flags ^ new->flags) & IRQF_TRIGGER_MASK) ||
> > ((old->flags ^ new->flags) & IRQF_ONESHOT)) {
> > old_name = old->name;
> > goto mismatch;
> > }
> >
> > /* All handlers must agree on per-cpuness */
> > if ((old->flags & IRQF_PERCPU) !=
> > (new->flags & IRQF_PERCPU))
> > goto mismatch;
> >
> > KVM calls request_threaded_irq() with flags==0, so can it be that
> > different KVM processes request the same IRQ?
>
> Shouldn't be possible, irqs are allocated from a bitmap protected by a
> mutex, see __irq_alloc_descs
>
> > How different KVM
> > processes spawned simultaneously agree between them on IRQ numbers?
>
> They don't, MSI/X vectors are not currently share-able. Can you show
> that you're actually getting duplicate irq vectors? Thanks,
>
> Alex
>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: pci_enable_msix() fails with ENOMEM/EINVAL
2012-11-29 8:42 ` pci_enable_msix() fails with ENOMEM/EINVAL Alex Lyakas
@ 2012-11-29 15:56 ` Alex Williamson
2012-12-08 12:14 ` Alex Lyakas
0 siblings, 1 reply; 8+ messages in thread
From: Alex Williamson @ 2012-11-29 15:56 UTC (permalink / raw)
To: Alex Lyakas; +Cc: kvm, linux-pci, Yair Hershko, Shyam Kaushik
On Thu, 2012-11-29 at 10:42 +0200, Alex Lyakas wrote:
> Hi Alex,
> did I understand correctly that "vector" value with which the call to
> request_threaded_irq() is made is *not* supposed to be zero? Because in my
> case, it is always zero, and still the failure I observe is not happening
> always. Usually, 3 unique (non-zero) IRQ numbers are assigned to each
> attached PCI device of each KVM VM.
> I will try to repro this like you suggested and let you know.
pci_enable_msix should be setting vectors for all the host_msix_entries.
That non-zero vector is the first parameter to request_threaded_irq. If
I put a printk in the loop installing the interrupt handler, I get:
assigned_device_enable_host_msix entry 0, vector 44
assigned_device_enable_host_msix entry 0, vector 44
assigned_device_enable_host_msix entry 1, vector 45
assigned_device_enable_host_msix entry 0, vector 44
assigned_device_enable_host_msix entry 1, vector 45
assigned_device_enable_host_msix entry 2, vector 103
So we enable MSI-X with only one vector enabled, then a second vector
gets enabled and we disable and re-enable with two, and so on with
three. This is for an assigned e1000e device. Thanks,
Alex
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: pci_enable_msix() fails with ENOMEM/EINVAL
2012-11-29 15:56 ` Alex Williamson
@ 2012-12-08 12:14 ` Alex Lyakas
2012-12-10 20:45 ` Alex Williamson
0 siblings, 1 reply; 8+ messages in thread
From: Alex Lyakas @ 2012-12-08 12:14 UTC (permalink / raw)
To: Alex Williamson; +Cc: kvm, linux-pci, Yair Hershko, Shyam Kaushik
[-- Attachment #1: Type: text/plain, Size: 4784 bytes --]
Hi Alex,
first of all, I apologize for not reporting things properly. The "vector"
values that I see passed to request_threaded_irq() are not zeros. They are
unique IRQ values as you mentioned. I do not, however, see the
disabling-and-reenabling sequence as you wrote. The usual sequence for me is
calling KVM_ASSIGN_DEV_IRQ twice for the same assigned_dev-id:
[ 2423.990404] kvm-32353: KVM_ASSIGN_DEV_IRQ assigned_dev_id=946
[ 2423.991585] kvm-32353: assigned_device_enable_host_msix(): call
request_threaded_irq() assigned_dev_id=946 #0: [v=158 e=0]
[ 2423.991810] kvm-32353: assigned_device_enable_host_msix(): call
request_threaded_irq() assigned_dev_id=946 #1: [v=159 e=1]
[ 2423.991869] kvm-32353: assigned_device_enable_host_msix(): call
request_threaded_irq() assigned_dev_id=946 #2: [v=160 e=2]
[ 2424.054320] kvm-32353: KVM_ASSIGN_DEV_IRQ assigned_dev_id=946
[ 2424.055337] kvm-32353: assigned_device_enable_host_msix(): call
request_threaded_irq() assigned_dev_id=946 #0: [v=158 e=0]
[ 2424.055516] kvm-32353: assigned_device_enable_host_msix(): call
request_threaded_irq() assigned_dev_id=946 #1: [v=159 e=1]
[ 2424.055673] kvm-32353: assigned_device_enable_host_msix(): call
request_threaded_irq() assigned_dev_id=946 #2: [v=160 e=2]
As you requested, I specified the intremap=off option to the kernel (saw
that nointremap is deprecated), and reproduced. I had to set the
allow_unsafe_assigned_interrupts parameter of kvm.ko module.
With irqbalance running, I saw that pci_enable_msix() does not report
failure, but does not set up the interrupt verctor, and further call to
request_threaded_irq() fails with EBUSY:
[ 2425.292444] kvm-32353: KVM_ASSIGN_DEV_IRQ assigned_dev_id=913
[ 2425.292458] kvm-32353: assigned_device_enable_host_msix(): call
request_threaded_irq() assigned_dev_id=913 #0: [v=0 e=0]
[ 2425.292753] kvm-32353: assigned_device_enable_host_msix()
request_threaded_irq(vector=0) failed assigned_dev_id=913
irq_name=kvm:0000:03:12.1 err=-16
Dec 6 19:36:04 ubuntu-sata-21 kernel: [ 2425.294434] kvm-32353:
KVM_ASSIGN_DEV_IRQ assigned_dev_id=913 r=-16
Then I turned irqbalance off, and reproduced again; this time
pci_enable_msix() returns EINVAL (-22). I printed out the vector numbers
after pci_enable_msix() failure, and they are all zeros:
[151186.446688] kvm-1923: KVM_ASSIGN_DEV_IRQ assigned_dev_id=918
[151186.446705] kvm-1923: assigned_device_enable_host_msix()
pci_enable_msix() failed assigned_dev_id=918 irq_name=kvm:0000:03:12.6
err=-22
[151186.484409] kvm-1923: assigned_device_enable_host_msix()
assigned_dev_id=918 #0: [v=0 e=0]
[151186.484413] kvm-1923: assigned_device_enable_host_msix()
assigned_dev_id=918 #1: [v=0 e=1]
[151186.484416] kvm-1923: assigned_device_enable_host_msix()
assigned_dev_id=918 #2: [v=0 e=2]
[151186.484419] kvm-1923: KVM_ASSIGN_DEV_IRQ assigned_dev_id=918 r=-22
[151186.522328] kvm-1923: kvm_vm_ioctl_set_msix_nr assigned_dev_id=918
already entries_nr=3
[151186.561093] kvm-1923: kvm_vm_ioctl_set_msix_nr assigned_dev_id=918 r=-22
I attach the full printouts from these two reproductions (kvm-XXX, XXX being
current->pid).
BTW, these errnos seem pretty random; I think if I try to repro this again,
I might get different errnos.
Please let me know whether you have any suggestions on how to debug this
further.
Thanks for your help,
Alex.
-----Original Message-----
From: Alex Williamson
Sent: 29 November, 2012 5:56 PM
To: Alex Lyakas
Cc: kvm@vger.kernel.org ; linux-pci@vger.kernel.org ; Yair Hershko ; Shyam
Kaushik
Subject: Re: pci_enable_msix() fails with ENOMEM/EINVAL
On Thu, 2012-11-29 at 10:42 +0200, Alex Lyakas wrote:
> Hi Alex,
> did I understand correctly that "vector" value with which the call to
> request_threaded_irq() is made is *not* supposed to be zero? Because in my
> case, it is always zero, and still the failure I observe is not happening
> always. Usually, 3 unique (non-zero) IRQ numbers are assigned to each
> attached PCI device of each KVM VM.
> I will try to repro this like you suggested and let you know.
pci_enable_msix should be setting vectors for all the host_msix_entries.
That non-zero vector is the first parameter to request_threaded_irq. If
I put a printk in the loop installing the interrupt handler, I get:
assigned_device_enable_host_msix entry 0, vector 44
assigned_device_enable_host_msix entry 0, vector 44
assigned_device_enable_host_msix entry 1, vector 45
assigned_device_enable_host_msix entry 0, vector 44
assigned_device_enable_host_msix entry 1, vector 45
assigned_device_enable_host_msix entry 2, vector 103
So we enable MSI-X with only one vector enabled, then a second vector
gets enabled and we disable and re-enable with two, and so on with
three. This is for an assigned e1000e device. Thanks,
Alex
[-- Attachment #2: nointremap_noirqbalance.txt --]
[-- Type: text/plain, Size: 4719 bytes --]
Dec 8 12:59:13 ubuntu-sata-21 kernel: [151185.488251] kvm-1923:
KVM_ASSIGN_DEV_IRQ assigned_dev_id=921
Dec 8 12:59:13 ubuntu-sata-21 kernel: [151185.489215] kvm-1923:
assigned_device_enable_host_msix(): call request_threaded_irq()
assigned_dev_id=921 #0: [v=143 e=0]
Dec 8 12:59:13 ubuntu-sata-21 kernel: [151185.489449] kvm-1923:
assigned_device_enable_host_msix(): call request_threaded_irq()
assigned_dev_id=921 #1: [v=144 e=1]
Dec 8 12:59:13 ubuntu-sata-21 kernel: [151185.489741] kvm-1923:
assigned_device_enable_host_msix(): call request_threaded_irq()
assigned_dev_id=921 #2: [v=145 e=2]
Dec 8 12:59:13 ubuntu-sata-21 kernel: [151185.552144] kvm-1923:
KVM_ASSIGN_DEV_IRQ assigned_dev_id=921
Dec 8 12:59:13 ubuntu-sata-21 kernel: [151185.553223] kvm-1923:
assigned_device_enable_host_msix(): call request_threaded_irq()
assigned_dev_id=921 #0: [v=143 e=0]
Dec 8 12:59:13 ubuntu-sata-21 kernel: [151185.553498] kvm-1923:
assigned_device_enable_host_msix(): call request_threaded_irq()
assigned_dev_id=921 #1: [v=144 e=1]
Dec 8 12:59:13 ubuntu-sata-21 kernel: [151185.553745] kvm-1923:
assigned_device_enable_host_msix(): call request_threaded_irq()
assigned_dev_id=921 #2: [v=145 e=2]
Dec 8 12:59:14 ubuntu-sata-21 kernel: [151185.811675] kvm-1923:
KVM_ASSIGN_DEV_IRQ assigned_dev_id=938
Dec 8 12:59:14 ubuntu-sata-21 kernel: [151185.812579] kvm-1923:
assigned_device_enable_host_msix(): call request_threaded_irq()
assigned_dev_id=938 #0: [v=200 e=0]
Dec 8 12:59:14 ubuntu-sata-21 kernel: [151185.812837] kvm-1923:
assigned_device_enable_host_msix(): call request_threaded_irq()
assigned_dev_id=938 #1: [v=201 e=1]
Dec 8 12:59:14 ubuntu-sata-21 kernel: [151185.813039] kvm-1923:
assigned_device_enable_host_msix(): call request_threaded_irq()
assigned_dev_id=938 #2: [v=202 e=2]
Dec 8 12:59:14 ubuntu-sata-21 kernel: [151185.847699] kvm-1923:
KVM_ASSIGN_DEV_IRQ assigned_dev_id=938
Dec 8 12:59:14 ubuntu-sata-21 kernel: [151185.848785] kvm-1923:
assigned_device_enable_host_msix(): call request_threaded_irq()
assigned_dev_id=938 #0: [v=200 e=0]
Dec 8 12:59:14 ubuntu-sata-21 kernel: [151185.849011] kvm-1923:
assigned_device_enable_host_msix(): call request_threaded_irq()
assigned_dev_id=938 #1: [v=201 e=1]
Dec 8 12:59:14 ubuntu-sata-21 kernel: [151185.849280] kvm-1923:
assigned_device_enable_host_msix(): call request_threaded_irq()
assigned_dev_id=938 #2: [v=202 e=2]
Dec 8 12:59:14 ubuntu-sata-21 kernel: [151185.943489] kvm-1923:
KVM_ASSIGN_DEV_IRQ assigned_dev_id=943
Dec 8 12:59:14 ubuntu-sata-21 kernel: [151185.944598] kvm-1923:
assigned_device_enable_host_msix(): call request_threaded_irq()
assigned_dev_id=943 #0: [v=212 e=0]
Dec 8 12:59:14 ubuntu-sata-21 kernel: [151185.944838] kvm-1923:
assigned_device_enable_host_msix(): call request_threaded_irq()
assigned_dev_id=943 #1: [v=213 e=1]
Dec 8 12:59:14 ubuntu-sata-21 kernel: [151185.945097] kvm-1923:
assigned_device_enable_host_msix(): call request_threaded_irq()
assigned_dev_id=943 #2: [v=214 e=2]
Dec 8 12:59:14 ubuntu-sata-21 kernel: [151186.007430] kvm-1923:
KVM_ASSIGN_DEV_IRQ assigned_dev_id=943
Dec 8 12:59:14 ubuntu-sata-21 kernel: [151186.008407] kvm-1923:
assigned_device_enable_host_msix(): call request_threaded_irq()
assigned_dev_id=943 #0: [v=212 e=0]
Dec 8 12:59:14 ubuntu-sata-21 kernel: [151186.008628] kvm-1923:
assigned_device_enable_host_msix(): call request_threaded_irq()
assigned_dev_id=943 #1: [v=213 e=1]
Dec 8 12:59:14 ubuntu-sata-21 kernel: [151186.008902] kvm-1923:
assigned_device_enable_host_msix(): call request_threaded_irq()
assigned_dev_id=943 #2: [v=214 e=2]
Dec 8 12:59:14 ubuntu-sata-21 kernel: [151186.446688] kvm-1923:
KVM_ASSIGN_DEV_IRQ assigned_dev_id=918
Dec 8 12:59:14 ubuntu-sata-21 kernel: [151186.446705] kvm-1923:
assigned_device_enable_host_msix() pci_enable_msix() failed
assigned_dev_id=918 irq_name=kvm:0000:03:12.6 err=-22
Dec 8 12:59:14 ubuntu-sata-21 kernel: [151186.484409] kvm-1923:
assigned_device_enable_host_msix() assigned_dev_id=918 #0: [v=0 e=0]
Dec 8 12:59:14 ubuntu-sata-21 kernel: [151186.484413] kvm-1923:
assigned_device_enable_host_msix() assigned_dev_id=918 #1: [v=0 e=1]
Dec 8 12:59:14 ubuntu-sata-21 kernel: [151186.484416] kvm-1923:
assigned_device_enable_host_msix() assigned_dev_id=918 #2: [v=0 e=2]
Dec 8 12:59:14 ubuntu-sata-21 kernel: [151186.484419] kvm-1923:
KVM_ASSIGN_DEV_IRQ assigned_dev_id=918 r=-22
Dec 8 12:59:14 ubuntu-sata-21 kernel: [151186.522328] kvm-1923:
kvm_vm_ioctl_set_msix_nr assigned_dev_id=918 already entries_nr=3
Dec 8 12:59:14 ubuntu-sata-21 kernel: [151186.561093] kvm-1923:
kvm_vm_ioctl_set_msix_nr assigned_dev_id=918 r=-22
[-- Attachment #3: nointremap.txt --]
[-- Type: text/plain, Size: 4490 bytes --]
Dec 6 19:36:02 ubuntu-sata-21 kernel: [ 2423.990404] kvm-32353:
KVM_ASSIGN_DEV_IRQ assigned_dev_id=946
Dec 6 19:36:02 ubuntu-sata-21 kernel: [ 2423.991585] kvm-32353:
assigned_device_enable_host_msix(): call request_threaded_irq()
assigned_dev_id=946 #0: [v=158 e=0]
Dec 6 19:36:02 ubuntu-sata-21 kernel: [ 2423.991810] kvm-32353:
assigned_device_enable_host_msix(): call request_threaded_irq()
assigned_dev_id=946 #1: [v=159 e=1]
Dec 6 19:36:02 ubuntu-sata-21 kernel: [ 2423.991869] kvm-32353:
assigned_device_enable_host_msix(): call request_threaded_irq()
assigned_dev_id=946 #2: [v=160 e=2]
Dec 6 19:36:02 ubuntu-sata-21 kernel: [ 2424.054320] kvm-32353:
KVM_ASSIGN_DEV_IRQ assigned_dev_id=946
Dec 6 19:36:02 ubuntu-sata-21 kernel: [ 2424.055337] kvm-32353:
assigned_device_enable_host_msix(): call request_threaded_irq()
assigned_dev_id=946 #0: [v=158 e=0]
Dec 6 19:36:02 ubuntu-sata-21 kernel: [ 2424.055516] kvm-32353:
assigned_device_enable_host_msix(): call request_threaded_irq()
assigned_dev_id=946 #1: [v=159 e=1]
Dec 6 19:36:02 ubuntu-sata-21 kernel: [ 2424.055673] kvm-32353:
assigned_device_enable_host_msix(): call request_threaded_irq()
assigned_dev_id=946 #2: [v=160 e=2]
Dec 6 19:36:04 ubuntu-sata-21 kernel: [ 2425.292444] kvm-32353:
KVM_ASSIGN_DEV_IRQ assigned_dev_id=913
Dec 6 19:36:04 ubuntu-sata-21 kernel: [ 2425.292458] kvm-32353:
assigned_device_enable_host_msix(): call request_threaded_irq()
assigned_dev_id=913 #0: [v=0 e=0]
Dec 6 19:36:04 ubuntu-sata-21 kernel: [ 2425.292753] kvm-32353:
assigned_device_enable_host_msix() request_threaded_irq(vector=0) failed
assigned_dev_id=913 irq_name=kvm:0000:03:12.1 err=-16
Dec 6 19:36:04 ubuntu-sata-21 kernel: [ 2425.294434] kvm-32353:
KVM_ASSIGN_DEV_IRQ assigned_dev_id=913 r=-16
Dec 6 19:36:04 ubuntu-sata-21 kernel: [ 2425.295384] kvm-32353:
kvm_vm_ioctl_set_msix_nr assigned_dev_id=913 already entries_nr=3
Dec 6 19:36:04 ubuntu-sata-21 kernel: [ 2425.296442] kvm-32353:
kvm_vm_ioctl_set_msix_nr assigned_dev_id=913 r=-22
Dec 6 19:36:04 ubuntu-sata-21 kernel: [ 2425.400256] kvm-32353:
KVM_ASSIGN_DEV_IRQ assigned_dev_id=945
Dec 6 19:36:04 ubuntu-sata-21 kernel: [ 2425.401259] kvm-32353:
assigned_device_enable_host_msix(): call request_threaded_irq()
assigned_dev_id=945 #0: [v=191 e=0]
Dec 6 19:36:04 ubuntu-sata-21 kernel: [ 2425.401551] kvm-32353:
assigned_device_enable_host_msix(): call request_threaded_irq()
assigned_dev_id=945 #1: [v=192 e=1]
Dec 6 19:36:04 ubuntu-sata-21 kernel: [ 2425.401727] kvm-32353:
assigned_device_enable_host_msix(): call request_threaded_irq()
assigned_dev_id=945 #2: [v=193 e=2]
Dec 6 19:36:04 ubuntu-sata-21 kernel: [ 2425.448243] kvm-32353:
KVM_ASSIGN_DEV_IRQ assigned_dev_id=945
Dec 6 19:36:04 ubuntu-sata-21 kernel: [ 2425.449254] kvm-32353:
assigned_device_enable_host_msix(): call request_threaded_irq()
assigned_dev_id=945 #0: [v=191 e=0]
Dec 6 19:36:04 ubuntu-sata-21 kernel: [ 2425.449469] kvm-32353:
assigned_device_enable_host_msix(): call request_threaded_irq()
assigned_dev_id=945 #1: [v=192 e=1]
Dec 6 19:36:04 ubuntu-sata-21 kernel: [ 2425.449752] kvm-32353:
assigned_device_enable_host_msix(): call request_threaded_irq()
assigned_dev_id=945 #2: [v=193 e=2]
Dec 6 19:36:04 ubuntu-sata-21 kernel: [ 2425.583978] kvm-32353:
KVM_ASSIGN_DEV_IRQ assigned_dev_id=896
Dec 6 19:36:04 ubuntu-sata-21 kernel: [ 2425.584959] kvm-32353:
assigned_device_enable_host_msix(): call request_threaded_irq()
assigned_dev_id=896 #0: [v=236 e=0]
Dec 6 19:36:04 ubuntu-sata-21 kernel: [ 2425.585201] kvm-32353:
assigned_device_enable_host_msix(): call request_threaded_irq()
assigned_dev_id=896 #1: [v=237 e=1]
Dec 6 19:36:04 ubuntu-sata-21 kernel: [ 2425.585362] kvm-32353:
assigned_device_enable_host_msix(): call request_threaded_irq()
assigned_dev_id=896 #2: [v=238 e=2]
Dec 6 19:36:04 ubuntu-sata-21 kernel: [ 2425.647911] kvm-32353:
KVM_ASSIGN_DEV_IRQ assigned_dev_id=896
Dec 6 19:36:04 ubuntu-sata-21 kernel: [ 2425.648903] kvm-32353:
assigned_device_enable_host_msix(): call request_threaded_irq()
assigned_dev_id=896 #0: [v=236 e=0]
Dec 6 19:36:04 ubuntu-sata-21 kernel: [ 2425.649214] kvm-32353:
assigned_device_enable_host_msix(): call request_threaded_irq()
assigned_dev_id=896 #1: [v=237 e=1]
Dec 6 19:36:05 ubuntu-sata-21 kernel: [ 2425.649398] kvm-32353:
assigned_device_enable_host_msix(): call request_threaded_irq()
assigned_dev_id=896 #2: [v=238 e=2]
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: pci_enable_msix() fails with ENOMEM/EINVAL
2012-12-08 12:14 ` Alex Lyakas
@ 2012-12-10 20:45 ` Alex Williamson
2012-12-12 13:42 ` Alex Lyakas
0 siblings, 1 reply; 8+ messages in thread
From: Alex Williamson @ 2012-12-10 20:45 UTC (permalink / raw)
To: Alex Lyakas; +Cc: kvm, linux-pci, Yair Hershko, Shyam Kaushik
On Sat, 2012-12-08 at 14:14 +0200, Alex Lyakas wrote:
> Hi Alex,
> first of all, I apologize for not reporting things properly. The "vector"
> values that I see passed to request_threaded_irq() are not zeros. They are
> unique IRQ values as you mentioned. I do not, however, see the
> disabling-and-reenabling sequence as you wrote.
The disabling-reenabling is an artifact of the new code to handle
runtime vector updates, which enables CPU affinity of vectors and guests
that behave slightly different, like FreeBSD. This is a good reminder
that you're running an old version of qemu and should probably update to
the latest release for debugging.
> The usual sequence for me is
> calling KVM_ASSIGN_DEV_IRQ twice for the same assigned_dev-id:
>
> [ 2423.990404] kvm-32353: KVM_ASSIGN_DEV_IRQ assigned_dev_id=946
> [ 2423.991585] kvm-32353: assigned_device_enable_host_msix(): call
> request_threaded_irq() assigned_dev_id=946 #0: [v=158 e=0]
> [ 2423.991810] kvm-32353: assigned_device_enable_host_msix(): call
> request_threaded_irq() assigned_dev_id=946 #1: [v=159 e=1]
> [ 2423.991869] kvm-32353: assigned_device_enable_host_msix(): call
> request_threaded_irq() assigned_dev_id=946 #2: [v=160 e=2]
> [ 2424.054320] kvm-32353: KVM_ASSIGN_DEV_IRQ assigned_dev_id=946
> [ 2424.055337] kvm-32353: assigned_device_enable_host_msix(): call
> request_threaded_irq() assigned_dev_id=946 #0: [v=158 e=0]
> [ 2424.055516] kvm-32353: assigned_device_enable_host_msix(): call
> request_threaded_irq() assigned_dev_id=946 #1: [v=159 e=1]
> [ 2424.055673] kvm-32353: assigned_device_enable_host_msix(): call
> request_threaded_irq() assigned_dev_id=946 #2: [v=160 e=2]
>
> As you requested, I specified the intremap=off option to the kernel (saw
> that nointremap is deprecated), and reproduced. I had to set the
> allow_unsafe_assigned_interrupts parameter of kvm.ko module.
Yep, sorry I forgot to mention that. What kernel are you trying all of
this on? If it's not a recent upstream kernel, please verify on that.
> With irqbalance running, I saw that pci_enable_msix() does not report
> failure, but does not set up the interrupt verctor, and further call to
> request_threaded_irq() fails with EBUSY:
>
> [ 2425.292444] kvm-32353: KVM_ASSIGN_DEV_IRQ assigned_dev_id=913
> [ 2425.292458] kvm-32353: assigned_device_enable_host_msix(): call
> request_threaded_irq() assigned_dev_id=913 #0: [v=0 e=0]
> [ 2425.292753] kvm-32353: assigned_device_enable_host_msix()
> request_threaded_irq(vector=0) failed assigned_dev_id=913
> irq_name=kvm:0000:03:12.1 err=-16
> Dec 6 19:36:04 ubuntu-sata-21 kernel: [ 2425.294434] kvm-32353:
> KVM_ASSIGN_DEV_IRQ assigned_dev_id=913 r=-16
>
> Then I turned irqbalance off, and reproduced again; this time
> pci_enable_msix() returns EINVAL (-22). I printed out the vector numbers
> after pci_enable_msix() failure, and they are all zeros:
>
> [151186.446688] kvm-1923: KVM_ASSIGN_DEV_IRQ assigned_dev_id=918
> [151186.446705] kvm-1923: assigned_device_enable_host_msix()
> pci_enable_msix() failed assigned_dev_id=918 irq_name=kvm:0000:03:12.6
> err=-22
> [151186.484409] kvm-1923: assigned_device_enable_host_msix()
> assigned_dev_id=918 #0: [v=0 e=0]
> [151186.484413] kvm-1923: assigned_device_enable_host_msix()
> assigned_dev_id=918 #1: [v=0 e=1]
> [151186.484416] kvm-1923: assigned_device_enable_host_msix()
> assigned_dev_id=918 #2: [v=0 e=2]
> [151186.484419] kvm-1923: KVM_ASSIGN_DEV_IRQ assigned_dev_id=918 r=-22
> [151186.522328] kvm-1923: kvm_vm_ioctl_set_msix_nr assigned_dev_id=918
> already entries_nr=3
> [151186.561093] kvm-1923: kvm_vm_ioctl_set_msix_nr assigned_dev_id=918 r=-22
>
> I attach the full printouts from these two reproductions (kvm-XXX, XXX being
> current->pid).
>
> BTW, these errnos seem pretty random; I think if I try to repro this again,
> I might get different errnos.
>
> Please let me know whether you have any suggestions on how to debug this
> further.
So you have two failure modes. In one case pci_enable_msix returns
success and in the other it returns failure, but in both cases the
vectors are all zero. I think the next step is to try to reproduce on
current upstream kernel and qemu. If those still generate similar
failures instrument the pci_enable_msix path to pinpoint where it's
going wrong. Thanks,
Alex
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: pci_enable_msix() fails with ENOMEM/EINVAL
2012-12-10 20:45 ` Alex Williamson
@ 2012-12-12 13:42 ` Alex Lyakas
2012-12-12 20:10 ` Alex Williamson
0 siblings, 1 reply; 8+ messages in thread
From: Alex Lyakas @ 2012-12-12 13:42 UTC (permalink / raw)
To: Alex Williamson; +Cc: kvm, linux-pci, Yair Hershko, Shyam Kaushik
Hi Alex,
The kernel am using is stock Ubuntu-Precise (as I mentioned)
3.2.0-29-generic #46.
I debugged deeper in pci_enable_msix() path, and I believe I have tracked
down the source of the different errors I am seeing. The issue I see is that
pci_find_capability() *sometimes* fails to find PCI_CAP_ID_MSIX capability.
I tracked this down to __pci_find_next_cap_ttl() not being able to find the
needed cap and returns 0.
This leads to three different errors I saw:
Case 1: pci_enable_msix() calls pci_msi_check_device(dev, nvec,
PCI_CAP_ID_MSIX), which calls pci_find_capability() and further returns
EINVAL
Case 2: pci_enable_msix() calls pci_msix_table_size(), which calls
pci_find_capability(dev, PCI_CAP_ID_MSIX), which returns 0. As a result,
pci_msix_table_size() returns 0. Then the following code:
nr_entries = pci_msix_table_size(dev);
if (nvec > nr_entries)
return nr_entries;
causes pci_enable_msix() to return 0 to the caller, so now the caller thinks
that pci_enable_msix() succeeded! However, IRQs were not assigned, so
further call to request_threaded_irq() fails.
(This looks like a kernel bug.)
Case 3: pci_enable_msix() calls msix_capability_init(), which calls
pci_find_capability(dev, PCI_CAP_ID_MSIX), which fails and returns pos==0.
So now msix_capability_init() does not check the return value and proceeds
with setting things up, and eventually fails in call to msix_map_region() =>
ioremap_nocache(), which returns NULL. So msix_capability_init() returns
ENOMEM and further pci_enable_msix() returns ENOMEM.
(So this is another kernel bug of not checking return value of
pci_find_capability(dev, PCI_CAP_ID_MSIX).)
So bottom line is that different failures I saw stem from the same issue of
(sometimes) not finding PCI_CAP_ID_MSIX. It always happens on the second
call to KVM_ASSIGN_DEV_IRQ. Why are two calls required? For example, the
below trace shows the first call succeeding and setting up interrupts
properly, and the second call failing:
kernel: [ 4851.332462] kvm-7636: KVM_ASSIGN_DEV_IRQ assigned_dev_id=918
kernel: [ 4851.333553] pci-stub 0000:03:12.6: irq 257 for MSI/MSI-X
kernel: [ 4851.333573] pci-stub 0000:03:12.6: irq 258 for MSI/MSI-X
kernel: [ 4851.333591] pci-stub 0000:03:12.6: irq 259 for MSI/MSI-X
kernel: [ 4851.333600] kvm-7636: assigned_device_enable_host_msix(): call
request_threaded_irq() assigned_dev_id=918 #0: [v=257 e=0]
kernel: [ 4851.333845] kvm-7636: assigned_device_enable_host_msix(): call
request_threaded_irq() assigned_dev_id=918 #1: [v=258 e=1]
kernel: [ 4851.334055] kvm-7636: assigned_device_enable_host_msix(): call
request_threaded_irq() assigned_dev_id=918 #2: [v=259 e=2]
kernel: [ 4851.334241] kvm-7636: KVM_ASSIGN_DEV_IRQ assigned_dev_id=918 OK
kernel: [ 4851.396339] kvm-7636: KVM_ASSIGN_DEV_IRQ assigned_dev_id=918
kernel: [ 4851.396353] kvm-???: pci_find_capability
__pci_find_next_cap(devfn=150 pos=0) dev=0000:03:12.6 returned 0
kernel: [ 4851.412754] kvm-???: msix_map_region ioremap_nocache(0x100000,
nr_entries=2048 PCI_MSIX_ENTRY_SIZE=16 dev=0000:03:12.6 pos=0 NULL
kernel: [ 4851.429672] kvm-???: msix_capability_init
msix_map_region(dev=0000:03:12.6, pos=0 control=32767) failed NULL
kernel: [ 4851.438726] kvm-???: pci_enable_msix(0000:03:12.6)
msix_capability_init failed, err=-12
kernel: [ 4851.465439] kvm-7636: assigned_device_enable_host_msix()
pci_enable_msix() failed assigned_dev_id=918 irq_name=kvm:0000:03:12.6
err=-12
kernel: [ 4851.484072] kvm-7636: assigned_device_enable_host_msix()
assigned_dev_id=918 #0: [v=0 e=0]
kernel: [ 4851.484076] kvm-7636: assigned_device_enable_host_msix()
assigned_dev_id=918 #1: [v=0 e=1]
kernel: [ 4851.484081] kvm-7636: assigned_device_enable_host_msix()
assigned_dev_id=918 #2: [v=0 e=2]
kernel: [ 4851.484086] kvm-7636: KVM_ASSIGN_DEV_IRQ assigned_dev_id=918
r=-12
Do you think the issue is the second KVM call, or the issue is the HW not
reporting the capability sometimes?
Looking at the latest 3.7 kernel code sources, I see that all relevant
functions (pci_enable_msix, pci_find_capability and msix_capability_init)
would behave the same way WRT to pci_find_capability() call. And anyways,
moving to upstream kernel is not an option for me at this point.
Please let me know what do you think the next step would be.
Thanks,
Alex.
-----Original Message-----
From: Alex Williamson
Sent: 10 December, 2012 10:45 PM
To: Alex Lyakas
Cc: kvm@vger.kernel.org ; linux-pci@vger.kernel.org ; Yair Hershko ; Shyam
Kaushik
Subject: Re: pci_enable_msix() fails with ENOMEM/EINVAL
On Sat, 2012-12-08 at 14:14 +0200, Alex Lyakas wrote:
> Hi Alex,
> first of all, I apologize for not reporting things properly. The "vector"
> values that I see passed to request_threaded_irq() are not zeros. They are
> unique IRQ values as you mentioned. I do not, however, see the
> disabling-and-reenabling sequence as you wrote.
The disabling-reenabling is an artifact of the new code to handle
runtime vector updates, which enables CPU affinity of vectors and guests
that behave slightly different, like FreeBSD. This is a good reminder
that you're running an old version of qemu and should probably update to
the latest release for debugging.
> The usual sequence for me is
> calling KVM_ASSIGN_DEV_IRQ twice for the same assigned_dev-id:
>
> [ 2423.990404] kvm-32353: KVM_ASSIGN_DEV_IRQ assigned_dev_id=946
> [ 2423.991585] kvm-32353: assigned_device_enable_host_msix(): call
> request_threaded_irq() assigned_dev_id=946 #0: [v=158 e=0]
> [ 2423.991810] kvm-32353: assigned_device_enable_host_msix(): call
> request_threaded_irq() assigned_dev_id=946 #1: [v=159 e=1]
> [ 2423.991869] kvm-32353: assigned_device_enable_host_msix(): call
> request_threaded_irq() assigned_dev_id=946 #2: [v=160 e=2]
> [ 2424.054320] kvm-32353: KVM_ASSIGN_DEV_IRQ assigned_dev_id=946
> [ 2424.055337] kvm-32353: assigned_device_enable_host_msix(): call
> request_threaded_irq() assigned_dev_id=946 #0: [v=158 e=0]
> [ 2424.055516] kvm-32353: assigned_device_enable_host_msix(): call
> request_threaded_irq() assigned_dev_id=946 #1: [v=159 e=1]
> [ 2424.055673] kvm-32353: assigned_device_enable_host_msix(): call
> request_threaded_irq() assigned_dev_id=946 #2: [v=160 e=2]
>
> As you requested, I specified the intremap=off option to the kernel (saw
> that nointremap is deprecated), and reproduced. I had to set the
> allow_unsafe_assigned_interrupts parameter of kvm.ko module.
Yep, sorry I forgot to mention that. What kernel are you trying all of
this on? If it's not a recent upstream kernel, please verify on that.
> With irqbalance running, I saw that pci_enable_msix() does not report
> failure, but does not set up the interrupt verctor, and further call to
> request_threaded_irq() fails with EBUSY:
>
> [ 2425.292444] kvm-32353: KVM_ASSIGN_DEV_IRQ assigned_dev_id=913
> [ 2425.292458] kvm-32353: assigned_device_enable_host_msix(): call
> request_threaded_irq() assigned_dev_id=913 #0: [v=0 e=0]
> [ 2425.292753] kvm-32353: assigned_device_enable_host_msix()
> request_threaded_irq(vector=0) failed assigned_dev_id=913
> irq_name=kvm:0000:03:12.1 err=-16
> Dec 6 19:36:04 ubuntu-sata-21 kernel: [ 2425.294434] kvm-32353:
> KVM_ASSIGN_DEV_IRQ assigned_dev_id=913 r=-16
>
> Then I turned irqbalance off, and reproduced again; this time
> pci_enable_msix() returns EINVAL (-22). I printed out the vector numbers
> after pci_enable_msix() failure, and they are all zeros:
>
> [151186.446688] kvm-1923: KVM_ASSIGN_DEV_IRQ assigned_dev_id=918
> [151186.446705] kvm-1923: assigned_device_enable_host_msix()
> pci_enable_msix() failed assigned_dev_id=918 irq_name=kvm:0000:03:12.6
> err=-22
> [151186.484409] kvm-1923: assigned_device_enable_host_msix()
> assigned_dev_id=918 #0: [v=0 e=0]
> [151186.484413] kvm-1923: assigned_device_enable_host_msix()
> assigned_dev_id=918 #1: [v=0 e=1]
> [151186.484416] kvm-1923: assigned_device_enable_host_msix()
> assigned_dev_id=918 #2: [v=0 e=2]
> [151186.484419] kvm-1923: KVM_ASSIGN_DEV_IRQ assigned_dev_id=918 r=-22
> [151186.522328] kvm-1923: kvm_vm_ioctl_set_msix_nr assigned_dev_id=918
> already entries_nr=3
> [151186.561093] kvm-1923: kvm_vm_ioctl_set_msix_nr assigned_dev_id=918
> r=-22
>
> I attach the full printouts from these two reproductions (kvm-XXX, XXX
> being
> current->pid).
>
> BTW, these errnos seem pretty random; I think if I try to repro this
> again,
> I might get different errnos.
>
> Please let me know whether you have any suggestions on how to debug this
> further.
So you have two failure modes. In one case pci_enable_msix returns
success and in the other it returns failure, but in both cases the
vectors are all zero. I think the next step is to try to reproduce on
current upstream kernel and qemu. If those still generate similar
failures instrument the pci_enable_msix path to pinpoint where it's
going wrong. Thanks,
Alex
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: pci_enable_msix() fails with ENOMEM/EINVAL
2012-12-12 13:42 ` Alex Lyakas
@ 2012-12-12 20:10 ` Alex Williamson
2012-12-18 14:46 ` Alex Lyakas
0 siblings, 1 reply; 8+ messages in thread
From: Alex Williamson @ 2012-12-12 20:10 UTC (permalink / raw)
To: Alex Lyakas; +Cc: kvm, linux-pci, Yair Hershko, Shyam Kaushik
On Wed, 2012-12-12 at 15:42 +0200, Alex Lyakas wrote:
> Hi Alex,
> The kernel am using is stock Ubuntu-Precise (as I mentioned)
> 3.2.0-29-generic #46.
>
> I debugged deeper in pci_enable_msix() path, and I believe I have tracked
> down the source of the different errors I am seeing. The issue I see is that
> pci_find_capability() *sometimes* fails to find PCI_CAP_ID_MSIX capability.
> I tracked this down to __pci_find_next_cap_ttl() not being able to find the
> needed cap and returns 0.
Hmm, perhaps a config space access race? Everything seems properly
locked on my kernel.
> This leads to three different errors I saw:
>
> Case 1: pci_enable_msix() calls pci_msi_check_device(dev, nvec,
> PCI_CAP_ID_MSIX), which calls pci_find_capability() and further returns
> EINVAL
>
> Case 2: pci_enable_msix() calls pci_msix_table_size(), which calls
> pci_find_capability(dev, PCI_CAP_ID_MSIX), which returns 0. As a result,
> pci_msix_table_size() returns 0. Then the following code:
> nr_entries = pci_msix_table_size(dev);
> if (nvec > nr_entries)
> return nr_entries;
> causes pci_enable_msix() to return 0 to the caller, so now the caller thinks
> that pci_enable_msix() succeeded! However, IRQs were not assigned, so
> further call to request_threaded_irq() fails.
> (This looks like a kernel bug.)
>
> Case 3: pci_enable_msix() calls msix_capability_init(), which calls
> pci_find_capability(dev, PCI_CAP_ID_MSIX), which fails and returns pos==0.
> So now msix_capability_init() does not check the return value and proceeds
> with setting things up, and eventually fails in call to msix_map_region() =>
> ioremap_nocache(), which returns NULL. So msix_capability_init() returns
> ENOMEM and further pci_enable_msix() returns ENOMEM.
> (So this is another kernel bug of not checking return value of
> pci_find_capability(dev, PCI_CAP_ID_MSIX).)
Case 1 is the one we expect to hit. Case 2 & 3 only happen on this path
if pci_find_capability returns non-deterministic results, so their
behavior may be reasonable.
> So bottom line is that different failures I saw stem from the same issue of
> (sometimes) not finding PCI_CAP_ID_MSIX. It always happens on the second
> call to KVM_ASSIGN_DEV_IRQ. Why are two calls required? For example, the
> below trace shows the first call succeeding and setting up interrupts
> properly, and the second call failing:
>
> kernel: [ 4851.332462] kvm-7636: KVM_ASSIGN_DEV_IRQ assigned_dev_id=918
> kernel: [ 4851.333553] pci-stub 0000:03:12.6: irq 257 for MSI/MSI-X
> kernel: [ 4851.333573] pci-stub 0000:03:12.6: irq 258 for MSI/MSI-X
> kernel: [ 4851.333591] pci-stub 0000:03:12.6: irq 259 for MSI/MSI-X
> kernel: [ 4851.333600] kvm-7636: assigned_device_enable_host_msix(): call
> request_threaded_irq() assigned_dev_id=918 #0: [v=257 e=0]
> kernel: [ 4851.333845] kvm-7636: assigned_device_enable_host_msix(): call
> request_threaded_irq() assigned_dev_id=918 #1: [v=258 e=1]
> kernel: [ 4851.334055] kvm-7636: assigned_device_enable_host_msix(): call
> request_threaded_irq() assigned_dev_id=918 #2: [v=259 e=2]
> kernel: [ 4851.334241] kvm-7636: KVM_ASSIGN_DEV_IRQ assigned_dev_id=918 OK
> kernel: [ 4851.396339] kvm-7636: KVM_ASSIGN_DEV_IRQ assigned_dev_id=918
> kernel: [ 4851.396353] kvm-???: pci_find_capability
> __pci_find_next_cap(devfn=150 pos=0) dev=0000:03:12.6 returned 0
> kernel: [ 4851.412754] kvm-???: msix_map_region ioremap_nocache(0x100000,
> nr_entries=2048 PCI_MSIX_ENTRY_SIZE=16 dev=0000:03:12.6 pos=0 NULL
> kernel: [ 4851.429672] kvm-???: msix_capability_init
> msix_map_region(dev=0000:03:12.6, pos=0 control=32767) failed NULL
> kernel: [ 4851.438726] kvm-???: pci_enable_msix(0000:03:12.6)
> msix_capability_init failed, err=-12
> kernel: [ 4851.465439] kvm-7636: assigned_device_enable_host_msix()
> pci_enable_msix() failed assigned_dev_id=918 irq_name=kvm:0000:03:12.6
> err=-12
> kernel: [ 4851.484072] kvm-7636: assigned_device_enable_host_msix()
> assigned_dev_id=918 #0: [v=0 e=0]
> kernel: [ 4851.484076] kvm-7636: assigned_device_enable_host_msix()
> assigned_dev_id=918 #1: [v=0 e=1]
> kernel: [ 4851.484081] kvm-7636: assigned_device_enable_host_msix()
> assigned_dev_id=918 #2: [v=0 e=2]
> kernel: [ 4851.484086] kvm-7636: KVM_ASSIGN_DEV_IRQ assigned_dev_id=918
> r=-12
It's hard to say why it gets setup twice without a trace from qemu. It
could be that the guest temporarily disabled MSI-X support and
re-enabled it, which causes a teardown and re-setup in the host.
> Do you think the issue is the second KVM call, or the issue is the HW not
> reporting the capability sometimes?
The issue is not the second KVM call, nor do I expect that the HW is
actually failing to return the capability. I suspect it's a race where
some concurrent access to PCI config space is moving the window register
while pci_find_capability is walking config space. dmesg on the host
would probably tell us which config access mechanism you're using.
> Looking at the latest 3.7 kernel code sources, I see that all relevant
> functions (pci_enable_msix, pci_find_capability and msix_capability_init)
> would behave the same way WRT to pci_find_capability() call. And anyways,
> moving to upstream kernel is not an option for me at this point.
You've shown that you're getting a different result out of those
functions, but not why, so it's a bit premature to ignore the fact that
the problem may already be fixed in one of the past 5 kernel release.
If it is fixed, the patch(es) can be identified and backported to your
distro kernel. If it's not fixed, at least we're all looking at the
same source code.
> Please let me know what do you think the next step would be.
dmesg + try v3.7. I don't think you need to try upstream qemu-kvm at
this point since you've identified suspicious behavior that userspace
shouldn't be able to cause. Otherwise check locking around the PCI
config accesses that you're using and make sure that userspace accesses
through pci-sysfs aren't bypassing anything. Thanks,
Alex
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: pci_enable_msix() fails with ENOMEM/EINVAL
2012-12-12 20:10 ` Alex Williamson
@ 2012-12-18 14:46 ` Alex Lyakas
2012-12-19 23:39 ` Alex Williamson
0 siblings, 1 reply; 8+ messages in thread
From: Alex Lyakas @ 2012-12-18 14:46 UTC (permalink / raw)
To: Alex Williamson; +Cc: kvm, linux-pci, Yair Hershko, Shyam Kaushik
Hi Alex,
Thank you for helping me debug this issue. Some additional repros showed
that sometimes HW reports only PCI_CAP_ID_SLOTID and PCI_CAP_ID_EXP, and
sometimes it does not report any capability at all. At this point, I am
unable to install all the required components for reproduction on kernel
3.7. What information should I look for in dmseg? My only way out for now,
it to stick an ugly retry call to pci_enable_msix() in case it fails, or
returns 0 but does not set up the IRQs.
I would also like to piggy-back on this message for another question to you.
You added code to kvm_iommu_map_guest() that checks whether the IOMMU unit
supports interrupt remapping. And if not, the code refuses to attach the PCI
device, unless allow_unsafe_assigned_interrupts is set.
As it turns out some of our servers report ecap=0xf020fe for which
ecap_ir_support(iommu->ecap)) is true, and some report ecap=0xf020f6, for
which ecap_ir_support(iommu->ecap)) is false (the difference is in bit #4).
All servers are exactly the same, with minor differences in firmware
versions. On your opinion, this also indicates some race in the kernel when
fetching the ecap, or somehow the servers have different IOMMU units?
Thanks again,
Alex.
-----Original Message-----
From: Alex Williamson
Sent: 12 December, 2012 10:10 PM
To: Alex Lyakas
Cc: kvm@vger.kernel.org ; linux-pci@vger.kernel.org ; Yair Hershko ; Shyam
Kaushik
Subject: Re: pci_enable_msix() fails with ENOMEM/EINVAL
On Wed, 2012-12-12 at 15:42 +0200, Alex Lyakas wrote:
> Hi Alex,
> The kernel am using is stock Ubuntu-Precise (as I mentioned)
> 3.2.0-29-generic #46.
>
> I debugged deeper in pci_enable_msix() path, and I believe I have tracked
> down the source of the different errors I am seeing. The issue I see is
> that
> pci_find_capability() *sometimes* fails to find PCI_CAP_ID_MSIX
> capability.
> I tracked this down to __pci_find_next_cap_ttl() not being able to find
> the
> needed cap and returns 0.
Hmm, perhaps a config space access race? Everything seems properly
locked on my kernel.
> This leads to three different errors I saw:
>
> Case 1: pci_enable_msix() calls pci_msi_check_device(dev, nvec,
> PCI_CAP_ID_MSIX), which calls pci_find_capability() and further returns
> EINVAL
>
> Case 2: pci_enable_msix() calls pci_msix_table_size(), which calls
> pci_find_capability(dev, PCI_CAP_ID_MSIX), which returns 0. As a result,
> pci_msix_table_size() returns 0. Then the following code:
> nr_entries = pci_msix_table_size(dev);
> if (nvec > nr_entries)
> return nr_entries;
> causes pci_enable_msix() to return 0 to the caller, so now the caller
> thinks
> that pci_enable_msix() succeeded! However, IRQs were not assigned, so
> further call to request_threaded_irq() fails.
> (This looks like a kernel bug.)
>
> Case 3: pci_enable_msix() calls msix_capability_init(), which calls
> pci_find_capability(dev, PCI_CAP_ID_MSIX), which fails and returns pos==0.
> So now msix_capability_init() does not check the return value and proceeds
> with setting things up, and eventually fails in call to msix_map_region()
> =>
> ioremap_nocache(), which returns NULL. So msix_capability_init() returns
> ENOMEM and further pci_enable_msix() returns ENOMEM.
> (So this is another kernel bug of not checking return value of
> pci_find_capability(dev, PCI_CAP_ID_MSIX).)
Case 1 is the one we expect to hit. Case 2 & 3 only happen on this path
if pci_find_capability returns non-deterministic results, so their
behavior may be reasonable.
> So bottom line is that different failures I saw stem from the same issue
> of
> (sometimes) not finding PCI_CAP_ID_MSIX. It always happens on the second
> call to KVM_ASSIGN_DEV_IRQ. Why are two calls required? For example, the
> below trace shows the first call succeeding and setting up interrupts
> properly, and the second call failing:
>
> kernel: [ 4851.332462] kvm-7636: KVM_ASSIGN_DEV_IRQ assigned_dev_id=918
> kernel: [ 4851.333553] pci-stub 0000:03:12.6: irq 257 for MSI/MSI-X
> kernel: [ 4851.333573] pci-stub 0000:03:12.6: irq 258 for MSI/MSI-X
> kernel: [ 4851.333591] pci-stub 0000:03:12.6: irq 259 for MSI/MSI-X
> kernel: [ 4851.333600] kvm-7636: assigned_device_enable_host_msix(): call
> request_threaded_irq() assigned_dev_id=918 #0: [v=257 e=0]
> kernel: [ 4851.333845] kvm-7636: assigned_device_enable_host_msix(): call
> request_threaded_irq() assigned_dev_id=918 #1: [v=258 e=1]
> kernel: [ 4851.334055] kvm-7636: assigned_device_enable_host_msix(): call
> request_threaded_irq() assigned_dev_id=918 #2: [v=259 e=2]
> kernel: [ 4851.334241] kvm-7636: KVM_ASSIGN_DEV_IRQ assigned_dev_id=918 OK
> kernel: [ 4851.396339] kvm-7636: KVM_ASSIGN_DEV_IRQ assigned_dev_id=918
> kernel: [ 4851.396353] kvm-???: pci_find_capability
> __pci_find_next_cap(devfn=150 pos=0) dev=0000:03:12.6 returned 0
> kernel: [ 4851.412754] kvm-???: msix_map_region ioremap_nocache(0x100000,
> nr_entries=2048 PCI_MSIX_ENTRY_SIZE=16 dev=0000:03:12.6 pos=0 NULL
> kernel: [ 4851.429672] kvm-???: msix_capability_init
> msix_map_region(dev=0000:03:12.6, pos=0 control=32767) failed NULL
> kernel: [ 4851.438726] kvm-???: pci_enable_msix(0000:03:12.6)
> msix_capability_init failed, err=-12
> kernel: [ 4851.465439] kvm-7636: assigned_device_enable_host_msix()
> pci_enable_msix() failed assigned_dev_id=918 irq_name=kvm:0000:03:12.6
> err=-12
> kernel: [ 4851.484072] kvm-7636: assigned_device_enable_host_msix()
> assigned_dev_id=918 #0: [v=0 e=0]
> kernel: [ 4851.484076] kvm-7636: assigned_device_enable_host_msix()
> assigned_dev_id=918 #1: [v=0 e=1]
> kernel: [ 4851.484081] kvm-7636: assigned_device_enable_host_msix()
> assigned_dev_id=918 #2: [v=0 e=2]
> kernel: [ 4851.484086] kvm-7636: KVM_ASSIGN_DEV_IRQ assigned_dev_id=918
> r=-12
It's hard to say why it gets setup twice without a trace from qemu. It
could be that the guest temporarily disabled MSI-X support and
re-enabled it, which causes a teardown and re-setup in the host.
> Do you think the issue is the second KVM call, or the issue is the HW not
> reporting the capability sometimes?
The issue is not the second KVM call, nor do I expect that the HW is
actually failing to return the capability. I suspect it's a race where
some concurrent access to PCI config space is moving the window register
while pci_find_capability is walking config space. dmesg on the host
would probably tell us which config access mechanism you're using.
> Looking at the latest 3.7 kernel code sources, I see that all relevant
> functions (pci_enable_msix, pci_find_capability and msix_capability_init)
> would behave the same way WRT to pci_find_capability() call. And anyways,
> moving to upstream kernel is not an option for me at this point.
You've shown that you're getting a different result out of those
functions, but not why, so it's a bit premature to ignore the fact that
the problem may already be fixed in one of the past 5 kernel release.
If it is fixed, the patch(es) can be identified and backported to your
distro kernel. If it's not fixed, at least we're all looking at the
same source code.
> Please let me know what do you think the next step would be.
dmesg + try v3.7. I don't think you need to try upstream qemu-kvm at
this point since you've identified suspicious behavior that userspace
shouldn't be able to cause. Otherwise check locking around the PCI
config accesses that you're using and make sure that userspace accesses
through pci-sysfs aren't bypassing anything. Thanks,
Alex
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: pci_enable_msix() fails with ENOMEM/EINVAL
2012-12-18 14:46 ` Alex Lyakas
@ 2012-12-19 23:39 ` Alex Williamson
0 siblings, 0 replies; 8+ messages in thread
From: Alex Williamson @ 2012-12-19 23:39 UTC (permalink / raw)
To: Alex Lyakas; +Cc: kvm, linux-pci, Yair Hershko, Shyam Kaushik
On Tue, 2012-12-18 at 16:46 +0200, Alex Lyakas wrote:
> Hi Alex,
>
> Thank you for helping me debug this issue. Some additional repros showed
> that sometimes HW reports only PCI_CAP_ID_SLOTID and PCI_CAP_ID_EXP, and
> sometimes it does not report any capability at all. At this point, I am
> unable to install all the required components for reproduction on kernel
> 3.7. What information should I look for in dmseg? My only way out for now,
> it to stick an ugly retry call to pci_enable_msix() in case it fails, or
> returns 0 but does not set up the IRQs.
It should not be difficult to replace the Ubuntu kernel with stock
upstream unless some other proprietary drivers are contributing to the
problem. Please send the entire host dmesg after encountering an error.
> I would also like to piggy-back on this message for another question to you.
> You added code to kvm_iommu_map_guest() that checks whether the IOMMU unit
> supports interrupt remapping. And if not, the code refuses to attach the PCI
> device, unless allow_unsafe_assigned_interrupts is set.
> As it turns out some of our servers report ecap=0xf020fe for which
> ecap_ir_support(iommu->ecap)) is true, and some report ecap=0xf020f6, for
> which ecap_ir_support(iommu->ecap)) is false (the difference is in bit #4).
> All servers are exactly the same, with minor differences in firmware
> versions. On your opinion, this also indicates some race in the kernel when
> fetching the ecap, or somehow the servers have different IOMMU units?
There's often a BIOS switch to enable interrupt remapping, I'd check
there first. Thanks,
Alex
> -----Original Message-----
> From: Alex Williamson
> Sent: 12 December, 2012 10:10 PM
> To: Alex Lyakas
> Cc: kvm@vger.kernel.org ; linux-pci@vger.kernel.org ; Yair Hershko ; Shyam
> Kaushik
> Subject: Re: pci_enable_msix() fails with ENOMEM/EINVAL
>
> On Wed, 2012-12-12 at 15:42 +0200, Alex Lyakas wrote:
> > Hi Alex,
> > The kernel am using is stock Ubuntu-Precise (as I mentioned)
> > 3.2.0-29-generic #46.
> >
> > I debugged deeper in pci_enable_msix() path, and I believe I have tracked
> > down the source of the different errors I am seeing. The issue I see is
> > that
> > pci_find_capability() *sometimes* fails to find PCI_CAP_ID_MSIX
> > capability.
> > I tracked this down to __pci_find_next_cap_ttl() not being able to find
> > the
> > needed cap and returns 0.
>
> Hmm, perhaps a config space access race? Everything seems properly
> locked on my kernel.
>
> > This leads to three different errors I saw:
> >
> > Case 1: pci_enable_msix() calls pci_msi_check_device(dev, nvec,
> > PCI_CAP_ID_MSIX), which calls pci_find_capability() and further returns
> > EINVAL
> >
> > Case 2: pci_enable_msix() calls pci_msix_table_size(), which calls
> > pci_find_capability(dev, PCI_CAP_ID_MSIX), which returns 0. As a result,
> > pci_msix_table_size() returns 0. Then the following code:
> > nr_entries = pci_msix_table_size(dev);
> > if (nvec > nr_entries)
> > return nr_entries;
> > causes pci_enable_msix() to return 0 to the caller, so now the caller
> > thinks
> > that pci_enable_msix() succeeded! However, IRQs were not assigned, so
> > further call to request_threaded_irq() fails.
> > (This looks like a kernel bug.)
> >
> > Case 3: pci_enable_msix() calls msix_capability_init(), which calls
> > pci_find_capability(dev, PCI_CAP_ID_MSIX), which fails and returns pos==0.
> > So now msix_capability_init() does not check the return value and proceeds
> > with setting things up, and eventually fails in call to msix_map_region()
> > =>
> > ioremap_nocache(), which returns NULL. So msix_capability_init() returns
> > ENOMEM and further pci_enable_msix() returns ENOMEM.
> > (So this is another kernel bug of not checking return value of
> > pci_find_capability(dev, PCI_CAP_ID_MSIX).)
>
> Case 1 is the one we expect to hit. Case 2 & 3 only happen on this path
> if pci_find_capability returns non-deterministic results, so their
> behavior may be reasonable.
>
> > So bottom line is that different failures I saw stem from the same issue
> > of
> > (sometimes) not finding PCI_CAP_ID_MSIX. It always happens on the second
> > call to KVM_ASSIGN_DEV_IRQ. Why are two calls required? For example, the
> > below trace shows the first call succeeding and setting up interrupts
> > properly, and the second call failing:
> >
> > kernel: [ 4851.332462] kvm-7636: KVM_ASSIGN_DEV_IRQ assigned_dev_id=918
> > kernel: [ 4851.333553] pci-stub 0000:03:12.6: irq 257 for MSI/MSI-X
> > kernel: [ 4851.333573] pci-stub 0000:03:12.6: irq 258 for MSI/MSI-X
> > kernel: [ 4851.333591] pci-stub 0000:03:12.6: irq 259 for MSI/MSI-X
> > kernel: [ 4851.333600] kvm-7636: assigned_device_enable_host_msix(): call
> > request_threaded_irq() assigned_dev_id=918 #0: [v=257 e=0]
> > kernel: [ 4851.333845] kvm-7636: assigned_device_enable_host_msix(): call
> > request_threaded_irq() assigned_dev_id=918 #1: [v=258 e=1]
> > kernel: [ 4851.334055] kvm-7636: assigned_device_enable_host_msix(): call
> > request_threaded_irq() assigned_dev_id=918 #2: [v=259 e=2]
> > kernel: [ 4851.334241] kvm-7636: KVM_ASSIGN_DEV_IRQ assigned_dev_id=918 OK
> > kernel: [ 4851.396339] kvm-7636: KVM_ASSIGN_DEV_IRQ assigned_dev_id=918
> > kernel: [ 4851.396353] kvm-???: pci_find_capability
> > __pci_find_next_cap(devfn=150 pos=0) dev=0000:03:12.6 returned 0
> > kernel: [ 4851.412754] kvm-???: msix_map_region ioremap_nocache(0x100000,
> > nr_entries=2048 PCI_MSIX_ENTRY_SIZE=16 dev=0000:03:12.6 pos=0 NULL
> > kernel: [ 4851.429672] kvm-???: msix_capability_init
> > msix_map_region(dev=0000:03:12.6, pos=0 control=32767) failed NULL
> > kernel: [ 4851.438726] kvm-???: pci_enable_msix(0000:03:12.6)
> > msix_capability_init failed, err=-12
> > kernel: [ 4851.465439] kvm-7636: assigned_device_enable_host_msix()
> > pci_enable_msix() failed assigned_dev_id=918 irq_name=kvm:0000:03:12.6
> > err=-12
> > kernel: [ 4851.484072] kvm-7636: assigned_device_enable_host_msix()
> > assigned_dev_id=918 #0: [v=0 e=0]
> > kernel: [ 4851.484076] kvm-7636: assigned_device_enable_host_msix()
> > assigned_dev_id=918 #1: [v=0 e=1]
> > kernel: [ 4851.484081] kvm-7636: assigned_device_enable_host_msix()
> > assigned_dev_id=918 #2: [v=0 e=2]
> > kernel: [ 4851.484086] kvm-7636: KVM_ASSIGN_DEV_IRQ assigned_dev_id=918
> > r=-12
>
> It's hard to say why it gets setup twice without a trace from qemu. It
> could be that the guest temporarily disabled MSI-X support and
> re-enabled it, which causes a teardown and re-setup in the host.
>
> > Do you think the issue is the second KVM call, or the issue is the HW not
> > reporting the capability sometimes?
>
> The issue is not the second KVM call, nor do I expect that the HW is
> actually failing to return the capability. I suspect it's a race where
> some concurrent access to PCI config space is moving the window register
> while pci_find_capability is walking config space. dmesg on the host
> would probably tell us which config access mechanism you're using.
>
> > Looking at the latest 3.7 kernel code sources, I see that all relevant
> > functions (pci_enable_msix, pci_find_capability and msix_capability_init)
> > would behave the same way WRT to pci_find_capability() call. And anyways,
> > moving to upstream kernel is not an option for me at this point.
>
> You've shown that you're getting a different result out of those
> functions, but not why, so it's a bit premature to ignore the fact that
> the problem may already be fixed in one of the past 5 kernel release.
> If it is fixed, the patch(es) can be identified and backported to your
> distro kernel. If it's not fixed, at least we're all looking at the
> same source code.
>
> > Please let me know what do you think the next step would be.
>
> dmesg + try v3.7. I don't think you need to try upstream qemu-kvm at
> this point since you've identified suspicious behavior that userspace
> shouldn't be able to cause. Otherwise check locking around the PCI
> config accesses that you're using and make sure that userspace accesses
> through pci-sysfs aren't bypassing anything. Thanks,
>
> Alex
>
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2012-12-19 23:39 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <190FFE9600054603A543CCDA4408AC89@alyakaslap>
[not found] ` <CAOcd+r28ot-060WQhUPkuM1Bt7t-_jOggAZKNSJxurNYVs2b-w@mail.gmail.com>
[not found] ` <1353536723.1809.4.camel@bling.home>
[not found] ` <F764A8A9BBE247FEA862ED33F77A04A1@alyakaslap>
[not found] ` <1353960253.1809.128.camel@bling.home>
2012-11-29 8:42 ` pci_enable_msix() fails with ENOMEM/EINVAL Alex Lyakas
2012-11-29 15:56 ` Alex Williamson
2012-12-08 12:14 ` Alex Lyakas
2012-12-10 20:45 ` Alex Williamson
2012-12-12 13:42 ` Alex Lyakas
2012-12-12 20:10 ` Alex Williamson
2012-12-18 14:46 ` Alex Lyakas
2012-12-19 23:39 ` Alex Williamson
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).