From: Greg Edwards <gedwards-LfVdkaOWEx8@public.gmane.org>
To: Joerg Roedel <joro-zLv9SwRftAIdnm+yROfE0A@public.gmane.org>
Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org,
David Woodhouse <dwmw2-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Subject: Re: [PATCH v2] iommu/vt-d: race setting IRQ CPU affinity while freeing IRQ
Date: Tue, 29 Jul 2014 11:21:58 -0600 [thread overview]
Message-ID: <20140729172158.GA23529@psuche.datadirectnet.com> (raw)
In-Reply-To: <20140729104531.GB9809-zLv9SwRftAIdnm+yROfE0A@public.gmane.org>
On Tue, Jul 29, 2014 at 12:45:31PM +0200, Joerg Roedel wrote:
> On Wed, Jul 23, 2014 at 10:13:26AM -0600, Greg Edwards wrote:
>> A user process setting the CPU affinity of an IRQ for a KVM
>> direct-assigned device via /proc/irq/<IRQ#>/smp_affinity can race with
>> the IRQ being released by QEMU, resulting in a NULL iommu pointer
>> dereference in get_irte().
>
> Maybe I wasn't clear enough, what I am missing is a panic message with a
> backtrace from the NULL pointer deref you are seeing in the commit
> message.
Sure, sorry I didn't send that along previously. We originally saw this on a
3.9 kernel, then recently on a 3.14 stable kernel. I hadn't reproduced it on
current tip yet. Here it is for Linus' tree as of this morning:
[ 6638.327851] BUG: unable to handle kernel NULL pointer dereference at 0000000000000090
[ 6638.335955] IP: [<ffffffff8190a652>] intel_ioapic_set_affinity+0x82/0x1b0
[ 6638.343012] PGD 99172e067 PUD 1026979067 PMD 0
[ 6638.347858] Oops: 0000 [#1] SMP
[ 6638.351386] Modules linked in:
[ 6638.354718] CPU: 1 PID: 3354 Comm: affin Not tainted 3.16.0-rc7-00007-g31dab71 #1
[ 6638.362480] Hardware name: Supermicro SYS-F617R2-RT+/X9DRFR, BIOS 3.0a 01/29/2014
[ 6638.370259] task: ffff881025b0e720 ti: ffff88099173c000 task.ti: ffff88099173c000
[ 6638.378063] RIP: 0010:[<ffffffff8190a652>] [<ffffffff8190a652>] intel_ioapic_set_affinity+0x82/0x1b0
[ 6638.387649] RSP: 0018:ffff88099173fdb0 EFLAGS: 00010046
[ 6638.393314] RAX: 0000000000000082 RBX: ffff880a36294600 RCX: 0000000000000082
[ 6638.400824] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff8266af00
[ 6638.408341] RBP: ffff88099173fdf8 R08: 0000000000000000 R09: ffff88103ec00490
[ 6638.415867] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88099173fe90
[ 6638.423402] R13: 000000000000005f R14: ffff880faa38fe80 R15: ffff880faa38fe80
[ 6638.430942] FS: 00007f7161f05740(0000) GS:ffff88107fc40000(0000) knlGS:0000000000000000
[ 6638.439456] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 6638.445640] CR2: 0000000000000090 CR3: 000000099140d000 CR4: 00000000001427e0
[ 6638.453227] Stack:
[ 6638.455699] ffffffff81c44740 ffff88099173fdc8 ffffffff00000000 00000000c991fd3b
[ 6638.463660] ffff880a36294600 ffff88099173fe90 ffff88099173fe90 0000000000000000
[ 6638.471623] 0000000000000286 ffff88099173fe08 ffffffff8190aac5 ffff88099173fe28
[ 6638.479605] Call Trace:
[ 6638.482562] [<ffffffff8190aac5>] set_remapped_irq_affinity+0x25/0x40
[ 6638.489534] [<ffffffff811322dc>] irq_do_set_affinity+0x1c/0x50
[ 6638.495985] [<ffffffff81132458>] irq_set_affinity_locked+0x98/0xd0
[ 6638.502781] [<ffffffff811324d6>] __irq_set_affinity+0x46/0x70
[ 6638.509142] [<ffffffff811362dc>] write_irq_affinity.isra.6+0xdc/0x100
[ 6638.516198] [<ffffffff8113631c>] irq_affinity_list_proc_write+0x1c/0x20
[ 6638.523440] [<ffffffff8129f30d>] proc_reg_write+0x3d/0x80
[ 6638.529474] [<ffffffff812384a7>] vfs_write+0xb7/0x1f0
[ 6638.535165] [<ffffffff81243619>] ? putname+0x29/0x40
[ 6638.540771] [<ffffffff812390c5>] SyS_write+0x55/0xd0
[ 6638.546386] [<ffffffff81adc729>] system_call_fastpath+0x16/0x1b
[ 6638.552970] Code: ff 48 85 d2 74 68 4c 8b 7a 30 4d 85 ff 74 5f 48 c7 c7 00 af 66 82 e8 9e 1b 1d 00 49 8b 57 20 41 0f b7 77 28 48 c7 c7 00 af 66 82 <48> 8b 8a 90 00 00 00 41 0f b7 57 2a 01 f2 48 89 c6 48 63 d2 48
[ 6638.574199] RIP [<ffffffff8190a652>] intel_ioapic_set_affinity+0x82/0x1b0
[ 6638.581735] RSP <ffff88099173fdb0>
[ 6638.585872] CR2: 0000000000000090
> Also I am still wondering why it is possible to set affinity from
> userspace while the irq is about to be freed. Shouldn't the proc files
> are already unregistered when the irq is freed?
The QEMU thread is doing a KVM_DEASSIGN_DEV_IRQ ioctl. It looks like the call
chain is:
kvm_vm_ioctl_assigned_device
kvm_vm_ioctl_deassign_dev_irq
kvm_deassign_irq
deassign_host_irq
pci_disable_msix
free_msi_irqs
arch_teardown_msi_irqs
default_teardown_msi_irqs
arch_teardown_msi_irq
native_teardown_msi_irq
irq_free_hwirq
irq_free_hwirqs
irq_free_hwirqs() does:
460 for (i = from, j = cnt; j > 0; i++, j--) {
461 irq_set_status_flags(i, _IRQ_NOREQUEST | _IRQ_NOPROBE);
462 arch_teardown_hwirq(i); <----------- QEMU is down this path to free_irte()
463 }
464 irq_free_descs(from, cnt); <------------ proc files unregistered down this path
I guess the question is if it would be safe to change that ordering? I don't know.
Greg
WARNING: multiple messages have this Message-ID (diff)
From: Greg Edwards <gedwards@ddn.com>
To: Joerg Roedel <joro@8bytes.org>
Cc: David Woodhouse <dwmw2@infradead.org>,
<iommu@lists.linux-foundation.org>,
<linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v2] iommu/vt-d: race setting IRQ CPU affinity while freeing IRQ
Date: Tue, 29 Jul 2014 11:21:58 -0600 [thread overview]
Message-ID: <20140729172158.GA23529@psuche.datadirectnet.com> (raw)
In-Reply-To: <20140729104531.GB9809@8bytes.org>
On Tue, Jul 29, 2014 at 12:45:31PM +0200, Joerg Roedel wrote:
> On Wed, Jul 23, 2014 at 10:13:26AM -0600, Greg Edwards wrote:
>> A user process setting the CPU affinity of an IRQ for a KVM
>> direct-assigned device via /proc/irq/<IRQ#>/smp_affinity can race with
>> the IRQ being released by QEMU, resulting in a NULL iommu pointer
>> dereference in get_irte().
>
> Maybe I wasn't clear enough, what I am missing is a panic message with a
> backtrace from the NULL pointer deref you are seeing in the commit
> message.
Sure, sorry I didn't send that along previously. We originally saw this on a
3.9 kernel, then recently on a 3.14 stable kernel. I hadn't reproduced it on
current tip yet. Here it is for Linus' tree as of this morning:
[ 6638.327851] BUG: unable to handle kernel NULL pointer dereference at 0000000000000090
[ 6638.335955] IP: [<ffffffff8190a652>] intel_ioapic_set_affinity+0x82/0x1b0
[ 6638.343012] PGD 99172e067 PUD 1026979067 PMD 0
[ 6638.347858] Oops: 0000 [#1] SMP
[ 6638.351386] Modules linked in:
[ 6638.354718] CPU: 1 PID: 3354 Comm: affin Not tainted 3.16.0-rc7-00007-g31dab71 #1
[ 6638.362480] Hardware name: Supermicro SYS-F617R2-RT+/X9DRFR, BIOS 3.0a 01/29/2014
[ 6638.370259] task: ffff881025b0e720 ti: ffff88099173c000 task.ti: ffff88099173c000
[ 6638.378063] RIP: 0010:[<ffffffff8190a652>] [<ffffffff8190a652>] intel_ioapic_set_affinity+0x82/0x1b0
[ 6638.387649] RSP: 0018:ffff88099173fdb0 EFLAGS: 00010046
[ 6638.393314] RAX: 0000000000000082 RBX: ffff880a36294600 RCX: 0000000000000082
[ 6638.400824] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff8266af00
[ 6638.408341] RBP: ffff88099173fdf8 R08: 0000000000000000 R09: ffff88103ec00490
[ 6638.415867] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88099173fe90
[ 6638.423402] R13: 000000000000005f R14: ffff880faa38fe80 R15: ffff880faa38fe80
[ 6638.430942] FS: 00007f7161f05740(0000) GS:ffff88107fc40000(0000) knlGS:0000000000000000
[ 6638.439456] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 6638.445640] CR2: 0000000000000090 CR3: 000000099140d000 CR4: 00000000001427e0
[ 6638.453227] Stack:
[ 6638.455699] ffffffff81c44740 ffff88099173fdc8 ffffffff00000000 00000000c991fd3b
[ 6638.463660] ffff880a36294600 ffff88099173fe90 ffff88099173fe90 0000000000000000
[ 6638.471623] 0000000000000286 ffff88099173fe08 ffffffff8190aac5 ffff88099173fe28
[ 6638.479605] Call Trace:
[ 6638.482562] [<ffffffff8190aac5>] set_remapped_irq_affinity+0x25/0x40
[ 6638.489534] [<ffffffff811322dc>] irq_do_set_affinity+0x1c/0x50
[ 6638.495985] [<ffffffff81132458>] irq_set_affinity_locked+0x98/0xd0
[ 6638.502781] [<ffffffff811324d6>] __irq_set_affinity+0x46/0x70
[ 6638.509142] [<ffffffff811362dc>] write_irq_affinity.isra.6+0xdc/0x100
[ 6638.516198] [<ffffffff8113631c>] irq_affinity_list_proc_write+0x1c/0x20
[ 6638.523440] [<ffffffff8129f30d>] proc_reg_write+0x3d/0x80
[ 6638.529474] [<ffffffff812384a7>] vfs_write+0xb7/0x1f0
[ 6638.535165] [<ffffffff81243619>] ? putname+0x29/0x40
[ 6638.540771] [<ffffffff812390c5>] SyS_write+0x55/0xd0
[ 6638.546386] [<ffffffff81adc729>] system_call_fastpath+0x16/0x1b
[ 6638.552970] Code: ff 48 85 d2 74 68 4c 8b 7a 30 4d 85 ff 74 5f 48 c7 c7 00 af 66 82 e8 9e 1b 1d 00 49 8b 57 20 41 0f b7 77 28 48 c7 c7 00 af 66 82 <48> 8b 8a 90 00 00 00 41 0f b7 57 2a 01 f2 48 89 c6 48 63 d2 48
[ 6638.574199] RIP [<ffffffff8190a652>] intel_ioapic_set_affinity+0x82/0x1b0
[ 6638.581735] RSP <ffff88099173fdb0>
[ 6638.585872] CR2: 0000000000000090
> Also I am still wondering why it is possible to set affinity from
> userspace while the irq is about to be freed. Shouldn't the proc files
> are already unregistered when the irq is freed?
The QEMU thread is doing a KVM_DEASSIGN_DEV_IRQ ioctl. It looks like the call
chain is:
kvm_vm_ioctl_assigned_device
kvm_vm_ioctl_deassign_dev_irq
kvm_deassign_irq
deassign_host_irq
pci_disable_msix
free_msi_irqs
arch_teardown_msi_irqs
default_teardown_msi_irqs
arch_teardown_msi_irq
native_teardown_msi_irq
irq_free_hwirq
irq_free_hwirqs
irq_free_hwirqs() does:
460 for (i = from, j = cnt; j > 0; i++, j--) {
461 irq_set_status_flags(i, _IRQ_NOREQUEST | _IRQ_NOPROBE);
462 arch_teardown_hwirq(i); <----------- QEMU is down this path to free_irte()
463 }
464 irq_free_descs(from, cnt); <------------ proc files unregistered down this path
I guess the question is if it would be safe to change that ordering? I don't know.
Greg
next prev parent reply other threads:[~2014-07-29 17:21 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-07-22 14:27 [PATCH] iommu/vt-d: fix race between free_irte() and get_irte() Greg Edwards
2014-07-22 14:27 ` Greg Edwards
[not found] ` <20140722142719.GA28143-+5IcJesBrg1QnCaYh1JbbuXjVpU8mP+x@public.gmane.org>
2014-07-23 14:40 ` Joerg Roedel
2014-07-23 14:40 ` Joerg Roedel
[not found] ` <20140723144024.GA14017-zLv9SwRftAIdnm+yROfE0A@public.gmane.org>
2014-07-23 14:49 ` Greg Edwards
2014-07-23 14:49 ` Greg Edwards
[not found] ` <20140723144917.GA26986-+5IcJesBrg1QnCaYh1JbbuXjVpU8mP+x@public.gmane.org>
2014-07-23 15:10 ` Joerg Roedel
2014-07-23 15:10 ` Joerg Roedel
[not found] ` <20140723151040.GB14017-zLv9SwRftAIdnm+yROfE0A@public.gmane.org>
2014-07-23 16:13 ` [PATCH v2] iommu/vt-d: race setting IRQ CPU affinity while freeing IRQ Greg Edwards
2014-07-23 16:13 ` Greg Edwards
[not found] ` <20140723161326.GB32422-+5IcJesBrg1QnCaYh1JbbuXjVpU8mP+x@public.gmane.org>
2014-07-29 10:45 ` Joerg Roedel
2014-07-29 10:45 ` Joerg Roedel
[not found] ` <20140729104531.GB9809-zLv9SwRftAIdnm+yROfE0A@public.gmane.org>
2014-07-29 17:21 ` Greg Edwards [this message]
2014-07-29 17:21 ` Greg Edwards
[not found] ` <20140729172158.GA23529-+5IcJesBrg1QnCaYh1JbbuXjVpU8mP+x@public.gmane.org>
2014-07-31 11:49 ` Joerg Roedel
2014-07-31 11:49 ` Joerg Roedel
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20140729172158.GA23529@psuche.datadirectnet.com \
--to=gedwards-lfvdkaowex8@public.gmane.org \
--cc=dwmw2-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org \
--cc=iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org \
--cc=joro-zLv9SwRftAIdnm+yROfE0A@public.gmane.org \
--cc=linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.