From: Joe Lawrence <joe.lawrence@stratus.com>
To: <linux-kernel@vger.kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>,
Joe Lawrence <joe.lawrence@stratus.com>
Subject: irq_desc use-after-free in smp_irq_move_cleanup_interrupt
Date: Mon, 23 Nov 2015 19:59:52 -0500 [thread overview]
Message-ID: <5653B688.4050809@stratus.com> (raw)
Hi Thomas,
I've been chasing down a use-after-free on an irq_desc structure during
repeated device removal testing that crashes 4.3 in
smp_irq_move_cleanup_interrupt. So far I have a bunch of crashes and some
data gleaned from instrumenting the kernel with trace logging. Details to
follow, but in short, I think I'm hitting the use-after-free because of
a782a7e46bb5 "x86/irq: Store irq descriptor in vector array".
During the tests, I usually see RCU stalls (sometimes a crash later) --
when looking at a vmcore, the CPU stall is here:
crash> bt ffff881038ee4200
PID: 0 TASK: ffff881038ee4200 CPU: 1 COMMAND: "swapper/1"
[exception RIP: _raw_spin_lock+0x10]
RIP: ffffffff81672a10 RSP: ffff88103f843f70 RFLAGS: 00000046
RAX: 0000000000000000 RBX: 0000000000000091 RCX: 0000000000000001
RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffffffff81d36a88
RBP: ffff88103f843fa8 R8: 0000000000000101 R9: 0000000000000f78
R10: 0000000000100000 R11: ffff8810384ccf8c R12: 0000000000000091
R13: ffff88102a819794 R14: 000000000000b5e8 R15: ffff88102a8196f8
CS: 0010 SS: 0018
#0 [ffff88103f843f70] smp_irq_move_cleanup_interrupt at ffffffff8105319b
#1 [ffff88103f843fb0] irq_move_cleanup_interrupt at ffffffff81673a04
--- <IRQ stack> ---
#2 [ffff881038f0fd90] irq_move_cleanup_interrupt at ffffffff81673a04
[exception RIP: unknown or invalid address]
RIP: 000000000000001d RSP: 0000000000000000 RFLAGS: ffff881038f10000
RAX: 0000000000000001 RBX: ffff881038f0c000 RCX: 0000000000000005
RDX: 00000000000a087a RSI: 0000000000051ca2 RDI: 0000000000000018
RBP: ffff88103f84f100 R8: ffff881038f0fea0 R9: ffffe8f000042378
R10: 0000000000000002 R11: 000000ee954efd02 R12: ffffffff810e5060
R13: ffff881038f0fdc8 R14: ffff88103f84f500 R15: ffff88103f84f100
ORIG_RAX: ffff88103f856c40 CS: 8000a044 SS: ffffffffffffffdf
bt: WARNING: possibly bogus exception frame
#3 [ffff881038f0fe38] cpuidle_enter_state at ffffffff8151e278
#4 [ffff881038f0fea8] cpuidle_enter at ffffffff8151e417
#5 [ffff881038f0feb8] call_cpuidle at ffffffff810bef42
#6 [ffff881038f0fed0] cpu_startup_entry at ffffffff810bf1dc
#7 [ffff881038f0ff28] start_secondary at ffffffff8104e9e0
The irq_desc is in R15: ffff88102a8196f8
crash> kmem ffff88102a8196f8
CACHE NAME OBJSIZE ALLOCATED TOTAL SLABS SSIZE
ffff88103f408c80 kmalloc-512 512 5292 5460 140 32k
SLAB MEMORY NODE TOTAL ALLOCATED FREE
ffffea0040aa0600 ffff88102a818000 0 39 17 22
FREE / [ALLOCATED]
ffff88102a8196f8
PAGE PHYSICAL MAPPING INDEX CNT FLAGS
ffffea0040aa0640 102a819000 0 0 0 2fffff80008000 tail
This irq_desc is no longer allocated, it's been filled with the slub debug
poison pattern (hence the spinlock is stuck):
crash> rd ffff88102a8196f8 0x28
ffff88102a8196f8: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b kkkkkkkkkkkkkkkk
ffff88102a819708: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b kkkkkkkkkkkkkkkk
ffff88102a819718: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b kkkkkkkkkkkkkkkk
ffff88102a819728: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b kkkkkkkkkkkkkkkk
ffff88102a819738: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b kkkkkkkkkkkkkkkk
ffff88102a819748: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b kkkkkkkkkkkkkkkk
ffff88102a819758: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b kkkkkkkkkkkkkkkk
ffff88102a819768: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b kkkkkkkkkkkkkkkk
ffff88102a819778: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b kkkkkkkkkkkkkkkk
ffff88102a819788: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b kkkkkkkkkkkkkkkk
ffff88102a819798: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b kkkkkkkkkkkkkkkk
ffff88102a8197a8: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b kkkkkkkkkkkkkkkk
ffff88102a8197b8: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b kkkkkkkkkkkkkkkk
ffff88102a8197c8: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b kkkkkkkkkkkkkkkk
ffff88102a8197d8: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b kkkkkkkkkkkkkkkk
ffff88102a8197e8: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b kkkkkkkkkkkkkkkk
ffff88102a8197f8: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b kkkkkkkkkkkkkkkk
ffff88102a819808: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b kkkkkkkkkkkkkkkk
ffff88102a819818: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b kkkkkkkkkkkkkkkk
ffff88102a819828: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b kkkkkkkkkkkkkkkk
The irq vector is in RBX: 0000000000000091
crash> p/d 0x91
$1 = 145
The irq_desc_tree entry for vector 0x91 appears to have been updated to
something new (the device re-add):
crash> tree -t radix irq_desc_tree -s irq_desc | grep 'irq = 0x91' -B11 | grep -e '^ffff' -e 'irq ='
ffff88102fb144e8
irq = 0x91,
But from a custom crash gdb script, the freed irq_desc can still be found
in CPU 1's vector_irq[145]:
cpu[1] vector_irq[145] irq_desc @ 0xffff88102a8196f8
Sifting through a heavily grepped and abbreviated trace log:
- The irq moved from CPU 1, vector 145 to CPU 44, vector 134
- Both CPUs skip cleaning up their vector_irq[] entries for this irq
because data->move_in_progress is set (is this normal?)
- The irq moves again to CPU 34, vector 123
- The underlying device is removed @ 593.106514 jiffies
- The irq_desc is freed @ 593.239879 jiffies
- CPU 1 is last heard from @ 1022.830083 jiffies
[001] 22.936764: __assign_irq_vector : cpu 44 : vector=134 -> 0xffff88102a8196f8
[044] 35.209945: smp_irq_move_cleanup_interrupt : data->move_in_progress : vector=134 0xffff88102a8196f8
[001] 35.212370: smp_irq_move_cleanup_interrupt : data->move_in_progress : vector=145 0xffff88102a8196f8
[044] 35.674114: smp_irq_move_cleanup_interrupt : data->move_in_progress : vector=134 0xffff88102a8196f8
[001] 35.675395: smp_irq_move_cleanup_interrupt : data->move_in_progress : vector=145 0xffff88102a8196f8
[044] 36.265716: smp_irq_move_cleanup_interrupt : data->move_in_progress : vector=134 0xffff88102a8196f8
[044] 36.517785: smp_irq_move_cleanup_interrupt : data->move_in_progress : vector=134 0xffff88102a8196f8
[001] 36.520115: smp_irq_move_cleanup_interrupt : data->move_in_progress : vector=145 0xffff88102a8196f8
[001] 54.636651: smp_irq_move_cleanup_interrupt : data->move_in_progress : vector=145 0xffff88102a8196f8
[001] 54.636722: smp_irq_move_cleanup_interrupt : data->move_in_progress : vector=145 0xffff88102a8196f8
[044] 61.670267: __assign_irq_vector : cpu 34 : vector=123 -> 0xffff88102a8196f8
[001] 61.670979: smp_irq_move_cleanup_interrupt : data->move_in_progress : vector=145 0xffff88102a8196f8
[044] 61.696120: smp_irq_move_cleanup_interrupt : cpu (this) : vector=134 0xffff88102a8196f8 -> (nil)
[034] 69.086107: smp_irq_move_cleanup_interrupt : vector == data->cfg.vector && ... : vector=123 0xffff88102a8196f8
[000] 593.239811: clear_irq_vector : 1 : cpu 34 : vector=123 0xffff88102a8196f8 -> (nil)
[000] 593.239879: free_desc : desc @ 0xffff88102a8196f8
[001] 1022.830083: smp_irq_move_cleanup_interrupt : IS_ERR_OR_NULL : vector=144 (nil)
crash> log | grep 'mpt3sas[0-9]*: _scsih_remove$'
[ 291.658410] mpt3sas0: _scsih_remove
[ 593.106514] mpt3sas1: _scsih_remove << free_desc
[ 874.345239] mpt3sas3: _scsih_remove
Prior to a782a7e46bb5 "x86/irq: Store irq descriptor in vector array", the
vector_irq array held irq values, those interested in the irq_desc would
have to perform their own irq_to_desc() (ie, a radix_tree_lookup of the
irq_desc_tree). The radix tree is updated in free_desc(), so any
subsequent lookups would fail.
After the change though, I'm not so sure that direct access to the
irq_desc is safe -- I couldn't figure out anything preventing
use-after-free (the sparse_irq_lock and the vector_lock seem to protect
their respective data-structures, but not the dual-use of the irq_desc's
by both the irq_desc_tree and the per-cpu vector_irq[].)
Hopefully this is on the right track -- I can modify the trace logging or
run other tests if there is additional information that would be helpful.
Regards,
-- Joe
next reply other threads:[~2015-11-24 1:14 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-11-24 0:59 Joe Lawrence [this message]
2015-11-25 18:16 ` irq_desc use-after-free in smp_irq_move_cleanup_interrupt Thomas Gleixner
2015-11-25 19:31 ` Thomas Gleixner
2015-11-25 21:12 ` Thomas Gleixner
2015-11-25 22:02 ` Joe Lawrence
2015-11-27 8:06 ` Jiang Liu
2015-11-27 8:25 ` Thomas Gleixner
2015-11-30 8:09 ` [Bugfix 1/5] x86/irq: Do not reuse struct apic_chip_data.old_domain as temporary buffer Jiang Liu
2015-11-30 8:09 ` [Bugfix 2/5] x86/irq: Enhance __assign_irq_vector() to rollback in case of failure Jiang Liu
2015-12-10 18:39 ` [tip:x86/urgent] " tip-bot for Jiang Liu
2015-11-30 8:09 ` [Bugfix 3/5] x86/irq: Fix a race window in x86_vector_free_irqs() Jiang Liu
2015-12-10 18:40 ` [tip:x86/urgent] " tip-bot for Jiang Liu
2015-11-30 8:09 ` [Bugfix 4/5] x86/irq: Fix a race condition between vector assigning and cleanup Jiang Liu
2015-12-01 22:46 ` Joe Lawrence
2015-12-08 0:29 ` Joe Lawrence
2015-12-08 21:31 ` Thomas Gleixner
2015-12-10 18:40 ` [tip:x86/urgent] " tip-bot for Jiang Liu
2015-11-30 8:09 ` [Bugfix 5/5] x86/irq: Trivial cleanups for x86 vector allocation code Jiang Liu
2015-12-10 18:42 ` [tip:x86/apic] " tip-bot for Jiang Liu
2015-12-10 18:39 ` [tip:x86/urgent] x86/irq: Do not reuse struct apic_chip_data.old_domain as temporary buffer tip-bot for Jiang Liu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=5653B688.4050809@stratus.com \
--to=joe.lawrence@stratus.com \
--cc=linux-kernel@vger.kernel.org \
--cc=tglx@linutronix.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.