* v4.14-rc{4,7} null pointer dereference in event_sched_out()
@ 2017-10-30 16:23 Mark Rutland
  2017-11-15 18:00 ` Will Deacon
  0 siblings, 1 reply; 4+ messages in thread
From: Mark Rutland @ 2017-10-30 16:23 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, will.deacon, peterz, mingo

Hi,

As a heads-up, while fuzzing arm64 v4.14-rc{4,7} with Syzkaller, I hit a
KASAN splat in event_sched_out():

[  133.225742] ==================================================================
[  133.229374] BUG: KASAN: null-ptr-deref in event_sched_out.isra.47+0x428/0x580
[  133.230843] Read of size 4 at addr 0000000000000178 by task syz-executor0/6905
[  133.233151]
[  133.233664] CPU: 0 PID: 6905 Comm: syz-executor0 Not tainted 4.14.0-rc7-dirty #4
[  133.235750] Hardware name: linux,dummy-virt (DT)
[  133.236598] Call trace:
[  133.237081] [<ffff20000808fef8>] dump_backtrace+0x0/0x658
[  133.238073] [<ffff200008090570>] show_stack+0x20/0x30
[  133.239002] [<ffff2000091c22ec>] dump_stack+0xd0/0x124
[  133.239947] [<ffff200008349d1c>] kasan_report+0x104/0x310
[  133.240940] [<ffff2000083483f8>] __asan_load4+0x58/0xb0
[  133.242262] [<ffff200008271138>] event_sched_out.isra.47+0x428/0x580
[  133.243686] [<ffff2000082712c8>] __perf_remove_from_context+0x38/0xe0
[  133.244948] [<ffff200008265cf8>] event_function_call+0x1c8/0x258
[  133.246197] [<ffff20000826ad04>] perf_remove_from_context+0x54/0xf0
[  133.247514] [<ffff20000827f188>] SyS_perf_event_open+0x1528/0x18e0
[  133.248831] Exception stack(0xffff800038c5fec0 to 0xffff800038c60000)
[  133.250199] fec0: 0000000020b12f88 0000000000001af8 00000000ffffffff 0000000000000008
[  133.251843] fee0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[  133.253503] ff00: 00000000000000f1 0000000000000000 0000000000405850 00000000003d0f00
[  133.255132] ff20: 0000ffff94514f60 00000000004ae890 0000000000000027 0000000000000001
[  133.256756] ff40: 0000000000000000 0000000000826000 0000000000000000 00000000004c0158
[  133.258392] ff60: 00000000ffffffff 0000000020b12f88 0000000000001af8 000000000046d290
[  133.260006] ff80: 00000000004aaba8 0000000000473af8 0000ffffe5360da0 0000000000000000
[  133.261629] ffa0: 0000ffff94514f60 0000ffff94514640 00000000004020fc 0000ffff94514640
[  133.263253] ffc0: 000000000042d034 00000000a0000000 0000000020b12f88 00000000000000f1
[  133.264886] ffe0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[  133.266535] [<ffff200008084170>] el0_svc_naked+0x24/0x28
[  133.267648] ==================================================================

... which is triggered by the Syzkaller repro program at the end of this
email. I haven't yet come up with a C reproducer; sorry.

The PC seems to be the load of cpuctx->active_oncpu at the end of the
function, so it looks like cpuctx is NULL.
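
To illustrate that reasoning: when a structure pointer is NULL, the
faulting address is simply the offset of the member being accessed. A
minimal sketch follows, assuming a stand-in structure. Both
struct fake_cpuctx and its padding are hypothetical, sized only to match
the 0x178 in the splat; the real struct perf_cpu_context layout depends on
the kernel version and config.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for struct perf_cpu_context. Only the offset of
 * active_oncpu matters here; the padding is chosen to match the faulting
 * address in the splat and is NOT the real kernel layout.
 */
struct fake_cpuctx {
	unsigned char other_members[0x178];
	int active_oncpu;	/* 4 bytes wide, like the "Read of size 4" */
};

/* Address accessed for a member at `offset` inside an object at `base`. */
uintptr_t member_address(uintptr_t base, size_t offset)
{
	return base + offset;
}

/* With a NULL base, the accessed address is just the member offset. */
uintptr_t null_deref_address(void)
{
	return member_address(0, offsetof(struct fake_cpuctx, active_oncpu));
}
```

On a real build, the offset can be checked against the vmlinux debug info
rather than guessed.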

The system has (homogeneous) armv8_pmuv3, breakpoint, and software PMUs.

I initially hit this on v4.14-rc4, and can reproduce the issue on
v4.14-rc7. I haven't tried any other kernels yet.

I'll continue digging, unless someone else has already solved this.

Thanks,
Mark.

Syzkaller reproducer
---->8----
# {Threaded:true Collide:true Repeat:true Procs:1 Sandbox:none Fault:false FaultCall:-1 FaultNth:0 EnableTun:true UseTmpDir:true HandleSegv:true WaitRepeat:true Debug:false Repro:false}
mmap(&(0x7f0000000000/0xd3f000)=nil, 0xd3f000, 0x3, 0x32, 0xffffffffffffffff, 0x0)
r0 = perf_event_open(&(0x7f0000d15000-0x78)={0x1, 0x78, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x9, 0x34, 0x0, 0x8001, 0x0, 0x0, 0x0, 0x0, 0x0, 0x8, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x8000000, 0x0}, 0x0, 0xffffffff, 0xffffffffffffffff, 0x0)
mmap(&(0x7f0000d3f000/0x1000)=nil, 0x1000, 0x3, 0x32, 0xffffffffffffffff, 0x0)
r1 = perf_event_open(&(0x7f0000d15000-0x78)={0x1, 0x78, 0x0, 0x0, 0x0, 0x0, 0x0, 0x3bd4, 0x0, 0x0, 0x30, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xffffffff80000001, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, 0x0, 0xffffffff, r0, 0x0)
add_key(&(0x7f00004fe000)="6465616400", &(0x7f0000d41000)={0x73, 0x79, 0x7a, 0x0, 0x0}, &(0x7f0000509000-0x19)="", 0x0, 0xfffffffffffffffd)
syz_emit_ethernet(0x32, &(0x7f0000b86000-0x36)={@remote={[0xbb, 0xbb, 0xbb, 0xbb, 0xbb], 0x0}, @remote={[0xbb, 0xbb, 0xbb, 0xbb, 0xbb], 0x0}, [], {{0x200000000080a, @arp=@generic={0x322, 0x8edf, 0x6, 0x0, 0xfffffffffffffffe, @empty=[0x0, 0x0, 0x0, 0x0, 0x0, 0x0], "", @random="dc1ce39913fb", "34f3b689bb48f1e976f2bf1b7cf2243b"}}}})
dup3(r1, r1, 0x80000)
r2 = gettid()
ioctl$sock_FIOSETOWN(0xffffffffffffffff, 0x8901, &(0x7f0000d40000)=0x0)
socket$inet6_udp(0xa, 0x2, 0x0)
perf_event_open(&(0x7f0000b13000-0x78)={0x0, 0x78, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x30, 0x0, 0x2, 0x0, 0x0, 0x94, 0x0, 0x0, 0x0, 0x0, 0x0, 0x10001, 0x0, 0x0, 0x0, 0x0, 0x0}, r2, 0xffffffff, r0, 0x0)


* Re: v4.14-rc{4,7} null pointer dereference in event_sched_out()
  2017-10-30 16:23 v4.14-rc{4,7} null pointer dereference in event_sched_out() Mark Rutland
@ 2017-11-15 18:00 ` Will Deacon
  2017-11-24 18:10   ` Mark Rutland
  0 siblings, 1 reply; 4+ messages in thread
From: Will Deacon @ 2017-11-15 18:00 UTC (permalink / raw)
  To: Mark Rutland; +Cc: linux-kernel, linux-arm-kernel, peterz, mingo

On Mon, Oct 30, 2017 at 04:23:15PM +0000, Mark Rutland wrote:
> As a heads-up, while fuzzing arm64 v4.14-rc{4,7} with Syzkaller, I hit a
> KASAN splat in event_sched_out():

Did you get anywhere with this?

Will


* Re: v4.14-rc{4,7} null pointer dereference in event_sched_out()
  2017-11-15 18:00 ` Will Deacon
@ 2017-11-24 18:10   ` Mark Rutland
  2017-11-24 18:16     ` Mark Rutland
  0 siblings, 1 reply; 4+ messages in thread
From: Mark Rutland @ 2017-11-24 18:10 UTC (permalink / raw)
  To: Will Deacon; +Cc: linux-kernel, linux-arm-kernel, peterz, mingo

On Wed, Nov 15, 2017 at 06:00:20PM +0000, Will Deacon wrote:
> On Mon, Oct 30, 2017 at 04:23:15PM +0000, Mark Rutland wrote:
> > As a heads-up, while fuzzing arm64 v4.14-rc{4,7} with Syzkaller, I hit a
> > KASAN splat in event_sched_out():
> 
> Did you get anywhere with this?

I got a *bit* further, but I haven't figured out the underlying issue
yet.

I minimized the reproducer down to the following:

----
# {Threaded:true Collide:true Repeat:true Procs:1 Sandbox:none Fault:false FaultCall:-1 FaultNth:0 EnableTun:true UseTmpDir:true HandleSegv:true WaitRepeat:true Debug:false Repro:false}

r2 = gettid()
mmap(&(0x7f0000000000/0xd3f000)=nil, 0xd3f000, 0x3, 0x32, 0xffffffffffffffff, 0x0)
r0 = perf_event_open(&(0x7f0000d15000-0x78)={0x1, 0x78, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x9, 0x30, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, r2, 0xffffffff, 0xffffffffffffffff, 0x0)
mmap(&(0x7f0000d3f000/0x1000)=nil, 0x1000, 0x3, 0x32, 0xffffffffffffffff, 0x0)
r1 = perf_event_open(&(0x7f0000d15000-0x78)={0x1, 0x78, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x30, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, r2, 0xffffffff, r0, 0x0)
dup3(0, 0, 0)
perf_event_open(&(0x7f0000b13000-0x78)={0x0, 0x78, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x30, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, r2, 0xffffffff, r0, 0x0)
----

Note: the dup3() is an expensive NOP (since oldfd == newfd), but I think
it's triggering an interesting scheduling pattern, since thus far I
haven't managed to trigger the bug without it.

That creates a perf_cpu_clock event, adds a second perf_cpu_clock event
to its group, and then adds a HW event to that same group, with the
calls issued in parallel.
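
For readers without Syzkaller, the call sequence can be sketched in C. This
is only an approximation: make_attr(), perf_open() and open_mixed_group()
are hypothetical helper names, the calls run sequentially rather than in
parallel, and the remaining attr fields from the syz program (such as the
0x9 and 0x30 values) are deliberately not translated. It is not expected to
trigger the bug.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Zeroed attr with only type, size and config set (other fields untranslated). */
struct perf_event_attr make_attr(unsigned int type, unsigned long long config)
{
	struct perf_event_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.type = type;
	attr.size = sizeof(attr);
	attr.config = config;
	return attr;
}

/* glibc provides no perf_event_open() wrapper, so go through syscall(). */
long perf_open(struct perf_event_attr *attr, pid_t pid, int cpu,
	       int group_fd, unsigned long flags)
{
	return syscall(SYS_perf_event_open, attr, pid, cpu, group_fd, flags);
}

/*
 * Open a SW leader, a SW sibling and a HW sibling for the calling thread
 * on any CPU (cpu == -1). Returns 0 on success and -1 if any open fails;
 * entries of fds[] that were not opened are left at -1, and the caller
 * closes whatever is >= 0.
 */
int open_mixed_group(int fds[3])
{
	struct perf_event_attr sw = make_attr(PERF_TYPE_SOFTWARE,
					      PERF_COUNT_SW_CPU_CLOCK);
	struct perf_event_attr hw = make_attr(PERF_TYPE_HARDWARE,
					      PERF_COUNT_HW_CPU_CYCLES);
	pid_t tid = (pid_t)syscall(SYS_gettid);

	fds[0] = fds[1] = fds[2] = -1;

	fds[0] = (int)perf_open(&sw, tid, -1, -1, 0);	  /* group leader */
	if (fds[0] < 0)
		return -1;
	fds[1] = (int)perf_open(&sw, tid, -1, fds[0], 0); /* SW sibling */
	if (fds[1] < 0)
		return -1;
	fds[2] = (int)perf_open(&hw, tid, -1, fds[0], 0); /* HW sibling */
	return fds[2] < 0 ? -1 : 0;
}
```

Opening the events may fail with EACCES or ENOENT depending on
perf_event_paranoid and on whether a hardware PMU is present.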

Sometimes at the point the HW event is added, the leading SW event is in
PERF_EVENT_STATE_INACTIVE, but the follower SW event is in
PERF_EVENT_STATE_ACTIVE. The context that holds both events is inactive,
so the follower event's state makes no sense.

I added a dump to event_sched_out() that catches this:

[   35.995144] Uh-oh:
[   35.995144]   event ffff800039a1f880
[   35.995144]   event->state 1
[   35.995144]   event->cpu -1
[   35.995144]   pmu ffff20000a3b2600 (perf_cpu_clock, AKA (null))
[   35.995144]   leader ffff800039a1a480
[   35.995144]   leader->state -1
[   35.995144]   pmu ffff20000a3b2600 (perf_cpu_clock, AKA (null))
[   35.995144]   ctx ffff80003932e180, pmu ffff20000a3b2600 (perf_cpu_clock AKA (null))
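
For reference when reading the dump, the numeric states can be decoded with
a small helper. This is a sketch under assumptions: the enum below is a
local mirror of enum perf_event_active_state as recalled from the v4.14
include/linux/perf_event.h and should be checked against the tree in use,
and state_matches_ctx() is a hypothetical statement of the invariant
described above, not a kernel function. If the values are right, the
leader's -1 decodes as OFF and the follower's 1 as ACTIVE.

```c
#include <stdbool.h>

/* Local mirror of the kernel enum; values assumed from v4.14, verify locally. */
enum event_state {
	STATE_DEAD     = -4,
	STATE_EXIT     = -3,
	STATE_ERROR    = -2,
	STATE_OFF      = -1,
	STATE_INACTIVE =  0,
	STATE_ACTIVE   =  1,
};

/* Map a dumped event->state value to its (assumed) symbolic name. */
const char *event_state_name(int state)
{
	switch (state) {
	case STATE_DEAD:     return "PERF_EVENT_STATE_DEAD";
	case STATE_EXIT:     return "PERF_EVENT_STATE_EXIT";
	case STATE_ERROR:    return "PERF_EVENT_STATE_ERROR";
	case STATE_OFF:      return "PERF_EVENT_STATE_OFF";
	case STATE_INACTIVE: return "PERF_EVENT_STATE_INACTIVE";
	case STATE_ACTIVE:   return "PERF_EVENT_STATE_ACTIVE";
	default:             return "unknown";
	}
}

/*
 * Hypothetical invariant: an event should only be ACTIVE while the
 * context holding it is itself active.
 */
bool state_matches_ctx(int state, bool ctx_is_active)
{
	return state != STATE_ACTIVE || ctx_is_active;
}
```

Applied to the dump, state_matches_ctx(1, false) is false for the follower
event, which is the inconsistency being reported.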

I'll try to dig into this a bit more next week.

I can't reproduce this with Syzkaller running in a single thread, nor
with some multi-threaded tests I wrote in C, so I guess there's a subtle
race I'm not managing to hit.

Thanks,
Mark.


* Re: v4.14-rc{4,7} null pointer dereference in event_sched_out()
  2017-11-24 18:10   ` Mark Rutland
@ 2017-11-24 18:16     ` Mark Rutland
  0 siblings, 0 replies; 4+ messages in thread
From: Mark Rutland @ 2017-11-24 18:16 UTC (permalink / raw)
  To: Will Deacon; +Cc: peterz, mingo, linux-kernel, linux-arm-kernel

On Fri, Nov 24, 2017 at 06:10:56PM +0000, Mark Rutland wrote:
> On Wed, Nov 15, 2017 at 06:00:20PM +0000, Will Deacon wrote:
> > On Mon, Oct 30, 2017 at 04:23:15PM +0000, Mark Rutland wrote:
> > > As a heads-up, while fuzzing arm64 v4.14-rc{4,7} with Syzkaller, I hit a
> > > KASAN splat in event_sched_out():
> > Did you get anywhere with this?
> 
> I got a *bit* further, but I haven't figured out the underlying issue
> yet.

Forgot to mention, the above all applies to a vanilla v4.14 arm64
kernel; defconfig + KASAN_INLINE.

Thanks,
Mark.

