Oops in slab.c in CentOS kernel, looking for ideas

linux-mm.kvack.org archive mirror

* Oops in slab.c in CentOS kernel, looking for ideas
@ 2016-09-27 16:12 Chris Friesen
  2016-09-27 18:33 ` Oops in slab.c in CentOS kernel, looking for ideas -- correction, it's in slub.c Chris Friesen
  0 siblings, 1 reply; 5+ messages in thread
From: Chris Friesen @ 2016-09-27 16:12 UTC
  To: linux-mm


I've got a CentOS 7 kernel that has been slightly modified, but the mm
subsystem hasn't been touched.  I'm hoping you can give me some guidance.

I have an intermittent Oops that looks like what is below.  The issue
is currently occurring on one CPU of one system, but has been seen
before infrequently.  Once the corruption occurs it causes an Oops on
every call to __mpol_dup() on this CPU.

Basically it appears that __mpol_dup() is failing because the value of
c->freelist in slab_alloc_node() is corrupt.  The call to
get_freepointer_safe(s, object) then Oopses when it tries to dereference
"object + s->offset" (where s->offset is zero).

In the trace, "kmem_cache_alloc+0x87" maps to the following assembly:
   0xffffffff8118be17 <+135>:   mov    (%r12,%rax,1),%rbx

This corresponds to this line in get_freepointer():
	return *(void **)(object + s->offset);

In the assembly code, R12 is "object", and RAX is s->offset.
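
The faulting address follows directly from the addressing mode of that
instruction. A minimal sketch in userspace C, assuming the register values
printed in the Oops below (the helper and constant names are hypothetical,
not kernel symbols), computes base + index * scale for the operand
(%r12,%rax,1):

```c
#include <assert.h>
#include <stdint.h>

/* Effective address of an x86 memory operand written as (base,index,scale). */
static uint64_t effective_address(uint64_t base, uint64_t index, uint64_t scale)
{
	return base + index * scale;
}

/* Register values copied from the Oops: R12 is "object", RAX is s->offset. */
static const uint64_t OOPS_R12 = 0x000000001ada8000ULL;
static const uint64_t OOPS_RAX = 0x0000000000000000ULL;
static const uint64_t OOPS_CR2 = 0x000000001ada8000ULL;

/* Self-check: the computed address should equal the faulting address in CR2. */
static void check_oops_address(void)
{
	assert(effective_address(OOPS_R12, OOPS_RAX, 1) == OOPS_CR2);
}
```

The result equals CR2, which is consistent with the load from
"object + s->offset" being the access that faulted.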

So the question becomes, why is "object" (which corresponds to c->freelist)
corrupt?

Looking at the value of R12 (0x1ada8000), it's nonzero but also not a
valid kernel pointer. Does the value mean anything to you?  (I'm not really
a memory subsystem guy, so I'm hoping you might have some ideas.)
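
One way to characterise the value, sketched in userspace C. The assumptions
here are x86_64 with 48-bit canonical addresses and 4 KiB pages, and the
macro and function names are hypothetical. Under those assumptions kernel
virtual addresses sit in the upper half of the address space, so 0x1ada8000
is not one, and its low 12 bits are zero, so it is page-aligned:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB pages. */
#define MODEL_PAGE_SIZE 4096ULL

/* Assumption: 48-bit virtual addresses; the kernel half starts here. */
#define MODEL_KERNEL_HALF_START 0xffff800000000000ULL

static bool in_kernel_half(uint64_t addr)
{
	return addr >= MODEL_KERNEL_HALF_START;
}

static bool is_page_aligned(uint64_t addr)
{
	return (addr & (MODEL_PAGE_SIZE - 1)) == 0;
}

/* Self-check using the R12 value from the Oops. */
static void check_r12(void)
{
	assert(!in_kernel_half(0x1ada8000ULL));
	assert(is_page_aligned(0x1ada8000ULL));
}
```

Page alignment is only a hint. By itself it does not say where the value
came from.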

Do you have any suggestions on how to track down what's going on here?

Thanks,
Chris

PS: Please CC me on replies, I'm not subscribed to the list.




2016-09-24T16:43:45.125 controller-1 kernel: alert [90390.702162] BUG: unable to handle kernel paging request at 000000001ada8000
2016-09-24T16:43:45.125 controller-1 kernel: alert [90390.709965] IP: [<ffffffff8118be17>] kmem_cache_alloc+0x87/0x250
2016-09-24T16:43:45.125 controller-1 kernel: warning [90390.716689] PGD 0 
2016-09-24T16:43:45.125 controller-1 kernel: warning [90390.718945] Oops: 0000 [#43] PREEMPT SMP 
2016-09-24T16:43:45.125 controller-1 kernel: warning [90390.723454] Modules linked in: target_core_pscsi target_core_file target_core_iblock iscsi_target_mod target_core_mod dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio iptable_raw xt_CHECKSUM xt_connmark iptable_mangle
nbd ebtable_filter ebtables igb_uio(OE) uio drbd(OE) libcrc32c nf_log_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables virtio_net nf_log_ipv4 nf_log_common xt_LOG xt_limit xt_conntrack iptable_filter xt_nat xt_comment xt_multiport iptable_nat nf_conntrack_ipv4
nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack veth nfsv3 nfs fscache 8021q garp stp mrp llc cls_u32 sch_sfq sch_htb dm_mod iTCO_wdt iTCO_vendor_support ipmi_devintf intel_powerclamp coretemp kvm_intel kvm crc32_pclmul ghash_clmulni_intel aesni_intel glue_helper
lrw gf128mul ablk_helper cryptd lpc_ich mfd_core mei_me mei i2c_i801 ipmi_si ipmi_msghandler acpi_power_meter wrs_avp(OE) nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables ext4 jbd2 sd_mod crc_t10dif crct10dif_generic crct10dif_pclmul crct10dif_common crc32c_intel ahci
libahci ixgbe mdio igb i2c_algo_bit i2c_core libata dca i40e(OE) vxlan ip6_udp_tunnel udp_tunnel ptp pps_core
2016-09-24T16:43:45.125 controller-1 kernel: warning [90390.836474] CPU: 48 PID: 42192 Comm: qemu-kvm Tainted: G      D    OE  ------------   3.10.0-327.28.3.6.tis.x86_64 #1
2016-09-24T16:43:45.125 controller-1 kernel: warning [90390.848328] Hardware name: Intel Corporation S2600WT2R/S2600WT2R, BIOS SE5C610.86B.01.01.0016.033120161139 03/31/2016
2016-09-24T16:43:45.125 controller-1 kernel: warning [90390.860181] task: ffff880463f75a90 ti: ffff8804120f0000 task.ti: ffff8804120f0000
2016-09-24T16:43:45.125 controller-1 kernel: warning [90390.868540] RIP: 0010:[<ffffffff8118be17>]  [<ffffffff8118be17>] kmem_cache_alloc+0x87/0x250
2016-09-24T16:43:45.125 controller-1 kernel: warning [90390.877980] RSP: 0018:ffff8804120f3d40  EFLAGS: 00010286
2016-09-24T16:43:45.125 controller-1 kernel: warning [90390.883913] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000019230
2016-09-24T16:43:45.125 controller-1 kernel: warning [90390.891883] RDX: 0000000000019130 RSI: 00000000000000d0 RDI: ffff8804120f3fd8
2016-09-24T16:43:45.125 controller-1 kernel: warning [90390.899853] RBP: ffff8804120f3d88 R08: 0000000000018710 R09: ffffffff811832e8
2016-09-24T16:43:45.125 controller-1 kernel: warning [90390.907823] R10: 0000000000000000 R11: ffffffffffffff83 R12: 000000001ada8000
2016-09-24T16:43:45.125 controller-1 kernel: warning [90390.915794] R13: 00000000000000d0 R14: ffff88103ec06200 R15: ffff88103ec06200
2016-09-24T16:43:45.125 controller-1 kernel: warning [90390.923765] FS:  00007fb6cb236e00(0000) GS:ffff88103f540000(0000) knlGS:0000000000000000
2016-09-24T16:43:45.125 controller-1 kernel: warning [90390.932804] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
2016-09-24T16:43:45.125 controller-1 kernel: warning [90390.939222] CR2: 000000001ada8000 CR3: 0000000464152000 CR4: 00000000003427e0
2016-09-24T16:43:45.125 controller-1 kernel: warning [90390.947194] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
2016-09-24T16:43:45.125 controller-1 kernel: warning [90390.955166] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
2016-09-24T16:43:45.125 controller-1 kernel: warning [90390.963135] Stack:
2016-09-24T16:43:45.125 controller-1 kernel: warning [90390.965380]  00ff880400000002 0000000000000246 ffff88107ffda000 00000000997694b3
2016-09-24T16:43:45.125 controller-1 kernel: warning [90390.973688]  0000000000000000 ffff88046d4815a8 00000000003d0f00 ffff8802c8003c60
2016-09-24T16:43:45.125 controller-1 kernel: warning [90390.981996]  ffff880463f75a90 ffff8804120f3e38 ffffffff811832e8 ffff88107ffda000
2016-09-24T16:43:45.125 controller-1 kernel: warning [90390.990304] Call Trace:
2016-09-24T16:43:45.125 controller-1 kernel: warning [90390.993039]  [<ffffffff811832e8>] __mpol_dup+0x38/0x140
2016-09-24T16:43:45.125 controller-1 kernel: warning [90390.998876]  [<ffffffff8118bea2>] ? kmem_cache_alloc+0x112/0x250
2016-09-24T16:43:45.125 controller-1 kernel: warning [90391.005591]  [<ffffffff8100be69>] ? read_tsc+0x9/0x10
2016-09-24T16:43:45.125 controller-1 kernel: warning [90391.011235]  [<ffffffff8105ef91>] copy_process.part.30+0x611/0x1570
2016-09-24T16:43:45.125 controller-1 kernel: warning [90391.018230]  [<ffffffff810600d1>] do_fork+0xe1/0x350
2016-09-24T16:43:45.142 controller-1 kernel: warning [90391.023778]  [<ffffffff810603c6>] SyS_clone+0x16/0x20
2016-09-24T16:43:45.142 controller-1 kernel: warning [90391.029421]  [<ffffffff816792d9>] stub_clone+0x69/0x90


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


* Re: Oops in slab.c in CentOS kernel, looking for ideas -- correction, it's in slub.c
  2016-09-27 16:12 Oops in slab.c in CentOS kernel, looking for ideas Chris Friesen
@ 2016-09-27 18:33 ` Chris Friesen
  2016-09-28  5:14   ` Joonsoo Kim
  0 siblings, 1 reply; 5+ messages in thread
From: Chris Friesen @ 2016-09-27 18:33 UTC
  To: linux-mm


Sorry, I had a typo in my earlier message.  The issue is actually in slub.c.

Chris



* Re: Oops in slab.c in CentOS kernel, looking for ideas -- correction, it's in slub.c
  2016-09-27 18:33 ` Oops in slab.c in CentOS kernel, looking for ideas -- correction, it's in slub.c Chris Friesen
@ 2016-09-28  5:14   ` Joonsoo Kim
  2016-09-28 22:10     ` Chris Friesen
  0 siblings, 1 reply; 5+ messages in thread
From: Joonsoo Kim @ 2016-09-28  5:14 UTC
  To: Chris Friesen; +Cc: linux-mm

On Tue, Sep 27, 2016 at 12:33:08PM -0600, Chris Friesen wrote:
> >Do you have any suggestions on how to track down what's going on here?

Please run with the kernel parameter "slub_debug=F" or similar.
See Documentation/vm/slub.txt.

Thanks.


* Re: Oops in slab.c in CentOS kernel, looking for ideas -- correction, it's in slub.c
  2016-09-28  5:14   ` Joonsoo Kim
@ 2016-09-28 22:10     ` Chris Friesen
  2016-09-29  1:46       ` Joonsoo Kim
  0 siblings, 1 reply; 5+ messages in thread
From: Chris Friesen @ 2016-09-28 22:10 UTC
  To: Joonsoo Kim; +Cc: linux-mm

On 09/27/2016 11:14 PM, Joonsoo Kim wrote:
> On Tue, Sep 27, 2016 at 12:33:08PM -0600, Chris Friesen wrote:
>> On 09/27/2016 10:12 AM, Chris Friesen wrote:

>>> Do you have any suggestions on how to track down what's going on here?
>
> Please run with kernel parameter "slub_debug=F" or something.
> See Documentation/vm/slub.txt.

I enabled /sys/kernel/slab/numa_policy/sanity_checks, but that will probably
only help if I can get another CPU into the bad state.

I created a kernel module to walk the list of objects starting at 
__this_cpu_ptr(policy_cache->cpu_slab)->freelist.

All other CPUs had a freelist value of NULL, or else they pointed at a linked
list which eventually ended with a NULL pointer.  ("s->offset" is 0, so
get_freepointer() just dereferences "object".)  For example:

cpu: 45, object: ffff88046d483cd8->ffff88046d483de0->ffff88046d483ee8->NULL
cpu: 46, object: NULL

In the case of CPU 48, the value of 
__this_cpu_ptr(policy_cache->cpu_slab)->freelist was good, but dereferencing it 
gave an invalid address:

cpu: 48, object: ffff8804102f0528->000000001ada8000
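
The module itself is not shown in this thread. As an illustration only, a
walk like the one described can be modelled in userspace C. The sketch below
is not the module's code: it assumes every object lives inside one
contiguous buffer standing in for a slab, and all names are hypothetical. It
follows the next pointers the same way get_freepointer() does and stops at
the first pointer that falls outside the buffer:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Result of walking one freelist in the model. */
struct walk_result {
	size_t valid_objects;	/* objects visited before stopping */
	void *bad_pointer;	/* first pointer outside the slab, or NULL */
};

/* True if the next pointer of an object at p fits in [base, base + size). */
static int pointer_in_slab(const void *p, size_t offset,
			   const void *base, size_t size)
{
	uintptr_t a = (uintptr_t)p;
	uintptr_t b = (uintptr_t)base;

	return a >= b && (a - b) + offset + sizeof(void *) <= size;
}

static struct walk_result walk_freelist(void *head, size_t offset,
					const void *base, size_t size,
					size_t max_objects)
{
	struct walk_result r = { 0, NULL };
	void *object = head;

	assert(base != NULL);
	while (object != NULL && r.valid_objects < max_objects) {
		if (!pointer_in_slab(object, offset, base, size)) {
			r.bad_pointer = object;	/* do not dereference it */
			break;
		}
		r.valid_objects++;
		/* Same load as get_freepointer(): *(void **)(object + offset) */
		object = *(void **)((char *)object + offset);
	}
	return r;
}
```

For a chain like the one on CPU 45 the walk ends with bad_pointer == NULL.
For a chain like the one on CPU 48 it reports one valid object and returns
the stray value without dereferencing it.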

In the code path that causes problems we call mpol_new(), which calls
kmem_cache_alloc(policy_cache, GFP_KERNEL) and consumes the object at
0xffff8804102f0528.  This results in
__this_cpu_ptr(policy_cache->cpu_slab)->freelist being set to
0x000000001ada8000.  Then we fork, which calls __mpol_dup(), which calls
kmem_cache_alloc(policy_cache, GFP_KERNEL) with 'object' set to
0x000000001ada8000, and that Oopses when we try to dereference it in
get_freepointer().
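
The two-step nature of the failure can be sketched with a simplified
userspace model of the allocation fast path. This is an illustration under
stated assumptions, not the code in slub.c: the struct and function names
are hypothetical, and the empty-freelist slow path and the per-CPU cmpxchg
used by the real allocator are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of the per-CPU state: only the freelist head is represented. */
struct cpu_slab_model {
	void *freelist;
};

/*
 * Simplified pop from the per-CPU freelist: hand out the head and make the
 * pointer stored inside it the new head.
 */
static void *model_alloc(struct cpu_slab_model *c, size_t offset)
{
	void *object = c->freelist;

	assert(object != NULL);	/* the model has no slow path */
	/* This is the load that faults once c->freelist itself is garbage. */
	c->freelist = *(void **)((char *)object + offset);
	return object;
}
```

In this model the first allocation from a list whose head object contains
0x1ada8000 succeeds and only installs the bad value as the new head. The
next allocation is the one that dereferences it, which matches the
mpol_new() then __mpol_dup() sequence described above.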

So how do items get added to the freelist?  Do they always get added at the 
head, or is there a path where they could get added at the tail?

Chris


* Re: Oops in slab.c in CentOS kernel, looking for ideas -- correction, it's in slub.c
  2016-09-28 22:10     ` Chris Friesen
@ 2016-09-29  1:46       ` Joonsoo Kim
  0 siblings, 0 replies; 5+ messages in thread
From: Joonsoo Kim @ 2016-09-29  1:46 UTC
  To: Chris Friesen; +Cc: linux-mm

On Wed, Sep 28, 2016 at 04:10:15PM -0600, Chris Friesen wrote:
> On 09/27/2016 11:14 PM, Joonsoo Kim wrote:
> >On Tue, Sep 27, 2016 at 12:33:08PM -0600, Chris Friesen wrote:
> >>On 09/27/2016 10:12 AM, Chris Friesen wrote:
> 
> >>>Do you have any suggestions on how to track down what's going on here?
> >
> >Please run with kernel parameter "slub_debug=F" or something.
> >See Documentation/vm/slub.txt.
> 
> I enabled /sys/kernel/slab/numa_policy/sanity_checks, but that's
> only going to maybe help if I can cause another CPU to get into the
> bad state.

It would help because it checks all slub operations. If a wrong
pointer is freed, it can detect that at that moment. It also checks the
next free object pointer, so the problem would be found earlier.

If it does not detect your problem, I guess that someone is overwriting
the content of a freed object. Could you check with slub_debug=FZPU?

And, KASAN would help you, too.
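
For reference, the option letters in slub_debug=FZPU are described in
Documentation/vm/slub.txt. The lookup below is a small userspace sketch
with hypothetical function names that covers only the four letters used
here. The same document also describes limiting the options to one cache,
which for this case should look like slub_debug=FZPU,numa_policy.

```c
#include <assert.h>
#include <stddef.h>

/* Meaning of the slub_debug option letters used in this thread. */
static const char *slub_debug_flag_meaning(char flag)
{
	switch (flag) {
	case 'F': return "sanity checks";
	case 'Z': return "red zoning";
	case 'P': return "poisoning";
	case 'U': return "user tracking";
	default:  return NULL;	/* other letters are not covered by this sketch */
	}
}

/* Count how many letters of an option string this sketch recognises. */
static size_t count_known_flags(const char *flags)
{
	size_t n = 0;

	assert(flags != NULL);
	for (; *flags != '\0'; flags++)
		if (slub_debug_flag_meaning(*flags) != NULL)
			n++;
	return n;
}
```

Red zoning and poisoning are the options aimed at writes to an object after
it has been freed, and user tracking records who last allocated and freed
it.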

> So how do items get added to the freelist?  Do they always get added
> at the head, or is there a path where they could get added at the
> tail?

They always get added at the head.
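
As an illustration, head insertion can be modelled in a few lines of
userspace C. This is a sketch with hypothetical names, not the kernel's free
path, and it ignores the locking and per-CPU handling the real code needs:

```c
#include <assert.h>
#include <stddef.h>

/* Model of a freelist: only the head pointer is represented. */
struct freelist_model {
	void *head;
};

/* Push a freed object at the head; the old head is stored inside it. */
static void model_free(struct freelist_model *fl, void *object, size_t offset)
{
	assert(object != NULL);
	*(void **)((char *)object + offset) = fl->head;
	fl->head = object;
}
```

In this model the pointer stored in a freed object is whatever the head was
at the time of the free. A value such as 0x1ada8000 inside an object on the
list therefore points toward the object being written after it was freed,
which is the overwrite guessed at earlier in this message.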

Thanks.
