* generic/320 triggers "list_add attempted on force-poisoned entry" warning on XFS
@ 2016-02-27 13:02 Eryu Guan
2016-02-27 20:10 ` Dan Williams
0 siblings, 1 reply; 6+ messages in thread
From: Eryu Guan @ 2016-02-27 13:02 UTC (permalink / raw)
To: xfs; +Cc: Dan Williams, Ross Zwisler
Hi,
Starting with the 4.5-rc1 kernel, I sometimes see generic/320 trigger
"list_add attempted on force-poisoned entry" warnings on XFS. The test hosts
are arm64/ppc64/ppc64le; I haven't seen it on x86_64 hosts.
[ 2441.772340] run fstests generic/320 at 2016-02-27 05:52:05
[ 2441.916302] XFS (sda5): Unmounting Filesystem
[ 2442.180551] XFS (sda5): Mounting V5 Filesystem
[ 2442.231940] XFS (sda5): Ending clean mount
[ 2460.142155] list_add attempted on force-poisoned entry
[ 2460.142278] ------------[ cut here ]------------
[ 2460.142326] WARNING: at lib/list_debug.c:34
[ 2460.142362] Modules linked in: rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ses enclosure scsi_transport_sas sg shpchp powernv_rng rtc_opal nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sr_mod sd_mod cdrom mlx4_ib ib_sa ib_mad mlx4_en ib_core vxlan ip6_udp_tunnel ib_addr udp_tunnel mlx4_core ipr libata tg3 ptp pps_core
[ 2460.143083] CPU: 21 PID: 134288 Comm: cp Not tainted 4.5.0-rc5 #25
[ 2460.143141] task: c000000f550adb00 ti: c000000fb5fc0000 task.ti: c000000fb5fc0000
[ 2460.143209] NIP: c00000000043c390 LR: c00000000043c38c CTR: 0000000030041bec
[ 2460.143278] REGS: c000000fb5fc30a0 TRAP: 0700 Not tainted (4.5.0-rc5)
[ 2460.143334] MSR: 900000010282b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]> CR: 22028422 XER: 00000000
[ 2460.143575] CFAR: c0000000008259d8 SOFTE: 0
GPR00: c00000000043c38c c000000fb5fc3320 c00000000108bc00 000000000000002a
GPR04: c000000ff8d49c50 c000000ff8d5b4a0 900000010280b033 0000000000000065
GPR08: 0000000000000000 c000000000bcb284 0000000ff8180000 000000000000076b
GPR12: 0000000000008800 c00000000fb8bd00 0000000000000000 0000000000000000
GPR16: 0000000000000000 00003ffffa430978 0000000000000000 0000000000000001
GPR20: c000000fb08ab880 0000000000008180 d00000002024bae0 0000000000000000
GPR24: 0000000000000000 c000000fc73c9e40 c000000fe914a740 0000000000000002
GPR28: 0000000000000001 c000000fc812ab38 c000000fc812ab38 c000000fb5fc33c0
[ 2460.144450] NIP [c00000000043c390] __list_add+0xb0/0x150
[ 2460.144497] LR [c00000000043c38c] __list_add+0xac/0x150
[ 2460.144542] Call Trace:
[ 2460.144566] [c000000fb5fc3320] [c00000000043c38c] __list_add+0xac/0x150 (unreliable)
[ 2460.144648] [c000000fb5fc33a0] [c00000000081b454] __down+0x4c/0xf8
[ 2460.144718] [c000000fb5fc3410] [c00000000010b6f8] down+0x68/0x70
[ 2460.144809] [c000000fb5fc3450] [d0000000201ebf4c] xfs_buf_lock+0x4c/0x150 [xfs]
[ 2460.144902] [c000000fb5fc3490] [d0000000201ec2f0] _xfs_buf_find+0x2a0/0x4d0 [xfs]
[ 2460.144995] [c000000fb5fc3530] [d0000000201ec70c] xfs_buf_get_map+0x4c/0x250 [xfs]
[ 2460.145088] [c000000fb5fc35d0] [d0000000201ed740] xfs_buf_read_map+0x50/0x1f0 [xfs]
[ 2460.145244] [c000000fb5fc3630] [d0000000202280d8] xfs_trans_read_buf_map+0x1d8/0x390 [xfs]
[ 2460.145412] [c000000fb5fc36a0] [d0000000201d849c] xfs_read_agi+0x9c/0x130 [xfs]
[ 2460.145580] [c000000fb5fc3700] [d0000000201d8580] xfs_ialloc_read_agi+0x50/0x160 [xfs]
[ 2460.145748] [c000000fb5fc3750] [d0000000201d92f0] xfs_dialloc+0x130/0x2f0 [xfs]
[ 2460.145918] [c000000fb5fc37e0] [d000000020203274] xfs_ialloc+0x84/0x550 [xfs]
[ 2460.146068] [c000000fb5fc3860] [d0000000202037d8] xfs_dir_ialloc+0x98/0x270 [xfs]
[ 2460.146240] [c000000fb5fc3960] [d000000020203f24] xfs_create+0x4f4/0x750 [xfs]
[ 2460.146412] [c000000fb5fc3a60] [d0000000201ff0a8] xfs_generic_create+0x208/0x3d0 [xfs]
[ 2460.146572] [c000000fb5fc3af0] [c0000000002af0f8] vfs_create+0x158/0x1f0
[ 2460.146708] [c000000fb5fc3b40] [c0000000002b0cd8] do_last+0x698/0xf40
[ 2460.146845] [c000000fb5fc3c10] [c0000000002b1624] path_openat+0xa4/0x3c0
[ 2460.146982] [c000000fb5fc3c90] [c0000000002b2ec4] do_filp_open+0x74/0xf0
[ 2460.147120] [c000000fb5fc3dc0] [c00000000029c654] do_sys_open+0x1b4/0x2d0
[ 2460.147257] [c000000fb5fc3e30] [c000000000009204] system_call+0x38/0xb4
[ 2460.147392] Instruction dump:
[ 2460.147459] fbfe0000 38210080 e8010010 eba1ffe8 ebc1fff0 ebe1fff8 7c0803a6 4e800020
[ 2460.147680] 3c62ff9e 38631a78 483e95f9 60000000 <0fe00000> 4bffff98 60000000 60420000
[ 2460.147902] ---[ end trace aa6c4f990634a77c ]---
The warning itself is introduced by commit 5c2c2587b132 ("mm, dax, pmem:
introduce {get|put}_dev_pagemap() for dax-gup") in 4.5-rc1, and git
bisect points to the same commit. But I'm not sure if it's a regression
or just exposes an old issue.
If more information is needed, please let me know.
Thanks,
Eryu
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
* Re: generic/320 triggers "list_add attempted on force-poisoned entry" warning on XFS
2016-02-27 13:02 generic/320 triggers "list_add attempted on force-poisoned entry" warning on XFS Eryu Guan
@ 2016-02-27 20:10 ` Dan Williams
2016-02-28 5:31 ` Eryu Guan
0 siblings, 1 reply; 6+ messages in thread
From: Dan Williams @ 2016-02-27 20:10 UTC (permalink / raw)
To: Eryu Guan; +Cc: Ross Zwisler, XFS Developers
On Sat, Feb 27, 2016 at 5:02 AM, Eryu Guan <eguan@redhat.com> wrote:
> Hi,
>
> Starting from 4.5-rc1 kernel, I sometimes see generic/320 triggers
> "list_add attempted on force-poisoned entry" warnings on XFS, test hosts
> are arm64/ppc64/ppc64le, haven't seen it on x86_64 hosts.
Hmm, this triggers when a list_head has ->next or ->prev pointing at
the address of force_poison which is only defined in lib/list_debug.c.
The only call site that uses list_force_poison() is in
devm_memremap_pages(). That currently depends on CONFIG_ZONE_DEVICE
which in turn depends on X86_64.
So, this appears to be a false positive and the address of
force_poison is somehow ending up on the stack by accident as that is
the random value being passed in from __down_common:
struct semaphore_waiter waiter;
list_add_tail(&waiter.list, &sem->wait_list);
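To make the suspected failure mode concrete, here is a minimal userspace sketch (not kernel code). It assumes a simplified struct list_head and reuses the names force_poison and list_force_poison() from lib/list_debug.c; looks_force_poisoned() and stale_slot_trips_check() are hypothetical helpers invented for illustration. Reading truly uninitialized stack memory is undefined behaviour in C, so the stale contents are written explicitly instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct list_head. */
struct list_head {
    struct list_head *next, *prev;
};

/* Stand-in for the static force_poison object in lib/list_debug.c. */
static struct list_head force_poison;

/* Mirrors list_force_poison(): mark an entry as "must never be added". */
static void list_force_poison(struct list_head *entry)
{
    assert(entry != NULL);
    entry->next = &force_poison;
    entry->prev = &force_poison;
}

/* The condition tested by the WARN() at the top of __list_add(). */
static bool looks_force_poisoned(const struct list_head *new)
{
    return new->next == &force_poison || new->prev == &force_poison;
}

/*
 * Hypothetical helper: model an on-stack waiter whose list_head was never
 * initialized by filling it with whatever "stale" values the caller passes.
 */
static bool stale_slot_trips_check(struct list_head *stale_next,
                                   struct list_head *stale_prev)
{
    struct list_head waiter_list;

    waiter_list.next = stale_next;
    waiter_list.prev = stale_prev;
    return looks_force_poisoned(&waiter_list);
}
```

In this model, stale_slot_trips_check(&force_poison, NULL) returns true even though list_force_poison() was never called on that entry, which is the kind of false positive suspected here.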
So, I think we need a more unique poison value that should never
appear on the stack:
diff --git a/include/linux/poison.h b/include/linux/poison.h
index 4a27153574e2..0604806c2f52 100644
--- a/include/linux/poison.h
+++ b/include/linux/poison.h
@@ -21,6 +21,7 @@
*/
#define LIST_POISON1 ((void *) 0x100 + POISON_POINTER_DELTA)
#define LIST_POISON2 ((void *) 0x200 + POISON_POINTER_DELTA)
+#define LIST_POISON3 ((void *) 0x500 + POISON_POINTER_DELTA)
/********** include/linux/timer.h **********/
/*
diff --git a/lib/list_debug.c b/lib/list_debug.c
index 3345a089ef7b..318bf1c181b2 100644
--- a/lib/list_debug.c
+++ b/lib/list_debug.c
@@ -12,11 +12,10 @@
#include <linux/kernel.h>
#include <linux/rculist.h>
-static struct list_head force_poison;
void list_force_poison(struct list_head *entry)
{
- entry->next = &force_poison;
- entry->prev = &force_poison;
+ entry->next = LIST_POISON3;
+ entry->prev = LIST_POISON3;
}
/*
@@ -30,7 +29,7 @@ void __list_add(struct list_head *new,
struct list_head *prev,
struct list_head *next)
{
- WARN(new->next == &force_poison || new->prev == &force_poison,
+ WARN(new->next == LIST_POISON3 || new->prev == LIST_POISON3,
"list_add attempted on force-poisoned entry\n");
WARN(next->prev != prev,
"list_add corruption. next->prev should be "
* Re: generic/320 triggers "list_add attempted on force-poisoned entry" warning on XFS
2016-02-27 20:10 ` Dan Williams
@ 2016-02-28 5:31 ` Eryu Guan
2016-02-29 18:22 ` Dan Williams
0 siblings, 1 reply; 6+ messages in thread
From: Eryu Guan @ 2016-02-28 5:31 UTC (permalink / raw)
To: Dan Williams; +Cc: Ross Zwisler, XFS Developers
On Sat, Feb 27, 2016 at 12:10:51PM -0800, Dan Williams wrote:
> On Sat, Feb 27, 2016 at 5:02 AM, Eryu Guan <eguan@redhat.com> wrote:
> > Hi,
> >
> > Starting from 4.5-rc1 kernel, I sometimes see generic/320 triggers
> > "list_add attempted on force-poisoned entry" warnings on XFS, test hosts
> > are arm64/ppc64/ppc64le, haven't seen it on x86_64 hosts.
>
> Hmm, this triggers when a list_head has ->next or ->prev pointing at
> the address of force_poison which is only defined in lib/list_debug.c.
> The only call site that uses list_force_poison() is in
> devm_memremap_pages(). That currently depends on CONFIG_ZONE_DEVICE
> which in turn depends on X86_64.
>
> So, this appears to be a false positive and the address of
> force_poison is somehow ending up on the stack by accident as that is
> the random value being passed in from __down_common:
>
> struct semaphore_waiter waiter;
>
> list_add_tail(&waiter.list, &sem->wait_list);
>
> So, I think we need a more unique poison value that should never
> appear on the stack:
Unfortunately I can still see the warning after applying this test patch.
Then I added debug code to print the pointer value and re-ran the test.
All five failures printed the same pointer value and failed in the same
pattern:
list_add attempted on force-poisoned entry(0000000000000500), new->next = c00000000136bc00, new->prev = 0000000000000500
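
The debug patch itself is not included in this thread. As a rough userspace sketch of how a message in the above shape could be produced, assuming the poison value is 0x500 (that is, POISON_POINTER_DELTA is 0 on this host, as the printed value suggests) and using a hypothetical helper named format_poison_warning():

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumed value of LIST_POISON3 from the test patch, with a zero delta. */
#define LIST_POISON3_VALUE UINT64_C(0x500)

/*
 * Hypothetical helper: format the poison value and both pointers as
 * zero-padded 64-bit hex, matching the layout of the message above.
 * Returns the snprintf() result.
 */
static int format_poison_warning(char *buf, size_t len,
                                 uint64_t next, uint64_t prev)
{
    return snprintf(buf, len,
                    "list_add attempted on force-poisoned entry(%016" PRIx64
                    "), new->next = %016" PRIx64
                    ", new->prev = %016" PRIx64,
                    LIST_POISON3_VALUE, next, prev);
}
```

In the kernel, the equivalent would presumably be extra arguments to the existing WARN() in __list_add(); that is an assumption, not something shown in the thread.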
Thanks,
Eryu
* Re: generic/320 triggers "list_add attempted on force-poisoned entry" warning on XFS
2016-02-28 5:31 ` Eryu Guan
@ 2016-02-29 18:22 ` Dan Williams
2016-03-01 8:00 ` Eryu Guan
0 siblings, 1 reply; 6+ messages in thread
From: Dan Williams @ 2016-02-29 18:22 UTC (permalink / raw)
To: Eryu Guan; +Cc: Ross Zwisler, XFS Developers
On Sat, Feb 27, 2016 at 9:31 PM, Eryu Guan <eguan@redhat.com> wrote:
> On Sat, Feb 27, 2016 at 12:10:51PM -0800, Dan Williams wrote:
>> On Sat, Feb 27, 2016 at 5:02 AM, Eryu Guan <eguan@redhat.com> wrote:
>> > Hi,
>> >
>> > Starting from 4.5-rc1 kernel, I sometimes see generic/320 triggers
>> > "list_add attempted on force-poisoned entry" warnings on XFS, test hosts
>> > are arm64/ppc64/ppc64le, haven't seen it on x86_64 hosts.
>>
>> Hmm, this triggers when a list_head has ->next or ->prev pointing at
>> the address of force_poison which is only defined in lib/list_debug.c.
>> The only call site that uses list_force_poison() is in
>> devm_memremap_pages(). That currently depends on CONFIG_ZONE_DEVICE
>> which in turn depends on X86_64.
>>
>> So, this appears to be a false positive and the address of
>> force_poison is somehow ending up on the stack by accident as that is
>> the random value being passed in from __down_common:
>>
>> struct semaphore_waiter waiter;
>>
>> list_add_tail(&waiter.list, &sem->wait_list);
>>
>> So, I think we need a more unique poison value that should never
>> appear on the stack:
>
> Unfortunately I can still see the warning after applying this test patch.
>
> Then I added debug code to print the pointer value and re-ran the test.
> All five failures printed the same pointer value, failed in the same
> pattern:
>
> list_add attempted on force-poisoned entry(0000000000000500), new->next = c00000000136bc00, new->prev = 0000000000000500
>
I think this means that no matter what we do the stack will pick up
these poison values unless the list_head is explicitly initialized.
Something like the following:
diff --git a/kernel/locking/semaphore.c b/kernel/locking/semaphore.c
index b8120abe594b..39929b4e6fbb 100644
--- a/kernel/locking/semaphore.c
+++ b/kernel/locking/semaphore.c
@@ -205,7 +205,9 @@ static inline int __sched __down_common(struct semaphore *sem, long state,
long timeout)
{
struct task_struct *task = current;
- struct semaphore_waiter waiter;
+ struct semaphore_waiter waiter = {
+ .list = LIST_HEAD_INIT(waiter.list),
+ };
list_add_tail(&waiter.list, &sem->wait_list);
waiter.task = task;
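
As a rough userspace illustration of what the explicit initialization buys, the sketch below models an on-stack waiter set up with LIST_HEAD_INIT, so both pointers refer to the entry itself and cannot match the poison value. It assumes LIST_POISON3 is 0x500 with a zero POISON_POINTER_DELTA; struct waiter_model, looks_force_poisoned() and enqueue_initialized_waiter() are hypothetical names, and the list helpers are simplified stand-ins for the kernel versions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct list_head. */
struct list_head {
    struct list_head *next, *prev;
};

#define LIST_HEAD_INIT(name) { &(name), &(name) }
#define LIST_POISON3 ((struct list_head *)0x500) /* assumed: delta is 0 */

/* Hypothetical model of a semaphore waiter; only the list member matters. */
struct waiter_model {
    struct list_head list;
    int id;
};

/* The poison test from the patched __list_add(). */
static bool looks_force_poisoned(const struct list_head *new)
{
    return new->next == LIST_POISON3 || new->prev == LIST_POISON3;
}

/* Simplified list_add_tail(): link new just before head. */
static void list_add_tail(struct list_head *new, struct list_head *head)
{
    struct list_head *prev = head->prev;

    new->next = head;
    new->prev = prev;
    prev->next = new;
    head->prev = new;
}

/* Simplified list_del(): unlink entry from its neighbours. */
static void list_del(struct list_head *entry)
{
    entry->prev->next = entry->next;
    entry->next->prev = entry->prev;
}

/* Returns true if the waiter passed the poison test and was queued last. */
static bool enqueue_initialized_waiter(struct list_head *wait_list)
{
    struct waiter_model waiter = {
        .list = LIST_HEAD_INIT(waiter.list),
    };
    bool ok;

    if (looks_force_poisoned(&waiter.list))
        return false;

    list_add_tail(&waiter.list, wait_list);
    ok = wait_list->prev == &waiter.list && waiter.list.next == wait_list;
    list_del(&waiter.list); /* unlink before this stack frame goes away */
    return ok;
}
```

With an empty list declared as struct list_head head = LIST_HEAD_INIT(head), enqueue_initialized_waiter(&head) returns true and leaves the list empty again.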
* Re: generic/320 triggers "list_add attempted on force-poisoned entry" warning on XFS
2016-02-29 18:22 ` Dan Williams
@ 2016-03-01 8:00 ` Eryu Guan
2016-03-01 16:27 ` Dan Williams
0 siblings, 1 reply; 6+ messages in thread
From: Eryu Guan @ 2016-03-01 8:00 UTC (permalink / raw)
To: Dan Williams; +Cc: Ross Zwisler, XFS Developers
On Mon, Feb 29, 2016 at 10:22:06AM -0800, Dan Williams wrote:
> On Sat, Feb 27, 2016 at 9:31 PM, Eryu Guan <eguan@redhat.com> wrote:
> > On Sat, Feb 27, 2016 at 12:10:51PM -0800, Dan Williams wrote:
> >> On Sat, Feb 27, 2016 at 5:02 AM, Eryu Guan <eguan@redhat.com> wrote:
> >> > Hi,
> >> >
> >> > Starting from 4.5-rc1 kernel, I sometimes see generic/320 triggers
> >> > "list_add attempted on force-poisoned entry" warnings on XFS, test hosts
> >> > are arm64/ppc64/ppc64le, haven't seen it on x86_64 hosts.
> >>
> >> Hmm, this triggers when a list_head has ->next or ->prev pointing at
> >> the address of force_poison which is only defined in lib/list_debug.c.
> >> The only call site that uses list_force_poison() is in
> >> devm_memremap_pages(). That currently depends on CONFIG_ZONE_DEVICE
> >> which in turn depends on X86_64.
> >>
> >> So, this appears to be a false positive and the address of
> >> force_poison is somehow ending up on the stack by accident as that is
> >> the random value being passed in from __down_common:
> >>
> >> struct semaphore_waiter waiter;
> >>
> >> list_add_tail(&waiter.list, &sem->wait_list);
> >>
> >> So, I think we need a more unique poison value that should never
> >> appear on the stack:
> >
> > Unfortunately I can still see the warning after applying this test patch.
> >
> > Then I added debug code to print the pointer value and re-ran the test.
> > All five failures printed the same pointer value, failed in the same
> > pattern:
> >
> > list_add attempted on force-poisoned entry(0000000000000500), new->next = c00000000136bc00, new->prev = 0000000000000500
> >
>
> I think this means that no matter what we do the stack will pick up
> these poison values unless the list_head is explicitly initialized.
> Something like the following:
Umm, it's still reproducible, but it seems harder to hit than before: it
took me 200+ iterations (fewer than 10 iterations in previous runs).
[ 5465.401191] run fstests generic/320 at 2016-03-01 00:11:13
[ 5465.561754] XFS (sda5): Unmounting Filesystem
[ 5466.202130] XFS (sda5): Mounting V4 Filesystem
[ 5466.260396] XFS (sda5): Ending clean mount
[ 5482.629036] list_add attempted on force-poisoned entry(0000000000000500), new->next == d0000000059ecdb0, new->prev == 0000000000000500
[ 5482.629070] ------------[ cut here ]------------
[ 5482.629077] WARNING: at lib/list_debug.c:33
[ 5482.629082] Modules linked in: pseries_rng(E) sg(E) nfsd(E) auth_rpcgss(E) nfs_acl(E) lockd(E) sunrpc(E) grace(E) ip_tables(E) xfs(E) libcrc32c(E) sd_mod(E) ibmvscsi(E) scsi_transport_srp(E) ibmveth(E)
[ 5482.629121] CPU: 4 PID: 7203 Comm: rm Tainted: G E 4.5.0-rc5+ #4
[ 5482.629129] task: c0000005f0712d00 ti: c0000004c749c000 task.ti: c0000004c749c000
[ 5482.629136] NIP: c00000000042db78 LR: c00000000042db74 CTR: 00000000013abb8c
[ 5482.629144] REGS: c0000004c749f3a0 TRAP: 0700 Tainted: G E (4.5.0-rc5+)
[ 5482.629150] MSR: 8000000100029032 <SF,EE,ME,IR,DR,RI,TM[E]> CR: 22002822 XER: 0000000b
[ 5482.629173] CFAR: c00000000080a5e4 SOFTE: 0
GPR00: c00000000042db74 c0000004c749f620 c00000000136bc00 000000000000007a
GPR04: c0000005ffc09c58 c0000005ffc1b490 000005cf29ac0100 0000000000000000
GPR08: 0000000000000000 c000000000c3b27c 00000005fefd0000 0000000000000f97
GPR12: 0000000042002844 c00000000e822400 0000000000000002 0000000000000000
GPR16: 000000001000da78 000000001000d758 0000010018009cd0 000000001000dab8
GPR20: 0000000000000001 c0000004c749f960 c0000005f5931e00 c0000005f5931e80
GPR24: c0000000fd01c000 c0000000fbe0a400 fffffffffffff000 0000000000000000
GPR28: c0000005ea59f938 c0000005f5931e88 c0000005f1f6b890 c0000004c749f720
[ 5482.629270] NIP [c00000000042db78] .__list_add+0xa8/0x140
[ 5482.629277] LR [c00000000042db74] .__list_add+0xa4/0x140
[ 5482.629282] Call Trace:
[ 5482.629288] [c0000004c749f620] [c00000000042db74] .__list_add+0xa4/0x140 (unreliable)
[ 5482.629299] [c0000004c749f6b0] [c0000000008010ec] .rwsem_down_read_failed+0x6c/0x1a0
[ 5482.629310] [c0000004c749f760] [c000000000800828] .down_read+0x58/0x60
[ 5482.629396] [c0000004c749f7e0] [d000000005a1a6bc] .xfs_log_commit_cil+0x7c/0x600 [xfs]
[ 5482.629482] [c0000004c749f8f0] [d000000005a12848] .__xfs_trans_commit+0x178/0x300 [xfs]
[ 5482.629567] [c0000004c749f990] [d000000005a12f14] .__xfs_trans_roll+0x74/0x130 [xfs]
[ 5482.629653] [c0000004c749fa30] [d0000000059e8994] .xfs_bmap_finish+0xd4/0x1e0 [xfs]
[ 5482.629738] [c0000004c749fae0] [d000000005a06acc] .xfs_inactive_ifree+0x20c/0x2a0 [xfs]
[ 5482.629830] [c0000004c749fb90] [d000000005a06c14] .xfs_inactive+0xb4/0x190 [xfs]
[ 5482.629913] [c0000004c749fc10] [d000000005a0d8f8] .xfs_fs_evict_inode+0xd8/0x170 [xfs]
[ 5482.629923] [c0000004c749fca0] [c0000000002b60d8] .evict+0xe8/0x220
[ 5482.629932] [c0000004c749fd30] [c0000000002a9278] .do_unlinkat+0x248/0x360
[ 5482.629942] [c0000004c749fe30] [c000000000009204] system_call+0x38/0xb4
[ 5482.629948] Instruction dump:
[ 5482.629953] e8010010 eba1ffe8 ebc1fff0 ebe1fff8 7c0803a6 4e800020 3c62ff77 38800500
[ 5482.629969] 38632550 7d254b78 483dca15 60000000 <0fe00000> 4bffff90 3c62ff77 7fe4fb78
[ 5482.629985] ---[ end trace 71e305f825b24cc9 ]---
Thanks,
Eryu
* Re: generic/320 triggers "list_add attempted on force-poisoned entry" warning on XFS
2016-03-01 8:00 ` Eryu Guan
@ 2016-03-01 16:27 ` Dan Williams
0 siblings, 0 replies; 6+ messages in thread
From: Dan Williams @ 2016-03-01 16:27 UTC (permalink / raw)
To: Eryu Guan; +Cc: Ross Zwisler, XFS Developers
On Tue, Mar 1, 2016 at 12:00 AM, Eryu Guan <eguan@redhat.com> wrote:
> On Mon, Feb 29, 2016 at 10:22:06AM -0800, Dan Williams wrote:
>> On Sat, Feb 27, 2016 at 9:31 PM, Eryu Guan <eguan@redhat.com> wrote:
>> > On Sat, Feb 27, 2016 at 12:10:51PM -0800, Dan Williams wrote:
>> >> On Sat, Feb 27, 2016 at 5:02 AM, Eryu Guan <eguan@redhat.com> wrote:
>> >> > Hi,
>> >> >
>> >> > Starting from 4.5-rc1 kernel, I sometimes see generic/320 triggers
>> >> > "list_add attempted on force-poisoned entry" warnings on XFS, test hosts
>> >> > are arm64/ppc64/ppc64le, haven't seen it on x86_64 hosts.
>> >>
>> >> Hmm, this triggers when a list_head has ->next or ->prev pointing at
>> >> the address of force_poison which is only defined in lib/list_debug.c.
>> >> The only call site that uses list_force_poison() is in
>> >> devm_memremap_pages(). That currently depends on CONFIG_ZONE_DEVICE
>> >> which in turn depends on X86_64.
>> >>
>> >> So, this appears to be a false positive and the address of
>> >> force_poison is somehow ending up on the stack by accident as that is
>> >> the random value being passed in from __down_common:
>> >>
>> >> struct semaphore_waiter waiter;
>> >>
>> >> list_add_tail(&waiter.list, &sem->wait_list);
>> >>
>> >> So, I think we need a more unique poison value that should never
>> >> appear on the stack:
>> >
>> > Unfortunately I can still see the warning after applying this test patch.
>> >
>> > Then I added debug code to print the pointer value and re-ran the test.
>> > All five failures printed the same pointer value, failed in the same
>> > pattern:
>> >
>> > list_add attempted on force-poisoned entry(0000000000000500), new->next = c00000000136bc00, new->prev = 0000000000000500
>> >
>>
>> I think this means that no matter what we do the stack will pick up
>> these poison values unless the list_head is explicitly initialized.
>> Something like the following:
>
> Umm, it's still reproducible... but seems harder than before, it took me
> 200+ iterations to hit (less than 10 iterations in previous runs)
Similar fix, just in rwsem_down_read_failed() this time:
diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index a4d4de05b2d1..68678a20da52 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -214,8 +214,10 @@ __visible
struct rw_semaphore __sched *rwsem_down_read_failed(struct rw_semaphore *sem)
{
long count, adjustment = -RWSEM_ACTIVE_READ_BIAS;
- struct rwsem_waiter waiter;
struct task_struct *tsk = current;
+ struct rwsem_waiter waiter = {
+ .list = LIST_HEAD_INIT(waiter.list),
+ };
/* set up my own style of waitqueue */
waiter.task = tsk;