public inbox for linux-xfs@vger.kernel.org
* Null pointer dereference while at ACL limit on v5 XFS
@ 2014-06-23 21:48 Michael L. Semon
  2014-06-23 22:08 ` Mark Tinguely
  2014-06-24  2:18 ` Dave Chinner
  0 siblings, 2 replies; 11+ messages in thread
From: Michael L. Semon @ 2014-06-23 21:48 UTC (permalink / raw)
  To: xfs-oss

At the ACL limit of v5-superblock XFS--with a directory filled with both default 
and access ACL entries--I'm getting a null pointer dereference on x86 after 
creating the directory successfully.

Disclaimer:  There are some current issues on 32-bit x86 that can, for instance,
make badblocks see phantom bad blocks on a read test.  My apologies in advance
if this turns out to be a false-alarm bug report.

My first encounter with this issue involved fsstress.  Here's part of a `crash` 
session from the fsstress run.

root@oldsvrhw:/mnt/crashdump/xfs-fsstress-max-acl-2# crash vmlinux System.map vmcore
crash 7.0.4
# setup was snipped.
DEBUG KERNEL: vmlinux  
    DUMPFILE: vmcore
        CPUS: 1
        DATE: Fri Jun 20 13:04:23 2014
      UPTIME: 00:29:49
LOAD AVERAGE: 1.06, 1.56, 0.75
       TASKS: 78
    NODENAME: oldsvrhw
     RELEASE: 3.16.0-rc1+
     VERSION: #1 SMP Thu Jun 19 20:10:57 EDT 2014
     MACHINE: i686  (730 Mhz)
      MEMORY: 510.4 MB
       PANIC: "Oops: 0000 [#1] SMP DEBUG_PAGEALLOC" (check log for details)
         PID: 41
     COMMAND: "kworker/0:1H"
        TASK: de8f2ac0  [THREAD_INFO: de92e000]
         CPU: 0
       STATE: TASK_RUNNING (PANIC)

crash> dmesg
# ### excerpt:

# ### mounted $SCRATCH_DEV, applied ACLs to $SCRATCH_MNT/test_dir
[ 1499.886170] XFS (hdc5): Mounting V5 Filesystem
[ 1500.057759] XFS (hdc5): Ending clean mount

# ### ran `fsstress -d $SCRATCH_MNT/test-dir/a -n 10000 -p 16`
# ### BTW, does fsstress trash the existing directory before a run?
[ 1654.043846] fsstress (610) used greatest stack depth: 4956 bytes left
[ 1654.063619] fsstress (615) used greatest stack depth: 4920 bytes left
[ 1654.082220] fsstress (623) used greatest stack depth: 4820 bytes left
[ 1654.087344] fsstress (611) used greatest stack depth: 4800 bytes left
[ 1654.094295] fsstress (614) used greatest stack depth: 4784 bytes left
[ 1654.191650] fsstress (608) used greatest stack depth: 4768 bytes left
[ 1663.452036] perf interrupt took too long (2537 > 2500), lowering kernel.perf_event_max_sample_rate to 50000

# ### This was OK, so I hit Ctrl-c, then ran this (not in child directory):
# ### ran `fsstress -d $SCRATCH_MNT/test-dir -n 10000 -p 16`
[ 1789.338622] BUG: unable to handle kernel NULL pointer dereference at 0000000c
[ 1789.338842] IP: [<c1263048>] xfs_ail_check+0x58/0xc0
[ 1789.338994] *pde = 00000000 
[ 1789.339042] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
[ 1789.339042] CPU: 0 PID: 41 Comm: kworker/0:1H Not tainted 3.16.0-rc1+ #1
[ 1789.339042] Hardware name: Dell Computer Corporation       L733r                          /CA810E                         , BIOS A14 09/05/2001
[ 1789.339042] Workqueue: xfslogd xfs_buf_iodone_work
[ 1789.339042] task: de8f2ac0 ti: de92e000 task.ti: de92e000
[ 1789.339042] EIP: 0060:[<c1263048>] EFLAGS: 00010286 CPU: 0
[ 1789.339042] EIP is at xfs_ail_check+0x58/0xc0
[ 1789.339042] EAX: 00000000 EBX: dde37370 ECX: 0000330a EDX: 0000330a
[ 1789.339042] ESI: 00000001 EDI: 00000001 EBP: de92fc9c ESP: de92fc90
[ 1789.339042]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[ 1789.339042] CR0: 8005003b CR2: 0000000c CR3: 1c8ef000 CR4: 000007d0
[ 1789.339042] Stack:
[ 1789.339042]  dde37370 ddc4ea80 00000001 de92fcac c12630c3 dde37370 00000012 de92fd04
[ 1789.339042]  c1263d1d 00000000 00000001 00000000 00000000 ddc4ea88 de92fd38 dc8bba28
[ 1789.339042]  ddc4ea80 00000000 0000330a de92fd44 0000001f 00000001 00000012 00003362
[ 1789.339042] Call Trace:
[ 1789.339042]  [<c12630c3>] xfs_ail_delete+0x13/0x60
[ 1789.339042]  [<c1263d1d>] xfs_trans_ail_update_bulk+0xad/0x3c0
[ 1789.339042]  [<c11fbd35>] xfs_trans_committed_bulk+0x255/0x300
[ 1789.339042]  [<c125dcac>] xlog_cil_committed+0x3c/0x160
[ 1789.339042]  [<c1259f8c>] xlog_state_do_callback+0x17c/0x380
[ 1789.339042]  [<c125a253>] xlog_state_done_syncing+0xc3/0xe0
[ 1789.339042]  [<c125a2de>] xlog_iodone+0x6e/0x100
[ 1789.339042]  [<c11dd08b>] xfs_buf_iodone_work+0x5b/0xe0
[ 1789.339042]  [<c1055bc5>] process_one_work+0x1b5/0x570
[ 1789.339042]  [<c1055b48>] ? process_one_work+0x138/0x570
[ 1789.339042]  [<c10560e5>] ? worker_thread+0x165/0x470
[ 1789.339042]  [<c1056077>] worker_thread+0xf7/0x470
[ 1789.339042]  [<c1055f80>] ? process_one_work+0x570/0x570
[ 1789.339042]  [<c105d061>] kthread+0xa1/0xc0
[ 1789.339042]  [<c108509b>] ? trace_hardirqs_on+0xb/0x10
[ 1789.339042]  [<c1500ae1>] ret_from_kernel_thread+0x21/0x30
[ 1789.339042]  [<c105cfc0>] ? insert_kthread_work+0x80/0x80
[ 1789.339042] Code: c1 b8 d8 9e 62 c1 e8 a8 00 f9 ff 8b 43 04 39 c6 74 10 8b 7b 0c 39 78 0c 8b 53 08 8b 48 08 74 43 73 45 8b 03 39 c6 74 24 8b 73 0c <39> 70 0c 8b 53 08 8b 48 08 74 4d 73 14 b9 38 00 00 00 ba e3 a3
[ 1789.339042] EIP: [<c1263048>] xfs_ail_check+0x58/0xc0 SS:ESP 0068:de92fc90
[ 1789.339042] CR2: 000000000000000c

Since then, I've been trying out different ways of reproducing this
oops.

# ------ shortest way found so far ------

For a seed file, use this URL...

https://docs.google.com/file/d/0B41268QKoNjtMEU5UUZvMXF6ZzQ

Hopefully, the order will go like this (from memory):

# get the seed file, and
xz -d max_acl_file.xz

mkfs.xfs -f -m crc=1 $SCRATCH_DEV
mount $SCRATCH_DEV $SCRATCH_MNT

mkdir $SCRATCH_MNT/acl-dir

setfacl --set-file=max_acl_file $SCRATCH_MNT/acl-dir

cd $SCRATCH_MNT/acl-dir

# or `touch a b c; mkdir d e f`
mkdir a b c
sync

rm -rv ./*
sync

# ----------------------------------------

That's as short as I can get it...if it works.  If not, keep trying
different things.  The tests need not be heavy:  A few seconds' worth
of fs_mark should populate the directory sufficiently.  The `rm -rv ./*`
is key.  `sync` is not required; the oops will happen on its own.

This seems to happen only at a point where one or both ACL limits 
have been hit.  I'm only guessing that when a default entry is made, space 
is allocated for the access entry, and vice versa.
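For scale, the arithmetic behind the v5 limit can be sketched as follows. This assumes the 3.1x-era definition in fs/xfs/xfs_acl.h, where CRC-enabled (v5) filesystems derive the maximum entry count from XATTR_SIZE_MAX and older filesystems keep a fixed limit of 25; the structs are simplified copies and the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#ifndef XATTR_SIZE_MAX
#define XATTR_SIZE_MAX 65536	/* largest xattr value the VFS accepts */
#endif

#define V4_ACL_MAX_ENTRIES 25	/* fixed limit on non-CRC (v4) filesystems */

/* Simplified on-disk ACL layout; host-endian ints stand in for __be32/__be16. */
struct xfs_acl_entry {
	uint32_t ae_tag;
	uint32_t ae_id;
	uint16_t ae_perm;
	uint16_t ae_pad;
};

struct xfs_acl {
	uint32_t acl_cnt;
	struct xfs_acl_entry acl_entry[];
};

/* Maximum entries per ACL: v5 derives it from XATTR_SIZE_MAX, v4 is fixed. */
static size_t acl_max_entries(int has_crc)
{
	if (!has_crc)
		return V4_ACL_MAX_ENTRIES;
	return (XATTR_SIZE_MAX - sizeof(struct xfs_acl)) /
	       sizeof(struct xfs_acl_entry);
}

/* Bytes needed for one ACL attribute value holding @count entries. */
static size_t acl_value_size(size_t count)
{
	assert(count <= acl_max_entries(1));
	return sizeof(struct xfs_acl) + count * sizeof(struct xfs_acl_entry);
}
```

Under those assumptions the v5 limit comes out at 5461 entries, and a full ACL fills one 64 KiB attribute value exactly. A directory at both limits therefore carries two such values (access and default), each spanning many blocks at the default 4 KiB block size, so each would most likely be stored in remote attribute blocks rather than inside the inode.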

Thanks!

Michael

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

* Re: Null pointer dereference while at ACL limit on v5 XFS
  2014-06-23 21:48 Null pointer dereference while at ACL limit on v5 XFS Michael L. Semon
@ 2014-06-23 22:08 ` Mark Tinguely
  2014-06-23 22:13   ` Mark Tinguely
  2014-06-24  2:18 ` Dave Chinner
  1 sibling, 1 reply; 11+ messages in thread
From: Mark Tinguely @ 2014-06-23 22:08 UTC (permalink / raw)
  To: xfs

On 06/23/14 16:48, Michael L. Semon wrote:
> At the ACL limit of v5-superblock XFS--with a directory filled with both default
> and access ACL entries--I'm getting a null pointer dereference on x86 after
> creating the directory successfully.
> ...
>
> Thanks!
>
> Michael
>

Michael, do you have the vmcore dump for this, or was this just from the
messages?

Thanks.

--Mark.

* Re: Null pointer dereference while at ACL limit on v5 XFS
  2014-06-23 22:08 ` Mark Tinguely
@ 2014-06-23 22:13   ` Mark Tinguely
  2014-06-24  3:34     ` Michael L. Semon
  0 siblings, 1 reply; 11+ messages in thread
From: Mark Tinguely @ 2014-06-23 22:13 UTC (permalink / raw)
  To: Michael L. Semon; +Cc: xfs

On 06/23/14 17:08, Mark Tinguely wrote:
> On 06/23/14 16:48, Michael L. Semon wrote:
>> At the ACL limit of v5-superblock XFS--with a directory filled with
>> both default and access ACL entries--I'm getting a null pointer
>> dereference on x86 after creating the directory successfully.
>>
>> Disclaimer: There's some current issues on 32-bit x86 that, for
>> instance, can make badblocks see phantom bad blocks on a read test.
>> My apologies in advance if this turns out to be a false alarm bug
>> report.
>>
>> My first encounter with this issue involved fsstress. Here's part of
>> a `crash` session from the fsstress run.
>>
>> root@oldsvrhw:/mnt/crashdump/xfs-fsstress-max-acl-2# crash vmlinux System.map vmcore
>> crash 7.0.4
...
>> Thanks!
>>
>> Michael
>>
>
> Michael, do you have the vmcore dump for this or was this just from the
> messages.
>
> Thanks.
>
> --Mark.

Ummm, duh me. You were running crash ...

Can I look at the core?

--Mark.

* Re: Null pointer dereference while at ACL limit on v5 XFS
  2014-06-23 21:48 Null pointer dereference while at ACL limit on v5 XFS Michael L. Semon
  2014-06-23 22:08 ` Mark Tinguely
@ 2014-06-24  2:18 ` Dave Chinner
  1 sibling, 0 replies; 11+ messages in thread
From: Dave Chinner @ 2014-06-24  2:18 UTC (permalink / raw)
  To: Michael L. Semon; +Cc: xfs-oss

On Mon, Jun 23, 2014 at 05:48:31PM -0400, Michael L. Semon wrote:
> At the ACL limit of v5-superblock XFS--with a directory filled with both default 
> and access ACL entries--I'm getting a null pointer dereference on x86 after 
> creating the directory successfully.
> 
> Disclaimer:  There's some current issues on 32-bit x86 that, for instance, can 
> make badblocks see phantom bad blocks on a read test.  My apologies in advance 
> if this turns out to be a false alarm bug report.
> 
> My first encounter with this issue involved fsstress.  Here's part of a `crash` 
> session from the fsstress run.

Ok, I haven't been able to reproduce this on x86-64....

> # ### ran `fsstress -d $SCRATCH_MNT/test-dir -n 10000 -p 16`
> [ 1789.338622] BUG: unable to handle kernel NULL pointer dereference at 0000000c
> [ 1789.338842] IP: [<c1263048>] xfs_ail_check+0x58/0xc0

Hmmm - xfs_ail_check() is checking the LSN ordering of the items on
the AIL, and it's crashed trying to dereference one of the list
pointers on the current log item.
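To make the check concrete, here is a minimal user-space model of what is being described, not the kernel's code: a circular doubly linked list of items sorted by LSN, plus a check that an item sits in order between its neighbours. The struct, macro and function names are hypothetical. The LSN packing (log cycle in the high 32 bits, block in the low 32) is assumed to match how the xfs_ail_* tracepoints print values such as `1/51`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

typedef int64_t lsn_t;

/* Assumed LSN layout: cycle in the high 32 bits, block in the low 32. */
#define LSN(cycle, block) (((lsn_t)(cycle) << 32) | (uint32_t)(block))
#define CYCLE_LSN(lsn)    ((uint32_t)((lsn) >> 32))
#define BLOCK_LSN(lsn)    ((uint32_t)(lsn))

struct item {
	struct item *next, *prev;	/* AIL linkage; NULL when unlinked */
	lsn_t lsn;
};

enum ail_check_result { AIL_OK, AIL_BAD_ORDER, AIL_NULL_LINK };

/*
 * Check that @lip sits in LSN order between its neighbours, in the
 * spirit of xfs_ail_check(). The list head carries no LSN, so it is
 * skipped. Unlike the kernel, this model reports a NULL link instead
 * of dereferencing it.
 */
static enum ail_check_result ail_check(const struct item *head,
				       const struct item *lip)
{
	if (!lip->prev || !lip->next)
		return AIL_NULL_LINK;
	if (lip->prev != head && lip->prev->lsn > lip->lsn)
		return AIL_BAD_ORDER;
	if (lip->next != head && lip->next->lsn < lip->lsn)
		return AIL_BAD_ORDER;
	return AIL_OK;
}

/* Format an LSN as "cycle/block", the way the trace output shows it. */
static void format_lsn(lsn_t lsn, char *buf, size_t len)
{
	assert(buf != NULL && len > 0);
	snprintf(buf, len, "%u/%u", (unsigned)CYCLE_LSN(lsn),
		 (unsigned)BLOCK_LSN(lsn));
}
```

In the model, an item whose next link is NULL is reported rather than followed; a check without that guard would read a field at a small offset from address zero, which is consistent with the fault address 0000000c in the oops.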


> [ 1789.339042]  [<c12630c3>] xfs_ail_delete+0x13/0x60
> [ 1789.339042]  [<c1263d1d>] xfs_trans_ail_update_bulk+0xad/0x3c0
> [ 1789.339042]  [<c11fbd35>] xfs_trans_committed_bulk+0x255/0x300
> [ 1789.339042]  [<c125dcac>] xlog_cil_committed+0x3c/0x160

And given that it is doing an update, I suspect a problem with
the XFS_LI_IN_AIL flag - that the item is not on the AIL, but has
that flag set.
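The suspected failure mode can be illustrated with a small, hypothetical model of a bulk-update decision like the one described above. An item whose IN_AIL flag is set is unlinked and re-added at its new LSN, and an unflagged item is flagged and added. Only the flag is consulted, so an item that carries a stale flag while not actually on the list goes down the delete path anyway. The names and structure are illustrative, not the kernel's xfs_trans_ail_update_bulk().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define LI_IN_AIL 0x1u		/* hypothetical stand-in for XFS_LI_IN_AIL */

struct node {
	struct node *next, *prev;	/* NULL when not on any list */
	unsigned int flags;
	long lsn;
};

static void list_init(struct node *head)
{
	head->next = head->prev = head;
}

static void list_add_tail(struct node *n, struct node *head)
{
	assert(n->next == NULL && n->prev == NULL);	/* must not be linked yet */
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Unlink @n; returns false instead of corrupting memory if @n is not linked. */
static bool list_del_checked(struct node *n)
{
	if (!n->prev || !n->next || n->prev->next != n || n->next->prev != n)
		return false;
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->next = n->prev = NULL;
	return true;
}

/*
 * Move or insert @n at @lsn, trusting only the flag. Returns false if
 * the delete step found that a flagged item was not really on the list.
 * (A real AIL keeps LSN order; appending at the tail is enough here.)
 */
static bool ail_update(struct node *ail, struct node *n, long lsn)
{
	if (n->flags & LI_IN_AIL) {
		if (lsn <= n->lsn)
			return true;	/* new LSN is not newer: leave it in place */
		if (!list_del_checked(n))
			return false;
	} else {
		n->flags |= LI_IN_AIL;
	}
	n->lsn = lsn;
	list_add_tail(n, ail);
	return true;
}
```

A consistently flagged item moves cleanly, while an item flagged IN_AIL but never linked is caught at the delete step, which is where both oopses in this thread occur (xfs_ail_delete called from xfs_trans_ail_update_bulk).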

Can you enable the xfs_ail* tracepoints, set
/proc/sys/kernel/ftrace_dump_on_oops and rerun the test? That should
dump the trace buffer into the kernel dmesg output showing AIL
operations just before the crash occurs. That might tell us what has
happened here...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: Null pointer dereference while at ACL limit on v5 XFS
  2014-06-23 22:13   ` Mark Tinguely
@ 2014-06-24  3:34     ` Michael L. Semon
  2014-06-24  4:04       ` Dave Chinner
  2014-06-24 16:31       ` Mark Tinguely
  0 siblings, 2 replies; 11+ messages in thread
From: Michael L. Semon @ 2014-06-24  3:34 UTC (permalink / raw)
  To: Mark Tinguely; +Cc: xfs

On 06/23/2014 06:13 PM, Mark Tinguely wrote:
> On 06/23/14 17:08, Mark Tinguely wrote:
>> On 06/23/14 16:48, Michael L. Semon wrote:
>>> At the ACL limit of v5-superblock XFS--with a directory filled with
>>> both default and access ACL entries--I'm getting a null pointer
>>> dereference on x86 after creating the directory successfully.
>>>
>>> Disclaimer: There's some current issues on 32-bit x86 that, for
>>> instance, can make badblocks see phantom bad blocks on a read test.
>>> My apologies in advance if this turns out to be a false alarm bug
>>> report.
>>>
>>> My first encounter with this issue involved fsstress. Here's part of
>>> a `crash` session from the fsstress run.
>>>
>>> root@oldsvrhw:/mnt/crashdump/xfs-fsstress-max-acl-2# crash vmlinux System.map vmcore
>>> crash 7.0.4
> ...
>>> Thanks!
>>>
>>> Michael
>>>
>>
>> Michael, do you have the vmcore dump for this or was this just from the
>> messages.
>>
>> Thanks.
>>
>> --Mark.
> 
> ummm, duh me. you were running crash ...
> 
> Can I look at the core?
> 
> --Mark.

Sure!  I've uploaded two sets of core dumps (vmcore, vmlinux, System.map, 
config, sample crash session) and put them here for a short time:

https://drive.google.com/folderview?id=0B41268QKoNjtUGFpcTlCbEdkQXM

xfs-fsstress-max-acl-2.tar.xz has the dmesg that was originally posted.

xfs-fsstress-max-acl-3.tar.xz came from the simple mkdir/rm test.  I got 
lucky with this simple test because the message looks like it came from 
the kernel linked list diagnostic:

[ 1068.431391] ------------[ cut here ]------------
[ 1068.431566] WARNING: CPU: 0 PID: 41 at lib/list_debug.c:59 __list_del_entry+0xce/0x110()
[ 1068.431596] list_del corruption. prev->next should be db5bf580, but was   (null)
[ 1068.431629] CPU: 0 PID: 41 Comm: kworker/0:1H Not tainted 3.16.0-rc1+ #3
[ 1068.431656] Hardware name: Dell Computer Corporation       L733r                          /CA810E                         , BIOS A14 09/05/2001
[ 1068.431697] Workqueue: xfslogd xfs_buf_iodone_work
[ 1068.431738]  00000000 00000000 de92fc24 c15d4e76 de92fc68 de92fc58 c103ca33 c1737648
[ 1068.431891]  de92fc84 00000029 c173705a 0000003b c13c3e9e 0000003b c13c3e9e 0000003b
[ 1068.432115]  db5bf580 00000001 de92fc70 c103cab3 00000009 de92fc68 c1737648 de92fc84
[ 1068.432267] Call Trace:
[ 1068.432329]  [<c15d4e76>] dump_stack+0x48/0x60
[ 1068.432386]  [<c103ca33>] warn_slowpath_common+0x83/0xa0
[ 1068.432433]  [<c13c3e9e>] ? __list_del_entry+0xce/0x110
[ 1068.432478]  [<c13c3e9e>] ? __list_del_entry+0xce/0x110
[ 1068.432524]  [<c103cab3>] warn_slowpath_fmt+0x33/0x40
[ 1068.432569]  [<c13c3e9e>] __list_del_entry+0xce/0x110
[ 1068.432615]  [<c13c3eeb>] list_del+0xb/0x20
[ 1068.432674]  [<c126eb4d>] xfs_ail_delete+0x1d/0x60
[ 1068.432721]  [<c126f945>] xfs_trans_ail_update_bulk+0x1a5/0x410
[ 1068.432780]  [<c12070ab>] xfs_trans_committed_bulk+0x2eb/0x320
[ 1068.432827]  [<c126957a>] xlog_cil_committed+0x3a/0x150
[ 1068.432874]  [<c12655ba>] xlog_state_do_callback+0x18a/0x390
[ 1068.432919]  [<c1265883>] xlog_state_done_syncing+0xc3/0xe0
[ 1068.432964]  [<c126590e>] xlog_iodone+0x6e/0x100
[ 1068.433055]  [<c11e821b>] xfs_buf_iodone_work+0x5b/0xe0
[ 1068.433114]  [<c1058557>] process_one_work+0x1b7/0x5d0
[ 1068.433160]  [<c10584da>] ? process_one_work+0x13a/0x5d0
[ 1068.433205]  [<c1058a1b>] ? worker_thread+0xab/0x4b0
[ 1068.433250]  [<c10589a9>] worker_thread+0x39/0x4b0
[ 1068.433304]  [<c108909b>] ? trace_hardirqs_on+0xb/0x10
[ 1068.433350]  [<c1058970>] ? process_one_work+0x5d0/0x5d0
[ 1068.433398]  [<c105fb58>] kthread+0xa8/0xc0
[ 1068.433444]  [<c108909b>] ? trace_hardirqs_on+0xb/0x10
[ 1068.433495]  [<c15dc781>] ret_from_kernel_thread+0x21/0x30
[ 1068.433540]  [<c105fab0>] ? insert_kthread_work+0x80/0x80
[ 1068.433567] ---[ end trace 60289514948e4bd7 ]---
[ 1068.433603] BUG: unable to handle kernel NULL pointer dereference at 0000000c
[ 1068.433795] IP: [<c126eac8>] xfs_ail_check+0x58/0xc0
[ 1068.433925] *pde = 00000000 
[ 1068.434027] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
[ 1068.434027] CPU: 0 PID: 41 Comm: kworker/0:1H Tainted: G        W     3.16.0-rc1+ #3
[ 1068.434027] Hardware name: Dell Computer Corporation       L733r                          /CA810E                         , BIOS A14 09/05/2001
[ 1068.434027] Workqueue: xfslogd xfs_buf_iodone_work
[ 1068.434027] task: de8faac0 ti: de92e000 task.ti: de92e000
[ 1068.434027] EIP: 0060:[<c126eac8>] EFLAGS: 00010286 CPU: 0
[ 1068.434027] EIP is at xfs_ail_check+0x58/0xc0
[ 1068.434027] EAX: 00000000 EBX: db5bf0b0 ECX: 00000015 EDX: 00000015
[ 1068.434027] ESI: 00000001 EDI: 00000001 EBP: de92fc9c ESP: de92fc90
[ 1068.434027]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[ 1068.434027] CR0: 8005003b CR2: 0000000c CR3: 00ab9000 CR4: 000007d0
[ 1068.434027] Stack:
[ 1068.434027]  ddc81d80 db5bf0b0 00000001 de92fcac c126eb43 db5bf0b0 00000005 de92fd04
[ 1068.434027]  c126f945 00000000 00000001 00000000 00000000 ddc81d88 de92fd38 db04b210
[ 1068.434027]  ddc81d80 00000000 00000015 de92fd44 ddc81d80 00000001 00000037 00000005
[ 1068.434027] Call Trace:
[ 1068.434027]  [<c126eb43>] xfs_ail_delete+0x13/0x60
[ 1068.434027]  [<c126f945>] xfs_trans_ail_update_bulk+0x1a5/0x410
[ 1068.434027]  [<c12070ab>] xfs_trans_committed_bulk+0x2eb/0x320
[ 1068.434027]  [<c126957a>] xlog_cil_committed+0x3a/0x150
[ 1068.434027]  [<c12655ba>] xlog_state_do_callback+0x18a/0x390
[ 1068.434027]  [<c1265883>] xlog_state_done_syncing+0xc3/0xe0
[ 1068.434027]  [<c126590e>] xlog_iodone+0x6e/0x100
[ 1068.434027]  [<c11e821b>] xfs_buf_iodone_work+0x5b/0xe0
[ 1068.434027]  [<c1058557>] process_one_work+0x1b7/0x5d0
[ 1068.434027]  [<c10584da>] ? process_one_work+0x13a/0x5d0
[ 1068.434027]  [<c1058a1b>] ? worker_thread+0xab/0x4b0
[ 1068.434027]  [<c10589a9>] worker_thread+0x39/0x4b0
[ 1068.434027]  [<c108909b>] ? trace_hardirqs_on+0xb/0x10
[ 1068.434027]  [<c1058970>] ? process_one_work+0x5d0/0x5d0
[ 1068.434027]  [<c105fb58>] kthread+0xa8/0xc0
[ 1068.434027]  [<c108909b>] ? trace_hardirqs_on+0xb/0x10
[ 1068.434027]  [<c15dc781>] ret_from_kernel_thread+0x21/0x30
[ 1068.434027]  [<c105fab0>] ? insert_kthread_work+0x80/0x80
[ 1068.434027] Code: c1 b8 50 be 72 c1 e8 38 f7 f8 ff 8b 43 04 39 c6 74 10 8b 7b 0c 39 78 0c 8b 53 08 8b 48 08 74 43 73 45 8b 03 39 c6 74 24 8b 73 0c <39> 70 0c 8b 53 08 8b 48 08 74 4d 73 14 b9 38 00 00 00 ba 83 a3
[ 1068.434027] EIP: [<c126eac8>] xfs_ail_check+0x58/0xc0 SS:ESP 0068:de92fc90
[ 1068.434027] CR2: 000000000000000c

I can reproduce the oops in kernel 3.15.0, perhaps with xfs-oss/for-next 
merged, but there's no vmlinux to go with the kernel.  Therefore, I'll have 
to resort to other means (rebuilt kernel with netconsole, re-attaching the 
serial cable, etc.) to get the full crash log.

Thanks for looking into this!  I'll take Dave's advice on tracing, too, but 
it will be morning before I can collect the results.

Michael

* Re: Null pointer dereference while at ACL limit on v5 XFS
  2014-06-24  3:34     ` Michael L. Semon
@ 2014-06-24  4:04       ` Dave Chinner
  2014-06-24 13:31         ` Michael L. Semon
  2014-07-01 22:27         ` Michael L. Semon
  2014-06-24 16:31       ` Mark Tinguely
  1 sibling, 2 replies; 11+ messages in thread
From: Dave Chinner @ 2014-06-24  4:04 UTC (permalink / raw)
  To: Michael L. Semon; +Cc: Mark Tinguely, xfs

On Mon, Jun 23, 2014 at 11:34:04PM -0400, Michael L. Semon wrote:
> [ 1068.431391] ------------[ cut here ]------------
> [ 1068.431566] WARNING: CPU: 0 PID: 41 at lib/list_debug.c:59 __list_del_entry+0xce/0x110()
> [ 1068.431596] list_del corruption. prev->next should be db5bf580, but was   (null)

Ok, so the current log item points to a log item that has
null pointers (i.e. not on the list).

> [ 1068.431629] CPU: 0 PID: 41 Comm: kworker/0:1H Not tainted 3.16.0-rc1+ #3
> [ 1068.431656] Hardware name: Dell Computer Corporation       L733r                          /CA810E                         , BIOS A14 09/05/2001
> [ 1068.431697] Workqueue: xfslogd xfs_buf_iodone_work
> [ 1068.431738]  00000000 00000000 de92fc24 c15d4e76 de92fc68 de92fc58 c103ca33 c1737648
> [ 1068.431891]  de92fc84 00000029 c173705a 0000003b c13c3e9e 0000003b c13c3e9e 0000003b
> [ 1068.432115]  db5bf580 00000001 de92fc70 c103cab3 00000009 de92fc68 c1737648 de92fc84
> [ 1068.432267] Call Trace:
> [ 1068.432329]  [<c15d4e76>] dump_stack+0x48/0x60
> [ 1068.432386]  [<c103ca33>] warn_slowpath_common+0x83/0xa0
> [ 1068.432433]  [<c13c3e9e>] ? __list_del_entry+0xce/0x110
> [ 1068.432478]  [<c13c3e9e>] ? __list_del_entry+0xce/0x110
> [ 1068.432524]  [<c103cab3>] warn_slowpath_fmt+0x33/0x40
> [ 1068.432569]  [<c13c3e9e>] __list_del_entry+0xce/0x110
> [ 1068.432615]  [<c13c3eeb>] list_del+0xb/0x20
> [ 1068.432674]  [<c126eb4d>] xfs_ail_delete+0x1d/0x60
....
> [ 1068.433567] ---[ end trace 60289514948e4bd7 ]---
> [ 1068.433603] BUG: unable to handle kernel NULL pointer dereference at 0000000c
> [ 1068.433795] IP: [<c126eac8>] xfs_ail_check+0x58/0xc0

And that's trying to dereference a pointer from an item that is not
on the list....

So there's linked list corruption occurring here.
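The warning text comes from the sanity checks that CONFIG_DEBUG_LIST performs before unlinking an entry. Below is a minimal user-space model of those checks, assuming the lib/list_debug.c logic of this era. The poison values, enum and function names are stand-ins, not the kernel's.

```c
#include <assert.h>
#include <stdio.h>

/* Minimal stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

/* Stand-ins for LIST_POISON1/2, which list_del() stores in a removed entry. */
#define POISON1 ((struct list_head *)0x100)
#define POISON2 ((struct list_head *)0x200)

enum del_result {
	DEL_OK,
	DEL_NEXT_POISONED,	/* entry already deleted (next poisoned) */
	DEL_PREV_POISONED,	/* entry already deleted (prev poisoned) */
	DEL_BAD_PREV_NEXT,	/* predecessor does not point back at entry */
	DEL_BAD_NEXT_PREV,	/* successor does not point back at entry */
};

/*
 * Run the debug-list checks before unlinking @entry. On failure, print a
 * warning and leave the list untouched; on success, unlink the entry and
 * poison its pointers.
 */
static enum del_result list_del_debug(struct list_head *entry)
{
	struct list_head *prev, *next;

	assert(entry != NULL);
	prev = entry->prev;
	next = entry->next;

	if (next == POISON1) {
		fprintf(stderr, "list_del corruption, %p->next is poisoned\n", (void *)entry);
		return DEL_NEXT_POISONED;
	}
	if (prev == POISON2) {
		fprintf(stderr, "list_del corruption, %p->prev is poisoned\n", (void *)entry);
		return DEL_PREV_POISONED;
	}
	if (prev->next != entry) {
		fprintf(stderr, "list_del corruption. prev->next should be %p, but was %p\n",
			(void *)entry, (void *)prev->next);
		return DEL_BAD_PREV_NEXT;
	}
	if (next->prev != entry) {
		fprintf(stderr, "list_del corruption. next->prev should be %p, but was %p\n",
			(void *)entry, (void *)next->prev);
		return DEL_BAD_NEXT_PREV;
	}
	prev->next = next;
	next->prev = prev;
	entry->next = POISON1;
	entry->prev = POISON2;
	return DEL_OK;
}
```

In the model, as in the log, the entry's predecessor no longer points back at it (its next pointer is NULL), so the delete is refused and the list is left as it was. A NULL link is also not the poison value that list_del() leaves behind, so this model would not classify it as a simple double delete.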

> I can reproduce the oops in kernel 3.15.0, perhaps with xfs-oss/for-next 
> merged, but there's no vmlinux to go with the kernel.  Therefore, I'll have 
> to resort to other means (rebuilt kernel with netconsole, re-attaching the 
> serial cable, etc.) to get the full crash log.

How far back can you reproduce it? If it's a recent occurrence, can
you bisect it?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: Null pointer dereference while at ACL limit on v5 XFS
  2014-06-24  4:04       ` Dave Chinner
@ 2014-06-24 13:31         ` Michael L. Semon
  2014-07-01 22:27         ` Michael L. Semon
  1 sibling, 0 replies; 11+ messages in thread
From: Michael L. Semon @ 2014-06-24 13:31 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Mark Tinguely, xfs

On Tue, 24 Jun 2014, Dave Chinner wrote:

> On Mon, Jun 23, 2014 at 11:34:04PM -0400, Michael L. Semon wrote:
> ...
> How far back can you reproduce it? If it's a recent occurrence, can
> you bisect it?
> 
> Cheers,
> 
> Dave.

I'll attempt to bisect this issue.  3.15.0 was tried simply because it didn't 
have any noticeable 32-bit oddities.  In fact, I liked 3.15 enough to base 
glibc on its headers.  Here's hoping that I can reproduce the issue on a 
3.10-based utility partition.  [Somewhere out there is a Murphy's Law of 
glibc Upgrades that describes this situation perfectly.]

Should it shed any extra light, here's the ftrace-dump-enabled dmesg from the 
most recent crash (mkdir/rm test), after my closing.

Thanks!  A Pentium III can bisect kernels only so quickly, so this will take 
some time.

Michael

[ 1739.697955] XFS (hdc4): Mounting V5 Filesystem
[ 1739.866245] XFS (hdc4): Ending clean mount
[ 1752.763551] mkdir (406) used greatest stack depth: 4876 bytes left
[ 1762.611092] ------------[ cut here ]------------
[ 1762.611259] WARNING: CPU: 0 PID: 41 at lib/list_debug.c:59 __list_del_entry+0xce/0x110()
[ 1762.611288] list_del corruption. prev->next should be ddf46000, but was   (null)
[ 1762.611320] CPU: 0 PID: 41 Comm: kworker/0:1H Not tainted 3.16.0-rc1+ #3
[ 1762.611348] Hardware name: Dell Computer Corporation       L733r                          /CA810E                         , BIOS A14 09/05/2001
[ 1762.611389] Workqueue: xfslogd xfs_buf_iodone_work
[ 1762.611432]  00000000 00000000 de92fc24 c15d4e76 de92fc68 de92fc58 c103ca33 c1737648
[ 1762.611584]  de92fc84 00000029 c173705a 0000003b c13c3e9e 0000003b c13c3e9e 0000003b
[ 1762.611734]  ddf46000 00000001 de92fc70 c103cab3 00000009 de92fc68 c1737648 de92fc84
[ 1762.611886] Call Trace:
[ 1762.611947]  [<c15d4e76>] dump_stack+0x48/0x60
[ 1762.611999]  [<c103ca33>] warn_slowpath_common+0x83/0xa0
[ 1762.612121]  [<c13c3e9e>] ? __list_del_entry+0xce/0x110
[ 1762.612166]  [<c13c3e9e>] ? __list_del_entry+0xce/0x110
[ 1762.612212]  [<c103cab3>] warn_slowpath_fmt+0x33/0x40
[ 1762.612257]  [<c13c3e9e>] __list_del_entry+0xce/0x110
[ 1762.612303]  [<c13c3eeb>] list_del+0xb/0x20
[ 1762.612361]  [<c126eb4d>] xfs_ail_delete+0x1d/0x60
[ 1762.612407]  [<c126f945>] xfs_trans_ail_update_bulk+0x1a5/0x410
[ 1762.612468]  [<c12070ab>] xfs_trans_committed_bulk+0x2eb/0x320
[ 1762.612515]  [<c126957a>] xlog_cil_committed+0x3a/0x150
[ 1762.612561]  [<c12655ba>] xlog_state_do_callback+0x18a/0x390
[ 1762.612606]  [<c1265883>] xlog_state_done_syncing+0xc3/0xe0
[ 1762.612651]  [<c126590e>] xlog_iodone+0x6e/0x100
[ 1762.612697]  [<c11e821b>] xfs_buf_iodone_work+0x5b/0xe0
[ 1762.612753]  [<c1058557>] process_one_work+0x1b7/0x5d0
[ 1762.612798]  [<c10584da>] ? process_one_work+0x13a/0x5d0
[ 1762.612843]  [<c1058a1b>] ? worker_thread+0xab/0x4b0
[ 1762.612888]  [<c10589a9>] worker_thread+0x39/0x4b0
[ 1762.612940]  [<c108909b>] ? trace_hardirqs_on+0xb/0x10
[ 1762.612986]  [<c1058970>] ? process_one_work+0x5d0/0x5d0
[ 1762.613086]  [<c105fb58>] kthread+0xa8/0xc0
[ 1762.613132]  [<c108909b>] ? trace_hardirqs_on+0xb/0x10
[ 1762.613184]  [<c15dc781>] ret_from_kernel_thread+0x21/0x30
[ 1762.613229]  [<c105fab0>] ? insert_kthread_work+0x80/0x80
[ 1762.613256] ---[ end trace cf8e7727f1e1a1b6 ]---
[ 1762.613295] BUG: unable to handle kernel NULL pointer dereference at 0000000c
[ 1762.613486] IP: [<c126eac8>] xfs_ail_check+0x58/0xc0
[ 1762.613615] *pde = 00000000 
[ 1762.613728] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
[ 1762.613926] Dumping ftrace buffer:
[ 1762.614009] ---------------------------------
[ 1762.614028]    mount-319     0...2 1715290597us : xfs_ail_insert: dev 22:4 lip 0xd582c000 old lsn 0/0 new lsn 1/51 type XFS_LI_EFI flags IN_AIL
[ 1762.614028]    mount-319     0...2 1715290617us : xfs_ail_delete: dev 22:4 lip 0xd582c000 old lsn 1/51 new lsn 1/51 type XFS_LI_EFI flags IN_AIL
[ 1762.614028]    mount-319     0...2 1715290853us : xfs_ail_insert: dev 22:4 lip 0xd582c000 old lsn 0/0 new lsn 1/51 type XFS_LI_EFI flags IN_AIL
[ 1762.614028]    mount-319     0...2 1715290860us : xfs_ail_delete: dev 22:4 lip 0xd582c000 old lsn 1/51 new lsn 1/51 type XFS_LI_EFI flags IN_AIL
[ 1762.614028]    mount-319     0...2 1715290872us : xfs_ail_insert: dev 22:4 lip 0xd582c000 old lsn 0/0 new lsn 1/51 type XFS_LI_EFI flags IN_AIL
[ 1762.614028]    mount-319     0...2 1715290877us : xfs_ail_delete: dev 22:4 lip 0xd582c000 old lsn 1/51 new lsn 1/51 type XFS_LI_EFI flags IN_AIL
[ 1762.614028]    mount-319     0...2 1715290890us : xfs_ail_insert: dev 22:4 lip 0xd582c000 old lsn 0/0 new lsn 1/51 type XFS_LI_EFI flags IN_AIL
[ 1762.614028]    mount-319     0...2 1715290896us : xfs_ail_delete: dev 22:4 lip 0xd582c000 old lsn 1/51 new lsn 1/51 type XFS_LI_EFI flags IN_AIL
[ 1762.614028]    mount-319     0...2 1715293594us : xfs_ail_insert: dev 22:4 lip 0xd582c000 old lsn 0/0 new lsn 1/51 type XFS_LI_EFI flags IN_AIL
[ 1762.614028]    mount-319     0...2 1715293602us : xfs_ail_delete: dev 22:4 lip 0xd582c000 old lsn 1/51 new lsn 1/51 type XFS_LI_EFI flags IN_AIL
[ 1762.614028]    mount-319     0...2 1715293617us : xfs_ail_insert: dev 22:4 lip 0xd582c000 old lsn 0/0 new lsn 1/51 type XFS_LI_EFI flags IN_AIL
[ 1762.614028]    mount-319     0...2 1715293623us : xfs_ail_delete: dev 22:4 lip 0xd582c000 old lsn 1/51 new lsn 1/51 type XFS_LI_EFI flags IN_AIL
[ 1762.614028]    mount-319     0...2 1715340811us : xfs_ail_insert: dev 22:4 lip 0xd582c000 old lsn 0/0 new lsn 1/97 type XFS_LI_EFI flags IN_AIL
[ 1762.614028]    mount-319     0...2 1715340826us : xfs_ail_delete: dev 22:4 lip 0xd582c000 old lsn 1/97 new lsn 1/97 type XFS_LI_EFI flags IN_AIL
[ 1762.614028]    mount-319     0...2 1715340870us : xfs_ail_insert: dev 22:4 lip 0xd582c000 old lsn 0/0 new lsn 1/97 type XFS_LI_EFI flags IN_AIL
[ 1762.614028]    mount-319     0...2 1715340876us : xfs_ail_delete: dev 22:4 lip 0xd582c000 old lsn 1/97 new lsn 1/97 type XFS_LI_EFI flags IN_AIL
[ 1762.614028]    mount-319     0...2 1715340889us : xfs_ail_insert: dev 22:4 lip 0xd582c000 old lsn 0/0 new lsn 1/97 type XFS_LI_EFI flags IN_AIL
[ 1762.614028]    mount-319     0...2 1715340894us : xfs_ail_delete: dev 22:4 lip 0xd582c000 old lsn 1/97 new lsn 1/97 type XFS_LI_EFI flags IN_AIL
[ 1762.614028]    mount-319     0...2 1715340923us : xfs_ail_insert: dev 22:4 lip 0xd582c000 old lsn 0/0 new lsn 1/97 type XFS_LI_EFI flags IN_AIL
[ 1762.614028]    mount-319     0...2 1715340929us : xfs_ail_delete: dev 22:4 lip 0xd582c000 old lsn 1/97 new lsn 1/97 type XFS_LI_EFI flags IN_AIL
[ 1762.614028]    mount-319     0...2 1715340941us : xfs_ail_insert: dev 22:4 lip 0xd582c000 old lsn 0/0 new lsn 1/97 type XFS_LI_EFI flags IN_AIL
[ 1762.614028]    mount-319     0...2 1715340947us : xfs_ail_delete: dev 22:4 lip 0xd582c000 old lsn 1/97 new lsn 1/97 type XFS_LI_EFI flags IN_AIL
[ 1762.614028] kworker/-41      0...2 1725605607us : xfs_ail_insert: dev 22:4 lip 0xddf46160 old lsn 0/0 new lsn 1/108 type XFS_LI_BUF flags IN_AIL
[ 1762.614028] xfsaild/-323     0...2 1725606791us : xfs_ail_push: dev 22:4 lip 0xddf46160 lsn 1/108 type XFS_LI_BUF flags IN_AIL
[ 1762.614028] kworker/-41      0...2 1725607137us : xfs_ail_delete: dev 22:4 lip 0xddf46160 old lsn 1/108 new lsn 1/108 type XFS_LI_BUF flags IN_AIL
[ 1762.614028] kworker/-41      0...2 1755030171us : xfs_ail_insert: dev 22:4 lip 0xddf46000 old lsn 0/0 new lsn 1/2 type XFS_LI_BUF flags IN_AIL
[ 1762.614028] kworker/-41      0...2 1755030182us : xfs_ail_insert: dev 22:4 lip 0xddf46210 old lsn 0/0 new lsn 1/2 type XFS_LI_BUF flags IN_AIL
[ 1762.614028] kworker/-41      0...2 1755030184us : xfs_ail_insert: dev 22:4 lip 0xddf462c0 old lsn 0/0 new lsn 1/2 type XFS_LI_BUF flags IN_AIL
[ 1762.614028] kworker/-41      0...2 1755030185us : xfs_ail_insert: dev 22:4 lip 0xddf46370 old lsn 0/0 new lsn 1/2 type XFS_LI_BUF flags IN_AIL
[ 1762.614028] kworker/-41      0...2 1755030187us : xfs_ail_insert: dev 22:4 lip 0xddf46420 old lsn 0/0 new lsn 1/2 type XFS_LI_BUF flags IN_AIL
[ 1762.614028] kworker/-41      0...2 1755030188us : xfs_ail_insert: dev 22:4 lip 0xddf464d0 old lsn 0/0 new lsn 1/2 type XFS_LI_BUF flags IN_AIL
[ 1762.614028] kworker/-41      0...2 1755030189us : xfs_ail_insert: dev 22:4 lip 0xddc5b068 old lsn 0/0 new lsn 1/2 type XFS_LI_INODE flags IN_AIL
[ 1762.614028] kworker/-41      0...2 1755030190us : xfs_ail_insert: dev 22:4 lip 0xddf46840 old lsn 0/0 new lsn 1/2 type XFS_LI_BUF flags IN_AIL
[ 1762.614028] kworker/-41      0...2 1755030191us : xfs_ail_insert: dev 22:4 lip 0xddf468f0 old lsn 0/0 new lsn 1/2 type XFS_LI_BUF flags IN_AIL
[ 1762.614028] kworker/-41      0...2 1755030193us : xfs_ail_insert: dev 22:4 lip 0xddf46580 old lsn 0/0 new lsn 1/2 type XFS_LI_BUF flags IN_AIL
[ 1762.614028] kworker/-41      0...2 1755030194us : xfs_ail_insert: dev 22:4 lip 0xddf469a0 old lsn 0/0 new lsn 1/2 type XFS_LI_BUF flags IN_AIL
[ 1762.614028] kworker/-41      0...2 1755030195us : xfs_ail_insert: dev 22:4 lip 0xddf46630 old lsn 0/0 new lsn 1/2 type XFS_LI_BUF flags IN_AIL
[ 1762.614028] kworker/-41      0...2 1755030196us : xfs_ail_insert: dev 22:4 lip 0xddf466e0 old lsn 0/0 new lsn 1/2 type XFS_LI_BUF flags IN_AIL
[ 1762.614028] kworker/-41      0...2 1755030198us : xfs_ail_insert: dev 22:4 lip 0xddf46790 old lsn 0/0 new lsn 1/2 type XFS_LI_BUF flags IN_AIL
[ 1762.614028] kworker/-41      0...2 1755030199us : xfs_ail_insert: dev 22:4 lip 0xddf46a50 old lsn 0/0 new lsn 1/2 type XFS_LI_BUF flags IN_AIL
[ 1762.614028] kworker/-41      0...2 1755030200us : xfs_ail_insert: dev 22:4 lip 0xddc5b138 old lsn 0/0 new lsn 1/2 type XFS_LI_INODE flags IN_AIL
[ 1762.614028] kworker/-41      0...2 1755030201us : xfs_ail_insert: dev 22:4 lip 0xddf46dc0 old lsn 0/0 new lsn 1/2 type XFS_LI_BUF flags IN_AIL
[ 1762.614028] kworker/-41      0...2 1755030202us : xfs_ail_insert: dev 22:4 lip 0xddf46e70 old lsn 0/0 new lsn 1/2 type XFS_LI_BUF flags IN_AIL
[ 1762.614028] kworker/-41      0...2 1755030204us : xfs_ail_insert: dev 22:4 lip 0xddf46b00 old lsn 0/0 new lsn 1/2 type XFS_LI_BUF flags IN_AIL
[ 1762.614028] kworker/-41      0...2 1755030205us : xfs_ail_insert: dev 22:4 lip 0xddf46f20 old lsn 0/0 new lsn 1/2 type XFS_LI_BUF flags IN_AIL
[ 1762.614028] kworker/-41      0...2 1755030206us : xfs_ail_insert: dev 22:4 lip 0xddc5b0d0 old lsn 0/0 new lsn 1/2 type XFS_LI_INODE flags IN_AIL
[ 1762.614028] kworker/-41      0...2 1755030207us : xfs_ail_insert: dev 22:4 lip 0xddf46bb0 old lsn 0/0 new lsn 1/2 type XFS_LI_BUF flags IN_AIL
[ 1762.614028] kworker/-41      0...2 1755030209us : xfs_ail_insert: dev 22:4 lip 0xddf46c60 old lsn 0/0 new lsn 1/2 type XFS_LI_BUF flags IN_AIL
[ 1762.614028] kworker/-41      0...2 1755030210us : xfs_ail_insert: dev 22:4 lip 0xddf46d10 old lsn 0/0 new lsn 1/2 type XFS_LI_BUF flags IN_AIL
[ 1762.614028] kworker/-41      0...2 1755030211us : xfs_ail_insert: dev 22:4 lip 0xd7d3c000 old lsn 0/0 new lsn 1/2 type XFS_LI_BUF flags IN_AIL
[ 1762.614028] kworker/-41      0...2 1755030213us : xfs_ail_insert: dev 22:4 lip 0xddc5b1a0 old lsn 0/0 new lsn 1/2 type XFS_LI_INODE flags IN_AIL
[ 1762.614028] kworker/-41      0...2 1762767248us : xfs_ail_delete: dev 22:4 lip 0xddf46840 old lsn 1/2 new lsn 1/2 type XFS_LI_BUF flags IN_AIL
[ 1762.614028] kworker/-41      0...2 1762767266us : xfs_ail_delete: dev 22:4 lip 0xddc5b138 old lsn 1/2 new lsn 1/2 type XFS_LI_INODE flags IN_AIL
[ 1762.614028] kworker/-41      0...2 1762767386us : xfs_ail_delete: dev 22:4 lip 0xddf468f0 old lsn 1/2 new lsn 1/2 type XFS_LI_BUF flags IN_AIL
[ 1762.614028] kworker/-41      0...2 1762767444us : xfs_ail_delete: dev 22:4 lip 0xddf46dc0 old lsn 1/2 new lsn 1/2 type XFS_LI_BUF flags IN_AIL
[ 1762.614028] kworker/-41      0...2 1762767450us : xfs_ail_delete: dev 22:4 lip 0xddc5b1a0 old lsn 1/2 new lsn 1/2 type XFS_LI_INODE flags IN_AIL
[ 1762.614028] kworker/-41      0...2 1762767490us : xfs_ail_delete: dev 22:4 lip 0xddf46e70 old lsn 1/2 new lsn 1/2 type XFS_LI_BUF flags IN_AIL
[ 1762.614028] kworker/-41      0...2 1762767524us : xfs_ail_insert: dev 22:4 lip 0xd7d3c210 old lsn 0/0 new lsn 1/36 type XFS_LI_BUF flags IN_AIL
[ 1762.614028] kworker/-41      0...2 1762767525us : xfs_ail_insert: dev 22:4 lip 0xd7d3c2c0 old lsn 0/0 new lsn 1/36 type XFS_LI_BUF flags IN_AIL
[ 1762.614028] kworker/-41      0...2 1762767529us : xfs_ail_move: dev 22:4 lip 0xddf46a50 old lsn 1/2 new lsn 1/36 type XFS_LI_BUF flags IN_AIL
[ 1762.614028] kworker/-41      0...2 1762767531us : xfs_ail_insert: dev 22:4 lip 0xd582c000 old lsn 0/0 new lsn 1/36 type XFS_LI_EFI flags IN_AIL
[ 1762.614028] kworker/-41      0...2 1762767532us : xfs_ail_move: dev 22:4 lip 0xddf46580 old lsn 1/2 new lsn 1/36 type XFS_LI_BUF flags IN_AIL
[ 1762.614028] kworker/-41      0...2 1762767534us : xfs_ail_move: dev 22:4 lip 0xddf469a0 old lsn 1/2 new lsn 1/36 type XFS_LI_BUF flags IN_AIL
[ 1762.614028] kworker/-41      0...2 1762767535us : xfs_ail_insert: dev 22:4 lip 0xd582c240 old lsn 0/0 new lsn 1/36 type XFS_LI_EFI flags IN_AIL
[ 1762.614028] kworker/-41      0...2 1762767536us : xfs_ail_move: dev 22:4 lip 0xddf46630 old lsn 1/2 new lsn 1/36 type XFS_LI_BUF flags IN_AIL
[ 1762.614028] kworker/-41      0...2 1762767538us : xfs_ail_move: dev 22:4 lip 0xddf46790 old lsn 1/2 new lsn 1/36 type XFS_LI_BUF flags IN_AIL
[ 1762.614028] kworker/-41      0...2 1762767539us : xfs_ail_move: dev 22:4 lip 0xddf466e0 old lsn 1/2 new lsn 1/36 type XFS_LI_BUF flags IN_AIL
[ 1762.614028] kworker/-41      0...2 1762767541us : xfs_ail_insert: dev 22:4 lip 0xddf46160 old lsn 0/0 new lsn 1/36 type XFS_LI_BUF flags IN_AIL
[ 1762.614028] kworker/-41      0...2 1762767542us : xfs_ail_insert: dev 22:4 lip 0xd7d3c160 old lsn 0/0 new lsn 1/36 type XFS_LI_BUF flags IN_AIL
[ 1762.614028] kworker/-41      0...2 1762767543us : xfs_ail_move: dev 22:4 lip 0xddf464d0 old lsn 1/2 new lsn 1/36 type XFS_LI_BUF flags IN_AIL
[ 1762.614028] kworker/-41      0...2 1762767611us : xfs_ail_insert: dev 22:4 lip 0xd582c480 old lsn 0/0 new lsn 1/36 type XFS_LI_EFI flags IN_AIL
[ 1762.614028] kworker/-41      0...2 1762767613us : xfs_ail_move: dev 22:4 lip 0xddf462c0 old lsn 1/2 new lsn 1/36 type XFS_LI_BUF flags IN_AIL
[ 1762.614028] kworker/-41      0...2 1762767614us : xfs_ail_move: dev 22:4 lip 0xddf46420 old lsn 1/2 new lsn 1/36 type XFS_LI_BUF flags IN_AIL
[ 1762.614028] kworker/-41      0...2 1762767616us : xfs_ail_move: dev 22:4 lip 0xddf46370 old lsn 1/2 new lsn 1/36 type XFS_LI_BUF flags IN_AIL
[ 1762.614028] kworker/-41      0...2 1762767617us : xfs_ail_move: dev 22:4 lip 0xddc5b068 old lsn 1/2 new lsn 1/36 type XFS_LI_INODE flags IN_AIL
[ 1762.614028] kworker/-41      0...2 1762767618us : xfs_ail_move: dev 22:4 lip 0xddf46000 old lsn 1/2 new lsn 1/36 type XFS_LI_BUF flags IN_AIL
[ 1762.614028] kworker/-41      0...2 1762769825us : xfs_ail_move: dev 22:4 lip 0xddf46210 old lsn 1/2 new lsn 1/36 type XFS_LI_BUF flags IN_AIL
[ 1762.614028] ---------------------------------
[ 1762.614028] CPU: 0 PID: 41 Comm: kworker/0:1H Tainted: G        W     3.16.0-rc1+ #3
[ 1762.614028] Hardware name: Dell Computer Corporation       L733r                          /CA810E                         , BIOS A14 09/05/2001
[ 1762.614028] Workqueue: xfslogd xfs_buf_iodone_work
[ 1762.614028] task: de8faac0 ti: de92e000 task.ti: de92e000
[ 1762.614028] EIP: 0060:[<c126eac8>] EFLAGS: 00010282 CPU: 0
[ 1762.614028] EIP is at xfs_ail_check+0x58/0xc0
[ 1762.614028] EAX: 00000000 EBX: ddf46210 ECX: 00000002 EDX: 00000002
[ 1762.614028] ESI: 00000001 EDI: 00000001 EBP: de92fc9c ESP: de92fc90
[ 1762.614028]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[ 1762.614028] CR0: 8005003b CR2: 0000000c CR3: 1df09000 CR4: 000007d0
[ 1762.614028] Stack:
[ 1762.614028]  ddc9af00 ddf46210 00000001 de92fcac c126eb43 ddf46210 ddf91310 de92fd04
[ 1762.614028]  c126f945 00000002 00000001 00000024 00000001 ddc9af08 de92fd38 d7d3c000
[ 1762.614028]  ddc9af00 00000000 00000002 de92fd44 ddf46210 00000001 00000024 00000013
[ 1762.614028] Call Trace:
[ 1762.614028]  [<c126eb43>] xfs_ail_delete+0x13/0x60
[ 1762.614028]  [<c126f945>] xfs_trans_ail_update_bulk+0x1a5/0x410
[ 1762.614028]  [<c12070ab>] xfs_trans_committed_bulk+0x2eb/0x320
[ 1762.614028]  [<c126957a>] xlog_cil_committed+0x3a/0x150
[ 1762.614028]  [<c12655ba>] xlog_state_do_callback+0x18a/0x390
[ 1762.614028]  [<c1265883>] xlog_state_done_syncing+0xc3/0xe0
[ 1762.614028]  [<c126590e>] xlog_iodone+0x6e/0x100
[ 1762.614028]  [<c11e821b>] xfs_buf_iodone_work+0x5b/0xe0
[ 1762.614028]  [<c1058557>] process_one_work+0x1b7/0x5d0
[ 1762.614028]  [<c10584da>] ? process_one_work+0x13a/0x5d0
[ 1762.614028]  [<c1058a1b>] ? worker_thread+0xab/0x4b0
[ 1762.614028]  [<c10589a9>] worker_thread+0x39/0x4b0
[ 1762.614028]  [<c108909b>] ? trace_hardirqs_on+0xb/0x10
[ 1762.614028]  [<c1058970>] ? process_one_work+0x5d0/0x5d0
[ 1762.614028]  [<c105fb58>] kthread+0xa8/0xc0
[ 1762.614028]  [<c108909b>] ? trace_hardirqs_on+0xb/0x10
[ 1762.614028]  [<c15dc781>] ret_from_kernel_thread+0x21/0x30
[ 1762.614028]  [<c105fab0>] ? insert_kthread_work+0x80/0x80
[ 1762.614028] Code: c1 b8 50 be 72 c1 e8 38 f7 f8 ff 8b 43 04 39 c6 74 10 8b 7b 0c 39 78 0c 8b 53 08 8b 48 08 74 43 73 45 8b 03 39 c6 74 24 8b 73 0c <39> 70 0c 8b 53 08 8b 48 08 74 4d 73 14 b9 38 00 00 00 ba 83 a3
[ 1762.614028] EIP: [<c126eac8>] xfs_ail_check+0x58/0xc0 SS:ESP 0068:de92fc90
[ 1762.614028] CR2: 000000000000000c

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Null pointer dereference while at ACL limit on v5 XFS
  2014-06-24  3:34     ` Michael L. Semon
  2014-06-24  4:04       ` Dave Chinner
@ 2014-06-24 16:31       ` Mark Tinguely
  2014-06-24 18:25         ` Mark Tinguely
  1 sibling, 1 reply; 11+ messages in thread
From: Mark Tinguely @ 2014-06-24 16:31 UTC (permalink / raw)
  To: Michael L. Semon; +Cc: xfs

On 06/23/14 22:34, Michael L. Semon wrote:
> On 06/23/2014 06:13 PM, Mark Tinguely wrote:
>> On 06/23/14 17:08, Mark Tinguely wrote:
>>> On 06/23/14 16:48, Michael L. Semon wrote:
>>>> At the ACL limit of v5-superblock XFS--with a directory filled with
>>>> both default and access ACL entries--I'm getting a null pointer
>>>> dereference on x86 after creating the directory successfully.
>>>>
>>>> Disclaimer: There's some current issues on 32-bit x86 that, for
>>>> instance, can make badblocks see phantom bad blocks on a read test.
>>>> My apologies in advance if this turns out to be a false alarm bug
>>>> report.
>>>>
>>>> My first encounter with this issue involved fsstress. Here's part of
>>>> a `crash` session from the fsstress run.
>>>>
>>>> root@oldsvrhw:/mnt/crashdump/xfs-fsstress-max-acl-2# crash vmlinux System.map vmcore
>>>> crash 7.0.4
>> ...
>>>> Thanks!
>>>>
>>>> Michael
>>>>
>>>
>>> Michael, do you have the vmcore dump for this or was this just from the
>>> messages.
>>>
>>> Thanks.
>>>
>>> --Mark.
>>
>> ummm, duh me. you were running crash ...
>>
>> Can I look at the core?
>>
>> --Mark.
>
> Sure!  I've uploaded two sets of core dumps (vmcore, vmlinux, System.map,
> config, sample crash session) and put them here for a short time:
>

Both are buffer log items. Like your trace shows, each was being updated 
in the AIL, and it really is in the AIL, but in both crashes the log 
item's AIL next link has been NULLed:

xfs-fsstress-max-acl-2:
crash> xfs_buf_log_item dde37370
struct xfs_buf_log_item {
   bli_item = {
     li_ail = {
       next = 0x0,
       prev = 0xdc01d6e8

xfs-fsstress-max-acl-3:
crash> xfs_buf_log_item db5bf0b0
struct xfs_buf_log_item {
   bli_item = {
     li_ail = {
       next = 0x0,
       prev = 0xdb5bf4d0
     },

not good.

--Mark.

* Re: Null pointer dereference while at ACL limit on v5 XFS
  2014-06-24 16:31       ` Mark Tinguely
@ 2014-06-24 18:25         ` Mark Tinguely
  0 siblings, 0 replies; 11+ messages in thread
From: Mark Tinguely @ 2014-06-24 18:25 UTC (permalink / raw)
  To: Michael L. Semon; +Cc: xfs

On 06/24/14 11:31, Mark Tinguely wrote:
> On 06/23/14 22:34, Michael L. Semon wrote:
>> On 06/23/2014 06:13 PM, Mark Tinguely wrote:
>>> On 06/23/14 17:08, Mark Tinguely wrote:
>>>> On 06/23/14 16:48, Michael L. Semon wrote:
>>>>> At the ACL limit of v5-superblock XFS--with a directory filled with
>>>>> both default and access ACL entries--I'm getting a null pointer
>>>>> dereference on x86 after creating the directory successfully.
>>>>>
>>>>> Disclaimer: There's some current issues on 32-bit x86 that, for
>>>>> instance, can make badblocks see phantom bad blocks on a read test.
>>>>> My apologies in advance if this turns out to be a false alarm bug
>>>>> report.
>>>>>
>>>>> My first encounter with this issue involved fsstress. Here's part of
>>>>> a `crash` session from the fsstress run.
>>>>>
>>>>> root@oldsvrhw:/mnt/crashdump/xfs-fsstress-max-acl-2# crash vmlinux System.map vmcore
>>>>> crash 7.0.4
>>> ...
>>>>> Thanks!
>>>>>
>>>>> Michael
>>>>>
>>>>
>>>> Michael, do you have the vmcore dump for this or was this just from the
>>>> messages.
>>>>
>>>> Thanks.
>>>>
>>>> --Mark.
>>>
>>> ummm, duh me. you were running crash ...
>>>
>>> Can I look at the core?
>>>
>>> --Mark.
>>
>> Sure! I've uploaded two sets of core dumps (vmcore, vmlinux, System.map,
>> config, sample crash session) and put them here for a short time:
>>
>
> Both are buffer - like your trace shows that is was updating on the AIL
> and it really is but in both crashes the log item ail next link has been
> NULLed:
>
> xfs-fsstress-max-acl-2:
> crash> xfs_buf_log_item dde37370
> struct xfs_buf_log_item {
>    bli_item = {
>      li_ail = {
>        next = 0x0,
>        prev = 0xdc01d6e8
>
> xfs-fsstress-max-acl-3:
> crash> xfs_buf_log_item db5bf0b0
> struct xfs_buf_log_item {
>    bli_item = {
>      li_ail = {
>        next = 0x0,
>        prev = 0xdb5bf4d0
>      },
>
> not good.
>
> --Mark.

PS. I don't know if this will help, but I followed the xfs_log_items 
back to the xfs_ail, and the xfs_ail itself is okay. Walking backwards 
from the AIL's prev pointer, however, leads into a corrupted chain:

crash> xfs_ail ddc81d80
struct xfs_ail {
   xa_mount = 0xddd6b800,
   xa_task = 0xddec5580,
   xa_ail = {
     next = 0xdb04b210,
     prev = 0xddca60d0
   },

..
crash> xfs_log_item ddca60d0
struct xfs_log_item {
   li_ail = {
     next = 0xddc81d88,   <- correct, the xfs_ail
     prev = 0xdb5bf580
   },
...
crash> xfs_log_item db5bf580
struct xfs_log_item {
   li_ail = {
     next = 0xdbab6000,   <- wrong, points into a small xfs_log_item loop.
     prev = 0xde92fcf0
   },

...
small loop:
crash> xfs_log_item de92fcf0
struct xfs_log_item {
   li_ail = {
     next = 0xdb5bf580,
     prev = 0xdb04b370
   },

crash> xfs_log_item db04b370
struct xfs_log_item {
   li_ail = {
     next = 0xde92fcf0,
     prev = 0xdb04b420
   },

crash> xfs_log_item db04b420
struct xfs_log_item {
   li_ail = {
     next = 0xdb04b370,
     prev = 0xdb5bf630
   },

crash> xfs_log_item db5bf630
struct xfs_log_item {
   li_ail = {
     next = 0xdb04b420,
     prev = 0xdbab6000  <- !!
   },

crash> xfs_log_item dbab6000
struct xfs_log_item {
   li_ail = {
     next = 0xdb5bf630,
     prev = 0xdb5bf580   <- end of small loop.
   },

Something is going wrong in an AIL insert or delete.

* Re: Null pointer dereference while at ACL limit on v5 XFS
  2014-06-24  4:04       ` Dave Chinner
  2014-06-24 13:31         ` Michael L. Semon
@ 2014-07-01 22:27         ` Michael L. Semon
  2014-07-03 11:56           ` Jeff Liu
  1 sibling, 1 reply; 11+ messages in thread
From: Michael L. Semon @ 2014-07-01 22:27 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Mark Tinguely, xfs

On 06/24/2014 12:04 AM, Dave Chinner wrote:
> On Mon, Jun 23, 2014 at 11:34:04PM -0400, Michael L. Semon wrote:
>> [ 1068.431391] ------------[ cut here ]------------
>> [ 1068.431566] WARNING: CPU: 0 PID: 41 at lib/list_debug.c:59 __list_del_entry+0xce/0x110()
>> [ 1068.431596] list_del corruption. prev->next should be db5bf580, but was   (null)
> 
> Ok, so the current log item points to a log item that has
> null pointers (i.e. not on the list).
> 
>> [ 1068.431629] CPU: 0 PID: 41 Comm: kworker/0:1H Not tainted 3.16.0-rc1+ #3
>> [ 1068.431656] Hardware name: Dell Computer Corporation       L733r                          /CA810E                         , BIOS A14 09/05/2001
>> [ 1068.431697] Workqueue: xfslogd xfs_buf_iodone_work
>> [ 1068.431738]  00000000 00000000 de92fc24 c15d4e76 de92fc68 de92fc58 c103ca33 c1737648
>> [ 1068.431891]  de92fc84 00000029 c173705a 0000003b c13c3e9e 0000003b c13c3e9e 0000003b
>> [ 1068.432115]  db5bf580 00000001 de92fc70 c103cab3 00000009 de92fc68 c1737648 de92fc84
>> [ 1068.432267] Call Trace:
>> [ 1068.432329]  [<c15d4e76>] dump_stack+0x48/0x60
>> [ 1068.432386]  [<c103ca33>] warn_slowpath_common+0x83/0xa0
>> [ 1068.432433]  [<c13c3e9e>] ? __list_del_entry+0xce/0x110
>> [ 1068.432478]  [<c13c3e9e>] ? __list_del_entry+0xce/0x110
>> [ 1068.432524]  [<c103cab3>] warn_slowpath_fmt+0x33/0x40
>> [ 1068.432569]  [<c13c3e9e>] __list_del_entry+0xce/0x110
>> [ 1068.432615]  [<c13c3eeb>] list_del+0xb/0x20
>> [ 1068.432674]  [<c126eb4d>] xfs_ail_delete+0x1d/0x60
> ....
>> [ 1068.433567] ---[ end trace 60289514948e4bd7 ]---
>> [ 1068.433603] BUG: unable to handle kernel NULL pointer dereference at 0000000c
>> [ 1068.433795] IP: [<c126eac8>] xfs_ail_check+0x58/0xc0
> 
> And that's trying to dereference a pointer from an item that is not
> on the list....
> 
> So there's linked list corruption occurring here.
> 
>> I can reproduce the oops in kernel 3.15.0, perhaps with xfs-oss/for-next 
>> merged, but there's no vmlinux to go with the kernel.  Therefore, I'll have 
>> to resort to other means (rebuilt kernel with netconsole, re-attaching the 
>> serial cable, etc.) to get the full crash log.
> 
> How far back can you reproduce it? If it's a recent occurrence, can
> you bisect it?
> 
> Cheers,
> 
> Dave.

I've had terrible luck with bisects this week due to PEBKAC errors.  With 3 
commits left to try--one slow, full build (thanks, ARM!) and hopefully 2 
minor builds--this commit is staring me in the face:

commit bba719b5004234e55737e7074b81b337210c511d
Author: Jie Liu <jeff.liu@oracle.com>
Date:   Wed Jan 1 19:28:03 2014 +0800

    xfs: fix off-by-one error in xfs_attr3_rmt_verify

In particular, one kernel had this as the most recent commit and showed 
the current problem behavior.

That is about as far back as I can go before attr3_rmt issues corrupt 
filesystems and cause a "Structure needs cleaning" message during the setfacl 
part of the test.  Certainly, Jeff has improved matters with this patch.

On the normal kernel git, this may correspond to kernel v3.13.0-rc7 or -rc8, 
certainly no earlier than -rc2.  git was bouncing the version numbers around 
quite a bit.

Before Jeff worked his wonders here, efforts to getfacl a directory with max 
ACLs (on a remounted, corrupt filesystem) ended like this...

[   84.819306] XFS: Assertion failed: args->op_flags & XFS_DA_OP_OKNOENT, file: fs/xfs/xfs_da_btree.c, line: 1894
[   84.819500] ------------[ cut here ]------------
[   84.819573] kernel BUG at fs/xfs/xfs_message.c:108!
[   84.819646] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
[   84.819826] CPU: 0 PID: 204 Comm: getfacl Not tainted 3.12.0+ #2
[   84.819901] Hardware name: Dell Computer Corporation       L733r                          /CA810E                         , BIOS A14 09/05/2001
[   84.820015] task: ddc7a960 ti: ddc52000 task.ti: ddc52000
[   84.820025] EIP: 0060:[<c125822c>] EFLAGS: 00010296 CPU: 0
[   84.820025] EIP is at assfail+0x2c/0x30
[   84.820025] EAX: 00000062 EBX: 00000000 ECX: 00000007 EDX: 00000000
[   84.820025] ESI: ddc53d4c EDI: ffffffff EBP: ddc53c88 ESP: ddc53c74
[   84.820025]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[   84.820025] CR0: 8005003b CR2: b7632fd0 CR3: 1dc75000 CR4: 000007d0
[   84.820025] Stack:
[   84.820025]  00000000 c160833c c160c854 c15fa532 00000766 ddc53cd0 c1290854 00000001
[   84.820025]  00000002 00000008 275b19c4 ddc53d4c 00000000 ddc74010 00000001 0fe80018
[   84.820025]  00580000 00000f90 00000000 00000000 ddc74010 ddc74014 ddc53d4c ddc53d28
[   84.820025] Call Trace:
[   84.820025]  [<c1290854>] xfs_da3_path_shift+0x264/0x470
[   84.820025]  [<c1291109>] xfs_da3_node_lookup_int+0x259/0x420
[   84.820025]  [<c1261d56>] ? kmem_zone_alloc+0x66/0xe0
[   84.820025]  [<c1261de1>] ? kmem_zone_zalloc+0x11/0xd0
[   84.820025]  [<c126ac77>] xfs_attr_node_get+0x47/0x200
[   84.820025]  [<c126af05>] xfs_attr_get_int+0xd5/0xf0
[   84.820025]  [<c126afb1>] xfs_attr_get+0x91/0xb0
[   84.820025]  [<c12cb993>] xfs_get_acl+0x123/0x2c0
[   84.820025]  [<c12cbb4a>] xfs_xattr_acl_get+0x1a/0x70
[   84.820025]  [<c11441b9>] generic_getxattr+0x49/0x70
[   84.820025]  [<c1144170>] ? SyS_fremovexattr+0xa0/0xa0
[   84.820025]  [<c11435ca>] vfs_getxattr+0x6a/0xa0
[   84.820025]  [<c1143683>] getxattr+0x83/0x1d0
[   84.820025]  [<c1124e14>] ? complete_walk+0x94/0x260
[   84.820025]  [<c11278ac>] ? path_lookupat+0x8c/0xba0
[   84.820025]  [<c1114ddf>] ? kmem_cache_alloc+0x4f/0x280
[   84.820025]  [<c1124ffd>] ? final_putname+0x1d/0x40
[   84.820025]  [<c112890f>] ? user_path_at_empty+0x4f/0x90
[   84.820025]  [<c1120134>] ? SyS_lstat64+0x34/0x40
[   84.820025]  [<c112896d>] ? user_path_at+0x1d/0x30
[   84.820025]  [<c1143c48>] SyS_getxattr+0x58/0xa0
[   84.820025]  [<c14edbb8>] sysenter_do_call+0x12/0x36
[   84.820025] Code: 89 e5 83 ec 14 3e 8d 74 26 00 89 44 24 08 b8 3c 83 60 c1 89 4c 24 10 89 54 24 0c 89 44 24 04 c7 04 24 00 00 00 00 e8 94 fd ff ff <0f> 0b 66 90 55 89 e5 83 ec 14 3e 8d 74 26 00 b9 01 00 00 00 89
[   84.820025] EIP: [<c125822c>] assfail+0x2c/0x30 SS:ESP 0068:ddc53c74 

...and there was no real variation going back to 3.11-rc.  That was 
about as far back as this particular glibc (built against 3.10.32) would 
let Linux boot.

I'm happy to continue the bisect for your benefit; I'm just running 
behind schedule on completing it.

Thanks!

Michael

* Re: Null pointer dereference while at ACL limit on v5 XFS
  2014-07-01 22:27         ` Michael L. Semon
@ 2014-07-03 11:56           ` Jeff Liu
  0 siblings, 0 replies; 11+ messages in thread
From: Jeff Liu @ 2014-07-03 11:56 UTC (permalink / raw)
  To: Michael L. Semon, Dave Chinner; +Cc: Mark Tinguely, xfs

On 07/02/2014 06:27 AM, Michael L. Semon wrote:
> On 06/24/2014 12:04 AM, Dave Chinner wrote:
>> On Mon, Jun 23, 2014 at 11:34:04PM -0400, Michael L. Semon wrote:
>>> [ 1068.431391] ------------[ cut here ]------------
>>> [ 1068.431566] WARNING: CPU: 0 PID: 41 at lib/list_debug.c:59 __list_del_entry+0xce/0x110()
>>> [ 1068.431596] list_del corruption. prev->next should be db5bf580, but was   (null)
>>
>> Ok, so the current log item points to a log item that has
>> null pointers (i.e. not on the list).
>>

<snip>

>>> [ 1068.433567] ---[ end trace 60289514948e4bd7 ]---
>>> [ 1068.433603] BUG: unable to handle kernel NULL pointer dereference at 0000000c
>>> [ 1068.433795] IP: [<c126eac8>] xfs_ail_check+0x58/0xc0
>>
>> And that's trying to dereference a pointer from an item that is not
>> on the list....
>>
>> So there's linked list corruption occurring here.
>>
>>> I can reproduce the oops in kernel 3.15.0, perhaps with xfs-oss/for-next 
>>> merged, but there's no vmlinux to go with the kernel.  Therefore, I'll have 
>>> to resort to other means (rebuilt kernel with netconsole, re-attaching the 
>>> serial cable, etc.) to get the full crash log.
>>
>> How far back can you reproduce it? If it's a recent occurrence, can
>> you bisect it?
>>
>> Cheers,
>>
>> Dave.
> 
> I've had terrible luck with bisects this week due to PEBKAC errors.  With 3 
> commits left to try--one slow, full build (thanks, ARM!) and hopefully 2 
> minor builds--this commit is staring me in the face:
> 
> commit bba719b5004234e55737e7074b81b337210c511d
> Author: Jie Liu <jeff.liu@oracle.com>
> Date:   Wed Jan 1 19:28:03 2014 +0800
> 
>     xfs: fix off-by-one error in xfs_attr3_rmt_verify
> 
> In particular, one kernel had this as the most recent commit and showed 
> the current problem behavior.
> 
> That is about as far back as I can go before attr3_rmt issues corrupt 
> filesystems and cause a "Structure needs cleaning" message during the setfacl 
> part of the test.  Certainly, Jeff has improved matters with this patch.
> 
> On the normal kernel git, this may correspond to kernel v3.13.0-rc7 or -rc8, 
> certainly no earlier than -rc2.  git was bouncing the version numbers around 
> quite a bit.
> 
> Before Jeff worked his wonders here, efforts to getfacl a directory with max 
> ACLs (on a remounted, corrupt filesystem) ended like this...

Sorry for the late response; I've been working on something else these days.

I have tried to reproduce this problem on my x86 VirtualBox VM with the latest
xfs-next code via fsstress, but had no luck, i.e.:

fsstress -d $SCRATCH_MNT/test-dir -n 10000 -p 16

Maybe this issue can be triggered via the seed file you provided; however, I
cannot download it because of the stupid Great Firewall of China, even through a proxy. :(


Cheers,
-Jeff

end of thread

Thread overview: 11+ messages
2014-06-23 21:48 Null pointer dereference while at ACL limit on v5 XFS Michael L. Semon
2014-06-23 22:08 ` Mark Tinguely
2014-06-23 22:13   ` Mark Tinguely
2014-06-24  3:34     ` Michael L. Semon
2014-06-24  4:04       ` Dave Chinner
2014-06-24 13:31         ` Michael L. Semon
2014-07-01 22:27         ` Michael L. Semon
2014-07-03 11:56           ` Jeff Liu
2014-06-24 16:31       ` Mark Tinguely
2014-06-24 18:25         ` Mark Tinguely
2014-06-24  2:18 ` Dave Chinner
