* xfs deadlock in stable kernel 3.0.4
@ 2011-09-10 12:23 Stefan Priebe
2011-09-12 15:21 ` Christoph Hellwig
0 siblings, 1 reply; 30+ messages in thread
From: Stefan Priebe @ 2011-09-10 12:23 UTC (permalink / raw)
To: xfs@oss.sgi.com; +Cc: xfs-masters@oss.sgi.com
Hello List,
On some of our heavily loaded servers using XFS, we're seeing a deadlock where reading from and writing to the XFS filesystem suddenly stops working.
Here are the log messages for the blocked processes, triggered via sysrq-w:
http://pastebin.com/JWjrbrh4
Please help! Thanks!
Please CC me, I'm not subscribed.
Stefan
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 30+ messages in thread
* xfs deadlock in stable kernel 3.0.4
@ 2011-09-11 13:12 Stefan Priebe - Profihost AG
0 siblings, 0 replies; 30+ messages in thread
From: Stefan Priebe - Profihost AG @ 2011-09-11 13:12 UTC (permalink / raw)
To: linux-fsdevel
Hello List,
On some of our heavily loaded servers using XFS, we're seeing a deadlock
where reading from and writing to the XFS filesystem suddenly stops working.
They seem to be running out of log space, and then XFS deadlocks.
Here are the log messages for the blocked processes, triggered via sysrq-w:
http://pastebin.com/JWjrbrh4
Please help! Thanks!
Please CC me, I'm not subscribed.
Stefan
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: xfs deadlock in stable kernel 3.0.4
2011-09-10 12:23 Stefan Priebe
@ 2011-09-12 15:21 ` Christoph Hellwig
2011-09-12 16:46 ` Stefan Priebe
0 siblings, 1 reply; 30+ messages in thread
From: Christoph Hellwig @ 2011-09-12 15:21 UTC (permalink / raw)
To: Stefan Priebe; +Cc: xfs-masters@oss.sgi.com, xfs@oss.sgi.com
On Sat, Sep 10, 2011 at 02:23:12PM +0200, Stefan Priebe wrote:
> Hello List,
>
> On some of our heavily loaded servers using XFS, we're seeing a deadlock where reading from and writing to the XFS filesystem suddenly stops working.
>
> Here are the log messages for the blocked processes, triggered via sysrq-w:
>
> http://pastebin.com/JWjrbrh4
What kind of workload are you running? Also did the workload run fine
with an older kernel, and if yes which one?
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: xfs deadlock in stable kernel 3.0.4
2011-09-12 15:21 ` Christoph Hellwig
@ 2011-09-12 16:46 ` Stefan Priebe
2011-09-12 20:05 ` Christoph Hellwig
0 siblings, 1 reply; 30+ messages in thread
From: Stefan Priebe @ 2011-09-12 16:46 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: xfs-masters@oss.sgi.com, xfs@oss.sgi.com
Hi,
>> Hello List,
>>
>> On some of our heavily loaded servers using XFS, we're seeing a deadlock where reading from and writing to the XFS filesystem suddenly stops working.
>>
>> Here are the log messages for the blocked processes, triggered via sysrq-w:
>>
>> http://pastebin.com/JWjrbrh4
>
> What kind of workload are you running? Also did the workload run fine
> with an older kernel, and if yes which one?
Mysql, Web, Mail, ftp ;-) yes it was with 2.6.32. I upgraded from that version.
Stefan
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: xfs deadlock in stable kernel 3.0.4
2011-09-12 16:46 ` Stefan Priebe
@ 2011-09-12 20:05 ` Christoph Hellwig
2011-09-13 6:04 ` Stefan Priebe - Profihost AG
0 siblings, 1 reply; 30+ messages in thread
From: Christoph Hellwig @ 2011-09-12 20:05 UTC (permalink / raw)
To: Stefan Priebe; +Cc: Christoph Hellwig, xfs-masters@oss.sgi.com, xfs@oss.sgi.com
On Mon, Sep 12, 2011 at 06:46:26PM +0200, Stefan Priebe wrote:
> > What kind of workload are you running? Also did the workload run fine
> > with an older kernel, and if yes which one?
>
> Mysql, Web, Mail, ftp ;-) yes it was with 2.6.32. I upgraded from that version.
Just curious, is this the same system that also shows the freezes
reported to the scsi list? If I/Os don't get completed by lower layers
I can see how we get everything in XFS waiting on the log reservations,
given that we never get the log tail pushed.
^ permalink raw reply [flat|nested] 30+ messages in thread
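Editorial illustration: the remark in the message above, that everything ends up waiting on log reservations if the log tail is never pushed, can be sketched with a toy model. This is not XFS's actual algorithm. The class and method names (ToyLog, try_reserve, push_tail) are hypothetical, and the sizes are arbitrary. It only assumes a fixed-size circular log whose space is freed when the tail advances.

```python
class ToyLog:
    """Toy fixed-size log: reservations consume space until the tail advances."""

    def __init__(self, size):
        self.size = size
        self.head = 0  # total bytes ever reserved
        self.tail = 0  # total bytes whose metadata has been written back

    def free_space(self):
        return self.size - (self.head - self.tail)

    def try_reserve(self, nbytes):
        """Return True if the reservation fits; a real kernel would sleep instead."""
        if nbytes > self.free_space():
            return False
        self.head += nbytes
        return True

    def push_tail(self, nbytes):
        """Model completed I/O: written-back metadata frees log space."""
        self.tail = min(self.head, self.tail + nbytes)


log = ToyLog(size=100)
first = log.try_reserve(40)
second = log.try_reserve(40)
# Only 20 bytes are left, so this reservation cannot be granted.
blocked = log.try_reserve(40)
# If lower layers never complete I/O, the tail never moves and retries keep failing.
still_blocked = log.try_reserve(40)
# Once the tail is pushed, the same reservation succeeds.
log.push_tail(40)
after_push = log.try_reserve(40)
print(first, second, blocked, still_blocked, after_push)
```

In this model every new transaction stalls at the reservation step once the log is full, which matches the shape of the traces later in the thread, where many tasks sit in xlog_grant_log_space.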
* Re: xfs deadlock in stable kernel 3.0.4
2011-09-12 20:05 ` Christoph Hellwig
@ 2011-09-13 6:04 ` Stefan Priebe - Profihost AG
2011-09-13 19:31 ` Stefan Priebe - Profihost AG
2011-09-13 20:50 ` Christoph Hellwig
0 siblings, 2 replies; 30+ messages in thread
From: Stefan Priebe - Profihost AG @ 2011-09-13 6:04 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: xfs-masters@oss.sgi.com, xfs@oss.sgi.com
Hi,
> On Mon, Sep 12, 2011 at 06:46:26PM +0200, Stefan Priebe wrote:
>>> What kind of workload are you running? Also did the workload run fine
>>> with an older kernel, and if yes which one?
>>
>> Mysql, Web, Mail, ftp ;-) yes it was with 2.6.32. I upgraded from that version.
>
> Just curious, is this the same system that also shows the freezes
> reported to the scsi list? If I/Os don't get completed by lower layers
> I can see how we get everything in XFS waiting on the log reservations,
> given that we never get the log tail pushed.
I only reported it to the scsi list because I didn't know where the problem
was. But then some people told me it must be an XFS problem.
Some more information:
1.) It's running fine with 2.6.32 and 2.6.38
2.) I can still write to another ext2 partition on the same disk array (aacraid
driver) while XFS is stuck, so I think it must be an XFS problem
3.) I've also tried running 3.1-rc5, but then I'm seeing this error:
BUG: unable to handle kernel NULL pointer dereference at 000000000000012c
IP: [] inode_dio_done+0x4/0x25
PGD 293724067 PUD 292930067 PMD 0
Oops: 0002 [#1] SMP
CPU 5
Modules linked in: ipt_REJECT xt_tcpudp iptable_filter ip_tables
x_tables coretemp k8temp
Pid: 4775, comm: mysqld Not tainted 3.1-rc5 #1 Supermicro X8DT3/X8DT3
RIP: 0010:[] [] inode_dio_done+0x4/0x25
RSP: 0018:ffff880292b5fad8 EFLAGS: 00010286
RAX: 0000000000000000 RBX: ffff8806ab4927e0 RCX: 0000000000007524
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffff880292b5fad8 R08: ffff880292b5e000 R09: 0000000000000000
R10: ffff88047f85e040 R11: ffff88042ddb5d88 R12: ffff88002b7f8800
R13: ffff88002b7f8800 R14: 0000000000000000 R15: ffff88042d896040
FS: 0000000045c79950(0063) GS:ffff88083fc40000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000000000012c CR3: 0000000293408000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process mysqld (pid: 4775, threadinfo ffff880292b5e000, task
ffff88042d896040)
Stack:
ffff880292b5faf8 ffffffff811938cd 0000000192b5fb18 0000000000004000
ffff880292b5fb18 ffffffff810feba2 0000000000000000 ffff88002b7f8920
ffff880292b5fbf8 ffffffff810ff4fb ffff880292b5fb78 ffff880292b5e000
Call Trace:
[] xfs_end_io_direct_write+0x6a/0x6e
[] dio_complete+0x90/0xbb
[] __blockdev_direct_IO+0x92e/0x964
[] ? mempool_alloc_slab+0x11/0x13
[] xfs_vm_direct_IO+0x90/0x101
[] ? __xfs_get_blocks+0x395/0x395
[] ? xfs_finish_ioend_sync+0x1a/0x1a
[] generic_file_direct_write+0xd7/0x147
[] xfs_file_dio_aio_write+0x1b9/0x1d1
[] ? wake_up_state+0xb/0xd
[] xfs_file_aio_write+0x16a/0x21d
[] ? do_futex+0xc0/0x988
[] do_sync_write+0xc7/0x10d
[] vfs_write+0xab/0x103
[] sys_pwrite64+0x5c/0x7d
[] system_call_fastpath+0x16/0x1b
Code: 00 48 8d 34 30 89 d9 4c 89 e7 e8 3a fe ff ff 85 c0 75 0b 44 89 e8
49 01 84 24 90 00 00 00 41 5a 5b 41 5c 41 5d c9 c3 55 48 89 e5 ff 8f 2c
01 00 00 0f 94 c0 84 c0 74 11 48 81 c7 90 00 00 00
RIP [] inode_dio_done+0x4/0x25
RSP
CR2: 000000000000012c
---[ end trace 79ce33ac2f7c10bd ]---
Stefan
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: xfs deadlock in stable kernel 3.0.4
2011-09-13 6:04 ` Stefan Priebe - Profihost AG
@ 2011-09-13 19:31 ` Stefan Priebe - Profihost AG
2011-09-13 20:50 ` Christoph Hellwig
1 sibling, 0 replies; 30+ messages in thread
From: Stefan Priebe - Profihost AG @ 2011-09-13 19:31 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: xfs-masters@oss.sgi.com, xfs@oss.sgi.com
Am 13.09.2011 08:04, schrieb Stefan Priebe - Profihost AG:
> Hi,
>
>> On Mon, Sep 12, 2011 at 06:46:26PM +0200, Stefan Priebe wrote:
>>>> What kind of workload are you running? Also did the workload run fine
>>>> with an older kernel, and if yes which one?
>>>
>>> Mysql, Web, Mail, ftp ;-) yes it was with 2.6.32. I upgraded from
>>> that version.
>>
>> Just curious, is this the same system that also shows the freezes
>> reported to the scsi list? If I/Os don't get completed by lower layers
>> I can see how we get everything in XFS waiting on the log reservations,
>> given that we never get the log tail pushed.
>
> I only reported it to the scsi list because I didn't know where the problem
> was. But then some people told me it must be an XFS problem.
>
> Some more information:
> 1.) It's running fine with 2.6.32 and 2.6.38
> 2.) I can still write to another ext2 partition on the same disk array (aacraid
> driver) while XFS is stuck, so I think it must be an XFS problem
> 3.) I've also tried running 3.1-rc5, but then I'm seeing this error:
>
> ...
>
Any idea what we could try next or how to find the problem? At least
this is happening with different devices and writing to other partitions
is still working.
Greets
Stefan
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: xfs deadlock in stable kernel 3.0.4
2011-09-13 6:04 ` Stefan Priebe - Profihost AG
2011-09-13 19:31 ` Stefan Priebe - Profihost AG
@ 2011-09-13 20:50 ` Christoph Hellwig
2011-09-14 7:26 ` Stefan Priebe - Profihost AG
1 sibling, 1 reply; 30+ messages in thread
From: Christoph Hellwig @ 2011-09-13 20:50 UTC (permalink / raw)
To: Stefan Priebe - Profihost AG
Cc: Christoph Hellwig, xfs-masters@oss.sgi.com, xfs@oss.sgi.com
On Tue, Sep 13, 2011 at 08:04:36AM +0200, Stefan Priebe - Profihost AG wrote:
> I only reported it to the scsi list because I didn't know where the
> problem was. But then some people told me it must be an XFS problem.
>
> Some more information:
> 1.) It's running fine with 2.6.32 and 2.6.38
> 2.) I can still write to another ext2 partition on the same disk
> array (aacraid driver) while XFS is stuck, so I think it must be an
> XFS problem
That points a bit more towards XFS, although we've seen storage setups
create issues depending on the exact workload. The prime culprit for
that used to be the md software RAID driver, though.
> 3.) I've also tried running 3.1-rc5 but then i'm seeing this error:
>
> BUG: unable to handle kernel NULL pointer dereference at 000000000000012c
> IP: [] inode_dio_done+0x4/0x25
Oops, that's a bug that I actually introduced myself. Fix below:
Index: linux-2.6/fs/xfs/xfs_aops.c
===================================================================
--- linux-2.6.orig/fs/xfs/xfs_aops.c 2011-09-13 16:38:47.141089046 -0400
+++ linux-2.6/fs/xfs/xfs_aops.c 2011-09-13 16:39:09.991647077 -0400
@@ -1300,6 +1300,7 @@ xfs_end_io_direct_write(
bool is_async)
{
struct xfs_ioend *ioend = iocb->private;
+ struct inode *inode = ioend->io_inode;
/*
* blockdev_direct_IO can return an error even after the I/O
@@ -1331,7 +1332,7 @@ xfs_end_io_direct_write(
}
/* XXX: probably should move into the real I/O completion handler */
- inode_dio_done(ioend->io_inode);
+ inode_dio_done(inode);
}
STATIC ssize_t
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: xfs deadlock in stable kernel 3.0.4
2011-09-13 20:50 ` Christoph Hellwig
@ 2011-09-14 7:26 ` Stefan Priebe - Profihost AG
2011-09-14 7:48 ` Stefan Priebe - Profihost AG
0 siblings, 1 reply; 30+ messages in thread
From: Stefan Priebe - Profihost AG @ 2011-09-14 7:26 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: xfs-masters@oss.sgi.com, aelder, xfs@oss.sgi.com
Hi,
Am 13.09.2011 22:50, schrieb Christoph Hellwig:
> On Tue, Sep 13, 2011 at 08:04:36AM +0200, Stefan Priebe - Profihost AG wrote:
>> I only reported it to the scsi list because I didn't know where the
>> problem was. But then some people told me it must be an XFS problem.
>>
>> Some more information:
>> 1.) It's running fine with 2.6.32 and 2.6.38
>> 2.) I can still write to another ext2 partition on the same disk
>> array (aacraid driver) while XFS is stuck, so I think it must be an
>> XFS problem
>
> That points a bit more towards XFS, although we've seen storage setups
> create issues depending on the exact workload. The prime culprit for
> that used to be the md software RAID driver, though.
>
>> 3.) I've also tried running 3.1-rc5 but then i'm seeing this error:
>>
>> BUG: unable to handle kernel NULL pointer dereference at 000000000000012c
>> IP: [] inode_dio_done+0x4/0x25
>
> Oops, that's a bug that I actually introduced myself. Fix below:
Thanks for the patch.
Now we have the following situation:
1.) Systems running fine with 2.6.32, 2.6.38 and with 3.1 rc-6 + patch
2.) Sadly it does not run with 3.0.4 for more than 1 hour. And 3.0.X
will become the next long term stable. So there will be a lot of people
using it.
3.) I have seen this deadlock on systems with aacraid and with onboard
Intel AHCI controllers (that's all we're using).
4.) I can still write to other devices / RAIDs on the same controller while
the XFS root filesystem hangs.
What can we do or try next?
Stefan
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: xfs deadlock in stable kernel 3.0.4
2011-09-14 7:26 ` Stefan Priebe - Profihost AG
@ 2011-09-14 7:48 ` Stefan Priebe - Profihost AG
2011-09-14 8:49 ` Stefan Priebe - Profihost AG
2011-09-14 14:30 ` Christoph Hellwig
0 siblings, 2 replies; 30+ messages in thread
From: Stefan Priebe - Profihost AG @ 2011-09-14 7:48 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: xfs-masters@oss.sgi.com, aelder, xfs@oss.sgi.com
Hi,
>> Oops, that's a bug that I actually introduced myself. Fix below:
>
> Thanks for the patch.
>
> Now we have the following situation:
>
> 1.) Systems running fine with 2.6.32, 2.6.38 and with 3.1 rc-6 + patch
> 2.) Sadly it does not run with 3.0.4 for more than 1 hour. And 3.0.X
> will become the next long term stable. So there will be a lot of people
> using it.
> 3.) I have seen this deadlock on systems with aacraid and with onboard
> Intel AHCI controllers (that's all we're using).
> 4.) I can still write to other devices / RAIDs on the same controller while
> the XFS root filesystem hangs.
Sadly, it has now crashed again with 3.1-rc6 + patch. Sorry, I was too
quick to write you an email.
Hung task detection showed me this with 3.1-rc6:
[] ? might_fault+0x3b/0x88
[] do_filp_open+0x38/0x86
[] ? _raw_spin_unlock+0x26/0x2b
[] ? alloc_fd+0x11d/0x12e
[] do_sys_open+0x114/0x1a3
[] sys_open+0x1b/0x1d
[] system_call_fastpath+0x16/0x1b
1 lock held by mysqld/17058:
#0: (&sb->s_type->i_mutex_key#5){+.+.+.}, at: [] do_last+0x287/0x693
INFO: task qmail-send:4899 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
qmail-send D 0000000000000000 0 4899 1 0x00020000
ffff88081c4afc38 0000000000000046 ffffffff814a52d5 0000000100000000
ffff88082cf5be70 ffff88081c4ae010 0000000000004000 ffff88082cf5b5d0
0000000000011c40 ffff88081c4affd8 ffff88081c4affd8 0000000000011c40
Call Trace:
[] ? __schedule+0x2e8/0x9fd
[] ? mark_held_locks+0xc9/0xef
[] ? _raw_spin_unlock_irqrestore+0x3f/0x47
[] ? trace_hardirqs_on_caller+0x11c/0x153
[] schedule+0x57/0x59
[] xlog_grant_log_space+0x18e/0x4ae
[] ? try_to_wake_up+0x330/0x330
[] xfs_log_reserve+0x11a/0x122
[] xfs_trans_reserve+0xd6/0x1b1
[] xfs_remove+0x136/0x34e
[] ? mutex_lock_nested+0x275/0x290
[] ? mutex_lock_nested+0x281/0x290
[] ? vfs_unlink+0x51/0xdd
[] xfs_vn_unlink+0x3c/0x75
[] vfs_unlink+0x69/0xdd
[] do_unlinkat+0xde/0x170
[] ? retint_swapgs+0xe/0x13
[] ? trace_hardirqs_on_caller+0x11c/0x153
[] ? trace_hardirqs_on_thunk+0x3a/0x3f
[] ? file_free_rcu+0x35/0x35
[] sys_unlink+0x11/0x13
[] ia32_do_call+0x13/0x13
2 locks held by qmail-send/4899:
#0: (&sb->s_type->i_mutex_key#5/1){+.+.+.}, at: [] do_unlinkat+0x63/0x170
#1: (&sb->s_type->i_mutex_key#5){+.+.+.}, at: [] vfs_unlink+0x51/0xdd
INFO: task httpd:6316 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
httpd D 0000000000000001 0 6316 6270 0x00000000
ffff880406edfb78 0000000000000046 ffff88041b792c30 0000000100000000
ffff88041b792c80 ffff880406ede010 0000000000004000 ffff88041b7923e0
0000000000011c40 ffff880406edffd8 ffff880406edffd8 0000000000011c40
Call Trace:
[] ? mark_held_locks+0xc9/0xef
[] ? _raw_spin_unlock_irqrestore+0x3f/0x47
[] ? trace_hardirqs_on_caller+0x11c/0x153
[] schedule+0x57/0x59
[] xlog_grant_log_space+0x18e/0x4ae
[] ? try_to_wake_up+0x330/0x330
[] xfs_log_reserve+0x11a/0x122
[] xfs_trans_reserve+0xd6/0x1b1
[] xfs_create+0x200/0x53a
[] ? d_lookup+0x2d/0x42
2 locks held by httpd/6316:
[] ? __d_lookup+0x16a/0x17c
[] ? __d_lookup+0x16a/0x17c
1 lock held by imap/11461:
INFO: task flush-8:0:3658 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
flush-8:0 D 000000000000000b 0 3658 2 0x00000000
ffff88082c389690 0000000000000046 ffff88082c8bac30 0000000100000000
ffff88082c8bac58 ffff88082c388010 0000000000004000 ffff88082c8ba3e0
0000000000011c40 ffff88082c389fd8 ffff88082c389fd8 0000000000011c40
Call Trace:
[] ? mark_held_locks+0xc9/0xef
[] ? _raw_spin_unlock_irqrestore+0x3f/0x47
[] ? trace_hardirqs_on_caller+0x11c/0x153
[] schedule+0x57/0x59
[] xlog_grant_log_space+0x18e/0x4ae
[] ? try_to_wake_up+0x330/0x330
[] xfs_log_reserve+0x11a/0x122
[] xfs_trans_reserve+0xd6/0x1b1
[] xfs_iomap_write_allocate+0xcc/0x2cc
[] ? xfs_ilock_nowait+0x66/0xd5
[] ? up_read+0x1e/0x37
[] xfs_map_blocks+0x159/0x1ee
[] xfs_vm_writepage+0x21e/0x3f9
[] __writepage+0x15/0x3b
[] write_cache_pages+0x28c/0x3a8
[] ? alloc_pages_exact_nid+0x9a/0x9a
[] generic_writepages+0x46/0x61
[] xfs_vm_writepages+0x45/0x4e
[] do_writepages+0x1f/0x28
[] writeback_single_inode+0x18f/0x387
[] writeback_sb_inodes+0x196/0x237
[] ? grab_super_passive+0x52/0x76
[] __writeback_inodes_wb+0x73/0xb6
[] wb_writeback+0x163/0x24b
[] ? trace_hardirqs_on+0xd/0xf
[] ? local_bh_enable_ip+0xbc/0xc1
[] wb_do_writeback+0x183/0x210
[] bdi_writeback_thread+0xc0/0x1e4
[] ? wb_do_writeback+0x210/0x210
[] kthread+0x81/0x89
[] kernel_thread_helper+0x4/0x10
[] ? finish_task_switch+0x45/0xc3
[] ? retint_restore_args+0xe/0xe
[] ? __init_kthread_worker+0x56/0x56
[] ? gs_change+0xb/0xb
1 lock held by flush-8:0/3658:
#0: (&type->s_umount_key#31){++++.+}, at: [] grab_super_passive+0x52/0x76
INFO: task syslogd:4459 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
syslogd D 000000000000000c 0 4459 1 0x00000000
ffff88082b4c3d78 0000000000000046 ffffffff814a52d5 ffff88082c8605d8
ffff88082b446ba0 ffff88082b4c2010 0000000000004000 ffff88082b446ba0
0000000000011c40 ffff88082b4c3fd8 ffff88082b4c3fd8 0000000000011c40
Call Trace:
[] ? __schedule+0x2e8/0x9fd
[] ? mark_held_locks+0xc9/0xef
[] ? _raw_spin_unlock_irqrestore+0x3f/0x47
[] ? trace_hardirqs_on_caller+0x11c/0x153
[] schedule+0x57/0x59
[] xlog_grant_log_space+0x18e/0x4ae
[] ? try_to_wake_up+0x330/0x330
[] xfs_log_reserve+0x11a/0x122
[] xfs_trans_reserve+0xd6/0x1b1
[] xfs_file_fsync+0x15f/0x22d
[] vfs_fsync_range+0x18/0x21
[] vfs_fsync+0x17/0x19
[] do_fsync+0x2e/0x44
[] sys_fsync+0xb/0xf
[] system_call_fastpath+0x16/0x1b
no locks held by syslogd/4459.
INFO: task mysqld:4612 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
mysqld D 0000000000000000 0 4612 4567 0x00000000
ffff880429a31d78 0000000000000046 ffffffff814a52d5 ffff88082c8605d8
ffff88042cd8d9b0 ffff880429a30010 0000000000004000 ffff88042cd8d9b0
0000000000011c40 ffff880429a31fd8 ffff880429a31fd8 0000000000011c40
Call Trace:
[] ? __schedule+0x2e8/0x9fd
[] ? mark_held_locks+0xc9/0xef
[] ? _raw_spin_unlock_irqrestore+0x3f/0x47
[] ? trace_hardirqs_on_caller+0x11c/0x153
[] schedule+0x57/0x59
[] xlog_grant_log_space+0x18e/0x4ae
[] ? try_to_wake_up+0x330/0x330
[] xfs_log_reserve+0x11a/0x122
[] xfs_trans_reserve+0xd6/0x1b1
[] xfs_file_fsync+0x15f/0x22d
[] vfs_fsync_range+0x18/0x21
[] vfs_fsync+0x17/0x19
[] do_fsync+0x2e/0x44
[] sys_fsync+0xb/0xf
[] system_call_fastpath+0x16/0x1b
no locks held by mysqld/4612.
INFO: task mysqld:27595 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
mysqld D 0000000000000008 0 27595 4567 0x00000000
ffff88011dda3ca8 0000000000000046 ffffffff814a52d5 ffff880403cd88a0
0000000000000246 ffff88011dda2010 0000000000004000 ffff880403cd8000
0000000000011c40 ffff88011dda3fd8 ffff88011dda3fd8 0000000000011c40
Call Trace:
[] ? __schedule+0x2e8/0x9fd
[] ? mark_held_locks+0xc9/0xef
[] ? mutex_lock_nested+0x16b/0x290
[] schedule+0x57/0x59
[] mutex_lock_nested+0x173/0x290
[] ? do_last+0x287/0x693
[] do_last+0x287/0x693
[] path_openat+0xcd/0x342
[] ? might_fault+0x3b/0x88
[] do_filp_open+0x38/0x86
[] ? _raw_spin_unlock+0x26/0x2b
[] ? alloc_fd+0x11d/0x12e
[] do_sys_open+0x114/0x1a3
[] sys_open+0x1b/0x1d
[] system_call_fastpath+0x16/0x1b
1 lock held by mysqld/27595:
#0: (&sb->s_type->i_mutex_key#5){+.+.+.}, at: [] do_last+0x287/0x693
INFO: task mysqld:4873 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
mysqld D 0000000000000000 0 4873 4625 0x00000000
ffff88081bf61d78 0000000000000046 ffffffff814a52d5 0000000100000000
ffff88081e82f3f0 ffff88081bf60010 0000000000004000 ffff88081e82eba0
0000000000011c40 ffff88081bf61fd8 ffff88081bf61fd8 0000000000011c40
Call Trace:
[] ? __schedule+0x2e8/0x9fd
[] ? mark_held_locks+0xc9/0xef
[] ? _raw_spin_unlock_irqrestore+0x3f/0x47
[] ? trace_hardirqs_on_caller+0x11c/0x153
[] schedule+0x57/0x59
[] xlog_grant_log_space+0x18e/0x4ae
[] ? try_to_wake_up+0x330/0x330
[] xfs_log_reserve+0x11a/0x122
[] xfs_trans_reserve+0xd6/0x1b1
[] xfs_file_fsync+0x15f/0x22d
[] vfs_fsync_range+0x18/0x21
[] vfs_fsync+0x17/0x19
[] do_fsync+0x2e/0x44
[] sys_fsync+0xb/0xf
[] system_call_fastpath+0x16/0x1b
no locks held by mysqld/4873.
INFO: task mysqld:17058 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
mysqld D 000000000000000c 0 17058 4625 0x00000000
ffff88010325fa88 0000000000000046 ffffffff814a52d5 0000000100000000
ffff88025be8f418 ffff88010325e010 0000000000004000 ffff88025be8eba0
0000000000011c40 ffff88010325ffd8 ffff88010325ffd8 0000000000011c40
Call Trace:
[] ? __schedule+0x2e8/0x9fd
[] ? mark_held_locks+0xc9/0xef
[] ? _raw_spin_unlock_irqrestore+0x3f/0x47
[] ? trace_hardirqs_on_caller+0x11c/0x153
[] schedule+0x57/0x59
[] xlog_grant_log_space+0x18e/0x4ae
[] ? try_to_wake_up+0x330/0x330
[] xfs_log_reserve+0x11a/0x122
[] xfs_trans_reserve+0xd6/0x1b1
[] xfs_create+0x200/0x53a
[] ? __d_lookup+0xbe/0x17c
[] ? __d_lookup+0x16a/0x17c
[] ? d_validate+0x96/0x96
[] xfs_vn_mknod+0x9a/0xf5
[] xfs_vn_create+0xb/0xd
[] vfs_create+0x72/0xa4
[] do_last+0x323/0x693
[] path_openat+0xcd/0x342
Stefan
^ permalink raw reply [flat|nested] 30+ messages in thread
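Editorial illustration: hung-task dumps like the one in the message above are easier to read when reduced to "which function is each task sleeping in". The sketch below is one possible way to do that, not a tool used in this thread. The function name blocked_in and the sample text are illustrative; the sample is a shortened excerpt of the traces above. It assumes the first reliable frame (one without a "?" marker) following schedule is the interesting one, and it accepts both "[]" and "[<address>]" frame prefixes, since the addresses were stripped in this archive.

```python
import re
from collections import Counter

TASK_RE = re.compile(r"INFO: task (\S+):(\d+) blocked for more than (\d+) seconds")
FRAME_RE = re.compile(r"^\s*\[[^\]]*\]\s+(\?\s+)?([A-Za-z_][\w.]*)\+0x")


def blocked_in(log_text):
    """Return (task, pid, function) for the first reliable frame after schedule()."""
    results = []
    current = None
    seen_schedule = False
    for line in log_text.splitlines():
        task = TASK_RE.search(line)
        if task:
            current = (task.group(1), int(task.group(2)))
            seen_schedule = False
            continue
        if current is None:
            continue
        frame = FRAME_RE.match(line)
        if not frame or frame.group(1):
            continue  # not a stack frame, or an unreliable '?' frame
        func = frame.group(2)
        if func == "schedule":
            seen_schedule = True
        elif seen_schedule:
            results.append((*current, func))
            current = None  # ignore the rest of this task's trace
    return results


SAMPLE = """\
INFO: task qmail-send:4899 blocked for more than 120 seconds.
Call Trace:
[] ? __schedule+0x2e8/0x9fd
[] schedule+0x57/0x59
[] xlog_grant_log_space+0x18e/0x4ae
[] xfs_log_reserve+0x11a/0x122
INFO: task flush-8:0:3658 blocked for more than 120 seconds.
Call Trace:
[] ? mark_held_locks+0xc9/0xef
[] schedule+0x57/0x59
[] xlog_grant_log_space+0x18e/0x4ae
INFO: task mysqld:27595 blocked for more than 120 seconds.
Call Trace:
[] ? mutex_lock_nested+0x16b/0x290
[] schedule+0x57/0x59
[] mutex_lock_nested+0x173/0x290
[] do_last+0x287/0x693
"""

summary = Counter(func for _task, _pid, func in blocked_in(SAMPLE))
for func, count in summary.most_common():
    print(f"{count} task(s) blocked in {func}")
```

On the excerpt, most tasks are waiting in xlog_grant_log_space and one is waiting on a mutex, which is consistent with the "? mark_held_locks" style frames being lockdep bookkeeping rather than the place where tasks actually sleep.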
* Re: xfs deadlock in stable kernel 3.0.4
2011-09-14 7:48 ` Stefan Priebe - Profihost AG
@ 2011-09-14 8:49 ` Stefan Priebe - Profihost AG
2011-09-14 14:30 ` Christoph Hellwig
2011-09-14 14:30 ` Christoph Hellwig
1 sibling, 1 reply; 30+ messages in thread
From: Stefan Priebe - Profihost AG @ 2011-09-14 8:49 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: xfs-masters@oss.sgi.com, aelder, xfs@oss.sgi.com
Hi,
Am 14.09.2011 09:48, schrieb Stefan Priebe - Profihost AG:
> Hi,
>
>>> Oops, that's a bug that I actually introduced myself. Fix below:
>>
>> Thanks for the patch.
>>
>> Now we have the following situation:
>>
>> 1.) Systems running fine with 2.6.32, 2.6.38 and with 3.1 rc-6 + patch
>> 2.) Sadly it does not run with 3.0.4 for more than 1 hour. And 3.0.X
>> will become the next long term stable. So there will be a lot of people
>> using it.
>> 3.) I have seen this deadlock on systems with aacraid and with onboard
>> Intel AHCI controllers (that's all we're using).
>> 4.) I can still write to other devices / RAIDs on the same controller while
>> the XFS root filesystem hangs.
>
> Sadly, it has now crashed again with 3.1-rc6 + patch. Sorry, I was too
> quick to write you an email.
So might it be that the problem at least in 3.1 lies in:
[] ? mark_held_locks+0xc9/0xef
[] ? _raw_spin_unlock_irqrestore+0x3f/0x47
and not in XFS?
Stefan
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: xfs deadlock in stable kernel 3.0.4
2011-09-14 7:48 ` Stefan Priebe - Profihost AG
2011-09-14 8:49 ` Stefan Priebe - Profihost AG
@ 2011-09-14 14:30 ` Christoph Hellwig
2011-09-14 16:06 ` Stefan Priebe - Profihost AG
2011-09-18 9:14 ` Stefan Priebe - Profihost AG
1 sibling, 2 replies; 30+ messages in thread
From: Christoph Hellwig @ 2011-09-14 14:30 UTC (permalink / raw)
To: Stefan Priebe - Profihost AG
Cc: Christoph Hellwig, xfs-masters@oss.sgi.com, xfs@oss.sgi.com,
aelder
On Wed, Sep 14, 2011 at 09:48:18AM +0200, Stefan Priebe - Profihost AG wrote:
> #0: (&sb->s_type->i_mutex_key#5){+.+.+.}, at: [] do_last+0x287/0x693
This means you are running your heavy load with lockdep enabled. I
can't see how it directly causes your issues, but it will slow everything
down to almost a grinding halt on systems with more than, say, two cores.
Can you run with CONFIG_DEBUG_LOCK_ALLOC and CONFIG_PROVE_LOCKING
disabled? It might be worth checking whether you have other really heavy
debugging options enabled, too.
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: xfs deadlock in stable kernel 3.0.4
2011-09-14 8:49 ` Stefan Priebe - Profihost AG
@ 2011-09-14 14:30 ` Christoph Hellwig
0 siblings, 0 replies; 30+ messages in thread
From: Christoph Hellwig @ 2011-09-14 14:30 UTC (permalink / raw)
To: Stefan Priebe - Profihost AG
Cc: Christoph Hellwig, xfs-masters@oss.sgi.com, xfs@oss.sgi.com,
aelder
On Wed, Sep 14, 2011 at 10:49:20AM +0200, Stefan Priebe - Profihost AG wrote:
> > Sadly, it has now crashed again with 3.1-rc6 + patch. Sorry, I was too
> > quick to write you an email.
>
> So might it be that the problem at least in 3.1 lies in:
> [] ? mark_held_locks+0xc9/0xef
> [] ? _raw_spin_unlock_irqrestore+0x3f/0x47
>
> and not in XFS?
That's the lockdep code.
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: xfs deadlock in stable kernel 3.0.4
2011-09-14 14:30 ` Christoph Hellwig
@ 2011-09-14 16:06 ` Stefan Priebe - Profihost AG
2011-09-18 9:14 ` Stefan Priebe - Profihost AG
1 sibling, 0 replies; 30+ messages in thread
From: Stefan Priebe - Profihost AG @ 2011-09-14 16:06 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: xfs-masters@oss.sgi.com, xfs@oss.sgi.com, aelder
Hi,
Am 14.09.2011 16:30, schrieb Christoph Hellwig:
> On Wed, Sep 14, 2011 at 09:48:18AM +0200, Stefan Priebe - Profihost AG wrote:
>> #0: (&sb->s_type->i_mutex_key#5){+.+.+.}, at: [] do_last+0x287/0x693
>
> This means you are running your heavy load with lockdep enabled. I
> can't see how it directly causes your issues, but it will slow everything
> down to almost a grinding halt on systems with more than, say, two cores.
>
> Can you run with CONFIG_DEBUG_LOCK_ALLOC and CONFIG_PROVE_LOCKING
> disabled? It might be worth checking whether you have other really heavy
> debugging options enabled, too.
I only enabled it while trying to find the cause of my problems.
My current config has:
# grep -i 'DEBUG' .config|egrep -v "^# "
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
CONFIG_SLUB_DEBUG=y
CONFIG_HAVE_DMA_API_DEBUG=y
CONFIG_X86_DEBUGCTLMSR=y
CONFIG_PNP_DEBUG_MESSAGES=y
CONFIG_AIC7XXX_DEBUG_ENABLE=y
CONFIG_AIC7XXX_DEBUG_MASK=0
CONFIG_AIC79XX_DEBUG_ENABLE=y
CONFIG_AIC79XX_DEBUG_MASK=0
CONFIG_OCFS2_DEBUG_MASKLOG=y
CONFIG_DEBUG_FS=y
CONFIG_DEBUG_KERNEL=y
CONFIG_DEBUG_SHIRQ=y
CONFIG_SCHED_DEBUG=y
CONFIG_DEBUG_SPINLOCK=y
CONFIG_DEBUG_MUTEXES=y
CONFIG_DEBUG_LOCK_ALLOC=y
CONFIG_DEBUG_BUGVERBOSE=y
CONFIG_DEBUG_MEMORY_INIT=y
CONFIG_DEBUG_RODATA=y
CONFIG_KEYS_DEBUG_PROC_KEYS=y
My original config had:
# grep -i 'DEBUG' .config_stillnotworking|egrep -v "^# "
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
CONFIG_HAVE_DMA_API_DEBUG=y
CONFIG_X86_DEBUGCTLMSR=y
CONFIG_PNP_DEBUG_MESSAGES=y
CONFIG_AIC7XXX_DEBUG_ENABLE=y
CONFIG_AIC7XXX_DEBUG_MASK=0
CONFIG_AIC79XX_DEBUG_ENABLE=y
CONFIG_AIC79XX_DEBUG_MASK=0
CONFIG_OCFS2_DEBUG_MASKLOG=y
CONFIG_DEBUG_FS=y
CONFIG_DEBUG_KERNEL=y
CONFIG_SCHED_DEBUG=y
CONFIG_DEBUG_BUGVERBOSE=y
CONFIG_DEBUG_MEMORY_INIT=y
CONFIG_DEBUG_RODATA=y
CONFIG_KEYS_DEBUG_PROC_KEYS=y
With both configs I'm seeing the SAME symptoms after a while.
Which options should I disable?
Stefan
^ permalink raw reply [flat|nested] 30+ messages in thread
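Editorial illustration: the two option lists in the message above can be compared mechanically. The sketch below is a hypothetical helper, not something used in the thread. The names enabled_options, only_in_first and is_enabled are made up for illustration, and the two config strings are abbreviated stand-ins for the real .config files.

```python
def enabled_options(config_text):
    """Map option name to value for every non-comment NAME=value line."""
    options = {}
    for line in config_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if sep:
            options[name] = value
    return options


def only_in_first(first_text, second_text):
    """Options set in the first config but absent from the second."""
    return sorted(set(enabled_options(first_text)) - set(enabled_options(second_text)))


def is_enabled(config_text, name):
    return enabled_options(config_text).get(name) == "y"


# Abbreviated samples based on the two lists quoted in the message above.
CURRENT = """\
CONFIG_SLUB_DEBUG=y
CONFIG_DEBUG_KERNEL=y
CONFIG_DEBUG_SHIRQ=y
CONFIG_DEBUG_SPINLOCK=y
CONFIG_DEBUG_MUTEXES=y
CONFIG_DEBUG_LOCK_ALLOC=y
# CONFIG_DEBUG_OBJECTS is not set
"""
ORIGINAL = """\
CONFIG_DEBUG_KERNEL=y
"""

added_for_debugging = only_in_first(CURRENT, ORIGINAL)
print(added_for_debugging)
print("PROVE_LOCKING enabled:", is_enabled(CURRENT, "CONFIG_PROVE_LOCKING"))
```

One detail worth noting: a grep for 'DEBUG' cannot show CONFIG_PROVE_LOCKING at all, because that option name does not contain the string "DEBUG". Checking it by exact name, as is_enabled does, avoids that blind spot.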
* Re: xfs deadlock in stable kernel 3.0.4
2011-09-14 14:30 ` Christoph Hellwig
2011-09-14 16:06 ` Stefan Priebe - Profihost AG
@ 2011-09-18 9:14 ` Stefan Priebe - Profihost AG
2011-09-18 20:04 ` Christoph Hellwig
2011-09-18 23:02 ` Dave Chinner
1 sibling, 2 replies; 30+ messages in thread
From: Stefan Priebe - Profihost AG @ 2011-09-18 9:14 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: xfs-masters@oss.sgi.com, xfs@oss.sgi.com, aelder
Hi,
At least I'm now able to reproduce the issue. I hope this will help to
investigate it, and hopefully you can reproduce it as well.
I'm using a vanilla 3.0.4 kernel with XFS as the root filesystem and had
hung task detection set to 120s. You'll then see that the bonnie++ command
gets stuck in xlog_grant_log_space while creating or deleting files. I
was using an SSD or a fast RAID 10 (24x SAS disks). I was not able to
reproduce it on normal SATA disks; even a 20x SATA RAID 10 didn't trigger it.
I used bonnie++ (V 1.96) to reproduce it. The bug is mostly triggered in
the first run; sometimes I needed two runs.
bonnie++ -u root -s 0 -n 1024:32768:0:1024:4096 -d /
I hope that helps, as I now have a testing machine and can trigger the
bug pretty fast (10-30 min instead of hours). I can also add debug code
if you want or have some.
Stefan
^ permalink raw reply [flat|nested] 30+ messages in thread
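Editorial illustration: a rough reading of the -n argument in the reproducer above. This assumes the field order number:max:min:num-directories:chunk-size, with the file count given in multiples of 1024, as recalled from the bonnie++ man page; please check the man page of your version before relying on it. The names FileTestSpec and parse_n_option are hypothetical.

```python
from typing import NamedTuple


class FileTestSpec(NamedTuple):
    files: int        # total number of files created
    max_size: int     # largest file size in bytes
    min_size: int     # smallest file size in bytes
    directories: int  # sub-directories the files are spread over
    chunk_size: int   # I/O chunk size in bytes


def parse_n_option(spec: str) -> FileTestSpec:
    """Parse 'number:max:min:dirs:chunk'; missing trailing fields default to 0."""
    fields = [int(part) for part in spec.split(":")]
    fields += [0] * (5 - len(fields))
    number, max_size, min_size, directories, chunk_size = fields
    # Assumption: 'number' counts units of 1024 files.
    return FileTestSpec(number * 1024, max_size, min_size, directories, chunk_size)


spec = parse_n_option("1024:32768:0:1024:4096")
files_per_directory = spec.files // max(spec.directories, 1)
print(f"{spec.files} files of {spec.min_size}..{spec.max_size} bytes, "
      f"about {files_per_directory} per directory")
```

Under that reading, the reproducer creates and deletes on the order of a million small files, which would make it a metadata-heavy workload with many log reservations.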
* Re: xfs deadlock in stable kernel 3.0.4
2011-09-18 9:14 ` Stefan Priebe - Profihost AG
@ 2011-09-18 20:04 ` Christoph Hellwig
2011-09-19 10:54 ` Stefan Priebe - Profihost AG
2011-09-18 23:02 ` Dave Chinner
1 sibling, 1 reply; 30+ messages in thread
From: Christoph Hellwig @ 2011-09-18 20:04 UTC (permalink / raw)
To: Stefan Priebe - Profihost AG
Cc: Christoph Hellwig, xfs-masters@oss.sgi.com, aelder,
xfs@oss.sgi.com
On Sun, Sep 18, 2011 at 11:14:08AM +0200, Stefan Priebe - Profihost AG wrote:
> Hi,
>
> At least I'm now able to reproduce the issue. I hope this will help
> to investigate it, and hopefully you can reproduce it as well.
>
> I'm using a vanilla 3.0.4 kernel with XFS as the root filesystem and
> had hung task detection set to 120s. You'll then see that the bonnie++
> command gets stuck in xlog_grant_log_space while creating or
> deleting files. I was using an SSD or a fast RAID 10 (24x SAS disks).
> I was not able to reproduce it on normal SATA disks; even a 20x
> SATA RAID 10 didn't trigger it.
Thanks a lot for the reproducer!
I've tried it on my laptop SSD and that didn't reproduce it yet. I'll
try it on Monday on a real high-end setup.
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: xfs deadlock in stable kernel 3.0.4
2011-09-18 9:14 ` Stefan Priebe - Profihost AG
2011-09-18 20:04 ` Christoph Hellwig
@ 2011-09-18 23:02 ` Dave Chinner
2011-09-20 0:47 ` Stefan Priebe
2011-09-20 10:09 ` Stefan Priebe - Profihost AG
1 sibling, 2 replies; 30+ messages in thread
From: Dave Chinner @ 2011-09-18 23:02 UTC (permalink / raw)
To: Stefan Priebe - Profihost AG
Cc: Christoph Hellwig, xfs-masters@oss.sgi.com, xfs@oss.sgi.com,
aelder
On Sun, Sep 18, 2011 at 11:14:08AM +0200, Stefan Priebe - Profihost AG wrote:
> Hi,
>
> at least I'm now able to reproduce the issue. I hope this will help
> to investigate it, and hopefully you can reproduce it as well.
>
> I'm using a vanilla 3.0.4 kernel with xfs as the root filesystem and
> had hung task detection enabled with a 120s timeout. You'll then see
> that the bonnie++ command gets stuck in xlog_grant_log_space while
> creating or deleting files. I was using an SSD or a fast RAID 10
> (24x SAS disks). I was not able to reproduce it on normal SATA disks;
> even a 20x SATA RAID 10 didn't trigger it.
>
> I used bonnie++ (V 1.96) to reproduce it. Mostly the bug is triggered
> in the 1st run; sometimes I needed two runs.
>
> bonnie++ -u root -s 0 -n 1024:32768:0:1024:4096 -d /
>
> I hope that helps, as I now have a testing machine and can trigger
> the bug pretty fast (10-30min instead of hours). I can also add
> debug code if you want or have some.
If it is a log space accounting issue, then the output of 'xfs_info
<mtpt>' is really necessary to set the filesystem up the same way
(e.g. same log size, number of AGs, etc) so that it behaves the same
way on different test machines....
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: xfs deadlock in stable kernel 3.0.4
2011-09-18 20:04 ` Christoph Hellwig
@ 2011-09-19 10:54 ` Stefan Priebe - Profihost AG
0 siblings, 0 replies; 30+ messages in thread
From: Stefan Priebe - Profihost AG @ 2011-09-19 10:54 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: xfs-masters@oss.sgi.com, aelder, xfs@oss.sgi.com
On 18.09.2011 22:04, Christoph Hellwig wrote:
> On Sun, Sep 18, 2011 at 11:14:08AM +0200, Stefan Priebe - Profihost AG wrote:
>> Hi,
>>
>> at least I'm now able to reproduce the issue. I hope this will help
>> to investigate it, and hopefully you can reproduce it as well.
>>
>> I'm using a vanilla 3.0.4 kernel with xfs as the root filesystem and
>> had hung task detection enabled with a 120s timeout. You'll then see
>> that the bonnie++ command gets stuck in xlog_grant_log_space while
>> creating or deleting files. I was using an SSD or a fast RAID 10
>> (24x SAS disks). I was not able to reproduce it on normal SATA disks;
>> even a 20x SATA RAID 10 didn't trigger it.
>
> Thanks a lot for the reproducer!
>
> I've tried it on my laptop SSD and that didn't reproduce it yet. I'll
> try it on Monday on a real high-end setup.
Sadly my SSD bricked tonight during heavy testing ;-( I was not able
to reproduce it on every partition, only on some. Sadly I was not able
to find the common factor that causes this.
I now have to set up a new machine and try to reproduce it again.
What I have so far is that bonnie++ always hangs here:
[] ? radix_tree_gang_lookup_slot+0x6a/0x8d
[] ? xfs_bmap_search_extents+0x56/0xb9
[] ? find_get_pages+0x39/0xd8
[] xlog_wait+0x58/0x70
[] ? try_to_wake_up+0x1c6/0x1c6
[] ? xlog_grant_push_ail+0xb7/0xbf
[] xlog_grant_log_space+0x162/0x2b1
[] xfs_log_reserve+0xbb/0xc4
[] xfs_trans_reserve+0xd6/0x1b1
[] xfs_free_eofblocks+0x16b/0x1fb
[] xfs_release+0x1c7/0x202
[] xfs_file_release+0x10/0x14
[] fput+0xfd/0x1eb
[] filp_close+0x6d/0x78
[] sys_close+0x9a/0xd4
[] system_call_fastpath+0x16/0x1b
With the traces we had in the past it was difficult to tell which process
was causing the lockup. So it doesn't seem to be xlog_grant_log_space
itself; is it rather xfs_bmap_search_extents or
radix_tree_gang_lookup_slot?
Stefan
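A note on reading the trace above: in kernel stack dumps, entries marked
with "?" are addresses that were found on the stack but could not be
confirmed as part of the current call chain, so they are often leftovers
from earlier calls. The sketch below separates the two kinds of entries. It
assumes the trace format shown in this message (the addresses inside the
brackets are missing here), and split_frames is a hypothetical helper name.

```python
import re

TRACE = """\
[] ? radix_tree_gang_lookup_slot+0x6a/0x8d
[] ? xfs_bmap_search_extents+0x56/0xb9
[] ? find_get_pages+0x39/0xd8
[] xlog_wait+0x58/0x70
[] ? try_to_wake_up+0x1c6/0x1c6
[] ? xlog_grant_push_ail+0xb7/0xbf
[] xlog_grant_log_space+0x162/0x2b1
[] xfs_log_reserve+0xbb/0xc4
[] xfs_trans_reserve+0xd6/0x1b1
[] xfs_free_eofblocks+0x16b/0x1fb
[] xfs_release+0x1c7/0x202
[] xfs_file_release+0x10/0x14
[] fput+0xfd/0x1eb
[] filp_close+0x6d/0x78
[] sys_close+0x9a/0xd4
[] system_call_fastpath+0x16/0x1b
"""

# "[addr] [? ]symbol+0xoffset/0xsize"; the address may be empty as above.
FRAME = re.compile(r"^\[[^\]]*\]\s+(\?\s+)?(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+$")


def split_frames(trace):
    """Return (reliable, unreliable) symbol lists, innermost frame first."""
    reliable, unreliable = [], []
    for line in trace.splitlines():
        match = FRAME.match(line.strip())
        if not match:
            continue  # skip anything that is not a stack frame line
        target = unreliable if match.group(1) else reliable
        target.append(match.group(2))
    return reliable, unreliable


reliable, unreliable = split_frames(TRACE)
print("call chain:", " <- ".join(reliable))
print("unverified:", ", ".join(unreliable))
```

Read this way, the confirmed chain runs from sys_close down to xlog_wait
via xlog_grant_log_space, while xfs_bmap_search_extents and
radix_tree_gang_lookup_slot are among the unverified entries.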
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: xfs deadlock in stable kernel 3.0.4
2011-09-18 23:02 ` Dave Chinner
@ 2011-09-20 0:47 ` Stefan Priebe
2011-09-20 1:01 ` Stefan Priebe
2011-09-20 10:09 ` Stefan Priebe - Profihost AG
1 sibling, 1 reply; 30+ messages in thread
From: Stefan Priebe @ 2011-09-20 0:47 UTC (permalink / raw)
To: Dave Chinner
Cc: Christoph Hellwig, xfs-masters@oss.sgi.com, xfs@oss.sgi.com,
aelder@sgi.com
On 19.09.2011 at 01:02, Dave Chinner <david@fromorbit.com> wrote:
> On Sun, Sep 18, 2011 at 11:14:08AM +0200, Stefan Priebe - Profihost AG wrote:
>> Hi,
>>
>> at least I'm now able to reproduce the issue. I hope this will help
>> to investigate it, and hopefully you can reproduce it as well.
>>
>> I'm using a vanilla 3.0.4 kernel with xfs as the root filesystem and
>> had hung task detection enabled with a 120s timeout. You'll then see
>> that the bonnie++ command gets stuck in xlog_grant_log_space while
>> creating or deleting files. I was using an SSD or a fast RAID 10
>> (24x SAS disks). I was not able to reproduce it on normal SATA disks;
>> even a 20x SATA RAID 10 didn't trigger it.
>>
>> I used bonnie++ (V 1.96) to reproduce it. Mostly the bug is triggered
>> in the 1st run; sometimes I needed two runs.
>>
>> bonnie++ -u root -s 0 -n 1024:32768:0:1024:4096 -d /
>>
>> I hope that helps, as I now have a testing machine and can trigger
>> the bug pretty fast (10-30min instead of hours). I can also add
>> debug code if you want or have some.
>
> If it is a log space accounting issue, then the output of 'xfs_info
> <mtpt>' is really necessary to set the filesystem up the same way
> (e.g. same log size, number of AGs, etc) so that it behaves the same
> way on different
I can't figure it out. It only works on some partitions and not on the
others, even though xfs_info shows the same output for them. I also have
one partition where it only happens when it is mounted as root (/). When I
mount it as /mnt it does not happen ;-(
Any idea on how to proceed now?
Stefan
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: xfs deadlock in stable kernel 3.0.4
2011-09-20 0:47 ` Stefan Priebe
@ 2011-09-20 1:01 ` Stefan Priebe
0 siblings, 0 replies; 30+ messages in thread
From: Stefan Priebe @ 2011-09-20 1:01 UTC (permalink / raw)
To: Stefan Priebe
Cc: Christoph Hellwig, xfs-masters@oss.sgi.com, xfs@oss.sgi.com,
aelder@sgi.com
On 20.09.2011 at 02:47, Stefan Priebe <s.priebe@profihost.ag> wrote:
> I can't figure it out. It only works on some partitions and not on the others.
By "works" I mean reproducing it with bonnie++.
So I can still reproduce it very quickly, but I don't know how to create a test case.
Stefan
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: xfs deadlock in stable kernel 3.0.4
2011-09-18 23:02 ` Dave Chinner
2011-09-20 0:47 ` Stefan Priebe
@ 2011-09-20 10:09 ` Stefan Priebe - Profihost AG
2011-09-20 16:02 ` Christoph Hellwig
1 sibling, 1 reply; 30+ messages in thread
From: Stefan Priebe - Profihost AG @ 2011-09-20 10:09 UTC (permalink / raw)
To: Dave Chinner
Cc: Christoph Hellwig, xfs-masters@oss.sgi.com, xfs@oss.sgi.com,
aelder
Hi,
any idea how to get deeper into this? I've tried using kgdb, but
strangely the error does not occur when kgdb is attached remotely. When I
detach kgdb and restart bonnie++ the error happens again.
So it seems to me a little bit like a timing issue?
Stefan
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: xfs deadlock in stable kernel 3.0.4
2011-09-20 10:09 ` Stefan Priebe - Profihost AG
@ 2011-09-20 16:02 ` Christoph Hellwig
2011-09-20 17:23 ` Stefan Priebe - Profihost AG
0 siblings, 1 reply; 30+ messages in thread
From: Christoph Hellwig @ 2011-09-20 16:02 UTC (permalink / raw)
To: Stefan Priebe - Profihost AG
Cc: Christoph Hellwig, xfs-masters@oss.sgi.com, xfs@oss.sgi.com,
aelder
On Tue, Sep 20, 2011 at 12:09:34PM +0200, Stefan Priebe - Profihost AG wrote:
> Hi,
>
> any idea how to get deeper into this? I've tried using kgdb, but
> strangely the error does not occur when kgdb is attached remotely.
> When I detach kgdb and restart bonnie++ the error happens again.
>
> So it seems to me a little bit like a timing issue?
Sounds like it.
Can you summarize all the data that we gathered over this thread into one
summary, e.g.
- which kernels does it happen on? Seems like 3.0 and 3.1 hit it easily,
2.6.38 sometimes, 2.6.32 is fine. Did you test anything between
2.6.32 and 2.6.38?
- what hardware hits it often/sometimes/never?
- what is the fs geometry?
- what is the hardware?
- is this a 32 or 64-bit kernel, or do you run both?
I'm pretty sure most of it got posted somewhere, but let's get a summary
as things were a bit confusing sometimes.
Note that 2.6.38 moved the whole log grant code to a lockless algorithm,
so this might be a likely culprit if you're managing to hit race windows
no one else does, i.e. this really is a timing issue.
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: xfs deadlock in stable kernel 3.0.4
2011-09-20 16:02 ` Christoph Hellwig
@ 2011-09-20 17:23 ` Stefan Priebe - Profihost AG
2011-09-20 17:24 ` Christoph Hellwig
0 siblings, 1 reply; 30+ messages in thread
From: Stefan Priebe - Profihost AG @ 2011-09-20 17:23 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: xfs-masters@oss.sgi.com, xfs@oss.sgi.com, aelder
> Can you summarize all the data that we gathered over this thread into one
> summary, e.g.
Yes - hope it helps.
> - which kernels does it happen on? Seems like 3.0 and 3.1 hit it easily,
> 2.6.38 sometimes, 2.6.32 is fine. Did you test anything between
> 2.6.32 and 2.6.38?
Hits very easily: 3.0.4 and 3.1-rc5
Very rare: 2.6.38 - as it happened only sometimes, I cannot 100%
guarantee that it is really the same issue
No issues at all: 2.6.32
I've not tested anything between 2.6.32 and 2.6.38, as I cannot reproduce
it on demand under 2.6.38 (seen once a week out of 500).
> - what hardware hits it often/sometimes/never?
I've seen this only on multi-core CPUs with > 2.8GHz and fast SAS RAID
10 or SSD. I cannot say if it's the CPU or the fast disks, as our low
cost systems have only small CPUs and the high-end ones have big CPUs
with fast disks.
> - what is the fs geometry?
What exactly do you mean? I've seen this on 1TB and 160GB SSD devices
with totally different disk layouts.
> - what is the hardware?
see above
> - is this a 32 or 64-bit kernel, or do you run both?
always 64-bit
> I'm pretty sure most of it got posted somewhere, but let's get a summary
> as things were a bit confusing sometimes.
no problem
> Note that 2.6.38 moved the whole log grant code to a lockless algorithm,
> so this might be a likely culprit if you're managing to hit race windows
> no one else does, i.e. this really is a timing issue.
I'm nearly willing to do anything to solve this. What can I do to help?
My last hope today was to get some code lines with kgdb; sadly the bug
does not happen at all when kgdb is attached ;-(
Stefan
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: xfs deadlock in stable kernel 3.0.4
2011-09-20 17:23 ` Stefan Priebe - Profihost AG
@ 2011-09-20 17:24 ` Christoph Hellwig
2011-09-20 17:35 ` Stefan Priebe - Profihost AG
0 siblings, 1 reply; 30+ messages in thread
From: Christoph Hellwig @ 2011-09-20 17:24 UTC (permalink / raw)
To: Stefan Priebe - Profihost AG
Cc: Christoph Hellwig, xfs-masters@oss.sgi.com, xfs@oss.sgi.com,
aelder
On Tue, Sep 20, 2011 at 07:23:00PM +0200, Stefan Priebe - Profihost AG wrote:
> > - what is the fs geometry?
> What exactly do you mean? I've seen this on 1TB and 160GB SSD
> devices with totally different disk layouts.
The output of mkfs.xfs (or of xfs_info after it's been created)
> >Note that 2.6.38 moved the whole log grant code to a lockless algorithm,
> >so this might be a likely culprit if you're managing to hit race windows
> >no one else does, i.e. this really is a timing issue.
> I'm nearly willing to do anything to solve this. What can I do to
> help? My last hope today was to get some code lines with kgdb;
> sadly the bug does not happen at all when kgdb is attached ;-(
I'll run tests on a system with a pci-e flash device today. Just to
make sure we are on the same page, can you give me your kernel .config
in addition to the mkfs output above?
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: xfs deadlock in stable kernel 3.0.4
2011-09-20 17:24 ` Christoph Hellwig
@ 2011-09-20 17:35 ` Stefan Priebe - Profihost AG
2011-09-20 22:30 ` Christoph Hellwig
0 siblings, 1 reply; 30+ messages in thread
From: Stefan Priebe - Profihost AG @ 2011-09-20 17:35 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: xfs-masters@oss.sgi.com, xfs@oss.sgi.com, aelder
On 20.09.2011 19:24, Christoph Hellwig wrote:
> On Tue, Sep 20, 2011 at 07:23:00PM +0200, Stefan Priebe - Profihost AG wrote:
>>> - what is the fs geometry?
>> What exactly do you mean? I've seen this on 1TB and 160GB SSD
>> devices with totally different disk layouts.
>
> The output of mkfs.xfs (or of xfs_info after it's been created)
ssd:~# xfs_info /dev/sda3
meta-data=/dev/root              isize=256    agcount=4, agsize=9517888 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=38071552, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal               bsize=4096   blocks=18589, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
> I'll run tests on a system with a pci-e flash device today. Just to
> make sure we are on the same page, can you give me your kernel .config
> in addition to the mkfs output above?
OK, I hope you can reproduce it as well.
.config
http://pastebin.com/raw.php?i=m8AAFJ1B
I also found out that I was not able to reproduce it on a freshly
created xfs partition. I needed to copy a bunch of files, delete some,
create some new ones and then start the test. I just duplicated the root
filesystem multiple times and then deleted some files, created some
hardlinks, whatever...
Stefan
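Since the suspicion in this thread is log space accounting, it may help to
turn the xfs_info output above into sizes. The following is a rough sketch,
not a robust parser: it assumes the output layout shown in this message and
only keeps numeric key=value pairs. parse_xfs_info is a hypothetical helper
name.

```python
import re

XFS_INFO = """\
meta-data=/dev/root isize=256 agcount=4, agsize=9517888 blks
= sectsz=512 attr=2
data = bsize=4096 blocks=38071552, imaxpct=25
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0
log =internal bsize=4096 blocks=18589, version=2
= sectsz=512 sunit=0 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
"""

NUMERIC_PAIR = re.compile(r"([\w-]+)=(\d+)")


def parse_xfs_info(text):
    """Group numeric key=value pairs by section (meta-data, data, log, ...)."""
    sections, current = {}, None
    for line in text.splitlines():
        name = line.split("=", 1)[0].strip()
        if name:  # a non-empty label starts a new section
            current = sections.setdefault(name, {})
        if current is None:
            continue  # continuation line before any section label
        for key, value in NUMERIC_PAIR.findall(line):
            current[key] = int(value)
    return sections


info = parse_xfs_info(XFS_INFO)
log_bytes = info["log"]["blocks"] * info["log"]["bsize"]
data_bytes = info["data"]["blocks"] * info["data"]["bsize"]
print(f"log:  {log_bytes} bytes ({log_bytes / 2**20:.1f} MiB)")
print(f"data: {data_bytes} bytes ({data_bytes / 2**30:.1f} GiB)")
```

For this filesystem that gives an internal log of 18589 * 4096 = 76140544
bytes (about 72.6 MiB) on a data section of about 145.2 GiB, and
agcount * agsize equals the data block count.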
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: xfs deadlock in stable kernel 3.0.4
2011-09-20 17:35 ` Stefan Priebe - Profihost AG
@ 2011-09-20 22:30 ` Christoph Hellwig
2011-09-21 7:36 ` Stefan Priebe - Profihost AG
0 siblings, 1 reply; 30+ messages in thread
From: Christoph Hellwig @ 2011-09-20 22:30 UTC (permalink / raw)
To: Stefan Priebe - Profihost AG
Cc: Christoph Hellwig, xfs-masters@oss.sgi.com, aelder,
xfs@oss.sgi.com
On Tue, Sep 20, 2011 at 07:35:57PM +0200, Stefan Priebe - Profihost AG wrote:
> On 20.09.2011 19:24, Christoph Hellwig wrote:
> >On Tue, Sep 20, 2011 at 07:23:00PM +0200, Stefan Priebe - Profihost AG wrote:
> >>> - what is the fs geometry?
> >>What exactly do you mean? I've seen this on 1TB and 160GB SSD
> >>devices with totally different disk layouts.
> >
> >The output of mkfs.xfs (or of xfs_info after it's been created)
>
> ssd:~# xfs_info /dev/sda3
> meta-data=/dev/root              isize=256    agcount=4, agsize=9517888 blks
>          =                       sectsz=512   attr=2
> data     =                       bsize=4096   blocks=38071552, imaxpct=25
>          =                       sunit=0      swidth=0 blks
> naming   =version 2              bsize=4096   ascii-ci=0
> log      =internal               bsize=4096   blocks=18589, version=2
>          =                       sectsz=512   sunit=0 blks, lazy-count=1
> realtime =none                   extsz=4096   blocks=0, rtextents=0
Nothing special there.
So far I haven't been able to recreate it. How many runs did you
normally need on 3.1-rc? Note that so far I've run my known working
kernel, I'll test your config plus the drivers I need next.
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: xfs deadlock in stable kernel 3.0.4
2011-09-20 22:30 ` Christoph Hellwig
@ 2011-09-21 7:36 ` Stefan Priebe - Profihost AG
2011-09-21 11:39 ` Christoph Hellwig
0 siblings, 1 reply; 30+ messages in thread
From: Stefan Priebe - Profihost AG @ 2011-09-21 7:36 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: xfs-masters@oss.sgi.com, aelder, xfs@oss.sgi.com
On 21.09.2011 00:30, Christoph Hellwig wrote:
> On Tue, Sep 20, 2011 at 07:35:57PM +0200, Stefan Priebe - Profihost AG wrote:
>> On 20.09.2011 19:24, Christoph Hellwig wrote:
>>> On Tue, Sep 20, 2011 at 07:23:00PM +0200, Stefan Priebe - Profihost AG wrote:
>>>>> - what is the fs geometry?
>>>> What exactly do you mean? I've seen this on 1TB and 160GB SSD
>>>> devices with totally different disk layouts.
>>>
>>> The output of mkfs.xfs (or of xfs_info after it's been created)
>>
>> ssd:~# xfs_info /dev/sda3
>> meta-data=/dev/root              isize=256    agcount=4, agsize=9517888 blks
>>          =                       sectsz=512   attr=2
>> data     =                       bsize=4096   blocks=38071552, imaxpct=25
>>          =                       sunit=0      swidth=0 blks
>> naming   =version 2              bsize=4096   ascii-ci=0
>> log      =internal               bsize=4096   blocks=18589, version=2
>>          =                       sectsz=512   sunit=0 blks, lazy-count=1
>> realtime =none                   extsz=4096   blocks=0, rtextents=0
>
> Nothing special there.
>
> So far I haven't been able to recreate it. How many runs did you
> normally need on 3.1-rc? Note that so far I've run my known working
> kernel, I'll test your config plus the drivers I need next.
I had only used 3.0.4 with bonnie++ to reproduce it. 3.1-rc was running on
a production system.
Sadly I'm also not able to reproduce it reliably on every partition.
Sometimes it works, sometimes not. Just retrying does not help. I had to
copy and delete random files on the partition and then start bonnie++ on
it. Perhaps I can give you a dd dump of the partition, but I'd have to
recreate one. My Intel SSD is now massively slower than when I started
the tests. No idea why.
Stefan
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: xfs deadlock in stable kernel 3.0.4
2011-09-21 7:36 ` Stefan Priebe - Profihost AG
@ 2011-09-21 11:39 ` Christoph Hellwig
2011-09-21 13:39 ` Stefan Priebe
0 siblings, 1 reply; 30+ messages in thread
From: Christoph Hellwig @ 2011-09-21 11:39 UTC (permalink / raw)
To: Stefan Priebe - Profihost AG
Cc: Christoph Hellwig, xfs-masters@oss.sgi.com, xfs@oss.sgi.com,
aelder
On Wed, Sep 21, 2011 at 09:36:42AM +0200, Stefan Priebe - Profihost AG wrote:
> >So far I haven't been able to recreate it. How many runs did you
> >normally need on 3.1-rc? Note that so far I've run my known working
> >kernel, I'll test your config plus the drivers I need next.
>
> I had only used 3.0.4 with bonnie++ to reproduce it. 3.1-rc was running
> on a production system.
>
> Sadly I'm also not able to reproduce it reliably on every partition.
> Sometimes it works, sometimes not. Just retrying does not help. I had
> to copy and delete random files on the partition and then start
> bonnie++ on it. Perhaps I can give you a dd dump of the partition, but
> I'd have to recreate one. My Intel SSD is now massively slower than
> when I started the tests. No idea why.
So far it runs fine on 3.1-rc both with my default config and yours;
the latter had been running all night. This is on an 8-core Nehalem
with 8GB of memory, and a fast PCI-e flash device.
One thing I noticed is that your config seems to run many fs tasks
a lot slower than mine, but I'm not entirely sure why.
The only interesting things I noticed in your config were that you
use slub instead of slab, which does a lot of high order allocations
and has caused lots of trouble in the past, and that you enable
CONFIG_CC_OPTIMIZE_FOR_SIZE, which has caused mis-compilation
of complicated code in the past. I don't want to blame it directly,
but I could see how that causes problems with some of the atomic64_t
games XFS plays since 2.6.38.
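As an aside for anyone comparing configs, the two options mentioned above
can be checked mechanically. This is only a sketch: SAMPLE is a made-up
fragment for illustration and not the contents of the posted .config, and
enabled_options is a hypothetical helper name.

```python
OPTIONS_OF_INTEREST = (
    "CONFIG_SLUB",
    "CONFIG_SLAB",
    "CONFIG_CC_OPTIMIZE_FOR_SIZE",
)


def enabled_options(config_text, names=OPTIONS_OF_INTEREST):
    """Return the subset of names that are set to 'y' in a kernel .config."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        # Disabled options appear as comments: "# CONFIG_FOO is not set".
        if line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key in names and value == "y":
            enabled.add(key)
    return enabled


# Hypothetical fragment for illustration only.
SAMPLE = """\
CONFIG_SLUB=y
# CONFIG_SLAB is not set
CONFIG_CC_OPTIMIZE_FOR_SIZE=y
CONFIG_XFS_FS=y
"""

print(sorted(enabled_options(SAMPLE)))
```

To use it on a real file, read the .config into a string and pass that
instead of SAMPLE.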
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: xfs deadlock in stable kernel 3.0.4
2011-09-21 11:39 ` Christoph Hellwig
@ 2011-09-21 13:39 ` Stefan Priebe
2011-09-21 14:17 ` Christoph Hellwig
0 siblings, 1 reply; 30+ messages in thread
From: Stefan Priebe @ 2011-09-21 13:39 UTC (permalink / raw)
To: Christoph Hellwig
Cc: xfs-masters@oss.sgi.com, xfs@oss.sgi.com, aelder@sgi.com
> One thing I noticed is that your config seems to run many fs tasks
> a lot slower than mine, but I'm not entirely sure why.
Strange. Would you post your config too?
> The only interesting things I noticed in your config were that you
> use slub instead of slab, which does a lot of high order allocations
> and has caused lots of trouble in the past, and that you enable
> CONFIG_CC_OPTIMIZE_FOR_SIZE, which has caused mis-compilation
> of complicated code in the past. I don't want to blame it directly,
> but I could see how that causes problems with some of the atomic64_t
> games XFS plays since 2.6.38.
Will remove it.
At least Dave was able to reproduce it, so he can probably help too.
Thanks!
Stefan
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: xfs deadlock in stable kernel 3.0.4
2011-09-21 13:39 ` Stefan Priebe
@ 2011-09-21 14:17 ` Christoph Hellwig
0 siblings, 0 replies; 30+ messages in thread
From: Christoph Hellwig @ 2011-09-21 14:17 UTC (permalink / raw)
To: Stefan Priebe; +Cc: aelder@sgi.com, xfs@oss.sgi.com
[-- Attachment #1: Type: text/plain, Size: 249 bytes --]
On Wed, Sep 21, 2011 at 03:39:16PM +0200, Stefan Priebe wrote:
>
> > One thing I noticed is that your config seems to run many fs tasks
> > a lot slower than mine, but I'm not entirely sure why.
> Strange. Would you post your config too?
Attached.
[-- Attachment #2: config.2.6.40.bz2 --]
[-- Type: application/x-bzip2, Size: 26558 bytes --]
^ permalink raw reply [flat|nested] 30+ messages in thread