* XFS handling of synchronous buffers in case of EIO error
@ 2010-12-30 12:28 Ajeet Yadav
2010-12-30 23:13 ` Dave Chinner
From: Ajeet Yadav @ 2010-12-30 12:28 UTC (permalink / raw)
To: xfs
Kernel: 2.6.30.9
I am troubleshooting a hang in XFS during umount.
Test scenario: copy a large number of files using the script below, and
remove the USB device after 3-5 seconds.
index=0
while [ "$?" == 0 ]
do
    index=$((index+1))
    sync
    cp $1/1KB.txt $2/"$index".test
done
In a rare scenario during USB unplug, the umount process hangs at xfs_buf_lock.
The log below shows the hung process.
We have added printk calls to the buffer handling functions xfs_buf_iodone_callbacks(),
xfs_buf_error_relse(), xfs_buf_relse() and xfs_buf_rele().
We have observed that the hang only occurs when bp->b_relse =
xfs_buf_error_relse, i.e. when xfs_buf_iodone_callbacks() executes
XFS_BUF_SET_BRELSE_FUNC(bp, xfs_buf_error_relse);
XFS_BUF_DONE(bp);
XFS_BUF_FINISH_IOWAIT(bp);
but it is never called by xfs_buf_relse() because b_hold = 3.
Also, we have seen that this problem always occurs when bp->b_relse != NULL &&
bp->b_hold > 1.
I do not know whether the prints below will help you, but I have added printk
tracing for the superblock buffer:
S-functionname (start of function)
E-functionname (end of function)
I think the problem is related to the xfs_buf_iodone_callbacks() synchronous
buffer error handling path and the buffer release.
------------------------------------------------
buf_lock (Here buffer lock success, taken after down())
S-xfs_buf_rele - 3 (Start of xfs_buf_rele(), b_hold = 3)
Call Trace:
[<8032c1bc>] dump_stack+0x8/0x34 from[<801cab2c>] xfs_buf_rele+0xc8/0x28c
[<801cab2c>] xfs_buf_rele+0xc8/0x28c from[<801cad9c>]
xfs_buf_delwri_dequeue+0xac/0x13c
[<801cad9c>] xfs_buf_delwri_dequeue+0xac/0x13c from[<801cb798>]
xfs_bwrite+0x5c/0x128
[<801cb798>] xfs_bwrite+0x5c/0x128 from[<801d5b48>]
xfs_sync_fsdata+0xbc/0x17c
[<801d5b48>] xfs_sync_fsdata+0xbc/0x17c from[<801d690c>]
xfs_quiesce_data+0x34/0x68
[<801d690c>] xfs_quiesce_data+0x34/0x68 from[<801d2b34>]
xfs_fs_sync_fs+0x30/0xec
[<801d2b34>] xfs_fs_sync_fs+0x30/0xec<7>hub 2-0:1.0: state 7 ports 1 chg
0000 evt 0002
usb 2-1: USB disconnect, address 3
from[<800b7ff0>] sync_filesystems+0x118/0x19c
[<800b7ff0>] sync_filesystems+0x118/0x19c from[<800db490>] do_sync+0x38/0x7c
[<800db490>] do_sync+0x38/0x7c from[<800db510>] sys_sync+0x10/0x20
[<800db510>] sys_sync+0x10/0x20 from[<8000ff44>] stack_done+0x20/0x3c
E-xfs_buf_rele (End of function)
S-xfs_bdstrat_cb
S-xfs_buf_rele - 3
S-xfs_buf_iodone_callbacks
Device sda2, XFS metadata write error block 0x0 in sda2
E-xfs_buf_iodone_callbacks
Call Trace:
[<8032c1bc>] dump_stack+0x8/0x34 from[<801cab2c>] xfs_buf_rele+0xc8/0x28c
[<801cab2c>] xfs_buf_rele+0xc8/0x28c from[<801cb2fc>]
xfs_buf_iorequest+0xe8/0x104
[<801cb2fc>] xfs_buf_iorequest+0xe8/0x104 from[<801cbd2c>]
xfs_bdstrat_cb+0x140/0x178
[<801cbd2c>] xfs_bdstrat_cb+0x140/0x178 from[<801cb7ac>]
xfs_bwrite+0x70/0x128
[<801cb7ac>] xfs_bwrite+0x70/0x128 from[<801d5b48>]
xfs_sync_fsdata+0xbc/0x17c
[<801d5b48>] xfs_sync_fsdata+0xbc/0x17c from[<801d690c>]
xfs_quiesce_data+0x34/0x68
[<801d690c>] xfs_quiesce_data+0x34/0x68 from[<801d2b34>]
xfs_fs_sync_fs+0x30/0xec
[<801d2b34>] xfs_fs_sync_fs+0x30/0xec from[<800b7ff0>]
sync_filesystems+0x118/0x19c
[<800b7ff0>] sync_filesystems+0x118/0x19c from[<800db490>] do_sync+0x38/0x7c
[<800db490>] do_sync+0x38/0x7c from[<800db510>] sys_sync+0x10/0x20
[<800db510>] sys_sync+0x10/0x20 from[<8000ff44>] stack_done+0x20/0x3c
E-xfs_buf_rele
E-xfs_bdstrat_cb
xfs_force_shutdown(sda2,0x1) called from line 1020 of file
fs/xfs/linux-2.6/xfs_buf.c. Return address = 0x801cb7f0
Filesystem "sda2": I/O Error Detected. Shutting down filesystem: sda2
Please umount the filesystem, and rectify the problem(s)
S-xfs_buf_relse
S-xfs_buf_rele - 2
Call Trace:
[<8032c1bc>] dump_stack+0x8/0x34 from[<801cab2c>] xfs_buf_rele+0xc8/0x28c
[<801cab2c>] xfs_buf_rele+0xc8/0x28c from[<801cb828>] xfs_bwrite+0xec/0x128
[<801cb828>] xfs_bwrite+0xec/0x128 from[<801d5b48>]
xfs_sync_fsdata+0xbc/0x17c
[<801d5b48>] xfs_sync_fsdata+0xbc/0x17c from[<801d690c>]
xfs_quiesce_data+0x34/0x68
[<801d690c>] xfs_quiesce_data+0x34/0x68 from[<801d2b34>]
xfs_fs_sync_fs+0x30/0xec
[<801d2b34>] xfs_fs_sync_fs+0x30/0xec from[<800b7ff0>]
sync_filesystems+0x118/0x19c
[<800b7ff0>] sync_filesystems+0x118/0x19c from[<800db490>] do_sync+0x38/0x7c
[<800db490>] do_sync+0x38/0x7c from[<800db510>] sys_sync+0x10/0x20
[<800db510>] sys_sync+0x10/0x20 from[<8000ff44>] stack_done+0x20/0x3c
E-xfs_buf_rele
E-xfs_buf_relse
cp: cannot stat '/dtv/usb/sda2/6.test': Input/output error
Filesystem "sda2": xfs_log_force: error 5 returned.
INFO: task khubd:56 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
khubd D [86a516a8] 8032cb2c 0 56 2
237 26 (kernel thread)
Stack : 00000000 801a2a6c 8643ebe0 86b43a8c 86a51830 8032cb2c 7fffffff
86a516a8
00000002 00000001 86b6dc00 80243a10 e0364448 8032c344 e0373f10
8032cb2c
00000087 0000000b 86b43ab8 801d61bc 3b9aca00 8032d16c 00000008
8019f0dc
86b43a88 00008000 8643ebe0 8616d900 0000000b 00000000 00000000
86bae000
86b43b08 801d62e0 7fffffff 86a516a8 00000002 8032e374 00000000
8008fae4
...
Call Trace:
[<8032ca74>] __schedule+0x618/0x6b8 from[<8032cb2c>] schedule+0x18/0x3c
[<8032cb2c>] schedule+0x18/0x3c from[<8032d16c>] schedule_timeout+0x2c/0x1c0
[<8032d16c>] schedule_timeout+0x2c/0x1c0 from[<8032e374>] __down+0x8c/0xdc
[<8032e374>] __down+0x8c/0xdc from[<8004500c>] down+0x40/0x88
[<8004500c>] down+0x40/0x88 from[<801c9c10>] xfs_buf_lock+0xcc/0x178
[<801c9c10>] xfs_buf_lock+0xcc/0x178 from[<801b6550>] xfs_getsb+0x38/0x54
[<801b6550>] xfs_getsb+0x38/0x54 from[<801d5b00>] xfs_sync_fsdata+0x74/0x17c
[<801d5b00>] xfs_sync_fsdata+0x74/0x17c from[<801d690c>]
xfs_quiesce_data+0x34/0x68
[<801d690c>] xfs_quiesce_data+0x34/0x68 from[<801d2b34>]
xfs_fs_sync_fs+0x30/0xec
[<801d2b34>] xfs_fs_sync_fs+0x30/0xec from[<800b878c>]
__fsync_super+0xa4/0xc8
[<800b878c>] __fsync_super+0xa4/0xc8 from[<800b87c4>] fsync_super+0x14/0x28
[<800b87c4>] fsync_super+0x14/0x28 from[<800e5cc4>] fsync_bdev+0x28/0x64
[<800e5cc4>] fsync_bdev+0x28/0x64 from[<801faaa8>]
invalidate_partition+0x28/0x60
[<801faaa8>] invalidate_partition+0x28/0x60 from[<801001b0>]
del_gendisk+0x40/0xf0
[<801001b0>] del_gendisk+0x40/0xf0 from[<8025cb50>] sd_remove+0x40/0xc8
[<8025cb50>] sd_remove+0x40/0xc8 from[<80259b5c>] scsi_bus_remove+0x44/0x5c
[<80259b5c>] scsi_bus_remove+0x44/0x5c from[<802463d4>]
__device_release_driver+0x80/0xbc
[<802463d4>] __device_release_driver+0x80/0xbc from[<80246540>]
device_release_driver+0x28/0x40
[<80246540>] device_release_driver+0x28/0x40 from[<802457b8>]
bus_remove_device+0xb0/0xf0
[<802457b8>] bus_remove_device+0xb0/0xf0 from[<80243b30>]
device_del+0x120/0x1a8
[<80243b30>] device_del+0x120/0x1a8 from[<80259f50>]
__scsi_remove_device+0x40/0x98
[<80259f50>] __scsi_remove_device+0x40/0x98 from[<802569cc>]
scsi_forget_host+0x88/0xfc
[<802569cc>] scsi_forget_host+0x88/0xfc from[<8024f34c>]
scsi_remove_host+0xf8/0x1ac
[<8024f34c>] scsi_remove_host+0xf8/0x1ac from[<e039d6bc>]
quiesce_and_remove_host+0x9c/0x12c [usb_storage]
[<e039d6bc>] quiesce_and_remove_host+0x9c/0x12c [usb_storage]
from[<e039d83c>] usb_stor_disconnect+0x20/0x3c [usb_storage]
[<e039d83c>] usb_stor_disconnect+0x20/0x3c [usb_storage] from[<e0367c58>]
usb_unbind_interface+0x68/0x128 [usbcore]
[<e0367c58>] usb_unbind_interface+0x68/0x128 [usbcore] from[<802463d4>]
__device_release_driver+0x80/0xbc
[<802463d4>] __device_release_driver+0x80/0xbc from[<80246540>]
device_release_driver+0x28/0x40
[<80246540>] device_release_driver+0x28/0x40 from[<802457b8>]
bus_remove_device+0xb0/0xf0
[<802457b8>] bus_remove_device+0xb0/0xf0 from[<80243b30>]
device_del+0x120/0x1a8
[<80243b30>] device_del+0x120/0x1a8 from[<e0364854>]
usb_disable_device+0x14c/0x234 [usbcore]
[<e0364854>] usb_disable_device+0x14c/0x234 [usbcore] from[<e035d6b8>]
usb_disconnect+0x170/0x37c [usbcore]
[<e035d6b8>] usb_disconnect+0x170/0x37c [usbcore] from[<e035f264>]
hub_thread+0x85c/0x18f8 [usbcore]
[<e035f264>] hub_thread+0x85c/0x18f8 [usbcore] from[<8003ff3c>]
kthread+0x5c/0xa0
[<8003ff3c>] kthread+0x5c/0xa0 from[<80008908>]
kernel_thread_helper+0x10/0x18
-------------------------------------------------------------------------------------
INFO: task usb_mount:395 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
usb_mount D [86163ae8] 8032cb2c 0 395
371 (user thread)
Stack : ffffff9c 86b71ef0 00400ae0 800c0150 86163c70 8032cb2c 86163ae8
00000002
86bbbf18 86bbbf00 00000000 00000000 86b71f00 86b71ef8 7f8b0670
8032cb2c
86bbb780 800b2510 00000001 86bbbf00 86bbbf00 8032f024 86bbbf00
800b21c0
86bbbf00 86bbbf00 86b6dc44 86b6dc44 86163ae8 00000002 86b6dc00
86b6dc00
803fe190 800b9350 86bbbf00 86bbbf18 86bbbef8 86bbbf00 86bbbef8
86bbbef8
...
Call Trace:
[<8032ca74>] __schedule+0x618/0x6b8 from[<8032cb2c>] schedule+0x18/0x3c
[<8032cb2c>] schedule+0x18/0x3c from[<8032f024>]
__down_write_nested+0x104/0x128
[<8032f024>] __down_write_nested+0x104/0x128 from[<800b9350>]
deactivate_super+0x70/0x110
[<800b9350>] deactivate_super+0x70/0x110 from[<800d122c>]
sys_umount+0x310/0x358
[<800d122c>] sys_umount+0x310/0x358 from[<8000ff44>] stack_done+0x20/0x3c
-------------------------------------------------------------------------------------
Filesystem "sda2": xfs_log_force: error 5 returned.
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
* Re: XFS handling of synchronous buffers in case of EIO error
2010-12-30 12:28 XFS handling of synchronous buffers in case of EIO error Ajeet Yadav
@ 2010-12-30 23:13 ` Dave Chinner
2010-12-31 6:47 ` Ajeet Yadav
From: Dave Chinner @ 2010-12-30 23:13 UTC (permalink / raw)
To: Ajeet Yadav; +Cc: xfs
On Thu, Dec 30, 2010 at 05:58:36PM +0530, Ajeet Yadav wrote:
> Kernel: 2.6.30.9
>
> I am trouble shooting a hang in XFS during umount.
> Test scenerio: Copy large number of files files using below script, and
> remove the USB after 3-5 second
FWIW, in future can you please report what kernel you are testing on?
>
> index=0
> while [ "$?" == 0 ]
> do
> index=$((index+1))
> sync
> cp $1/1KB.txt $2/"$index".test
> done
>
> In rare scenerio during USB unplug the umount process hang at xfs_buf_lock.
> Below log shows the hung process
>
> We have put printk to buffer handling functions xfs_buf_iodone_callbacks(),
> xfs_buf_error_relse(), xfs_buf_relse() and xfs_buf_rele()
>
> We always observed the hang only comes when bp->b_relse =
> xfs_buf_error_relse(). i.e when xfs_buf_iodone_callbacks() execute
> XFS_BUF_SET_BRELSE_FUNC(bp,xfs_buf_error_relse);
> XFS_BUF_DONE(bp);
> XFS_BUF_FINISH_IOWAIT(bp);
>
> buf its never called by xfs_buf_relse() because b_hold = 3.
>
> Also we have seen that this problem always comes when bp->relse != NULL &&
> bp->hold > 1.
This appears to be the same problem as reported here:
http://oss.sgi.com/archives/xfs/2010-12/msg00380.html
> I do not know whether below prints will help you, but I have taken printk
> for super block buffer tracing
> S-functionname ( Start of function)
> E-functionname (End of function)
If you have a recent enough kernel, you can get all this information
from the tracing built into XFS.
As it is, the cause of the problem is that setting bp->b_relse
changes the behaviour of xfs_buf_relse(): if bp->b_relse is set, it
doesn't unlock the buffer. This is normally just fine, because
xfs_buf_rele() has a special case to handle buffers with
bp->b_relse set, which adds a hold count and calls the release function
when the hold count drops to zero. The b_relse function is supposed
to unlock the buffer by calling xfs_buf_relse() again.
Unfortunately, the superblock buffer is special: the hold count on
it never drops to zero until very late in the unmount process, because
it is managed by the filesystem. Hence the bp->b_relse function is
never called, the buffer is never unlocked in this case, and
future attempts to access it hang.
I'll need to think about this one for a bit...
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: XFS handling of synchronous buffers in case of EIO error
2010-12-30 23:13 ` Dave Chinner
@ 2010-12-31 6:47 ` Ajeet Yadav
2011-01-04 5:19 ` Dave Chinner
From: Ajeet Yadav @ 2010-12-31 6:47 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
Dear Dave,
Our kernel is 2.6.30.9, but XFS is backported from 2.6.34.
I have seen similar behaviour in another post, about an ls process
hanging on 2.6.35.9:
http://oss.sgi.com/pipermail/xfs/2010-December/048691.html
I have always seen the hang only when b_relse != NULL and b_hold > 2.
I have made the workaround below; it solves the problem in our case because,
when the USB device is removed, we know we get an EIO error.
But I think we need to review xfs_buf_error_relse() and xfs_buf_relse()
considering the XBF_LOCK flow path.
@@ -1047,9 +1047,19 @@ xfs_buf_iodone_callbacks(
 			/* We actually overwrite the existing b-relse
 			   function at times, but we're gonna be shutting
 			   down anyway. */
-			XFS_BUF_SET_BRELSE_FUNC(bp, xfs_buf_error_relse);
-			XFS_BUF_DONE(bp);
-			XFS_BUF_FINISH_IOWAIT(bp);
+			if (XFS_BUF_GETERROR(bp) == EIO) {
+				ASSERT(XFS_BUF_TARGET(bp) == mp->m_ddev_targp);
+				XFS_BUF_SUPER_STALE(bp);
+				trace_xfs_buf_item_iodone(bp, _RET_IP_);
+				xfs_buf_do_callbacks(bp, lip);
+				XFS_BUF_SET_FSPRIVATE(bp, NULL);
+				XFS_BUF_CLR_IODONE_FUNC(bp);
+				xfs_biodone(bp);
+			} else {
+				XFS_BUF_SET_BRELSE_FUNC(bp, xfs_buf_error_relse);
+				XFS_BUF_DONE(bp);
+				XFS_BUF_FINISH_IOWAIT(bp);
+			}
 		}
 		return;
 	}
On Dec 31, 2010 at 4:43 AM, Dave Chinner <david@fromorbit.com> wrote:
> On Thu, Dec 30, 2010 at 05:58:36PM +0530, Ajeet Yadav wrote:
> > Kernel: 2.6.30.9
> >
> > I am trouble shooting a hang in XFS during umount.
> > Test scenerio: Copy large number of files files using below script, and
> > remove the USB after 3-5 second
>
> FWIW, in future can you please report what kernel you are testing on?
>
> >
> > index=0
> > while [ "$?" == 0 ]
> > do
> > index=$((index+1))
> > sync
> > cp $1/1KB.txt $2/"$index".test
> > done
> >
> > In rare scenerio during USB unplug the umount process hang at
> xfs_buf_lock.
> > Below log shows the hung process
> >
> > We have put printk to buffer handling functions
> xfs_buf_iodone_callbacks(),
> > xfs_buf_error_relse(), xfs_buf_relse() and xfs_buf_rele()
> >
> > We always observed the hang only comes when bp->b_relse =
> > xfs_buf_error_relse(). i.e when xfs_buf_iodone_callbacks() execute
> > XFS_BUF_SET_BRELSE_FUNC(bp,xfs_buf_error_relse);
> > XFS_BUF_DONE(bp);
> > XFS_BUF_FINISH_IOWAIT(bp);
> >
> > buf its never called by xfs_buf_relse() because b_hold = 3.
> >
> > Also we have seen that this problem always comes when bp->relse != NULL
> &&
> > bp->hold > 1.
>
> This appears to be the same problem as reported here:
>
> http://oss.sgi.com/archives/xfs/2010-12/msg00380.html
>
>
> > I do not know whether below prints will help you, but I have taken printk
> > for super block buffer tracing
> > S-functionname ( Start of function)
> > E-functionname (End of function)
>
> If you have a recent enough kernel, you can get all this information
> from the tracing built into XFS.
>
> As it is, the cause of the problem is that setting bp->b_relse
> changes the behaviour of xfs_buf_relse() - if bp->b_relse is set, it
> doesn't unlock the buffer. This is normally just fine, because
> xfs_buf_rele() has a special case to handle buffers with
> bp->b_relse(), which adds a hold count and call the release function
> when the hold count drops to zero. The b_relse function is supposed
> to unlock the buffer by calling xfs_buf_relse() again.
>
> Unfortunately, the superblock buffer is special - the hold count on
> it never drops to zero until very late in the unmont process because
> it is managed by the filesystem. Hence the bp->b_relse function is
> never called, and hence the buffer is never unlocked in this case.
> Hence future attempts to access it hang.
>
> I'll need to think about this one for a bit...
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
>
* Re: XFS handling of synchronous buffers in case of EIO error
2010-12-31 6:47 ` Ajeet Yadav
@ 2011-01-04 5:19 ` Dave Chinner
2011-01-05 8:26 ` Ajeet Yadav
From: Dave Chinner @ 2011-01-04 5:19 UTC (permalink / raw)
To: Ajeet Yadav; +Cc: xfs
On Fri, Dec 31, 2010 at 12:17:12PM +0530, Ajeet Yadav wrote:
> Dear Dave,
>
> Our Kernel is 2.6.30.9 but XFS is backported from 2.6.34.
> But I have seen similar behaviour in another post related to process ls hang
> in 2.6.35.9
> *
>
> http://oss.sgi.com/pipermail/xfs/2010-December/048691.html
>
> *I have always seen the hang problem comes only if comes when b_relse !=
> NULL, and b_hold > 2
>
> I have made below workaround it solved the problem in our case because when
> USB is removed we know we get EIO error.
>
> But I think we need to review xfs_buf_error_relse() and xfs_buf_relse()
> considering XBF_LOCK flow path.
>
> @@ -1047,9 +1047,19 @@ xfs_buf_iodone_callbacks(
>  			/* We actually overwrite the existing b-relse
>  			   function at times, but we're gonna be shutting
>  			   down anyway. */
> -			XFS_BUF_SET_BRELSE_FUNC(bp, xfs_buf_error_relse);
> -			XFS_BUF_DONE(bp);
> -			XFS_BUF_FINISH_IOWAIT(bp);
> +			if (XFS_BUF_GETERROR(bp) == EIO) {
> +				ASSERT(XFS_BUF_TARGET(bp) == mp->m_ddev_targp);
> +				XFS_BUF_SUPER_STALE(bp);
> +				trace_xfs_buf_item_iodone(bp, _RET_IP_);
> +				xfs_buf_do_callbacks(bp, lip);
> +				XFS_BUF_SET_FSPRIVATE(bp, NULL);
> +				XFS_BUF_CLR_IODONE_FUNC(bp);
> +				xfs_biodone(bp);
> +			} else {
> +				XFS_BUF_SET_BRELSE_FUNC(bp, xfs_buf_error_relse);
> +				XFS_BUF_DONE(bp);
> +				XFS_BUF_FINISH_IOWAIT(bp);
> +			}
>  		}
>  		return;
>  	}
This won't work reliably because it only handles one specific type
of error. We can get more than just EIO back from the lower layers,
and so if the superblock write gets a different error then we'll
still get the same hang.
Effectively what you are doing here is running the
xfs_buf_error_relse() callback directly in line. This will result in
the buffer being unlocked before the error is pulled off the buffer
after xfs_buf_iowait() completes. Essentially that means that some
other thread can reuse the buffer and clear the error before the
waiter has received the error.
I think the correct fix is to call the bp->b_relse function when the
waiter is woken to clear the error and unlock the buffer. I've just
posted a patch to do this for 2.6.38, but it won't trivially backport
to 2.6.34 or 2.6.30 as the synchronous write interfaces into the
buffer cache have been cleaned up and simplified recently. It should
still be relatively easy to handle, though.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: XFS handling of synchronous buffers in case of EIO error
2011-01-04 5:19 ` Dave Chinner
@ 2011-01-05 8:26 ` Ajeet Yadav
From: Ajeet Yadav @ 2011-01-05 8:26 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
Thanks. I think it's better to end this thread by referring to your patch:
http://oss.sgi.com/archives/xfs/2011-01/msg00020.html
On Tue, Jan 4, 2011 at 2:19 PM, Dave Chinner <david@fromorbit.com> wrote:
> On Fri, Dec 31, 2010 at 12:17:12PM +0530, Ajeet Yadav wrote:
> > Dear Dave,
> >
> > Our Kernel is 2.6.30.9 but XFS is backported from 2.6.34.
> > But I have seen similar behaviour in another post related to process ls
> hang
> > in 2.6.35.9
> > *
> >
> > http://oss.sgi.com/pipermail/xfs/2010-December/048691.html
> >
> > *I have always seen the hang problem comes only if comes when b_relse !=
> > NULL, and b_hold > 2
> >
> > I have made below workaround it solved the problem in our case because
> when
> > USB is removed we know we get EIO error.
> >
> > But I think we need to review xfs_buf_error_relse() and xfs_buf_relse()
> > considering XBF_LOCK flow path.
> >
> > @@ -1047,9 +1047,19 @@ xfs_buf_iodone_callbacks(
> >  			/* We actually overwrite the existing b-relse
> >  			   function at times, but we're gonna be shutting
> >  			   down anyway. */
> > -			XFS_BUF_SET_BRELSE_FUNC(bp, xfs_buf_error_relse);
> > -			XFS_BUF_DONE(bp);
> > -			XFS_BUF_FINISH_IOWAIT(bp);
> > +			if (XFS_BUF_GETERROR(bp) == EIO) {
> > +				ASSERT(XFS_BUF_TARGET(bp) == mp->m_ddev_targp);
> > +				XFS_BUF_SUPER_STALE(bp);
> > +				trace_xfs_buf_item_iodone(bp, _RET_IP_);
> > +				xfs_buf_do_callbacks(bp, lip);
> > +				XFS_BUF_SET_FSPRIVATE(bp, NULL);
> > +				XFS_BUF_CLR_IODONE_FUNC(bp);
> > +				xfs_biodone(bp);
> > +			} else {
> > +				XFS_BUF_SET_BRELSE_FUNC(bp, xfs_buf_error_relse);
> > +				XFS_BUF_DONE(bp);
> > +				XFS_BUF_FINISH_IOWAIT(bp);
> > +			}
> >  		}
> >  		return;
> >  	}
>
> This won't work reliably because it only handles one specific type
> of error. We can get more than just EIO back from the lower layers,
> and so if the superblock write gets a different error then we'll
> still get the same hang.
>
> Effectively what you are doing here is running the
> xfs_buf_error_relse() callback directly in line. This will result in
> the buffer being unlocked before the error is pulled off the buffer
> after xfs_buf_iowait() completes. Essentially that means that some
> other thread can reuse the buffer and clear the error before the
> waiter has received the error.
>
> I think the correct fix is to call the bp->b_relse function when the
> waiter is woken to clear the error and unlock the buffer. I've just
> posted a patch to do this for 2.6.38, but it won't trivially backport
> to 2.6.34 or 2.6.30 as the synchronous write interfaces into the
> buffer cache have been cleaned up and simplified recently. It should
> still be relatively easy to handle, though.
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@fromorbit.com
>