Date: Thu, 30 Jun 2011 22:19:18 +1000
From: Dave Chinner
Subject: Re: XFS and USB Hang on 2.6.35.13
Message-ID: <20110630121918.GK561@dastard>
To: Amit Sahrawat
Cc: xfs@oss.sgi.com
List-Id: XFS Filesystem from SGI

On Thu, Jun 30, 2011 at 04:57:42PM +0530, Amit Sahrawat wrote:
> Hi All,
> I encountered a hang on XFS during unplug.
> *Test Case:*
> #!/bin/sh
> index=0
> while [ "$?" == 0 ]
> do
> index=$(($index+1))
> sync
> cp /mnt/1KB.txt /tmp/"$index".test
> done
> Where /mnt - mount point for vfat and /tmp mount point for XFS, both can be
> XFS also.
>
> During this operation, unplug the USB. I am getting HANG almost everytime I
> unplug.

Well, that's no surprise. The unplug appears to be losing IOs in
progress.
> *Kernel Version:* 2.6.35.13 (extremely sorry, I know next question will be
> why am I not using TOT kernel - I tried but my PC does not boot up with the
> latest one)
>
> *Target=ARM*
> *Logs Using Kernel Hung Task Feature*
> # sh test.sh
> usb 2-1: USB disconnect, address 2
> sd 0:0:0:0: [sda] Unhandled error code
> sd 0:0:0:0: [sda] Result: hostbyte=0x01 driverbyte=0x00
> sd 0:0:0:0: [sda] CDB: cdb[0]=0x2a: 2a 00 00 40 08 01 00 00 02 00
> end_request: I/O error, dev sda, sector 4196353
> sd 0:0:0:0: [sda] Unhandled error code
> sd 0:0:0:0: [sda] Result: hostbyte=0x01 driverbyte=0x00
> sd 0:0:0:0: [sda] CDB: cdb[0]=0x2a: 2a 00 00 40 08 08 00 00 18 00
> end_request: I/O error, dev sda, sector 4196360
> end_request: I/O error, dev sda, sector 6293645
> Device sda3, XFS metadata write error block 0x1 in sda3
> xfs_force_shutdown(sda3,0x1) called from line 1031 of file
> fs/xfs/xfs_buf_item.c. Return address = 0xc0507b1c

So the device was unplugged, there was a disconnect error, a few IO
errors and then a shutdown.

>
> *INFO: task khubd:*33 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> khubd D c06c261c 0 33 2 0x00000000
> Backtrace:
> [] (schedule+0x0/0x500) from []
> (_xfs_log_force+0x230/0x284)

You need to turn off line wrapping for stuff you paste into email.
The cleaned up (i.e.
relevant part) trace is:

[] (schedule+0x0/0x500)
[] (_xfs_log_force+0x0/0x284)
[] (xfs_log_force+0x0/0x38)
[] (xfs_sync_data+0x0/0x58)
[] (xfs_quiesce_data+0x0/0x80)
[] (xfs_fs_sync_fs+0x0/0xe0)
[] (__sync_filesystem+0x0/0xa0)
[] (sync_filesystem+0x0/0x60)
[] (fsync_bdev+0x0/0x44)
[] (invalidate_partition+0x0/0x3c)
[] (del_gendisk+0x0/0x140)
[] (sd_remove+0x0/0x84)
[] (__device_release_driver+0x0/0xac)
[] (device_release_driver+0x0/0x30)
[] (bus_remove_device+0x0/0x8c)
[] (device_del+0x0/0x170)
[] (__scsi_remove_device+0x0/0x90)
[] (scsi_forget_host+0x0/0x6c)
[] (scsi_remove_host+0x0/0x104)
[] (quiesce_and_remove_host+0x0/0x9c)
[] (usb_stor_disconnect+0x0/0x28)
[] (usb_unbind_interface+0x0/0xdc)
[] (__device_release_driver+0x0/0xac)
[] (device_release_driver+0x0/0x30)
[] (bus_remove_device+0x0/0x8c)
[] (device_del+0x0/0x170)
[] (usb_disable_device+0x0/0xf8)
[] (usb_disconnect+0x0/0xf4)
[] (hub_thread+0x0/0xd78)
[] (kthread+0x0/0x8c)

Well, that just looks utterly braindamaged to me. We just had the
device containing the filesystem removed from the system, so the
error handling routine ends up trying to sync the filesystem to the
device that doesn't exist anymore. WTF?

Anyway, that's not the cause of the hang, but just an example of
someone not thinking through what their error handling actually
does...

> *INFO: task xfslogd/1*:40 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> xfslogd/1 D c06c261c 0 40 2 0x00000000
> Backtrace:

[] (schedule+0x0/0x500)
[] (_xfs_log_force+0x0/0x284)
[] (xfs_log_force_umount+0x0/0x1dc)
[] (xfs_do_force_shutdown+0x0/0x164)
[] (xfs_buf_iodone_callbacks+0x0/0x184)
[] (xfs_buf_iodone_work+0x0/0x7c)
[] (worker_thread+0x0/0x1e4)
[] (kthread+0x0/0x8c)

That's where the shutdown has hung - only the xfslogd can complete
the IO that will allow the log force to complete, and that is not
occurring because it is waiting for the log force to complete before
it can complete the IO that will complete the log force...

AFAICT, this problem doesn't exist in TOT - the conversion of the
xfslogd workqueue to CMWQ allows processing of other xfslogd
workqueue events to continue even though this one has gone to sleep.

You probably need to change the shutdown type to
SHUTDOWN_LOG_IO_ERROR to prevent a log flush from occurring in this
shutdown context.

> r7:00000013 r6:c040c04c r5:c041e61c r4:dbc2decc
> *INFO: task sync:*164 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> sync D c06c261c 0 164 136 0x00000000
> Backtrace:

[] (schedule+0x0/0x500)
[] (schedule_timeout+0x0/0x200)
[] (wait_for_common+0x0/0x164)
[] (wait_for_completion+0x0/0x1c)
[] (xfs_buf_iowait+0x0/0x5c)
[] (xfs_flush_buftarg+0x0/0x180)
[] (xfs_quiesce_data+0x0/0x80)
[] (xfs_fs_sync_fs+0x0/0xe0)
[] (__sync_filesystem+0x0/0xa0)
[] (sync_one_sb+0x0/0x30)
[] (iterate_supers+0x0/0xb8)
[] (sync_filesystems+0x0/0x2c)
[] (sys_sync+0x0/0x44)

And that one is probably stuck waiting for the xfslogd to complete
the IO.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
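
Editorial illustration (not part of the original mail): the xfslogd hang
described above can be modelled in user space. This is only a sketch
under stated assumptions, not XFS code. A Python thread pool stands in
for the workqueue, one worker approximates the old per-CPU xfslogd
thread, and two workers approximate, very loosely, a CMWQ-style queue
that can run another item while the first one sleeps. The names
shutdown_completes, metadata_io_done and log_io_done are hypothetical,
and a timeout stands in for the indefinite hang so the example
terminates.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def shutdown_completes(workers, timeout=1.0):
    """Return True if the simulated log force finishes before `timeout`.

    Toy model only: a thread pool plays the workqueue, an Event plays
    the log force completion. Nothing here is real XFS behaviour.
    """
    log_forced = threading.Event()    # set when the log IO completion runs
    item_running = threading.Event()  # set once the first work item has started
    pool = ThreadPoolExecutor(max_workers=workers)

    def metadata_io_done():
        # Failed metadata write: the handler starts a shutdown, which
        # forces the log and then sleeps until the log IO completes.
        item_running.set()
        return log_forced.wait(timeout)  # False means we gave up waiting

    def log_io_done():
        # Log IO completion: the only thing that can wake the sleeper,
        # but it is queued behind it on the same queue.
        log_forced.set()

    first = pool.submit(metadata_io_done)
    item_running.wait()        # make sure the sleeper holds a worker
    pool.submit(log_io_done)   # completion arrives while it sleeps
    finished = first.result()
    pool.shutdown(wait=True)
    return finished


if __name__ == "__main__":
    print("1 worker :", shutdown_completes(1, timeout=0.5))
    print("2 workers:", shutdown_completes(2, timeout=0.5))
```

With one worker the completion item cannot run until the sleeper gives
up, which is the shape of the circular wait in the xfslogd trace. With
two workers the completion runs on the second thread and the sleeper
wakes. The SHUTDOWN_LOG_IO_ERROR suggestion avoids the problem from the
other side, by not waiting on a log flush in that context at all.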