Date: Fri, 1 Jul 2011 19:03:32 +1000
From: Dave Chinner
Subject: Re: XFS and USB Hang on 2.6.35.13
Message-ID: <20110701090332.GO561@dastard>
References: <20110630121918.GK561@dastard>
To: Amit Sahrawat
Cc: xfs@oss.sgi.com

On Fri, Jul 01, 2011 at 10:00:54AM +0530, Amit Sahrawat wrote:
> On Thu, Jun 30, 2011 at 5:49 PM, Dave Chinner wrote:
> > On Thu, Jun 30, 2011 at 04:57:42PM +0530, Amit Sahrawat wrote:
> > > Hi All,
> > > I encountered a hang on XFS during unplug.
> > > *Test Case:*
> > > #!/bin/sh
> > > index=0
> > > while [ "$?" == 0 ]
> > > do
> > >         index=$(($index+1))
> > >         sync
> > >         cp /mnt/1KB.txt /tmp/"$index".test
> > > done
> > > Where /mnt - mount point for vfat and /tmp mount point for XFS, both can be
> > > XFS also.
> > >
> > > During this operation, unplug the USB. I am getting HANG almost every time I
> > > unplug.
> >
> > Well, that's no surprise. The unplug appears to be losing IOs in
> > progress.
> >
> > > *Kernel Version:* 2.6.35.13 (extremely sorry, I know next question will be
> > > why am I not using TOT kernel - I tried but my PC does not boot up with the
> > > latest one)
.....
> > > *INFO: task khubd:*33 blocked for more than 120 seconds.
> > > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > > khubd         D c06c261c     0    33      2 0x00000000
> > > Backtrace:
> > > [] (schedule+0x0/0x500) from []
> > > (_xfs_log_force+0x230/0x284)
> >
> > You need to turn off line wrapping for stuff you paste into email.
> > The cleaned up (i.e. relevant part) trace is:
> >
> > [] (schedule+0x0/0x500)
> > [] (_xfs_log_force+0x0/0x284)
> > [] (xfs_log_force+0x0/0x38)
> > [] (xfs_sync_data+0x0/0x58)
> > [] (xfs_quiesce_data+0x0/0x80)
> > [] (xfs_fs_sync_fs+0x0/0xe0)
> > [] (__sync_filesystem+0x0/0xa0)
> > [] (sync_filesystem+0x0/0x60)
> > [] (fsync_bdev+0x0/0x44)
> > [] (invalidate_partition+0x0/0x3c)
> > [] (del_gendisk+0x0/0x140)
> > [] (sd_remove+0x0/0x84)
> > [] (__device_release_driver+0x0/0xac)
> > [] (device_release_driver+0x0/0x30)
> > [] (bus_remove_device+0x0/0x8c)
> > [] (device_del+0x0/0x170)
> > [] (__scsi_remove_device+0x0/0x90)
> > [] (scsi_forget_host+0x0/0x6c)
> > [] (scsi_remove_host+0x0/0x104)
> > [] (quiesce_and_remove_host+0x0/0x9c)
> > [] (usb_stor_disconnect+0x0/0x28)
> > [] (usb_unbind_interface+0x0/0xdc)
> > [] (__device_release_driver+0x0/0xac)
> > [] (device_release_driver+0x0/0x30)
> > [] (bus_remove_device+0x0/0x8c)
> > [] (device_del+0x0/0x170)
> > [] (usb_disable_device+0x0/0xf8)
> > [] (usb_disconnect+0x0/0xf4)
> > [] (hub_thread+0x0/0xd78)
> > [] (kthread+0x0/0x8c)
> >
> > Well, that just looks utterly braindamaged to me.
> >
> > We just had the device containing the filesystem removed from the
> > system, so the error handling routine ends up trying to sync the
> > filesystem to the device that doesn't exist anymore. WTF?
> >
>
> >>> This is what I think, why is syncing taking place when the

Amit, you don't need to quote your own reply. That just confuses
mail readers that understand the ">" quoting convention and
highlight appropriately, and made me wonder if you'd even
replied....

> This is what I think, why is syncing taking place when the
> device doesn't exist anymore. What is the gain in doing so?

I doubt the person who wrote the error handling even realised that
it ended up in such a mess.

> I
> will try and propose this feature.

Not sure what you mean by this....

....

> > AFAICT, this problem doesn't exist in TOT - the conversion of the
>
> Again I have a problem which seems fixed in TOT :)
>
> > xfslogd workqueue to CMWQ allows processing of other xfslogd
> > workqueue events to continue even though this one has gone to sleep.
> >
> > You probably need to change the shutdown type to
> > SHUTDOWN_LOG_IO_ERROR to prevent a log flush from occurring in this
> > shutdown context.
>
> This will fix the error for this kernel version, I will give this a try.
> Is this the patchwork for CMWQ:
> http://patchwork.xfs.org/patch/2037/ (xfs: improve sync behaviour
> in face of aggressive dirtying) ? Please let me know.

No. 2.6.35 doesn't have the CMWQ infrastructure, it was introduced
in 2.6.38 IIRC. IOWs, there isn't a fix you can just backport -
you're going to need to write and test your own fix, and my
suggestion for doing that is above.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
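
Illustrative note on the workqueue point in the message above: the
difference between a single-worker queue and a concurrency-managed one
can be sketched in userspace. This is only an analogy in Python, not
kernel code; the function name run_queue and the 0.5 second timeout
are assumptions made up for the illustration. One queued item goes to
sleep waiting for an event (standing in for the log force that waits
on lost I/O), and the sketch checks whether an unrelated item queued
behind it can still complete.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def run_queue(max_workers):
    """Queue one sleeping item, then one quick item.

    Returns True if the quick item completed while the first item was
    still asleep, False if it was stuck behind it.
    """
    # Stands in for an I/O completion that has not arrived yet.
    release = threading.Event()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        blocked = pool.submit(release.wait)   # goes to sleep in a worker
        quick = pool.submit(lambda: "done")   # unrelated follow-up work
        try:
            quick.result(timeout=0.5)
            progressed = True
        except FutureTimeout:
            progressed = False
        # Wake the sleeper so the pool can shut down cleanly.
        release.set()
        blocked.result()
    return progressed


if __name__ == "__main__":
    print("single worker, follow-up progressed:", run_queue(1))
    print("two workers, follow-up progressed:", run_queue(2))
```

With one worker the follow-up item stays stuck behind the sleeping
one; with a second worker available it completes. That is roughly the
behaviour the message attributes to the CMWQ conversion, under the
assumptions stated above.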