From: Dave Chinner <david@fromorbit.com>
To: Alan Stern <stern@rowland.harvard.edu>
Cc: Dima Tisnek <dimaqq@gmail.com>,
Alexander Viro <viro@zeniv.linux.org.uk>,
Jens Axboe <axboe@kernel.dk>,
USB list <linux-usb@vger.kernel.org>,
linux-fsdevel@vger.kernel.org,
Kernel development list <linux-kernel@vger.kernel.org>
Subject: Re: mount stuck, khubd blocked
Date: Thu, 21 Jun 2012 11:34:57 +1000
Message-ID: <20120621013457.GQ30705@dastard>
In-Reply-To: <Pine.LNX.4.44L0.1206201020450.1804-100000@iolanthe.rowland.org>
On Wed, Jun 20, 2012 at 10:31:37AM -0400, Alan Stern wrote:
> On Wed, 20 Jun 2012, Dave Chinner wrote:
>
> > On Tue, Jun 19, 2012 at 10:45:10AM -0400, Alan Stern wrote:
> > > On Tue, 19 Jun 2012, Dima Tisnek wrote:
> > >
> > > > I made a microsd flash with 2 partitions, sdb1 is data partition, and
> > > > sdb2 is a sentinel partition, 1 block in size.
> > > >
> > > > I attached the usb-microsd reader with that card in it and by mistake
> > > > tried to mount the sentinel partition, I ran:
> > > > mount /dev/sdb2 /mnt/flash/
> > > >
> > > > mount got stuck, I was not able to kill or strace it, I pulled the usb
> > > > reader from the port, mount was still stuck, here's the dmesg log:
> >
> > So where is the mount process stuck? It's holding the lock that
> > khubd is stuck on....
>
> Yes, that's most likely the right explanation.
.....
> > > As can be seen from the stack entries above, this problem lies in the
> > > block or filesystem layer and not in USB or SCSI.
> >
> > Don't blame the higher layers as the cause of the problem simply
> > because they are the ones that show the visible symptoms ;)
>
> Okay, point taken. It's always good to have a new point of view when
> tackling a tough problem.
>
> > The problem lies in the fact that the error handling callback that
> > is run when the device is removed triggers IO to the block device
> > that was just removed. If all outstanding IOs have been error'd out
> > correctly, and all new IOs return errors, then there is no reason
> > for the fsync to block here. i.e. the mount process should have
> > received an error.
> >
> > However, the mount could have hung because underlying device has not
> > been cleaned up properly before the device disconnect has proceeded.
> > i.e. that it is possible that the cause is a SCSI or USB issue, not a
> > filesystem issue. :)
>
> But the mount got stuck _before_ the device was unplugged. Hence
> failure to clean up cannot be the underlying cause.
Perhaps. It might not be stuck - sometimes mount does a lot of IO
(e.g. due to journal recovery or quota checks) and it can't be
killed while this is occurring, and it's only a single system call so
strace won't return anything. Hence the filesystem -could- have been
actively issuing IO when the device was pulled.

Only stack traces of all the blocked tasks will tell us any
different...
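As a rough illustration of how to find which tasks are blocked before collecting their stacks, here is a minimal userspace sketch. It assumes a Linux /proc mounted in the usual place; the function names task_state() and print_blocked_tasks() are made up for this example. It only reports which processes are in uninterruptible sleep (state 'D'); the stacks themselves still have to come from sysrq-w or, as root on kernels built with stack tracing, from /proc/<pid>/stack.

```c
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/*
 * Return the state letter of a process from /proc/<pid>/stat,
 * or 0 if the file cannot be read or parsed.
 */
char task_state(const char *pid)
{
    char path[300], buf[512];
    char *p;
    FILE *f;

    snprintf(path, sizeof(path), "/proc/%s/stat", pid);
    f = fopen(path, "r");
    if (!f)
        return 0;
    if (!fgets(buf, sizeof(buf), f)) {
        fclose(f);
        return 0;
    }
    fclose(f);

    /* The comm field may contain spaces or ')', so find the last ')'. */
    p = strrchr(buf, ')');
    if (!p || p[1] != ' ' || p[2] == '\0')
        return 0;
    return p[2];    /* state letter follows ") " */
}

/*
 * Print every process currently in uninterruptible sleep ('D').
 * Only thread group leaders are checked, not the entries under
 * /proc/<pid>/task. Returns the number found, or -1 if /proc
 * cannot be opened.
 */
int print_blocked_tasks(FILE *out)
{
    DIR *d = opendir("/proc");
    struct dirent *de;
    int n = 0;

    if (!d)
        return -1;
    while ((de = readdir(d)) != NULL) {
        if (!isdigit((unsigned char)de->d_name[0]))
            continue;   /* skip non-PID entries */
        if (task_state(de->d_name) == 'D') {
            fprintf(out, "pid %s is in uninterruptible sleep\n",
                    de->d_name);
            n++;
        }
    }
    closedir(d);
    return n;
}
```

Calling print_blocked_tasks(stdout) while the mount is hung should list the mount process and khubd if both are really blocked rather than busy.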
> > So, what other blocked tasks are there in the system (echo w >
> > /proc/sysrq-trigger)?
> >
> > As it is, I think that invalidate_partition() is doing something
> > somewhat insane for a block device that has been removed - you can't
> > write to it so fsync_bdev() is useless.
>
> That depends. If by "removed" you mean physically disconnected from
> the computer, then yes. But if "removed" means merely unregistered
> from the device core then writes can still succeed.
> invalidate_partition() doesn't know which has happened.
Which means the lower layers probably need to pass that distinction
up to the invalidation function.
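To make the idea concrete, here is a small userspace model, not kernel code. Everything in it (enum removal_kind, struct model_bdev, model_sync(), model_invalidate()) is hypothetical and invented for illustration; it only shows the shape of an invalidation function that is told why the device went away and skips the flush when the hardware is physically gone.

```c
#include <stdbool.h>

/* Hypothetical: the reason the lower layers give for the removal. */
enum removal_kind {
    REMOVAL_UNREGISTERED,   /* unregistered, hardware still reachable */
    REMOVAL_DISCONNECTED,   /* physically gone, all IO will fail */
};

/* Hypothetical stand-in for a block device with cached dirty data. */
struct model_bdev {
    int dirty_pages;
    int syncs_attempted;
    bool shut_down;
};

/* Stand-in for a flush that issues writes to the device. */
int model_sync(struct model_bdev *b)
{
    b->syncs_attempted++;
    b->dirty_pages = 0;
    return 0;
}

/*
 * Invalidate the device. Writes are only attempted when the device
 * was merely unregistered; for a physical disconnect the flush is
 * skipped because it could never complete.
 */
int model_invalidate(struct model_bdev *b, enum removal_kind kind)
{
    int ret = 0;

    if (kind == REMOVAL_UNREGISTERED)
        ret = model_sync(b);

    b->shut_down = true;    /* either way, stop accepting new IO */
    return ret;
}
```

With REMOVAL_DISCONNECTED the model never calls model_sync(), so there is nothing for the removal path to block on; with REMOVAL_UNREGISTERED the dirty data is still written back first.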
> > And cleaning up the dentry
> > and inode caches is something that should be done when unmounting
> > the filesystem, not when the block device goes away as they can
> > trigger more IO and potentially deadlock with other operations that
> > have not handled the IO errors properly. Yes, shut a filesystem down
> > that has had it's block device removed, but filesystem level cleanup
> > should be left to the filesystem, not this error handling path.
> >
> > And another question - why doesn't having an active filesystem on a
> > block device (i.e. an active reference to the gendisk) prevent the
> > block device from being removed from underneath it?
>
> References prevent data structures from being deallocated, not from
> being unregistered (or as James Bottomley likes to call it, "removed
> from visibility").
Except the unregister path appears to assume that a valid block
device is available when it is unregistered. That seems to me like
there is a bad assumption being made in this error handling path...
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com