Re: mount stuck, khubd blocked

From: Dave Chinner <david@fromorbit.com>
To: Alan Stern <stern@rowland.harvard.edu>
Cc: Dima Tisnek <dimaqq@gmail.com>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Jens Axboe <axboe@kernel.dk>,
	USB list <linux-usb@vger.kernel.org>,
	linux-fsdevel@vger.kernel.org,
	Kernel development list <linux-kernel@vger.kernel.org>
Subject: Re: mount stuck, khubd blocked
Date: Thu, 21 Jun 2012 11:34:57 +1000
Message-ID: <20120621013457.GQ30705@dastard>
In-Reply-To: <Pine.LNX.4.44L0.1206201020450.1804-100000@iolanthe.rowland.org>

On Wed, Jun 20, 2012 at 10:31:37AM -0400, Alan Stern wrote:
> On Wed, 20 Jun 2012, Dave Chinner wrote:
> 
> > On Tue, Jun 19, 2012 at 10:45:10AM -0400, Alan Stern wrote:
> > > On Tue, 19 Jun 2012, Dima Tisnek wrote:
> > > 
> > > > I made a microsd flash with 2 partitions, sdb1 is data partition, and
> > > > sdb2 is a sentinel partition, 1 block in size.
> > > > 
> > > > I attached the usb-microsd reader with that card in it and by mistake
> > > > tried to mount the sentinel partition, I ran:
> > > > mount /dev/sdb2 /mnt/flash/
> > > > 
> > > > mount got stuck, I was not able to kill or strace it, I pulled the usb
> > > > reader from the port, mount was still stuck, here's the dmesg log:
> > 
> > So where is the mount process stuck? It's holding the lock that
> > khubd is stuck on....
> 
> Yes, that's most likely the right explanation.

.....

> > > As can be seen from the stack entries above, this problem lies in the 
> > > block or filesystem layer and not in USB or SCSI.
> > 
> > Don't blame the higher layers as the cause of the problem simply
> > because they are the ones that show the visible symptoms ;)
> 
> Okay, point taken.  It's always good to have a new point of view when 
> tackling a tough problem.
> 
> > The problem lies in the fact that the error handling callback that
> > is run when the device is removed triggers IO to the block device
> > that was just removed.  If all outstanding IOs have been error'd out
> > correctly, and all new IOs return errors, then there is no reason
> > for the fsync to block here. i.e. the mount process should have
> > received an error.
> > 
> > However, the mount could have hung because the underlying device has not
> > been cleaned up properly before the device disconnect has proceeded.
> > i.e. that it is possible that the cause is a SCSI or USB issue, not a
> > filesystem issue. :)
> 
> But the mount got stuck _before_ the device was unplugged.  Hence
> failure to clean up cannot be the underlying cause.

Perhaps. It might not be stuck - sometimes mount does a lot of IO
(e.g. due to journal recovery or quota checks) and it can't be
killed when this is occurring, and it's only a single system call so
strace won't return anything. Hence the filesystem -could- have been
actively issuing IO when the device was pulled.

Only stack traces of all the blocked tasks will tell us any
different...

> > So, what other blocked tasks are there in the system (echo w >
> > /proc/sysrq-trigger)?
> > 
> > As it is, I think that invalidate_partition() is doing something
> > somewhat insane for a block device that has been removed - you can't
> > write to it so fsync_bdev() is useless.
> 
> That depends.  If by "removed" you mean physically disconnected from
> the computer, then yes.  But if "removed" means merely unregistered
> from the device core then writes can still succeed.  
> invalidate_partition() doesn't know which has happened.

Which means the lower layers probably need to pass that distinction
up to the invalidation function.
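
To make that concrete, here is a rough userspace model of the idea, not
a patch. Every name in it (removal_reason, model_bdev, model_fsync,
model_invalidate) is hypothetical and does not exist in the kernel; it
only sketches how an invalidation step could skip writeback when the
lower layer reports that the hardware is physically gone.

```c
/*
 * Userspace model only: none of these names exist in the kernel.
 * It sketches telling the invalidation path *why* the device is
 * going away, so it can skip writeback when no IO can succeed.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: reason the lower layer is tearing the device down. */
enum removal_reason {
	REMOVAL_UNREGISTERED,	/* unregistered only; writes may still succeed */
	REMOVAL_DISCONNECTED,	/* physically gone; every IO will fail */
};

/* Hypothetical stand-in for a block device with a filesystem on it. */
struct model_bdev {
	int dirty_pages;	/* pages still waiting for writeback */
	bool shut_down;		/* filesystem told to stop issuing IO */
};

/* Stand-in for fsync_bdev(): pretend the flush succeeds. */
int model_fsync(struct model_bdev *bdev)
{
	int written = bdev->dirty_pages;

	bdev->dirty_pages = 0;
	return written;
}

/*
 * Hypothetical invalidation step that takes the removal reason.
 * Returns the number of pages it wrote back.
 */
int model_invalidate(struct model_bdev *bdev, enum removal_reason why)
{
	int written = 0;

	assert(bdev != NULL);

	/* Only issue writeback if the hardware can still accept it. */
	if (why == REMOVAL_UNREGISTERED)
		written = model_fsync(bdev);

	/* Either way, shut the filesystem down; cache cleanup waits for unmount. */
	bdev->shut_down = true;
	return written;
}
```

In this model, calling model_invalidate() with REMOVAL_DISCONNECTED
leaves the dirty pages untouched and only marks the filesystem shut
down, so nothing can block waiting on a device that is no longer there.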

> >  And cleaning up the dentry
> > and inode caches is something that should be done when unmounting
> > the filesystem, not when the block device goes away as they can
> > trigger more IO and potentially deadlock with other operations that
> > have not handled the IO errors properly. Yes, shut a filesystem down
> > that has had its block device removed, but filesystem level cleanup
> > should be left to the filesystem, not this error handling path.
> > 
> > And another question - why doesn't having an active filesystem on a
> > block device (i.e. an active reference to the gendisk) prevent the
> > block device from being removed from underneath it?
> 
> References prevent data structures from being deallocated, not from 
> being unregistered (or as James Bottomley likes to call it, "removed 
> from visibility").

Except the unregister path appears to assume that a valid block
device is available when it is unregistered. That seems to me like
there is a bad assumption being made in this error handling path...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
