public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Jeff Mahoney <jeffm@suse.com>
To: Andrew Morton <akpm@osdl.org>
Cc: zensonic@zensonic.dk, dm-devel@redhat.com, linux-kernel@vger.kernel.org
Subject: Re: Help tracking down problem --- endless loop in __find_get_block_slow
Date: Fri, 25 Feb 2005 19:03:36 -0500	[thread overview]
Message-ID: <421FBCD8.7090804@suse.com> (raw)
In-Reply-To: <20050222152833.75fb79a2.akpm@osdl.org>

[-- Attachment #1: Type: text/plain, Size: 2844 bytes --]

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Andrew Morton wrote:
> Jeff Mahoney <jeffm@suse.com> wrote:
> 
>>In my experience, the loop is actually outside of
>>__find_get_block_slow(), in __getblk_slow(). I've been using xmon to
>>interrupt the kernel, and the results vary but are all rooted in the
>>for(;;) loop in __getblk_slow. It appears as though grow_buffers is
>>finding/creating the page, but then __find_get_block can't locate the
>>buffer it needs.
> 
> 
> Yes, that'll happen.  Because there are still buffers attached to the page
> which have the wrong blocksize.  Say, if someone is trying to read a 2k
> buffer_head which is backed by a page which already has 1k buffer_heads
> attached to it.
> 
> Does your kernel not have that big printk in __find_get_block_slow()?  If
> it does, maybe some of the buffers are unmapped.  Try:

I think it's likely I'm experiencing a different bug than the original
poster. I've tried making the printk unconditional, and I get no output.
However, I've continued to track it down, and I believe I've found a
umount race. I can also reproduce it without subfs, with the attached
script.

I added some debug output to aid in my search:
__find_get_block_slow: find_get_page
[block=17508,blksize=2048,index=8754,sizebits=1,size=512] returned null
returning page [index=2188,block=17504,size=512,sizebits=3]
Couldn't find buffer @ block 17508

What I'm observing is that __find_get_block_slow is calculating the
index using the blocksize for the device, and the grow_buffers call is
using the blocksize handed down from the filesystem via sb_bread(). They
*should* be the same, but here's where my suspected race comes in. Since
the buffers are being searched for in the wrong place, they're never
found, causing the infinite loop.

The open_bdev_excl() call in get_sb_bdev() should be keeping callers out
until the block device is actually closed, but it uses the fs_type
struct as the holder which, given that the filesystem to be mounted is
the same one as the one being umounted, will be the same. This allows
the mount attempt to continue. If the superblock for the umounting
filesystem is already in the process of getting shut down, sget() will
create a new superblock and the mount attempt will use that one. The
umount will continue, destroying the old superblock and setting the
blocksize back to its original value, dropping all buffers in the process.

If kill_block_super resets the blocksize while an sb_bread is in
progress, the sizes won't match up and we'll get stuck in the loop.

I'll be working on a fix, but figured I'd send out a quick update.

- -Jeff

- --
Jeff Mahoney
SuSE Labs
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (GNU/Linux)

iD8DBQFCH7zXLPWxlyuTD7IRAr/WAJ9B6MLsKl6cv48Qlcklx1saYERv7ACdHWGW
UBXAsQBiEAge3T1R4akLKd0=
=w1zP
-----END PGP SIGNATURE-----

[-- Attachment #2: test.sh --]
[-- Type: application/x-sh, Size: 290 bytes --]

  reply	other threads:[~2005-02-26  0:01 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2005-02-21 10:46 Help tracking down problem --- endless loop in __find_get_block_slow Thomas S. Iversen
2005-02-22  9:18 ` Andrew Morton
2005-02-22 22:46   ` Jeff Mahoney
2005-02-22 23:28     ` Andrew Morton
2005-02-26  0:03       ` Jeff Mahoney [this message]
2005-02-28 21:06         ` Jeff Mahoney
2005-02-28 21:09         ` Help tracking down problem --- endless loop in __find_get_block_slow (now with the patch) Jeff Mahoney
2005-02-23 12:00   ` Help tracking down problem --- endless loop in __find_get_block_slow Thomas S. Iversen
2005-02-23 12:10     ` Andrew Morton
2005-02-23 13:02       ` Thomas S. Iversen
2005-02-23 20:09         ` Andrew Morton
2005-02-23 22:24           ` Thomas S. Iversen
2005-02-23 23:17             ` Andrew Morton
2005-02-24  8:54               ` Thomas S. Iversen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=421FBCD8.7090804@suse.com \
    --to=jeffm@suse.com \
    --cc=akpm@osdl.org \
    --cc=dm-devel@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=zensonic@zensonic.dk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox