From: Alistair John Strachan <alistair@devzero.co.uk>
To: Jens Axboe <jens.axboe@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
xfs@oss.sgi.com, Neil Brown <neilb@suse.de>,
Nick Piggin <npiggin@suse.de>,
linux-kernel@vger.kernel.org
Subject: Re: XFS/md/blkdev warning (was Re: Linux 2.6.26-rc2)
Date: Sat, 17 May 2008 19:22:56 +0100
Message-ID: <200805171922.56272.alistair@devzero.co.uk>
In-Reply-To: <20080512164920.GE16217@kernel.dk>
(Added LKML CC)
On Monday 12 May 2008 17:49:20 Jens Axboe wrote:
> On Mon, May 12 2008, Linus Torvalds wrote:
> > On Mon, 12 May 2008, Alistair John Strachan wrote:
> > > I've been getting this since -rc1. It's still present in -rc2, so I
> > > thought I'd bug some people. Everything seems to be working fine.
> >
> > Hmm. The problem is that blk_remove_plug() does a non-atomic
> >
> > queue_flag_clear(QUEUE_FLAG_PLUGGED, q);
> >
> > without holding the queue lock.
> >
> > Now, sometimes that's ok, because of higher-level locking on the same
> > queue, so there is no possibility of any races.
> >
> > And yes, this comes through the raid5 layer, and yes, the raid layer
> > holds the 'device_lock' on the raid5_conf_t, so it's all safe from other
> > accesses by that raid5 configuration, but I wonder if at least in theory
> > somebody could access that same device directly.
> >
> > So I do suspect that this whole situation with md needs to be resolved
> > some way. Either the queue is already safe (because of md layer locking),
> > and in that case maybe the queue lock should be changed to point to that
> > md layer lock (or that sanity test simply needs to be removed). Or the
> > queue is unsafe (because non-md users can find it too), and we need to
> > fix the locking.
> >
> > Alternatively, we may just need to totally revert the thing that made the
> > bit operations non-atomic and depend on the locking. This was introduced
> > by Nick in commit 75ad23bc0fcb4f992a5d06982bf0857ab1738e9e ("block: make
> > queue flags non-atomic"), and maybe it simply isn't viable.
>
> There's been a proposed patch for at least a week, so Neil just needs to
> send it in...
(I could be perverting this report a bit by reporting something possibly not
related, but I have a gut feeling about this..)
So I applied Neil's patch (which is now upstream) to 2.6.26-rc2, and the
warning did go away. But I later found that I have another problem: if I write
more than my free memory's worth of data, my machine hangs mysteriously.
My guess is that when the kernel runs out of MemFree and starts reclaiming the
cache, something is deadlocking somewhere. Just doing:

    cat /dev/zero >/path/to/file

is enough to reproduce it. Doing this on my stacked XFS+md+libata setup causes a
hang, but if I try to reproduce on the only other filesystem I have handy (a
FUSE/ntfs-3g mounted NTFS partition) cache reclaim seems to work fine. Maybe
this test is contrived in a million different ways, but it would seem to
indicate the bug lies either in XFS or md.
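
For anyone trying to reproduce this without filling the whole filesystem, a
bounded variant of the test above might look like the rough sketch below. It
reads MemFree from /proc/meminfo and writes about 25% more than that with dd.
The overshoot_mib helper, the TARGET variable and the 25% margin are all
made up for illustration (they are not from the original report), and bs=1M
assumes a dd that accepts size suffixes, such as GNU dd.

```shell
#!/bin/sh
# Hypothetical helper: print a write size in MiB that is about 25% larger
# than the MemFree value (in kB) found in the given meminfo file.
# $1 is the file to read; it defaults to /proc/meminfo.
overshoot_mib() {
    awk '/^MemFree:/ { printf "%d\n", ($2 * 1.25) / 1024 }' "${1:-/proc/meminfo}"
}

# Only write when TARGET (an assumed variable name) is set, so that merely
# sourcing this file never touches the disk.
if [ -n "${TARGET:-}" ]; then
    dd if=/dev/zero of="$TARGET" bs=1M count="$(overshoot_mib)"
fi
```

Run as, for example, TARGET=/path/to/file sh ./script.sh. If the hang is
tied to cache reclaim as guessed, it should show up before dd finishes; if dd
completes, the margin may simply be too small.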
I don't have any disks handy at the moment to try another filesystem on top of
md (to eliminate md), and I've not yet tried enabling any kernel debugging
options. When the machine hangs, all disk I/O stops permanently. No logging
messages are shown.
Does anybody have any ideas about what to try or switch on to debug this
problem?
--
Cheers,
Alistair.
137/1 Warrender Park Road, Edinburgh, UK.