From: Andrew Morton <akpm@linux-foundation.org>
To: Johann Lombardi <johann@clusterfs.com>
Cc: linux-kernel@vger.kernel.org
Subject: Re: Clear PG_error before reading a page
Date: Wed, 16 May 2007 09:12:17 -0700
Message-ID: <20070516091217.b9bb5797.akpm@linux-foundation.org>
In-Reply-To: <20070516153919.GC2630@chiva>

On Wed, 16 May 2007 17:39:19 +0200 Johann Lombardi <johann@clusterfs.com> wrote:
> On Tue, May 15, 2007 at 02:23:39PM -0700, Andrew Morton wrote:
> > > Yes, indeed. However, as soon as a call to get_block() fails,
> > > do_mpage_readpage() will call block_read_full_page() which will attach
> > > buffers to this page.
> > > Consequently, all subsequent reads will go through block_read_full_page().
> >
> > hm, confused. Why is get_block() failing? That has to go and read
> > metadata.
>
> In fact, I am referring to the first part of my test case (i.e. mount the ext3
> fs, enable medium errors in scsi_debug and try to read a file from the fs).
>
> So, when I try to read a file, ext3_get_block() needs to read metadata from the
> disk. However, given that the SCSI disk simulated by scsi_debug reports medium
> errors, ext3_get_block() returns EIO to the caller (i.e. do_mpage_readpage()).
> That's why get_block() is failing.
>
> Then, do_mpage_readpage() calls block_read_full_page() (via "goto confused").
> block_read_full_page() attaches buffers to this page and calls ext3_get_block()
> which fails for the same reason as before. Consequently, block_read_full_page()
> sets the PG_error flag.
> Moreover, all subsequent readpage calls will go through block_read_full_page()
> because the page has now buffers attached.
>
> Basically, my problem is that afterwards, when the device no longer returns
> any errors, the PG_error flag is never cleared and, as a result, I keep
> getting -EIO. That's the problem I'd like to address.
>

hm, OK. So, where are we up to?

I still worry about the fact that changes in this area could cause the
kernel to do a *lot* more IO attempts against failed devices, or failed
sectors. We already have a few problems in that area.

What is the actual real-world operational scenario here? Would it be a
hotplugged disk? A transient network failure in a SAN? IOW, is it
something from which the kernel should automatically recover, or is it a
situation in which manual intervention would be better?
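
For reference, the sticky-flag behaviour described in the quoted text can be
sketched as a small user-space model. This is not kernel code: struct
toy_page, toy_get_block(), toy_readpage() and toy_read_after_recovery() are
hypothetical names invented for illustration, and the model assumes that a
read only marks the page up to date when the error flag is clear. The
clear_error_first switch stands in for the change proposed in the subject
line.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy stand-in for a page cache page; all fields are illustrative only. */
struct toy_page {
    bool has_buffers; /* buffers attached after the first failed read */
    bool error;       /* models the PG_error flag */
    bool uptodate;    /* models the page being up to date */
};

/* Models a block-mapping call that must read metadata from the device. */
static int toy_get_block(bool device_failing)
{
    return device_failing ? -EIO : 0;
}

/*
 * Models one read of the page. Assumption of this sketch: the page is
 * only marked up to date when the error flag is clear, so a stale error
 * flag keeps every later read failing.
 */
static int toy_readpage(struct toy_page *page, bool device_failing,
                        bool clear_error_first)
{
    if (clear_error_first)
        page->error = false;      /* the proposed change, in toy form */

    page->has_buffers = true;     /* later reads take the buffer path */

    if (toy_get_block(device_failing) < 0)
        page->error = true;       /* the failure is recorded on the page */
    else if (!page->error)
        page->uptodate = true;    /* success only counts if no error is set */

    return page->uptodate ? 0 : -EIO;
}

/* Fail once, let the device recover, then read the same page again. */
static int toy_read_after_recovery(bool clear_error_first)
{
    struct toy_page page = { false, false, false };
    int first = toy_readpage(&page, true, clear_error_first);

    assert(first == -EIO);        /* the device is failing on the first read */
    return toy_readpage(&page, false, clear_error_first);
}
```

In this model, toy_read_after_recovery(false) still returns -EIO after the
device has recovered, while toy_read_after_recovery(true) returns 0. The
model says nothing about the cost raised above, namely how many extra IO
attempts clearing the flag would cause against a device that stays broken.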

Thread overview: 10+ messages
2007-05-15 14:37 Clear PG_error before reading a page Johann Lombardi
2007-05-15 17:11 ` Andrew Morton
2007-05-15 21:01 ` Johann Lombardi
2007-05-15 21:23 ` Andrew Morton
2007-05-16 15:39 ` Johann Lombardi
2007-05-16 15:49 ` Nick Piggin
2007-05-16 16:12 ` Andrew Morton [this message]
2007-05-17 11:42 ` Johann Lombardi
2007-05-17 16:47 ` Andrew Morton
2007-05-29 17:06 ` Johann Lombardi