public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Richard Purdie <richard@openedhand.com>
To: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: kernel list <linux-kernel@vger.kernel.org>,
	Andrew Morton <akpm@osdl.org>, Hugh Dickins <hugh@veritas.com>
Subject: Re: [PATCH, RFC/T] Fix handling of write failures to swap devices
Date: Mon, 30 Oct 2006 11:55:47 +0000	[thread overview]
Message-ID: <1162209347.6962.2.camel@localhost.localdomain> (raw)
In-Reply-To: <454348B4.60007@yahoo.com.au>

On Sat, 2006-10-28 at 22:10 +1000, Nick Piggin wrote: 
> Richard Purdie wrote:
> > I'm open to suggestions as to how to do it. The main problem was that in
> > in order to mark the page bad you needed to hold an extra reference on
> > the page so that the "mark bad" code would be the last code to touch the
> > page. The swapoff code doesn't need this and I don't like the idea of
> > passing some count value to a common function as that would be
> > confusing. I guess swapoff could start taking an extra reference but I
> > can see people objecting to that too as it doesn't need it. 
> 
> If you have the page locked (which you can't from where you're trying,
> but we'll tackle that next), and the page is swapcache, then doesn't
> that pin you a reference to the swap entry as well? So I still don't
> see why you need that extra reference.... I know there is a
> try_to_unuse_page/entry hiding in try_to_unuse somewhere ;)

The final call to swap_free() releases the swap mapping and beyond that
point you can't tell which swap page you were dealing with.
try_to_unuse_page() calls swap_free() each time it removes a reference.
To make sure we control the last swap_free(), I had it take a reference
with swap_duplicate() and then we guarantee we call the last free. The
last free can then mark the page bad and take care of the accounting.
This does mean try_to_unuse_page() succeeds when the refcount is 2 and
not 1 as in the case of the usual try_to_unuse().

> That isn't your only problem though, and we simply don't want to do
> this (potentially expensive) unusing from interrupt context. Noting
> the error and dealing with it in process context I think is the best
> way to do it.

The reasoning was that this circumstance should be extremely rare. If it
happens, we have a hardware problem. Recovering from that hardware
problem gracefully is more important than a slightly longer interrupt.
But yes, process context would be nicer, *if* we can find a way to do
it. 

> BTW. what's the driver you're using? It might be useful to have an
> option to schedule a timer for the completions (at least the ones
> which generate errors).

Its a custom driver. I'm sure I can force it into using interrupt
context for the completions to try and test things although I'm also
fairly sure the existing patch will break when I do that for the reasons
you mention :-(. 

> >>You say that SetPageError makes the processes die unexpectedly? How and
> >>where? We use SetPageError for IO errors, and it doesn't mean the page
> >>has errors AFAIK. 
> > 
> > Most of the testing I did was on ARM. If you mark a page with
> > SetPageError, the next time userspace tries to access it, you get a data
> > abort and the process dies with a SIGBUS or other faults (even if the
> > page is marked as up to date). The effect was very clear and I just
> > assumed it was behaving as intended. It was this problem I started to
> > address as I didn't like processes randomly dieing...
> 
> I can't for the life of me see how or why this is happening. I don't
> doubt it is a problem, but it smells like another bug.

I can't work out the code path it happens in and until I do, I'm not
sure how I can track it down... 

> > Ignoring the filesystem calls to PageError(), the remaining two are
> > wait_on_page_writeback_range() and write_one_page() which something
> > like:
> > 
> > if (PageError(page))
> > 	return -EIO;
> > 
> > I'd guess something somewhere acts on the EIO and ignores the fact the
> > page is uptodate. I'm struggling a bit to work out the exact codepaths
> > though.
> 
> I don't think we have to worry about any of the filesystem code, even
> if we are talking about a swap file. It should all go straight through
> page_io.c. And wait_on_page_writeback doesn't have its return value
> checked in any of the swap code AFAIK...
> 
> Still, something must be triggering it somewhere.

Something must be but I wish I knew what/where...

> > 3. Any other ideas?
> > 
> > Can you give me a hint as to where in the swap out path you might want
> > to detect the error? The only way I can see is to have the end of
> > swap_writepage() wait_on_page_writeback() but that would appear to have
> > significant performance implications. I'm struggling to see a way you
> > can do this which is always outside interrupt context.
> 
> swap_writepage sounds better than . You've even got the page
> already locked for you, so the job's half done ;)
> 
> What performance implications did you imagine? The fastpath will
> just be a single PageError test, and error case slowpath doesn't
> matter too much beyond having something that actually works.

If swap_writepage() is to check for an error, it will have to wait until
the IO is complete with something like wait_on_page_writeback() before
it can check. The performance implication is the extra
wait_on_page_writeback() on every call to swap_writepage(). In the
meantime, it will have to give up the page lock and then reacquire it.
Unless you're thinking of something different?

Regards,

Richard




  reply	other threads:[~2006-10-30 11:55 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-10-27  7:59 [PATCH, RFC/T] Fix handling of write failures to swap devices Richard Purdie
2006-10-27  8:22 ` Nick Piggin
2006-10-27  8:44   ` Richard Purdie
2006-10-28  4:55     ` Nick Piggin
2006-10-28 10:43       ` Richard Purdie
2006-10-28 12:10         ` Nick Piggin
2006-10-30 11:55           ` Richard Purdie [this message]
2006-11-01  5:26             ` Nick Piggin
2006-11-01  9:24               ` Richard Purdie
2006-11-02 23:26               ` Richard Purdie
2006-12-13 11:43               ` Richard Purdie
2006-11-01  5:36             ` Nick Piggin
2006-11-01  9:32               ` Richard Purdie
2006-10-27  9:35   ` Richard Purdie
2006-10-27 21:19 ` Andrew Morton

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1162209347.6962.2.camel@localhost.localdomain \
    --to=richard@openedhand.com \
    --cc=akpm@osdl.org \
    --cc=hugh@veritas.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=nickpiggin@yahoo.com.au \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox