[patch] fs: improved handling of page and buffer IO errors

linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Nick Piggin <npiggin@suse.de>
To: Andrew Morton <akpm@linux-foundation.org>,
	Linux Memory Management List <linux-mm@kvack.org>,
	linux-fsdevel@vger.kernel.org
Subject: [patch] fs: improved handling of page and buffer IO errors
Date: Tue, 21 Oct 2008 13:21:37 +0200	[thread overview]
Message-ID: <20081021112137.GB12329@wotan.suse.de> (raw)

IO error handling in the core mm/fs still doesn't seem perfect, but with
the recent round of patches and this one, it should be getting on the
right track.

I kind of get the feeling some people would rather forget about all this
and brush it under the carpet. Hopefully I'm mistaken, but if anybody
disagrees with my assertion that error handling, and data integrity
semantics are first-class correctness issues, and therefore are more
important than all other non-correctness problems... speak now and let's
discuss that, please.

Otherwise, unless anybody sees obvious problems with this, hopefully it
can go into -mm for some wider testing (I've tested it with a few filesystems
so far and no immediate problems)

Thanks,
Nick
--

- Don't clear PageUptodate or buffer_uptodate on write failure (this could go
  BUG in the VM, and really, the data in the page is still the most uptodate
  even if we can't write it back to disk). If we decide to invalidate a page on
  write failure (silly, because then the app can't retry the write), the
  correct way to invalidate a page is via the pagecache invalidation calls,
  which will do the right thing with mapped pages for example.
- Don't assume !PageUptodate == EIO. Pages can be invalidated or reclaimed,
  in which case the read should be retried.
- Haven't gone through filesystems yet, but this gets core code into better 
  shape.
- Warnings or bugs can come about because we have !uptodate pages mapped into
  page tables, !uptodate && dirty pages or buffers, etc.

  example:
  WARNING: at /home/npiggin/usr/src/linux-2.6/fs/buffer.c:1185 mark_buffer_dirty+0x7a/0xc0()
Call Trace:
  bd541ca8:  [<60033b76>] warn_on_slowpath+0x56/0x80
  bd541cd8:  [<6004aa37>] wake_up_bit+0x27/0x40
  bd541cf8:  [<600a6fd2>] __writeback_single_inode+0xf2/0x360
  bd541d28:  [<60053f13>] debug_mutex_free_waiter+0x23/0x50
  bd541d48:  [<601f6ea9>] __mutex_lock_slowpath+0x149/0x220
  bd541db8:  [<600688a0>] pdflush+0x0/0x1c0
  bd541dc8:  [<600ac51a>] mark_buffer_dirty+0x7a/0xc0
  bd541de8:  [<600d8125>] ext2_write_super+0x45/0x90
  bd541e08:  [<6008c846>] sync_supers+0x76/0xc0
  bd541e28:  [<600682b0>] wb_kupdate+0x30/0x110
  bd541e98:  [<600688a0>] pdflush+0x0/0x1c0
  bd541ea8:  [<6006898c>] pdflush+0xec/0x1c0
  bd541eb8:  [<60068280>] wb_kupdate+0x0/0x110
  bd541f08:  [<6004a628>] kthread+0x58/0x90
  bd541f48:  [<60025b47>] run_kernel_thread+0x47/0x50
  bd541f58:  [<6004a5d0>] kthread+0x0/0x90
  bd541f98:  [<60025b28>] run_kernel_thread+0x28/0x50
  bd541fc8:  [<600166f7>] new_thread_handler+0x67/0xa0

Signed-off-by: Nick Piggin <npiggin@suse.de>
---
Index: linux-2.6/mm/filemap.c
===================================================================
--- linux-2.6.orig/mm/filemap.c
+++ linux-2.6/mm/filemap.c
@@ -1147,20 +1147,17 @@ readpage:
 			error = lock_page_killable(page);
 			if (unlikely(error))
 				goto readpage_error;
-			if (!PageUptodate(page)) {
-				if (page->mapping == NULL) {
-					/*
-					 * invalidate_inode_pages got it
-					 */
-					unlock_page(page);
-					page_cache_release(page);
-					goto find_page;
-				}
+			if (PageError(page)) {
 				unlock_page(page);
 				shrink_readahead_size_eio(filp, ra);
 				error = -EIO;
 				goto readpage_error;
 			}
+			if (!PageUptodate(page) || !page->mapping) {
+				unlock_page(page);
+				page_cache_release(page);
+				goto find_page;
+			}
 			unlock_page(page);
 		}
 
@@ -1576,7 +1573,7 @@ page_not_uptodate:
 	error = mapping->a_ops->readpage(file, page);
 	if (!error) {
 		wait_on_page_locked(page);
-		if (!PageUptodate(page))
+		if (unlikely(PageError(page)))
 			error = -EIO;
 	}
 	page_cache_release(page);
@@ -1701,6 +1698,7 @@ retry:
 		unlock_page(page);
 		goto out;
 	}
+	ClearPageError(page);
 	err = filler(data, page);
 	if (err < 0) {
 		page_cache_release(page);
@@ -1722,7 +1720,7 @@ EXPORT_SYMBOL(read_cache_page_async);
  * Read into the page cache. If a page already exists, and PageUptodate() is
  * not set, try to fill the page then wait for it to become unlocked.
  *
- * If the page does not get brought uptodate, return -EIO.
+ * If the page IO fails, return -EIO.
  */
 struct page *read_cache_page(struct address_space *mapping,
 				pgoff_t index,
@@ -1735,7 +1733,7 @@ struct page *read_cache_page(struct addr
 	if (IS_ERR(page))
 		goto out;
 	wait_on_page_locked(page);
-	if (!PageUptodate(page)) {
+	if (PageError(page)) {
 		page_cache_release(page);
 		page = ERR_PTR(-EIO);
 	}
@@ -2052,6 +2050,9 @@ again:
 			 *
 			 * Instead, we have to bring it uptodate here.
 			 */
+			if (PageError(page))
+				return -EIO;
+
 			ret = aops->readpage(file, page);
 			page_cache_release(page);
 			if (ret) {
Index: linux-2.6/mm/memory.c
===================================================================
--- linux-2.6.orig/mm/memory.c
+++ linux-2.6/mm/memory.c
@@ -2336,10 +2336,11 @@ static int do_swap_page(struct mm_struct
 	if (unlikely(!pte_same(*page_table, orig_pte)))
 		goto out_nomap;
 
-	if (unlikely(!PageUptodate(page))) {
+	if (unlikely(!PageError(page))) {
 		ret = VM_FAULT_SIGBUS;
 		goto out_nomap;
 	}
+	VM_BUG_ON(!PageUptodate(page)); /* page_io.c should guarantee this */
 
 	/* The page isn't present yet, go ahead with the fault. */
 
Index: linux-2.6/mm/shmem.c
===================================================================
--- linux-2.6.orig/mm/shmem.c
+++ linux-2.6/mm/shmem.c
@@ -1278,7 +1278,7 @@ repeat:
 			page_cache_release(swappage);
 			goto repeat;
 		}
-		if (!PageUptodate(swappage)) {
+		if (PageError(swappage)) {
 			shmem_swp_unmap(entry);
 			spin_unlock(&info->lock);
 			unlock_page(swappage);
@@ -1286,6 +1286,11 @@ repeat:
 			error = -EIO;
 			goto failed;
 		}
+		/*
+		 * swap cache doesn't get invalidated, so if not error it
+		 * should be uptodate
+		 */
+		VM_BUG_ON(!PageUptodate(swappage));
 
 		if (filepage) {
 			shmem_swp_set(info, entry, 0);
Index: linux-2.6/fs/buffer.c
===================================================================
--- linux-2.6.orig/fs/buffer.c
+++ linux-2.6/fs/buffer.c
@@ -121,8 +121,11 @@ static void __end_buffer_read_notouch(st
 	if (uptodate) {
 		set_buffer_uptodate(bh);
 	} else {
-		/* This happens, due to failed READA attempts. */
-		clear_buffer_uptodate(bh);
+		if (buffer_uptodate(bh)) {
+			WARN_ON_ONCE(1);
+			/* This happens, due to failed READA attempts. */
+			clear_buffer_uptodate(bh);
+		}
 	}
 	unlock_buffer(bh);
 }
@@ -142,7 +145,10 @@ void end_buffer_write_sync(struct buffer
 	char b[BDEVNAME_SIZE];
 
 	if (uptodate) {
-		set_buffer_uptodate(bh);
+		if (!buffer_uptodate(bh)) {
+			WARN_ON_ONCE(1);
+			set_buffer_uptodate(bh);
+		}
 	} else {
 		if (!buffer_eopnotsupp(bh) && printk_ratelimit()) {
 			buffer_io_error(bh);
@@ -151,7 +157,6 @@ void end_buffer_write_sync(struct buffer
 				       bdevname(bh->b_bdev, b));
 		}
 		set_buffer_write_io_error(bh);
-		clear_buffer_uptodate(bh);
 	}
 	unlock_buffer(bh);
 	put_bh(bh);
@@ -393,7 +398,10 @@ static void end_buffer_async_read(struct
 	if (uptodate) {
 		set_buffer_uptodate(bh);
 	} else {
-		clear_buffer_uptodate(bh);
+		if (buffer_uptodate(bh)) {
+			WARN_ON_ONCE(1);
+			clear_buffer_uptodate(bh);
+		}
 		if (printk_ratelimit())
 			buffer_io_error(bh);
 		SetPageError(page);
@@ -453,7 +461,10 @@ static void end_buffer_async_write(struc
 
 	page = bh->b_page;
 	if (uptodate) {
-		set_buffer_uptodate(bh);
+		if (!buffer_uptodate(bh)) {
+			WARN_ON_ONCE(1);
+			set_buffer_uptodate(bh);
+		}
 	} else {
 		if (printk_ratelimit()) {
 			buffer_io_error(bh);
@@ -463,7 +474,6 @@ static void end_buffer_async_write(struc
 		}
 		set_bit(AS_EIO, &page->mapping->flags);
 		set_buffer_write_io_error(bh);
-		clear_buffer_uptodate(bh);
 		SetPageError(page);
 	}
 
@@ -1998,8 +2008,6 @@ int block_write_begin(struct file *file,
 
 	status = __block_prepare_write(inode, page, start, end, get_block);
 	if (unlikely(status)) {
-		ClearPageUptodate(page);
-
 		if (ownpage) {
 			unlock_page(page);
 			page_cache_release(page);
@@ -2372,10 +2380,7 @@ int block_prepare_write(struct page *pag
 			get_block_t *get_block)
 {
 	struct inode *inode = page->mapping->host;
-	int err = __block_prepare_write(inode, page, from, to, get_block);
-	if (err)
-		ClearPageUptodate(page);
-	return err;
+	return __block_prepare_write(inode, page, from, to, get_block);
 }
 
 int block_commit_write(struct page *page, unsigned from, unsigned to)
@@ -2752,16 +2757,19 @@ has_buffers:
 
 	/* Ok, it's mapped. Make sure it's up-to-date */
 	if (!PageUptodate(page)) {
+again:
 		err = mapping->a_ops->readpage(NULL, page);
 		if (err) {
 			page_cache_release(page);
 			goto out;
 		}
 		lock_page(page);
-		if (!PageUptodate(page)) {
+		if (!PageError(page)) {
 			err = -EIO;
 			goto unlock;
 		}
+		if (!PageUptodate(page))
+			goto again;
 		if (page_has_buffers(page))
 			goto has_buffers;
 	}
Index: linux-2.6/fs/splice.c
===================================================================
--- linux-2.6.orig/fs/splice.c
+++ linux-2.6/fs/splice.c
@@ -112,11 +112,16 @@ static int page_cache_pipe_buf_confirm(s
 		/*
 		 * Uh oh, read-error from disk.
 		 */
-		if (!PageUptodate(page)) {
+		if (PageError(page)) {
 			err = -EIO;
 			goto error;
 		}
 
+		if (!PageUptodate(page)) {
+			err = -ENODATA;
+			goto error;
+		}
+
 		/*
 		 * Page is ok afterall, we are done.
 		 */
Index: linux-2.6/fs/mpage.c
===================================================================
--- linux-2.6.orig/fs/mpage.c
+++ linux-2.6/fs/mpage.c
@@ -53,7 +53,11 @@ static void mpage_end_io_read(struct bio
 		if (uptodate) {
 			SetPageUptodate(page);
 		} else {
-			ClearPageUptodate(page);
+			if (PageUptodate(page)) {
+				/* let's get rid of this case ASAP */
+				WARN_ON_ONCE(1);
+				ClearPageUptodate(page);
+			}
 			SetPageError(page);
 		}
 		unlock_page(page);
Index: linux-2.6/mm/page_io.c
===================================================================
--- linux-2.6.orig/mm/page_io.c
+++ linux-2.6/mm/page_io.c
@@ -77,7 +77,10 @@ void end_swap_bio_read(struct bio *bio, 
 
 	if (!uptodate) {
 		SetPageError(page);
-		ClearPageUptodate(page);
+		if (PageUptodate(page)) {
+			WARN_ON_ONCE(1);
+			ClearPageUptodate(page);
+		}
 		printk(KERN_ALERT "Read-error on swap-device (%u:%u:%Lu)\n",
 				imajor(bio->bi_bdev->bd_inode),
 				iminor(bio->bi_bdev->bd_inode),
Index: linux-2.6/mm/page-writeback.c
===================================================================
--- linux-2.6.orig/mm/page-writeback.c
+++ linux-2.6/mm/page-writeback.c
@@ -936,8 +936,10 @@ retry:
 				unlock_page(page);
 				ret = 0;
 			}
-			if (ret || (--nr_to_write <= 0))
+			if (ret || (--nr_to_write <= 0)) {
 				done = 1;
+				break;
+			}
 			if (wbc->nonblocking && bdi_write_congested(bdi)) {
 				wbc->encountered_congestion = 1;
 				done = 1;

next             reply	other threads:[~2008-10-21 11:21 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-10-21 11:21 Nick Piggin [this message]
2008-10-21 12:52 ` [patch] fs: improved handling of page and buffer IO errors Miklos Szeredi
2008-10-21 12:59   ` steve
2008-10-21 13:14     ` Miklos Szeredi
2008-10-21 13:38       ` steve
2008-10-21 14:32         ` Miklos Szeredi
2008-10-21 15:09           ` steve
2008-10-21 16:13             ` Miklos Szeredi
2008-10-22 12:51               ` Jamie Lokier
2008-10-22 14:08                 ` Miklos Szeredi
2008-10-21 14:35         ` Evgeniy Polyakov
2008-10-21 14:59           ` steve
2008-10-21 16:20             ` Miklos Szeredi
2008-10-21 16:25               ` steve
2008-10-21 16:28               ` Miklos Szeredi
2008-10-21 16:29                 ` Matthew Wilcox
2008-10-22 12:48                   ` Jamie Lokier
2008-10-22 13:45                     ` Matthew Wilcox
2008-10-22 14:02                       ` Miklos Szeredi
2008-10-22 14:35                         ` Matthew Wilcox
2008-10-22 14:45                           ` Miklos Szeredi
2008-10-23 13:48                             ` Matthew Wilcox
2008-10-22 22:23     ` Mark Fasheh
2008-10-23  9:59       ` steve
2008-10-23 10:21         ` Nick Piggin
2008-10-23 10:52           ` steve
2008-10-23 11:07             ` Nick Piggin
2008-10-22 13:16   ` Nick Piggin
2008-10-22 20:09     ` Miklos Szeredi
2008-10-21 16:16 ` Andi Kleen
2008-10-21 16:30   ` steve
2008-10-22 10:31   ` Nick Piggin
2008-10-22 18:46     ` Brad Boyer
2008-10-22 20:19       ` Andi Kleen
2008-10-23  7:08       ` Nick Piggin
2008-10-22 23:07     ` Dave Chinner
2008-10-23  7:07       ` Nick Piggin
2008-10-23  9:44         ` steve
2008-10-23 11:15           ` Nick Piggin
2008-10-23 22:48             ` Dave Chinner
2008-10-24  1:05               ` Nick Piggin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20081021112137.GB12329@wotan.suse.de \
    --to=npiggin@suse.de \
    --cc=akpm@linux-foundation.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).