linux-fsdevel.vger.kernel.org archive mirror
From: Nick Piggin <npiggin@suse.de>
To: akpm@linux-foundation.org, xfs@oss.sgi.com,
	linux-fsdevel@vger.kernel.org,
	Chris Mason <chris.mason@oracle.com>
Subject: Re: [patch 0/9] writeback data integrity and other fixes (take 3)
Date: Wed, 29 Oct 2008 01:04:24 +0100
Message-ID: <20081029000424.GD15599@wotan.suse.de>
In-Reply-To: <20081028222746.GB4985@disturbed>

On Wed, Oct 29, 2008 at 09:27:46AM +1100, Dave Chinner wrote:
> On Tue, Oct 28, 2008 at 04:39:53PM +0100, Nick Piggin wrote:
> > On Wed, Oct 29, 2008 at 01:47:15AM +1100, npiggin@suse.de wrote:
> > I haven't seen any -EIO failures from XFS... maybe I'm just not doing the
> > right thing, or there is a caveat I'm not aware of.
> > 
> > All fault injections I noticed had a trace like this:
[...]
> 
> XFS reports bio errors through the I/O completion path, not the
> submission path.

Right, that's just to give you an indication of where it's failing...

 
> > And the kernel would sometimes say this:
> > Buffer I/O error on device ram0, logical block 279
> > lost page write due to I/O error on ram0
> > Buffer I/O error on device ram0, logical block 379
> > lost page write due to I/O error on ram0
> > Buffer I/O error on device ram0, logical block 389
> > lost page write due to I/O error on ram0
> 
> Yes - that's coming from end_buffer_async_write() when an error is
> reported in bio completion. This does:
> 
>                 set_bit(AS_EIO, &page->mapping->flags);
>                 set_buffer_write_io_error(bh);
>                 clear_buffer_uptodate(bh);
>                 SetPageError(page);
> 
> Hmmmm - do_fsync() calls filemap_fdatawait() which ends up in
> wait_on_page_writeback_range() which appears to be checking the
> mapping flags for errors. I wonder why that error is not being
> propagated then? AFAICT both XFS and the fsync code are doing the
> right thing but somewhere the error has gone missing...

Yeah, I couldn't immediately see why nothing comes out. I'll do a bit
more digging.
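
For readers following along, here is a minimal user-space sketch of the
hand-off described above: the completion path records the error on the
mapping, and the wait path later reports it. All toy_* names are
hypothetical stand-ins, not kernel definitions, and the test-and-clear
in the wait path is an assumption about how the flag is consumed. This
only models the mechanism; it is not a diagnosis of where the error
goes missing.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the AS_EIO mapping flag. */
enum { TOY_AS_EIO = 1u << 0 };

/* Hypothetical stand-in for an address_space; only the flags matter here. */
struct toy_mapping {
	unsigned int flags;
};

/* Completion path: a failed write is recorded on the mapping. */
void toy_end_write(struct toy_mapping *m, bool io_failed)
{
	assert(m != NULL);
	if (io_failed)
		m->flags |= TOY_AS_EIO;
}

/*
 * Wait path: report a recorded error and clear it (assumed behaviour),
 * so each recorded error is returned to one caller only.
 */
int toy_fdatawait(struct toy_mapping *m)
{
	assert(m != NULL);
	if (m->flags & TOY_AS_EIO) {
		m->flags &= ~TOY_AS_EIO;
		return -EIO;
	}
	return 0;
}
```

In this model the first toy_fdatawait() after a failed write returns
-EIO and a second call returns 0, because the flag has been consumed.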

 
> > I think I also saw a slab bug when running dbench with fault injection on.
> > Running latest Linus kernel.
[...]
>
> Now that is interesting.
> 
> We've got a rolling transaction in progress, and the commit of the
> first part of the transaction has got the I/O error.  That frees the
> transaction structure used during that commit, as well as the
> ticket.
> 
> However, before we committed the initial transaction, we duplicated
> the transaction structure to allow the transaction to continue to
> track all the dirty objects in the first commit. That included
> duplicating the pointer to the ticket.
> 
> Then the EIO is returned to mkdir code with the duplicated
> transaction, which is then cancelled, and that frees the transaction
> and the ticket it holds. However, we'd already freed the ticket.
> 
> Ok, we're only seeing this problem now because I recently modified
> the ticket allocation to use a slab instead of a roll-your-own free
> list structure that wouldn't have been poisoned. Nice to know that
> this change did more than just remove code. ;)
> 
> This might take a little while to fix - a lot of code needs
> auditing - but thanks for reporting the problem.

No problem, hope it helps.
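
The sequence Dave describes can be sketched in user space roughly as
below. Every toy_* name is hypothetical and none of this is XFS code;
the small pool with an in_use flag merely stands in for the slab
poisoning that exposed the bug, so the second free is detected rather
than actually corrupting memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical ticket; in_use lets the toy allocator spot a double free. */
struct toy_ticket {
	bool in_use;
};

/* Hypothetical transaction that holds a pointer to its ticket. */
struct toy_trans {
	struct toy_ticket *ticket;
};

struct toy_ticket toy_pool[4];

struct toy_ticket *toy_ticket_alloc(void)
{
	for (size_t i = 0; i < sizeof(toy_pool) / sizeof(toy_pool[0]); i++) {
		if (!toy_pool[i].in_use) {
			toy_pool[i].in_use = true;
			return &toy_pool[i];
		}
	}
	return NULL;
}

/* Returns false if the ticket was already free, i.e. a double free. */
bool toy_ticket_free(struct toy_ticket *t)
{
	assert(t != NULL);
	if (!t->in_use)
		return false;
	t->in_use = false;
	return true;
}

/* Duplicating the transaction copies the ticket pointer, not the ticket. */
struct toy_trans toy_trans_dup(const struct toy_trans *tp)
{
	struct toy_trans ntp = { .ticket = tp->ticket };
	return ntp;
}

/* A commit that hits an I/O error releases the ticket it holds. */
bool toy_trans_commit_failed(struct toy_trans *tp)
{
	return toy_ticket_free(tp->ticket);
}

/* Cancelling the duplicate releases the same ticket a second time. */
bool toy_trans_cancel(struct toy_trans *tp)
{
	return toy_ticket_free(tp->ticket);
}
```

Allocating a ticket, duplicating the transaction, failing the commit on
the original and then cancelling the duplicate makes the last call
return false, which is the double free in miniature.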

Thanks,
Nick

