Re: [patch 0/9] writeback data integrity and other fixes (take 3)

From: Nick Piggin <npiggin@suse.de>
To: akpm@linux-foundation.org, xfs@oss.sgi.com
Cc: linux-fsdevel@vger.kernel.org, Dave Chinner <david@fromorbit.com>,
	Chris Mason <chris.mason@oracle.com>
Subject: Re: [patch 0/9] writeback data integrity and other fixes (take 3)
Date: Tue, 28 Oct 2008 16:39:53 +0100
Message-ID: <20081028153953.GB3082@wotan.suse.de>
In-Reply-To: <20081028144715.683011000@suse.de>

On Wed, Oct 29, 2008 at 01:47:15AM +1100, npiggin@suse.de wrote:
> OK, I'm happier with this patchset now. Note that I've taken your patch
> and mangled it a bit at the end of the series.
> 
> This one survives and seems to run OK here, but I'm mainly doing dumb
> stress testing with a handful of filesystems, and data-io error injection
> testing. Obviously there are many combinations of ways this function can
> operate and interact, so more review would be helpful.
> 
> Chris, would you possibly have time to run your btrfs tests that are
> sensitive to problems in this code? I could provide you a single patch
> rollup against mainline if it helps.

BTW, XFS seems to be doing something odd with my simple sync test case
under I/O error injection. I map a file MAP_SHARED into a number of
processes, each of which runs a loop that dirties the memory and then
calls msync(MS_SYNC) on the range.
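A minimal C sketch of the kind of test described above, assuming a POSIX system. This is not the actual test program: the helper names, file path, mapping size, process count and iteration count are all hypothetical. Each forked child dirties the shared mapping and calls msync(MS_SYNC) on it, counting how many EIO results actually reach userspace:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper sketching the test described in the mail; not the
 * original program. One writer: dirty every byte of the shared mapping,
 * then msync(MS_SYNC) the whole range. Returns how many msync() calls
 * failed with EIO, or -1 on any other error. */
static int writer_loop(char *map, size_t len, int iterations, int id)
{
    int eio = 0;
    for (int i = 0; i < iterations; i++) {
        memset(map, 'a' + (id + i) % 26, len);   /* dirty all pages */
        if (msync(map, len, MS_SYNC) != 0) {
            if (errno != EIO)
                return -1;                       /* unexpected failure */
            eio++;                               /* error reached userspace */
        }
    }
    return eio;
}

/* Hypothetical driver: map `path` MAP_SHARED, fork `nproc` writers that
 * share the mapping, and return the total number of EIO results seen by
 * userspace (each child's count is capped at 254 by its exit status),
 * or -1 on any setup or writer error. */
static int run_msync_test(const char *path, size_t len, int nproc, int iterations)
{
    assert(len > 0 && nproc > 0 && iterations > 0);

    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return -1;
    }
    char *map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                      /* the mapping keeps the file alive */
    if (map == MAP_FAILED)
        return -1;

    int total = 0;
    for (int p = 0; p < nproc; p++) {
        pid_t pid = fork();
        if (pid < 0) {
            total = -1;
            break;
        }
        if (pid == 0) {             /* child: run the writer and report */
            int r = writer_loop(map, len, iterations, p);
            _exit(r < 0 ? 255 : (r > 254 ? 254 : r));
        }
    }

    int status;
    while (wait(&status) > 0) {     /* reap every child */
        if (total < 0)
            continue;
        if (!WIFEXITED(status) || WEXITSTATUS(status) == 255)
            total = -1;
        else
            total += WEXITSTATUS(status);
    }
    munmap(map, len);
    return total;
}
```

Without error injection this should return 0. With injection enabled on the backing device, a nonzero count is what the observations below suggest for ext2, while the XFS runs described here never produced one.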

AFAICS, ext2 mostly reports -EIO back to userspace when a failure is
injected. ext3 (ordered mode) doesn't until a lot of errors have been
injected, but it eventually reports -EIO and shuts down the filesystem.
reiserfs seems to report failures more consistently.

I haven't seen msync() return -EIO on XFS at all... maybe I'm just not
doing the right thing, or there is a caveat I'm not aware of.

All fault injections I noticed had a trace like this:
FAULT_INJECTION: forcing a failure
Call Trace:
9f9cd758:  [<6019f1de>] random32+0xe/0x20
9f9cd768:  [<601a31b9>] should_fail+0xd9/0x130
9f9cd798:  [<6018d0c4>] generic_make_request+0x304/0x4e0
9f9cd7a8:  [<60062301>] mempool_alloc+0x51/0x130
9f9cd858:  [<6018e6bf>] submit_bio+0x4f/0xe0
9f9cd8a8:  [<60165505>] xfs_submit_ioend_bio+0x25/0x40
9f9cd8c8:  [<6016603c>] xfs_submit_ioend+0xbc/0xf0
9f9cd908:  [<60166bf9>] xfs_page_state_convert+0x3d9/0x6a0
9f9cd928:  [<6005d515>] delayacct_end+0x95/0xb0
9f9cda08:  [<60166ffd>] xfs_vm_writepage+0x6d/0x110
9f9cda18:  [<6006618b>] set_page_dirty+0x4b/0xd0
9f9cda58:  [<60066115>] __writepage+0x15/0x40
9f9cda78:  [<60066775>] write_cache_pages+0x255/0x470
9f9cda90:  [<60066100>] __writepage+0x0/0x40
9f9cdb98:  [<600669b0>] generic_writepages+0x20/0x30
9f9cdba8:  [<60165ba3>] xfs_vm_writepages+0x53/0x70
9f9cdbd8:  [<600669eb>] do_writepages+0x2b/0x40
9f9cdbf8:  [<6006004c>] __filemap_fdatawrite_range+0x5c/0x70
9f9cdc58:  [<6006026a>] filemap_fdatawrite+0x1a/0x20
9f9cdc68:  [<600a7a05>] do_fsync+0x45/0xe0
9f9cdc98:  [<6007794b>] sys_msync+0x14b/0x1d0
9f9cdcf8:  [<60019a70>] handle_syscall+0x50/0x80
9f9cdd18:  [<6002a10f>] userspace+0x44f/0x510
9f9cdfc8:  [<60016792>] fork_handler+0x62/0x70

And the kernel would sometimes say this:
Buffer I/O error on device ram0, logical block 279
lost page write due to I/O error on ram0
Buffer I/O error on device ram0, logical block 379
lost page write due to I/O error on ram0
Buffer I/O error on device ram0, logical block 389
lost page write due to I/O error on ram0


I think I also hit a slab bug (a double free in the xfs_log_ticket cache)
when running dbench with fault injection enabled, on the latest Linus kernel:

bash-3.1# dbench -t10 -c ../client.txt 8
dbench version 3.04 - Copyright Andrew Tridgell 1999-2004

Running for 10 seconds with load '../client.txt' and minimum warmup 2 secs
8 clients started
FAULT_INJECTION: forcing a failure
Call Trace:
9e7bb548:  [<601623ae>] random32+0xe/0x20
9e7bb558:  [<60166389>] should_fail+0xd9/0x130
9e7bb588:  [<60150294>] generic_make_request+0x304/0x4e0
9e7bb598:  [<60062301>] mempool_alloc+0x51/0x130
9e7bb648:  [<6015188f>] submit_bio+0x4f/0xe0
9e7bb698:  [<6012b440>] _xfs_buf_ioapply+0x180/0x2a0
9e7bb6a0:  [<6002f600>] default_wake_function+0x0/0x10
9e7bb6f8:  [<6012bae1>] xfs_buf_iorequest+0x31/0x90
9e7bb718:  [<60112f75>] xlog_bdstrat_cb+0x45/0x50
9e7bb738:  [<60114135>] xlog_sync+0x195/0x440
9e7bb778:  [<60114491>] xlog_state_release_iclog+0xb1/0xc0
9e7bb7a8:  [<60114ca9>] xlog_write+0x539/0x550
9e7bb858:  [<60114e60>] xfs_log_write+0x40/0x60
9e7bb888:  [<6011fbaa>] _xfs_trans_commit+0x19a/0x360
9e7bb8b8:  [<600838e2>] poison_obj+0x42/0x60
9e7bb8d0:  [<60082cb3>] dbg_redzone1+0x13/0x30
9e7bb8e8:  [<60083999>] cache_alloc_debugcheck_after+0x99/0x1c0
9e7bb918:  [<6008517b>] kmem_cache_alloc+0x8b/0x100
9e7bb958:  [<60128084>] kmem_zone_alloc+0x74/0xe0
9e7bb998:  [<60082ad9>] kmem_cache_size+0x9/0x10
9e7bb9a8:  [<60128124>] kmem_zone_zalloc+0x34/0x50
9e7bb9e8:  [<60121e8b>] xfs_dir_ialloc+0x13b/0x2e0
9e7bba58:  [<601f534b>] __down_write+0xb/0x10
9e7bbaa8:  [<60125b9e>] xfs_mkdir+0x37e/0x4b0
9e7bbb38:  [<601f5589>] _spin_unlock+0x9/0x10
9e7bbb78:  [<601301a4>] xfs_vn_mknod+0xf4/0x1a0
9e7bbbd8:  [<6013025e>] xfs_vn_mkdir+0xe/0x10
9e7bbbe8:  [<60091010>] vfs_mkdir+0x90/0xc0
9e7bbc18:  [<600934d6>] sys_mkdirat+0x106/0x120
9e7bbc88:  [<6008629b>] filp_close+0x4b/0x80
9e7bbce8:  [<60093503>] sys_mkdir+0x13/0x20
9e7bbcf8:  [<60019a70>] handle_syscall+0x50/0x80
9e7bbd18:  [<6002a10f>] userspace+0x44f/0x510
9e7bbfc8:  [<60016792>] fork_handler+0x62/0x70

I/O error in filesystem ("ram0") meta-data dev ram0 block 0x8002c       ("xlog_iodone") error 5 buf count 32768
xfs_force_shutdown(ram0,0x2) called from line 1056 of file /home/npiggin/usr/src/linux-2.6/fs/xfs/xfs_log.c.  Return address = 0x000000006011370d
Filesystem "ram0": Log I/O Error Detected.  Shutting down filesystem: ram0
Please umount the filesystem, and rectify the problem(s)
xfs_force_shutdown(ram0,0x2) called from line 818 of file /home/npiggin/usr/src/linux-2.6/fs/xfs/xfs_log.c.  Return address = 0x0000000060114e7d
slab error in verify_redzone_free(): cache `xfs_log_ticket': double free detected
Call Trace:
9e7bb998:  [<6008372f>] __slab_error+0x1f/0x30
9e7bb9a8:  [<60083cae>] cache_free_debugcheck+0x1ee/0x240
9e7bb9b0:  [<60112ef0>] xlog_ticket_put+0x10/0x20
9e7bb9e8:  [<60083f70>] kmem_cache_free+0x50/0xc0
9e7bba18:  [<60112ef0>] xlog_ticket_put+0x10/0x20
9e7bba28:  [<60114dc9>] xfs_log_done+0x59/0xb0
9e7bba68:  [<6011f5de>] xfs_trans_cancel+0x7e/0x140
9e7bbaa8:  [<60125a1e>] xfs_mkdir+0x1fe/0x4b0
9e7bbb38:  [<601f5589>] _spin_unlock+0x9/0x10
9e7bbb78:  [<601301a4>] xfs_vn_mknod+0xf4/0x1a0
9e7bbbd8:  [<6013025e>] xfs_vn_mkdir+0xe/0x10
9e7bbbe8:  [<60091010>] vfs_mkdir+0x90/0xc0
9e7bbc18:  [<600934d6>] sys_mkdirat+0x106/0x120
9e7bbc88:  [<6008629b>] filp_close+0x4b/0x80
9e7bbce8:  [<60093503>] sys_mkdir+0x13/0x20
9e7bbcf8:  [<60019a70>] handle_syscall+0x50/0x80
9e7bbd18:  [<6002a10f>] userspace+0x44f/0x510
9e7bbfc8:  [<60016792>] fork_handler+0x62/0x70

000000009e0d4ec0: redzone 1:0x9f911029d74e35b, redzone 2:0x9f911029d74e35b.
(3) open ./clients/client1 failed for handle 16385 (No such file or directory)
(4) ERROR: handle 16385 was not found
Child failed with status 1

(kernel died soon afterwards)
