From: Dave Chinner <david@fromorbit.com>
To: Jan Kara <jack@suse.cz>
Cc: linux-fsdevel@vger.kernel.org,
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>,
Wu Fengguang <fengguang.wu@intel.com>
Subject: Re: [RFC PATCH 00/14] Per-sb tracking of dirty inodes
Date: Tue, 5 Aug 2014 18:20:14 +1000
Message-ID: <20140805082014.GE20518@dastard>
In-Reply-To: <1406844053-25982-1-git-send-email-jack@suse.cz>
On Fri, Aug 01, 2014 at 12:00:39AM +0200, Jan Kara wrote:
> Hello,
>
> here is my attempt to implement per superblock tracking of dirty inodes.
> I have two motivations for this:
> 1) I've tried to get rid of overwriting of inode's dirty time stamp during
> writeback and filtering of dirty inodes by superblock makes this
> significantly harder. For similar reasons also improving scalability
> of inode dirty tracking is more complicated than it has to be.
> 2) Filesystems like Tux3 (but to some extent also XFS) would like to
> influence order in which inodes are written back. Currently this isn't
> possible. Tracking dirty inodes per superblock makes it easy to later
> implement filesystem callback for writing back inodes and also possibly
> allow filesystems to implement their own dirty tracking if they desire.
>
> The patches pass xfstests run and also some sync livelock avoidance tests
> I have with 4 filesystems on 2 disks so they should be reasonably sound.
> Before I go and base more work on this I'd like to hear some feedback about
> whether people find this sane and workable.
>
> After this patch set it is trivial to provide a per-sb callback for writeback
> (at level of writeback_inodes()). It is also fairly easy to allow filesystem to
> completely override dirty tracking (only needs some restructuring of
> mark_inode_dirty()). I can write these as proof-of-concept patches for Tux3
> guys once the general approach in this patch set is acked. Or if there are
> some in-tree users (XFS?, btrfs?) I can include them in the patch set.
>
> Any comments welcome!
Hi Jan,
This fails within seconds when running generic/013 on a debug XFS. An
inode is being dirtied, and it is not getting written back before
unmount evicts the inode from the cache. Hence, on a CONFIG_XFS_DEBUG=y
kernel, an assert fails like so:
[ 227.620732] XFS: Assertion failed: XFS_FORCED_SHUTDOWN(ip->i_mount) || ip->i_delayed_blks == 0, file: fs/xfs/xfs_super.c, line: 963
[ 227.622506] ------------[ cut here ]------------
[ 227.623212] kernel BUG at fs/xfs/xfs_message.c:107!
[ 227.623947] invalid opcode: 0000 [#1] SMP
[ 227.624606] Modules linked in:
[ 227.624724] CPU: 0 PID: 4878 Comm: umount Not tainted 3.16.0-dgc+ #371
[ 227.624724] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
[ 227.624724] task: ffff880035973160 ti: ffff880031c14000 task.ti: ffff880031c14000
[ 227.624724] RIP: 0010:[<ffffffff814d1d32>] [<ffffffff814d1d32>] assfail+0x22/0x30
[ 227.624724] RSP: 0018:ffff880031c17d88 EFLAGS: 00010282
[ 227.624724] RAX: 0000000000000077 RBX: 0000000000000005 RCX: 000000000000e6e4
[ 227.624724] RDX: 000000000000e4e4 RSI: 0000000000000046 RDI: 0000000000000246
[ 227.624724] RBP: ffff880031c17d88 R08: 000000000000000a R09: 00000000000001e2
[ 227.624724] R10: 0000000000000000 R11: ffff880031c17a3e R12: ffff880032774000
[ 227.624724] R13: ffff880032774200 R14: 0000000000000005 R15: ffff880032774040
[ 227.624724] FS: 00007f2b0424c840(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
[ 227.624724] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 227.624724] CR2: 0000000000415048 CR3: 000000003c061000 CR4: 00000000000006f0
[ 227.624724] Stack:
[ 227.624724] ffff880031c17df0 ffffffff814d5128 8000000000173600 0000043c95800034
[ 227.624724] 0000000000000b9b 000000000021e4ac 0000000000000034 ffff880000000001
[ 227.624724] ffff880032774200 ffff880032774288 ffffffff81d7a6e0 ffff880031c17e58
[ 227.624724] Call Trace:
[ 227.624724] [<ffffffff814d5128>] xfs_fs_destroy_inode+0x198/0x1f0
[ 227.624724] [<ffffffff811c0c88>] destroy_inode+0x38/0x60
[ 227.624724] [<ffffffff811c0dc3>] evict+0x113/0x180
[ 227.624724] [<ffffffff811c0e69>] dispose_list+0x39/0x50
[ 227.624724] [<ffffffff811c1bcc>] evict_inodes+0x11c/0x130
[ 227.624724] [<ffffffff811a9118>] generic_shutdown_super+0x48/0xf0
[ 227.624724] [<ffffffff811a94ec>] kill_block_super+0x3c/0x90
[ 227.624724] [<ffffffff811a9819>] deactivate_locked_super+0x49/0x60
[ 227.624724] [<ffffffff811a9dc6>] deactivate_super+0x46/0x60
[ 227.624724] [<ffffffff811c53b6>] mntput_no_expire+0xd6/0x170
[ 227.624724] [<ffffffff811c692e>] SyS_umount+0x8e/0x100
[ 227.624724] [<ffffffff81ccbaa9>] system_call_fastpath+0x16/0x1b
[ 227.624724] Code: 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 f1 41 89 d0 48 89 e5 48 89 fa 48 c7 c6 e8 62 11 82 31 ff 31 c0 e8 ce fb ff ff <0f> 0b 66 66 66
[ 227.624724] RIP [<ffffffff814d1d32>] assfail+0x22/0x30
[ 227.624724] RSP <ffff880031c17d88>
[ 227.658382] ---[ end trace 3836149aa028dbf6 ]---
i.e. there are still delayed allocation blocks attached to the
inode. Tracing writeback shows that the inode is definitely dirtying
the page cache for every buffer and page dirtied, but no data
writeback occurs on that inode between the time it is last
dirtied and the time unmount evicts it.
I'll look into it some more, but it's happening from multiple
different "last dirtied" locations in XFS (buffered writes, sub-page
zeroing in FALLOC_FL_ZERO_RANGE, EOF zeroing when truncate extends
the file, etc.), so it doesn't appear to me to be an XFS bug. Hence
you might find it faster than I will. ;)
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com