From: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-fsdevel@vger.kernel.org, tux3@tux3.org,
Al Viro <viro@zeniv.linux.org.uk>,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH] Optimize wait_sb_inodes()
Date: Thu, 27 Jun 2013 09:14:07 +0900 [thread overview]
Message-ID: <87wqpg76ls.fsf@devron.myhome.or.jp> (raw)
In-Reply-To: <20130626231143.GC28426@dastard> (Dave Chinner's message of "Thu, 27 Jun 2013 09:11:43 +1000")
Dave Chinner <david@fromorbit.com> writes:
>> On another view, wait_sb_inodes() would (arguably) be necessary for
>> legacy FSes. But, for example, if data=journal on ext*, wait_sb_inodes()
>> would be more than useless, because ext* can be done it by own
>> transaction list (and more efficient way).
>>
>> Likewise, on tux3, the state is same with data=journal.
>>
>> Also, even if data=ordered, ext* might be able to check in-flight I/O by
>> ordered data list (with some new additional check, I'm not sure).
>
> Why would you bother solving this problem differently in every
> single filesystem? It's solvable at the VFS by tracking inodes that
> are no longer dirty but still under writeback on the BDI. Then
> converting wait_sb_inodes() to walk all the dirty and writeback
> inodes would be sufficient for data integrity purposes, and it would
> be done under the bdi writeback lock, not the inode_sb_list_lock....
>
> Alternatively, splitting up the inode sb list and lock (say via the
> per-node list_lru structures in -mm and -next that are being added
> for exactly this purpose) would also significantly reduce lock
> contention on both the create/evict fast paths and the
> wait_sb_inodes() walk that is currently done....
>
> So I think that you should address the problem properly at the VFS
> level so everyone benefits, not push interfaces that allow
> filesystem specific hacks to work around VFS level deficiencies...
Optimizing wait_sb_inodes() might help lock contention, but it doesn't
help unnecessary wait/check. Since some FSes know about current
in-flight I/O already in those internal, so I think, those FSes can be
done it here, or are already doing in ->sync_fs().
For example, I guess ext4 implement (untested) would be something like
following. If ->sync_fs() does all, ext4 doesn't need to be bothered by
wait_sb_inodes().
static void ext4_wait_inodes(struct super_block *sb)
{
/* ->sync_fs() guarantees to wait all */
if (test_opt(inode->i_sb, DATA_FLAGS) == EXT4_MOUNT_JOURNAL_DATA)
return;
/* FIXME: On data=ordered, we might be able to avoid this too. */
wait_sb_inodes(sb);
}
Thanks.
--
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
WARNING: multiple messages have this Message-ID (diff)
From: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
To: Dave Chinner <david@fromorbit.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
tux3@tux3.org
Subject: Re: [PATCH] Optimize wait_sb_inodes()
Date: Thu, 27 Jun 2013 09:14:07 +0900 [thread overview]
Message-ID: <87wqpg76ls.fsf@devron.myhome.or.jp> (raw)
In-Reply-To: <20130626231143.GC28426@dastard> (Dave Chinner's message of "Thu, 27 Jun 2013 09:11:43 +1000")
Dave Chinner <david@fromorbit.com> writes:
>> On another view, wait_sb_inodes() would (arguably) be necessary for
>> legacy FSes. But, for example, if data=journal on ext*, wait_sb_inodes()
>> would be more than useless, because ext* can be done it by own
>> transaction list (and more efficient way).
>>
>> Likewise, on tux3, the state is same with data=journal.
>>
>> Also, even if data=ordered, ext* might be able to check in-flight I/O by
>> ordered data list (with some new additional check, I'm not sure).
>
> Why would you bother solving this problem differently in every
> single filesystem? It's solvable at the VFS by tracking inodes that
> are no longer dirty but still under writeback on the BDI. Then
> converting wait_sb_inodes() to walk all the dirty and writeback
> inodes would be sufficient for data integrity purposes, and it would
> be done under the bdi writeback lock, not the inode_sb_list_lock....
>
> Alternatively, splitting up the inode sb list and lock (say via the
> per-node list_lru structures in -mm and -next that are being added
> for exactly this purpose) would also significantly reduce lock
> contention on both the create/evict fast paths and the
> wait_sb_inodes() walk that is currently done....
>
> So I think that you should address the problem properly at the VFS
> level so everyone benefits, not push interfaces that allow
> filesystem specific hacks to work around VFS level deficiencies...
Optimizing wait_sb_inodes() might help lock contention, but it doesn't
help unnecessary wait/check. Since some FSes know about current
in-flight I/O already in those internal, so I think, those FSes can be
done it here, or are already doing in ->sync_fs().
For example, I guess ext4 implement (untested) would be something like
following. If ->sync_fs() does all, ext4 doesn't need to be bothered by
wait_sb_inodes().
static void ext4_wait_inodes(struct super_block *sb)
{
/* ->sync_fs() guarantees to wait all */
if (test_opt(inode->i_sb, DATA_FLAGS) == EXT4_MOUNT_JOURNAL_DATA)
return;
/* FIXME: On data=ordered, we might be able to avoid this too. */
wait_sb_inodes(sb);
}
Thanks.
--
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
next prev parent reply other threads:[~2013-06-27 0:14 UTC|newest]
Thread overview: 30+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-06-26 8:45 [PATCH] Optimize wait_sb_inodes() OGAWA Hirofumi
2013-06-26 8:45 ` OGAWA Hirofumi
2013-06-26 14:32 ` Jörn Engel
2013-06-27 0:01 ` OGAWA Hirofumi
2013-06-27 0:01 ` OGAWA Hirofumi
2013-06-26 23:11 ` Dave Chinner
2013-06-27 0:14 ` OGAWA Hirofumi [this message]
2013-06-27 0:14 ` OGAWA Hirofumi
2013-06-27 4:47 ` Dave Chinner
2013-06-27 5:18 ` OGAWA Hirofumi
2013-06-27 5:18 ` OGAWA Hirofumi
2013-06-27 6:38 ` Dave Chinner
2013-06-27 7:22 ` OGAWA Hirofumi
2013-06-27 7:22 ` OGAWA Hirofumi
2013-06-27 9:40 ` Dave Chinner
2013-06-27 10:06 ` OGAWA Hirofumi
2013-06-27 10:06 ` OGAWA Hirofumi
2013-06-27 18:36 ` Theodore Ts'o
2013-06-27 23:37 ` OGAWA Hirofumi
2013-06-27 23:37 ` OGAWA Hirofumi
2013-06-27 23:45 ` Theodore Ts'o
2013-06-28 0:30 ` OGAWA Hirofumi
2013-06-28 0:30 ` OGAWA Hirofumi
2013-06-28 5:23 ` Dave Chinner
2013-06-28 7:39 ` OGAWA Hirofumi
2013-06-28 7:39 ` OGAWA Hirofumi
2013-06-28 3:00 ` Daniel Phillips
2013-06-27 7:22 ` Daniel Phillips
2013-06-27 5:50 ` Daniel Phillips
2013-06-27 5:50 ` Daniel Phillips
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87wqpg76ls.fsf@devron.myhome.or.jp \
--to=hirofumi@mail.parknet.co.jp \
--cc=david@fromorbit.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=tux3@tux3.org \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.