From: Ted Ts'o <tytso@mit.edu>
To: Ext4 Developers List <linux-ext4@vger.kernel.org>
Subject: Re: [PATCH,RFC 7/7] ext4: move ext4_journal_start/stop to mpage_da_map_and_submit()
Date: Thu, 17 Feb 2011 23:23:53 -0500 [thread overview]
Message-ID: <20110218042353.GA4923@thunk.org> (raw)
In-Reply-To: <1297556157-21559-8-git-send-email-tytso@mit.edu>
On Sat, Feb 12, 2011 at 07:15:57PM -0500, Theodore Ts'o wrote:
> Previously, ext4_da_writepages() was responsible for calling
> ext4_journal_start() and ext4_journal_stop(). If the blocks had
> already been allocated (we don't support journal=data in
> ext4_da_writepages), then there's no need to start a new journal
> handle.
>
> By moving ext4_journal_start/stop calls to mpage_da_map_and_submit()
> we should significantly reduce the cpu usage (and cache line bouncing)
> if the journal is enabled. This should (hopefully!) be especially
> noticeable on large SMP systems.
>
> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Argh, it turns out this doesn't work. I was getting sporadic
deadlocks and I finally figured out the problem. If a process is
holding page locks, it can't call ext4_journal_start() safely in
data=ordered, since there's a chance that there won't be enough
transaction credits and a new transaction will be started. And at
that point, in data=ordered mode, we may end up calling
journal_submit_inode_data_buffers(), which could try to write back the
inode pages in question --- which are already locked.
This means that we need to start the journal handle long before we
know whether or not we really need it. Boo, hiss!
The only way to solve this problem is to do what I've been planning
all for a while, which is to add support in ext4_map_blocks() for a
mode where it will allocate a region of blocks, but *not* update the
extent map. It will have to store the allocation in an in-memory
cache, so that if other CPU's try to request a logical block, it will
get the right answer. However, the actual on-disk extent map can't be
updated until *after* the data is safely written on disk (and the
pages can thus be unlocked).
Once we do that, we'll also be able to ditch ordered mode for good,
since it means that there won't be any chance of stale data being
revealed, without any of performance disasters involved with
data=ordered mode.
I have no idea what these changes will do to Amir's snapshot plans,
but sorry, getting this right is going to be higher priority.
I may end up submitting the rest of this patch series without this
last patch, since it does clean up the code paths a lot, and it should
result in a few small performance improvements --- the big performance
improvement, found in this patch, we'll have to skip until we can fix
up the writeback submission.
- Ted
next prev parent reply other threads:[~2011-02-18 4:23 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-02-13 0:15 [PATCH,RFC 0/7] Simplify buffered write submissions, part II Theodore Ts'o
2011-02-13 0:15 ` [PATCH,RFC 1/7] ext4: fold __mpage_da_writepage() into write_cache_pages_da() Theodore Ts'o
2011-02-13 1:25 ` Josef Bacik
2011-02-13 5:42 ` Ted Ts'o
2011-02-13 12:48 ` Josef Bacik
2011-02-13 0:15 ` [PATCH,RFC 2/7] ext4: simple cleanups to write_cache_pages_da() Theodore Ts'o
2011-02-13 1:31 ` Josef Bacik
2011-02-13 0:15 ` [PATCH,RFC 3/7] ext4: clear the dirty bit for a page in writeback at the last minute Theodore Ts'o
2011-02-13 1:34 ` Josef Bacik
2011-02-13 0:15 ` [PATCH,RFC 4/7] ext4: remove page_skipped hackery in ext4_da_writepages() Theodore Ts'o
2011-02-13 1:36 ` Josef Bacik
2011-02-13 0:15 ` [PATCH,RFC 5/7] ext4: don't lock the next page in write_cache_pages if not needed Theodore Ts'o
2011-02-13 1:38 ` Josef Bacik
2011-02-13 0:15 ` [PATCH,RFC 6/7] ext4: move setup of the mpd structure to write_cache_pages_da() Theodore Ts'o
2011-02-13 1:40 ` Josef Bacik
2011-02-13 0:15 ` [PATCH,RFC 7/7] ext4: move ext4_journal_start/stop to mpage_da_map_and_submit() Theodore Ts'o
2011-02-18 4:23 ` Ted Ts'o [this message]
2011-02-18 10:42 ` Amir Goldstein
2011-02-18 11:44 ` Yongqiang Yang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20110218042353.GA4923@thunk.org \
--to=tytso@mit.edu \
--cc=linux-ext4@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).