From: Andrew Morton <akpm@linux-foundation.org>
To: Alex Tomas <alex@clusterfs.com>
Cc: Andreas Dilger <adilger@clusterfs.com>,
Linus Torvalds <torvalds@linux-foundation.org>,
Marat Buharov <marat.buharov@gmail.com>,
Mike Galbraith <efault@gmx.de>,
LKML <linux-kernel@vger.kernel.org>,
Jens Axboe <jens.axboe@oracle.com>,
"linux-ext4@vger.kernel.org" <linux-ext4@vger.kernel.org>
Subject: Re: [ext3][kernels >= 2.6.20.7 at least] KDE going comatose when FS is under heavy write load (massive starvation)
Date: Fri, 4 May 2007 00:18:02 -0700 [thread overview]
Message-ID: <20070504001802.0e86e9dd.akpm@linux-foundation.org> (raw)
In-Reply-To: <463AD948.9090103@clusterfs.com>
On Fri, 04 May 2007 10:57:12 +0400 Alex Tomas <alex@clusterfs.com> wrote:
> Andrew Morton wrote:
> > On Fri, 04 May 2007 10:18:12 +0400 Alex Tomas <alex@clusterfs.com> wrote:
> >
> >> Andrew Morton wrote:
> >>> Yes, there can be issues with needing to allocate journal space within the
> >>> context of a commit. But
> >> no-no, this isn't required. we only need to mark pages/blocks within
> >> transaction, otherwise race is possible when we allocate blocks in transaction,
> >> then transacton starts to commit, then we mark pages/blocks to be flushed
> >> before commit.
> >
> > I don't understand. Can you please describe the race in more detail?
>
> if I understood your idea right, then in data=ordered mode, commit thread writes
> all dirty mapped blocks before real commit.
>
> say, we have two threads: t1 is a thread doing flushing and t2 is a commit thread
>
> t1 t2
> find dirty inode I
> find some dirty unallocated blocks
> journal_start()
> allocate blocks
> attach them to I
> journal_stop()
I'm still not understanding. The terms you're using are a bit ambiguous.
What does "find some dirty unallocated blocks" mean? Find a page which is
dirty and which does not have a disk mapping?
Normally the above operation would be implemented via
ext4_writeback_writepage(), and it runs under lock_page().
> going to commit
> find inode I dirty
> do NOT find these blocks because they're
> allocated only, but pages/bhs aren't mapped
> to them
> start commit
I think you're assuming here that commit would be using ->t_sync_datalist
to locate dirty buffer_heads.
But under this proposal, t_sync_datalist just gets removed: the new
ordered-data mode _only_ need to do the sb->inode->page walk. So if I'm
understanding you, the way in which we'd handle any such race is to make
kjournald's writeback of the dirty pages block in lock_page(). Once it
gets the page lock it can look to see if some other thread has mapped the
page to disk.
It may turn out that kjournald needs a private way of getting at the
I_DIRTY_PAGES inodes to do this properly, but I don't _think_ so. If we
had the radix-tree-of-dirty-inodes thing then that's easy enough to do
anyway, with a tagged search. But I expect that a single pass through the
superblock's dirty inodes would suffice for ordered-data. Files which
have chattr +j would screw things up, as usual.
I assume (hope) that your delayed allocation code implements
->writepages()? Doing the allocation one-page-at-a-time sounds painful...
>
> map pages/bhs to just allocate blocks
>
>
> so, either we mark pages/bhs someway within journal_start()--journal_stop() or
> commit thread should do lookup for all dirty pages. the latter doesn't sound nice, IMHO.
>
I don't think I'm understanding you fully yet.
next prev parent reply other threads:[~2007-05-04 7:18 UTC|newest]
Thread overview: 40+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <1177660767.6567.41.camel@Homer.simpson.net>
2007-04-27 8:33 ` [ext3][kernels >= 2.6.20.7 at least] KDE going comatose when FS is under heavy write load (massive starvation) Andrew Morton
2007-04-27 9:23 ` Mike Galbraith
2007-04-27 10:17 ` Mike Galbraith
2007-04-27 11:59 ` Marat Buharov
2007-04-27 12:30 ` Peter Zijlstra
2007-04-27 13:50 ` Mark Lord
2007-04-27 12:39 ` Manoj Joseph
2007-04-27 15:30 ` Linus Torvalds
2007-04-27 19:31 ` Andreas Dilger
2007-04-27 19:44 ` Mike Galbraith
2007-04-27 19:50 ` Linus Torvalds
2007-04-27 20:05 ` Hua Zhong
2007-04-27 20:12 ` Bill Huey
2007-04-28 5:37 ` Mikulas Patocka
2007-04-28 5:45 ` Mikulas Patocka
2007-04-28 21:57 ` Bill Huey
2007-04-28 22:38 ` Mikulas Patocka
2007-04-27 20:29 ` Gabriel C
2007-04-27 20:54 ` Manoj Joseph
2007-04-28 8:45 ` Matthias Andree
2007-04-27 22:18 ` Andrew Morton
2007-05-03 17:38 ` Alex Tomas
2007-05-03 23:54 ` Andrew Morton
2007-05-04 6:18 ` Alex Tomas
2007-05-04 6:38 ` Andrew Morton
2007-05-04 6:57 ` Alex Tomas
2007-05-04 7:18 ` Andrew Morton [this message]
2007-05-04 7:39 ` Alex Tomas
2007-05-04 8:02 ` Andrew Morton
2007-08-16 18:20 ` Alex Tomas
2007-08-16 18:46 ` Andrew Morton
2007-08-17 2:24 ` Alex Tomas
2007-08-17 6:52 ` Andrew Morton
2007-08-17 8:36 ` Alex Tomas
2007-08-17 9:02 ` Andrew Morton
2007-08-17 18:42 ` Alex Tomas
2007-04-28 8:44 ` Matthias Andree
2007-04-28 20:46 ` Mikulas Patocka
2007-04-28 21:12 ` Lee Revell
2007-04-29 20:49 ` Mark Lord
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20070504001802.0e86e9dd.akpm@linux-foundation.org \
--to=akpm@linux-foundation.org \
--cc=adilger@clusterfs.com \
--cc=alex@clusterfs.com \
--cc=efault@gmx.de \
--cc=jens.axboe@oracle.com \
--cc=linux-ext4@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=marat.buharov@gmail.com \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).