From: Andrew Morton <akpm@linux-foundation.org>
To: Alex Tomas <alex@clusterfs.com>
Cc: Andreas Dilger <adilger@clusterfs.com>,
Linus Torvalds <torvalds@linux-foundation.org>,
Marat Buharov <marat.buharov@gmail.com>,
Mike Galbraith <efault@gmx.de>,
LKML <linux-kernel@vger.kernel.org>,
Jens Axboe <jens.axboe@oracle.com>,
"linux-ext4@vger.kernel.org" <linux-ext4@vger.kernel.org>
Subject: Re: [ext3][kernels >= 2.6.20.7 at least] KDE going comatose when FS is under heavy write load (massive starvation)
Date: Fri, 4 May 2007 00:18:02 -0700
Message-ID: <20070504001802.0e86e9dd.akpm@linux-foundation.org>
In-Reply-To: <463AD948.9090103@clusterfs.com>
On Fri, 04 May 2007 10:57:12 +0400 Alex Tomas <alex@clusterfs.com> wrote:
> Andrew Morton wrote:
> > On Fri, 04 May 2007 10:18:12 +0400 Alex Tomas <alex@clusterfs.com> wrote:
> >
> >> Andrew Morton wrote:
> >>> Yes, there can be issues with needing to allocate journal space within the
> >>> context of a commit. But
> >> no-no, this isn't required. we only need to mark pages/blocks within
> >> transaction, otherwise race is possible when we allocate blocks in transaction,
> >> then transaction starts to commit, then we mark pages/blocks to be flushed
> >> before commit.
> >
> > I don't understand. Can you please describe the race in more detail?
>
> if I understood your idea right, then in data=ordered mode, commit thread writes
> all dirty mapped blocks before real commit.
>
> say, we have two threads: t1 is a thread doing flushing and t2 is a commit thread
>
> t1                                      t2
> find dirty inode I
> find some dirty unallocated blocks
> journal_start()
> allocate blocks
> attach them to I
> journal_stop()
I'm still not understanding. The terms you're using are a bit ambiguous.
What does "find some dirty unallocated blocks" mean? Find a page which is
dirty and which does not have a disk mapping?

Normally the above operation would be implemented via
ext4_writeback_writepage(), and it runs under lock_page().
>                                         going to commit
>                                         find inode I dirty
>                                         do NOT find these blocks because they're
>                                           allocated only, but pages/bhs aren't mapped
>                                           to them
>                                         start commit
I think you're assuming here that commit would be using ->t_sync_datalist
to locate dirty buffer_heads.

But under this proposal, t_sync_datalist just gets removed: the new
ordered-data mode _only_ needs to do the sb->inode->page walk. So if I'm
understanding you, the way in which we'd handle any such race is to make
kjournald's writeback of the dirty pages block in lock_page(). Once it
gets the page lock it can look to see if some other thread has mapped the
page to disk.
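
A minimal user-space sketch of that lock_page() ordering, written in Python
rather than kernel C. Everything in it is a hypothetical stand-in and not a
kernel interface: Page models a page-cache page, its lock models
lock_page()/unlock_page(), flusher is t1, committer is t2 (kjournald), and
1234 is an arbitrary block number. It assumes, as described above, that t1
allocates and maps the page while holding the page lock.

```python
import threading

class Page:
    """Toy stand-in for a page-cache page (not a kernel structure)."""
    def __init__(self):
        self.lock = threading.Lock()  # models lock_page()/unlock_page()
        self.dirty = True
        self.block = None             # None: no disk mapping yet

def flusher(page, holding, log):
    """t1: allocate a block and map the page to it, all under the page lock."""
    with page.lock:
        holding.set()                 # tell t2 the page is now locked
        page.block = 1234             # hypothetical block number
        log.append("t1: mapped page to block 1234")

def committer(page, holding, log):
    """t2: block on the page lock, then check whether the page got mapped."""
    holding.wait()                    # force the interleaving: t1 holds the lock
    with page.lock:                   # blocks until t1 has finished mapping
        if page.dirty and page.block is not None:
            log.append(f"t2: wrote block {page.block} before commit")
            page.dirty = False
        else:
            log.append("t2: page still unmapped, nothing to write")

def run():
    page, holding, log = Page(), threading.Event(), []
    threads = [threading.Thread(target=f, args=(page, holding, log))
               for f in (flusher, committer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return page, log
```

Because the committer cannot take the page lock until the flusher drops it,
it always observes the mapping in this model; without the lock, it could
look at the page between allocation and mapping and miss the block.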
It may turn out that kjournald needs a private way of getting at the
I_DIRTY_PAGES inodes to do this properly, but I don't _think_ so. If we
had the radix-tree-of-dirty-inodes thing then that's easy enough to do
anyway, with a tagged search. But I expect that a single pass through the
superblock's dirty inodes would suffice for ordered-data. Files which
have chattr +j would screw things up, as usual.

I assume (hope) that your delayed allocation code implements
->writepages()? Doing the allocation one-page-at-a-time sounds painful...
>
> map pages/bhs to just allocated blocks
>
>
> so, either we mark pages/bhs someway within journal_start()--journal_stop() or
> commit thread should do lookup for all dirty pages. the latter doesn't sound nice, IMHO.
>
I don't think I'm understanding you fully yet.