From: Andrew Morton <akpm@linux-foundation.org>
To: Alex Tomas <alex@clusterfs.com>
Cc: Andreas Dilger <adilger@clusterfs.com>,
Linus Torvalds <torvalds@linux-foundation.org>,
Marat Buharov <marat.buharov@gmail.com>,
Mike Galbraith <efault@gmx.de>,
LKML <linux-kernel@vger.kernel.org>,
Jens Axboe <jens.axboe@oracle.com>,
"linux-ext4@vger.kernel.org" <linux-ext4@vger.kernel.org>
Subject: Re: [ext3][kernels >= 2.6.20.7 at least] KDE going comatose when FS is under heavy write load (massive starvation)
Date: Fri, 4 May 2007 01:02:12 -0700 [thread overview]
Message-ID: <20070504010212.ce6eca53.akpm@linux-foundation.org> (raw)
In-Reply-To: <463AE32A.5000902@clusterfs.com>
On Fri, 04 May 2007 11:39:22 +0400 Alex Tomas <alex@clusterfs.com> wrote:
> Andrew Morton wrote:
> > I'm still not understanding. The terms you're using are a bit ambiguous.
> >
> > What does "find some dirty unallocated blocks" mean? Find a page which is
> > dirty and which does not have a disk mapping?
> >
> > Normally the above operation would be implemented via
> > ext4_writeback_writepage(), and it runs under lock_page().
>
> I'm mostly worried about delayed allocation case. My impression was that
> holding number of pages locked isn't a good idea, even if they're locked
> in index order. so, I was going to turn number of pages writeback, then
> allocate blocks for all of them at once, then put proper blocknr's into
> bh's (or PG_mappedtodisk?).
ooh, that sounds hacky and quite worrisome. If someone comes in and does
an fsync() we've lost our synchronisation point. Yes, all callers happen
to do
lock_page();
wait_on_page_writeback();
(I think) but we've never considered a bare PageWriteback() as something
which protects page internals. We're OK wrt page reclaim and we're OK wrt
truncate and invalidate. As long as the page is uptodate we _should_ be OK
wrt readpage(). But still, it'd be better to use the standard locking
rather than inventing new rules, if poss.
I'd be 100% OK with locking multiple pages in ascending pgoff_t order.
Locking the page is the standard way of doing this synchronisation and the
only problem I can think of is that having a tremendous number of pages
locked could cause the wake_up_page() waitqueue hashes to get overloaded
and go slow. But it's also possible to lock many, many pages with
readahead and nobody has reported problems in there.
> >
> >
> >> going to commit
> >> find inode I dirty
> >> do NOT find these blocks because they're
> >> allocated only, but pages/bhs aren't mapped
> >> to them
> >> start commit
> >
> > I think you're assuming here that commit would be using ->t_sync_datalist
> > to locate dirty buffer_heads.
>
> nope, I mean sb->inode->page walk.
>
> > But under this proposal, t_sync_datalist just gets removed: the new
> > ordered-data mode _only_ need to do the sb->inode->page walk. So if I'm
> > understanding you, the way in which we'd handle any such race is to make
> > kjournald's writeback of the dirty pages block in lock_page(). Once it
> > gets the page lock it can look to see if some other thread has mapped the
> > page to disk.
>
> if I'm right holding number of pages locked, then they won't be locked, but
> writeback. of course kjournald can block on writeback as well, but how does
> it find pages with *newly allocated* blocks only?
I don't think we'd want kjournald to do that. Even if a page was dirtied
by an overwrite, we'd want to write it back during commit, just from a
quality-of-implementation point of view. If we were to leave these pages
unwritten during commit then a post-recovery file could have a mix of
up-to-five-second-old data and up-to-30-seconds-old data.
> > It may turn out that kjournald needs a private way of getting at the
> > I_DIRTY_PAGES inodes to do this properly, but I don't _think_ so. If we
> > had the radix-tree-of-dirty-inodes thing then that's easy enough to do
> > anyway, with a tagged search. But I expect that a single pass through the
> > superblock's dirty inodes would suffice for ordered-data. Files which
> > have chattr +j would screw things up, as usual.
>
> not dirty inodes only, but rather some fast way to find pages with newly
> allocated pages.
Newly allocated blocks, you mean?
Just write out the overwritten blocks as well as the new ones, I reckon.
It's what we do now.
next prev parent reply other threads:[~2007-05-04 8:03 UTC|newest]
Thread overview: 73+ messages / expand[flat|nested] mbox.gz Atom feed top
2007-04-27 7:59 [ext3][kernels >= 2.6.20.7 at least] KDE going comatose when FS is under heavy write load (massive starvation) Mike Galbraith
2007-04-27 8:33 ` Andrew Morton
2007-04-27 9:23 ` Mike Galbraith
2007-04-27 10:17 ` Mike Galbraith
2007-04-27 11:59 ` Marat Buharov
2007-04-27 12:30 ` Peter Zijlstra
2007-04-27 13:50 ` Mark Lord
2007-04-27 12:39 ` Manoj Joseph
2007-04-27 15:30 ` Linus Torvalds
2007-04-27 19:31 ` Andreas Dilger
2007-04-27 19:44 ` Mike Galbraith
2007-04-27 19:50 ` Linus Torvalds
2007-04-27 20:05 ` Hua Zhong
2007-04-27 20:12 ` Miquel van Smoorenburg
2007-04-27 20:12 ` Bill Huey
2007-04-28 5:37 ` Mikulas Patocka
2007-04-28 5:45 ` Mikulas Patocka
2007-04-28 21:57 ` Bill Huey
2007-04-28 22:38 ` Mikulas Patocka
2007-04-27 20:29 ` Gabriel C
2007-04-27 20:45 ` Stephen Clark
2007-04-27 20:54 ` Manoj Joseph
2007-04-28 8:45 ` Matthias Andree
2007-04-27 22:18 ` Andrew Morton
2007-05-03 17:38 ` Alex Tomas
2007-05-03 23:54 ` Andrew Morton
2007-05-04 6:18 ` Alex Tomas
2007-05-04 6:38 ` Andrew Morton
2007-05-04 6:57 ` Alex Tomas
2007-05-04 7:18 ` Andrew Morton
2007-05-04 7:39 ` Alex Tomas
2007-05-04 8:02 ` Andrew Morton [this message]
2007-08-16 18:20 ` Alex Tomas
2007-08-16 18:46 ` Andrew Morton
2007-08-17 2:24 ` Alex Tomas
2007-08-17 6:52 ` Andrew Morton
2007-08-17 8:36 ` Alex Tomas
2007-08-17 9:02 ` Andrew Morton
2007-08-17 18:42 ` Alex Tomas
2007-04-28 8:44 ` Matthias Andree
2007-04-28 20:46 ` Mikulas Patocka
2007-04-28 21:12 ` Lee Revell
2007-04-29 20:49 ` Mark Lord
2007-04-29 21:17 ` Mikulas Patocka
2007-04-27 15:18 ` Linus Torvalds
2007-04-27 15:41 ` John Anthony Kazos Jr.
2007-04-27 15:54 ` Linus Torvalds
2007-04-27 16:24 ` Chuck Ebbert
2007-04-27 19:43 ` Marko Macek
2007-04-27 18:31 ` Andrew Morton
2007-04-27 19:09 ` Zan Lynx
2007-04-27 22:07 ` Andrew Morton
2007-04-27 19:27 ` Mike Galbraith
2007-04-28 8:51 ` Matthias Andree
2007-04-28 8:59 ` Andrew Morton
2007-04-28 16:30 ` Linus Torvalds
2007-04-28 16:56 ` Paolo Ornati
2007-04-27 19:28 ` Mike Galbraith
2007-04-27 20:06 ` Jan Engelhardt
2007-04-27 21:22 ` Linus Torvalds
2007-04-28 4:25 ` Mike Galbraith
2007-04-28 6:32 ` Mike Galbraith
2007-04-28 7:01 ` Andrew Morton
2007-04-28 7:12 ` Mike Galbraith
2007-04-28 6:32 ` Mikulas Patocka
2007-04-28 16:05 ` Linus Torvalds
2007-04-28 16:37 ` Ingo Molnar
2007-04-28 17:11 ` Mikulas Patocka
2007-04-30 6:57 ` Jens Axboe
2007-04-28 17:55 ` Mikulas Patocka
2007-04-30 6:56 ` Jens Axboe
2007-05-02 6:53 ` Jens Axboe
2007-05-02 7:36 ` Mike Galbraith
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20070504010212.ce6eca53.akpm@linux-foundation.org \
--to=akpm@linux-foundation.org \
--cc=adilger@clusterfs.com \
--cc=alex@clusterfs.com \
--cc=efault@gmx.de \
--cc=jens.axboe@oracle.com \
--cc=linux-ext4@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=marat.buharov@gmail.com \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.