From: Andrea Arcangeli <andrea@suse.de>
To: Linus Torvalds <torvalds@transmeta.com>
Cc: Mike Galbraith <mikeg@wen-online.de>,
Anton Blanchard <anton@linuxcare.com.au>,
linux-kernel <linux-kernel@vger.kernel.org>,
Andrew Morton <andrewm@uow.edu.au>
Subject: Re: scheduling problem?
Date: Tue, 2 Jan 2001 22:52:30 +0100
Message-ID: <20010102225230.D7563@athlon.random>
In-Reply-To: <20010102210949.C7563@athlon.random> <Pine.LNX.4.10.10101021250160.25414-100000@penguin.transmeta.com>
In-Reply-To: <Pine.LNX.4.10.10101021250160.25414-100000@penguin.transmeta.com>; from torvalds@transmeta.com on Tue, Jan 02, 2001 at 01:02:30PM -0800
On Tue, Jan 02, 2001 at 01:02:30PM -0800, Linus Torvalds wrote:
>
>
> On Tue, 2 Jan 2001, Andrea Arcangeli wrote:
>
> > On Tue, Jan 02, 2001 at 11:02:41AM -0800, Linus Torvalds wrote:
> > > What does the system feel like if you just change the "sleep for bdflush"
> > > logic in wakeup_bdflush() to something like
> > >
> > > wake_up_process(bdflush_tsk);
> > > __set_current_state(TASK_RUNNING);
> > > current->policy |= SCHED_YIELD;
> > > schedule();
> > >
> > > instead of trying to wait for bdflush to wake us up?
> >
> > My bet is a `VM: killing' message.
>
> Maybe in 2.2.x, yes.
>
> > Waiting bdflush back-wakeup is mandatory to do write throttling correctly. The
> > above will break write throttling at least unless something foundamental is
> > changed recently and that doesn't seem the case.
>
> page_launder() should wait for the dirty pages, and that's not something
> 2.2.x ever did.
In late 2.2.x we have sync_page_buffers too, but I'm not sure how well it
behaves when the whole MM is constantly kept totally dirty and we don't have
swap. In fact the 2.4.x implementation too:
	static void sync_page_buffers(struct buffer_head *bh, int wait)
	{
		struct buffer_head * tmp = bh;

		do {
			struct buffer_head *p = tmp;
			tmp = tmp->b_this_page;
			if (buffer_locked(p)) {
				if (wait > 1)
					__wait_on_buffer(p);
			} else if (buffer_dirty(p))
				ll_rw_block(WRITE, 1, &p);
		} while (tmp != bh);
	}
won't cope with memory being totally dirty. The first pass moves each buffer
from dirty to locked, and the second pass waits for I/O completion, but the
caller won't try a third time to free the page (which is when the page is
finally freeable):
	if (wait) {
		sync_page_buffers(bh, wait);
		/* We waited synchronously, so we can free the buffers. */
		if (wait > 1 && !loop) {
			loop = 1;
			goto cleaned_buffers_try_again;
		}
	}
Probably not a big deal.
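To make the pass counting above concrete, here is a minimal userspace sketch. It is a toy model and not kernel code: the `buf_state` enum and the `model_*` functions are hypothetical names invented for illustration, and the model assumes that I/O on a locked buffer completes only when a pass actually waits on it.

```c
#include <assert.h>

/* Toy model, not kernel code: every name here is hypothetical. */
enum buf_state { BUF_CLEAN, BUF_DIRTY, BUF_LOCKED };

/*
 * One pass over a page's buffers, shaped like sync_page_buffers():
 * a locked buffer is waited on only when wait > 1, a dirty buffer is
 * submitted for writing (and so becomes locked).
 * Assumption: I/O completes only when we wait for it.
 */
static void model_sync_pass(enum buf_state *bufs, int n, int wait)
{
	for (int i = 0; i < n; i++) {
		if (bufs[i] == BUF_LOCKED) {
			if (wait > 1)
				bufs[i] = BUF_CLEAN;	/* waited, I/O done */
		} else if (bufs[i] == BUF_DIRTY) {
			bufs[i] = BUF_LOCKED;		/* write submitted */
		}
	}
}

/* The page is freeable only when no buffer is dirty or locked. */
static int model_page_freeable(const enum buf_state *bufs, int n)
{
	for (int i = 0; i < n; i++)
		if (bufs[i] != BUF_CLEAN)
			return 0;
	return 1;
}

/*
 * Check the page up to max_attempts times, running one sync pass after
 * each failed check. Returns the attempt number at which the page was
 * found freeable, or 0 if it never was.
 */
static int model_attempts_until_free(enum buf_state *bufs, int n,
				     int wait, int max_attempts)
{
	assert(n > 0 && max_attempts > 0);
	for (int attempt = 1; attempt <= max_attempts; attempt++) {
		if (model_page_freeable(bufs, n))
			return attempt;
		model_sync_pass(bufs, n, wait);
	}
	return 0;
}
```

In this model a page whose buffers all start dirty is found freeable only at the third check with wait == 2, so a caller limited to two checks (the initial one plus the single `loop` retry) gives up one step too early.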
The real point is that even if try_to_free_buffers deals perfectly with a
totally dirty VM, we'll end up waiting for I/O completion in the wrong place:
setiathome will end up waiting for I/O completion instead of `cp`. It's not
setiathome but `cp` that should do write throttling. And `cp` will block again
very soon even if setiathome blocks too. The whole point is that the write
throttling must happen in balance_dirty(), _not_ in sync_page_buffers().
In fact, since 2.2.19pre2 there's a per-bh wait_io bitflag that remembers when a
dirty bh is very old and hasn't been flushed automatically (by either kupdate
or kflushd). So we don't block in sync_page_buffers until it's necessary, to
avoid hurting non-I/O apps while I/O is going on.
> NOTE! I think that throttling writers is fine and good, but as it stands
> now, the dirty buffer balancing will throttle anybody, not just the
> writer. That's partly because of the 2.4.x mis-feature of doing the
How can it throttle everybody and not only the writers? _Only_ the
writers call balance_dirty.
> balance_dirty call even for previously dirty buffers (fixed in my tree,
> btw).
Yes, I saw. People overwriting already-dirty data were blocking too, which was
not necessary, but they were still writers.
> It's _really_ bad to wait for bdflush to finish if we hold on to things
> like the superblock lock - which _does_ happen right now. That's why I'm
> pretty convinced that we should NOT blindly do the dirty balance in
> "mark_buffer_dirty()", but instead at more well-defined points (in places
> like "generic_file_write()", for example).
The right way to avoid blocking with locks held is to replace mark_buffer_dirty()
with __mark_buffer_dirty() and to call balance_dirty() later, when the locks are
released. That's why it's exported to modules. Everybody has always been
allowed to optimize away the mark_buffer_dirty(); it's just that nobody has done
that yet. I think it's useful to keep providing an interface that does the
write throttling automatically.
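The difference between the two calling patterns can be sketched in userspace as below. This is a toy model and not kernel code: `struct model_state` and all `model_*` functions are hypothetical stand-ins, the lock is a plain flag, and a "wait" is just a counter bump followed by pretending the flush completed.

```c
#include <assert.h>

/* Toy model, not kernel code: every name here is hypothetical. */
struct model_state {
	int lock_held;		/* stand-in for e.g. the superblock lock */
	long dirty;		/* number of dirty buffers */
	long limit;		/* throttle once dirty exceeds this */
	long waits_locked;	/* throttling waits taken with the lock held */
	long waits_unlocked;	/* throttling waits taken with no lock held */
};

/* Stand-in for balance_dirty(): wait if too much is dirty. */
static void model_balance_dirty(struct model_state *s)
{
	if (s->dirty <= s->limit)
		return;
	if (s->lock_held)
		s->waits_locked++;
	else
		s->waits_unlocked++;
	s->dirty = 0;		/* pretend the flush completed */
}

/* Stand-in for __mark_buffer_dirty(): mark only, never wait. */
static void model_mark_dirty_nowait(struct model_state *s)
{
	s->dirty++;
}

/* Stand-in for mark_buffer_dirty(): mark, then throttle right away. */
static void model_mark_dirty(struct model_state *s)
{
	model_mark_dirty_nowait(s);
	model_balance_dirty(s);
}

/* Automatic interface: throttling may happen while the lock is held. */
static void model_update_eager(struct model_state *s, int nbufs)
{
	assert(nbufs >= 0);
	s->lock_held = 1;
	for (int i = 0; i < nbufs; i++)
		model_mark_dirty(s);
	s->lock_held = 0;
}

/* Deferred pattern: mark under the lock, throttle after dropping it. */
static void model_update_deferred(struct model_state *s, int nbufs)
{
	assert(nbufs >= 0);
	s->lock_held = 1;
	for (int i = 0; i < nbufs; i++)
		model_mark_dirty_nowait(s);
	s->lock_held = 0;
	model_balance_dirty(s);
}
```

In the deferred variant the writer is still throttled, so the write throttling property is kept, but the wait never happens while the lock is held.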
Andrea