From: Andrew Morton <akpm@linux-foundation.org>
To: Miklos Szeredi <miklos@szeredi.hu>
Cc: wfg@mail.ustc.edu.cn, a.p.zijlstra@chello.nl, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH] remove throttle_vm_writeout()
Date: Thu, 4 Oct 2007 17:48:51 -0700
Message-ID: <20071004174851.b34a3220.akpm@linux-foundation.org>
In-Reply-To: <E1Idanu-0002c1-00@dorka.pomaz.szeredi.hu>

On Fri, 05 Oct 2007 02:12:30 +0200 Miklos Szeredi <miklos@szeredi.hu> wrote:

> > 
> > I don't think I understand that.  Sure, it _shouldn't_ be a problem.  But it
> > _is_.  That's what we're trying to fix, isn't it?
> 
> The problem, I believe, is in the memory allocation code, not in fuse.

fuse is trying to do something which page reclaim was not designed for. 
Stuff broke.

> In the example, memory allocation may be blocking indefinitely,
> because we have 4MB under writeback, even though 28MB can still be
> made available.  And that _should_ be fixable.

Well yes.  But we need to work out how, without re-breaking the thing which
throttle_vm_writeout() fixed.

> > > So the only thing the kernel should be careful about, is not to block
> > > on an allocation if not strictly necessary.
> > > 
> > > Actually a trivial fix for this problem could be to just tweak the
> > > thresholds, so as to make the above scenario impossible.  Although I'm
> > > still not convinced this patch is perfect, because the dirty
> > > threshold can actually change in time...
> > > 
> > > Index: linux/mm/page-writeback.c
> > > ===================================================================
> > > --- linux.orig/mm/page-writeback.c      2007-10-05 00:31:01.000000000 +0200
> > > +++ linux/mm/page-writeback.c   2007-10-05 00:50:11.000000000 +0200
> > > @@ -515,6 +515,12 @@ void throttle_vm_writeout(gfp_t gfp_mask
> > >          for ( ; ; ) {
> > >                 get_dirty_limits(&background_thresh, &dirty_thresh, NULL, NULL);
> > > 
> > > +               /*
> > > +                * Make sure the threshold is over the hard limit of
> > > +                * dirty_thresh + ratelimit_pages * nr_cpus
> > > +                */
> > > +               dirty_thresh += ratelimit_pages * num_online_cpus();
> > > +
> > >                  /*
> > >                   * Boost the allowable dirty threshold a bit for page
> > >                   * allocators so they don't get DoS'ed by heavy writers
> > 
> > I can probably kind of guess what you're trying to do here.  But if
> > ratelimit_pages * num_online_cpus() exceeds the size of the offending zone
> > then things might go bad.
> 
> I think the admin can do quite a bit of other damage, by setting
> dirty_ratio too high.
> 
> Maybe this writeback throttling should just have a fixed limit of 80%
> ZONE_NORMAL, and limit dirty_ratio to something like 50%.

Bear in mind that the same problem will occur for the 16MB ZONE_DMA, and
we cannot limit the system-wide dirty-memory threshold to 12MB.

iow, throttle_vm_writeout() needs to become zone-aware.  Then it only
throttles when, say, 80% of ZONE_FOO is under writeback.
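
A minimal sketch of that zone-aware check, written as a userspace model
rather than kernel code.  The struct zone_model type, the
zone_writeback_over_limit() name and the 4KB page size mentioned in the
comments are assumptions made for illustration; none of them come from
mm/page-writeback.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of a zone; not the kernel's struct zone. */
struct zone_model {
	unsigned long present_pages;	/* total pages in the zone */
	unsigned long nr_writeback;	/* pages currently under writeback */
};

/*
 * Return true when more than `percent` of this zone's pages are under
 * writeback.  A zone-aware throttle would only wait in that case, instead
 * of comparing against a single system-wide dirty threshold.
 */
static bool zone_writeback_over_limit(const struct zone_model *z,
				      unsigned int percent)
{
	unsigned long long limit;

	assert(percent <= 100);

	/* Widen before multiplying so large zones cannot overflow. */
	limit = (unsigned long long)z->present_pages * percent / 100;

	return z->nr_writeback > limit;
}
```

For a 16MB zone of 4KB pages (4096 pages), an 80% limit works out to 3276
pages, so the 4MB (1024 pages) under writeback in the example above would
not trip this check.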

Except I don't think that'll fix the problem 100%: if your fuse kernel
component somehow manages to put 80% of ZONE_FOO under writeback (and
remember this might be only 12MB on a 16GB machine) then we get stuck again
- the fuse server process (is that the correct terminology, btw?) ends up
waiting upon itself.

I'll think about it a bit.
