From: Peter Zijlstra <a.p.zijlstra@chello.nl>
To: Miklos Szeredi <miklos@szeredi.hu>
Cc: akpm@linux-foundation.org, wfg@mail.ustc.edu.cn,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
Trond Myklebust <trond.myklebust@fys.uio.no>
Subject: Re: [PATCH] remove throttle_vm_writeout()
Date: Fri, 05 Oct 2007 12:57:34 +0200 [thread overview]
Message-ID: <1191581854.22357.85.camel@twins> (raw)
In-Reply-To: <E1IdkOf-0002tK-00@dorka.pomaz.szeredi.hu>
[-- Attachment #1: Type: text/plain, Size: 4703 bytes --]
On Fri, 2007-10-05 at 12:27 +0200, Miklos Szeredi wrote:
> > diff --git a/include/linux/writeback.h b/include/linux/writeback.h
> > index 4ef4d22..eff2438 100644
> > --- a/include/linux/writeback.h
> > +++ b/include/linux/writeback.h
> > @@ -88,7 +88,7 @@ static inline void wait_on_inode(struct inode *inode)
> > int wakeup_pdflush(long nr_pages);
> > void laptop_io_completion(void);
> > void laptop_sync_completion(void);
> > -void throttle_vm_writeout(gfp_t gfp_mask);
> > +void throttle_vm_writeout(struct zone *zone, gfp_t gfp_mask);
> >
> > /* These are exported to sysctl. */
> > extern int dirty_background_ratio;
> > diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> > index eec1481..f949997 100644
> > --- a/mm/page-writeback.c
> > +++ b/mm/page-writeback.c
> > @@ -326,11 +326,8 @@ void balance_dirty_pages_ratelimited_nr(struct address_space *mapping,
> > }
> > EXPORT_SYMBOL(balance_dirty_pages_ratelimited_nr);
> >
> > -void throttle_vm_writeout(gfp_t gfp_mask)
> > +void throttle_vm_writeout(struct zone *zone, gfp_t gfp_mask)
> > {
> > - long background_thresh;
> > - long dirty_thresh;
> > -
> > if ((gfp_mask & (__GFP_FS|__GFP_IO)) != (__GFP_FS|__GFP_IO)) {
> > /*
> > * The caller might hold locks which can prevent IO completion
> > @@ -342,17 +339,16 @@ void throttle_vm_writeout(gfp_t gfp_mask)
> > }
> >
> > for ( ; ; ) {
> > - get_dirty_limits(&background_thresh, &dirty_thresh, NULL);
> > + unsigned long thresh = zone_page_state(zone, NR_ACTIVE) +
> > + zone_page_state(zone, NR_INACTIVE);
> >
> > - /*
> > - * Boost the allowable dirty threshold a bit for page
> > - * allocators so they don't get DoS'ed by heavy writers
> > - */
> > - dirty_thresh += dirty_thresh / 10; /* wheeee... */
> > + /*
> > + * wait when 75% of the zone's pages are under writeback
> > + */
> > + thresh -= thresh >> 2;
> > + if (zone_page_state(zone, NR_WRITEBACK) < thresh)
> > + break;
> >
> > - if (global_page_state(NR_UNSTABLE_NFS) +
> > - global_page_state(NR_WRITEBACK) <= dirty_thresh)
> > - break;
> > congestion_wait(WRITE, HZ/10);
> > }
> > }
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index 1be5a63..7dd6bd9 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -948,7 +948,7 @@ static unsigned long shrink_zone(int priority, struct zone *zone,
> > }
> > }
> >
> > - throttle_vm_writeout(sc->gfp_mask);
> > + throttle_vm_writeout(zone, sc->gfp_mask);
> >
> > atomic_dec(&zone->reclaim_in_progress);
> > return nr_reclaimed;
> >
> >
>
> I think that's an improvement in all respects.
>
> However it still does not generally address the deadlock scenario: if
> there's a small DMA zone, and fuse manages to put all of those pages
> under writeout, then there's trouble.
>
> But it's not really fuse specific. If it was a normal filesystem that
> did that, and it needed a GFP_DMA allocation for writeout, it is in
> trouble also, as that allocation would fail (at least no deadlock).
>
> Or is GFP_DMA never used by fs/io writeout paths?
I agree that its not complete. (hence the lack of sign-off etc.)
'normally' writeback pages just need an interrupt to signal the stuff is
written back. ie. the writeback completion path is atomic.
[ result of the thinking below -- the above is esp. true for swap pages
- so maybe we should ensure that !swap traffic can never exceed this
75% - that way, swap can always us get out of a tight spot ]
Maybe we should make that a requirement, although I see how that becomes
rather hard for FUSE (which is where Andrews PF_MEMALLOC suggestion
comes from - but I really dislike PF_MEMALLOC exposed to userspace).
Limiting FUSE to say 50% (suggestion from your other email) sounds like
a horrible hack to me. - Need more time to think on this.
.. o O [ process of thinking ]
While I think, I might as well tell the problem I had with
throttle_vm_writeout(), which is nfs unstable pages. Those don't go away
by waiting for it - as throttle_vm_writeout() does. So once you have an
excess of those, you're stuck as well.
In this patch I totally ignored unstable, but I'm not sure that's the
proper thing to do, I'd need to figure out what happens to an unstable
page when passed into pageout() - or if its passed to pageout at all.
If unstable pages would be passed to pageout(), and it would properly
convert them to writeback and clean them, then there is nothing wrong.
(Trond?)
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]
next prev parent reply other threads:[~2007-10-05 10:57 UTC|newest]
Thread overview: 38+ messages / expand[flat|nested] mbox.gz Atom feed top
2007-10-04 12:25 [PATCH] remove throttle_vm_writeout() Miklos Szeredi
2007-10-04 12:40 ` Peter Zijlstra
2007-10-04 13:00 ` Miklos Szeredi
2007-10-04 13:23 ` Peter Zijlstra
2007-10-04 13:49 ` Miklos Szeredi
2007-10-04 16:47 ` Peter Zijlstra
2007-10-04 17:46 ` Andrew Morton
2007-10-04 18:10 ` Peter Zijlstra
2007-10-04 18:54 ` Andrew Morton
[not found] ` <20071005123028.GA10372@mail.ustc.edu.cn>
2007-10-05 12:30 ` Fengguang Wu
2007-10-05 17:20 ` Andrew Morton
[not found] ` <20071006023224.GA7526@mail.ustc.edu.cn>
2007-10-06 2:32 ` Fengguang Wu
2007-10-07 23:54 ` David Chinner
[not found] ` <20071008003349.GA5455@mail.ustc.edu.cn>
2007-10-08 0:33 ` Fengguang Wu
2007-10-04 21:07 ` Miklos Szeredi
2007-10-04 21:56 ` Andrew Morton
2007-10-04 22:39 ` Miklos Szeredi
2007-10-04 23:09 ` Andrew Morton
2007-10-04 23:26 ` Miklos Szeredi
2007-10-04 23:48 ` Andrew Morton
2007-10-05 0:12 ` Miklos Szeredi
2007-10-05 0:48 ` Andrew Morton
2007-10-05 8:22 ` Peter Zijlstra
2007-10-05 9:22 ` Miklos Szeredi
2007-10-05 9:47 ` Peter Zijlstra
2007-10-05 10:27 ` Miklos Szeredi
2007-10-05 10:32 ` Miklos Szeredi
2007-10-05 15:43 ` John Stoffel
2007-10-05 10:57 ` Peter Zijlstra [this message]
2007-10-05 11:27 ` Miklos Szeredi
2007-10-05 17:50 ` Trond Myklebust
2007-10-05 18:32 ` Peter Zijlstra
2007-10-05 19:20 ` Trond Myklebust
2007-10-05 19:23 ` Trond Myklebust
2007-10-05 21:07 ` Peter Zijlstra
[not found] ` <20071006004028.GA7121@mail.ustc.edu.cn>
2007-10-06 0:40 ` Fengguang Wu
2007-10-05 7:32 ` Peter Zijlstra
2007-10-05 19:54 ` Rik van Riel
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1191581854.22357.85.camel@twins \
--to=a.p.zijlstra@chello.nl \
--cc=akpm@linux-foundation.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=miklos@szeredi.hu \
--cc=trond.myklebust@fys.uio.no \
--cc=wfg@mail.ustc.edu.cn \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).