linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Jan Kara <jack@suse.cz>
To: Michal Hocko <mhocko@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>,
	linux-mm@kvack.org, Tejun Heo <tj@kernel.org>,
	Jan Kara <jack@suse.cz>
Subject: Re: [PATCH] mm,writeback: Don't use memory reserves for wb_start_writeback
Date: Tue, 29 Mar 2016 18:49:42 +0200	[thread overview]
Message-ID: <20160329164942.GA10963@quack.suse.cz> (raw)
In-Reply-To: <20160329085434.GB3228@dhcp22.suse.cz>

On Tue 29-03-16 10:54:35, Michal Hocko wrote:
> On Thu 24-03-16 14:17:14, Andrew Morton wrote:
> > On Thu, 24 Mar 2016 23:03:16 +0900 Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> wrote:
> > 
> > > Andrew, can you take this patch?
> > 
> > Tejun.
> > 
> > > ----------------------------------------
> > > >From 5d43acbc5849a63494a732e39374692822145923 Mon Sep 17 00:00:00 2001
> > > From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> > > Date: Sun, 13 Mar 2016 23:03:05 +0900
> > > Subject: [PATCH] mm,writeback: Don't use memory reserves for
> > >  wb_start_writeback
> > > 
> > > When writeback operation cannot make forward progress because memory
> > > allocation requests needed for doing I/O cannot be satisfied (e.g.
> > > under OOM-livelock situation), we can observe flood of order-0 page
> > > allocation failure messages caused by complete depletion of memory
> > > reserves.
> > > 
> > > This is caused by unconditionally allocating "struct wb_writeback_work"
> > > objects using GFP_ATOMIC from PF_MEMALLOC context.
> > > 
> > > __alloc_pages_nodemask() {
> > >   __alloc_pages_slowpath() {
> > >     __alloc_pages_direct_reclaim() {
> > >       __perform_reclaim() {
> > >         current->flags |= PF_MEMALLOC;
> > >         try_to_free_pages() {
> > >           do_try_to_free_pages() {
> > >             wakeup_flusher_threads() {
> > >               wb_start_writeback() {
> > >                 kzalloc(sizeof(*work), GFP_ATOMIC) {
> > >                   /* ALLOC_NO_WATERMARKS via PF_MEMALLOC */
> > >                 }
> > >               }
> > >             }
> > >           }
> > >         }
> > >         current->flags &= ~PF_MEMALLOC;
> > >       }
> > >     }
> > >   }
> > > }
> > > 
> > > Since I/O is stalling, allocating writeback requests forever shall deplete
> > > memory reserves. Fortunately, since wb_start_writeback() can fall back to
> > > wb_wakeup() when allocating "struct wb_writeback_work" failed, we don't
> > > need to allow wb_start_writeback() to use memory reserves.
> > > 
> > > ...
> > >
> > > --- a/fs/fs-writeback.c
> > > +++ b/fs/fs-writeback.c
> > > @@ -929,7 +929,8 @@ void wb_start_writeback(struct bdi_writeback *wb, long nr_pages,
> > >  	 * This is WB_SYNC_NONE writeback, so if allocation fails just
> > >  	 * wakeup the thread for old dirty data writeback
> > >  	 */
> > > -	work = kzalloc(sizeof(*work), GFP_ATOMIC);
> > > +	work = kzalloc(sizeof(*work),
> > > +		       GFP_NOWAIT | __GFP_NOMEMALLOC | __GFP_NOWARN);
> > >  	if (!work) {
> > >  		trace_writeback_nowork(wb);
> > >  		wb_wakeup(wb);
> > 
> > Oh geeze.  fs/fs-writeback.c has grown waaay too many GFP_ATOMICs :(
> > 
> > How does this actually all work?
> 
> Jack has explained it a bit
> http://lkml.kernel.org/r/20160318131136.GE7152@quack.suse.cz
> 
> > afaict if we fail this
> > wb_writeback_work allocation, wb_workfn->wb_do_writeback will later say
> > "hey, there are no work items!" and will do nothing at all.  Or does
> > wb_workfn() fall into write-1024-pages-anyway mode and if so, how did
> > it know how to do that?

We will end up in wb_do_writeback() which finds there's no work item so it
falls back to doing default background writeback (i.e., write out until
number of dirty pages is below background_dirty_limit).

> > If we had (say) a mempool of wb_writeback_work's (at least for for
> > wb_start_writeback), would that help anything?  Or would writeback
> > simply fail shortly afterwards for other reasons?

Not sure mempools would significantly improve the situation. Writeback code
is able to deal with the failed allocation so I think the issue remains
more with writeback code mostly pointlessly exhausting memory reserves with
atomic allocations.

I think it is somewhat dumb from do_try_to_free_pages() that it calls
wakeup_flusher_threads() so often (I guess it can quickly end up asking to
write more than it is ever sensible to ask). Admittedly it is also dumb from
the writeback code that it is not able to merge requests for writeback - we
could easily merge items created by wb_start_writeback() with matching
'reason' and 'range_cyclic'.

I'm not sure how easy it is to fix the first thing, I think improving the
second one may be worth it and I can have a look at that.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2016-03-29 16:49 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-03-24 14:03 [PATCH] mm,writeback: Don't use memory reserves for wb_start_writeback Tetsuo Handa
2016-03-24 21:17 ` Andrew Morton
2016-03-25 11:54   ` Tetsuo Handa
2016-03-29  8:54   ` Michal Hocko
2016-03-29 16:49     ` Jan Kara [this message]
2016-04-04 10:58       ` Tetsuo Handa
  -- strict thread matches above, loose matches on Subject: below --
2016-04-28 13:26 Tetsuo Handa
2016-03-13  5:32 [PATCH] mm,writeback: Don't use ALLOC_NO_WATERMARKS " Tetsuo Handa
2016-03-13 14:22 ` [PATCH] mm,writeback: Don't use memory reserves " Tetsuo Handa
2016-03-14 16:09   ` Michal Hocko
2016-03-16 20:46     ` Tejun Heo
2016-03-18 13:11       ` Jan Kara
2016-03-18 13:34         ` Michal Hocko
2016-03-18 13:42   ` Michal Hocko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160329164942.GA10963@quack.suse.cz \
    --to=jack@suse.cz \
    --cc=akpm@linux-foundation.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=penguin-kernel@i-love.sakura.ne.jp \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).