linux-kernel.vger.kernel.org archive mirror
From: Jan Kara <jack@suse.cz>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Luis Henriques <luis.henriques@canonical.com>,
	linux-kernel@vger.kernel.org, kernel-team@lists.ubuntu.com,
	Michal Hocko <mhocko@suse.cz>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH 3.5 29/64] fs: buffer: move allocation failure loop into the allocator
Date: Thu, 31 Oct 2013 15:48:48 +0100
Message-ID: <20131031144848.GA3275@quack.suse.cz>
In-Reply-To: <20131031140008.GB14054@cmpxchg.org>

On Thu 31-10-13 10:00:08, Johannes Weiner wrote:
> On Mon, Oct 28, 2013 at 02:47:48PM +0000, Luis Henriques wrote:
> > 3.5.7.24 -stable review patch.  If anyone has any objections, please let me know.
> > 
> > ------------------
> > 
> > From: Johannes Weiner <hannes@cmpxchg.org>
> > 
> > commit 84235de394d9775bfaa7fa9762a59d91fef0c1fc upstream.
> > 
> > Buffer allocation has a very crude indefinite loop around waking the
> > flusher threads and performing global NOFS direct reclaim because it
> > cannot handle allocation failures.
> > 
> > The most immediate problem with this is that the allocation may fail due
> > to a memory cgroup limit, where flushers + direct reclaim might not make
> > any progress towards resolving the situation at all, because, unlike the
> > global case, a memory cgroup may not have any cache at all, only
> > anonymous pages and no swap.  This situation will lead to a reclaim
> > livelock with insane IO from waking the flushers and thrashing unrelated
> > filesystem cache in a tight loop.
> > 
> > Use __GFP_NOFAIL allocations for buffers for now.  This makes sure that
> > any looping happens in the page allocator, which knows how to
> > orchestrate kswapd, direct reclaim, and the flushers sensibly.  It also
> > allows memory cgroups to detect allocations that can't handle failure
> > and will allow them to ultimately bypass the limit if reclaim cannot
> > make progress.
  So I was under the impression that __GFP_NOFAIL is going away, isn't
it? At least about a year ago there was some effort to remove its users, so
we ended up creating loops like the above one (and similar ones for
jbd/jbd2) in cases where handling the failure wasn't easily possible. And now
it seems we are going in the opposite direction... At least we have a
steady flow of patches guaranteed :)

								Honza
> > 
> > Reported-by: azurIt <azurit@pobox.sk>
> > Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> > Cc: Michal Hocko <mhocko@suse.cz>
> > Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> > Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
> > Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
> > ---
> >  fs/buffer.c     | 14 ++++++++++++--
> >  mm/memcontrol.c |  2 ++
> >  2 files changed, 14 insertions(+), 2 deletions(-)
> > 
> > diff --git a/fs/buffer.c b/fs/buffer.c
> > index 2c78739..2675e5a 100644
> > --- a/fs/buffer.c
> > +++ b/fs/buffer.c
> > @@ -957,9 +957,19 @@ grow_dev_page(struct block_device *bdev, sector_t block,
> >  	struct buffer_head *bh;
> >  	sector_t end_block;
> >  	int ret = 0;		/* Will call free_more_memory() */
> > +	gfp_t gfp_mask;
> >  
> > -	page = find_or_create_page(inode->i_mapping, index,
> > -		(mapping_gfp_mask(inode->i_mapping) & ~__GFP_FS)|__GFP_MOVABLE);
> > +	gfp_mask = mapping_gfp_mask(inode->i_mapping) & ~__GFP_FS;
> > +	gfp_mask |= __GFP_MOVABLE;
> > +	/*
> > +	 * XXX: __getblk_slow() can not really deal with failure and
> > +	 * will endlessly loop on improvised global reclaim.  Prefer
> > +	 * looping in the allocator rather than here, at least that
> > +	 * code knows what it's doing.
> > +	 */
> > +	gfp_mask |= __GFP_NOFAIL;
> > +
> > +	page = find_or_create_page(inode->i_mapping, index, gfp_mask);
> >  	if (!page)
> >  		return ret;
> >  
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index 226b63e..953bf3c 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -2405,6 +2405,8 @@ done:
> >  	return 0;
> >  nomem:
> >  	*ptr = NULL;
> > +	if (gfp_mask & __GFP_NOFAIL)
> > +		return 0;
> >  	return -ENOMEM;
> >  bypass:
> >  	*ptr = root_mem_cgroup;
> > -- 
> > 1.8.3.2
> > 
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR


Thread overview: 71+ messages
2013-10-28 14:47 [3.5.y.z extended stable] Linux 3.5.7.24 stable review Luis Henriques
2013-10-28 14:47 ` [PATCH 3.5 01/64] ACPI / IPMI: Fix atomic context requirement of ipmi_msg_handler() Luis Henriques
2013-10-28 14:47 ` [PATCH 3.5 02/64] Btrfs: change how we queue blocks for backref checking Luis Henriques
2013-10-28 14:47 ` [PATCH 3.5 03/64] watchdog: ts72xx_wdt: locking bug in ioctl Luis Henriques
2013-10-28 14:47 ` [PATCH 3.5 04/64] random: run random_int_secret_init() run after all late_initcalls Luis Henriques
2013-10-28 14:47 ` [PATCH 3.5 05/64] tile: use a more conservative __my_cpu_offset in CONFIG_PREEMPT Luis Henriques
2013-10-28 14:47 ` [PATCH 3.5 06/64] ALSA: snd-usb-usx2y: remove bogus frame checks Luis Henriques
2013-10-28 14:47 ` [PATCH 3.5 07/64] ALSA: hda - Add fixup for ASUS N56VZ Luis Henriques
2013-10-28 14:47 ` [PATCH 3.5 08/64] hwmon: (applesmc) Always read until end of data Luis Henriques
2013-10-28 14:47 ` [PATCH 3.5 09/64] drm/radeon: fix hw contexts for SUMO2 asics Luis Henriques
2013-10-28 14:47 ` [PATCH 3.5 10/64] KVM: PPC: Book3S HV: Fix typo in saving DSCR Luis Henriques
2013-10-28 14:47 ` [PATCH 3.5 11/64] random: allow architectures to optionally define random_get_entropy() Luis Henriques
2013-10-28 14:47 ` [PATCH 3.5 12/64] ext4: fix memory leak in xattr Luis Henriques
2013-10-28 14:47 ` [PATCH 3.5 13/64] parisc: fix interruption handler to respect pagefault_disable() Luis Henriques
2013-10-28 14:47 ` [PATCH 3.5 14/64] mm, show_mem: suppress page counts in non-blockable contexts Luis Henriques
2013-10-28 14:47 ` [PATCH 3.5 15/64] mm/mmap: check for RLIMIT_AS before unmapping Luis Henriques
2013-10-28 14:47 ` [PATCH 3.5 16/64] mm: do not grow the stack vma just because of an overrun on preceding vma Luis Henriques
2013-10-28 14:47 ` [PATCH 3.5 17/64] xhci: Don't enable/disable RWE on bus suspend/resume Luis Henriques
2013-10-28 14:47 ` [PATCH 3.5 18/64] xhci: quirk for extra long delay for S4 Luis Henriques
2013-10-28 14:47 ` [PATCH 3.5 19/64] xhci: Fix spurious wakeups after S5 on Haswell Luis Henriques
2013-10-28 14:47 ` [PATCH 3.5 20/64] USB: support new huawei devices in option.c Luis Henriques
2013-10-28 14:47 ` [PATCH 3.5 21/64] USB: serial: ti_usb_3410_5052: add Abbott strip port ID to combined table as well Luis Henriques
2013-10-28 14:47 ` [PATCH 3.5 22/64] USB: serial: option: add support for Inovia SEW858 device Luis Henriques
2013-10-28 14:47 ` [PATCH 3.5 23/64] ARM: 7851/1: check for number of arguments in syscall_get/set_arguments() Luis Henriques
2013-10-28 14:47 ` [PATCH 3.5 24/64] USB: quirks.c: add one device that cannot deal with suspension Luis Henriques
2013-10-28 14:47 ` [PATCH 3.5 25/64] dm snapshot: fix data corruption Luis Henriques
2013-10-28 14:47 ` [PATCH 3.5 26/64] USB: quirks: add touchscreen that is dazzeled by remote wakeup Luis Henriques
2013-10-28 14:47 ` [PATCH 3.5 27/64] usb: serial: option: blacklist Olivetti Olicard200 Luis Henriques
2013-10-28 14:47 ` [PATCH 3.5 28/64] usb-storage: add quirk for mandatory READ_CAPACITY_16 Luis Henriques
2013-10-28 14:47 ` [PATCH 3.5 29/64] fs: buffer: move allocation failure loop into the allocator Luis Henriques
2013-10-31 14:00   ` Johannes Weiner
2013-10-31 14:25     ` Luis Henriques
2013-10-31 14:48     ` Jan Kara [this message]
2013-10-31 16:03       ` Johannes Weiner
2013-10-31 16:03       ` Andrew Morton
2013-10-31 19:35         ` Jan Kara
2013-10-28 14:47 ` [PATCH 3.5 30/64] writeback: fix negative bdi max pause Luis Henriques
2013-10-28 14:47 ` [PATCH 3.5 31/64] powerpc/pseries/lparcfg: Fix possible overflow are more than 1026 Luis Henriques
2013-10-28 14:47 ` [PATCH 3.5 32/64] powerpc: Restore registers on error exit from csum_partial_copy_generic() Luis Henriques
2013-10-28 14:47 ` [PATCH 3.5 33/64] nilfs2: fix issue with race condition of competition between segments for dirty blocks Luis Henriques
2013-10-28 14:47 ` [PATCH 3.5 34/64] fuse: hotfix truncate_pagecache() issue Luis Henriques
2013-10-28 14:47 ` [PATCH 3.5 35/64] rt2800: fix wrong TX power compensation Luis Henriques
2013-10-28 14:47 ` [PATCH 3.5 36/64] [media] sh_vou: almost forever loop in sh_vou_try_fmt_vid_out() Luis Henriques
2013-10-28 14:47 ` [PATCH 3.5 37/64] tcp: must unclone packets before mangling them Luis Henriques
2013-10-28 14:47 ` [PATCH 3.5 38/64] tcp: do not forget FIN in tcp_shifted_skb() Luis Henriques
2013-10-28 14:47 ` [PATCH 3.5 39/64] net: do not call sock_put() on TIMEWAIT sockets Luis Henriques
2013-10-28 14:47 ` [PATCH 3.5 40/64] net: mv643xx_eth: update statistics timer from timer context only Luis Henriques
2013-10-28 14:48 ` [PATCH 3.5 41/64] net: mv643xx_eth: fix orphaned statistics timer crash Luis Henriques
2013-10-28 14:48 ` [PATCH 3.5 42/64] net: heap overflow in __audit_sockaddr() Luis Henriques
2013-10-28 14:48 ` [PATCH 3.5 43/64] proc connector: fix info leaks Luis Henriques
2013-10-28 14:48 ` [PATCH 3.5 44/64] ipv4: fix ineffective source address selection Luis Henriques
2013-10-28 14:48 ` [PATCH 3.5 45/64] can: dev: fix nlmsg size calculation in can_get_size() Luis Henriques
2013-10-28 14:48 ` [PATCH 3.5 46/64] ipv6: restrict neighbor entry creation to output flow Luis Henriques
2013-10-28 14:48 ` [PATCH 3.5 47/64] bridge: Correctly clamp MAX forward_delay when enabling STP Luis Henriques
2013-10-28 14:48 ` [PATCH 3.5 48/64] net: vlan: fix nlmsg size calculation in vlan_get_size() Luis Henriques
2013-10-28 14:48 ` [PATCH 3.5 49/64] l2tp: must disable bh before calling l2tp_xmit_skb() Luis Henriques
2013-10-28 14:48 ` [PATCH 3.5 50/64] farsync: fix info leak in ioctl Luis Henriques
2013-10-28 14:48 ` [PATCH 3.5 51/64] unix_diag: fix info leak Luis Henriques
2013-10-28 14:48 ` [PATCH 3.5 52/64] connector: use nlmsg_len() to check message length Luis Henriques
2013-10-28 14:48 ` [PATCH 3.5 53/64] bnx2x: record rx queue for LRO packets Luis Henriques
2013-10-28 14:48 ` [PATCH 3.5 54/64] net: dst: provide accessor function to dst->xfrm Luis Henriques
2013-10-28 14:48 ` [PATCH 3.5 55/64] sctp: Use software crc32 checksum when xfrm transform will happen Luis Henriques
2013-10-28 14:48 ` [PATCH 3.5 56/64] sctp: Perform software checksum if packet has to be fragmented Luis Henriques
2013-10-28 14:48 ` [PATCH 3.5 57/64] wanxl: fix info leak in ioctl Luis Henriques
2013-10-28 14:48 ` [PATCH 3.5 58/64] net: unix: inherit SOCK_PASS{CRED, SEC} flags from socket to fix race Luis Henriques
2013-10-28 14:48 ` [PATCH 3.5 59/64] net: fix cipso packet validation when !NETLABEL Luis Henriques
2013-10-28 14:48 ` [PATCH 3.5 60/64] inet: fix possible memory corruption with UDP_CORK and UFO Luis Henriques
2013-10-28 14:48 ` [PATCH 3.5 61/64] davinci_emac.c: Fix IFF_ALLMULTI setup Luis Henriques
2013-10-28 14:48 ` [PATCH 3.5 62/64] can: flexcan: fix flexcan_chip_start() on imx6 Luis Henriques
2013-10-28 14:48 ` [PATCH 3.5 63/64] can: flexcan: flexcan_chip_start: fix regression, mark one MB for TX and abort pending TX Luis Henriques
2013-10-28 14:48 ` [PATCH 3.5 64/64] PCI: fix truncation of resource size to 32 bits Luis Henriques
