From: Luis Henriques <luis.henriques@canonical.com>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: linux-kernel@vger.kernel.org, stable@vger.kernel.org,
kernel-team@lists.ubuntu.com, Michal Hocko <mhocko@suse.cz>,
Andrew Morton <akpm@linux-foundation.org>,
Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: [PATCH 3.5 29/64] fs: buffer: move allocation failure loop into the allocator
Date: Thu, 31 Oct 2013 14:25:37 +0000
Message-ID: <20131031142537.GC4783@hercules.my.domain>
In-Reply-To: <20131031140008.GB14054@cmpxchg.org>

On Thu, Oct 31, 2013 at 10:00:08AM -0400, Johannes Weiner wrote:
> This is part of a bigger series and was tagged for stable as a
> reminder only. Please don't apply for now.

Grrr... I need to start cleaning my email inbox before doing a
release. I just saw the discussion in stable@.

I'll do an emergency release reverting this patch. Thanks for
catching this.

Cheers,
--
Luis
>
> On Mon, Oct 28, 2013 at 02:47:48PM +0000, Luis Henriques wrote:
> > 3.5.7.24 -stable review patch. If anyone has any objections, please let me know.
> >
> > ------------------
> >
> > From: Johannes Weiner <hannes@cmpxchg.org>
> >
> > commit 84235de394d9775bfaa7fa9762a59d91fef0c1fc upstream.
> >
> > Buffer allocation has a very crude indefinite loop around waking the
> > flusher threads and performing global NOFS direct reclaim because it can
> > not handle allocation failures.
> >
> > The most immediate problem with this is that the allocation may fail due
> > to a memory cgroup limit, where flushers + direct reclaim might not make
> > any progress towards resolving the situation at all. Because unlike the
> > global case, a memory cgroup may not have any cache at all, only
> > anonymous pages but no swap. This situation will lead to a reclaim
> > livelock with insane IO from waking the flushers and thrashing unrelated
> > filesystem cache in a tight loop.
> >
> > Use __GFP_NOFAIL allocations for buffers for now. This makes sure that
> > any looping happens in the page allocator, which knows how to
> > orchestrate kswapd, direct reclaim, and the flushers sensibly. It also
> > allows memory cgroups to detect allocations that can't handle failure
> > and will allow them to ultimately bypass the limit if reclaim can not
> > make progress.
> >
> > Reported-by: azurIt <azurit@pobox.sk>
> > Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> > Cc: Michal Hocko <mhocko@suse.cz>
> > Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> > Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
> > Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
> > ---
> > fs/buffer.c | 14 ++++++++++++--
> > mm/memcontrol.c | 2 ++
> > 2 files changed, 14 insertions(+), 2 deletions(-)
> >
> > diff --git a/fs/buffer.c b/fs/buffer.c
> > index 2c78739..2675e5a 100644
> > --- a/fs/buffer.c
> > +++ b/fs/buffer.c
> > @@ -957,9 +957,19 @@ grow_dev_page(struct block_device *bdev, sector_t block,
> >  	struct buffer_head *bh;
> >  	sector_t end_block;
> >  	int ret = 0;		/* Will call free_more_memory() */
> > +	gfp_t gfp_mask;
> >
> > -	page = find_or_create_page(inode->i_mapping, index,
> > -		(mapping_gfp_mask(inode->i_mapping) & ~__GFP_FS)|__GFP_MOVABLE);
> > +	gfp_mask = mapping_gfp_mask(inode->i_mapping) & ~__GFP_FS;
> > +	gfp_mask |= __GFP_MOVABLE;
> > +	/*
> > +	 * XXX: __getblk_slow() can not really deal with failure and
> > +	 * will endlessly loop on improvised global reclaim. Prefer
> > +	 * looping in the allocator rather than here, at least that
> > +	 * code knows what it's doing.
> > +	 */
> > +	gfp_mask |= __GFP_NOFAIL;
> > +
> > +	page = find_or_create_page(inode->i_mapping, index, gfp_mask);
> >  	if (!page)
> >  		return ret;
> >
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index 226b63e..953bf3c 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -2405,6 +2405,8 @@ done:
> >  	return 0;
> >  nomem:
> >  	*ptr = NULL;
> > +	if (gfp_mask & __GFP_NOFAIL)
> > +		return 0;
> >  	return -ENOMEM;
> >  bypass:
> >  	*ptr = root_mem_cgroup;
> > --
> > 1.8.3.2
> >
> --
> To unsubscribe from this list: send the line "unsubscribe stable" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
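
For readers following the quoted patch, the two hunks can be modelled in a few lines of user-space C. This is only an illustrative sketch, not kernel code: the `SK_GFP_*` bit values and the `sketch_*` function names are made up for the example, and the real charge path does far more than this. It shows the mask built in the grow_dev_page() hunk and the effect of the memcontrol.c hunk, where a charge that runs out of memory reports success instead of -ENOMEM when the no-fail bit is set.

```c
#include <errno.h>

/* Hypothetical flag bits for illustration; the kernel's real values differ. */
typedef unsigned int gfp_t;
#define SK_GFP_FS      0x1u
#define SK_GFP_MOVABLE 0x2u
#define SK_GFP_NOFAIL  0x4u

/*
 * Mirrors the mask construction in the fs/buffer.c hunk: start from the
 * mapping's mask, drop the FS bit, then add the movable and no-fail bits.
 */
static gfp_t sketch_buffer_gfp_mask(gfp_t mapping_mask)
{
	gfp_t gfp_mask = mapping_mask & ~SK_GFP_FS;

	gfp_mask |= SK_GFP_MOVABLE;
	gfp_mask |= SK_GFP_NOFAIL;
	return gfp_mask;
}

/*
 * Mirrors only the "nomem" exit touched by the mm/memcontrol.c hunk:
 * a caller that cannot handle failure gets 0, everyone else -ENOMEM.
 */
static int sketch_charge_nomem(gfp_t gfp_mask)
{
	if (gfp_mask & SK_GFP_NOFAIL)
		return 0;
	return -ENOMEM;
}
```

With these definitions, `sketch_charge_nomem(sketch_buffer_gfp_mask(SK_GFP_FS))` returns 0, while a mask without the no-fail bit, such as `SK_GFP_MOVABLE` alone, yields -ENOMEM.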