From: Dave Chinner <david@fromorbit.com>
To: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: Michal Hocko <mhocko@kernel.org>, Christoph Hellwig <hch@lst.de>,
Brian Foster <bfoster@redhat.com>,
Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>,
Xiong Zhou <xzhou@redhat.com>,
linux-xfs@vger.kernel.org, linux-mm@kvack.org,
LKML <linux-kernel@vger.kernel.org>,
linux-fsdevel@vger.kernel.org, Michal Hocko <mhocko@suse.com>
Subject: Re: [PATCH 1/2] xfs: allow kmem_zalloc_greedy to fail
Date: Sat, 4 Mar 2017 15:48:12 +1100
Message-ID: <20170304044812.GK17542@dastard>
In-Reply-To: <20170303231912.GA5073@birch.djwong.org>
On Fri, Mar 03, 2017 at 03:19:12PM -0800, Darrick J. Wong wrote:
> On Sat, Mar 04, 2017 at 09:54:44AM +1100, Dave Chinner wrote:
> > On Thu, Mar 02, 2017 at 04:45:40PM +0100, Michal Hocko wrote:
> > > From: Michal Hocko <mhocko@suse.com>
> > >
> > > Even though kmem_zalloc_greedy is documented as being allowed to
> > > fail, the current code doesn't really implement that properly and
> > > loops forever on the smallest allowed size. This is a problem
> > > because vzalloc might fail permanently - we might run out of
> > > vmalloc space or, since 5d17a73a2ebe ("vmalloc: back off when the
> > > current task is killed"), fail because the current task has been
> > > killed. The latter makes the failure scenario much more likely
> > > than it used to be because it makes vmalloc() failures permanent
> > > for tasks with fatal signals pending. Fix this by bailing out if
> > > the minimum size request failed.
> > >
> > > This was noticed by Xiong Zhou as a hang in the generic/269 xfstest.
> > >
> > > fsstress: vmalloc: allocation failure, allocated 12288 of 20480 bytes, mode:0x14080c2(GFP_KERNEL|__GFP_HIGHMEM|__GFP_ZERO), nodemask=(null)
> > > fsstress cpuset=/ mems_allowed=0-1
> > > CPU: 1 PID: 23460 Comm: fsstress Not tainted 4.10.0-master-45554b2+ #21
> > > Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 10/05/2016
> > > Call Trace:
> > > dump_stack+0x63/0x87
> > > warn_alloc+0x114/0x1c0
> > > ? alloc_pages_current+0x88/0x120
> > > __vmalloc_node_range+0x250/0x2a0
> > > ? kmem_zalloc_greedy+0x2b/0x40 [xfs]
> > > ? free_hot_cold_page+0x21f/0x280
> > > vzalloc+0x54/0x60
> > > ? kmem_zalloc_greedy+0x2b/0x40 [xfs]
> > > kmem_zalloc_greedy+0x2b/0x40 [xfs]
> > > xfs_bulkstat+0x11b/0x730 [xfs]
> > > ? xfs_bulkstat_one_int+0x340/0x340 [xfs]
> > > ? selinux_capable+0x20/0x30
> > > ? security_capable+0x48/0x60
> > > xfs_ioc_bulkstat+0xe4/0x190 [xfs]
> > > xfs_file_ioctl+0x9dd/0xad0 [xfs]
> > > ? do_filp_open+0xa5/0x100
> > > do_vfs_ioctl+0xa7/0x5e0
> > > SyS_ioctl+0x79/0x90
> > > do_syscall_64+0x67/0x180
> > > entry_SYSCALL64_slow_path+0x25/0x25
> > >
> > > fsstress keeps looping inside kmem_zalloc_greedy without any way out
> > > because vmalloc keeps failing due to fatal_signal_pending.
> > >
> > > Reported-by: Xiong Zhou <xzhou@redhat.com>
> > > Analyzed-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> > > Signed-off-by: Michal Hocko <mhocko@suse.com>
> > > ---
> > > fs/xfs/kmem.c | 2 ++
> > > 1 file changed, 2 insertions(+)
> > >
> > > diff --git a/fs/xfs/kmem.c b/fs/xfs/kmem.c
> > > index 339c696bbc01..ee95f5c6db45 100644
> > > --- a/fs/xfs/kmem.c
> > > +++ b/fs/xfs/kmem.c
> > > @@ -34,6 +34,8 @@ kmem_zalloc_greedy(size_t *size, size_t minsize, size_t maxsize)
> > > size_t kmsize = maxsize;
> > >
> > > while (!(ptr = vzalloc(kmsize))) {
> > > + if (kmsize == minsize)
> > > + break;
> > > if ((kmsize >>= 1) <= minsize)
> > > kmsize = minsize;
> > > }
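For reference, with the hunk above applied the whole helper would read
roughly as follows; the lines outside the loop are reconstructed from
memory of fs/xfs/kmem.c and may not match the tree exactly:

	void *
	kmem_zalloc_greedy(size_t *size, size_t minsize, size_t maxsize)
	{
		void	*ptr;
		size_t	kmsize = maxsize;

		/*
		 * Halve the request after each failure; with the fix,
		 * give up once even the minimum size has failed instead
		 * of retrying it forever.
		 */
		while (!(ptr = vzalloc(kmsize))) {
			if (kmsize == minsize)
				break;
			if ((kmsize >>= 1) <= minsize)
				kmsize = minsize;
		}
		if (ptr)
			*size = kmsize;
		return ptr;
	}

so callers now have to cope with a NULL return.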
> >
> > Seems wrong to me - this function used to have lots of callers and
> > over time we've slowly removed them or replaced them with something
> > else. I'd suggest removing it completely, replacing the call sites
> > with kmem_zalloc_large().
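A minimal sketch of what that call-site replacement might look like,
assuming the two-argument kmem_zalloc_large(size, flags) helper and a
fixed four-page buffer; the KM_SLEEP flag and the size choice are
assumptions, not a tested change:

	irbuf = kmem_zalloc_large(PAGE_SIZE * 4, KM_SLEEP);
	if (!irbuf)
		return -ENOMEM;
	irbsize = PAGE_SIZE * 4;	/* no smaller fallback sizes any more */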
>
> Heh. I thought the reason why _greedy still exists (for its sole user
> bulkstat) is that bulkstat had the flexibility to deal with receiving
> 0, 1, or 4 pages. So yeah, we could just kill it.
irbuf is sized to minimise AGI locking, but if memory is low it just
uses whatever it can get. Keep in mind that the number of inodes we
need to process is determined by the userspace buffer size, which can
easily be sized to hold tens of thousands of struct xfs_bulkstat
entries.
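Roughly, the current call site looks like this (paraphrased from
xfs_bulkstat(); the surrounding code is elided and the details are
from memory):

	/*
	 * Ask for four pages of inobt records, but accept as little as
	 * one page when memory is tight.
	 */
	irbuf = kmem_zalloc_greedy(&irbsize, PAGE_SIZE, PAGE_SIZE * 4);
	if (!irbuf)
		return -ENOMEM;
	nirbuf = irbsize / sizeof(*irbuf);

The count of inodes to stat, by contrast, comes from the size of the
buffer userspace passed in, so a small irbuf just means more trips
around the AGI locking loop, not fewer inodes processed.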
> But thinking even more stingily about memory, are there applications
> that care about being able to bulkstat 16384 inodes at once?
IIRC, xfsdump can bulkstat up to 64k inodes per call....
> How badly
> does bulkstat need to be able to bulk-process more than a page's worth
> of inobt records, anyway?
Benchmark it on a busy system doing lots of other AGI work (e.g. a
busy NFS server workload with a working set of tens of millions of
inodes so it doesn't fit in cache) and find out. That's generally
how I answer those sorts of questions...
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com