* Re: OOM detection regressions since 4.7
From: Linus Torvalds @ 2016-08-29 17:28 UTC
To: Olaf Hering, Bruce Fields, Jeff Layton
Cc: Michal Hocko, Andrew Morton, Markus Trippelsdorf,
	Arkadiusz Miskiewicz, Ralf-Peter Rohbeck, Jiri Slaby, Greg KH,
	Vlastimil Babka, Joonsoo Kim, linux-mm, LKML,
	Linux NFS Mailing List

On Mon, Aug 29, 2016 at 7:52 AM, Olaf Hering <olaf@aepfle.de> wrote:
>
> Today I noticed the nfsserver was disabled, probably since months already.
> Starting it gives a OOM, not sure if this is new with 4.7+.

That's not an oom, that's just an allocation failure.

And with order-4, that's actually pretty normal. Nobody should use
order-4 (that's 16 contiguous pages, fragmentation can easily make
that hard - *much* harder than the small order-1 or order-2 cases that
we should largely be able to rely on).

In fact, people who do multi-order allocations should always have a
fallback, and use __GFP_NOWARN.

> [93348.306406] Call Trace:
> [93348.306490]  [<ffffffff81198cef>] __alloc_pages_slowpath+0x1af/0xa10
> [93348.306501]  [<ffffffff811997a0>] __alloc_pages_nodemask+0x250/0x290
> [93348.306511]  [<ffffffff811f1c3d>] cache_grow_begin+0x8d/0x540
> [93348.306520]  [<ffffffff811f23d1>] fallback_alloc+0x161/0x200
> [93348.306530]  [<ffffffff811f43f2>] __kmalloc+0x1d2/0x570
> [93348.306589]  [<ffffffffa08f025a>] nfsd_reply_cache_init+0xaa/0x110 [nfsd]

Hmm. That's kmalloc itself falling back after already failing to grow
the slab cache earlier (the earlier allocations *were* done with
NOWARN afaik).

It does look like nfsd starts out by allocating the hash table with one
single fairly big allocation, and has no fallback position.

I suspect the code expects to be started at boot time, when this just
isn't an issue.
The fact that you loaded the nfsd kernel module with
memory already fragmented after heavy use is likely why nobody else
has seen this.

Adding the nfsd people to the cc, because just from a robustness
standpoint I suspect it would be better if the code did something like

 (a) shrink the hash table if the allocation fails (we've got some
     examples of that elsewhere)

or

 (b) fall back on a vmalloc allocation (that's certainly the simpler model)

We do have a "kvfree()" helper function for the "free either a kmalloc
or vmalloc allocation" but we don't actually have a good helper
pattern for the allocation side. People just do it by hand, at least
partly because we have so many different ways to allocate things -
zeroing, non-zeroing, node-specific or not, atomic or not (atomic
cannot fall back to vmalloc, obviously) etc etc.

Bruce, Jeff, comments?

                 Linus
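
Option (b) above can be modelled in userspace C. This is only a sketch under stated assumptions: contig_alloc() stands in for a kmalloc() call made with __GFP_NOWARN, virt_alloc() for vmalloc(), and free_table() for kvfree(). Those three names and the contig_limit threshold are hypothetical, invented for illustration; none of this is nfsd or kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Userspace model only. The threshold below is an invented number that
 * makes "large contiguous" requests fail, as they can on a fragmented
 * system.
 */
static size_t contig_limit = 4096 * 4;

/* Hypothetical stand-in for kmalloc(size, GFP_KERNEL | __GFP_NOWARN). */
static void *contig_alloc(size_t size)
{
	return size <= contig_limit ? malloc(size) : NULL;
}

/* Hypothetical stand-in for vmalloc(size). */
static void *virt_alloc(size_t size)
{
	return malloc(size);
}

/* Try the contiguous path first; fall back only if it fails. */
static void *alloc_table(size_t size, bool *fell_back)
{
	void *p;

	assert(size > 0);
	p = contig_alloc(size);
	*fell_back = (p == NULL);
	if (!p)
		p = virt_alloc(size);
	return p;
}

/* Models kvfree(): one release routine whichever path was taken. */
static void free_table(void *p)
{
	free(p);
}
```

The caller never needs to remember which path succeeded, which is the property the "kvfree()" helper provides on the free side. As the message notes, this shape cannot be used from atomic context.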
* Re: OOM detection regressions since 4.7
From: Jeff Layton @ 2016-08-29 17:52 UTC
To: Linus Torvalds, Olaf Hering, Bruce Fields
Cc: Michal Hocko, Andrew Morton, Markus Trippelsdorf,
	Arkadiusz Miskiewicz, Ralf-Peter Rohbeck, Jiri Slaby, Greg KH,
	Vlastimil Babka, Joonsoo Kim, linux-mm, LKML,
	Linux NFS Mailing List

On Mon, 2016-08-29 at 10:28 -0700, Linus Torvalds wrote:
> On Mon, Aug 29, 2016 at 7:52 AM, Olaf Hering <olaf@aepfle.de> wrote:
> >
> > Today I noticed the nfsserver was disabled, probably since months already.
> > Starting it gives a OOM, not sure if this is new with 4.7+.
>
> That's not an oom, that's just an allocation failure.
>
> And with order-4, that's actually pretty normal. Nobody should use
> order-4 (that's 16 contiguous pages, fragmentation can easily make
> that hard - *much* harder than the small order-1 or order-2 cases that
> we should largely be able to rely on).
>
> In fact, people who do multi-order allocations should always have a
> fallback, and use __GFP_NOWARN.
>
> > [93348.306406] Call Trace:
> > [93348.306490]  [<ffffffff81198cef>] __alloc_pages_slowpath+0x1af/0xa10
> > [93348.306501]  [<ffffffff811997a0>] __alloc_pages_nodemask+0x250/0x290
> > [93348.306511]  [<ffffffff811f1c3d>] cache_grow_begin+0x8d/0x540
> > [93348.306520]  [<ffffffff811f23d1>] fallback_alloc+0x161/0x200
> > [93348.306530]  [<ffffffff811f43f2>] __kmalloc+0x1d2/0x570
> > [93348.306589]  [<ffffffffa08f025a>] nfsd_reply_cache_init+0xaa/0x110 [nfsd]
>
> Hmm. That's kmalloc itself falling back after already failing to grow
> the slab cache earlier (the earlier allocations *were* done with
> NOWARN afaik).
>
> It does look like nfsd starts out by allocating the hash table with one
> single fairly big allocation, and has no fallback position.
>
> I suspect the code expects to be started at boot time, when this just
> isn't an issue. The fact that you loaded the nfsd kernel module with
> memory already fragmented after heavy use is likely why nobody else
> has seen this.
>
> Adding the nfsd people to the cc, because just from a robustness
> standpoint I suspect it would be better if the code did something like
>
>  (a) shrink the hash table if the allocation fails (we've got some
>      examples of that elsewhere)
>
> or
>
>  (b) fall back on a vmalloc allocation (that's certainly the simpler model)
>
> We do have a "kvfree()" helper function for the "free either a kmalloc
> or vmalloc allocation" but we don't actually have a good helper
> pattern for the allocation side. People just do it by hand, at least
> partly because we have so many different ways to allocate things -
> zeroing, non-zeroing, node-specific or not, atomic or not (atomic
> cannot fall back to vmalloc, obviously) etc etc.
>
> Bruce, Jeff, comments?
>
>                  Linus

Yeah, that makes total sense.

Hmm...we _do_ already auto-size the hash at init time, so shrinking it
downward and retrying if the allocation fails wouldn't be hard to do.
Maybe I can just cut it in half and throw a pr_warn to tell the admin
in that case.

In any case...I'll take a look at how we can improve it.

Thanks for the heads-up!
-- 
Jeff Layton <jlayton@poochiereds.net>
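
The "cut it in half and retry" idea can be sketched as a userspace model. Everything here is an assumption made for illustration: big_alloc() is a hypothetical stand-in for the single large kmalloc(), max_ok_bytes is an invented failure threshold, fprintf() plays the role of pr_warn(), and alloc_hash_shrinking() is not the actual nfsd code or the fix that was eventually merged.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Invented threshold: requests larger than this are made to fail. */
static size_t max_ok_bytes = 16384;

/* Hypothetical stand-in for one large zeroing kmalloc(). */
static void *big_alloc(size_t bytes)
{
	return bytes <= max_ok_bytes ? calloc(1, bytes) : NULL;
}

/*
 * Try to allocate *entries buckets; on failure halve the count and retry,
 * giving up once the count drops below min_entries. On success *entries
 * is updated to the bucket count actually obtained. Overflow of
 * n * entry_size is not checked in this sketch.
 */
static void *alloc_hash_shrinking(size_t *entries, size_t entry_size,
				  size_t min_entries)
{
	size_t want = *entries;
	size_t n = want;

	assert(entry_size > 0 && min_entries > 0);
	while (n >= min_entries) {
		void *tbl = big_alloc(n * entry_size);

		if (tbl) {
			if (n != want)
				fprintf(stderr,
					"hash table shrunk from %zu to %zu entries\n",
					want, n);
			*entries = n;
			return tbl;
		}
		n /= 2;	/* a power-of-two count stays a power of two */
	}
	return NULL;
}
```

Because the caller passes the size by pointer, any hash mask derived from the bucket count can be computed after the call from the size that was really granted.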