* Re: [Bug 87891] New: kernel BUG at mm/slab.c:2625! [not found] <bug-87891-27@https.bugzilla.kernel.org/> @ 2014-11-11 23:31 ` Andrew Morton 2014-11-12 0:36 ` Christoph Lameter ` (2 more replies) 0 siblings, 3 replies; 24+ messages in thread From: Andrew Morton @ 2014-11-11 23:31 UTC (permalink / raw) To: Ming Lei, Pekka Enberg, Joonsoo Kim, Christoph Lameter, David Rientjes, Mel Gorman, Johannes Weiner, Pauli Nieminen, Dave Airlie, Tetsuo Handa Cc: bugzilla-daemon, luke-jr+linuxbugs, dri-devel, linux-mm (switched to email. Please respond via emailed reply-to-all, not via the bugzilla web interface). On Thu, 06 Nov 2014 17:28:41 +0000 bugzilla-daemon@bugzilla.kernel.org wrote: > https://bugzilla.kernel.org/show_bug.cgi?id=87891 > > Bug ID: 87891 > Summary: kernel BUG at mm/slab.c:2625! > Product: Memory Management > Version: 2.5 > Kernel Version: 3.17.2 > Hardware: i386 > OS: Linux > Tree: Mainline > Status: NEW > Severity: blocking > Priority: P1 > Component: Slab Allocator > Assignee: akpm@linux-foundation.org > Reporter: luke-jr+linuxbugs@utopios.org > Regression: No Well this is interesting. > [359782.842112] kernel BUG at mm/slab.c:2625! > ... > [359782.843008] Call Trace: > [359782.843017] [<ffffffff8115181f>] __kmalloc+0xdf/0x200 > [359782.843037] [<ffffffffa0466285>] ? ttm_page_pool_free+0x35/0x180 [ttm] > [359782.843060] [<ffffffffa0466285>] ttm_page_pool_free+0x35/0x180 [ttm] > [359782.843084] [<ffffffffa046674e>] ttm_pool_shrink_scan+0xae/0xd0 [ttm] > [359782.843108] [<ffffffff8111c2fb>] shrink_slab_node+0x12b/0x2e0 > [359782.843129] [<ffffffff81127ed4>] ? fragmentation_index+0x14/0x70 > [359782.843150] [<ffffffff8110fc3a>] ? 
zone_watermark_ok+0x1a/0x20 > [359782.843171] [<ffffffff8111ceb8>] shrink_slab+0xc8/0x110 > [359782.843189] [<ffffffff81120480>] do_try_to_free_pages+0x300/0x410 > [359782.843210] [<ffffffff8112084b>] try_to_free_pages+0xbb/0x190 > [359782.843230] [<ffffffff81113136>] __alloc_pages_nodemask+0x696/0xa90 > [359782.843253] [<ffffffff8115810a>] do_huge_pmd_anonymous_page+0xfa/0x3f0 > [359782.843278] [<ffffffff812dffe7>] ? debug_smp_processor_id+0x17/0x20 > [359782.843300] [<ffffffff81118dc7>] ? __lru_cache_add+0x57/0xa0 > [359782.843321] [<ffffffff811385ce>] handle_mm_fault+0x37e/0xdd0 It went pagefault ->__alloc_pages_nodemask ->shrink_slab ->ttm_pool_shrink_scan ->ttm_page_pool_free ->kmalloc ->cache_grow ->BUG_ON(flags & GFP_SLAB_BUG_MASK); And I don't really know why - I'm not seeing anything in there which can set a GFP flag which is outside GFP_SLAB_BUG_MASK. However I see lots of nits. Core MM: __alloc_pages_nodemask() does if (unlikely(!page)) { /* * Runtime PM, block IO and its error handling path * can deadlock because I/O on the device might not * complete. */ gfp_mask = memalloc_noio_flags(gfp_mask); page = __alloc_pages_slowpath(gfp_mask, order, zonelist, high_zoneidx, nodemask, preferred_zone, classzone_idx, migratetype); } so it permanently alters the value of incoming arg gfp_mask. This means that the following trace_mm_page_alloc() will print the wrong value of gfp_mask, and if we later do the `goto retry_cpuset', we retry with a possibly different gfp_mask. Isn't this a bug? Also, why are we even passing a gfp_t down to the shrinkers? So they can work out the allocation context - things like __GFP_IO, __GFP_FS, etc? Is it even appropriate to use that mask for a new allocation attempt within a particular shrinker? ttm: I think it's a bad idea to be calling kmalloc() in the slab shrinker function. We *know* that the system is low on memory and is trying to free things up. Trying to allocate *more* memory at this time is asking for trouble. 
ttm_page_pool_free() could easily be tweaked to use a fixed-size local array of page*'s to avoid that allocation. Could someone implement this please? slab: There's no point in doing #define GFP_SLAB_BUG_MASK (__GFP_DMA32|__GFP_HIGHMEM|~__GFP_BITS_MASK) because __GFP_DMA32|__GFP_HIGHMEM are already part of ~__GFP_BITS_MASK. What's it trying to do here? And it's quite infuriating to go BUG when the code could easily warn and fix it up. And it's quite infuriating to go BUG because one of the bits was set, but not tell us which bit it was! Could the slab guys please review this? From: Andrew Morton <akpm@linux-foundation.org> Subject: slab: improve checking for invalid gfp_flags - The code goes BUG, but doesn't tell us which bits were unexpectedly set. Print that out. - The code goes BUG when it could just fix things up and proceed. Do that. - ~__GFP_BITS_MASK already includes __GFP_DMA32 and __GFP_HIGHMEM, so remove those from the GFP_SLAB_BUG_MASK definition. Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> --- include/linux/gfp.h | 2 +- mm/slab.c | 5 ++++- mm/slub.c | 5 ++++- 3 files changed, 9 insertions(+), 3 deletions(-) diff -puN include/linux/gfp.h~slab-improve-checking-for-invalid-gfp_flags include/linux/gfp.h --- a/include/linux/gfp.h~slab-improve-checking-for-invalid-gfp_flags +++ a/include/linux/gfp.h @@ -145,7 +145,7 @@ struct vm_area_struct; #define GFP_CONSTRAINT_MASK (__GFP_HARDWALL|__GFP_THISNODE) /* Do not use these with a slab allocator */ -#define GFP_SLAB_BUG_MASK (__GFP_DMA32|__GFP_HIGHMEM|~__GFP_BITS_MASK) +#define GFP_SLAB_BUG_MASK (~__GFP_BITS_MASK) /* Flag - indicates that the buffer will be suitable for DMA. 
Ignored on some platforms, used as appropriate on others */ diff -puN mm/slab.c~slab-improve-checking-for-invalid-gfp_flags mm/slab.c --- a/mm/slab.c~slab-improve-checking-for-invalid-gfp_flags +++ a/mm/slab.c @@ -2590,7 +2590,10 @@ static int cache_grow(struct kmem_cache * Be lazy and only check for valid flags here, keeping it out of the * critical path in kmem_cache_alloc(). */ - BUG_ON(flags & GFP_SLAB_BUG_MASK); + if (WARN_ON(flags & GFP_SLAB_BUG_MASK)) { + pr_emerg("gfp: %u\n", flags & GFP_SLAB_BUG_MASK); + flags &= ~GFP_SLAB_BUG_MASK; + } local_flags = flags & (GFP_CONSTRAINT_MASK|GFP_RECLAIM_MASK); /* Take the node list lock to change the colour_next on this node */ diff -puN mm/slub.c~slab-improve-checking-for-invalid-gfp_flags mm/slub.c --- a/mm/slub.c~slab-improve-checking-for-invalid-gfp_flags +++ a/mm/slub.c @@ -1377,7 +1377,10 @@ static struct page *new_slab(struct kmem int order; int idx; - BUG_ON(flags & GFP_SLAB_BUG_MASK); + if (WARN_ON(flags & GFP_SLAB_BUG_MASK)) { + pr_emerg("gfp: %u\n", flags & GFP_SLAB_BUG_MASK); + flags &= ~GFP_SLAB_BUG_MASK; + } page = allocate_slab(s, flags & (GFP_RECLAIM_MASK | GFP_CONSTRAINT_MASK), node); _ -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [Bug 87891] New: kernel BUG at mm/slab.c:2625! 2014-11-11 23:31 ` [Bug 87891] New: kernel BUG at mm/slab.c:2625! Andrew Morton @ 2014-11-12 0:36 ` Christoph Lameter 2014-11-12 0:49 ` Andrew Morton 2014-11-12 0:44 ` Joonsoo Kim 2014-11-13 7:04 ` Vlastimil Babka 2 siblings, 1 reply; 24+ messages in thread From: Christoph Lameter @ 2014-11-12 0:36 UTC (permalink / raw) To: Andrew Morton Cc: Ming Lei, Pekka Enberg, Joonsoo Kim, David Rientjes, Mel Gorman, Johannes Weiner, Pauli Nieminen, Dave Airlie, Tetsuo Handa, bugzilla-daemon, luke-jr+linuxbugs, dri-devel, linux-mm On Tue, 11 Nov 2014, Andrew Morton wrote: > There's no point in doing > > #define GFP_SLAB_BUG_MASK (__GFP_DMA32|__GFP_HIGHMEM|~__GFP_BITS_MASK) > > because __GFP_DMA32|__GFP_HIGHMEM are already part of ~__GFP_BITS_MASK. ?? ~__GFP_BITS_MASK means bits 25 to 31 are set. __GFP_DMA32 is bit 2 and __GFP_HIGHMEM is bit 1.
* Re: [Bug 87891] New: kernel BUG at mm/slab.c:2625! 2014-11-12 0:36 ` Christoph Lameter @ 2014-11-12 0:49 ` Andrew Morton 2014-11-12 0:54 ` Luke Dashjr 2014-11-12 1:22 ` [Bug 87891] New: kernel BUG at mm/slab.c:2625! Kirill A. Shutemov 0 siblings, 2 replies; 24+ messages in thread From: Andrew Morton @ 2014-11-12 0:49 UTC (permalink / raw) To: Christoph Lameter Cc: Ming Lei, Pekka Enberg, Joonsoo Kim, David Rientjes, Mel Gorman, Johannes Weiner, Pauli Nieminen, Dave Airlie, Tetsuo Handa, bugzilla-daemon, luke-jr+linuxbugs, dri-devel, linux-mm On Tue, 11 Nov 2014 18:36:28 -0600 (CST) Christoph Lameter <cl@linux.com> wrote: > On Tue, 11 Nov 2014, Andrew Morton wrote: > > > There's no point in doing > > > > #define GFP_SLAB_BUG_MASK (__GFP_DMA32|__GFP_HIGHMEM|~__GFP_BITS_MASK) > > > > because __GFP_DMA32|__GFP_HIGHMEM are already part of ~__GFP_BITS_MASK. > > ?? ~__GFP_BITS_MASK means bits 25 to 31 are set. > > __GFP_DMA32 is bit 2 and __GFP_HIGHMEM is bit 1. Ah, yes, OK. I suppose it's possible that __GFP_HIGHMEM was set. do_huge_pmd_anonymous_page ->pte_alloc_one ->alloc_pages(__userpte_alloc_gfp==__GFP_HIGHMEM) but I haven't traced that through and that's 32-bit. But anyway - Luke, please attach your .config to https://bugzilla.kernel.org/show_bug.cgi?id=87891? -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [Bug 87891] New: kernel BUG at mm/slab.c:2625! 2014-11-12 0:49 ` Andrew Morton @ 2014-11-12 0:54 ` Luke Dashjr 2014-11-12 1:02 ` Andrew Morton 2014-11-12 1:22 ` [Bug 87891] New: kernel BUG at mm/slab.c:2625! Kirill A. Shutemov 1 sibling, 1 reply; 24+ messages in thread From: Luke Dashjr @ 2014-11-12 0:54 UTC (permalink / raw) To: Andrew Morton Cc: Christoph Lameter, Ming Lei, Pekka Enberg, Joonsoo Kim, David Rientjes, Mel Gorman, Johannes Weiner, Pauli Nieminen, Dave Airlie, Tetsuo Handa, bugzilla-daemon, luke-jr+linuxbugs, dri-devel, linux-mm On Wednesday, November 12, 2014 12:49:13 AM Andrew Morton wrote: > But anyway - Luke, please attach your .config to > https://bugzilla.kernel.org/show_bug.cgi?id=87891? Done: https://bugzilla.kernel.org/attachment.cgi?id=157381 Luke
* Re: [Bug 87891] New: kernel BUG at mm/slab.c:2625! 2014-11-12 0:54 ` Luke Dashjr @ 2014-11-12 1:02 ` Andrew Morton 2014-11-12 1:22 ` Joonsoo Kim 0 siblings, 1 reply; 24+ messages in thread From: Andrew Morton @ 2014-11-12 1:02 UTC (permalink / raw) To: Luke Dashjr Cc: Christoph Lameter, Ming Lei, Pekka Enberg, Joonsoo Kim, David Rientjes, Mel Gorman, Johannes Weiner, Pauli Nieminen, Dave Airlie, Tetsuo Handa, bugzilla-daemon, luke-jr+linuxbugs, dri-devel, linux-mm On Wed, 12 Nov 2014 00:54:01 +0000 Luke Dashjr <luke@dashjr.org> wrote: > On Wednesday, November 12, 2014 12:49:13 AM Andrew Morton wrote: > > But anyway - Luke, please attach your .config to > > https://bugzilla.kernel.org/show_bug.cgi?id=87891? > > Done: https://bugzilla.kernel.org/attachment.cgi?id=157381 > OK, thanks. No CONFIG_HIGHMEM of course. I'm stumped. It might just have been a random memory bitflip or other corruption of course. Is it repeatable at all? If it is, please add the below and retest? --- a/mm/slab.c~slab-improve-checking-for-invalid-gfp_flags +++ a/mm/slab.c @@ -2590,7 +2590,10 @@ static int cache_grow(struct kmem_cache * Be lazy and only check for valid flags here, keeping it out of the * critical path in kmem_cache_alloc(). 
*/ - BUG_ON(flags & GFP_SLAB_BUG_MASK); + if (unlikely(flags & GFP_SLAB_BUG_MASK)) { + pr_emerg("gfp: %u\n", flags & GFP_SLAB_BUG_MASK); + BUG(); + } local_flags = flags & (GFP_CONSTRAINT_MASK|GFP_RECLAIM_MASK); /* Take the node list lock to change the colour_next on this node */ diff -puN mm/slub.c~slab-improve-checking-for-invalid-gfp_flags mm/slub.c --- a/mm/slub.c~slab-improve-checking-for-invalid-gfp_flags +++ a/mm/slub.c @@ -1377,7 +1377,10 @@ static struct page *new_slab(struct kmem int order; int idx; - BUG_ON(flags & GFP_SLAB_BUG_MASK); + if (unlikely(flags & GFP_SLAB_BUG_MASK)) { + pr_emerg("gfp: %u\n", flags & GFP_SLAB_BUG_MASK); + BUG(); + } page = allocate_slab(s, flags & (GFP_RECLAIM_MASK | GFP_CONSTRAINT_MASK), node); _ -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [Bug 87891] New: kernel BUG at mm/slab.c:2625! 2014-11-12 1:02 ` Andrew Morton @ 2014-11-12 1:22 ` Joonsoo Kim 2014-11-12 1:44 ` Andrew Morton 0 siblings, 1 reply; 24+ messages in thread From: Joonsoo Kim @ 2014-11-12 1:22 UTC (permalink / raw) To: Andrew Morton Cc: Luke Dashjr, Christoph Lameter, Ming Lei, Pekka Enberg, David Rientjes, Mel Gorman, Johannes Weiner, Pauli Nieminen, Dave Airlie, Tetsuo Handa, bugzilla-daemon, luke-jr+linuxbugs, dri-devel, linux-mm On Tue, Nov 11, 2014 at 05:02:43PM -0800, Andrew Morton wrote: > On Wed, 12 Nov 2014 00:54:01 +0000 Luke Dashjr <luke@dashjr.org> wrote: > > > On Wednesday, November 12, 2014 12:49:13 AM Andrew Morton wrote: > > > But anyway - Luke, please attach your .config to > > > https://bugzilla.kernel.org/show_bug.cgi?id=87891? > > > > Done: https://bugzilla.kernel.org/attachment.cgi?id=157381 > > > > OK, thanks. No CONFIG_HIGHMEM of course. I'm stumped. Hello, Andrew. I think the cause is __GFP_HIGHMEM, which is always defined regardless of CONFIG_HIGHMEM. Please look at do_huge_pmd_anonymous_page(). It calls alloc_hugepage_vma(), which calls alloc_pages_vma() with the mask from alloc_hugepage_gfpmask(). That mask includes GFP_TRANSHUGE, which in turn includes GFP_HIGHUSER_MOVABLE and therefore __GFP_HIGHMEM. Thanks.
* Re: [Bug 87891] New: kernel BUG at mm/slab.c:2625! 2014-11-12 1:22 ` Joonsoo Kim @ 2014-11-12 1:44 ` Andrew Morton 2014-11-12 2:13 ` Joonsoo Kim 2014-11-12 4:08 ` Tetsuo Handa 0 siblings, 2 replies; 24+ messages in thread From: Andrew Morton @ 2014-11-12 1:44 UTC (permalink / raw) To: Joonsoo Kim Cc: Luke Dashjr, Christoph Lameter, Ming Lei, Pekka Enberg, David Rientjes, Mel Gorman, Johannes Weiner, Pauli Nieminen, Dave Airlie, Tetsuo Handa, bugzilla-daemon, luke-jr+linuxbugs, dri-devel, linux-mm On Wed, 12 Nov 2014 10:22:45 +0900 Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote: > On Tue, Nov 11, 2014 at 05:02:43PM -0800, Andrew Morton wrote: > > On Wed, 12 Nov 2014 00:54:01 +0000 Luke Dashjr <luke@dashjr.org> wrote: > > > > > On Wednesday, November 12, 2014 12:49:13 AM Andrew Morton wrote: > > > > But anyway - Luke, please attach your .config to > > > > https://bugzilla.kernel.org/show_bug.cgi?id=87891? > > > > > > Done: https://bugzilla.kernel.org/attachment.cgi?id=157381 > > > > > > > OK, thanks. No CONFIG_HIGHMEM of course. I'm stumped. > > Hello, Andrew. > > I think that the cause is GFP_HIGHMEM. > GFP_HIGHMEM is always defined regardless CONFIG_HIGHMEM. > Please look at the do_huge_pmd_anonymous_page(). > It calls alloc_hugepage_vma() and then alloc_pages_vma() is called > with alloc_hugepage_gfpmask(). This gfpmask includes GFP_TRANSHUGE > and then GFP_HIGHUSER_MOVABLE. OK. So where's the bug? I'm inclined to say that it's in ttm. It's taking a gfp_mask which means "this is the allocation attempt which we are attempting to satisfy" and uses that for its own allocation. But ttm has no business using that gfp_mask for its own allocation attempt. If anything it should use something like, err, GFP_KERNEL & ~__GFP_IO & ~__GFP_FS | __GFP_HIGH although as I mentioned earlier, it would be better to avoid allocation altogether. Poor ttm guys - this is a bit of a trap we set for them. 
* Re: [Bug 87891] New: kernel BUG at mm/slab.c:2625! 2014-11-12 1:44 ` Andrew Morton @ 2014-11-12 2:13 ` Joonsoo Kim 2014-11-12 4:08 ` Tetsuo Handa 1 sibling, 0 replies; 24+ messages in thread From: Joonsoo Kim @ 2014-11-12 2:13 UTC (permalink / raw) To: Andrew Morton Cc: Luke Dashjr, Christoph Lameter, Ming Lei, Pekka Enberg, David Rientjes, Mel Gorman, Johannes Weiner, Pauli Nieminen, Dave Airlie, Tetsuo Handa, bugzilla-daemon, luke-jr+linuxbugs, dri-devel, linux-mm On Tue, Nov 11, 2014 at 05:44:12PM -0800, Andrew Morton wrote: > On Wed, 12 Nov 2014 10:22:45 +0900 Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote: > > > On Tue, Nov 11, 2014 at 05:02:43PM -0800, Andrew Morton wrote: > > > On Wed, 12 Nov 2014 00:54:01 +0000 Luke Dashjr <luke@dashjr.org> wrote: > > > > > > > On Wednesday, November 12, 2014 12:49:13 AM Andrew Morton wrote: > > > > > But anyway - Luke, please attach your .config to > > > > > https://bugzilla.kernel.org/show_bug.cgi?id=87891? > > > > > > > > Done: https://bugzilla.kernel.org/attachment.cgi?id=157381 > > > > > > > > > > OK, thanks. No CONFIG_HIGHMEM of course. I'm stumped. > > > > Hello, Andrew. > > > > I think that the cause is GFP_HIGHMEM. > > GFP_HIGHMEM is always defined regardless CONFIG_HIGHMEM. > > Please look at the do_huge_pmd_anonymous_page(). > > It calls alloc_hugepage_vma() and then alloc_pages_vma() is called > > with alloc_hugepage_gfpmask(). This gfpmask includes GFP_TRANSHUGE > > and then GFP_HIGHUSER_MOVABLE. > > OK. > > So where's the bug? I'm inclined to say that it's in ttm. It's taking I agree that. > a gfp_mask which means "this is the allocation attempt which we are > attempting to satisfy" and uses that for its own allocation. > > But ttm has no business using that gfp_mask for its own allocation > attempt. If anything it should use something like, err, > > GFP_KERNEL & ~__GFP_IO & ~__GFP_FS | __GFP_HIGH > > although as I mentioned earlier, it would be better to avoid allocation > altogether. 
Yes, avoiding the allocation would be best. If that is not possible, introducing a new common helper that converts the shrinker control's gfp mask into a valid allocation gfp mask is better than open-coding it. Thanks. > > Poor ttm guys - this is a bit of a trap we set for them.
* Re: [Bug 87891] New: kernel BUG at mm/slab.c:2625! 2014-11-12 1:44 ` Andrew Morton 2014-11-12 2:13 ` Joonsoo Kim @ 2014-11-12 4:08 ` Tetsuo Handa 2014-11-12 4:38 ` Andrew Morton 1 sibling, 1 reply; 24+ messages in thread From: Tetsuo Handa @ 2014-11-12 4:08 UTC (permalink / raw) To: Andrew Morton Cc: Luke Dashjr, Christoph Lameter, Ming Lei, Pekka Enberg, David Rientjes, Mel Gorman, Johannes Weiner, Pauli Nieminen, Dave Airlie, bugzilla-daemon, luke-jr+linuxbugs, dri-devel, linux-mm, Joonsoo Kim Andrew Morton wrote: > Poor ttm guys - this is a bit of a trap we set for them. Commit a91576d7916f6cce ("drm/ttm: Pass GFP flags in order to avoid deadlock.") changed to use sc->gfp_mask rather than GFP_KERNEL. - pages_to_free = kmalloc(npages_to_free * sizeof(struct page *), - GFP_KERNEL); + pages_to_free = kmalloc(npages_to_free * sizeof(struct page *), gfp); But this bug is caused by sc->gfp_mask containing some flags which are not in GFP_KERNEL, right? Then, I think - pages_to_free = kmalloc(npages_to_free * sizeof(struct page *), gfp); + pages_to_free = kmalloc(npages_to_free * sizeof(struct page *), gfp & GFP_KERNEL); would hide this bug. But I think we should use GFP_ATOMIC (or drop __GFP_WAIT flag) for two reasons when __alloc_pages_nodemask() is called from shrinker functions. (1) Stack usage by __alloc_pages_nodemask() is large. If we unlimitedly allow recursive __alloc_pages_nodemask() calls, kernel stack could overflow under extreme memory pressure. (2) Some shrinker functions are using sleepable locks which could make kswapd sleep for unpredictable duration. If kswapd is unexpectedly blocked inside shrinker functions and somebody is expecting that kswapd is running for reclaiming memory, it is a memory allocation deadlock. Speak of ttm module, commit 22e71691fd54c637 ("drm/ttm: Use mutex_trylock() to avoid deadlock inside shrinker functions.") prevents unlimited recursive __alloc_pages_nodemask() calls. 
* Re: [Bug 87891] New: kernel BUG at mm/slab.c:2625! 2014-11-12 4:08 ` Tetsuo Handa @ 2014-11-12 4:38 ` Andrew Morton 2014-11-13 13:43 ` [PATCH] drm/ttm: Avoid memory allocation from shrinker functions Tetsuo Handa 0 siblings, 1 reply; 24+ messages in thread From: Andrew Morton @ 2014-11-12 4:38 UTC (permalink / raw) To: Tetsuo Handa Cc: Luke Dashjr, Christoph Lameter, Ming Lei, Pekka Enberg, David Rientjes, Mel Gorman, Johannes Weiner, Pauli Nieminen, Dave Airlie, bugzilla-daemon, luke-jr+linuxbugs, dri-devel, linux-mm, Joonsoo Kim On Wed, 12 Nov 2014 13:08:55 +0900 Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> wrote: > Andrew Morton wrote: > > Poor ttm guys - this is a bit of a trap we set for them. > > Commit a91576d7916f6cce ("drm/ttm: Pass GFP flags in order to avoid deadlock.") > changed to use sc->gfp_mask rather than GFP_KERNEL. > > - pages_to_free = kmalloc(npages_to_free * sizeof(struct page *), > - GFP_KERNEL); > + pages_to_free = kmalloc(npages_to_free * sizeof(struct page *), gfp); > > But this bug is caused by sc->gfp_mask containing some flags which are not > in GFP_KERNEL, right? Then, I think > > - pages_to_free = kmalloc(npages_to_free * sizeof(struct page *), gfp); > + pages_to_free = kmalloc(npages_to_free * sizeof(struct page *), gfp & GFP_KERNEL); > > would hide this bug. > > But I think we should use GFP_ATOMIC (or drop __GFP_WAIT flag) Well no - ttm_page_pool_free() should stop calling kmalloc altogether. Just do struct page *pages_to_free[16]; and rework the code to free 16 pages at a time. Easy. Apart from all the other things we're discussing here, it should do this because kmalloc() isn't very reliable within a shrinker. > for > two reasons when __alloc_pages_nodemask() is called from shrinker functions. > > (1) Stack usage by __alloc_pages_nodemask() is large. If we unlimitedly allow > recursive __alloc_pages_nodemask() calls, kernel stack could overflow > under extreme memory pressure. 
> > (2) Some shrinker functions are using sleepable locks which could make kswapd > sleep for unpredictable duration. If kswapd is unexpectedly blocked inside > shrinker functions and somebody is expecting that kswapd is running for > reclaiming memory, it is a memory allocation deadlock. > > Speak of ttm module, commit 22e71691fd54c637 ("drm/ttm: Use mutex_trylock() to > avoid deadlock inside shrinker functions.") prevents unlimited recursive > __alloc_pages_nodemask() calls. Yes, there are such problems. Shrinkers do all sorts of surprising things - some of the filesystem ones do disk writes! And these involve all sorts of locking and memory allocations. But they won't be directly using scan_control.gfp_mask. They may be using open-coded __GFP_NOFS for the allocations. The complicated ones pass the IO over to kernel threads and wait for them to complete, which addresses the stack consumption concerns (at least).
* [PATCH] drm/ttm: Avoid memory allocation from shrinker functions. 2014-11-12 4:38 ` Andrew Morton @ 2014-11-13 13:43 ` Tetsuo Handa 0 siblings, 0 replies; 24+ messages in thread From: Tetsuo Handa @ 2014-11-13 13:43 UTC (permalink / raw) To: akpm Cc: luke, cl, ming.lei, penberg, rientjes, mel, hannes, suokkos, airlied, bugzilla-daemon, luke-jr+linuxbugs, dri-devel, linux-mm, iamjoonsoo.kim Andrew Morton wrote: > On Wed, 12 Nov 2014 13:08:55 +0900 Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> wrote: > > > Andrew Morton wrote: > > > Poor ttm guys - this is a bit of a trap we set for them. > > > > Commit a91576d7916f6cce ("drm/ttm: Pass GFP flags in order to avoid deadlock.") > > changed to use sc->gfp_mask rather than GFP_KERNEL. > > > > - pages_to_free = kmalloc(npages_to_free * sizeof(struct page *), > > - GFP_KERNEL); > > + pages_to_free = kmalloc(npages_to_free * sizeof(struct page *), gfp); > > > > But this bug is caused by sc->gfp_mask containing some flags which are not > > in GFP_KERNEL, right? Then, I think > > > > - pages_to_free = kmalloc(npages_to_free * sizeof(struct page *), gfp); > > + pages_to_free = kmalloc(npages_to_free * sizeof(struct page *), gfp & GFP_KERNEL); > > > > would hide this bug. > > > > But I think we should use GFP_ATOMIC (or drop __GFP_WAIT flag) > > Well no - ttm_page_pool_free() should stop calling kmalloc altogether. > Just do > > struct page *pages_to_free[16]; > > and rework the code to free 16 pages at a time. Easy. Well, ttm code wants to process 512 pages at a time for performance. Memory footprint increased by 512 * sizeof(struct page *) buffer is only 4096 bytes. What about using static buffer like below? ---------- From d3cb5393c9c8099d6b37e769f78c31af1541fe8c Mon Sep 17 00:00:00 2001 From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Date: Thu, 13 Nov 2014 22:21:54 +0900 Subject: [PATCH] drm/ttm: Avoid memory allocation from shrinker functions. 
Commit a91576d7916f6cce ("drm/ttm: Pass GFP flags in order to avoid deadlock.") caused BUG_ON() due to sc->gfp_mask containing flags which are not in GFP_KERNEL. https://bugzilla.kernel.org/show_bug.cgi?id=87891 Changing from sc->gfp_mask to (sc->gfp_mask & GFP_KERNEL) would avoid the BUG_ON(), but avoiding memory allocation from shrinker function is better and reliable fix. Shrinker function is already serialized by global lock, and clean up function is called after shrinker function is unregistered. Thus, we can use static buffer when called from shrinker function and clean up function. Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: stable <stable@kernel.org> [2.6.35+] --- drivers/gpu/drm/ttm/ttm_page_alloc.c | 26 +++++++++++++++----------- drivers/gpu/drm/ttm/ttm_page_alloc_dma.c | 25 +++++++++++++++---------- 2 files changed, 30 insertions(+), 21 deletions(-) diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc.c b/drivers/gpu/drm/ttm/ttm_page_alloc.c index 09874d6..025c429 100644 --- a/drivers/gpu/drm/ttm/ttm_page_alloc.c +++ b/drivers/gpu/drm/ttm/ttm_page_alloc.c @@ -297,11 +297,12 @@ static void ttm_pool_update_free_locked(struct ttm_page_pool *pool, * * @pool: to free the pages from * @free_all: If set to true will free all pages in pool - * @gfp: GFP flags. 
+ * @use_static: Safe to use static buffer **/ static int ttm_page_pool_free(struct ttm_page_pool *pool, unsigned nr_free, - gfp_t gfp) + bool use_static) { + static struct page *static_buf[NUM_PAGES_TO_ALLOC]; unsigned long irq_flags; struct page *p; struct page **pages_to_free; @@ -311,7 +312,11 @@ static int ttm_page_pool_free(struct ttm_page_pool *pool, unsigned nr_free, if (NUM_PAGES_TO_ALLOC < nr_free) npages_to_free = NUM_PAGES_TO_ALLOC; - pages_to_free = kmalloc(npages_to_free * sizeof(struct page *), gfp); + if (use_static) + pages_to_free = static_buf; + else + pages_to_free = kmalloc(npages_to_free * sizeof(struct page *), + GFP_KERNEL); if (!pages_to_free) { pr_err("Failed to allocate memory for pool free operation\n"); return 0; @@ -374,7 +379,8 @@ restart: if (freed_pages) ttm_pages_put(pages_to_free, freed_pages); out: - kfree(pages_to_free); + if (pages_to_free != static_buf) + kfree(pages_to_free); return nr_free; } @@ -383,8 +389,6 @@ out: * * XXX: (dchinner) Deadlock warning! * - * We need to pass sc->gfp_mask to ttm_page_pool_free(). - * * This code is crying out for a shrinker per pool.... */ static unsigned long @@ -407,8 +411,8 @@ ttm_pool_shrink_scan(struct shrinker *shrink, struct shrink_control *sc) if (shrink_pages == 0) break; pool = &_manager->pools[(i + pool_offset)%NUM_POOLS]; - shrink_pages = ttm_page_pool_free(pool, nr_free, - sc->gfp_mask); + /* OK to use static buffer since global mutex is held. 
*/ + shrink_pages = ttm_page_pool_free(pool, nr_free, true); freed += nr_free - shrink_pages; } mutex_unlock(&lock); @@ -710,7 +714,7 @@ static void ttm_put_pages(struct page **pages, unsigned npages, int flags, } spin_unlock_irqrestore(&pool->lock, irq_flags); if (npages) - ttm_page_pool_free(pool, npages, GFP_KERNEL); + ttm_page_pool_free(pool, npages, false); } /* @@ -849,9 +853,9 @@ void ttm_page_alloc_fini(void) pr_info("Finalizing pool allocator\n"); ttm_pool_mm_shrink_fini(_manager); + /* OK to use static buffer since global mutex is no longer used. */ for (i = 0; i < NUM_POOLS; ++i) - ttm_page_pool_free(&_manager->pools[i], FREE_ALL_PAGES, - GFP_KERNEL); + ttm_page_pool_free(&_manager->pools[i], FREE_ALL_PAGES, true); kobject_put(&_manager->kobj); _manager = NULL; diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c index c96db43..01e1d27 100644 --- a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c +++ b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c @@ -411,11 +411,12 @@ static void ttm_dma_page_put(struct dma_pool *pool, struct dma_page *d_page) * * @pool: to free the pages from * @nr_free: If set to true will free all pages in pool - * @gfp: GFP flags. 
+ * @use_static: Safe to use static buffer **/ static unsigned ttm_dma_page_pool_free(struct dma_pool *pool, unsigned nr_free, - gfp_t gfp) + bool use_static) { + static struct page *static_buf[NUM_PAGES_TO_ALLOC]; unsigned long irq_flags; struct dma_page *dma_p, *tmp; struct page **pages_to_free; @@ -432,7 +433,11 @@ static unsigned ttm_dma_page_pool_free(struct dma_pool *pool, unsigned nr_free, npages_to_free, nr_free); } #endif - pages_to_free = kmalloc(npages_to_free * sizeof(struct page *), gfp); + if (use_static) + pages_to_free = static_buf; + else + pages_to_free = kmalloc(npages_to_free * sizeof(struct page *), + GFP_KERNEL); if (!pages_to_free) { pr_err("%s: Failed to allocate memory for pool free operation\n", @@ -502,7 +507,8 @@ restart: if (freed_pages) ttm_dma_pages_put(pool, &d_pages, pages_to_free, freed_pages); out: - kfree(pages_to_free); + if (pages_to_free != static_buf) + kfree(pages_to_free); return nr_free; } @@ -531,7 +537,8 @@ static void ttm_dma_free_pool(struct device *dev, enum pool_type type) if (pool->type != type) continue; /* Takes a spinlock.. */ - ttm_dma_page_pool_free(pool, FREE_ALL_PAGES, GFP_KERNEL); + /* OK to use static buffer since global mutex is held. */ + ttm_dma_page_pool_free(pool, FREE_ALL_PAGES, true); WARN_ON(((pool->npages_in_use + pool->npages_free) != 0)); /* This code path is called after _all_ references to the * struct device has been dropped - so nobody should be @@ -986,7 +993,7 @@ void ttm_dma_unpopulate(struct ttm_dma_tt *ttm_dma, struct device *dev) /* shrink pool if necessary (only on !is_cached pools)*/ if (npages) - ttm_dma_page_pool_free(pool, npages, GFP_KERNEL); + ttm_dma_page_pool_free(pool, npages, false); ttm->state = tt_unpopulated; } EXPORT_SYMBOL_GPL(ttm_dma_unpopulate); @@ -996,8 +1003,6 @@ EXPORT_SYMBOL_GPL(ttm_dma_unpopulate); * * XXX: (dchinner) Deadlock warning! * - * We need to pass sc->gfp_mask to ttm_dma_page_pool_free(). 
- * * I'm getting sadder as I hear more pathetical whimpers about needing per-pool * shrinkers */ @@ -1030,8 +1035,8 @@ ttm_dma_pool_shrink_scan(struct shrinker *shrink, struct shrink_control *sc) if (++idx < pool_offset) continue; nr_free = shrink_pages; - shrink_pages = ttm_dma_page_pool_free(p->pool, nr_free, - sc->gfp_mask); + /* OK to use static buffer since global mutex is held. */ + shrink_pages = ttm_dma_page_pool_free(p->pool, nr_free, true); freed += nr_free - shrink_pages; pr_debug("%s: (%s:%d) Asked to shrink %d, have %d more to go\n", -- 1.8.3.1 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 24+ messages in thread
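Editorial sketch of the pattern the patch above applies: choose between a static scratch buffer and a heap allocation based on a `use_static` argument, and free the scratch area only when it came from the heap. This is a userspace illustration, not the TTM code. `sketch_pool`, `sketch_pool_free`, `SKETCH_BATCH` and `SKETCH_POOL_MAX` are hypothetical names, and `malloc`/`free` stand in for `kmalloc`/`kfree`. As in the patch, the static buffer is only safe when callers are serialized (the patch relies on a global mutex for that).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_BATCH    64   /* hypothetical; stands in for NUM_PAGES_TO_ALLOC */
#define SKETCH_POOL_MAX 256  /* hypothetical pool capacity */

struct sketch_pool {
	void *items[SKETCH_POOL_MAX];
	unsigned count;
};

/*
 * Free up to nr_free items from the pool and return how many of the
 * requested items were NOT freed. With use_static set, no allocation
 * is made on this path; the caller must serialize such calls.
 */
static unsigned sketch_pool_free(struct sketch_pool *pool, unsigned nr_free,
				 bool use_static)
{
	static void *static_buf[SKETCH_BATCH];
	unsigned batch = nr_free < SKETCH_BATCH ? nr_free : SKETCH_BATCH;
	void **scratch;

	if (batch == 0)
		return 0;

	scratch = use_static ? static_buf : malloc(batch * sizeof(*scratch));
	if (!scratch)
		return nr_free;	/* nothing freed in this sketch */

	while (nr_free > 0 && pool->count > 0) {
		unsigned n = batch < nr_free ? batch : nr_free;
		unsigned i;

		if (n > pool->count)
			n = pool->count;

		/* Detach a batch from the pool, then release it. */
		for (i = 0; i < n; i++)
			scratch[i] = pool->items[--pool->count];
		for (i = 0; i < n; i++)
			free(scratch[i]);

		nr_free -= n;
	}

	/* Mirror of the patch: only heap-allocated scratch is freed. */
	if (scratch != static_buf)
		free(scratch);
	return nr_free;
}
```

The comparison against `static_buf` on exit keeps the two paths symmetric without carrying a second flag around, which is the same choice the patch makes before its `kfree()`.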
* Re: [Bug 87891] New: kernel BUG at mm/slab.c:2625! 2014-11-12 0:49 ` Andrew Morton 2014-11-12 0:54 ` Luke Dashjr @ 2014-11-12 1:22 ` Kirill A. Shutemov 2014-11-12 1:47 ` Kirill A. Shutemov 2014-11-12 2:17 ` Joonsoo Kim 1 sibling, 2 replies; 24+ messages in thread From: Kirill A. Shutemov @ 2014-11-12 1:22 UTC (permalink / raw) To: Andrew Morton Cc: Christoph Lameter, Ming Lei, Pekka Enberg, Joonsoo Kim, David Rientjes, Mel Gorman, Johannes Weiner, Pauli Nieminen, Dave Airlie, Tetsuo Handa, bugzilla-daemon, luke-jr+linuxbugs, dri-devel, linux-mm On Tue, Nov 11, 2014 at 04:49:13PM -0800, Andrew Morton wrote: > On Tue, 11 Nov 2014 18:36:28 -0600 (CST) Christoph Lameter <cl@linux.com> wrote: > > > On Tue, 11 Nov 2014, Andrew Morton wrote: > > > > > There's no point in doing > > > > > > #define GFP_SLAB_BUG_MASK (__GFP_DMA32|__GFP_HIGHMEM|~__GFP_BITS_MASK) > > > > > > because __GFP_DMA32|__GFP_HIGHMEM are already part of ~__GFP_BITS_MASK. > > > > ?? ~__GFP_BITS_MASK means bits 25 to 31 are set. > > > > __GFP_DMA32 is bit 2 and __GFP_HIGHMEM is bit 1. > > Ah, yes, OK. > > I suppose it's possible that __GFP_HIGHMEM was set. > > do_huge_pmd_anonymous_page > ->pte_alloc_one > ->alloc_pages(__userpte_alloc_gfp==__GFP_HIGHMEM) do_huge_pmd_anonymous_page alloc_hugepage_vma alloc_pages_vma(GFP_TRANSHUGE) GFP_TRANSHUGE contains GFP_HIGHUSER_MOVABLE, which has __GFP_HIGHMEM. -- Kirill A. Shutemov -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 24+ messages in thread
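Editorial sketch of the mask arithmetic discussed above. It uses the bit positions quoted in the thread (`__GFP_HIGHMEM` is bit 1, `__GFP_DMA32` is bit 2, and `~__GFP_BITS_MASK` covers bits 25 to 31). The `SK_` names and the 32-bit type are stand-ins chosen for this illustration, not kernel definitions.

```c
#include <stdint.h>

typedef uint32_t sketch_gfp_t;

/* Bit positions as quoted in the thread; names are hypothetical. */
#define SK_GFP_HIGHMEM    0x02u	/* bit 1 */
#define SK_GFP_DMA32      0x04u	/* bit 2 */
#define SK_GFP_BITS_SHIFT 25
#define SK_GFP_BITS_MASK  ((sketch_gfp_t)((1u << SK_GFP_BITS_SHIFT) - 1))

/* Same shape as the GFP_SLAB_BUG_MASK definition quoted above. */
#define SK_GFP_SLAB_BUG_MASK \
	(SK_GFP_DMA32 | SK_GFP_HIGHMEM | ~SK_GFP_BITS_MASK)

/* Nonzero if any bit above the defined gfp bits is set. */
static int sketch_in_high_bits(sketch_gfp_t flags)
{
	return (flags & ~SK_GFP_BITS_MASK) != 0;
}

/* Nonzero if the slab check quoted in the thread would fire. */
static int sketch_trips_slab_bug(sketch_gfp_t flags)
{
	return (flags & SK_GFP_SLAB_BUG_MASK) != 0;
}
```

With these values a mask carrying the HIGHMEM bit is not in the high bits but still trips the check, which is why the two zone bits have to be listed explicitly in the mask.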
* Re: [Bug 87891] New: kernel BUG at mm/slab.c:2625! 2014-11-12 1:22 ` [Bug 87891] New: kernel BUG at mm/slab.c:2625! Kirill A. Shutemov @ 2014-11-12 1:47 ` Kirill A. Shutemov 2014-11-12 1:56 ` Andrew Morton 2014-11-12 2:17 ` Joonsoo Kim 1 sibling, 1 reply; 24+ messages in thread From: Kirill A. Shutemov @ 2014-11-12 1:47 UTC (permalink / raw) To: Andrew Morton Cc: Christoph Lameter, Ming Lei, Pekka Enberg, Joonsoo Kim, David Rientjes, Mel Gorman, Johannes Weiner, Pauli Nieminen, Dave Airlie, Tetsuo Handa, bugzilla-daemon, luke-jr+linuxbugs, dri-devel, linux-mm On Wed, Nov 12, 2014 at 03:22:41AM +0200, Kirill A. Shutemov wrote: > On Tue, Nov 11, 2014 at 04:49:13PM -0800, Andrew Morton wrote: > > On Tue, 11 Nov 2014 18:36:28 -0600 (CST) Christoph Lameter <cl@linux.com> wrote: > > > > > On Tue, 11 Nov 2014, Andrew Morton wrote: > > > > > > > There's no point in doing > > > > > > > > #define GFP_SLAB_BUG_MASK (__GFP_DMA32|__GFP_HIGHMEM|~__GFP_BITS_MASK) > > > > > > > > because __GFP_DMA32|__GFP_HIGHMEM are already part of ~__GFP_BITS_MASK. > > > > > > ?? ~__GFP_BITS_MASK means bits 25 to 31 are set. > > > > > > __GFP_DMA32 is bit 2 and __GFP_HIGHMEM is bit 1. > > > > Ah, yes, OK. > > > > I suppose it's possible that __GFP_HIGHMEM was set. > > > > do_huge_pmd_anonymous_page > > ->pte_alloc_one > > ->alloc_pages(__userpte_alloc_gfp==__GFP_HIGHMEM) > > do_huge_pmd_anonymous_page > alloc_hugepage_vma > alloc_pages_vma(GFP_TRANSHUGE) > > GFP_TRANSHUGE contains GFP_HIGHUSER_MOVABLE, which has __GFP_HIGHMEM. Looks like it's reasonable to sanitize flags in shrink_slab() by dropping flags incompatible with slab expectation. 
Like this: diff --git a/mm/vmscan.c b/mm/vmscan.c index dcb47074ae03..eb165d29c5e5 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -369,6 +369,8 @@ unsigned long shrink_slab(struct shrink_control *shrinkctl, if (nr_pages_scanned == 0) nr_pages_scanned = SWAP_CLUSTER_MAX; + shrinkctl->gfp_mask &= ~(__GFP_DMA32 | __GFP_HIGHMEM); + if (!down_read_trylock(&shrinker_rwsem)) { /* * If we would return 0, our callers would understand that we -- Kirill A. Shutemov -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: [Bug 87891] New: kernel BUG at mm/slab.c:2625! 2014-11-12 1:47 ` Kirill A. Shutemov @ 2014-11-12 1:56 ` Andrew Morton 2014-11-12 2:07 ` Kirill A. Shutemov 0 siblings, 1 reply; 24+ messages in thread From: Andrew Morton @ 2014-11-12 1:56 UTC (permalink / raw) To: Kirill A. Shutemov Cc: Christoph Lameter, Ming Lei, Pekka Enberg, Joonsoo Kim, David Rientjes, Mel Gorman, Johannes Weiner, Pauli Nieminen, Dave Airlie, Tetsuo Handa, bugzilla-daemon, luke-jr+linuxbugs, dri-devel, linux-mm On Wed, 12 Nov 2014 03:47:03 +0200 "Kirill A. Shutemov" <kirill@shutemov.name> wrote: > On Wed, Nov 12, 2014 at 03:22:41AM +0200, Kirill A. Shutemov wrote: > > On Tue, Nov 11, 2014 at 04:49:13PM -0800, Andrew Morton wrote: > > > On Tue, 11 Nov 2014 18:36:28 -0600 (CST) Christoph Lameter <cl@linux.com> wrote: > > > > > > > On Tue, 11 Nov 2014, Andrew Morton wrote: > > > > > > > > > There's no point in doing > > > > > > > > > > #define GFP_SLAB_BUG_MASK (__GFP_DMA32|__GFP_HIGHMEM|~__GFP_BITS_MASK) > > > > > > > > > > because __GFP_DMA32|__GFP_HIGHMEM are already part of ~__GFP_BITS_MASK. > > > > > > > > ?? ~__GFP_BITS_MASK means bits 25 to 31 are set. > > > > > > > > __GFP_DMA32 is bit 2 and __GFP_HIGHMEM is bit 1. > > > > > > Ah, yes, OK. > > > > > > I suppose it's possible that __GFP_HIGHMEM was set. > > > > > > do_huge_pmd_anonymous_page > > > ->pte_alloc_one > > > ->alloc_pages(__userpte_alloc_gfp==__GFP_HIGHMEM) > > > > do_huge_pmd_anonymous_page > > alloc_hugepage_vma > > alloc_pages_vma(GFP_TRANSHUGE) > > > > GFP_TRANSHUGE contains GFP_HIGHUSER_MOVABLE, which has __GFP_HIGHMEM. > > Looks like it's reasonable to sanitize flags in shrink_slab() by dropping > flags incompatible with slab expectation. 
Like this: > > diff --git a/mm/vmscan.c b/mm/vmscan.c > index dcb47074ae03..eb165d29c5e5 100644 > --- a/mm/vmscan.c > +++ b/mm/vmscan.c > @@ -369,6 +369,8 @@ unsigned long shrink_slab(struct shrink_control *shrinkctl, > if (nr_pages_scanned == 0) > nr_pages_scanned = SWAP_CLUSTER_MAX; > > + shrinkctl->gfp_mask &= ~(__GFP_DMA32 | __GFP_HIGHMEM); > + > if (!down_read_trylock(&shrinker_rwsem)) { > /* > * If we would return 0, our callers would understand that we Well no, because nobody is supposed to be passing this gfp_mask back into a new allocation attempt anyway. It would be better to do shrinkctl->gfp_mask |= __GFP_IMMEDIATELY_GO_BUG; ? -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [Bug 87891] New: kernel BUG at mm/slab.c:2625! 2014-11-12 1:56 ` Andrew Morton @ 2014-11-12 2:07 ` Kirill A. Shutemov 0 siblings, 0 replies; 24+ messages in thread From: Kirill A. Shutemov @ 2014-11-12 2:07 UTC (permalink / raw) To: Andrew Morton Cc: Christoph Lameter, Ming Lei, Pekka Enberg, Joonsoo Kim, David Rientjes, Mel Gorman, Johannes Weiner, Pauli Nieminen, Dave Airlie, Tetsuo Handa, bugzilla-daemon, luke-jr+linuxbugs, dri-devel, linux-mm On Tue, Nov 11, 2014 at 05:56:03PM -0800, Andrew Morton wrote: > On Wed, 12 Nov 2014 03:47:03 +0200 "Kirill A. Shutemov" <kirill@shutemov.name> wrote: > > > On Wed, Nov 12, 2014 at 03:22:41AM +0200, Kirill A. Shutemov wrote: > > > On Tue, Nov 11, 2014 at 04:49:13PM -0800, Andrew Morton wrote: > > > > On Tue, 11 Nov 2014 18:36:28 -0600 (CST) Christoph Lameter <cl@linux.com> wrote: > > > > > > > > > On Tue, 11 Nov 2014, Andrew Morton wrote: > > > > > > > > > > > There's no point in doing > > > > > > > > > > > > #define GFP_SLAB_BUG_MASK (__GFP_DMA32|__GFP_HIGHMEM|~__GFP_BITS_MASK) > > > > > > > > > > > > because __GFP_DMA32|__GFP_HIGHMEM are already part of ~__GFP_BITS_MASK. > > > > > > > > > > ?? ~__GFP_BITS_MASK means bits 25 to 31 are set. > > > > > > > > > > __GFP_DMA32 is bit 2 and __GFP_HIGHMEM is bit 1. > > > > > > > > Ah, yes, OK. > > > > > > > > I suppose it's possible that __GFP_HIGHMEM was set. > > > > > > > > do_huge_pmd_anonymous_page > > > > ->pte_alloc_one > > > > ->alloc_pages(__userpte_alloc_gfp==__GFP_HIGHMEM) > > > > > > do_huge_pmd_anonymous_page > > > alloc_hugepage_vma > > > alloc_pages_vma(GFP_TRANSHUGE) > > > > > > GFP_TRANSHUGE contains GFP_HIGHUSER_MOVABLE, which has __GFP_HIGHMEM. > > > > Looks like it's reasonable to sanitize flags in shrink_slab() by dropping > > flags incompatible with slab expectation. 
Like this: > > > > diff --git a/mm/vmscan.c b/mm/vmscan.c > > index dcb47074ae03..eb165d29c5e5 100644 > > --- a/mm/vmscan.c > > +++ b/mm/vmscan.c > > @@ -369,6 +369,8 @@ unsigned long shrink_slab(struct shrink_control *shrinkctl, > > if (nr_pages_scanned == 0) > > nr_pages_scanned = SWAP_CLUSTER_MAX; > > > > + shrinkctl->gfp_mask &= ~(__GFP_DMA32 | __GFP_HIGHMEM); > > + > > if (!down_read_trylock(&shrinker_rwsem)) { > > /* > > * If we would return 0, our callers would understand that we > > Well no, because nobody is supposed to be passing this gfp_mask back > into a new allocation attempt anyway. It would be better to do > > shrinkctl->gfp_mask |= __GFP_IMMEDIATELY_GO_BUG; > > ? From my POV, the problem is that we combine what-need-to-be-freed gfp_mask with if-have-to-allocate gfp_mask: we want to respect __GFP_IO/FS on alloc, but not necessarily both if there's no restriction from the context. For shrink_slab(), __GFP_DMA32 and __GFP_HIGHMEM don't make sense in either case. __GFP_IMMEDIATELY_GO_BUG would work too, but we also need to provide macros to construct alloc-suitable mask from the given one for yes-i-really-have-to-allocate case. -- Kirill A. Shutemov ^ permalink raw reply [flat|nested] 24+ messages in thread
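Editorial sketch of one possible reading of the "construct alloc-suitable mask" idea in the message above: keep only the context bits that say what the caller may do (wait, I/O, filesystem) and drop everything that describes where the original allocation wanted its memory. No such helper is proposed by name in the thread. `sketch_shrinker_alloc_mask` and the `SK_` constants are hypothetical, and the bit values other than HIGHMEM and DMA32 are illustrative assumptions.

```c
#include <stdint.h>

typedef uint32_t sketch_gfp_t;

/* Illustrative bit values; hypothetical names, assumed layout. */
#define SK_GFP_HIGHMEM 0x02u
#define SK_GFP_DMA32   0x04u
#define SK_GFP_MOVABLE 0x08u
#define SK_GFP_WAIT    0x10u
#define SK_GFP_IO      0x40u
#define SK_GFP_FS      0x80u

/* Bits that describe the calling context rather than the target zone. */
#define SK_GFP_CONTEXT_MASK (SK_GFP_WAIT | SK_GFP_IO | SK_GFP_FS)

/*
 * Derive a mask for an allocation made inside a shrinker from the
 * mask of the allocation that triggered reclaim. Zone and mobility
 * bits are dropped; context restrictions are preserved.
 */
static sketch_gfp_t sketch_shrinker_alloc_mask(sketch_gfp_t reclaim_mask)
{
	return reclaim_mask & SK_GFP_CONTEXT_MASK;
}
```

An allow-list is used here rather than clearing two known-bad bits, so a zone or mobility modifier added later would not leak through by default. Whether a shrinker should allocate at all is the separate objection raised earlier in the thread.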
* Re: [Bug 87891] New: kernel BUG at mm/slab.c:2625! 2014-11-12 1:22 ` [Bug 87891] New: kernel BUG at mm/slab.c:2625! Kirill A. Shutemov 2014-11-12 1:47 ` Kirill A. Shutemov @ 2014-11-12 2:17 ` Joonsoo Kim 2014-11-12 2:37 ` Kirill A. Shutemov 2014-11-12 10:39 ` Mel Gorman 1 sibling, 2 replies; 24+ messages in thread From: Joonsoo Kim @ 2014-11-12 2:17 UTC (permalink / raw) To: Kirill A. Shutemov Cc: Andrew Morton, Christoph Lameter, Ming Lei, Pekka Enberg, David Rientjes, Mel Gorman, Johannes Weiner, Pauli Nieminen, Dave Airlie, Tetsuo Handa, bugzilla-daemon, luke-jr+linuxbugs, dri-devel, linux-mm On Wed, Nov 12, 2014 at 03:22:41AM +0200, Kirill A. Shutemov wrote: > On Tue, Nov 11, 2014 at 04:49:13PM -0800, Andrew Morton wrote: > > On Tue, 11 Nov 2014 18:36:28 -0600 (CST) Christoph Lameter <cl@linux.com> wrote: > > > > > On Tue, 11 Nov 2014, Andrew Morton wrote: > > > > > > > There's no point in doing > > > > > > > > #define GFP_SLAB_BUG_MASK (__GFP_DMA32|__GFP_HIGHMEM|~__GFP_BITS_MASK) > > > > > > > > because __GFP_DMA32|__GFP_HIGHMEM are already part of ~__GFP_BITS_MASK. > > > > > > ?? ~__GFP_BITS_MASK means bits 25 to 31 are set. > > > > > > __GFP_DMA32 is bit 2 and __GFP_HIGHMEM is bit 1. > > > > Ah, yes, OK. > > > > I suppose it's possible that __GFP_HIGHMEM was set. > > > > do_huge_pmd_anonymous_page > > ->pte_alloc_one > > ->alloc_pages(__userpte_alloc_gfp==__GFP_HIGHMEM) > > do_huge_pmd_anonymous_page > alloc_hugepage_vma > alloc_pages_vma(GFP_TRANSHUGE) > > GFP_TRANSHUGE contains GFP_HIGHUSER_MOVABLE, which has __GFP_HIGHMEM. Hello, Kirill. BTW, why does GFP_TRANSHUGE have MOVABLE flag despite it isn't movable? After breaking hugepage, it could be movable, but, it may prevent CMA from working correctly until break. Thanks. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [Bug 87891] New: kernel BUG at mm/slab.c:2625! 2014-11-12 2:17 ` Joonsoo Kim @ 2014-11-12 2:37 ` Kirill A. Shutemov 2014-11-12 8:21 ` Joonsoo Kim 2014-11-12 10:39 ` Mel Gorman 1 sibling, 1 reply; 24+ messages in thread From: Kirill A. Shutemov @ 2014-11-12 2:37 UTC (permalink / raw) To: Joonsoo Kim Cc: Andrew Morton, Christoph Lameter, Ming Lei, Pekka Enberg, David Rientjes, Mel Gorman, Johannes Weiner, Pauli Nieminen, Dave Airlie, Tetsuo Handa, bugzilla-daemon, luke-jr+linuxbugs, dri-devel, linux-mm On Wed, Nov 12, 2014 at 11:17:16AM +0900, Joonsoo Kim wrote: > On Wed, Nov 12, 2014 at 03:22:41AM +0200, Kirill A. Shutemov wrote: > > On Tue, Nov 11, 2014 at 04:49:13PM -0800, Andrew Morton wrote: > > > On Tue, 11 Nov 2014 18:36:28 -0600 (CST) Christoph Lameter <cl@linux.com> wrote: > > > > > > > On Tue, 11 Nov 2014, Andrew Morton wrote: > > > > > > > > > There's no point in doing > > > > > > > > > > #define GFP_SLAB_BUG_MASK (__GFP_DMA32|__GFP_HIGHMEM|~__GFP_BITS_MASK) > > > > > > > > > > because __GFP_DMA32|__GFP_HIGHMEM are already part of ~__GFP_BITS_MASK. > > > > > > > > ?? ~__GFP_BITS_MASK means bits 25 to 31 are set. > > > > > > > > __GFP_DMA32 is bit 2 and __GFP_HIGHMEM is bit 1. > > > > > > Ah, yes, OK. > > > > > > I suppose it's possible that __GFP_HIGHMEM was set. > > > > > > do_huge_pmd_anonymous_page > > > ->pte_alloc_one > > > ->alloc_pages(__userpte_alloc_gfp==__GFP_HIGHMEM) > > > > do_huge_pmd_anonymous_page > > alloc_hugepage_vma > > alloc_pages_vma(GFP_TRANSHUGE) > > > > GFP_TRANSHUGE contains GFP_HIGHUSER_MOVABLE, which has __GFP_HIGHMEM. > > Hello, Kirill. > > BTW, why does GFP_TRANSHUGE have MOVABLE flag despite it isn't > movable? After breaking hugepage, it could be movable, but, it may > prevent CMA from working correctly until break. Again, the same alloc vs. free gfp_mask: we want page allocator to move pages around to find space from THP, but resulting page is no really movable. 
I've tried to look into making THP movable: it requires quite a few infrastructure changes around rmap: try_to_unmap*(), remove_migration_pmd(), migration entries for PMDs, etc. It gets ugly pretty fast :-/ I probably need to give it a second try. No promises. -- Kirill A. Shutemov ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [Bug 87891] New: kernel BUG at mm/slab.c:2625! 2014-11-12 2:37 ` Kirill A. Shutemov @ 2014-11-12 8:21 ` Joonsoo Kim 0 siblings, 0 replies; 24+ messages in thread From: Joonsoo Kim @ 2014-11-12 8:21 UTC (permalink / raw) To: Kirill A. Shutemov Cc: Andrew Morton, Christoph Lameter, Ming Lei, Pekka Enberg, David Rientjes, Mel Gorman, Johannes Weiner, Pauli Nieminen, Dave Airlie, Tetsuo Handa, bugzilla-daemon, luke-jr+linuxbugs, dri-devel, linux-mm On Wed, Nov 12, 2014 at 04:37:46AM +0200, Kirill A. Shutemov wrote: > On Wed, Nov 12, 2014 at 11:17:16AM +0900, Joonsoo Kim wrote: > > On Wed, Nov 12, 2014 at 03:22:41AM +0200, Kirill A. Shutemov wrote: > > > On Tue, Nov 11, 2014 at 04:49:13PM -0800, Andrew Morton wrote: > > > > On Tue, 11 Nov 2014 18:36:28 -0600 (CST) Christoph Lameter <cl@linux.com> wrote: > > > > > > > > > On Tue, 11 Nov 2014, Andrew Morton wrote: > > > > > > > > > > > There's no point in doing > > > > > > > > > > > > #define GFP_SLAB_BUG_MASK (__GFP_DMA32|__GFP_HIGHMEM|~__GFP_BITS_MASK) > > > > > > > > > > > > because __GFP_DMA32|__GFP_HIGHMEM are already part of ~__GFP_BITS_MASK. > > > > > > > > > > ?? ~__GFP_BITS_MASK means bits 25 to 31 are set. > > > > > > > > > > __GFP_DMA32 is bit 2 and __GFP_HIGHMEM is bit 1. > > > > > > > > Ah, yes, OK. > > > > > > > > I suppose it's possible that __GFP_HIGHMEM was set. > > > > > > > > do_huge_pmd_anonymous_page > > > > ->pte_alloc_one > > > > ->alloc_pages(__userpte_alloc_gfp==__GFP_HIGHMEM) > > > > > > do_huge_pmd_anonymous_page > > > alloc_hugepage_vma > > > alloc_pages_vma(GFP_TRANSHUGE) > > > > > > GFP_TRANSHUGE contains GFP_HIGHUSER_MOVABLE, which has __GFP_HIGHMEM. > > > > Hello, Kirill. > > > > BTW, why does GFP_TRANSHUGE have MOVABLE flag despite it isn't > > movable? After breaking hugepage, it could be movable, but, it may > > prevent CMA from working correctly until break. > > Again, the same alloc vs. 
free gfp_mask: we want page allocator to move > pages around to find space for THP, but resulting page is not really > movable. Hmm... AFAIK, even without the MOVABLE flag, the page allocator will try to move pages to find space for a THP page. Am I missing something? > > I've tried to look into making THP movable: it requires quite a bit of > infrastructure changes around rmap: try_to_unmap*(), remove_migration_pmd(), > migration entries for PMDs, etc. It gets ugly pretty fast :-/ > I probably need to give it a second try. No promises. Good to hear. :) I think we can go another way: breaking up the hugepage. That makes it movable, and CMA would succeed. Thanks. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [Bug 87891] New: kernel BUG at mm/slab.c:2625! 2014-11-12 2:17 ` Joonsoo Kim 2014-11-12 2:37 ` Kirill A. Shutemov @ 2014-11-12 10:39 ` Mel Gorman 2014-11-13 6:37 ` Joonsoo Kim 1 sibling, 1 reply; 24+ messages in thread From: Mel Gorman @ 2014-11-12 10:39 UTC (permalink / raw) To: Joonsoo Kim Cc: Kirill A. Shutemov, Andrew Morton, Christoph Lameter, Ming Lei, Pekka Enberg, David Rientjes, Johannes Weiner, Pauli Nieminen, Dave Airlie, Tetsuo Handa, bugzilla-daemon, luke-jr+linuxbugs, dri-devel, linux-mm On Wed, Nov 12, 2014 at 11:17:16AM +0900, Joonsoo Kim wrote: > On Wed, Nov 12, 2014 at 03:22:41AM +0200, Kirill A. Shutemov wrote: > > On Tue, Nov 11, 2014 at 04:49:13PM -0800, Andrew Morton wrote: > > > On Tue, 11 Nov 2014 18:36:28 -0600 (CST) Christoph Lameter <cl@linux.com> wrote: > > > > > > > On Tue, 11 Nov 2014, Andrew Morton wrote: > > > > > > > > > There's no point in doing > > > > > > > > > > #define GFP_SLAB_BUG_MASK (__GFP_DMA32|__GFP_HIGHMEM|~__GFP_BITS_MASK) > > > > > > > > > > because __GFP_DMA32|__GFP_HIGHMEM are already part of ~__GFP_BITS_MASK. > > > > > > > > ?? ~__GFP_BITS_MASK means bits 25 to 31 are set. > > > > > > > > __GFP_DMA32 is bit 2 and __GFP_HIGHMEM is bit 1. > > > > > > Ah, yes, OK. > > > > > > I suppose it's possible that __GFP_HIGHMEM was set. > > > > > > do_huge_pmd_anonymous_page > > > ->pte_alloc_one > > > ->alloc_pages(__userpte_alloc_gfp==__GFP_HIGHMEM) > > > > do_huge_pmd_anonymous_page > > alloc_hugepage_vma > > alloc_pages_vma(GFP_TRANSHUGE) > > > > GFP_TRANSHUGE contains GFP_HIGHUSER_MOVABLE, which has __GFP_HIGHMEM. > > Hello, Kirill. > > BTW, why does GFP_TRANSHUGE have MOVABLE flag despite it isn't > movable? After breaking hugepage, it could be movable, but, it may > prevent CMA from working correctly until break. > Because THP can use the Movable zone if it's allocated. When movable was introduced it did not just mean migratable. It meant it could also be moved to swap. 
THP can be broken up and swapped, so it is tagged as movable. -- Mel Gorman SUSE Labs ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [Bug 87891] New: kernel BUG at mm/slab.c:2625! 2014-11-12 10:39 ` Mel Gorman @ 2014-11-13 6:37 ` Joonsoo Kim 0 siblings, 0 replies; 24+ messages in thread From: Joonsoo Kim @ 2014-11-13 6:37 UTC (permalink / raw) To: Mel Gorman Cc: Kirill A. Shutemov, Andrew Morton, Christoph Lameter, Ming Lei, Pekka Enberg, David Rientjes, Johannes Weiner, Pauli Nieminen, Dave Airlie, Tetsuo Handa, bugzilla-daemon, luke-jr+linuxbugs, dri-devel, linux-mm On Wed, Nov 12, 2014 at 10:39:24AM +0000, Mel Gorman wrote: > On Wed, Nov 12, 2014 at 11:17:16AM +0900, Joonsoo Kim wrote: > > On Wed, Nov 12, 2014 at 03:22:41AM +0200, Kirill A. Shutemov wrote: > > > On Tue, Nov 11, 2014 at 04:49:13PM -0800, Andrew Morton wrote: > > > > On Tue, 11 Nov 2014 18:36:28 -0600 (CST) Christoph Lameter <cl@linux.com> wrote: > > > > > > > > > On Tue, 11 Nov 2014, Andrew Morton wrote: > > > > > > > > > > > There's no point in doing > > > > > > > > > > > > #define GFP_SLAB_BUG_MASK (__GFP_DMA32|__GFP_HIGHMEM|~__GFP_BITS_MASK) > > > > > > > > > > > > because __GFP_DMA32|__GFP_HIGHMEM are already part of ~__GFP_BITS_MASK. > > > > > > > > > > ?? ~__GFP_BITS_MASK means bits 25 to 31 are set. > > > > > > > > > > __GFP_DMA32 is bit 2 and __GFP_HIGHMEM is bit 1. > > > > > > > > Ah, yes, OK. > > > > > > > > I suppose it's possible that __GFP_HIGHMEM was set. > > > > > > > > do_huge_pmd_anonymous_page > > > > ->pte_alloc_one > > > > ->alloc_pages(__userpte_alloc_gfp==__GFP_HIGHMEM) > > > > > > do_huge_pmd_anonymous_page > > > alloc_hugepage_vma > > > alloc_pages_vma(GFP_TRANSHUGE) > > > > > > GFP_TRANSHUGE contains GFP_HIGHUSER_MOVABLE, which has __GFP_HIGHMEM. > > > > Hello, Kirill. > > > > BTW, why does GFP_TRANSHUGE have MOVABLE flag despite it isn't > > movable? After breaking hugepage, it could be movable, but, it may > > prevent CMA from working correctly until break. > > > > Because THP can use the Movable zone if it's allocated. When movable was > introduced it did not just mean migratable. 
It meant it could also be > moved to swap. THP can be broken up and swapped, so it is tagged as movable. Great explanation! Thanks Mel. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [Bug 87891] New: kernel BUG at mm/slab.c:2625! 2014-11-11 23:31 ` [Bug 87891] New: kernel BUG at mm/slab.c:2625! Andrew Morton 2014-11-12 0:36 ` Christoph Lameter @ 2014-11-12 0:44 ` Joonsoo Kim 2014-11-12 0:53 ` Andrew Morton 2014-11-13 7:04 ` Vlastimil Babka 2 siblings, 1 reply; 24+ messages in thread From: Joonsoo Kim @ 2014-11-12 0:44 UTC (permalink / raw) To: Andrew Morton Cc: Ming Lei, Pekka Enberg, Christoph Lameter, David Rientjes, Mel Gorman, Johannes Weiner, Pauli Nieminen, Dave Airlie, Tetsuo Handa, bugzilla-daemon, luke-jr+linuxbugs, dri-devel, linux-mm On Tue, Nov 11, 2014 at 03:31:20PM -0800, Andrew Morton wrote: > > (switched to email. Please respond via emailed reply-to-all, not via the > bugzilla web interface). > > On Thu, 06 Nov 2014 17:28:41 +0000 bugzilla-daemon@bugzilla.kernel.org wrote: > > > https://bugzilla.kernel.org/show_bug.cgi?id=87891 > > > > Bug ID: 87891 > > Summary: kernel BUG at mm/slab.c:2625! > > Product: Memory Management > > Version: 2.5 > > Kernel Version: 3.17.2 > > Hardware: i386 > > OS: Linux > > Tree: Mainline > > Status: NEW > > Severity: blocking > > Priority: P1 > > Component: Slab Allocator > > Assignee: akpm@linux-foundation.org > > Reporter: luke-jr+linuxbugs@utopios.org > > Regression: No > > Well this is interesting. > > > > [359782.842112] kernel BUG at mm/slab.c:2625! > > ... > > [359782.843008] Call Trace: > > [359782.843017] [<ffffffff8115181f>] __kmalloc+0xdf/0x200 > > [359782.843037] [<ffffffffa0466285>] ? ttm_page_pool_free+0x35/0x180 [ttm] > > [359782.843060] [<ffffffffa0466285>] ttm_page_pool_free+0x35/0x180 [ttm] > > [359782.843084] [<ffffffffa046674e>] ttm_pool_shrink_scan+0xae/0xd0 [ttm] > > [359782.843108] [<ffffffff8111c2fb>] shrink_slab_node+0x12b/0x2e0 > > [359782.843129] [<ffffffff81127ed4>] ? fragmentation_index+0x14/0x70 > > [359782.843150] [<ffffffff8110fc3a>] ? 
zone_watermark_ok+0x1a/0x20 > > [359782.843171] [<ffffffff8111ceb8>] shrink_slab+0xc8/0x110 > > [359782.843189] [<ffffffff81120480>] do_try_to_free_pages+0x300/0x410 > > [359782.843210] [<ffffffff8112084b>] try_to_free_pages+0xbb/0x190 > > [359782.843230] [<ffffffff81113136>] __alloc_pages_nodemask+0x696/0xa90 > > [359782.843253] [<ffffffff8115810a>] do_huge_pmd_anonymous_page+0xfa/0x3f0 > > [359782.843278] [<ffffffff812dffe7>] ? debug_smp_processor_id+0x17/0x20 > > [359782.843300] [<ffffffff81118dc7>] ? __lru_cache_add+0x57/0xa0 > > [359782.843321] [<ffffffff811385ce>] handle_mm_fault+0x37e/0xdd0 > > It went pagefault > ->__alloc_pages_nodemask > ->shrink_slab > ->ttm_pool_shrink_scan > ->ttm_page_pool_free > ->kmalloc > ->cache_grow > ->BUG_ON(flags & GFP_SLAB_BUG_MASK); > > And I don't really know why - I'm not seeing anything in there which > can set a GFP flag which is outside GFP_SLAB_BUG_MASK. However I see > lots of nits. > > Core MM: > > __alloc_pages_nodemask() does > > if (unlikely(!page)) { > /* > * Runtime PM, block IO and its error handling path > * can deadlock because I/O on the device might not > * complete. > */ > gfp_mask = memalloc_noio_flags(gfp_mask); > page = __alloc_pages_slowpath(gfp_mask, order, > zonelist, high_zoneidx, nodemask, > preferred_zone, classzone_idx, migratetype); > } > > so it permanently alters the value of incoming arg gfp_mask. This > means that the following trace_mm_page_alloc() will print the wrong > value of gfp_mask, and if we later do the `goto retry_cpuset', we retry > with a possibly different gfp_mask. Isn't this a bug? > > > Also, why are we even passing a gfp_t down to the shrinkers? So they > can work out the allocation context - things like __GFP_IO, __GFP_FS, > etc? Is it even appropriate to use that mask for a new allocation > attempt within a particular shrinker? > > > ttm: > > I think it's a bad idea to be calling kmalloc() in the slab shrinker > function. 
We *know* that the system is low on memory and is trying to > free things up. Trying to allocate *more* memory at this time is > asking for trouble. ttm_page_pool_free() could easily be tweaked to > use a fixed-size local array of page*'s t avoid that allocation. Could > someone implement this please? > > > slab: > > There's no point in doing > > #define GFP_SLAB_BUG_MASK (__GFP_DMA32|__GFP_HIGHMEM|~__GFP_BITS_MASK) > > because __GFP_DMA32|__GFP_HIGHMEM are already part of ~__GFP_BITS_MASK. > What's it trying to do here? Hello, Andrew. __GFP_DMA32 and __GFP_HIGHMEM isn't included in ~__GFP_BITS_MASK. ~__GFP_BITS_MASK means all the high bits excluding all gfp bits. As you already know, HIGHMEM isn't appropriate for slab because there is no direct mapping on this memory. And, if we want memory only from the memory on DMA32 area, specific kmem_cache is needed. But, there is no interface for it, so allocation for DMA32 is also restricted here. > > And it's quite infuriating to go BUG when the code could easily warn > and fix it up. If user wants memory on HIGHMEM, it can be easily fixed by following change because all memory is compatible for HIGHMEM. But, if user wants memory on DMA32, it's not easy to fix because memory on NORMAL isn't compatible with DMA32. slab could return object from another slab page even if cache_grow() is successfully called. So BUG_ON() here looks right thing to me. We cannot know in advance whether ignoring this flag cause more serious result or not. > > And it's quite infuriating to go BUG because one of the bits was set, > but not tell us which bit it was! Agreed. Let's fix it. Thanks. > > Could the slab guys please review this? > > From: Andrew Morton <akpm@linux-foundation.org> > Subject: slab: improve checking for invalid gfp_flags > > - The code goes BUG, but doesn't tell us which bits were unexpectedly > set. Print that out. > > - The code goes BUG when it could jsut fix things up and proceed. Do that. 
> > - ~__GFP_BITS_MASK already includes __GFP_DMA32 and __GFP_HIGHMEM, so > remove those from the GFP_SLAB_BUG_MASK definition. > > Cc: Christoph Lameter <cl@linux.com> > Cc: Pekka Enberg <penberg@kernel.org> > Cc: David Rientjes <rientjes@google.com> > Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> > Signed-off-by: Andrew Morton <akpm@linux-foundation.org> > --- > > include/linux/gfp.h | 2 +- > mm/slab.c | 5 ++++- > mm/slub.c | 5 ++++- > 3 files changed, 9 insertions(+), 3 deletions(-) > > diff -puN include/linux/gfp.h~slab-improve-checking-for-invalid-gfp_flags include/linux/gfp.h > --- a/include/linux/gfp.h~slab-improve-checking-for-invalid-gfp_flags > +++ a/include/linux/gfp.h > @@ -145,7 +145,7 @@ struct vm_area_struct; > #define GFP_CONSTRAINT_MASK (__GFP_HARDWALL|__GFP_THISNODE) > > /* Do not use these with a slab allocator */ > -#define GFP_SLAB_BUG_MASK (__GFP_DMA32|__GFP_HIGHMEM|~__GFP_BITS_MASK) > +#define GFP_SLAB_BUG_MASK (~__GFP_BITS_MASK) > > /* Flag - indicates that the buffer will be suitable for DMA. Ignored on some > platforms, used as appropriate on others */ > diff -puN mm/slab.c~slab-improve-checking-for-invalid-gfp_flags mm/slab.c > --- a/mm/slab.c~slab-improve-checking-for-invalid-gfp_flags > +++ a/mm/slab.c > @@ -2590,7 +2590,10 @@ static int cache_grow(struct kmem_cache > * Be lazy and only check for valid flags here, keeping it out of the > * critical path in kmem_cache_alloc(). 
> */ > - BUG_ON(flags & GFP_SLAB_BUG_MASK); > + if (WARN_ON(flags & GFP_SLAB_BUG_MASK)) { > + pr_emerg("gfp: %u\n", flags & GFP_SLAB_BUG_MASK); > + flags &= ~GFP_SLAB_BUG_MASK; > + } > local_flags = flags & (GFP_CONSTRAINT_MASK|GFP_RECLAIM_MASK); > > /* Take the node list lock to change the colour_next on this node */ > diff -puN mm/slub.c~slab-improve-checking-for-invalid-gfp_flags mm/slub.c > --- a/mm/slub.c~slab-improve-checking-for-invalid-gfp_flags > +++ a/mm/slub.c > @@ -1377,7 +1377,10 @@ static struct page *new_slab(struct kmem > int order; > int idx; > > - BUG_ON(flags & GFP_SLAB_BUG_MASK); > + if (WARN_ON(flags & GFP_SLAB_BUG_MASK)) { > + pr_emerg("gfp: %u\n", flags & GFP_SLAB_BUG_MASK); > + flags &= ~GFP_SLAB_BUG_MASK; > + } > > page = allocate_slab(s, > flags & (GFP_RECLAIM_MASK | GFP_CONSTRAINT_MASK), node); > _ > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 24+ messages in thread
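Editorial sketch of the distinction drawn in the reply above: a stray HIGHMEM bit could be dropped with a warning, whereas a DMA32 request cannot be silently satisfied from ordinary memory. The thread itself settles on keeping the BUG, so this is only an illustration of the argument, not a proposed change. `sketch_check_slab_gfp`, the verdict enum and the `SK_` constants are hypothetical.

```c
#include <stdint.h>

typedef uint32_t sketch_gfp_t;

/* Bit positions as quoted in the thread; names are hypothetical. */
#define SK_GFP_HIGHMEM   0x02u
#define SK_GFP_DMA32     0x04u
#define SK_GFP_BITS_MASK ((1u << 25) - 1)

enum sketch_verdict {
	SKETCH_OK,		/* nothing unexpected in the mask */
	SKETCH_FIXED_UP,	/* HIGHMEM stripped, caller may continue */
	SKETCH_FATAL		/* DMA32 or undefined bits: cannot fix up */
};

/*
 * Classify a mask handed to a slab-style allocator. Only the HIGHMEM
 * bit is repaired in place; anything else unexpected is left untouched
 * and reported as fatal.
 */
static enum sketch_verdict sketch_check_slab_gfp(sketch_gfp_t *flags)
{
	if (*flags & (SK_GFP_DMA32 | ~SK_GFP_BITS_MASK))
		return SKETCH_FATAL;

	if (*flags & SK_GFP_HIGHMEM) {
		*flags &= ~SK_GFP_HIGHMEM;
		return SKETCH_FIXED_UP;
	}

	return SKETCH_OK;
}
```

The fatal cases are tested first so that a mask carrying both HIGHMEM and DMA32 is never partially repaired.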
* Re: [Bug 87891] New: kernel BUG at mm/slab.c:2625!
  2014-11-12 0:44 ` Joonsoo Kim
@ 2014-11-12 0:53 ` Andrew Morton
  2014-11-12 1:04 ` Christoph Lameter
  0 siblings, 1 reply; 24+ messages in thread
From: Andrew Morton @ 2014-11-12 0:53 UTC
To: Joonsoo Kim
Cc: Ming Lei, Pekka Enberg, Christoph Lameter, David Rientjes,
    Mel Gorman, Johannes Weiner, Pauli Nieminen, Dave Airlie,
    Tetsuo Handa, bugzilla-daemon, luke-jr+linuxbugs, dri-devel,
    linux-mm

On Wed, 12 Nov 2014 09:44:19 +0900 Joonsoo Kim <iamjoonsoo.kim@lge.com> wrote:

> >
> > And it's quite infuriating to go BUG when the code could easily warn
> > and fix it up.
>
> If user wants memory on HIGHMEM, it can be easily fixed by following
> change because all memory is compatible for HIGHMEM. But, if user wants
> memory on DMA32, it's not easy to fix because memory on NORMAL isn't
> compatible with DMA32. slab could return object from another slab page
> even if cache_grow() is successfully called. So BUG_ON() here
> looks right thing to me. We cannot know in advance whether ignoring this
> flag cause more serious result or not.

Well, attempting to fix it up and continue is nice, but we can live
with the BUG.

Not knowing which bit was set is bad.

diff -puN mm/slab.c~slab-improve-checking-for-invalid-gfp_flags mm/slab.c
--- a/mm/slab.c~slab-improve-checking-for-invalid-gfp_flags
+++ a/mm/slab.c
@@ -2590,7 +2590,10 @@ static int cache_grow(struct kmem_cache
 	 * Be lazy and only check for valid flags here, keeping it out of the
 	 * critical path in kmem_cache_alloc().
 	 */
-	BUG_ON(flags & GFP_SLAB_BUG_MASK);
+	if (unlikely(flags & GFP_SLAB_BUG_MASK)) {
+		pr_emerg("gfp: %u\n", flags & GFP_SLAB_BUG_MASK);
+		BUG();
+	}
 	local_flags = flags & (GFP_CONSTRAINT_MASK|GFP_RECLAIM_MASK);

 	/* Take the node list lock to change the colour_next on this node */
--- a/mm/slub.c~slab-improve-checking-for-invalid-gfp_flags
+++ a/mm/slub.c
@@ -1377,7 +1377,10 @@ static struct page *new_slab(struct kmem
 	int order;
 	int idx;

-	BUG_ON(flags & GFP_SLAB_BUG_MASK);
+	if (unlikely(flags & GFP_SLAB_BUG_MASK)) {
+		pr_emerg("gfp: %u\n", flags & GFP_SLAB_BUG_MASK);
+		BUG();
+	}

 	page = allocate_slab(s,
 		flags & (GFP_RECLAIM_MASK | GFP_CONSTRAINT_MASK), node);
_
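The two patch variants discussed in this subthread differ only in what happens after the offending bits are printed. A minimal userspace sketch of both, for comparison: the DEMO_* names and values are hypothetical stand-ins rather than the kernel's definitions, fprintf() stands in for pr_emerg(), and abort() stands in for BUG().

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins; the values are illustrative, not the kernel's. */
typedef unsigned int gfp_t;

#define DEMO_GFP_HIGHMEM   0x02u
#define DEMO_GFP_DMA32     0x04u
#define DEMO_GFP_WAIT      0x10u
#define DEMO_SLAB_BUG_MASK (DEMO_GFP_DMA32 | DEMO_GFP_HIGHMEM)

/* Variant 1: report the offending bits, clear them, and carry on. */
gfp_t demo_check_warn_and_fix(gfp_t flags)
{
	gfp_t bad = flags & DEMO_SLAB_BUG_MASK;

	if (bad) {
		fprintf(stderr, "gfp: %u\n", bad);
		flags &= ~DEMO_SLAB_BUG_MASK;
	}
	/* Postcondition: no forbidden bit survives the fix-up. */
	assert((flags & DEMO_SLAB_BUG_MASK) == 0);
	return flags;
}

/* Variant 2: report the offending bits, then stop. */
gfp_t demo_check_report_and_bug(gfp_t flags)
{
	gfp_t bad = flags & DEMO_SLAB_BUG_MASK;

	if (bad) {
		fprintf(stderr, "gfp: %u\n", bad);
		abort();	/* stands in for BUG() */
	}
	return flags;
}
```

Variant 1 clears every forbidden bit alike, which is the case Joonsoo objects to above: dropping a HIGHMEM request is harmless, but silently dropping a DMA32 request may hand the caller memory it cannot use. Variant 2 keeps the hard stop and only adds the diagnostic.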
* Re: [Bug 87891] New: kernel BUG at mm/slab.c:2625!
  2014-11-12 0:53 ` Andrew Morton
@ 2014-11-12 1:04 ` Christoph Lameter
  0 siblings, 0 replies; 24+ messages in thread
From: Christoph Lameter @ 2014-11-12 1:04 UTC
To: Andrew Morton
Cc: Joonsoo Kim, Ming Lei, Pekka Enberg, David Rientjes, Mel Gorman,
    Johannes Weiner, Pauli Nieminen, Dave Airlie, Tetsuo Handa,
    bugzilla-daemon, luke-jr+linuxbugs, dri-devel, linux-mm

On Tue, 11 Nov 2014, Andrew Morton wrote:

> Well, attempting to fix it up and continue is nice, but we can live
> with the BUG.
>
> Not knowing which bit was set is bad.

Could we change BUG_ON to display the value? This keeps on coming up.

If you want to add this to the slab allocators then please add to
mm/slab_common.c and refer to it from the other allocators.
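One way to picture the suggestion above is a BUG_ON-like macro that reports the nonzero value before stopping, wrapped in a single check that both allocators call. This is a userspace sketch only: DEMO_BUG_ON_VALUE, demo_format_bug() and demo_slab_check_flags() are hypothetical names, not existing kernel interfaces, and abort() stands in for BUG().

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: builds the message a value-printing BUG_ON could emit. */
int demo_format_bug(char *buf, size_t len, const char *file, int line,
		    const char *expr, unsigned long value)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "BUG at %s:%d: (%s) = 0x%lx",
			file, line, expr, value);
}

/* Hypothetical macro: like BUG_ON(), but prints the offending value first. */
#define DEMO_BUG_ON_VALUE(expr)						\
	do {								\
		unsigned long demo_v_ = (unsigned long)(expr);		\
		if (demo_v_) {						\
			char demo_buf_[160];				\
			demo_format_bug(demo_buf_, sizeof(demo_buf_),	\
					__FILE__, __LINE__, #expr,	\
					demo_v_);			\
			fprintf(stderr, "%s\n", demo_buf_);		\
			abort();	/* stands in for BUG() */	\
		}							\
	} while (0)

/* One shared check, so each allocator need not open-code the test. */
void demo_slab_check_flags(unsigned int flags, unsigned int bug_mask)
{
	DEMO_BUG_ON_VALUE(flags & bug_mask);
}
```

Keeping the check in one shared function means the message format and the stop-or-continue policy live in one place, which is the point of putting it in a common file.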
* Re: [Bug 87891] New: kernel BUG at mm/slab.c:2625!
  2014-11-11 23:31 ` [Bug 87891] New: kernel BUG at mm/slab.c:2625! Andrew Morton
  2014-11-12 0:36 ` Christoph Lameter
  2014-11-12 0:44 ` Joonsoo Kim
@ 2014-11-13 7:04 ` Vlastimil Babka
  2 siblings, 0 replies; 24+ messages in thread
From: Vlastimil Babka @ 2014-11-13 7:04 UTC
To: Andrew Morton, Ming Lei, Pekka Enberg, Joonsoo Kim,
    Christoph Lameter, David Rientjes, Mel Gorman, Johannes Weiner,
    Pauli Nieminen, Dave Airlie, Tetsuo Handa
Cc: bugzilla-daemon, luke-jr+linuxbugs, dri-devel, linux-mm

On 11/12/2014 12:31 AM, Andrew Morton wrote:
>
> (switched to email. Please respond via emailed reply-to-all, not via the
> bugzilla web interface).
>
> On Thu, 06 Nov 2014 17:28:41 +0000 bugzilla-daemon@bugzilla.kernel.org wrote:
>
>> https://bugzilla.kernel.org/show_bug.cgi?id=87891
>>
>>             Bug ID: 87891
>>            Summary: kernel BUG at mm/slab.c:2625!
>>            Product: Memory Management
>>            Version: 2.5
>>     Kernel Version: 3.17.2
>>           Hardware: i386
>>                 OS: Linux
>>               Tree: Mainline
>>             Status: NEW
>>           Severity: blocking
>>           Priority: P1
>>          Component: Slab Allocator
>>           Assignee: akpm@linux-foundation.org
>>           Reporter: luke-jr+linuxbugs@utopios.org
>>         Regression: No
>
> Well this is interesting.
>
>
>> [359782.842112] kernel BUG at mm/slab.c:2625!
>> ...
>> [359782.843008] Call Trace:
>> [359782.843017] [<ffffffff8115181f>] __kmalloc+0xdf/0x200
>> [359782.843037] [<ffffffffa0466285>] ? ttm_page_pool_free+0x35/0x180 [ttm]
>> [359782.843060] [<ffffffffa0466285>] ttm_page_pool_free+0x35/0x180 [ttm]
>> [359782.843084] [<ffffffffa046674e>] ttm_pool_shrink_scan+0xae/0xd0 [ttm]
>> [359782.843108] [<ffffffff8111c2fb>] shrink_slab_node+0x12b/0x2e0
>> [359782.843129] [<ffffffff81127ed4>] ? fragmentation_index+0x14/0x70
>> [359782.843150] [<ffffffff8110fc3a>] ? zone_watermark_ok+0x1a/0x20
>> [359782.843171] [<ffffffff8111ceb8>] shrink_slab+0xc8/0x110
>> [359782.843189] [<ffffffff81120480>] do_try_to_free_pages+0x300/0x410
>> [359782.843210] [<ffffffff8112084b>] try_to_free_pages+0xbb/0x190
>> [359782.843230] [<ffffffff81113136>] __alloc_pages_nodemask+0x696/0xa90
>> [359782.843253] [<ffffffff8115810a>] do_huge_pmd_anonymous_page+0xfa/0x3f0
>> [359782.843278] [<ffffffff812dffe7>] ? debug_smp_processor_id+0x17/0x20
>> [359782.843300] [<ffffffff81118dc7>] ? __lru_cache_add+0x57/0xa0
>> [359782.843321] [<ffffffff811385ce>] handle_mm_fault+0x37e/0xdd0
>
> It went pagefault
> 	->__alloc_pages_nodemask
> 	  ->shrink_slab
> 	    ->ttm_pool_shrink_scan
> 	      ->ttm_page_pool_free
> 	        ->kmalloc
> 	          ->cache_grow
> 	            ->BUG_ON(flags & GFP_SLAB_BUG_MASK);
>
> And I don't really know why - I'm not seeing anything in there which
> can set a GFP flag which is outside GFP_SLAB_BUG_MASK. However I see
> lots of nits.
>
> Core MM:
>
> __alloc_pages_nodemask() does
>
> 	if (unlikely(!page)) {
> 		/*
> 		 * Runtime PM, block IO and its error handling path
> 		 * can deadlock because I/O on the device might not
> 		 * complete.
> 		 */
> 		gfp_mask = memalloc_noio_flags(gfp_mask);
> 		page = __alloc_pages_slowpath(gfp_mask, order,
> 				zonelist, high_zoneidx, nodemask,
> 				preferred_zone, classzone_idx, migratetype);
> 	}
>
> so it permanently alters the value of incoming arg gfp_mask. This
> means that the following trace_mm_page_alloc() will print the wrong
> value of gfp_mask, and if we later do the `goto retry_cpuset', we retry
> with a possibly different gfp_mask. Isn't this a bug?

I think so. I noticed and fixed it in the RFC about reducing alloc_pages*
parameters [1], but it's buried in patch 2/4. Guess I should have made it
a separate non-RFC patch. Will do soon hopefully.

Vlastimil

[1] https://lkml.org/lkml/2014/8/6/249
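The gfp_mask problem confirmed above comes down to overwriting a function argument that later code still needs. A minimal userspace sketch of the two patterns follows. It is not the fix from the referenced RFC, whose details are not shown in this thread. demo_noio_flags() is a simplified, hypothetical stand-in for memalloc_noio_flags(), and the DEMO_* values are illustrative.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; the values are illustrative, not the kernel's. */
typedef unsigned int gfp_t;

#define DEMO_GFP_IO 0x40u
#define DEMO_GFP_FS 0x80u

/* Simplified stand-in for memalloc_noio_flags(): drop IO in a NOIO context. */
gfp_t demo_noio_flags(gfp_t flags, bool noio)
{
	return noio ? (flags & ~DEMO_GFP_IO) : flags;
}

/*
 * Pattern from the quoted snippet: the argument itself is overwritten.
 * Returns the mask that later code (tracepoint, retry) would then see.
 */
gfp_t demo_mask_seen_after_overwrite(gfp_t gfp_mask, bool noio)
{
	gfp_mask = demo_noio_flags(gfp_mask, noio);	/* caller's value lost */
	/* ... the slow-path allocation would use gfp_mask here ... */
	return gfp_mask;
}

/*
 * Alternative: keep the argument intact and derive a separate mask for
 * the slow path only.
 */
gfp_t demo_mask_seen_with_local_copy(gfp_t gfp_mask, bool noio)
{
	gfp_t alloc_mask = demo_noio_flags(gfp_mask, noio);

	/* Stripping can only remove bits, never add them. */
	assert((alloc_mask & ~gfp_mask) == 0);
	/* ... the slow-path allocation would use alloc_mask here ... */
	return gfp_mask;
}
```

With the local copy, a later tracepoint or retry still sees the mask the caller passed in, while the slow path gets the restricted one.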
end of thread, other threads:[~2014-11-13 13:44 UTC | newest]

Thread overview: 24+ messages
[not found] <bug-87891-27@https.bugzilla.kernel.org/>
2014-11-11 23:31 ` [Bug 87891] New: kernel BUG at mm/slab.c:2625! Andrew Morton
2014-11-12 0:36 ` Christoph Lameter
2014-11-12 0:49 ` Andrew Morton
2014-11-12 0:54 ` Luke Dashjr
2014-11-12 1:02 ` Andrew Morton
2014-11-12 1:22 ` Joonsoo Kim
2014-11-12 1:44 ` Andrew Morton
2014-11-12 2:13 ` Joonsoo Kim
2014-11-12 4:08 ` Tetsuo Handa
2014-11-12 4:38 ` Andrew Morton
2014-11-13 13:43 ` [PATCH] drm/ttm: Avoid memory allocation from shrinker functions Tetsuo Handa
2014-11-12 1:22 ` [Bug 87891] New: kernel BUG at mm/slab.c:2625! Kirill A. Shutemov
2014-11-12 1:47 ` Kirill A. Shutemov
2014-11-12 1:56 ` Andrew Morton
2014-11-12 2:07 ` Kirill A. Shutemov
2014-11-12 2:17 ` Joonsoo Kim
2014-11-12 2:37 ` Kirill A. Shutemov
2014-11-12 8:21 ` Joonsoo Kim
2014-11-12 10:39 ` Mel Gorman
2014-11-13 6:37 ` Joonsoo Kim
2014-11-12 0:44 ` Joonsoo Kim
2014-11-12 0:53 ` Andrew Morton
2014-11-12 1:04 ` Christoph Lameter
2014-11-13 7:04 ` Vlastimil Babka