diff for duplicates of <545A419C.3090900@suse.cz> diff --git a/a/1.txt b/N1/1.txt index a9646bf..5f9f96a 100644 --- a/a/1.txt +++ b/N1/1.txt @@ -42,3 +42,73 @@ But I noticed a possible issue that could lead to your problem. Can you please try the following patch? --------8<------- +>From fe9c963cc665cdab50abb41f3babb5b19d08fab1 Mon Sep 17 00:00:00 2001 +From: Vlastimil Babka <vbabka@suse.cz> +Date: Wed, 5 Nov 2014 14:19:18 +0100 +Subject: [PATCH] mm, compaction: do not reset deferred compaction + optimistically + +In try_to_compact_pages() we reset deferred compaction for a zone where we +think compaction has succeeded. Although this action does not reset the +counters affecting deferred compaction period, just bumping the deferred order +means that another compaction attempt will be able to pass the check in +compaction_deferred() and proceed with compaction. + +This is a problem when try_to_compact_pages() thinks compaction was successful +just because the watermark check is missing proper classzone_idx parameter, +but then the allocation attempt itself will fail due to its watermark check +having the proper value. Although __alloc_pages_direct_compact() will re-defer +compaction in such case, this happens only in the case of sync compaction. +Async compaction will leave the zone open for another compaction attempt which +may reset the deferred order again. This could possibly explain what +P. Christeas reported - a system where many processes include the following +backtrace: + + [<ffffffff813b1025>] preempt_schedule_irq+0x3c/0x59 + [<ffffffff813b4810>] retint_kernel+0x20/0x30 + [<ffffffff810d7481>] ? __zone_watermark_ok+0x77/0x85 + [<ffffffff810d8256>] zone_watermark_ok+0x1a/0x1c + [<ffffffff810eee56>] compact_zone+0x215/0x4b2 + [<ffffffff810ef13f>] compact_zone_order+0x4c/0x5f + [<ffffffff810ef2fe>] try_to_compact_pages+0xc4/0x1e8 + [<ffffffff813ad7f8>] __alloc_pages_direct_compact+0x61/0x1bf + [<ffffffff810da299>] __alloc_pages_nodemask+0x409/0x799 + [<ffffffff8110d3fd>] new_slab+0x5f/0x21c + +The issue has been made visible by commit 53853e2d2bfb ("mm, compaction: defer +each zone individually instead of preferred zone"), since before the commit, +deferred compaction for fallback zones (where classzone_idx matters) was not +considered separately. + +Although work is underway to fix the underlying zone watermark check mismatch, +this patch fixes the immediate problem by removing the optimistic defer reset +completely. Its usefulness is questionable anyway, since if the allocation +really succeeds, a full defer reset (including the period counters) follows. + +Fixes: 53853e2d2bfb748a8b5aa2fd1de15699266865e0 +Reported-by: P. Christeas <xrg@linux.gr> +Signed-off-by: Vlastimil Babka <vbabka@suse.cz> +--- + mm/compaction.c | 7 ------- + 1 file changed, 7 deletions(-) + +diff --git a/mm/compaction.c b/mm/compaction.c +index ec74cf0..f0335f9 100644 +--- a/mm/compaction.c ++++ b/mm/compaction.c +@@ -1325,13 +1325,6 @@ unsigned long try_to_compact_pages(struct zonelist *zonelist, + alloc_flags)) { + *candidate_zone = zone; + /* +- * We think the allocation will succeed in this zone, +- * but it is not certain, hence the false. The caller +- * will repeat this with true if allocation indeed +- * succeeds in this zone. +- */ +- compaction_defer_reset(zone, order, false); +- /* + * It is possible that async compaction aborted due to + * need_resched() and the watermarks were ok thanks to + * somebody else freeing memory. The allocation can +-- +2.1.2 diff --git a/a/content_digest b/N1/content_digest index 006db74..0232154 100644 --- a/a/content_digest +++ b/N1/content_digest @@ -53,6 +53,76 @@ "But I noticed a possible issue that could lead to your problem.\n" "Can you please try the following patch?\n" "\n" - --------8<------- + "--------8<-------\n" + ">From fe9c963cc665cdab50abb41f3babb5b19d08fab1 Mon Sep 17 00:00:00 2001\n" + "From: Vlastimil Babka <vbabka@suse.cz>\n" + "Date: Wed, 5 Nov 2014 14:19:18 +0100\n" + "Subject: [PATCH] mm, compaction: do not reset deferred compaction\n" + " optimistically\n" + "\n" + "In try_to_compact_pages() we reset deferred compaction for a zone where we\n" + "think compaction has succeeded. Although this action does not reset the\n" + "counters affecting deferred compaction period, just bumping the deferred order\n" + "means that another compaction attempt will be able to pass the check in\n" + "compaction_deferred() and proceed with compaction.\n" + "\n" + "This is a problem when try_to_compact_pages() thinks compaction was successful\n" + "just because the watermark check is missing proper classzone_idx parameter,\n" + "but then the allocation attempt itself will fail due to its watermark check\n" + "having the proper value. Although __alloc_pages_direct_compact() will re-defer\n" + "compaction in such case, this happens only in the case of sync compaction.\n" + "Async compaction will leave the zone open for another compaction attempt which\n" + "may reset the deferred order again. This could possibly explain what\n" + "P. Christeas reported - a system where many processes include the following\n" + "backtrace:\n" + "\n" + " [<ffffffff813b1025>] preempt_schedule_irq+0x3c/0x59\n" + " [<ffffffff813b4810>] retint_kernel+0x20/0x30\n" + " [<ffffffff810d7481>] ? __zone_watermark_ok+0x77/0x85\n" + " [<ffffffff810d8256>] zone_watermark_ok+0x1a/0x1c\n" + " [<ffffffff810eee56>] compact_zone+0x215/0x4b2\n" + " [<ffffffff810ef13f>] compact_zone_order+0x4c/0x5f\n" + " [<ffffffff810ef2fe>] try_to_compact_pages+0xc4/0x1e8\n" + " [<ffffffff813ad7f8>] __alloc_pages_direct_compact+0x61/0x1bf\n" + " [<ffffffff810da299>] __alloc_pages_nodemask+0x409/0x799\n" + " [<ffffffff8110d3fd>] new_slab+0x5f/0x21c\n" + "\n" + "The issue has been made visible by commit 53853e2d2bfb (\"mm, compaction: defer\n" + "each zone individually instead of preferred zone\"), since before the commit,\n" + "deferred compaction for fallback zones (where classzone_idx matters) was not\n" + "considered separately.\n" + "\n" + "Although work is underway to fix the underlying zone watermark check mismatch,\n" + "this patch fixes the immediate problem by removing the optimistic defer reset\n" + "completely. Its usefulness is questionable anyway, since if the allocation\n" + "really succeeds, a full defer reset (including the period counters) follows.\n" + "\n" + "Fixes: 53853e2d2bfb748a8b5aa2fd1de15699266865e0\n" + "Reported-by: P. Christeas <xrg@linux.gr>\n" + "Signed-off-by: Vlastimil Babka <vbabka@suse.cz>\n" + "---\n" + " mm/compaction.c | 7 -------\n" + " 1 file changed, 7 deletions(-)\n" + "\n" + "diff --git a/mm/compaction.c b/mm/compaction.c\n" + "index ec74cf0..f0335f9 100644\n" + "--- a/mm/compaction.c\n" + "+++ b/mm/compaction.c\n" + "@@ -1325,13 +1325,6 @@ unsigned long try_to_compact_pages(struct zonelist *zonelist,\n" + " \t\t\t\t alloc_flags)) {\n" + " \t\t\t*candidate_zone = zone;\n" + " \t\t\t/*\n" + "-\t\t\t * We think the allocation will succeed in this zone,\n" + "-\t\t\t * but it is not certain, hence the false. The caller\n" + "-\t\t\t * will repeat this with true if allocation indeed\n" + "-\t\t\t * succeeds in this zone.\n" + "-\t\t\t */\n" + "-\t\t\tcompaction_defer_reset(zone, order, false);\n" + "-\t\t\t/*\n" + " \t\t\t * It is possible that async compaction aborted due to\n" + " \t\t\t * need_resched() and the watermarks were ok thanks to\n" + " \t\t\t * somebody else freeing memory. The allocation can\n" + "-- \n" + 2.1.2 -90518147a7aec8383d7e6adaecb4394d1566682525c81b567a84b5c87578fafb +29d95f8f19a2df9cb74da5157dbc5ca2e6649a05ba939dde3ef48f498f2972f3
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.