* [RFC] Bogus zone->watermark[WMARK_MIN] for big systems
@ 2015-02-17 20:33 Dave Hansen
From: Dave Hansen @ 2015-02-17 20:33 UTC
To: Linux-MM, LKML
[-- Attachment #1: Type: text/plain, Size: 1198 bytes --]
I've got a 2TB 8-node system (256GB per NUMA node) that's behaving a bit
strangely (OOMs with gigabytes of free memory).
Its watermarks look wonky, with a min watermark of 0 pages for DMA and
only 11 pages for DMA32:
> Node 0 DMA free:7428kB min:0kB low:0kB high:0kB ...
> Node 0 DMA32 free:1024084kB min:44kB low:52kB high:64kB ... present:1941936kB managed:1862456kB
> Node 0 Normal free:4808kB min:6348kB low:7932kB high:9520kB ... present:266338304kB managed:262138972kB
This looks to be caused by us distributing min_free_kbytes across the
zones in proportion to their size. With such a huge size imbalance
(16MB zone vs 2TB system), the zone's 1/131072 share of the default
min_free_kbytes ends up <1 page.
Should we be setting up some absolute floors on the watermarks, like the
attached patch?
BTW, it seems to be this code:
> static void __setup_per_zone_wmarks(void)
> {
> unsigned long pages_min = min_free_kbytes >> (PAGE_SHIFT - 10);
...
> for_each_zone(zone) {
> u64 tmp;
>
> spin_lock_irqsave(&zone->lock, flags);
> tmp = (u64)pages_min * zone->managed_pages;
> do_div(tmp, lowmem_pages);
[-- Attachment #2: mm-absolute-floors-for-watermarks.patch --]
[-- Type: text/x-patch, Size: 1170 bytes --]
---
b/mm/page_alloc.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
diff -puN mm/page_alloc.c~mm-absolute-floors-for-watermarks mm/page_alloc.c
--- a/mm/page_alloc.c~mm-absolute-floors-for-watermarks 2015-02-17 11:19:48.470054562 -0800
+++ b/mm/page_alloc.c 2015-02-17 11:26:48.164983632 -0800
@@ -5739,6 +5739,14 @@ static void __setup_per_zone_wmarks(void
}
for_each_zone(zone) {
+ /*
+ * For very small zones (think 16MB ZONE_DMA on a 4TB system),
+ * proportionally distributing pages_min can lead to
+ * watermarks of 0. Give it an absolute floor so we always
+ * have at least a minimal watermark based on the size of the
+ * *zone*, not the system.
+ */
+ unsigned long absolute_min = zone->managed_pages / 256;
u64 tmp;
spin_lock_irqsave(&zone->lock, flags);
@@ -5766,7 +5774,8 @@ static void __setup_per_zone_wmarks(void
*/
zone->watermark[WMARK_MIN] = tmp;
}
-
+ zone->watermark[WMARK_MIN] = max(zone->watermark[WMARK_MIN],
+ absolute_min);
zone->watermark[WMARK_LOW] = min_wmark_pages(zone) + (tmp >> 2);
zone->watermark[WMARK_HIGH] = min_wmark_pages(zone) + (tmp >> 1);
_