* [RFC] Bogus zone->watermark[WMARK_MIN] for big systems
@ 2015-02-17 20:33 Dave Hansen
From: Dave Hansen @ 2015-02-17 20:33 UTC (permalink / raw)
  To: Linux-MM, LKML

[-- Attachment #1: Type: text/plain, Size: 1198 bytes --]

I've got a 2TB 8-node system (256GB per NUMA node) that's behaving a bit
strangely (it OOMs with gigabytes of memory still free).

Its watermarks look wonky, with a min watermark of 0 pages for DMA and
only 11 pages for DMA32:

> Node 0 DMA    free:7428kB    min:0kB    low:0kB    high:0kB    ...
> Node 0 DMA32  free:1024084kB min:44kB   low:52kB   high:64kB   ... present:1941936kB   managed:1862456kB
> Node 0 Normal free:4808kB    min:6348kB low:7932kB high:9520kB ... present:266338304kB managed:262138972kB

This looks to be caused by us distributing the min_free_kbytes value
across the zones in proportion to their size. With such a huge size
imbalance (16MB zone vs 2TB system), the DMA zone gets 1/131072th of
the default min_free_kbytes, which ends up <1 page.
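
To make the arithmetic concrete, here is a minimal userspace sketch of the
proportional split. It is not kernel code: the helper names are made up for
illustration, 4KB pages are assumed, and the 65536kB value used in the note
below is an assumed min_free_kbytes, not one taken from this system, so the
results will not match the log above exactly.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for illustration: 4KB pages (PAGE_SHIFT = 12). */
#define SKETCH_PAGE_SHIFT 12

/*
 * Convert kilobytes to pages, the same shift the kernel applies to
 * min_free_kbytes: kb >> (PAGE_SHIFT - 10).
 */
static uint64_t kb_to_pages(uint64_t kb)
{
	return kb >> (SKETCH_PAGE_SHIFT - 10);
}

/*
 * A zone's proportional share of pages_min.  Integer division
 * truncates, so a small enough zone gets a share of 0.
 * (Hypothetical helper name, not a kernel function.)
 */
static uint64_t zone_share(uint64_t pages_min, uint64_t managed_pages,
			   uint64_t lowmem_pages)
{
	assert(lowmem_pages != 0);
	return pages_min * managed_pages / lowmem_pages;
}
```

With an assumed min_free_kbytes of 65536, kb_to_pages() gives 16384 pages.
A 16MB zone (4096 pages) on a 2TB system (536870912 pages) then gets
zone_share(16384, 4096, 536870912) = 0 pages, while the DMA32 zone above
(465614 managed pages) gets 14.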

Should we be setting up some absolute floors on the watermarks, like the
attached patch?
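
As a rough illustration of what the floor in the attached patch would do,
here is a userspace sketch, again with hypothetical helper names. It takes
the proportional minimum as an input and applies the patch's
managed_pages / 256 floor to it. The page counts in the note below are the
managed: values from the log above, assuming 4KB pages.

```c
#include <stdint.h>

/* Floor from the patch: 1/256th of the zone's own managed pages. */
static uint64_t zone_floor(uint64_t managed_pages)
{
	return managed_pages / 256;
}

/*
 * Take the larger of the proportional minimum and the per-zone floor,
 * like the max() the patch adds for WMARK_MIN.
 * (Hypothetical helper name, not a kernel function.)
 */
static uint64_t apply_floor(uint64_t proportional_min, uint64_t managed_pages)
{
	uint64_t floor = zone_floor(managed_pages);

	return proportional_min > floor ? proportional_min : floor;
}
```

For a 16MB DMA zone (4096 pages) with a proportional minimum of 0,
apply_floor(0, 4096) gives 16 pages. For DMA32 (465614 pages, 11 pages
today) it gives 1818 pages. Note that for the Normal zone (65534743 pages,
1587 pages today) the floor also wins and gives 255995 pages, roughly 1GB,
so 1/256 may be too aggressive a divisor for the large zones.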

BTW, it seems to be this code:

> static void __setup_per_zone_wmarks(void)
> {
>         unsigned long pages_min = min_free_kbytes >> (PAGE_SHIFT - 10);
> ...
>         for_each_zone(zone) {
>                 u64 tmp;
> 
>                 spin_lock_irqsave(&zone->lock, flags);
>                 tmp = (u64)pages_min * zone->managed_pages;
>                 do_div(tmp, lowmem_pages);


[-- Attachment #2: mm-absolute-floors-for-watermarks.patch --]
[-- Type: text/x-patch, Size: 1170 bytes --]



---

 b/mm/page_alloc.c |   11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff -puN mm/page_alloc.c~mm-absolute-floors-for-watermarks mm/page_alloc.c
--- a/mm/page_alloc.c~mm-absolute-floors-for-watermarks	2015-02-17 11:19:48.470054562 -0800
+++ b/mm/page_alloc.c	2015-02-17 11:26:48.164983632 -0800
@@ -5739,6 +5739,14 @@ static void __setup_per_zone_wmarks(void
 	}
 
 	for_each_zone(zone) {
+		/*
+		 * For very small zones (think 16MB ZONE_DMA on a 4TB system),
+		 * proportionally distributing pages_min can lead to
+		 * watermarks of 0.  Give it an absolute floor so we always
+		 * have at least a minimal watermark based on the size of the
+		 * *zone*, not the system.
+		 */
+		unsigned long absolute_min = zone->managed_pages / 256;
 		u64 tmp;
 
 		spin_lock_irqsave(&zone->lock, flags);
@@ -5766,7 +5774,8 @@ static void __setup_per_zone_wmarks(void
 			 */
 			zone->watermark[WMARK_MIN] = tmp;
 		}
-
+		zone->watermark[WMARK_MIN]  = max(zone->watermark[WMARK_MIN],
+						  absolute_min);
 		zone->watermark[WMARK_LOW]  = min_wmark_pages(zone) + (tmp >> 2);
 		zone->watermark[WMARK_HIGH] = min_wmark_pages(zone) + (tmp >> 1);
 
_
