From: Con Kolivas <kernel@kolivas.org>
To: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
Cc: linux-kernel@vger.kernel.org, akpm@osdl.org
Subject: Re: swapping and the value of /proc/sys/vm/swappiness
Date: Fri, 17 Sep 2004 10:22:54 +1000
Message-ID: <414A2E5E.6060704@kolivas.org>
In-Reply-To: <20040916185019.GC11241@logos.cnet>
Marcelo Tosatti wrote:
> Con!
>
> Spent some time reading your patch...
Great!
> Well, if "distress" is getting higher (with a similar workload/pressure),
> that's because the VM is having a harder time freeing pages (priority increases,
> distress increases).
>
> You say "distress is getting higher in later kernels". Can you expand
> on that? How did you find this out, and can you be more specific
> about "later kernels"?
By "later kernels" I mean 2.6.8 and later; by earlier kernels I mean anything prior to 2.6.8.
I'm still referring to the "hard_swappiness" patch that Florin was using
to fix his problem; that patch is what showed that distress had increased.
I'm sorry if my newer patch confuses the issue. hard_swappiness effectively changed this:
-distress = 100 >> zone->prev_priority;
-mapped_ratio = (sc->nr_mapped * 100) / total_memory;
-swap_tendency = mapped_ratio / 2 + distress + vm_swappiness;
-if (swap_tendency >= 100)
- reclaim_mapped = 1;
into this:
+mapped_ratio = (sc->nr_mapped * 100) / total_memory;
+swap_tendency = mapped_ratio / 2 + vm_swappiness;
+if (swap_tendency >= 100)
+ reclaim_mapped = 1;
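To see the two decisions side by side, here is a minimal user-space sketch of the arithmetic in the snippets above. The struct reclaim_inputs type and both function names are hypothetical, all counts are in pages, and DEF_PRIORITY is assumed to be 12 as in 2.6; this is not the kernel's code, only the formula from the diff.

```c
#include <assert.h>

#define DEF_PRIORITY 12 /* assumed: the 2.6 starting (gentlest) scan priority */

/* Hypothetical stand-ins for what refill_inactive_zone() sees (in pages). */
struct reclaim_inputs {
	unsigned long nr_mapped;    /* like sc->nr_mapped */
	unsigned long total_memory; /* total pages */
	int prev_priority;          /* like zone->prev_priority, 0..DEF_PRIORITY */
	int vm_swappiness;          /* /proc/sys/vm/swappiness, 0..100 */
};

/* 2.6.8-style decision: distress grows as prev_priority falls. */
static int reclaim_mapped_stock(const struct reclaim_inputs *in)
{
	assert(in->total_memory > 0);
	assert(in->prev_priority >= 0 && in->prev_priority <= DEF_PRIORITY);

	int distress = 100 >> in->prev_priority;
	int mapped_ratio = (int)((in->nr_mapped * 100) / in->total_memory);
	int swap_tendency = mapped_ratio / 2 + distress + in->vm_swappiness;

	return swap_tendency >= 100;
}

/* hard_swappiness-style decision: the distress term is dropped entirely. */
static int reclaim_mapped_hard(const struct reclaim_inputs *in)
{
	assert(in->total_memory > 0);

	int mapped_ratio = (int)((in->nr_mapped * 100) / in->total_memory);
	int swap_tendency = mapped_ratio / 2 + in->vm_swappiness;

	return swap_tendency >= 100;
}
```

For example, with nr_mapped = 40, total_memory = 100, prev_priority = 2 and vm_swappiness = 60, reclaim_mapped_stock() returns 1 (20 + 25 + 60 = 105) while reclaim_mapped_hard() returns 0 (20 + 60 = 80).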
This made swap_tendency dependent _only_ on mapped_ratio (plus the fixed
vm_swappiness). Now if you load up the same desktop and applications, your
mapped_ratio will be virtually identical regardless of the kernel. If you
then copy a large file or convert a large video file, etc., the mapped ratio
will be unchanged. Therefore, if swapping increased with this workload in
2.6.8 and later kernels but did _not_ increase with hard_swappiness, it
must be the "distress" value, which depends entirely on
zone->prev_priority. Does that make my conclusion clearer?
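The argument can be made concrete: hold mapped_ratio and vm_swappiness fixed, and the stock decision depends only on how far prev_priority has fallen. The sketch below (a hypothetical helper, assuming DEF_PRIORITY is 12) returns the highest, i.e. least pressured, priority at which the 2.6.8-style formula starts reclaiming mapped pages.

```c
#include <assert.h>

#define DEF_PRIORITY 12 /* assumed: the 2.6 starting (gentlest) scan priority */

/* Highest prev_priority at which mapped_ratio / 2 + distress + swappiness
 * reaches 100, for fixed mapped_ratio and vm_swappiness. */
static int first_swapping_priority(int mapped_ratio, int vm_swappiness)
{
	assert(mapped_ratio >= 0 && mapped_ratio <= 100);
	assert(vm_swappiness >= 0 && vm_swappiness <= 100);

	for (int prio = DEF_PRIORITY; prio > 0; prio--) {
		int distress = 100 >> prio;

		if (mapped_ratio / 2 + distress + vm_swappiness >= 100)
			return prio;
	}
	/* At priority 0 distress is 100, so swap_tendency always reaches 100. */
	return 0;
}
```

first_swapping_priority(40, 60) returns 2, while first_swapping_priority(40, 80) returns 12: with the same mapped_ratio, only the priority (or the tunable) moves the decision.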
Below here you're referring to my mapped_watermark patch so I'll address
that separately to avoid confusion.
> I see you add a "z->nr_unmapped" watermark a bit above "z->pages_high",
> and use that to set "pgdat->mapped_nrpages" to what needs to be freed
> so z->free_pages reaches "z->nr_unmapped".
>
> And then you use that per-pgdat "mapped_nrpages" count to avoid:
>
> - moving mapped pages to inactive list (wasting the swappiness algorithm)
> - swapping out pages at shrink_list
>
> Those two only happen when pgdat->mapped_nrpages is zero, which
> becomes true when we go below pages_low.
>
> To summarize, deactivation/swapout of mapped pages only happens when we
> go below any zone's pages_low.
>
> Correct?
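Marcelo's summary of the mapped_watermark patch can be sketched as follows. The struct zone_wm type and the function are hypothetical (not the kernel's struct zone or the patch's code); they only model the described rule that the per-pgdat target is the shortfall to nr_unmapped across zones, forced to 0 once any zone falls below pages_low.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-zone counters, in pages. */
struct zone_wm {
	unsigned long free_pages;
	unsigned long pages_low;
	unsigned long pages_high;
	unsigned long nr_unmapped; /* extra watermark, a bit above pages_high */
};

/* Pages to free (from unmapped memory) so every zone reaches nr_unmapped.
 * Returns 0 as soon as any zone is below pages_low; in the scheme as
 * described, that is when mapped pages may be deactivated/swapped again.
 * Note it is also 0 when every zone already sits at nr_unmapped. */
static unsigned long mapped_nrpages(const struct zone_wm *zones, size_t nr_zones)
{
	unsigned long shortfall = 0;

	for (size_t i = 0; i < nr_zones; i++) {
		const struct zone_wm *z = &zones[i];

		assert(z->pages_low <= z->pages_high &&
		       z->pages_high <= z->nr_unmapped);
		if (z->free_pages < z->pages_low)
			return 0; /* normal reclaim, mapped pages included */
		if (z->free_pages < z->nr_unmapped)
			shortfall += z->nr_unmapped - z->free_pages;
	}
	return shortfall;
}
```

For two zones with pages_low 200, pages_high 300 and nr_unmapped 400 (made-up values), free counts of 350 and 390 give a target of 60 pages; if the second zone drops to 150, the result is 0.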
Yes, apart from one big caveat: scanning is expensive, so it only scans
at the lowest priority (DEF_PRIORITY, the least aggressive scan). If it
fails to release enough memory, it simply returns quietly. This means that
if VM pressure is hard enough and occurs frequently or fast enough, free
memory will still drop below pages_high even if the watermarks have not
been re-achieved. Then the normal algorithm will take over.
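The difference between the light pass and normal reclaim can be sketched like this. Everything here is hypothetical: reclaim_fn stands in for the real shrinking code, toy_reclaim is a made-up model in which each step down in priority doubles what a scan can free, and DEF_PRIORITY is assumed to be 12.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEF_PRIORITY 12 /* assumed 2.6 value; a larger number means a gentler scan */

/* Hypothetical hook: try to free up to `want` pages at `priority`,
 * return how many were actually freed. */
typedef unsigned long (*reclaim_fn)(int priority, unsigned long want);

/* Toy model for illustration only: 16 pages at DEF_PRIORITY, doubling
 * with each step toward priority 0. */
static unsigned long toy_reclaim(int priority, unsigned long want)
{
	assert(priority >= 0 && priority <= DEF_PRIORITY);
	unsigned long can = 16UL << (DEF_PRIORITY - priority);
	return can < want ? can : want;
}

/* Light pass toward the unmapped watermark: one attempt at DEF_PRIORITY.
 * A shortfall is not an error; it just gives up quietly. */
static bool unmapped_pass(reclaim_fn reclaim, unsigned long want)
{
	assert(reclaim != NULL);
	return reclaim(DEF_PRIORITY, want) >= want;
}

/* Normal pass: escalate from DEF_PRIORITY toward 0 until `want` pages are
 * freed. Returns the priority that finished the job, or -1 on failure. */
static int normal_pass(reclaim_fn reclaim, unsigned long want)
{
	unsigned long freed = 0;

	assert(reclaim != NULL);
	for (int prio = DEF_PRIORITY; prio >= 0; prio--) {
		freed += reclaim(prio, want - freed);
		if (freed >= want)
			return prio;
	}
	return -1;
}
```

With the toy model, unmapped_pass(toy_reclaim, 100) frees only 16 pages and returns false without escalating, whereas normal_pass(toy_reclaim, 100) keeps going and finishes at priority 10.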
> Now with the stock v2.6 kernel, kswapd will deactivate (using the vm_swappiness algorithm)
> and swap out pages between the low and high zone watermarks.
>
> That avoids swapping out as much as possible until we go below pages_low.
>
> IMHO this might be OK for common desktop workloads where people complain
> about swap, but might be harmful for other workloads where swapping out
> unused anonymous process memory in advance is a _gain_.
As I said, it only does it lightly, and it's tunable.
> I don't understand this check in balance_pgdat (kswapd worker function):
> + if (maplimit && sc.nr_mapped * 100 / total_memory > vm_mapped)
> + return 0;
> +
>
> So: "if no zone is under pages_low, and more than vm_mapped% of RAM
> is mapped, bail out."
This will only be hit if "maplimit" is true, which means we have entered
balance_pgdat only because of the unmapped watermark (zone->pages_min * 4).
Here is where the real "tunable" comes into play. If more than
vm_mapped% of RAM is mapped (i.e., application) pages, it will not do
anything at this watermark. By default it is set to 66%. Setting it to 0
disables this patch entirely and makes the VM behave much like
setting swappiness to 100 in mainline.
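The quoted bail-out can be isolated as a small predicate. This is a sketch, not the patch itself: the function name is hypothetical, maplimit is taken to mean "entered balance_pgdat only because of the unmapped watermark" as described above, and vm_mapped is the patch's percentage tunable (default 66).

```c
#include <assert.h>
#include <stdbool.h>

/* True when kswapd should skip the light pass at the unmapped watermark:
 * only for unmapped-watermark entries, and only when the mapped share of
 * RAM exceeds vm_mapped percent. */
static bool skip_unmapped_pass(bool maplimit, unsigned long nr_mapped,
                               unsigned long total_memory, int vm_mapped)
{
	assert(total_memory > 0);
	assert(vm_mapped >= 0 && vm_mapped <= 100);

	return maplimit &&
	       nr_mapped * 100 / total_memory > (unsigned long)vm_mapped;
}
```

With the default vm_mapped of 66, a system with 70% mapped pages skips the light pass; with vm_mapped set to 0, any mapped memory at all trips the check, which is why that setting effectively turns the patch off.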
> I still think swapout behaviour can be correctly tuned with vm_swappiness,
> and agree with Andrew that we should not change anything in the algorithm
> if this can be tuned.
I agree it can be, but something in the logic has definitely changed,
and a different swappiness value no longer gives users like Florin the
desired result.
Cheers,
Con