From: Con Kolivas <kernel@kolivas.org>
To: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
Cc: linux-kernel@vger.kernel.org, akpm@osdl.org
Subject: Re: swapping and the value of /proc/sys/vm/swappiness
Date: Fri, 17 Sep 2004 10:22:54 +1000
Message-ID: <414A2E5E.6060704@kolivas.org>
In-Reply-To: <20040916185019.GC11241@logos.cnet>
Marcelo Tosatti wrote:
> Con!
>
> Spent some time reading your patch...
Great!
> Well, if "distress" is getting higher (with similar workload/pressure)
> that's because the VM is having a harder time freeing pages (priority
> increases, distress increases).
>
> You say "distress is getting higher in later kernels". Can you expand
> more on that? How did you find this out, and can you be more specific
> wrt "later kernels".
When I say earlier kernels I mean prior to 2.6.8; later kernels are
2.6.8 and onward.
I'm still referring to the "hard_swappiness" patch that Florin was using
to fix his problem; the fact that it helped is what pointed to distress
having increased. I'm sorry if my newer patch confuses the issue.
hard_swappiness effectively changed this:
-distress = 100 >> zone->prev_priority;
-mapped_ratio = (sc->nr_mapped * 100) / total_memory;
-swap_tendency = mapped_ratio / 2 + distress + vm_swappiness;
-if (swap_tendency >= 100)
-	reclaim_mapped = 1;
into this:
+mapped_ratio = (sc->nr_mapped * 100) / total_memory;
+swap_tendency = mapped_ratio / 2 + vm_swappiness;
+if (swap_tendency >= 100)
+	reclaim_mapped = 1;
This made swap_tendency dependent _only_ on the mapped_ratio (plus the
fixed vm_swappiness setting). Now if you load up the same desktop and
applications, your mapped_ratio will be virtually identical regardless of
the kernel. If you then copy a large file or convert a large video file
etc., the mapped ratio will be unchanged. Therefore, if the swapping
increased with this workload in 2.6.8 and later kernels but did _not_
increase with hard_swappiness, the cause must be the "distress" value,
which is entirely dependent on zone->prev_priority. Does that make my
conclusion clearer?
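
To put numbers on the argument above, here is a minimal sketch of the two
decisions side by side. The function names are made up for illustration,
DEF_PRIORITY = 12 is assumed for these kernels, and the mapped_ratio and
swappiness values in the note below are only examples, not measurements.

```c
#include <assert.h>

/* Assumed value of DEF_PRIORITY for 2.6-era kernels. */
#define DEF_PRIORITY 12

/*
 * Hypothetical helper: the stock decision quoted above.
 * distress grows from 0 (prev_priority == DEF_PRIORITY) to 100
 * (prev_priority == 0) as reclaim has a harder time freeing pages.
 */
static int stock_reclaim_mapped(int mapped_ratio, int prev_priority,
                                int vm_swappiness)
{
    assert(prev_priority >= 0 && prev_priority <= DEF_PRIORITY);

    int distress = 100 >> prev_priority;
    int swap_tendency = mapped_ratio / 2 + distress + vm_swappiness;

    return swap_tendency >= 100;
}

/*
 * Hypothetical helper: the same decision with the distress term
 * removed, as hard_swappiness effectively does.
 */
static int hard_reclaim_mapped(int mapped_ratio, int vm_swappiness)
{
    int swap_tendency = mapped_ratio / 2 + vm_swappiness;

    return swap_tendency >= 100;
}
```

With mapped_ratio = 40 and vm_swappiness = 60, the stock version leaves
mapped pages alone at prev_priority 12 (20 + 0 + 60 = 80) but starts
reclaiming them at prev_priority 2 (20 + 25 + 60 = 105). The
hard_swappiness version stays at 80 whatever prev_priority does, which is
why a change in swapping between kernels points at distress.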
Below here you're referring to my mapped_watermark patch so I'll address
that separately to avoid confusion.
> I see you add a "z->nr_unmapped" watermark a bit above "z->pages_high",
> and use that to set "pgdat->mapped_nrpages" to what needs to be freed
> so z->free_pages reaches "z->nr_unmapped".
>
> And then you use that per-pgdat "mapped_nrpages" count to avoid:
>
> - moving mapped pages to inactive list (wasting the swappiness algorithm)
> - swapping out pages at shrink_list
>
> Those two only happen when pgdat->mapped_nrpages is zero, which
> becomes true when we go below pages_low.
>
> To summarize, deactivation/swapout of mapped pages only happens when we
> go below pages_low in any zone.
>
> Correct?
Yes, apart from one big caveat: scanning is expensive, so it only scans
at the lowest priority (DEF_PRIORITY). If it fails to release enough memory
it simply returns quietly. This means that if vm pressure is hard enough
and occurs frequently/fast enough, free memory will still drop down below
pages_high even if the watermarks have not been re-achieved. Then the
normal algorithm will take over.
> Now with v2.6 stock kernel, kswapd will deactivate (using vm_swappiness algorithm)
> and swapout pages between the low and high zone watermarks.
>
> That avoids swapping out as hard as possible until we go below pages_low.
>
> IMHO this might be OK for common desktop workloads where people complain
> about swap, but might be harmful for other workloads where swapping out
> unused anonymous process memory in advance is a _gain_.
As I said, it only does it lightly, and it's tunable.
> I don't understand this check in balance_pgdat (kswapd worker function):
> + if (maplimit && sc.nr_mapped * 100 / total_memory > vm_mapped)
> + return 0;
> +
>
> So "if no zone is under pages_low, and more than vm_mapped % of ram
> is mapped, bail out."
This will only be hit if "maplimit" is true. This means we have entered
balance_pgdat only due to the unmapped watermark (zone->pages_min * 4).
Here is where the real "tunable" comes into play. If more than
vm_mapped % of ram is mapped (i.e. application) pages, it will not do
anything at this watermark. By default it is set to 66%. Setting it to 0
disables this patch entirely and makes the vm behave much like
setting swappiness to 100 in mainline.
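
As a rough sketch of that check in isolation (the function name and the
standalone signature are made up for illustration; in the patch the test
sits inline in balance_pgdat, as quoted above):

```c
/*
 * Hypothetical helper: returns nonzero when kswapd should bail out
 * without doing any work.
 *
 * maplimit     - nonzero if we were woken only for the unmapped watermark
 * nr_mapped    - number of mapped (application) pages
 * total_memory - total number of pages, must be nonzero
 * vm_mapped    - the tunable, as a percentage (66 by default)
 */
static int kswapd_should_bail(int maplimit, unsigned long nr_mapped,
                              unsigned long total_memory,
                              unsigned long vm_mapped)
{
    return maplimit && nr_mapped * 100 / total_memory > vm_mapped;
}
```

With the default of 66, a box with 70% of ram mapped bails out at the
unmapped watermark, while one with 50% mapped goes on to free unmapped
pages. With vm_mapped set to 0, any mapped percentage of 1 or more bails
out, so the extra watermark never does any work.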
> I still think swapout behaviour can be correctly tuned with vm_swappiness,
> and agree with Andrew that we should not change anything in the algorithm
> if this can be tuned.
I agree it can be, but something in the logic has definitely changed,
and a different value is not giving users like Florin the desired result
any more.
Cheers,
Con