public inbox for linux-kernel@vger.kernel.org
From: Con Kolivas <kernel@kolivas.org>
To: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
Cc: linux-kernel@vger.kernel.org, akpm@osdl.org
Subject: Re: swapping and the value of /proc/sys/vm/swappiness
Date: Fri, 17 Sep 2004 10:22:54 +1000
Message-ID: <414A2E5E.6060704@kolivas.org>
In-Reply-To: <20040916185019.GC11241@logos.cnet>

Marcelo Tosatti wrote:
> Con!
> 
> Spent some time reading your patch...

Great!

> Well if "distress" is getting higher (with similar workload/pressure) 
> that's because the VM is having a harder time freeing pages (priority increases,
> distress increases).
> 
> You say "distress is getting higher in later kernels". Can you expand
> more on that? How did you find this out, and can you be more specific
> wrt "later kernels"?

When I say earlier kernels I mean prior to 2.6.8; later kernels means 
2.6.8 and later.

I'm still referring to the "hard_swappiness" patch that Florin was using 
to fix his problem, which is how the increase in distress was diagnosed. 
I'm sorry if my newer patch confuses the issue. hard_swappiness 
effectively changed this:

-distress = 100 >> zone->prev_priority;
-mapped_ratio = (sc->nr_mapped * 100) / total_memory;
-swap_tendency = mapped_ratio / 2 + distress + vm_swappiness;
-if (swap_tendency >= 100)
-	reclaim_mapped = 1;

into this:

+mapped_ratio = (sc->nr_mapped * 100) / total_memory;
+swap_tendency = mapped_ratio / 2 + vm_swappiness;
+if (swap_tendency >= 100)
+	reclaim_mapped = 1;

This made swap_tendency dependent _only_ on the mapped_ratio. Now if you 
load up the same desktop and applications, your mapped_ratio will be 
virtually identical regardless of the kernel. If you then copy a large 
file or convert a large video file etc., the mapped_ratio will be 
unchanged. Therefore if the swapping increased with this workload in 
2.6.8 and later kernels but did _not_ increase with hard_swappiness, it 
must be due to the "distress" value, which is entirely dependent on 
zone->prev_priority. Does that make my conclusion clearer?
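
To put rough numbers on that argument, here is a minimal sketch of the two 
calculations as stand-alone C. The function names and the sample values 
(mapped_ratio of 40, swappiness of 60) are made up for illustration and 
are not taken from either patch; DEF_PRIORITY is assumed to be 12, as in 
2.6-era kernels.

```c
/* Assumption: DEF_PRIORITY is 12, as in 2.6-era kernels. */
#define DEF_PRIORITY 12

/* Mainline calculation as quoted above (hypothetical wrapper function). */
int swap_tendency_mainline(int prev_priority, int mapped_ratio,
			   int vm_swappiness)
{
	/* distress is 0 at DEF_PRIORITY and climbs to 100 at priority 0 */
	int distress = 100 >> prev_priority;

	return mapped_ratio / 2 + distress + vm_swappiness;
}

/* hard_swappiness variant: the distress term is simply dropped. */
int swap_tendency_hard(int mapped_ratio, int vm_swappiness)
{
	return mapped_ratio / 2 + vm_swappiness;
}

/* Mapped pages become reclaim candidates once the tendency reaches 100. */
int reclaim_mapped(int swap_tendency)
{
	return swap_tendency >= 100;
}
```

With those sample values, mainline gives 20 + 0 + 60 = 80 at 
prev_priority == DEF_PRIORITY, so mapped pages are left alone. If 
prev_priority falls to 2, distress becomes 25 and the sum is 105, so 
reclaim_mapped is set even though mapped_ratio never moved. The 
hard_swappiness variant stays at 80 in both cases.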


Below here you're referring to my mapped_watermark patch so I'll address 
that separately to avoid confusion.

> I see you add a "z->nr_unmapped" watermark a bit above "z->pages_high", 
> and use that to set "pgdat->mapped_nrpages" to what needs to be freed 
> so z->free_pages reaches "z->nr_unmapped".
> 
> And then you use that per-pgdat "mapped_nrpages" count to avoid:
> 
> - moving mapped pages to inactive list (wasting the swappiness algorithm)
> - swapping out pages at shrink_list
> 
> Those two only happen when pgdat->mapped_nrpages is zero, which 
> becomes true when we go below pages_low.
> 
> To summarize, deactivation/swapout of mapped pages only happens when we 
> go below pages_low in any zone.
> 
> Correct?

Yes, apart from one big caveat. Scanning is expensive, so it only scans 
at the lowest priority (DEF_PRIORITY). If it fails to release enough memory 
it simply returns quietly. This means that if VM pressure is hard enough 
and occurs frequently/fast enough, it will still drop down below 
pages_high even if the watermarks have not been re-achieved. Then the 
normal algorithm will take over.

> Now with v2.6 stock kernel, kswapd will deactivate (using vm_swappiness algorithm)
> and swapout pages between the low and high zone watermarks. 
> 
> That avoids swapping out as hard as possible until we go below pages_low. 
> 
> IMHO this might be OK for common desktop workloads where people complain 
> about swap, but might be harmful for other workloads where swapping out
> unused anonymous process memory in advance is a _gain_.

As I said, it only does it lightly, and it's tunable.

> I don't understand this check in balance_pgdat (kswapd worker function):

> +       if (maplimit && sc.nr_mapped * 100 / total_memory > vm_mapped)
> +               return 0;
> +
> 
> So "if no zone is under pages_low, and more than vm_mapped % of ram
> is mapped, bail out." 

This will only be hit if "maplimit" is true, which means we have entered 
balance_pgdat only because of the unmapped watermark (zone->pages_min * 4). 
Here is where the real "tunable" comes into play. If more than 
vm_mapped % of RAM is mapped (i.e. application) pages, it will not do 
anything at this watermark. By default it is set to 66%. Setting it to 0 
disables this patch entirely and makes the VM behave much like 
setting swappiness to 100 in mainline.
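
As a rough illustration of that check in isolation, here is a stand-alone 
sketch. The function name and the page counts used below are hypothetical; 
only the condition itself comes from the quoted hunk.

```c
/*
 * Hypothetical stand-alone version of the quoted balance_pgdat check.
 * Returns 1 when the pass triggered by the unmapped watermark should
 * bail out without doing anything, 0 otherwise.
 */
int skip_unmapped_watermark_pass(int maplimit, unsigned long nr_mapped,
				 unsigned long total_memory,
				 unsigned long vm_mapped)
{
	/* integer percentage of memory that is mapped, as in the hunk */
	return maplimit && nr_mapped * 100 / total_memory > vm_mapped;
}
```

With the default of 66, a machine with 700 of 1000 pages mapped (70%) 
bails out, while one with 500 of 1000 mapped (50%) goes ahead. With 
vm_mapped at 0, any measurable mapped percentage makes the pass bail out, 
so nothing is ever done at this watermark.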

> I still think swapout behaviour can be correctly tuned with vm_swappiness,
> and agree with Andrew on that we should not change anything in the algorithm
> if this can be tuned.

I agree it can be, but something in the logic has definitely changed, 
and a different value is not giving users like Florin the desired result 
any more.

Cheers,
Con
