From: Con Kolivas <kernel@kolivas.org>
To: Marcelo Tosatti <marcelo.tosatti@cyclades.com>
Cc: linux-kernel@vger.kernel.org, akpm@osdl.org
Subject: Re: swapping and the value of /proc/sys/vm/swappiness
Date: Fri, 17 Sep 2004 10:22:54 +1000
Message-ID: <414A2E5E.6060704@kolivas.org>
In-Reply-To: <20040916185019.GC11241@logos.cnet>

Marcelo Tosatti wrote:
> Con!
> 
> Spent some time reading your patch...

Great!

> Well if "distress" is getting higher (with similar workload/pressure) 
> that's because the VM is having a harder time freeing pages (priority increases,
> distress increases).
> 
> You say "distress is getting higher in later kernels". Can you expand
> more on that? How did you find this out, and can you be more specific
> wrt "later kernels"?

By earlier kernels I mean those prior to 2.6.8; by later kernels I mean 
2.6.8 and onwards.

I'm still referring to the "hard_swappiness patch" that Florin was using 
to fix his problem, which is what led to the diagnosis that distress had 
increased. I'm sorry if my newer patch confuses the issue. hard_swappiness 
effectively changed this:

-distress = 100 >> zone->prev_priority;
-mapped_ratio = (sc->nr_mapped * 100) / total_memory;
-swap_tendency = mapped_ratio / 2 + distress + vm_swappiness;
-if (swap_tendency >= 100)
-		reclaim_mapped = 1;

into this:

+mapped_ratio = (sc->nr_mapped * 100) / total_memory;
+swap_tendency = mapped_ratio / 2 + vm_swappiness;
+if (swap_tendency >= 100)
+		reclaim_mapped = 1;

This made swap_tendency dependent _only_ on the mapped_ratio (for a given 
swappiness setting). Now if you load up the same desktop and applications, 
your mapped_ratio will be virtually identical regardless of the kernel. If 
you then copy a large file or convert a large video file etc., the 
mapped_ratio will be unchanged. Therefore if the swapping increased with 
this workload in 2.6.8 and later kernels but did _not_ increase with 
hard_swappiness, the cause must be the "distress" value, which is entirely 
dependent on zone->prev_priority. Does that make my conclusion clearer?
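
To make the comparison concrete, here is a minimal userspace sketch of the two calculations quoted above. It is not the kernel code itself: the function names swap_tendency_stock and swap_tendency_hard are hypothetical, the arguments stand in for sc->nr_mapped, total_memory, zone->prev_priority and vm_swappiness, and the sample numbers in the note below are purely illustrative.

```c
#include <assert.h>

/* Hypothetical helper: the stock calculation, including "distress". */
int swap_tendency_stock(unsigned long nr_mapped, unsigned long total_memory,
                        int prev_priority, int vm_swappiness)
{
    assert(total_memory > 0);
    /* distress grows as prev_priority falls towards 0 */
    int distress = 100 >> prev_priority;
    int mapped_ratio = (int)((nr_mapped * 100) / total_memory);
    return mapped_ratio / 2 + distress + vm_swappiness;
}

/* Hypothetical helper: the hard_swappiness variant, with distress removed. */
int swap_tendency_hard(unsigned long nr_mapped, unsigned long total_memory,
                       int vm_swappiness)
{
    assert(total_memory > 0);
    int mapped_ratio = (int)((nr_mapped * 100) / total_memory);
    return mapped_ratio / 2 + vm_swappiness;
}
```

With 40% of memory mapped and swappiness at 60, the hard variant always yields 80, so reclaim_mapped is never set. The stock variant also yields 80 when prev_priority is 12 (distress is 0), but yields 105 when prev_priority has fallen to 2 (distress is 25), which crosses the threshold of 100 with an identical mapped_ratio.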


Below here you're referring to my mapped_watermark patch so I'll address 
that separately to avoid confusion.

> I see you add a "z->nr_unmapped" watermark a bit above "z->pages_high", 
> and use that to set "pgdat->mapped_nrpages" to what needs to be freed 
> so z->free_pages reaches "z->nr_unmapped".
> 
> And then you use that per-pgdat "mapped_nrpages" count to avoid:
> 
> - moving mapped pages to inactive list (wasting the swappiness algorithm)
> - swapping out pages at shrink_list
> 
> Those two only happen when pgdat->mapped_nrpages is zero, which 
> becomes true when we go below pages_low.
> 
> To summarize, deactivation/swapout of mapped pages only happens when we 
> go below pages_low in any zone.
> 
> Correct?

Yes, apart from one big caveat: scanning is expensive, so it only scans 
at the lowest priority (DEF_PRIORITY). If it fails to release enough memory 
it simply returns quietly. This means that if VM pressure is heavy enough 
and occurs frequently or quickly enough, free memory will still drop below 
pages_high even if the watermarks have not been reached again. Then the 
normal algorithm will take over.

> Now with v2.6 stock kernel, kswapd will deactivate (using vm_swappiness algorithm)
> and swapout pages between the low and high zone watermarks. 
> 
> That avoids swapping out as hard as possible until we go below pages_low. 
> 
> IMHO this might be OK for common desktop workloads where people complain 
> about swap, but might be harmful for other workloads where swapping out
> unused anonymous process memory in advance is a _gain_.

As I said, it only does it lightly, and it's tunable.

> I don't understand this check in balance_pgdat (kswapd worker function):

> +       if (maplimit && sc.nr_mapped * 100 / total_memory > vm_mapped)
> +               return 0;
> +
> 
> So "if not any zone is under pages_low, and more than vm_mapped % of ram
> is mapped, bail out." 

This will only be hit if "maplimit" is true, which means we have entered 
balance_pgdat only because of the unmapped watermark (zone->pages_min * 4). 
Here is where the real "tunable" comes into play. If more than 
vm_mapped % of RAM is mapped (i.e. application) pages, it will not do 
anything at this watermark. By default it is set to 66%. Setting it to 0 
disables this patch entirely and makes the VM behave much like 
setting swappiness to 100 in mainline.
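
The early-exit condition from the quoted hunk can be sketched in isolation as follows. This is only an illustration under stated assumptions: the function name maplimit_bails is hypothetical, and the parameters stand in for maplimit, sc.nr_mapped, total_memory and vm_mapped from the patch.

```c
#include <assert.h>

/*
 * Hypothetical helper: returns 1 when the check would make balance_pgdat
 * return early, i.e. we were woken only for the unmapped watermark and
 * more than vm_mapped percent of memory is mapped.
 */
int maplimit_bails(int maplimit, unsigned long nr_mapped,
                   unsigned long total_memory, int vm_mapped)
{
    assert(total_memory > 0);
    assert(vm_mapped >= 0);
    return maplimit &&
           nr_mapped * 100 / total_memory > (unsigned long)vm_mapped;
}
```

With the default of 66, a system with 70% of memory mapped bails out at this watermark while one with 50% mapped goes on to scan. With vm_mapped set to 0, any non-zero mapped percentage bails out, so nothing is done at this watermark.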

> I still think swapout behaviour can be correctly tuned with vm_swappiness,
> and agree with Andrew on that we should not change anything in the algorithm
> if this can be tuned.

I agree it can be, but something in the logic has definitely changed, 
and a different value is not giving users like Florin the desired result 
any more.

Cheers,
Con
