Re: [PATCH] mm: wait for congestion to clear on all zones

From: Zlatko Calusic <zlatko.calusic@iskon.hr>
To: Simon Jeons <simon.jeons@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>, Hugh Dickins <hughd@google.com>,
	Minchan Kim <minchan.kim@gmail.com>,
	linux-mm <linux-mm@kvack.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] mm: wait for congestion to clear on all zones
Date: Mon, 14 Jan 2013 15:37:30 +0100
Message-ID: <50F4182A.80006@iskon.hr>
In-Reply-To: <1358038004.1466.4.camel@kernel.cn.ibm.com>

On 13.01.2013 01:46, Simon Jeons wrote:
> On Fri, 2013-01-11 at 12:25 +0100, Zlatko Calusic wrote:
>> On 11.01.2013 02:25, Simon Jeons wrote:
>>> On Wed, 2013-01-09 at 22:41 +0100, Zlatko Calusic wrote:
>>>> From: Zlatko Calusic <zlatko.calusic@iskon.hr>
>>>>
>>>> Currently we take a short nap (HZ/10) and wait for congestion to clear
>>>> before taking another pass with lower priority in balance_pgdat(). But
>>>> we do that only for the highest zone that we find to be unbalanced
>>>> and congested.
>>>>
>>>> This patch changes that to wait on all congested zones in a single
>>>> pass, in the hope that it will save us some scanning. Also, we
>>>> take a nap as soon as a congested zone is encountered and sc.priority <
>>>> DEF_PRIORITY - 2 (aka kswapd in trouble).
>>>
>>> But you still didn't explain what problem you met and what
>>> scenario can benefit from your change.
>>>
>>
>> I did in my reply to Andrew. Here's the relevant part:
>>
>>> I have an observation that without it, under some circumstances that
>>> are VERY HARD to repeat (many days need to pass and some stars to align
>>> to see the effect), the page cache gets hit hard, 2/3 of it evicted in
>>> a split second. And it's not even under high load! So, I'm still
>>> monitoring it, but so far the memory utilization really seems better
>>> with the patch applied (no more mysterious page cache shootdowns).
>>
>> The scenario that should benefit is an everyday one. I observed problems during
>> light but constant reading from disk (< 10MB/s) while sending that data
>> over the network at the same time. Think of a backup that compresses data on the
>> fly before pushing it over the network (so it's not very fast).
>>
>> The trouble is that you can't just put together a quick benchmark and measure the
>> impact, because many days need to pass for the bug to show up in all its beauty.
>>
>> Is there anybody out there who'd like to comment on the patch logic? I.e. do
>> you think that waiting on every congested zone is a more correct solution
>> than waiting on only one (only the highest one, ignoring the fact that
>> there may be other, even more congested zones)?
> 
> What's the benefit of waiting on every congested zone rather than waiting on
> only one in your scenario?
> 

The good:

Actually, we are _already_ waiting on every congested zone. And have
been for more than a year. So, all this discussion is... moot.

Andrew, ignore this patch, I'll send you a much better one in a minute.
There shouldn't be nearly so many questions about that one. ;)

The bad:

Obviously then, this patch didn't fix my issue. It just took a little
bit longer for it to appear again.

The ugly:

Here's what I observe on one of my machines:

Node 0, zone      DMA
    nr_vmscan_write 0
    nr_vmscan_immediate_reclaim 0
Node 0, zone    DMA32
    nr_vmscan_write 23164
    nr_vmscan_immediate_reclaim 582038
Node 0, zone   Normal
    nr_vmscan_write 16584344  <-- ugh!
    nr_vmscan_immediate_reclaim 1118415

But that's just a sneak peek; I'll open a proper thread to discuss this
when I collect a little bit more data. BTW, that Normal zone, with its
extraordinary amount of writeback under memory pressure, is 4 times
smaller than the DMA32 zone. That's why I consider it ugly. :P
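
For anyone who wants to pull the same two counters per zone, here is a
minimal sketch in Python. It assumes the text has the layout shown above
(presumably the output of /proc/zoneinfo on this kernel); the function name
parse_zone_counters and the SAMPLE string are made up for illustration, and
the layout of /proc/zoneinfo may differ on other kernel versions.

```python
import re

# Matches zone header lines such as "Node 0, zone   Normal".
ZONE_HEADER = re.compile(r"^Node\s+(\d+),\s+zone\s+(\S+)")

# Counters of interest; both names appear in the output quoted above.
WANTED = ("nr_vmscan_write", "nr_vmscan_immediate_reclaim")


def parse_zone_counters(text, wanted=WANTED):
    """Return {(node, zone): {counter: value}} for the wanted counters.

    Hypothetical helper: it only understands "Node N, zone NAME" headers
    followed by "name value" lines, as in the sample below.
    """
    result = {}
    current = None
    for line in text.splitlines():
        header = ZONE_HEADER.match(line)
        if header:
            current = (int(header.group(1)), header.group(2))
            result[current] = {}
            continue
        fields = line.split()
        # Counter lines look like "    nr_vmscan_write 23164"; anything
        # after the value (e.g. a hand-written note) is ignored.
        if current is not None and len(fields) >= 2 and fields[0] in wanted:
            result[current][fields[0]] = int(fields[1])
    return result


# Sample copied from the numbers in this message.
SAMPLE = """\
Node 0, zone      DMA
    nr_vmscan_write 0
    nr_vmscan_immediate_reclaim 0
Node 0, zone    DMA32
    nr_vmscan_write 23164
    nr_vmscan_immediate_reclaim 582038
Node 0, zone   Normal
    nr_vmscan_write 16584344  <-- ugh!
    nr_vmscan_immediate_reclaim 1118415
"""

if __name__ == "__main__":
    for (node, zone), counters in parse_zone_counters(SAMPLE).items():
        print(node, zone, counters)
```

On a live system one could pass the contents of /proc/zoneinfo instead of
SAMPLE, assuming the counters are still reported per zone there.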
-- 
Zlatko
