From: Mel Gorman <mgorman@suse.de>
To: Minchan Kim <minchan.kim@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Pádraig Brady <P@draigBrady.com>,
James Bottomley <James.Bottomley@HansenPartnership.com>,
Colin King <colin.king@canonical.com>,
Andrew Lutomirski <luto@mit.edu>, Rik van Riel <riel@redhat.com>,
Johannes Weiner <hannes@cmpxchg.org>,
linux-mm <linux-mm@kvack.org>,
linux-kernel <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 4/4] mm: vmscan: Only read new_classzone_idx from pgdat when reclaiming successfully
Date: Thu, 21 Jul 2011 18:01:12 +0100
Message-ID: <20110721170112.GU5349@suse.de>
In-Reply-To: <20110721163649.GG1713@barrios-desktop>
On Fri, Jul 22, 2011 at 01:36:49AM +0900, Minchan Kim wrote:
> > > > <SNIP>
> > > > @@ -2740,17 +2742,23 @@ static int kswapd(void *p)
> > > > tsk->flags |= PF_MEMALLOC | PF_SWAPWRITE | PF_KSWAPD;
> > > > set_freezable();
> > > >
> > > > - order = 0;
> > > > - classzone_idx = MAX_NR_ZONES - 1;
> > > > + order = new_order = 0;
> > > > + classzone_idx = new_classzone_idx = pgdat->nr_zones - 1;
> > > > for ( ; ; ) {
> > > > - unsigned long new_order;
> > > > - int new_classzone_idx;
> > > > int ret;
> > > >
> > > > - new_order = pgdat->kswapd_max_order;
> > > > - new_classzone_idx = pgdat->classzone_idx;
> > > > - pgdat->kswapd_max_order = 0;
> > > > - pgdat->classzone_idx = MAX_NR_ZONES - 1;
> > > > + /*
> > > > + * If the last balance_pgdat was unsuccessful it's unlikely a
> > > > + * new request of a similar or harder type will succeed soon
> > > > + * so consider going to sleep on the basis we reclaimed at
> > > > + */
> > > > + if (classzone_idx >= new_classzone_idx && order == new_order) {
> > > > + new_order = pgdat->kswapd_max_order;
> > > > + new_classzone_idx = pgdat->classzone_idx;
> > > > + pgdat->kswapd_max_order = 0;
> > > > + pgdat->classzone_idx = pgdat->nr_zones - 1;
> > > > + }
> > > > +
> > >
> > > But in this part.
> > > Why do we need this?
> >
> > Let's say it's a fork-heavy workload and kswapd is routinely being woken
> > for order-1 allocations and the highest zone is very small. For the
> > most part, it's ok because the allocations are being satisfied from
> > the lower zones which kswapd has no problem balancing.
> >
> > However, by reading the information even after failing to
> > balance, kswapd continues balancing for order-1 due to reading
> > pgdat->kswapd_max_order, each time failing for the highest zone. It
> > only takes one wakeup request per balance_pgdat() to keep kswapd
> > awake trying to balance the highest zone in a continual loop.
>
> You made balance_pgdat's classzone_idx be communicated back, so the classzone_idx returned
> would not be the high zone, and in [1/4] you changed sleeping_prematurely to consider only
> classzone_idx, not nr_zones. So I think it should sleep if the low zones are balanced.
>
If a wakeup for order-1 happened during the last balance_pgdat(), the
classzone_idx as communicated back from balance_pgdat() is lost and
kswapd will not sleep in this ordering of events:

kswapd                                  other processes
======                                  ===============
order = balance_pgdat(pgdat, order, &classzone_idx);
                                        wakeup for order-1
kswapd balances lower zone
                                        allocate from lower zone
balance_pgdat fails balance for highest zone, returns
   with lower classzone_idx and possibly lower order
new_order = pgdat->kswapd_max_order      (order == 1)
new_classzone_idx = pgdat->classzone_idx (highest zone)
if (order < new_order || classzone_idx > new_classzone_idx) {
        order = new_order;
        classzone_idx = new_classzone_idx; (failure from balance_pgdat() lost)
}
order = balance_pgdat(pgdat, order, &classzone_idx);

The wakeup for order-1 at any point during balance_pgdat() is enough to
keep kswapd awake even though the process that called wakeup_kswapd()
would be able to allocate from the lower zones without significant
difficulty.

This is why, if balance_pgdat() fails its request, kswapd should go to
sleep if watermarks for the lower zones are met, until woken by another
process.
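
The ordering of events above can be reproduced in a rough user-space
sketch. This is not kernel code: struct fake_pgdat, fake_balance_pgdat(),
guard_allows_read(), count_sleep_attempts() and the zone constants are
hypothetical names, and the model assumes that every balance pass sees
one order-1 wakeup and fails for the highest zone by returning order 0
with a lower classzone_idx. The sleep path is reduced to a counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; not the kernel's definitions. */
enum { LOWER_ZONE = 1, HIGHEST_ZONE = 2 };

struct fake_pgdat {
	int kswapd_max_order;	/* order requested by the last waker */
	int classzone_idx;	/* zone index requested by the last waker */
};

/*
 * Assumed behaviour: a waker asks for order-1 while the pass runs, and the
 * small highest zone cannot be balanced, so the pass reports failure by
 * lowering *classzone_idx and returning a lower order.
 */
int fake_balance_pgdat(struct fake_pgdat *pgdat, int order, int *classzone_idx)
{
	(void)order;
	pgdat->kswapd_max_order = 1;		/* wakeup for order-1 */
	pgdat->classzone_idx = HIGHEST_ZONE;
	*classzone_idx = LOWER_ZONE;		/* only lower zones balanced */
	return 0;				/* "possibly lower order" */
}

/* Condition from the quoted hunk that decides whether pgdat is re-read. */
bool guard_allows_read(int order, int classzone_idx,
		       int new_order, int new_classzone_idx)
{
	return classzone_idx >= new_classzone_idx && order == new_order;
}

/*
 * Loop with the unconditional read, as in the ordering of events above.
 * Returns how many passes reached the branch where kswapd would consider
 * sleeping.
 */
int count_sleep_attempts(int passes)
{
	struct fake_pgdat pgdat = { 1, HIGHEST_ZONE };	/* one wakeup pending */
	int order = 0, classzone_idx = HIGHEST_ZONE;
	int sleep_attempts = 0;

	for (int i = 0; i < passes; i++) {
		int new_order = pgdat.kswapd_max_order;
		int new_classzone_idx = pgdat.classzone_idx;

		pgdat.kswapd_max_order = 0;
		pgdat.classzone_idx = HIGHEST_ZONE;

		if (order < new_order || classzone_idx > new_classzone_idx) {
			/* The failure reported by the last pass is lost here. */
			order = new_order;
			classzone_idx = new_classzone_idx;
		} else {
			sleep_attempts++;	/* kswapd would try to sleep */
		}
		order = fake_balance_pgdat(&pgdat, order, &classzone_idx);
	}
	return sleep_attempts;
}

/* Usage: both checks hold under the assumptions stated above. */
void check_timeline(void)
{
	assert(count_sleep_attempts(1000) == 0);
	assert(!guard_allows_read(0, LOWER_ZONE, 1, HIGHEST_ZONE));
}
```

Under these assumptions the sleep branch is never reached, however many
passes run, because each pass overwrites the failed result with the
pending order-1 request. The guard from the hunk evaluates to false right
after such a failed pass, so the pending request is not re-read on that
pass.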
--
Mel Gorman
SUSE Labs