All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mel Gorman <mgorman@suse.de>
To: Andrea Arcangeli <aarcange@redhat.com>
Cc: Minchan Kim <minchan.kim@gmail.com>,
	Andrew Lutomirski <luto@mit.edu>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	fengguang.wu@intel.com, andi@firstfloor.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, hannes@cmpxchg.org,
	riel@redhat.com
Subject: Re: Kernel falls apart under light memory pressure (i.e. linking vmlinux)
Date: Mon, 23 May 2011 18:35:35 +0100	[thread overview]
Message-ID: <20110523173535.GZ5279@suse.de> (raw)
In-Reply-To: <20110523164225.GA14734@random.random>

On Mon, May 23, 2011 at 06:42:25PM +0200, Andrea Arcangeli wrote:
> On Mon, May 23, 2011 at 08:12:50AM +0900, Minchan Kim wrote:
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index 292582c..1663d24 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -231,8 +231,11 @@ unsigned long shrink_slab(struct shrink_control *shrink,
> >        if (scanned == 0)
> >                scanned = SWAP_CLUSTER_MAX;
> > 
> > -       if (!down_read_trylock(&shrinker_rwsem))
> > -               return 1;       /* Assume we'll be able to shrink next time */
> > +       if (!down_read_trylock(&shrinker_rwsem)) {
> > +               /* Assume we'll be able to shrink next time */
> > +               ret = 1;
> > +               goto out;
> > +       }
> 
> It looks cleaner to return -1 here to differentiate the failure in
> taking the lock from when we take the lock and just 1 object is
> freed. Callers seems to be ok with -1 already and more intuitive for
> the while (nr > 10) loops too (those loops could be changed to "while
> (nr > 0)" if all shrinkers are accurate and not doing something
> inaccurate like the above code did, the shrinkers retvals I didn't
> check yet).
> 

Only one caller reads the value of shrink_slab() and while it would
survive -1 being returned, it gains nothing. I don't see it as being
much clearer than the existing return value of 1.

> >        up_read(&shrinker_rwsem);
> > +out:
> > +       cond_resched();
> >        return ret;
> >  }
> 
> If we enter the loop some of the shrinkers will reschedule but it
> looks good for the last iteration that may have still run for some
> time before returning.

Yes.

> The actual failure of shrinker_rwsem seems only
> theoretical though (but ok to cover it too with the cond_resched, but
> in practice this should be more for the case where shrinker_rwsem
> doesn't fail).
> 

Profiles from some users imply that this condition is being hit. I
can't 100% prove it as I can't reproduce the problem locally
(seems to require a sandybridge laptop for some reason). Tests did
show that kswapd CPU usage was reduced as well as the liklihood
of hanging when shrink_slab used cond_resched() like this. See
https://lkml.org/lkml/2011/5/17/274 .

> > @@ -2331,7 +2336,7 @@ static bool sleeping_prematurely(pg_data_t
> > *pgdat, int order, long remaining,
> >         * must be balanced
> >         */
> >        if (order)
> > -               return pgdat_balanced(pgdat, balanced, classzone_idx);
> > +               return !pgdat_balanced(pgdat, balanced, classzone_idx);
> >        else
> >                return !all_zones_ok;
> >  }
> 
> I now wonder if this is why compaction in kswapd didn't work out well
> and kswapd would spin at 100% load so much when compaction was added,

It's possible.

> plus with kswapd-compaction patch I think this code should be changed
> to:
> 
>  if (!COMPACTION_BUILD && order)
>   return !pgdat_balanced();
>  else
>   return !all_zones_ok;
> 
> (but only with kswapd-compaction)
> 

Why? kswapd can enter lumpy reclaim when !COMPACTION_BUILD. While this
is hardly desirable, I don't see why kswapd should use different logic
for balancing depending on whether compaction is used or not.

> I should probably give kswapd-compaction another spin after fixing
> this, because with compaction kswapd should be super successful at
> satisfying zone_watermark_ok_safe(zone, _order_...) in the
> sleeping_prematurely high watermark check, leading to pgdat_balanced
> returning true most of the time (which would make kswapd go crazy spin
> instead of stopping as it was supposed to). Mel, do you also think
> it's worth another try with a fixed sleeping_prematurely like above?
> 

It's worth a try anyway although I think it's more important to figure
out if all_unreclaimable is being improperly set or not.

> Another thing, I'm not excited of the schedule_timeout(HZ/10) in
> kswapd_try_to_sleep(), it seems all for the statistics.

It's to catch where kswapd balances a zone but continual allocations put
the zone under the high watermark quickly. It's to keep kswapd awake to
reduce the likelihood that processes get hit the min watermark and
stall.

-- 
Mel Gorman
SUSE Labs

WARNING: multiple messages have this Message-ID (diff)
From: Mel Gorman <mgorman@suse.de>
To: Andrea Arcangeli <aarcange@redhat.com>
Cc: Minchan Kim <minchan.kim@gmail.com>,
	Andrew Lutomirski <luto@mit.edu>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	fengguang.wu@intel.com, andi@firstfloor.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, hannes@cmpxchg.org,
	riel@redhat.com
Subject: Re: Kernel falls apart under light memory pressure (i.e. linking vmlinux)
Date: Mon, 23 May 2011 18:35:35 +0100	[thread overview]
Message-ID: <20110523173535.GZ5279@suse.de> (raw)
In-Reply-To: <20110523164225.GA14734@random.random>

On Mon, May 23, 2011 at 06:42:25PM +0200, Andrea Arcangeli wrote:
> On Mon, May 23, 2011 at 08:12:50AM +0900, Minchan Kim wrote:
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index 292582c..1663d24 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -231,8 +231,11 @@ unsigned long shrink_slab(struct shrink_control *shrink,
> >        if (scanned == 0)
> >                scanned = SWAP_CLUSTER_MAX;
> > 
> > -       if (!down_read_trylock(&shrinker_rwsem))
> > -               return 1;       /* Assume we'll be able to shrink next time */
> > +       if (!down_read_trylock(&shrinker_rwsem)) {
> > +               /* Assume we'll be able to shrink next time */
> > +               ret = 1;
> > +               goto out;
> > +       }
> 
> It looks cleaner to return -1 here to differentiate the failure in
> taking the lock from when we take the lock and just 1 object is
> freed. Callers seems to be ok with -1 already and more intuitive for
> the while (nr > 10) loops too (those loops could be changed to "while
> (nr > 0)" if all shrinkers are accurate and not doing something
> inaccurate like the above code did, the shrinkers retvals I didn't
> check yet).
> 

Only one caller reads the value of shrink_slab() and while it would
survive -1 being returned, it gains nothing. I don't see it as being
much clearer than the existing return value of 1.

> >        up_read(&shrinker_rwsem);
> > +out:
> > +       cond_resched();
> >        return ret;
> >  }
> 
> If we enter the loop some of the shrinkers will reschedule but it
> looks good for the last iteration that may have still run for some
> time before returning.

Yes.

> The actual failure of shrinker_rwsem seems only
> theoretical though (but ok to cover it too with the cond_resched, but
> in practice this should be more for the case where shrinker_rwsem
> doesn't fail).
> 

Profiles from some users imply that this condition is being hit. I
can't 100% prove it as I can't reproduce the problem locally
(seems to require a sandybridge laptop for some reason). Tests did
show that kswapd CPU usage was reduced as well as the liklihood
of hanging when shrink_slab used cond_resched() like this. See
https://lkml.org/lkml/2011/5/17/274 .

> > @@ -2331,7 +2336,7 @@ static bool sleeping_prematurely(pg_data_t
> > *pgdat, int order, long remaining,
> >         * must be balanced
> >         */
> >        if (order)
> > -               return pgdat_balanced(pgdat, balanced, classzone_idx);
> > +               return !pgdat_balanced(pgdat, balanced, classzone_idx);
> >        else
> >                return !all_zones_ok;
> >  }
> 
> I now wonder if this is why compaction in kswapd didn't work out well
> and kswapd would spin at 100% load so much when compaction was added,

It's possible.

> plus with kswapd-compaction patch I think this code should be changed
> to:
> 
>  if (!COMPACTION_BUILD && order)
>   return !pgdat_balanced();
>  else
>   return !all_zones_ok;
> 
> (but only with kswapd-compaction)
> 

Why? kswapd can enter lumpy reclaim when !COMPACTION_BUILD. While this
is hardly desirable, I don't see why kswapd should use different logic
for balancing depending on whether compaction is used or not.

> I should probably give kswapd-compaction another spin after fixing
> this, because with compaction kswapd should be super successful at
> satisfying zone_watermark_ok_safe(zone, _order_...) in the
> sleeping_prematurely high watermark check, leading to pgdat_balanced
> returning true most of the time (which would make kswapd go crazy spin
> instead of stopping as it was supposed to). Mel, do you also think
> it's worth another try with a fixed sleeping_prematurely like above?
> 

It's worth a try anyway although I think it's more important to figure
out if all_unreclaimable is being improperly set or not.

> Another thing, I'm not excited of the schedule_timeout(HZ/10) in
> kswapd_try_to_sleep(), it seems all for the statistics.

It's to catch where kswapd balances a zone but continual allocations put
the zone under the high watermark quickly. It's to keep kswapd awake to
reduce the likelihood that processes get hit the min watermark and
stall.

-- 
Mel Gorman
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2011-05-23 17:35 UTC|newest]

Thread overview: 102+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-05-11 22:42 Kernel falls apart under light memory pressure (i.e. linking vmlinux) Andrew Lutomirski
2011-05-11 23:07 ` Andi Kleen
2011-05-11 23:28   ` Andrew Lutomirski
2011-05-12  5:46     ` Andi Kleen
2011-05-12 11:54       ` Andrew Lutomirski
2011-05-14 15:46         ` Andrew Lutomirski
2011-05-14 15:46           ` Andrew Lutomirski
2011-05-14 16:53           ` Andi Kleen
2011-05-14 16:53             ` Andi Kleen
     [not found]             ` <BANLkTik6SS9NH7XVSRBoCR16_5veY0MKBw@mail.gmail.com>
2011-05-14 17:43               ` Andi Kleen
2011-05-15  1:37                 ` Minchan Kim
2011-05-15 15:27                   ` Wu Fengguang
2011-05-15 15:27                     ` Wu Fengguang
2011-05-15 15:59                     ` Andrew Lutomirski
2011-05-15 15:59                       ` Andrew Lutomirski
2011-05-15 22:58                       ` Minchan Kim
2011-05-15 22:58                         ` Minchan Kim
2011-05-16  8:51                         ` Mel Gorman
2011-05-16  8:51                           ` Mel Gorman
2011-05-15 16:12                     ` Andrew Lutomirski
2011-05-15 16:12                       ` Andrew Lutomirski
2011-05-17  6:00                       ` Wu Fengguang
2011-05-17  6:00                         ` Wu Fengguang
2011-05-17  6:35                         ` Minchan Kim
2011-05-17  6:35                           ` Minchan Kim
2011-05-17 19:22                         ` Andrew Lutomirski
2011-05-18  5:17                           ` Minchan Kim
2011-05-18  5:17                             ` Minchan Kim
2011-05-19  2:15                             ` Andrew Lutomirski
2011-05-19  2:30                               ` KAMEZAWA Hiroyuki
2011-05-19  2:30                                 ` KAMEZAWA Hiroyuki
2011-05-19  2:41                                 ` Andrew Lutomirski
2011-05-19  2:54                               ` Minchan Kim
2011-05-19  2:54                                 ` Minchan Kim
2011-05-19 14:16                                 ` Andrew Lutomirski
2011-05-20  0:17                                   ` Minchan Kim
2011-05-20  0:17                                     ` Minchan Kim
2011-05-20  2:58                                   ` Andrew Lutomirski
2011-05-20  2:58                                     ` Andrew Lutomirski
2011-05-20  3:12                                     ` KOSAKI Motohiro
2011-05-20  3:12                                       ` KOSAKI Motohiro
2011-05-20  3:38                                       ` Andrew Lutomirski
2011-05-20  3:38                                         ` Andrew Lutomirski
2011-05-20  4:20                                         ` Minchan Kim
2011-05-20  4:20                                           ` Minchan Kim
2011-05-20  5:08                                           ` KAMEZAWA Hiroyuki
2011-05-20  5:08                                             ` KAMEZAWA Hiroyuki
2011-05-20  5:36                                             ` Minchan Kim
2011-05-20  5:36                                               ` Minchan Kim
2011-05-20  7:43                                               ` KAMEZAWA Hiroyuki
2011-05-20  7:43                                                 ` KAMEZAWA Hiroyuki
2011-05-20 10:11                                             ` Andrea Arcangeli
2011-05-20 10:11                                               ` Andrea Arcangeli
2011-05-20 14:11                                               ` Andrew Lutomirski
2011-05-20 15:33                                                 ` Minchan Kim
2011-05-20 15:33                                                   ` Minchan Kim
2011-05-20 16:01                                                   ` Andrew Lutomirski
2011-05-20 16:01                                                     ` Andrew Lutomirski
2011-05-20 16:19                                                     ` Minchan Kim
2011-05-20 16:19                                                       ` Minchan Kim
2011-05-20 18:09                                                       ` Andrew Lutomirski
2011-05-20 18:40                                                         ` Andrew Lutomirski
2011-05-20 18:40                                                           ` Andrew Lutomirski
2011-05-21 12:04                                                         ` KOSAKI Motohiro
2011-05-21 12:04                                                           ` KOSAKI Motohiro
2011-05-21 13:34                                                           ` Andrew Lutomirski
2011-05-21 13:34                                                             ` Andrew Lutomirski
2011-05-21 14:14                                                             ` KOSAKI Motohiro
2011-05-21 14:14                                                               ` KOSAKI Motohiro
2011-05-21 14:44                                                             ` Minchan Kim
2011-05-21 14:44                                                               ` Minchan Kim
2011-05-22 12:22                                                               ` Andrew Lutomirski
2011-05-22 12:22                                                                 ` Andrew Lutomirski
2011-05-22 23:12                                                                 ` Minchan Kim
2011-05-22 23:12                                                                   ` Minchan Kim
2011-05-23 16:42                                                                   ` Andrea Arcangeli
2011-05-23 16:42                                                                     ` Andrea Arcangeli
2011-05-23 17:35                                                                     ` Mel Gorman [this message]
2011-05-23 17:35                                                                       ` Mel Gorman
2011-05-24  1:19                                                                   ` Andrew Lutomirski
2011-05-24  1:34                                                                     ` Minchan Kim
2011-05-24  1:34                                                                       ` Minchan Kim
2011-05-24 11:24                                                                       ` Andrew Lutomirski
2011-05-24 11:24                                                                         ` Andrew Lutomirski
2011-05-24 11:55                                                                         ` Andrew Lutomirski
2011-05-25  0:43                                                                           ` KOSAKI Motohiro
2011-05-25  0:43                                                                             ` KOSAKI Motohiro
2011-05-21 14:31                                                           ` Minchan Kim
2011-05-21 14:31                                                             ` Minchan Kim
2011-05-19 14:51                             ` Wu Fengguang
2011-05-19 14:51                               ` Wu Fengguang
2011-05-19 15:00                               ` Andrew Lutomirski
2011-05-19 15:00                                 ` Andrew Lutomirski
2011-05-20  0:20                               ` Minchan Kim
2011-05-20  0:20                                 ` Minchan Kim
2011-05-15 22:40                     ` Minchan Kim
2011-05-15 22:40                       ` Minchan Kim
2011-05-17  5:52                       ` Wu Fengguang
2011-05-17  5:52                         ` Wu Fengguang
2011-05-17  6:26                         ` Minchan Kim
2011-05-17  6:26                           ` Minchan Kim
2011-05-20 10:40   ` Andrea Arcangeli

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20110523173535.GZ5279@suse.de \
    --to=mgorman@suse.de \
    --cc=aarcange@redhat.com \
    --cc=andi@firstfloor.org \
    --cc=fengguang.wu@intel.com \
    --cc=hannes@cmpxchg.org \
    --cc=kamezawa.hiroyu@jp.fujitsu.com \
    --cc=kosaki.motohiro@jp.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=luto@mit.edu \
    --cc=minchan.kim@gmail.com \
    --cc=riel@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.