From: Peter Zijlstra <a.p.zijlstra@chello.nl>
To: kamezawa.hiroyu@jp.fujitsu.com
Cc: linux-mm@kvack.org, balbir@linux.vnet.ibm.com,
	nishimura@mxp.nes.nec.co.jp, xemul@openvz.org,
	LKML <linux-kernel@vger.kernel.org>, Ingo Molnar <mingo@elte.hu>
Subject: Re: Re: Re: [PATCH 4/13] memcg: force_empty moving account
Date: Mon, 22 Sep 2008 17:32:48 +0200
Message-ID: <1222097568.16700.33.camel@lappy.programming.kicks-ass.net>
In-Reply-To: <28237198.1222095970373.kamezawa.hiroyu@jp.fujitsu.com>

On Tue, 2008-09-23 at 00:06 +0900, kamezawa.hiroyu@jp.fujitsu.com wrote:
> ----- Original Message -----
> >On Mon, 2008-09-22 at 23:50 +0900, kamezawa.hiroyu@jp.fujitsu.com wrote:
> >> ----- Original Message -----
> >> >> +			spin_lock_irqsave(&mz->lru_lock, flags);
> >> >> +		} else {
> >> >> +			unlock_page(page);
> >> >> +			put_page(page);
> >> >> +		}
> >> >> +		if (atomic_read(&mem->css.cgroup->count) > 0)
> >> >> +			break;
> >> >>  	}
> >> >>  	spin_unlock_irqrestore(&mz->lru_lock, flags);
> >> >
> >> >do _NOT_ use yield() ever! unless you know what you're doing, and
> >> >probably not even then.
> >> >
> >> >NAK!
> >> Hmm, sorry. cond_resched() is ok ?
> >
> >depends on what you want to do, please explain what you're trying to do.
> >
> Sorry again.
> 
> This force_empty is called only in the following situations:
>  - there are no user threads in this cgroup.
>  - a user tries to rmdir() this cgroup or explicitly types
>    echo 1 > ../memory.force_empty.
> 
> force_empty() scans the LRU list of this cgroup and checks the page_cgroups
> on the list one by one. Because there are no tasks in this group, force_empty
> can see the following racy conditions while scanning.
> 
>  - the global LRU tries to remove the page pointed to by the page_cgroup,
>    and it is not-on-LRU.

So you either skip the page because it already got un-accounted, or you
retry because its state is already updated to some new state.

>  - the page is locked by someone.
>    ....find some lock contention with invalidation/truncate.

Then you just contend on the lock and get woken when you obtain it?

>  - in a later patch, a page_cgroup can be on a pagevec (which I added) and
>    we have to drain it to remove it from the LRU.

Then unlock, drain, and lock again; there is no need to sleep for some
arbitrary amount of time [0-inf).
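
A minimal userspace sketch of the unlock/drain/lock pattern, not the
actual memcg code. The struct lru type, its on_list/pending counters,
drain_batch() and force_empty_sketch() are all hypothetical names; a
pthread mutex stands in for mz->lru_lock and a plain counter stands in
for the pagevec.

```c
#include <pthread.h>

/* Hypothetical stand-in for a per-zone LRU plus a pagevec-like batch. */
struct lru {
	pthread_mutex_t lock;	/* stands in for mz->lru_lock */
	int on_list;		/* entries currently on the list */
	int pending;		/* entries batched, not yet on the list */
};

/*
 * Move batched entries onto the list. Takes the lock itself, so the
 * caller must not hold it.
 */
static void drain_batch(struct lru *l)
{
	pthread_mutex_lock(&l->lock);
	l->on_list += l->pending;
	l->pending = 0;
	pthread_mutex_unlock(&l->lock);
}

/* Remove every entry, draining the batch when needed; returns the count. */
int force_empty_sketch(struct lru *l)
{
	int removed = 0;

	pthread_mutex_lock(&l->lock);
	for (;;) {
		if (l->on_list > 0) {
			l->on_list--;
			removed++;
			continue;
		}
		if (l->pending == 0)
			break;
		/* Unlock, drain, lock: no sleep of arbitrary length. */
		pthread_mutex_unlock(&l->lock);
		drain_batch(l);
		pthread_mutex_lock(&l->lock);
	}
	pthread_mutex_unlock(&l->lock);
	return removed;
}
```

The loop makes progress only by doing work (removing an entry or
draining), so it never depends on the scheduler running someone else.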

> In the above situations, force_empty() has to wait for some event to proceed.
> 
> Hmm... is detecting the busy situation in the loop and sleeping outside of
> the loop better? Anyway, OK, I'll rewrite this.

The better solution is to wait for events in a non-polling fashion, for
example by using wait_event().
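
As a rough userspace analogy only (kernel code would use a wait queue
with wait_event() and wake_up()), the same non-polling idea can be
sketched with a pthread condition variable. The struct event type and
the event_*() and demo_wait_for_event() helpers are hypothetical names
made up for this illustration.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a wait queue plus the condition waited on. */
struct event {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;
};

static void event_init(struct event *ev)
{
	pthread_mutex_init(&ev->lock, NULL);
	pthread_cond_init(&ev->cond, NULL);
	ev->done = false;
}

/* Sleep until 'done' is set; the waiter burns no CPU while blocked. */
static void event_wait(struct event *ev)
{
	pthread_mutex_lock(&ev->lock);
	while (!ev->done)	/* re-check: wakeups may be spurious */
		pthread_cond_wait(&ev->cond, &ev->lock);
	pthread_mutex_unlock(&ev->lock);
}

/* Set the condition, then wake any waiters. */
static void event_signal(struct event *ev)
{
	pthread_mutex_lock(&ev->lock);
	ev->done = true;
	pthread_cond_broadcast(&ev->cond);
	pthread_mutex_unlock(&ev->lock);
}

static void *signaller(void *arg)
{
	event_signal(arg);
	return NULL;
}

/* Returns 1 once the waiter has observed the event, 0 on setup failure. */
int demo_wait_for_event(void)
{
	struct event ev;
	pthread_t t;
	int ok;

	event_init(&ev);
	if (pthread_create(&t, NULL, signaller, &ev) != 0)
		return 0;
	event_wait(&ev);
	pthread_join(t, NULL);
	ok = ev.done ? 1 : 0;
	pthread_cond_destroy(&ev.cond);
	pthread_mutex_destroy(&ev.lock);
	return ok;
}
```

The waiter is woken by the party that changes the state, so its
progress does not depend on scheduling policy or relative priority.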

yield() might not actually wait at all. Suppose you're the
highest-priority FIFO task on the system: if you use yield() and rely on
someone else running, you'll deadlock.

Also, depending on sysctl_sched_compat_yield, SCHED_OTHER tasks using
yield() can behave radically differently.

> BTW, what is sched.c::yield() for now?

There are some (legacy) users of yield(); sadly they are all incorrect,
but removing them is non-trivial for various reasons.

The -rt kernel has 2 sites where yield() is the correct thing to do. In
both cases it's where 2 SCHED_FIFO-99 tasks (migration and stop_machine)
depend on each other.



