All of lore.kernel.org
 help / color / mirror / Atom feed
From: Balbir Singh <balbir@linux.vnet.ibm.com>
To: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>,
	MinChan Kim <minchan.kim@gmail.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	linux-mm <linux-mm@kvack.org>, Rik van Riel <riel@redhat.com>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: Race condition between putback_lru_page and mem_cgroup_move_list
Date: Mon, 04 Aug 2008 23:22:56 +0530	[thread overview]
Message-ID: <489741F8.2080104@linux.vnet.ibm.com> (raw)
In-Reply-To: <2f11576a0808040937y70f274e0j32f6b9c98b0f992d@mail.gmail.com>

KOSAKI Motohiro wrote:
> Hi
> 
>>> I think this is a race condition if mem_cgroup_move_lists's comment isn't right.
>>> I am not sure that it was already known problem.
>>>
>>> mem_cgroup_move_lists assume the appropriate zone's lru lock is already held.
>>> but putback_lru_page calls mem_cgroup_move_lists without holding lru_lock.
>> Hmmm, the comment on mem_cgroup_move_lists() does say this.  Although,
>> reading thru' the code, I can't see why it requires this.  But then it's
>> Monday, here...
> 
> I also think zone's lru lock is unnecessary.
> So, I guess below "it" indicate lock_page_cgroup, not zone lru lock.
> 

We need zone LRU lock, since the reclaim paths hold them. Not sure if I
understand why you call zone's LRU lock unnecessary, could you elaborate please?

>  >> But we cannot safely get to page_cgroup without it, so just try_lock it:
> 
> if my assumption is true, comment modifying is better.
> 
> 
>>> Repeatedly, spin_[un/lock]_irq use in mem_cgroup_move_list have a big overhead
>>> while doing list iteration.
>>>
>>> Do we have to use pagevec ?
>> This shouldn't be necessary, IMO.  putback_lru_page() is used as
>> follows:
>>
>> 1) in vmscan.c [shrink_*_list()] when an unevictable page is
>> encountered.  This should be relatively rare.  Once vmscan sees an
>> unevictable page, it parks it on the unevictable lru list where it
>> [vmscan] won't see the page again until it becomes reclaimable.
>>
>> 2) as a replacement for move_to_lru() in page migration as the inverse
>> to isolate_lru_page().  We did this to catch patches that became
>> unevictable or, more importantly, evictable while page migration held
>> them isolated.  move_to_lru() already grabbed and released the zone lru
>> lock on each page migrated.
>>
>> 3) In m[un]lock_vma_page() and clear_page_mlock(), new with in the
>> "mlocked pages are unevictable" series.  This one can result in a storm
>> of zone lru traffic--e.g., mlock()ing or munlocking() a large segment or
>> mlockall() of a task with a lot of mapped address space.  Again, this is
>> probably a very rare event--unless you're stressing [stressing over?]
>> mlock(), as I've been doing :)--and often involves a major fault [page
>> allocation], per page anyway.
>>
>> I originally did have a pagevec for the unevictable lru but it
>> complicated ensuring that we don't strand evictable pages on the
>> unevictable list.  See the retry logic in putback_lru_page().
>>
>> As for the !UNEVICTABLE_LRU version, the only place this should be
>> called is from page migration as none of the other call sites are
>> compiled in or reachable when !UNEVICTABLE_LRU.
>>
>> Thoughts?
> 
> I think both opinion is correct.
> unevictable lru related code doesn't require pagevec.
> 
> but mem_cgroup_move_lists is used by active/inactive list transition too.
> then, pagevec is necessary for keeping reclaim throuput.
> 

It's on my TODO list. I hope to get to it soon.

> Kim-san, Thank you nice point out!
> I queued this fix to my TODO list.


-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL


WARNING: multiple messages have this Message-ID (diff)
From: Balbir Singh <balbir@linux.vnet.ibm.com>
To: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>,
	MinChan Kim <minchan.kim@gmail.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	linux-mm <linux-mm@kvack.org>, Rik van Riel <riel@redhat.com>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: Race condition between putback_lru_page and mem_cgroup_move_list
Date: Mon, 04 Aug 2008 23:22:56 +0530	[thread overview]
Message-ID: <489741F8.2080104@linux.vnet.ibm.com> (raw)
In-Reply-To: <2f11576a0808040937y70f274e0j32f6b9c98b0f992d@mail.gmail.com>

KOSAKI Motohiro wrote:
> Hi
> 
>>> I think this is a race condition if mem_cgroup_move_lists's comment isn't right.
>>> I am not sure that it was already known problem.
>>>
>>> mem_cgroup_move_lists assume the appropriate zone's lru lock is already held.
>>> but putback_lru_page calls mem_cgroup_move_lists without holding lru_lock.
>> Hmmm, the comment on mem_cgroup_move_lists() does say this.  Although,
>> reading thru' the code, I can't see why it requires this.  But then it's
>> Monday, here...
> 
> I also think zone's lru lock is unnecessary.
> So, I guess below "it" indicate lock_page_cgroup, not zone lru lock.
> 

We need zone LRU lock, since the reclaim paths hold them. Not sure if I
understand why you call zone's LRU lock unnecessary, could you elaborate please?

>  >> But we cannot safely get to page_cgroup without it, so just try_lock it:
> 
> if my assumption is true, comment modifying is better.
> 
> 
>>> Repeatedly, spin_[un/lock]_irq use in mem_cgroup_move_list have a big overhead
>>> while doing list iteration.
>>>
>>> Do we have to use pagevec ?
>> This shouldn't be necessary, IMO.  putback_lru_page() is used as
>> follows:
>>
>> 1) in vmscan.c [shrink_*_list()] when an unevictable page is
>> encountered.  This should be relatively rare.  Once vmscan sees an
>> unevictable page, it parks it on the unevictable lru list where it
>> [vmscan] won't see the page again until it becomes reclaimable.
>>
>> 2) as a replacement for move_to_lru() in page migration as the inverse
>> to isolate_lru_page().  We did this to catch patches that became
>> unevictable or, more importantly, evictable while page migration held
>> them isolated.  move_to_lru() already grabbed and released the zone lru
>> lock on each page migrated.
>>
>> 3) In m[un]lock_vma_page() and clear_page_mlock(), new with in the
>> "mlocked pages are unevictable" series.  This one can result in a storm
>> of zone lru traffic--e.g., mlock()ing or munlocking() a large segment or
>> mlockall() of a task with a lot of mapped address space.  Again, this is
>> probably a very rare event--unless you're stressing [stressing over?]
>> mlock(), as I've been doing :)--and often involves a major fault [page
>> allocation], per page anyway.
>>
>> Ii>>? originally did have a pagevec for the unevictable lru but it
>> complicated ensuring that we don't strand evictable pages on the
>> unevictable list.  See the retry logic in putback_lru_page().
>>
>> As for the !UNEVICTABLE_LRU version, the only place this should be
>> called is from page migration as none of the other call sites are
>> compiled in or reachable when !UNEVICTABLE_LRU.
>>
>> Thoughts?
> 
> I think both opinion is correct.
> unevictable lru related code doesn't require pagevec.
> 
> but mem_cgroup_move_lists is used by active/inactive list transition too.
> then, pagevec is necessary for keeping reclaim throuput.
> 

It's on my TODO list. I hope to get to it soon.

> Kim-san, Thank you nice point out!
> I queued this fix to my TODO list.


-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2008-08-04 17:53 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-08-04 14:36 Race condition between putback_lru_page and mem_cgroup_move_list MinChan Kim
2008-08-04 14:36 ` MinChan Kim
2008-08-04 15:31 ` Lee Schermerhorn
2008-08-04 15:31   ` Lee Schermerhorn
2008-08-04 16:37   ` KOSAKI Motohiro
2008-08-04 16:37     ` KOSAKI Motohiro
2008-08-04 17:52     ` Balbir Singh [this message]
2008-08-04 17:52       ` Balbir Singh
2008-08-04 23:52       ` MinChan Kim
2008-08-04 23:52         ` MinChan Kim
2008-08-05  3:49       ` kamezawa.hiroyu
2008-08-05  3:49         ` kamezawa.hiroyu
2008-08-05  6:20       ` KOSAKI Motohiro
2008-08-05  6:20         ` KOSAKI Motohiro
2008-08-05 10:46         ` Balbir Singh
2008-08-05 10:46           ` Balbir Singh
2008-08-05 11:19           ` KOSAKI Motohiro
2008-08-05 11:19             ` KOSAKI Motohiro
2008-08-05 11:38             ` KOSAKI Motohiro
2008-08-05 11:38               ` KOSAKI Motohiro
2008-08-06 16:53       ` Lee Schermerhorn
2008-08-06 16:53         ` Lee Schermerhorn
2008-08-07 11:00         ` KOSAKI Motohiro
2008-08-07 11:00           ` KOSAKI Motohiro
2008-08-07 11:27           ` Lee Schermerhorn
2008-08-07 11:27             ` Lee Schermerhorn
2008-08-07 12:42         ` Balbir Singh
2008-08-07 12:42           ` Balbir Singh

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=489741F8.2080104@linux.vnet.ibm.com \
    --to=balbir@linux.vnet.ibm.com \
    --cc=Lee.Schermerhorn@hp.com \
    --cc=kamezawa.hiroyu@jp.fujitsu.com \
    --cc=kosaki.motohiro@jp.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=minchan.kim@gmail.com \
    --cc=riel@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.