From: Balbir Singh <balbir@linux.vnet.ibm.com>
To: Hugh Dickins <hugh@veritas.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
Hirokazu Takahashi <taka@valinux.co.jp>,
linux-mm@kvack.org, yamamoto@valinux.co.jp, riel@redhat.com
Subject: Re: [RFC][PATCH] Clarify mem_cgroup lock handling and avoid races.
Date: Fri, 22 Feb 2008 16:44:49 +0530 [thread overview]
Message-ID: <47BEAEA9.10801@linux.vnet.ibm.com> (raw)
In-Reply-To: <Pine.LNX.4.64.0802220916290.18145@blonde.site>
Hugh Dickins wrote:
> On Wed, 20 Feb 2008, KAMEZAWA Hiroyuki wrote:
>> On Wed, 20 Feb 2008 15:27:53 +0900 (JST)
>> Hirokazu Takahashi <taka@valinux.co.jp> wrote:
>>
>>>> Unlike the unsafeties of force_empty, this is liable to hit anyone
>>>> running with MEM_CONT compiled in; they don't have to be consciously
>>>> using mem_cgroups at all.
>>> As for force_empty, though this may not be the main topic here,
>>> mem_cgroup_force_empty_list() can be implemented simpler.
>>> It is possible to make the function just call mem_cgroup_uncharge_page()
>>> instead of releasing page_cgroups by itself. The trick is to call get_page()
>>> before invoking mem_cgroup_uncharge_page() so the page won't be released
>>> during this function.
>>>
>>> Kamezawa-san, you may want to look into the attached patch.
>>> I think you will be free from the weird complexity here.
>>>
>>> This code can be optimized but it will be enough since this function
>>> isn't critical.
>>>
>>> Thanks.
>>>
>>>
>>> Signed-off-by: Hirokazu Takahashi <taka@vallinux.co.jp>
>
> Hirokazu-san, may I change that to <taka@valinux.co.jp>?
>
>> ...
>>
>> Seems simple. But isn't there following case ?
>>
>> ==in force_empty==
>>
>> pc1 = list_entry(list->prev, struct page_cgroup, lru);
>> page = pc1->page;
>> get_page(page)
>> spin_unlock_irqrestore(&mz->lru_lock, flags)
>> mem_cgroup_uncharge_page(page);
>> => lock_page_cgroup(page);
>> => pc2 = page_get_page_cgroup(page);
>>
>> Here, pc2 != pc1 and pc2->mem_cgroup != pc1->mem_cgroup.
>> Maybe we need some check here.
>>
>> But maybe yours is good direction.
>
> I like Hirokazu-san's approach very much.
>
> Although I eventually completed the locking for my mem_cgroup_move_lists
> (SLAB_DESTROY_BY_RCU didn't help there, actually, because it left a
> possibility that the same page_cgroup got reused for the same page
> but a different mem_cgroup: in which case we got the wrong spinlock),
> his reversal in force_empty lets us use straightforward locking in
> mem_cgroup_move_lists (though it still has to try_lock_page_cgroup).
> So I want to take Hirokazu-san's patch into my bugfix and cleanup
> series, where it's testing out fine so far.
>
> Regarding your point above, Kamezawa-san: you're surely right that can
> happen, but is it a case that we actually need to avoid? Aren't we
> entitled to take the page out of pc2->mem_cgroup there, because if any
> such race occurs, it could easily have happened the other way around,
> removing the page from pc1->mem_cgroup just after pc2->mem_cgroup
> touched it, so ending up with that page in neither?
>
> I'd just prefer not to handle it as you did in your patch, because
> earlier in my series I'd removed the mem_cgroup_uncharge level (which
> just gets in the way, requiring a silly lock_page_cgroup at the end
> just to match the unlock_page_cgroup at the mem_cgroup_uncharge_page
> level), and don't much want to add it back in.
>
I've been looking through the code time and again, looking for races. I will
try to build a sketch of all the functions and dependencies tonight. One thing
that struck me was that making page_get_page_cgroup() call lock_page_cgroup()
internally might fix a lot of racy call sites. I was thinking of splitting
page_get_page_cgroup() into __page_get_page_cgroup(), which just gets the pc
without the lock, and page_get_page_cgroup(), which takes the lock and then
returns the pc.
Of course, this is just a thought so far; I have yet to write the code and look
at the results.
> While we're thinking of races...
>
> It seemed to me that mem_cgroup_uncharge should be doing its css_put
> after its __mem_cgroup_remove_list: doesn't doing it before leave open
> a slight danger that the struct mem_cgroup could be freed before the
> remove_list? Perhaps there's some other refcounting that makes that
> impossible, but I've felt safer shifting those around.
>
Yes, it's a good idea to move it down.
> Hugh
--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <dont@kvack.org>