From: Balbir Singh <balbir@linux.vnet.ibm.com>
To: Hugh Dickins <hugh@veritas.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
Hirokazu Takahashi <taka@valinux.co.jp>,
linux-mm@kvack.org, yamamoto@valinux.co.jp, riel@redhat.com
Subject: Re: [RFC][PATCH] Clarify mem_cgroup lock handling and avoid races.
Date: Fri, 22 Feb 2008 16:44:49 +0530 [thread overview]
Message-ID: <47BEAEA9.10801@linux.vnet.ibm.com> (raw)
In-Reply-To: <Pine.LNX.4.64.0802220916290.18145@blonde.site>
Hugh Dickins wrote:
> On Wed, 20 Feb 2008, KAMEZAWA Hiroyuki wrote:
>> On Wed, 20 Feb 2008 15:27:53 +0900 (JST)
>> Hirokazu Takahashi <taka@valinux.co.jp> wrote:
>>
>>>> Unlike the unsafeties of force_empty, this is liable to hit anyone
>>>> running with MEM_CONT compiled in; they don't have to be consciously
>>>> using mem_cgroups at all.
>>> As for force_empty, though this may not be the main topic here,
>>> mem_cgroup_force_empty_list() can be implemented simpler.
>>> It is possible to make the function just call mem_cgroup_uncharge_page()
>>> instead of releasing page_cgroups by itself. The trick is to call get_page()
>>> before invoking mem_cgroup_uncharge_page() so the page won't be released
>>> during this function.
>>>
>>> Kamezawa-san, you may want to look into the attached patch.
>>> I think you will be free from the weird complexity here.
>>>
>>> This code can be optimized but it will be enough since this function
>>> isn't critical.
>>>
>>> Thanks.
>>>
>>>
>>> Signed-off-by: Hirokazu Takahashi <taka@vallinux.co.jp>
>
> Hirokazu-san, may I change that to <taka@valinux.co.jp>?
>
>> ...
>>
>> Seems simple. But isn't there following case ?
>>
>> ==in force_empty==
>>
>> pc1 = list_entry(list->prev, struct page_cgroup, lru);
>> page = pc1->page;
>> get_page(page)
>> spin_unlock_irqrestore(&mz->lru_lock, flags)
>> mem_cgroup_uncharge_page(page);
>> => lock_page_cgroup(page);
>> => pc2 = page_get_page_cgroup(page);
>>
>> Here, pc2 != pc1 and pc2->mem_cgroup != pc1->mem_cgroup.
>> Maybe we need some check here.
>>
>> But maybe yours is good direction.
>
> I like Hirokazu-san's approach very much.
>
> Although I eventually completed the locking for my mem_cgroup_move_lists
> (SLAB_DESTROY_BY_RCU didn't help there, actually, because it left a
> possibility that the same page_cgroup got reused for the same page
> but a different mem_cgroup: in which case we got the wrong spinlock),
> his reversal in force_empty lets us use straightforward locking in
> mem_cgroup_move_lists (though it still has to try_lock_page_cgroup).
> So I want to take Hirokazu-san's patch into my bugfix and cleanup
> series, where it's testing out fine so far.
>
> Regarding your point above, Kamezawa-san: you're surely right that can
> happen, but is it a case that we actually need to avoid? Aren't we
> entitled to take the page out of pc2->mem_cgroup there, because if any
> such race occurs, it could easily have happened the other way around,
> removing the page from pc1->mem_cgroup just after pc2->mem_cgroup
> touched it, so ending up with that page in neither?
>
> I'd just prefer not to handle it as you did in your patch, because
> earlier in my series I'd removed the mem_cgroup_uncharge level (which
> just gets in the way, requiring a silly lock_page_cgroup at the end
> just to match the unlock_page_cgroup at the mem_cgroup_uncharge_page
> level), and don't much want to add it back in.
>
I've been looking through the code time and again, looking for races. I will
try to build a sketch of all the functions and dependencies tonight. One thing
that struck me was that making page_get_page_cgroup() call lock_page_cgroup()
internally might fix a lot of racy call sites. I was thinking of splitting
page_get_page_cgroup() into __page_get_page_cgroup(), which just gets the pc
without the lock, and page_get_page_cgroup(), which takes the lock and then
returns the pc.
Of course, this is just a thought so far; I have yet to write the code and look
at the results.
> While we're thinking of races...
>
> It seemed to me that mem_cgroup_uncharge should be doing its css_put
> after its __mem_cgroup_remove_list: doesn't doing it before leave open
> a slight danger that the struct mem_cgroup could be freed before the
> remove_list? Perhaps there's some other refcounting that makes that
> impossible, but I've felt safer shifting those around.
>
Yes, it's a good idea to move it down.
> Hugh
--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <dont@kvack.org>