From: Balbir Singh <balbir@linux.vnet.ibm.com>
To: Hugh Dickins <hugh@veritas.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
Hirokazu Takahashi <taka@valinux.co.jp>,
linux-mm@kvack.org, yamamoto@valinux.co.jp, riel@redhat.com
Subject: Re: [RFC][PATCH] Clarify mem_cgroup lock handling and avoid races.
Date: Fri, 22 Feb 2008 16:44:49 +0530 [thread overview]
Message-ID: <47BEAEA9.10801@linux.vnet.ibm.com> (raw)
In-Reply-To: <Pine.LNX.4.64.0802220916290.18145@blonde.site>
Hugh Dickins wrote:
> On Wed, 20 Feb 2008, KAMEZAWA Hiroyuki wrote:
>> On Wed, 20 Feb 2008 15:27:53 +0900 (JST)
>> Hirokazu Takahashi <taka@valinux.co.jp> wrote:
>>
>>>> Unlike the unsafeties of force_empty, this is liable to hit anyone
>>>> running with MEM_CONT compiled in, they don't have to be consciously
>>>> using mem_cgroups at all.
>>> As for force_empty, though this may not be the main topic here,
>>> mem_cgroup_force_empty_list() can be implemented simpler.
>>> It is possible to make the function just call mem_cgroup_uncharge_page()
>>> instead of releasing page_cgroups by itself. The tips is to call get_page()
>>> before invoking mem_cgroup_uncharge_page() so the page won't be released
>>> during this function.
>>>
>>> Kamezawa-san, you may want look into the attached patch.
>>> I think you will be free from the weired complexity here.
>>>
>>> This code can be optimized but it will be enough since this function
>>> isn't critical.
>>>
>>> Thanks.
>>>
>>>
>>> Signed-off-by: Hirokazu Takahashi <taka@vallinux.co.jp>
>
> Hirokazu-san, may I change that to <taka@valinux.co.jp>?
>
>> ...
>>
>> Seems simple. But isn't there following case ?
>>
>> ==in force_empty==
>>
>> pc1 = list_entry(list->prev, struct page_cgroup, lru);
>> page = pc1->page;
>> get_page(page)
>> spin_unlock_irqrestore(&mz->lru_lock, flags)
>> mem_cgroup_uncharge_page(page);
>> => lock_page_cgroup(page);
>> => pc2 = page_get_page_cgroup(page);
>>
>> Here, pc2 != pc1 and pc2->mem_cgroup != pc1->mem_cgroup.
>> maybe need some check.
>>
>> But maybe yours is good direction.
>
> I like Hirokazu-san's approach very much.
>
> Although I eventually completed the locking for my mem_cgroup_move_lists
> (SLAB_DESTROY_BY_RCU didn't help there, actually, because it left a
> possibility that the same page_cgroup got reused for the same page
> but a different mem_cgroup: in which case we got the wrong spinlock),
> his reversal in force_empty lets us use straightforward locking in
> mem_cgroup_move_lists (though it still has to try_lock_page_cgroup).
> So I want to take Hirokazu-san's patch into my bugfix and cleanup
> series, where it's testing out fine so far.
>
> Regarding your point above, Kamezawa-san: you're surely right that can
> happen, but is it a case that we actually need to avoid? Aren't we
> entitled to take the page out of pc2->mem_cgroup there, because if any
> such race occurs, it could easily have happened the other way around,
> removing the page from pc1->mem_cgroup just after pc2->mem_cgroup
> touched it, so ending up with that page in neither?
>
> I'd just prefer not to handle it as you did in your patch, because
> earlier in my series I'd removed the mem_cgroup_uncharge level (which
> just gets in the way, requiring a silly lock_page_cgroup at the end
> just to match the unlock_page_cgroup at the mem_cgroup_uncharge_page
> level), and don't much want to add it back in.
>
I've been looking through the code time and again, looking for races. I will try
and build a sketch of all the functions and dependencies tonight. One thing that
struck me was that making page_get_page_cgroup() call lock_page_cgroup()
internally might potentially fix a lot of racy call sites. I was thinking of
splitting page_get_page_cgroup into __page_get_page_cgroup() <--> just get the
pc without lock and page_get_page_cgroup(), that holds the lock and then returns pc.
Of course, this is just a thought process. I am yet to write the code and look
at the results.
> While we're thinking of races...
>
> It seemed to me that mem_cgroup_uncharge should be doing its css_put
> after its __mem_cgroup_remove_list: doesn't doing it before leave open
> a slight danger that the struct mem_cgroup could be freed before the
> remove_list? Perhaps there's some other refcounting that makes that
> impossible, but I've felt safer shifting those around.
>
Yes, it's a good idea to move it down.
> Hugh
--
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2008-02-22 11:19 UTC|newest]
Thread overview: 63+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-02-19 12:54 [RFC][PATCH] Clarify mem_cgroup lock handling and avoid races KAMEZAWA Hiroyuki
2008-02-19 15:40 ` Hugh Dickins
2008-02-19 15:54 ` kamezawa.hiroyu
2008-02-19 16:26 ` Hugh Dickins
2008-02-20 1:55 ` KAMEZAWA Hiroyuki
2008-02-20 2:05 ` YAMAMOTO Takashi
2008-02-20 2:15 ` KAMEZAWA Hiroyuki
2008-02-20 2:32 ` YAMAMOTO Takashi
2008-02-20 4:27 ` Hugh Dickins
2008-02-20 6:38 ` Balbir Singh
2008-02-20 11:00 ` Hugh Dickins
2008-02-20 11:32 ` Balbir Singh
2008-02-20 14:19 ` Hugh Dickins
2008-02-20 1:03 ` KAMEZAWA Hiroyuki
2008-02-20 4:14 ` Hugh Dickins
2008-02-20 4:37 ` KAMEZAWA Hiroyuki
2008-02-20 4:39 ` Hugh Dickins
2008-02-20 4:41 ` Balbir Singh
2008-02-20 6:40 ` Balbir Singh
2008-02-20 7:23 ` KAMEZAWA Hiroyuki
2008-02-20 3:14 ` KAMEZAWA Hiroyuki
2008-02-20 3:37 ` YAMAMOTO Takashi
2008-02-20 4:13 ` KAMEZAWA Hiroyuki
2008-02-20 4:32 ` Hugh Dickins
2008-02-20 5:57 ` Balbir Singh
2008-02-20 9:58 ` Hirokazu Takahashi
2008-02-20 10:06 ` Paul Menage
2008-02-20 10:11 ` Balbir Singh
2008-02-20 10:18 ` Paul Menage
2008-02-20 10:55 ` Balbir Singh
2008-02-20 11:21 ` KAMEZAWA Hiroyuki
2008-02-20 11:18 ` Balbir Singh
2008-02-20 11:32 ` KAMEZAWA Hiroyuki
2008-02-20 11:34 ` Balbir Singh
2008-02-20 11:44 ` Paul Menage
2008-02-20 11:41 ` Paul Menage
2008-02-20 11:36 ` Balbir Singh
2008-02-20 11:55 ` Paul Menage
2008-02-21 2:49 ` Hirokazu Takahashi
2008-02-21 6:35 ` KAMEZAWA Hiroyuki
2008-02-21 9:07 ` Hirokazu Takahashi
2008-02-21 9:21 ` KAMEZAWA Hiroyuki
2008-02-21 9:28 ` Balbir Singh
2008-02-21 9:44 ` KAMEZAWA Hiroyuki
2008-02-22 3:31 ` [RFC] Block I/O Cgroup Hirokazu Takahashi
2008-02-22 5:05 ` KAMEZAWA Hiroyuki
2008-02-22 5:45 ` Hirokazu Takahashi
2008-02-21 9:25 ` [RFC][PATCH] Clarify mem_cgroup lock handling and avoid races Balbir Singh
2008-02-20 6:27 ` Hirokazu Takahashi
2008-02-20 6:50 ` KAMEZAWA Hiroyuki
2008-02-20 8:32 ` Clean up force_empty (Was Re: [RFC][PATCH] Clarify mem_cgroup lock handling and avoid races.) KAMEZAWA Hiroyuki
2008-02-20 10:07 ` Clean up force_empty Hirokazu Takahashi
2008-02-22 9:24 ` [RFC][PATCH] Clarify mem_cgroup lock handling and avoid races Hugh Dickins
2008-02-22 10:07 ` KAMEZAWA Hiroyuki
2008-02-22 10:25 ` Hugh Dickins
2008-02-22 10:34 ` KAMEZAWA Hiroyuki
2008-02-22 10:50 ` Hirokazu Takahashi
2008-02-22 11:14 ` Balbir Singh [this message]
2008-02-22 12:00 ` Hugh Dickins
2008-02-22 12:28 ` Balbir Singh
2008-02-22 12:53 ` Hugh Dickins
2008-02-25 3:18 ` Balbir Singh
2008-02-20 5:00 ` Balbir Singh
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=47BEAEA9.10801@linux.vnet.ibm.com \
--to=balbir@linux.vnet.ibm.com \
--cc=hugh@veritas.com \
--cc=kamezawa.hiroyu@jp.fujitsu.com \
--cc=linux-mm@kvack.org \
--cc=riel@redhat.com \
--cc=taka@valinux.co.jp \
--cc=yamamoto@valinux.co.jp \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.