Re: [RFC][PATCH] another swap controller for cgroup

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Balbir Singh <balbir-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
To: YAMAMOTO Takashi <yamamoto-jCdQPDEk3idL9jVzuh4AOg@public.gmane.org>
Cc: minoura-jCdQPDEk3idL9jVzuh4AOg@public.gmane.org,
	nishimura-YQH0OdQVrdy45+QrQBaojngSJqDPrsil@public.gmane.org,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org,
	containers-qjLDD68F18O7TbgM5vRIOg@public.gmane.org,
	hugh-DTz5qymZ9yRBDgjK7y7TUQ@public.gmane.org
Subject: Re: [RFC][PATCH] another swap controller for cgroup
Date: Thu, 08 May 2008 21:13:50 +0530	[thread overview]
Message-ID: <48231FB6.7000206@linux.vnet.ibm.com> (raw)
In-Reply-To: <20080507055045.4ACBD5A0A-Pcsii4f/SVk@public.gmane.org>

YAMAMOTO Takashi wrote:
> hi,
> 
>> Hi, Thanks for the patches and your patience. I've just applied your
>> patches on top of 2.6.25-mm1 (it had a minor reject, that I've fixed).
>> I am building and testing the patches along with KAMEZAWA-San's low
>> overhead patches.
> 
> thanks.
> 
>>> +#include <linux/err.h>
>>> +#include <linux/cgroup.h>
>>> +#include <linux/hugetlb.h>
>> My powerpc build fails, we need to move hugetlb.h down to the bottom
> 
> what's the error message?
> 

It's unable to find the hugetlb call, I think is_hugetlb_vma() or so.

>>> +struct swap_cgroup {
>>> +	struct cgroup_subsys_state scg_css;
>> Can't we call this just css. Since the structure is swap_cgroup it
>> already has the required namespace required to distinguish it from
>> other css's. Please see page 4 of "The practice of programming", be
>> consistent. The same comment applies to all members below.
> 
> i don't have the book.
> i like this kind of prefixes as it's grep-friendly.
> 
>>> +#define	task_to_css(task) task_subsys_state((task), swap_cgroup_subsys_id)
>>> +#define	css_to_scg(css)	container_of((css), struct swap_cgroup, scg_css)
>>> +#define	cg_to_css(cg)	cgroup_subsys_state((cg), swap_cgroup_subsys_id)
>>> +#define	cg_to_scg(cg)	css_to_scg(cg_to_css(cg))
>> Aren't static inline better than macros? I would suggest moving to
>> them.
> 
> sounds like a matter of preference.
> either ok for me.
> 

There are other advantages, like better type checking of the arguments. The
compiler might even determine that it's better of making a function call instead
of inlining it (rare, but possible).

>>> +static struct swap_cgroup *
>>> +swap_cgroup_prepare_ptp(struct page *ptp, struct mm_struct *mm)
>>> +{
>>> +	struct swap_cgroup *scg = ptp->ptp_swap_cgroup;
>>> +
>> Is this routine safe w.r.t. concurrent operations, modifications to
>> ptp_swap_cgroup?
> 
> it's always accessed with the page table locked.
> 
>>> +	BUG_ON(mm == NULL);
>>> +	BUG_ON(mm->swap_cgroup == NULL);
>>> +	if (scg == NULL) {
>>> +		/*
>>> +		 * see swap_cgroup_attach.
>>> +		 */
>>> +		smp_rmb();
>>> +		scg = mm->swap_cgroup;
>> With the mm->owner patches, we need not maintain a separate
>> mm->swap_cgroup.
> 
> i don't think the mm->owner patch, at least with the current form,
> can replace it.
> 

Could you please mention what the limitations are? We could get those fixed or
take another serious look at the mm->owner patches.

>>> +	/*
>>> +	 * swap_cgroup_attach is in progress.
>>> +	 */
>>> +
>>> +	res_counter_charge_force(&newscg->scg_counter, PAGE_CACHE_SIZE);
>> So, we force the counter to go over limit?
> 
> yes.
> 
> as newscg != oldscg here means the task is being moved between cgroups,
> this instance of res_counter_charge_force should not matter much.
> 

Isn't it bad to force a group to go over it's limit due to migration?

>>> +static int
>>> +swap_cgroup_write_u64(struct cgroup *cg, struct cftype *cft, u64 val)
>>> +{
>>> +	struct res_counter *counter = &cg_to_scg(cg)->scg_counter;
>>> +	unsigned long flags;
>>> +
>>> +	/* XXX res_counter_write_u64 */
>>> +	BUG_ON(cft->private != RES_LIMIT);
>>> +	spin_lock_irqsave(&counter->lock, flags);
>>> +	counter->limit = val;
>>> +	spin_unlock_irqrestore(&counter->lock, flags);
>>> +	return 0;
>>> +}
>>> +
>> We need to write actual numbers here? Can't we keep the write
>> interface consistent with the memory controller?
> 
> i'm not sure what you mean here.  can you explain a bit more?
> do you mean K, M, etc?
> 

Yes, I mean the same format that memparse() uses.

>>> +static void
>>> +swap_cgroup_destroy(struct cgroup_subsys *ss, struct cgroup *cg)
>>> +{
>>> +	struct swap_cgroup *oldscg = cg_to_scg(cg);
>>> +	struct swap_cgroup *newscg;
>>> +	struct list_head *pos;
>>> +	struct list_head *next;
>>> +
>>> +	/*
>>> +	 * move our anonymous objects to init_mm's group.
>>> +	 */
>> Is this good design, should be allow a swap_cgroup to be destroyed,
>> even though pages are allocated to it?
> 
> first of all, i'm not quite happy with this design. :)
> having said that, what else can we do?
> i tend to think that trying to swap-in these pages is too much effort
> for little benefit.
> 

Just fail the destroy operation, in this case.

>> Is moving to init_mm (root
>> cgroup) a good idea? Ideally with support for hierarchies, if we ever
>> do move things, it should be to the parent cgroup.
> 
> i chose init_mm because there seemed to be no consensus about
> cgroup hierarchy semantics.
> 

I would suggest that we fail deletion of a group for now. I have a set of
patches for the cgroup hierarchy semantics. I think the parent is the best place
to move it.

>>> +		info->swap_cgroup = newscg;
>>> +		res_counter_uncharge(&oldscg->scg_counter, bytes);
>>> +		res_counter_charge_force(&newscg->scg_counter, bytes);
>> I don't like the excessive use of res_counter_charge_force(), it seems
>> like we might end up bypassing the controller all together. I would
>> rather fail the destroy operation if the charge fails.
> 
>>> +	bytes = swslots * PAGE_CACHE_SIZE;
>>> +	res_counter_uncharge(&oldscg->scg_counter, bytes);
>>> +	/*
>>> +	 * XXX ignore newscg's limit because cgroup ->attach method can't fail.
>>> +	 */
>>> +	res_counter_charge_force(&newscg->scg_counter, bytes);
>> But in the future, we could plan on making attach fail (I have a
>> requirement for it). Again, I don't like the _force operation
> 
> allowing these operations fail implies to have code to back out
> partial operations.  i'm afraid that it will be too complex.
> 

OK, we need to find out a way to fix that then.

>>> +static void
>>> +swap_cgroup_attach_mm(struct mm_struct *mm, struct swap_cgroup *oldscg,
>>> +    struct swap_cgroup *newscg)
>> We need comments about the function, why do we attach an mm?
> 
> instead of a task, you mean?
> because we count the number of ptes which points to swap
> and ptes belong to an mm, not a task.
> 

OK

-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL

WARNING: multiple messages have this Message-ID (diff)

From: Balbir Singh <balbir@linux.vnet.ibm.com>
To: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
Cc: nishimura@mxp.nes.nec.co.jp, containers@lists.osdl.org,
	minoura@valinux.co.jp, hugh@veritas.com, linux-mm@kvack.org,
	kamezawa.hiroyu@jp.fujitsu.com
Subject: Re: [RFC][PATCH] another swap controller for cgroup
Date: Thu, 08 May 2008 21:13:50 +0530	[thread overview]
Message-ID: <48231FB6.7000206@linux.vnet.ibm.com> (raw)
In-Reply-To: <20080507055045.4ACBD5A0A@siro.lan>

YAMAMOTO Takashi wrote:
> hi,
> 
>> Hi, Thanks for the patches and your patience. I've just applied your
>> patches on top of 2.6.25-mm1 (it had a minor reject, that I've fixed).
>> I am building and testing the patches along with KAMEZAWA-San's low
>> overhead patches.
> 
> thanks.
> 
>>> +#include <linux/err.h>
>>> +#include <linux/cgroup.h>
>>> +#include <linux/hugetlb.h>
>> My powerpc build fails, we need to move hugetlb.h down to the bottom
> 
> what's the error message?
> 

It's unable to find the hugetlb call, I think is_hugetlb_vma() or so.

>>> +struct swap_cgroup {
>>> +	struct cgroup_subsys_state scg_css;
>> Can't we call this just css. Since the structure is swap_cgroup it
>> already has the required namespace required to distinguish it from
>> other css's. Please see page 4 of "The practice of programming", be
>> consistent. The same comment applies to all members below.
> 
> i don't have the book.
> i like this kind of prefixes as it's grep-friendly.
> 
>>> +#define	task_to_css(task) task_subsys_state((task), swap_cgroup_subsys_id)
>>> +#define	css_to_scg(css)	container_of((css), struct swap_cgroup, scg_css)
>>> +#define	cg_to_css(cg)	cgroup_subsys_state((cg), swap_cgroup_subsys_id)
>>> +#define	cg_to_scg(cg)	css_to_scg(cg_to_css(cg))
>> Aren't static inline better than macros? I would suggest moving to
>> them.
> 
> sounds like a matter of preference.
> either ok for me.
> 

There are other advantages, like better type checking of the arguments. The
compiler might even determine that it's better of making a function call instead
of inlining it (rare, but possible).

>>> +static struct swap_cgroup *
>>> +swap_cgroup_prepare_ptp(struct page *ptp, struct mm_struct *mm)
>>> +{
>>> +	struct swap_cgroup *scg = ptp->ptp_swap_cgroup;
>>> +
>> Is this routine safe w.r.t. concurrent operations, modifications to
>> ptp_swap_cgroup?
> 
> it's always accessed with the page table locked.
> 
>>> +	BUG_ON(mm == NULL);
>>> +	BUG_ON(mm->swap_cgroup == NULL);
>>> +	if (scg == NULL) {
>>> +		/*
>>> +		 * see swap_cgroup_attach.
>>> +		 */
>>> +		smp_rmb();
>>> +		scg = mm->swap_cgroup;
>> With the mm->owner patches, we need not maintain a separate
>> mm->swap_cgroup.
> 
> i don't think the mm->owner patch, at least with the current form,
> can replace it.
> 

Could you please mention what the limitations are? We could get those fixed or
take another serious look at the mm->owner patches.

>>> +	/*
>>> +	 * swap_cgroup_attach is in progress.
>>> +	 */
>>> +
>>> +	res_counter_charge_force(&newscg->scg_counter, PAGE_CACHE_SIZE);
>> So, we force the counter to go over limit?
> 
> yes.
> 
> as newscg != oldscg here means the task is being moved between cgroups,
> this instance of res_counter_charge_force should not matter much.
> 

Isn't it bad to force a group to go over it's limit due to migration?

>>> +static int
>>> +swap_cgroup_write_u64(struct cgroup *cg, struct cftype *cft, u64 val)
>>> +{
>>> +	struct res_counter *counter = &cg_to_scg(cg)->scg_counter;
>>> +	unsigned long flags;
>>> +
>>> +	/* XXX res_counter_write_u64 */
>>> +	BUG_ON(cft->private != RES_LIMIT);
>>> +	spin_lock_irqsave(&counter->lock, flags);
>>> +	counter->limit = val;
>>> +	spin_unlock_irqrestore(&counter->lock, flags);
>>> +	return 0;
>>> +}
>>> +
>> We need to write actual numbers here? Can't we keep the write
>> interface consistent with the memory controller?
> 
> i'm not sure what you mean here.  can you explain a bit more?
> do you mean K, M, etc?
> 

Yes, I mean the same format that memparse() uses.

>>> +static void
>>> +swap_cgroup_destroy(struct cgroup_subsys *ss, struct cgroup *cg)
>>> +{
>>> +	struct swap_cgroup *oldscg = cg_to_scg(cg);
>>> +	struct swap_cgroup *newscg;
>>> +	struct list_head *pos;
>>> +	struct list_head *next;
>>> +
>>> +	/*
>>> +	 * move our anonymous objects to init_mm's group.
>>> +	 */
>> Is this good design, should be allow a swap_cgroup to be destroyed,
>> even though pages are allocated to it?
> 
> first of all, i'm not quite happy with this design. :)
> having said that, what else can we do?
> i tend to think that trying to swap-in these pages is too much effort
> for little benefit.
> 

Just fail the destroy operation, in this case.

>> Is moving to init_mm (root
>> cgroup) a good idea? Ideally with support for hierarchies, if we ever
>> do move things, it should be to the parent cgroup.
> 
> i chose init_mm because there seemed to be no consensus about
> cgroup hierarchy semantics.
> 

I would suggest that we fail deletion of a group for now. I have a set of
patches for the cgroup hierarchy semantics. I think the parent is the best place
to move it.

>>> +		info->swap_cgroup = newscg;
>>> +		res_counter_uncharge(&oldscg->scg_counter, bytes);
>>> +		res_counter_charge_force(&newscg->scg_counter, bytes);
>> I don't like the excessive use of res_counter_charge_force(), it seems
>> like we might end up bypassing the controller all together. I would
>> rather fail the destroy operation if the charge fails.
> 
>>> +	bytes = swslots * PAGE_CACHE_SIZE;
>>> +	res_counter_uncharge(&oldscg->scg_counter, bytes);
>>> +	/*
>>> +	 * XXX ignore newscg's limit because cgroup ->attach method can't fail.
>>> +	 */
>>> +	res_counter_charge_force(&newscg->scg_counter, bytes);
>> But in the future, we could plan on making attach fail (I have a
>> requirement for it). Again, I don't like the _force operation
> 
> allowing these operations fail implies to have code to back out
> partial operations.  i'm afraid that it will be too complex.
> 

OK, we need to find out a way to fix that then.

>>> +static void
>>> +swap_cgroup_attach_mm(struct mm_struct *mm, struct swap_cgroup *oldscg,
>>> +    struct swap_cgroup *newscg)
>> We need comments about the function, why do we attach an mm?
> 
> instead of a task, you mean?
> because we count the number of ptes which points to swap
> and ptes belong to an mm, not a task.
> 

OK

-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

next prev parent reply	other threads:[~2008-05-08 15:43 UTC|newest]

Thread overview: 59+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-03-17  2:04 [RFC][PATCH] another swap controller for cgroup YAMAMOTO Takashi
2008-03-17  2:04 ` YAMAMOTO Takashi
     [not found] ` <20080317020407.8512E1E7995-Pcsii4f/SVk@public.gmane.org>
2008-03-17  5:11   ` Balbir Singh
2008-03-17  5:11     ` Balbir Singh
2008-03-17  8:15   ` Daisuke Nishimura
2008-03-17  8:15     ` Daisuke Nishimura
     [not found]     ` <47DE2894.6010306-YQH0OdQVrdy45+QrQBaojngSJqDPrsil@public.gmane.org>
2008-03-17  8:50       ` YAMAMOTO Takashi
2008-03-17  8:50         ` YAMAMOTO Takashi
     [not found]         ` <20080317085003.EA4511E7A77-Pcsii4f/SVk@public.gmane.org>
2008-04-29 22:50           ` YAMAMOTO Takashi
2008-04-29 22:50             ` YAMAMOTO Takashi
     [not found]             ` <20080429225047.EC4645A04-Pcsii4f/SVk@public.gmane.org>
2008-04-30  4:09               ` Daisuke Nishimura
2008-04-30  4:09                 ` Daisuke Nishimura
     [not found]                 ` <4817F108.40806-YQH0OdQVrdy45+QrQBaojngSJqDPrsil@public.gmane.org>
2008-05-22  4:46                   ` YAMAMOTO Takashi
2008-05-22  4:46                     ` YAMAMOTO Takashi
     [not found]                     ` <20080522044656.2FB695A0A-Pcsii4f/SVk@public.gmane.org>
2008-05-22  4:54                       ` Daisuke Nishimura
2008-05-22  4:54                         ` Daisuke Nishimura
2008-05-05  6:11               ` Balbir Singh
2008-05-05  6:11                 ` Balbir Singh
     [not found]                 ` <20080505061142.GA13834-SINUvgVNF2CyUtPGxGje5AC/G2K4zDHf@public.gmane.org>
2008-05-07  5:50                   ` YAMAMOTO Takashi
2008-05-07  5:50                     ` YAMAMOTO Takashi
     [not found]                     ` <20080507055045.4ACBD5A0A-Pcsii4f/SVk@public.gmane.org>
2008-05-08 15:43                       ` Balbir Singh [this message]
2008-05-08 15:43                         ` Balbir Singh
     [not found]                         ` <48231FB6.7000206-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
2008-05-14  3:21                           ` YAMAMOTO Takashi
2008-05-14  3:21                             ` YAMAMOTO Takashi
     [not found]                             ` <20080514032125.46F7D5A07-Pcsii4f/SVk@public.gmane.org>
2008-05-14  3:27                               ` Paul Menage
2008-05-14  3:27                                 ` Paul Menage
2008-05-14  8:44                               ` Paul Menage
2008-05-14  8:44                                 ` Paul Menage
     [not found]                                 ` <6599ad830805140144k583f7426k4024dd17a6cd3eb8-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2008-05-15  6:23                                   ` YAMAMOTO Takashi
2008-05-15  6:23                                     ` YAMAMOTO Takashi
     [not found]                                     ` <20080515062318.5F1BA5A07-Pcsii4f/SVk@public.gmane.org>
2008-05-15  7:19                                       ` Paul Menage
2008-05-15  7:19                                         ` Paul Menage
     [not found]                                         ` <6599ad830805150019v5ba23fe1xe5a6e8b80bc194f5-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2008-05-15  8:56                                           ` YAMAMOTO Takashi
2008-05-15  8:56                                             ` YAMAMOTO Takashi
     [not found]                                             ` <20080515085606.7239D5A07-Pcsii4f/SVk@public.gmane.org>
2008-05-15 12:01                                               ` Daisuke Nishimura
2008-05-15 12:01                                                 ` Daisuke Nishimura
     [not found]                                                 ` <482C2631.1030600-YQH0OdQVrdy45+QrQBaojngSJqDPrsil@public.gmane.org>
2008-05-19  4:14                                                   ` YAMAMOTO Takashi
2008-05-19  4:14                                                     ` YAMAMOTO Takashi
2008-03-24 12:10       ` Daisuke Nishimura
2008-03-24 12:10         ` Daisuke Nishimura
     [not found]         ` <47E79A26.3070401-YQH0OdQVrdy45+QrQBaojngSJqDPrsil@public.gmane.org>
2008-03-24 12:22           ` Balbir Singh
2008-03-24 12:22             ` Balbir Singh
     [not found]             ` <47E79CF0.6040308-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
2008-03-25  6:46               ` Daisuke Nishimura
2008-03-25  6:46                 ` Daisuke Nishimura
2008-03-25  3:10           ` YAMAMOTO Takashi
2008-03-25  3:10             ` YAMAMOTO Takashi
     [not found]             ` <20080325031039.549831E9292-Pcsii4f/SVk@public.gmane.org>
2008-03-25  4:35               ` Daisuke Nishimura
2008-03-25  4:35                 ` Daisuke Nishimura
     [not found]                 ` <47E88129.1010705-YQH0OdQVrdy45+QrQBaojngSJqDPrsil@public.gmane.org>
2008-03-25  8:57                   ` YAMAMOTO Takashi
2008-03-25  8:57                     ` YAMAMOTO Takashi
     [not found]                     ` <20080325085723.698C11E936D-Pcsii4f/SVk@public.gmane.org>
2008-03-25 12:35                       ` Daisuke Nishimura
2008-03-25 12:35                         ` Daisuke Nishimura
     [not found]                         ` <47E8F1A0.5080209-YQH0OdQVrdy45+QrQBaojngSJqDPrsil@public.gmane.org>
2008-03-27  6:28                           ` YAMAMOTO Takashi
2008-03-27  6:28                             ` YAMAMOTO Takashi
     [not found]                             ` <20080327062834.DAB7E5A02-Pcsii4f/SVk@public.gmane.org>
2008-03-28  9:00                               ` Daisuke Nishimura
2008-03-28  9:00                                 ` Daisuke Nishimura
     [not found]                                 ` <47ECB3B1.6040500-YQH0OdQVrdy45+QrQBaojngSJqDPrsil@public.gmane.org>
2008-04-08  3:29                                   ` YAMAMOTO Takashi
2008-04-10  7:40                               ` YAMAMOTO Takashi
2008-04-10  7:40                                 ` YAMAMOTO Takashi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=48231FB6.7000206@linux.vnet.ibm.com \
    --to=balbir-23vcf4htsmix0ybbhkvfkdbpr1lh4cv8@public.gmane.org \
    --cc=containers-qjLDD68F18O7TbgM5vRIOg@public.gmane.org \
    --cc=hugh-DTz5qymZ9yRBDgjK7y7TUQ@public.gmane.org \
    --cc=linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org \
    --cc=minoura-jCdQPDEk3idL9jVzuh4AOg@public.gmane.org \
    --cc=nishimura-YQH0OdQVrdy45+QrQBaojngSJqDPrsil@public.gmane.org \
    --cc=yamamoto-jCdQPDEk3idL9jVzuh4AOg@public.gmane.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.