All of lore.kernel.org
 help / color / mirror / Atom feed
From: Vladimir Davydov <vdavydov@virtuozzo.com>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Michal Hocko <mhocko@kernel.org>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/7] mm: memcontrol: charge swap to cgroup2
Date: Thu, 10 Dec 2015 20:00:20 +0300	[thread overview]
Message-ID: <20151210170020.GA5171@esperanza> (raw)
In-Reply-To: <20151210160027.GA3308@cmpxchg.org>

On Thu, Dec 10, 2015 at 11:00:27AM -0500, Johannes Weiner wrote:
> On Thu, Dec 10, 2015 at 02:39:14PM +0300, Vladimir Davydov wrote:
...
> > diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> > index c6a5ed2f2744..993c9a26b637 100644
> > --- a/include/linux/memcontrol.h
> > +++ b/include/linux/memcontrol.h
> > @@ -169,6 +169,7 @@ struct mem_cgroup {
> >  
> >  	/* Accounted resources */
> >  	struct page_counter memory;
> > +	struct page_counter swap;
> >  	struct page_counter memsw;
> >  	struct page_counter kmem;
> 
> We should probably separate this to differentiate the new counters
> from the old ones. Only memory and swap are actual resources, the
> memsw and kmem counters are counting consumer-oriented.

Yeah, but we'd better do it in a separate patch.

> 
> > diff --git a/include/linux/swap.h b/include/linux/swap.h
> > index 457181844b6e..f4b3ccdcba91 100644
> > --- a/include/linux/swap.h
> > +++ b/include/linux/swap.h
> > @@ -368,11 +368,16 @@ static inline int mem_cgroup_swappiness(struct mem_cgroup *mem)
> >  #endif
> >  #ifdef CONFIG_MEMCG_SWAP
> >  extern void mem_cgroup_swapout(struct page *page, swp_entry_t entry);
> > +extern int mem_cgroup_charge_swap(struct page *page, swp_entry_t entry);
> 
> Should this be mem_cgroup_try_swap() to keep in line with the page
> counter terminology? So it's clear this is not forcing a charge.

Hmm, I thought we only use try_charge name in memcontrol.c if there is
commit stage, e.g. we have memcg_kmem_charge, not memcg_kmem_try_charge.
This conflicts with page_counter semantics though, so we might want to
rename it.

> 
> > @@ -1248,12 +1248,15 @@ static unsigned long mem_cgroup_get_limit(struct mem_cgroup *memcg)
> >  {
> >  	unsigned long limit;
> >  
> > -	limit = memcg->memory.limit;
> > +	limit = READ_ONCE(memcg->memory.limit);
> >  	if (mem_cgroup_swappiness(memcg)) {
> >  		unsigned long memsw_limit;
> > +		unsigned long swap_limit;
> >  
> > -		memsw_limit = memcg->memsw.limit;
> > -		limit = min(limit + total_swap_pages, memsw_limit);
> > +		memsw_limit = READ_ONCE(memcg->memsw.limit);
> > +		swap_limit = min(READ_ONCE(memcg->swap.limit),
> > +				 (unsigned long)total_swap_pages);
> > +		limit = min(limit + swap_limit, memsw_limit);
> >  	}
> >  	return limit;
> 
> This is taking a racy snapshot, so we don't rely on 100% accuracy. Can
> we do without the READ_ONCE()?

Well, I suppose we can, but passing a volatile value to min macro looks
a bit scary to me. What if swap_limit is changed from a finite value
less than total_swap_pages to inf while we are there? We might use an
infinite memory size in OOM which would screw up OOM scores AFAIU.
Unlikely, but still.

> 
> > @@ -5754,26 +5760,66 @@ void mem_cgroup_swapout(struct page *page, swp_entry_t entry)
> >  	memcg_check_events(memcg, page);
> >  }
> >  
> > +/*
> > + * mem_cgroup_charge_swap - charge a swap entry
> > + * @page: page being added to swap
> > + * @entry: swap entry to charge
> > + *
> > + * Try to charge @entry to the memcg that @page belongs to.
> > + *
> > + * Returns 0 on success, -ENOMEM on failure.
> > + */
> > +int mem_cgroup_charge_swap(struct page *page, swp_entry_t entry)
> > +{
> > +	struct mem_cgroup *memcg;
> > +	struct page_counter *counter;
> > +	unsigned short oldid;
> > +
> > +	if (!cgroup_subsys_on_dfl(memory_cgrp_subsys) || !do_swap_account)
> > +		return 0;
> > +
> > +	memcg = page->mem_cgroup;
> > +
> > +	/* Readahead page, never charged */
> > +	if (!memcg)
> > +		return 0;
> > +
> > +	if (!mem_cgroup_is_root(memcg) &&
> > +	    !page_counter_try_charge(&memcg->swap, 1, &counter))
> > +		return -ENOMEM;
> > +
> > +	oldid = swap_cgroup_record(entry, mem_cgroup_id(memcg));
> > +	VM_BUG_ON_PAGE(oldid, page);
> > +	mem_cgroup_swap_statistics(memcg, true);
> > +
> > +	css_get(&memcg->css);
> 
> I think we don't have to duplicate the swap record code. Both cgroup1
> and cgroup2 could run this function to handle the swapout record and
> statistics, and then mem_cgroup_swapout() would simply uncharge memsw.

Well, may be. I'm afraid this might make mem_cgroup_charge_swap look a
bit messy due to necessity to check cgroup_subsys_on_dfl in the middle
of it, but I'll give it a try.

> 
> > @@ -5828,6 +5931,8 @@ static int __init mem_cgroup_swap_init(void)
> >  {
> >  	if (!mem_cgroup_disabled() && really_do_swap_account) {
> >  		do_swap_account = 1;
> > +		WARN_ON(cgroup_add_dfl_cftypes(&memory_cgrp_subsys,
> > +					       swap_files));
> >  		WARN_ON(cgroup_add_legacy_cftypes(&memory_cgrp_subsys,
> >  						  memsw_cgroup_files));
> 
> I guess we could also support cgroup.memory=noswap.
> 

Yeah, that would be cleaner. I wonder if we could drop swapaccount boot
param then so as not to clutter the API.

Thanks for the review!

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

WARNING: multiple messages have this Message-ID (diff)
From: Vladimir Davydov <vdavydov@virtuozzo.com>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Michal Hocko <mhocko@kernel.org>, <linux-mm@kvack.org>,
	<linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 1/7] mm: memcontrol: charge swap to cgroup2
Date: Thu, 10 Dec 2015 20:00:20 +0300	[thread overview]
Message-ID: <20151210170020.GA5171@esperanza> (raw)
In-Reply-To: <20151210160027.GA3308@cmpxchg.org>

On Thu, Dec 10, 2015 at 11:00:27AM -0500, Johannes Weiner wrote:
> On Thu, Dec 10, 2015 at 02:39:14PM +0300, Vladimir Davydov wrote:
...
> > diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> > index c6a5ed2f2744..993c9a26b637 100644
> > --- a/include/linux/memcontrol.h
> > +++ b/include/linux/memcontrol.h
> > @@ -169,6 +169,7 @@ struct mem_cgroup {
> >  
> >  	/* Accounted resources */
> >  	struct page_counter memory;
> > +	struct page_counter swap;
> >  	struct page_counter memsw;
> >  	struct page_counter kmem;
> 
> We should probably separate this to differentiate the new counters
> from the old ones. Only memory and swap are actual resources, the
> memsw and kmem counters are counting consumer-oriented.

Yeah, but we'd better do it in a separate patch.

> 
> > diff --git a/include/linux/swap.h b/include/linux/swap.h
> > index 457181844b6e..f4b3ccdcba91 100644
> > --- a/include/linux/swap.h
> > +++ b/include/linux/swap.h
> > @@ -368,11 +368,16 @@ static inline int mem_cgroup_swappiness(struct mem_cgroup *mem)
> >  #endif
> >  #ifdef CONFIG_MEMCG_SWAP
> >  extern void mem_cgroup_swapout(struct page *page, swp_entry_t entry);
> > +extern int mem_cgroup_charge_swap(struct page *page, swp_entry_t entry);
> 
> Should this be mem_cgroup_try_swap() to keep in line with the page
> counter terminology? So it's clear this is not forcing a charge.

Hmm, I thought we only use try_charge name in memcontrol.c if there is
commit stage, e.g. we have memcg_kmem_charge, not memcg_kmem_try_charge.
This conflicts with page_counter semantics though, so we might want to
rename it.

> 
> > @@ -1248,12 +1248,15 @@ static unsigned long mem_cgroup_get_limit(struct mem_cgroup *memcg)
> >  {
> >  	unsigned long limit;
> >  
> > -	limit = memcg->memory.limit;
> > +	limit = READ_ONCE(memcg->memory.limit);
> >  	if (mem_cgroup_swappiness(memcg)) {
> >  		unsigned long memsw_limit;
> > +		unsigned long swap_limit;
> >  
> > -		memsw_limit = memcg->memsw.limit;
> > -		limit = min(limit + total_swap_pages, memsw_limit);
> > +		memsw_limit = READ_ONCE(memcg->memsw.limit);
> > +		swap_limit = min(READ_ONCE(memcg->swap.limit),
> > +				 (unsigned long)total_swap_pages);
> > +		limit = min(limit + swap_limit, memsw_limit);
> >  	}
> >  	return limit;
> 
> This is taking a racy snapshot, so we don't rely on 100% accuracy. Can
> we do without the READ_ONCE()?

Well, I suppose we can, but passing a volatile value to min macro looks
a bit scary to me. What if swap_limit is changed from a finite value
less than total_swap_pages to inf while we are there? We might use an
infinite memory size in OOM which would screw up OOM scores AFAIU.
Unlikely, but still.

> 
> > @@ -5754,26 +5760,66 @@ void mem_cgroup_swapout(struct page *page, swp_entry_t entry)
> >  	memcg_check_events(memcg, page);
> >  }
> >  
> > +/*
> > + * mem_cgroup_charge_swap - charge a swap entry
> > + * @page: page being added to swap
> > + * @entry: swap entry to charge
> > + *
> > + * Try to charge @entry to the memcg that @page belongs to.
> > + *
> > + * Returns 0 on success, -ENOMEM on failure.
> > + */
> > +int mem_cgroup_charge_swap(struct page *page, swp_entry_t entry)
> > +{
> > +	struct mem_cgroup *memcg;
> > +	struct page_counter *counter;
> > +	unsigned short oldid;
> > +
> > +	if (!cgroup_subsys_on_dfl(memory_cgrp_subsys) || !do_swap_account)
> > +		return 0;
> > +
> > +	memcg = page->mem_cgroup;
> > +
> > +	/* Readahead page, never charged */
> > +	if (!memcg)
> > +		return 0;
> > +
> > +	if (!mem_cgroup_is_root(memcg) &&
> > +	    !page_counter_try_charge(&memcg->swap, 1, &counter))
> > +		return -ENOMEM;
> > +
> > +	oldid = swap_cgroup_record(entry, mem_cgroup_id(memcg));
> > +	VM_BUG_ON_PAGE(oldid, page);
> > +	mem_cgroup_swap_statistics(memcg, true);
> > +
> > +	css_get(&memcg->css);
> 
> I think we don't have to duplicate the swap record code. Both cgroup1
> and cgroup2 could run this function to handle the swapout record and
> statistics, and then mem_cgroup_swapout() would simply uncharge memsw.

Well, may be. I'm afraid this might make mem_cgroup_charge_swap look a
bit messy due to necessity to check cgroup_subsys_on_dfl in the middle
of it, but I'll give it a try.

> 
> > @@ -5828,6 +5931,8 @@ static int __init mem_cgroup_swap_init(void)
> >  {
> >  	if (!mem_cgroup_disabled() && really_do_swap_account) {
> >  		do_swap_account = 1;
> > +		WARN_ON(cgroup_add_dfl_cftypes(&memory_cgrp_subsys,
> > +					       swap_files));
> >  		WARN_ON(cgroup_add_legacy_cftypes(&memory_cgrp_subsys,
> >  						  memsw_cgroup_files));
> 
> I guess we could also support cgroup.memory=noswap.
> 

Yeah, that would be cleaner. I wonder if we could drop swapaccount boot
param then so as not to clutter the API.

Thanks for the review!

  reply	other threads:[~2015-12-10 17:00 UTC|newest]

Thread overview: 78+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-12-10 11:39 [PATCH 0/7] Add swap accounting to cgroup2 Vladimir Davydov
2015-12-10 11:39 ` Vladimir Davydov
2015-12-10 11:39 ` [PATCH 1/7] mm: memcontrol: charge swap " Vladimir Davydov
2015-12-10 11:39   ` Vladimir Davydov
2015-12-10 16:00   ` Johannes Weiner
2015-12-10 16:00     ` Johannes Weiner
2015-12-10 17:00     ` Vladimir Davydov [this message]
2015-12-10 17:00       ` Vladimir Davydov
2015-12-11  2:48   ` Kamezawa Hiroyuki
2015-12-11  2:48     ` Kamezawa Hiroyuki
2015-12-11  7:39     ` Vladimir Davydov
2015-12-11  7:39       ` Vladimir Davydov
2015-12-14 15:30   ` Michal Hocko
2015-12-14 15:30     ` Michal Hocko
2015-12-14 15:48     ` Johannes Weiner
2015-12-14 15:48       ` Johannes Weiner
2015-12-14 19:42     ` Vladimir Davydov
2015-12-14 19:42       ` Vladimir Davydov
2015-12-14 19:52       ` One Thousand Gnomes
2015-12-14 19:52         ` One Thousand Gnomes
2015-12-15  3:22       ` Kamezawa Hiroyuki
2015-12-15  3:22         ` Kamezawa Hiroyuki
2015-12-15 11:02         ` Vladimir Davydov
2015-12-15 11:02           ` Vladimir Davydov
2015-12-16  2:44           ` Kamezawa Hiroyuki
2015-12-16  2:44             ` Kamezawa Hiroyuki
2015-12-15 14:50         ` Johannes Weiner
2015-12-15 14:50           ` Johannes Weiner
2015-12-16  3:18           ` Kamezawa Hiroyuki
2015-12-16  3:18             ` Kamezawa Hiroyuki
2015-12-16 11:09             ` Johannes Weiner
2015-12-16 11:09               ` Johannes Weiner
2015-12-17  2:46               ` Kamezawa Hiroyuki
2015-12-17  2:46                 ` Kamezawa Hiroyuki
2015-12-17  3:32                 ` Johannes Weiner
2015-12-17  3:32                   ` Johannes Weiner
2015-12-17  4:29                   ` Kamezawa Hiroyuki
2015-12-17  4:29                     ` Kamezawa Hiroyuki
2015-12-15 17:21       ` Michal Hocko
2015-12-15 17:21         ` Michal Hocko
2015-12-15 20:22         ` Johannes Weiner
2015-12-15 20:22           ` Johannes Weiner
2015-12-16  3:57         ` Kamezawa Hiroyuki
2015-12-16  3:57           ` Kamezawa Hiroyuki
2015-12-15  3:12     ` Kamezawa Hiroyuki
2015-12-15  3:12       ` Kamezawa Hiroyuki
2015-12-15  8:30       ` Vladimir Davydov
2015-12-15  8:30         ` Vladimir Davydov
2015-12-15  9:29         ` Kamezawa Hiroyuki
2015-12-15  9:29           ` Kamezawa Hiroyuki
2015-12-10 11:39 ` [PATCH 2/7] mm: vmscan: pass memcg to get_scan_count() Vladimir Davydov
2015-12-10 11:39   ` Vladimir Davydov
2015-12-11 19:24   ` Johannes Weiner
2015-12-11 19:24     ` Johannes Weiner
2015-12-10 11:39 ` [PATCH 3/7] mm: memcontrol: replace mem_cgroup_lruvec_online with mem_cgroup_online Vladimir Davydov
2015-12-10 11:39   ` Vladimir Davydov
2015-12-11 19:25   ` Johannes Weiner
2015-12-11 19:25     ` Johannes Weiner
2015-12-10 11:39 ` [PATCH 4/7] swap.h: move memcg related stuff to the end of the file Vladimir Davydov
2015-12-10 11:39   ` Vladimir Davydov
2015-12-11 19:25   ` Johannes Weiner
2015-12-11 19:25     ` Johannes Weiner
2015-12-10 11:39 ` [PATCH 5/7] mm: vmscan: do not scan anon pages if memcg swap limit is hit Vladimir Davydov
2015-12-10 11:39   ` Vladimir Davydov
2015-12-11 19:27   ` Johannes Weiner
2015-12-11 19:27     ` Johannes Weiner
2015-12-10 11:39 ` [PATCH 6/7] mm: free swap cache aggressively if memcg swap is full Vladimir Davydov
2015-12-10 11:39   ` Vladimir Davydov
2015-12-11 19:33   ` Johannes Weiner
2015-12-11 19:33     ` Johannes Weiner
2015-12-12 16:18     ` Vladimir Davydov
2015-12-12 16:18       ` Vladimir Davydov
2015-12-10 11:39 ` [PATCH 7/7] Documentation: cgroup: add memory.swap.{current,max} description Vladimir Davydov
2015-12-10 11:39   ` Vladimir Davydov
2015-12-11 19:42   ` Johannes Weiner
2015-12-11 19:42     ` Johannes Weiner
2015-12-12 16:19     ` Vladimir Davydov
2015-12-12 16:19       ` Vladimir Davydov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20151210170020.GA5171@esperanza \
    --to=vdavydov@virtuozzo.com \
    --cc=akpm@linux-foundation.org \
    --cc=hannes@cmpxchg.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.