All of lore.kernel.org
 help / color / mirror / Atom feed
From: Johannes Weiner <hannes@cmpxchg.org>
To: Roman Gushchin <guro@fb.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Shakeel Butt <shakeelb@google.com>,
	Michal Hocko <mhocko@kernel.org>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	kernel-team@fb.com
Subject: Re: [PATCH v1 1/4] mm: memcontrol: use helpers to access page's memcg data
Date: Thu, 24 Sep 2020 15:45:08 -0400	[thread overview]
Message-ID: <20200924194508.GA329853@cmpxchg.org> (raw)
In-Reply-To: <20200922203700.2879671-2-guro@fb.com>

On Tue, Sep 22, 2020 at 01:36:57PM -0700, Roman Gushchin wrote:
> Currently there are many open-coded reads and writes of the
> page->mem_cgroup pointer, as well as a couple of read helpers,
> which are barely used.
> 
> It creates an obstacle on a way to reuse some bits of the pointer
> for storing additional bits of information. In fact, we already do
> this for slab pages, where the last bit indicates that a pointer has
> an attached vector of objcg pointers instead of a regular memcg
> pointer.
> 
> This commits introduces 4 new helper functions and converts all
> raw accesses to page->mem_cgroup to calls of these helpers:
>   struct mem_cgroup *page_mem_cgroup(struct page *page);
>   struct mem_cgroup *page_mem_cgroup_check(struct page *page);
>   void set_page_mem_cgroup(struct page *page, struct mem_cgroup *memcg);
>   void clear_page_mem_cgroup(struct page *page);

Sounds reasonable to me!

> page_mem_cgroup_check() is intended to be used in cases when the page
> can be a slab page and have a memcg pointer pointing at objcg vector.
> It does check the lowest bit, and if set, returns NULL.
> page_mem_cgroup() contains a VM_BUG_ON_PAGE() check for the page not
> being a slab page. So do set_page_mem_cgroup() and clear_page_mem_cgroup().
> 
> To make sure nobody uses a direct access, struct page's
> mem_cgroup/obj_cgroups is converted to unsigned long memcg_data.
> Only new helpers and a couple of slab-accounting related functions
> access this field directly.
> 
> page_memcg() and page_memcg_rcu() helpers defined in mm.h are removed.
> New page_mem_cgroup() is a direct analog of page_memcg(), while
> page_memcg_rcu() has a single call site in a small rcu-read-lock
> section, so it's just not worth it to have a separate helper. So
> it's replaced with page_mem_cgroup() too.

page_memcg_rcu() does READ_ONCE(). We need to keep that for lockless
accesses.

> @@ -343,6 +343,72 @@ struct mem_cgroup {
>  
>  extern struct mem_cgroup *root_mem_cgroup;
>  
> +/*
> + * page_mem_cgroup - get the memory cgroup associated with a page
> + * @page: a pointer to the page struct
> + *
> + * Returns a pointer to the memory cgroup associated with the page,
> + * or NULL. This function assumes that the page is known to have a
> + * proper memory cgroup pointer. It's not safe to call this function
> + * against some type of pages, e.g. slab pages or ex-slab pages.
> + */
> +static inline struct mem_cgroup *page_mem_cgroup(struct page *page)
> +{
> +	VM_BUG_ON_PAGE(PageSlab(page), page);
> +	return (struct mem_cgroup *)page->memcg_data;
> +}

This would also be a good place to mention what's required for the
function to be called safely, or in a way that produces a stable
result - i.e. the list of conditions in commit_charge().

> + * page_mem_cgroup_check - get the memory cgroup associated with a page
> + * @page: a pointer to the page struct
> + *
> + * Returns a pointer to the memory cgroup associated with the page,
> + * or NULL. This function unlike page_mem_cgroup() can take any  page
> + * as an argument. It has to be used in cases when it's not known if a page
> + * has an associated memory cgroup pointer or an object cgroups vector.
> + */
> +static inline struct mem_cgroup *page_mem_cgroup_check(struct page *page)
> +{
> +	unsigned long memcg_data = page->memcg_data;
> +
> +	/*
> +	 * The lowest bit set means that memcg isn't a valid
> +	 * memcg pointer, but a obj_cgroups pointer.
> +	 * In this case the page is shared and doesn't belong
> +	 * to any specific memory cgroup.
> +	 */
> +	if (memcg_data & 0x1UL)
> +		return NULL;
> +
> +	return (struct mem_cgroup *)memcg_data;
> +}

Here as well.

> +
> +/*
> + * set_page_mem_cgroup - associate a page with a memory cgroup
> + * @page: a pointer to the page struct
> + * @memcg: a pointer to the memory cgroup
> + *
> + * Associates a page with a memory cgroup.
> + */
> +static inline void set_page_mem_cgroup(struct page *page,
> +				       struct mem_cgroup *memcg)
> +{
> +	VM_BUG_ON_PAGE(PageSlab(page), page);
> +	page->memcg_data = (unsigned long)memcg;
> +}
> +
> +/*
> + * clear_page_mem_cgroup - clear an association of a page with a memory cgroup
> + * @page: a pointer to the page struct
> + *
> + * Clears an association of a page with a memory cgroup.
> + */
> +static inline void clear_page_mem_cgroup(struct page *page)
> +{
> +	VM_BUG_ON_PAGE(PageSlab(page), page);
> +	page->memcg_data = 0;
> +}
> +
>  static __always_inline bool memcg_stat_item_in_bytes(int idx)
>  {
>  	if (idx == MEMCG_PERCPU_B)
> @@ -743,15 +809,15 @@ static inline void mod_memcg_state(struct mem_cgroup *memcg,
>  static inline void __mod_memcg_page_state(struct page *page,
>  					  int idx, int val)
>  {
> -	if (page->mem_cgroup)
> -		__mod_memcg_state(page->mem_cgroup, idx, val);
> +	if (page_mem_cgroup(page))
> +		__mod_memcg_state(page_mem_cgroup(page), idx, val);
>  }
>  
>  static inline void mod_memcg_page_state(struct page *page,
>  					int idx, int val)
>  {
> -	if (page->mem_cgroup)
> -		mod_memcg_state(page->mem_cgroup, idx, val);
> +	if (page_mem_cgroup(page))
> +		mod_memcg_state(page_mem_cgroup(page), idx, val);
>  }
>  
>  static inline unsigned long lruvec_page_state(struct lruvec *lruvec,
> @@ -838,12 +904,12 @@ static inline void __mod_lruvec_page_state(struct page *page,
>  	struct lruvec *lruvec;
>  
>  	/* Untracked pages have no memcg, no lruvec. Update only the node */
> -	if (!head->mem_cgroup) {
> +	if (!page_mem_cgroup(head)) {
>  		__mod_node_page_state(pgdat, idx, val);
>  		return;
>  	}
>  
> -	lruvec = mem_cgroup_lruvec(head->mem_cgroup, pgdat);
> +	lruvec = mem_cgroup_lruvec(page_mem_cgroup(head), pgdat);
>  	__mod_lruvec_state(lruvec, idx, val);

The repetition of the function call is a bit jarring, especially in
configs with VM_BUG_ON() enabled (some distros use it for their beta
release kernels, so it's not just kernel developer test machines that
pay this cost). Can you please use a local variable when the function
needs the memcg more than once?

> @@ -878,8 +944,8 @@ static inline void count_memcg_events(struct mem_cgroup *memcg,
>  static inline void count_memcg_page_event(struct page *page,
>  					  enum vm_event_item idx)
>  {
> -	if (page->mem_cgroup)
> -		count_memcg_events(page->mem_cgroup, idx, 1);
> +	if (page_mem_cgroup(page))
> +		count_memcg_events(page_mem_cgroup(page), idx, 1);
>  }
>  
>  static inline void count_memcg_event_mm(struct mm_struct *mm,
> @@ -941,6 +1007,25 @@ void mem_cgroup_split_huge_fixup(struct page *head);
>  
>  struct mem_cgroup;
>  
> +static inline struct mem_cgroup *page_mem_cgroup(struct page *page)
> +{
> +	return NULL;
> +}
> +
> +static inline struct mem_cgroup *page_mem_cgroup_check(struct page *page)
> +{
> +	return NULL;
> +}
> +
> +static inline void set_page_mem_cgroup(struct page *page,
> +				       struct mem_cgroup *memcg)
> +{
> +}
> +
> +static inline void clear_page_mem_cgroup(struct page *page)
> +{
> +}
> +
>  static inline bool mem_cgroup_is_root(struct mem_cgroup *memcg)
>  {
>  	return true;
> @@ -1430,7 +1515,7 @@ static inline void mem_cgroup_track_foreign_dirty(struct page *page,
>  	if (mem_cgroup_disabled())
>  		return;
>  
> -	if (unlikely(&page->mem_cgroup->css != wb->memcg_css))
> +	if (unlikely(&page_mem_cgroup(page)->css != wb->memcg_css))
>  		mem_cgroup_track_foreign_dirty_slowpath(page, wb);
>  }
>  
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 17e712207d74..5e24ff2ffec9 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1476,28 +1476,6 @@ static inline void set_page_links(struct page *page, enum zone_type zone,
>  #endif
>  }
>  
> -#ifdef CONFIG_MEMCG
> -static inline struct mem_cgroup *page_memcg(struct page *page)
> -{
> -	return page->mem_cgroup;
> -}
> -static inline struct mem_cgroup *page_memcg_rcu(struct page *page)
> -{
> -	WARN_ON_ONCE(!rcu_read_lock_held());
> -	return READ_ONCE(page->mem_cgroup);
> -}
> -#else
> -static inline struct mem_cgroup *page_memcg(struct page *page)
> -{
> -	return NULL;
> -}
> -static inline struct mem_cgroup *page_memcg_rcu(struct page *page)
> -{
> -	WARN_ON_ONCE(!rcu_read_lock_held());
> -	return NULL;
> -}
> -#endif

You essentially renamed these existing helpers, but I don't think
that's justified. Especially with the proliferation of callsites, the
original names are nicer. I'd prefer we keep them.

> @@ -560,16 +560,7 @@ ino_t page_cgroup_ino(struct page *page)
>  	unsigned long ino = 0;
>  
>  	rcu_read_lock();
> -	memcg = page->mem_cgroup;
> -
> -	/*
> -	 * The lowest bit set means that memcg isn't a valid
> -	 * memcg pointer, but a obj_cgroups pointer.
> -	 * In this case the page is shared and doesn't belong
> -	 * to any specific memory cgroup.
> -	 */
> -	if ((unsigned long) memcg & 0x1UL)
> -		memcg = NULL;
> +	memcg = page_mem_cgroup_check(page);

This should actually have been using READ_ONCE() all along. Otherwise
the compiler can issue multiple loads to page->mem_cgroup here and you
can end up with a pointer with the lowest bit set leaking out.

> @@ -2928,17 +2918,6 @@ struct mem_cgroup *mem_cgroup_from_obj(void *p)
>  
>  	page = virt_to_head_page(p);
>  
> -	/*
> -	 * If page->mem_cgroup is set, it's either a simple mem_cgroup pointer
> -	 * or a pointer to obj_cgroup vector. In the latter case the lowest
> -	 * bit of the pointer is set.
> -	 * The page->mem_cgroup pointer can be asynchronously changed
> -	 * from NULL to (obj_cgroup_vec | 0x1UL), but can't be changed
> -	 * from a valid memcg pointer to objcg vector or back.
> -	 */
> -	if (!page->mem_cgroup)
> -		return NULL;
> -
>  	/*
>  	 * Slab objects are accounted individually, not per-page.
>  	 * Memcg membership data for each individual object is saved in
> @@ -2956,8 +2935,14 @@ struct mem_cgroup *mem_cgroup_from_obj(void *p)
>  		return NULL;
>  	}
>  
> -	/* All other pages use page->mem_cgroup */
> -	return page->mem_cgroup;
> +	/*
> +	 * page_mem_cgroup_check() is used here, because page_has_obj_cgroups()
> +	 * check above could fail because the object cgroups vector wasn't set
> +	 * at that moment, but it can be set concurrently.
> +	 * page_mem_cgroup_check(page) will guarantee tat a proper memory
> +	 * cgroup pointer or NULL will be returned.
> +	 */
> +	return page_mem_cgroup_check(page);

The code right now doesn't look quite safe. As per above, without the
READ_ONCE the compiler might issue multiple loads and we may get a
pointer with the low bit set.

Maybe slightly off-topic, but what are "all other pages" in general?
I don't see any callsites that ask for ownership on objects whose
backing pages may belong to a single memcg. That wouldn't seem to make
too much sense. Unless I'm missing something, this function should
probably tighten up its scope a bit and only work on stuff that is
actually following the obj_cgroup protocol.

I.e. either do the obj_cgroup lookup, or return root_mem_cgroup like
the other mem_cgroup_from_* functions.


  reply	other threads:[~2020-09-24 19:46 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-09-22 20:36 [PATCH v1 0/4] mm: allow mapping accounted kernel pages to userspace Roman Gushchin
2020-09-22 20:36 ` [PATCH v1 1/4] mm: memcontrol: use helpers to access page's memcg data Roman Gushchin
2020-09-24 19:45   ` Johannes Weiner [this message]
2020-09-24 20:27     ` Roman Gushchin
2020-09-25 17:39       ` Johannes Weiner
2020-09-22 20:36 ` [PATCH v1 2/4] mm: memcontrol/slab: use helpers to access slab page's memcg_data Roman Gushchin
2020-09-24 19:53   ` Johannes Weiner
2020-09-24 20:29     ` Roman Gushchin
2020-09-22 20:36 ` [PATCH v1 3/4] mm: introduce page memcg flags Roman Gushchin
2020-09-24  7:03   ` Shakeel Butt
2020-09-24 17:05     ` Roman Gushchin
2020-09-24 20:01   ` Johannes Weiner
2020-09-24 20:39     ` Roman Gushchin
2020-09-25 18:07       ` Johannes Weiner
2020-09-24 20:05   ` Johannes Weiner
2020-09-22 20:37 ` [PATCH v1 4/4] mm: convert page kmemcg type to a page memcg flag Roman Gushchin
2020-09-24  7:06   ` Shakeel Butt
2020-09-24 20:14   ` Johannes Weiner
2020-09-24 20:42     ` Roman Gushchin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200924194508.GA329853@cmpxchg.org \
    --to=hannes@cmpxchg.org \
    --cc=akpm@linux-foundation.org \
    --cc=guro@fb.com \
    --cc=kernel-team@fb.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=shakeelb@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.