linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Balbir Singh <bsingharora@gmail.com>
To: "Jérôme Glisse" <jglisse@redhat.com>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	John Hubbard <jhubbard@nvidia.com>,
	David Nellans <dnellans@nvidia.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Michal Hocko <mhocko@kernel.org>,
	Vladimir Davydov <vdavydov.dev@gmail.com>,
	cgroups@vger.kernel.org
Subject: Re: [HMM-CDM 4/5] mm/memcontrol: support MEMORY_DEVICE_PRIVATE and MEMORY_DEVICE_PUBLIC
Date: Thu, 15 Jun 2017 11:41:59 +1000	[thread overview]
Message-ID: <20170615114159.11a1eece@firefly.ozlabs.ibm.com> (raw)
In-Reply-To: <20170614201144.9306-5-jglisse@redhat.com>

On Wed, 14 Jun 2017 16:11:43 -0400
Jérôme Glisse <jglisse@redhat.com> wrote:

> HMM pages (private or public device pages) are ZONE_DEVICE page and
> thus need special handling when it comes to lru or refcount. This
> patch make sure that memcontrol properly handle those when it face
> them. Those pages are use like regular pages in a process address
> space either as anonymous page or as file back page. So from memcg
> point of view we want to handle them like regular page for now at
> least.
> 
> Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
> Cc: cgroups@vger.kernel.org
> ---
>  kernel/memremap.c |  2 ++
>  mm/memcontrol.c   | 58 ++++++++++++++++++++++++++++++++++++++++++++++++++-----
>  2 files changed, 55 insertions(+), 5 deletions(-)
> 
> diff --git a/kernel/memremap.c b/kernel/memremap.c
> index da74775..584984c 100644
> --- a/kernel/memremap.c
> +++ b/kernel/memremap.c
> @@ -479,6 +479,8 @@ void put_zone_device_private_or_public_page(struct page *page)
>  		__ClearPageActive(page);
>  		__ClearPageWaiters(page);
>  
> +		mem_cgroup_uncharge(page);
> +

A zone device page could have a mem_cgroup charge if

1. The old page was charged to a cgroup and the new page from ZONE_DEVICE then
gets the charge that we need to drop here

And should not be charged

2. If the driver allowed mmap based allocation (these pages are not on LRU


Since put_zone_device_private_or_public_page() is called from release_pages(),
I think the assumption is that 2 is not a problem? I've not tested the mmap
bits yet.

>  		page->pgmap->page_free(page, page->pgmap->data);
>  	}
>  	else if (!count)
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index b93f5fe..171b638 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -4391,12 +4391,13 @@ enum mc_target_type {
>  	MC_TARGET_NONE = 0,
>  	MC_TARGET_PAGE,
>  	MC_TARGET_SWAP,
> +	MC_TARGET_DEVICE,
>  };
>  
>  static struct page *mc_handle_present_pte(struct vm_area_struct *vma,
>  						unsigned long addr, pte_t ptent)
>  {
> -	struct page *page = vm_normal_page(vma, addr, ptent);
> +	struct page *page = _vm_normal_page(vma, addr, ptent, true);
>  
>  	if (!page || !page_mapped(page))
>  		return NULL;
> @@ -4407,13 +4408,20 @@ static struct page *mc_handle_present_pte(struct vm_area_struct *vma,
>  		if (!(mc.flags & MOVE_FILE))
>  			return NULL;
>  	}
> -	if (!get_page_unless_zero(page))
> +	if (is_device_public_page(page)) {
> +		/*
> +		 * MEMORY_DEVICE_PUBLIC means ZONE_DEVICE page and which have a
> +		 * refcount of 1 when free (unlike normal page)
> +		 */
> +		if (!page_ref_add_unless(page, 1, 1))
> +			return NULL;
> +	} else if (!get_page_unless_zero(page))
>  		return NULL;
>  
>  	return page;
>  }
>  
> -#ifdef CONFIG_SWAP
> +#if defined(CONFIG_SWAP) || defined(CONFIG_DEVICE_PRIVATE)
>  static struct page *mc_handle_swap_pte(struct vm_area_struct *vma,
>  			pte_t ptent, swp_entry_t *entry)
>  {
> @@ -4422,6 +4430,23 @@ static struct page *mc_handle_swap_pte(struct vm_area_struct *vma,
>  
>  	if (!(mc.flags & MOVE_ANON) || non_swap_entry(ent))
>  		return NULL;
> +
> +	/*
> +	 * Handle MEMORY_DEVICE_PRIVATE which are ZONE_DEVICE page belonging to
> +	 * a device and because they are not accessible by CPU they are store
> +	 * as special swap entry in the CPU page table.
> +	 */
> +	if (is_device_private_entry(ent)) {
> +		page = device_private_entry_to_page(ent);
> +		/*
> +		 * MEMORY_DEVICE_PRIVATE means ZONE_DEVICE page and which have
> +		 * a refcount of 1 when free (unlike normal page)
> +		 */
> +		if (!page_ref_add_unless(page, 1, 1))
> +			return NULL;
> +		return page;
> +	}
> +
>  	/*
>  	 * Because lookup_swap_cache() updates some statistics counter,
>  	 * we call find_get_page() with swapper_space directly.
> @@ -4582,6 +4607,8 @@ static int mem_cgroup_move_account(struct page *page,
>   *   2(MC_TARGET_SWAP): if the swap entry corresponding to this pte is a
>   *     target for charge migration. if @target is not NULL, the entry is stored
>   *     in target->ent.
> + *   3(MC_TARGET_DEVICE): like MC_TARGET_PAGE  but page is MEMORY_DEVICE_PUBLIC
> + *     or MEMORY_DEVICE_PRIVATE (so ZONE_DEVICE page and thus not on the lru).
>   *
>   * Called with pte lock held.
>   */
> @@ -4610,6 +4637,9 @@ static enum mc_target_type get_mctgt_type(struct vm_area_struct *vma,
>  		 */
>  		if (page->mem_cgroup == mc.from) {
>  			ret = MC_TARGET_PAGE;
> +			if (is_device_private_page(page) ||
> +			    is_device_public_page(page))
> +				ret = MC_TARGET_DEVICE;
>  			if (target)
>  				target->page = page;
>  		}
> @@ -4669,6 +4699,11 @@ static int mem_cgroup_count_precharge_pte_range(pmd_t *pmd,
>  
>  	ptl = pmd_trans_huge_lock(pmd, vma);
>  	if (ptl) {
> +		/*
> +		 * Note their can not be MC_TARGET_DEVICE for now as we do not
                        there
> +		 * support transparent huge page with MEMORY_DEVICE_PUBLIC or
> +		 * MEMORY_DEVICE_PRIVATE but this might change.

I am trying to remind myself why THP and MEMORY_DEVICE_* pages don't work well
together today, the driver could allocate a THP size set of pages and migrate it.
There are patches to do THP migration, not upstream yet. Could you remind me
of any other limitations?

Balbir Singh

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2017-06-15  1:42 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-06-14 20:11 [HMM-CDM 0/5] Cache coherent device memory (CDM) with HMM Jérôme Glisse
2017-06-14 20:11 ` [HMM-CDM 1/5] mm/device-public-memory: device memory cache coherent with CPU Jérôme Glisse
2017-06-14 20:11 ` [HMM-CDM 2/5] mm/hmm: add new helper to hotplug CDM memory region Jérôme Glisse
2017-06-15  4:28   ` Balbir Singh
2017-06-14 20:11 ` [HMM-CDM 3/5] mm/memcontrol: allow to uncharge page without using page->lru field Jérôme Glisse
2017-06-15  3:31   ` Balbir Singh
2017-06-15 15:35     ` Jerome Glisse
2017-06-14 20:11 ` [HMM-CDM 4/5] mm/memcontrol: support MEMORY_DEVICE_PRIVATE and MEMORY_DEVICE_PUBLIC Jérôme Glisse
2017-06-15  1:41   ` Balbir Singh [this message]
2017-06-15  2:04     ` Jerome Glisse
2017-06-15  3:10       ` Balbir Singh
2017-06-14 20:11 ` [HMM-CDM 5/5] mm/hmm: simplify kconfig and enable HMM and DEVICE_PUBLIC for ppc64 Jérôme Glisse
2017-06-14 23:10   ` John Hubbard
2017-06-15  2:09     ` Jerome Glisse
2017-06-15  3:15       ` John Hubbard
2017-06-15  1:46   ` Balbir Singh
2017-06-15  2:07     ` Jerome Glisse
2017-06-15  2:59       ` Balbir Singh
2017-06-14 21:20 ` [HMM-CDM 0/5] Cache coherent device memory (CDM) with HMM Dave Hansen
2017-06-14 21:38   ` Jerome Glisse
2017-06-14 21:58     ` Dave Hansen
2017-06-14 22:07       ` Benjamin Herrenschmidt
2017-06-14 23:40       ` Balbir Singh

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170615114159.11a1eece@firefly.ozlabs.ibm.com \
    --to=bsingharora@gmail.com \
    --cc=cgroups@vger.kernel.org \
    --cc=dnellans@nvidia.com \
    --cc=hannes@cmpxchg.org \
    --cc=jglisse@redhat.com \
    --cc=jhubbard@nvidia.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=vdavydov.dev@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).