From: Logan Gunthorpe <logang@deltatee.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: akpm@linux-foundation.org, Dave Hansen <dave@sr71.net>,
	linux-nvdimm@lists.01.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, Alexander Viro <viro@zeniv.linux.org.uk>,
	linux-fsdevel@vger.kernel.org,
	Matthew Wilcox <willy@linux.intel.com>,
	Ross Zwisler <ross.zwisler@linux.intel.com>,
	Stephen Bates <Stephen.Bates@pmcs.com>
Subject: Re: [PATCH 14/15] mm, dax, pmem: introduce {get|put}_dev_pagemap() for dax-gup
Date: Fri, 2 Oct 2015 15:21:37 -0600	[thread overview]
Message-ID: <20151002212137.GB30448@deltatee.com> (raw)
In-Reply-To: <20150923044227.36490.99741.stgit@dwillia2-desk3.jf.intel.com>

Hi Dan,

We've been doing some experimenting and testing with this patchset.
Specifically, we are trying to use your ZONE_DEVICE work to enable
peer-to-peer PCIe transfers. This is actually working pretty well
(though we're still testing and working through some things).

However, we've found a couple of issues:

On Wed, Sep 23, 2015 at 12:42:27AM -0400, Dan Williams wrote:
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 3d6baa7d4534..20097e7b679a 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -49,12 +49,16 @@ struct page {
>  					 * updated asynchronously */
>  	union {
>  		struct address_space *mapping;	/* If low bit clear, points to
> -						 * inode address_space, or NULL.
> +						 * inode address_space, unless
> +						 * the page is in ZONE_DEVICE
> +						 * then it points to its parent
> +						 * dev_pagemap, otherwise NULL.
>  						 * If page mapped as anonymous
>  						 * memory, low bit is set, and
>  						 * it points to anon_vma object:
>  						 * see PAGE_MAPPING_ANON below.
>  						 */
> +		struct dev_pagemap *pgmap;
>  		void *s_mem;			/* slab first object */
>  	};


When you add pgmap to this union and it overrides the mapping value,
we see bugs in calls to set_page_dirty() when it tries to dereference
page->mapping. I believe a change to page_mapping() is required, such
as the patch at the end of this email.


> diff --git a/mm/gup.c b/mm/gup.c
> index a798293fc648..1064e9a489a4 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -98,7 +98,16 @@ retry:
>  	}
>
>  	page = vm_normal_page(vma, address, pte);
> -	if (unlikely(!page)) {
> +	if (!page && pte_devmap(pte) && (flags & FOLL_GET)) {
> +		/*
> +		 * Only return device mapping pages in the FOLL_GET case since
> +		 * they are only valid while holding the pgmap reference.
> +		 */
> +		if (get_dev_pagemap(pte_pfn(pte), NULL))
> +			page = pte_page(pte);
> +		else
> +			goto no_page;
> +	} else if (unlikely(!page)) {

I've found that if a driver creates a ZONE_DEVICE mapping but doesn't
register the pagemap (using devm_register_pagemap), the get_user_pages
code goes into an infinite loop. I'm not really sure whether this is an
issue, but it seems undesirable for a buggy driver to be able to cause
this.

My thoughts are that either devm_memremap_pages needs to call
devm_register_pagemap itself, so a driver cannot use one without the
other, or the GUP code needs to return -EFAULT when no pagemap was
registered, so it doesn't loop forever.

Thanks!

Logan



diff --git a/mm/util.c b/mm/util.c
index 68ff8a5..19af683 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -368,6 +368,9 @@ struct address_space *page_mapping(struct page *page)
 		return swap_address_space(entry);
 	}

+	if (unlikely(is_zone_device_page(page)))
+		return NULL;
+
 	mapping = (unsigned long)page->mapping;
 	if (mapping & PAGE_MAPPING_FLAGS)
 		return NULL;

