From: Jan Kara <jack@suse.cz>
To: john.hubbard@gmail.com
Cc: linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
	LKML <linux-kernel@vger.kernel.org>,
	linux-rdma <linux-rdma@vger.kernel.org>,
	linux-fsdevel@vger.kernel.org, John Hubbard <jhubbard@nvidia.com>,
	Matthew Wilcox <willy@infradead.org>,
	Michal Hocko <mhocko@kernel.org>,
	Christopher Lameter <cl@linux.com>,
	Jason Gunthorpe <jgg@ziepe.ca>,
	Dan Williams <dan.j.williams@intel.com>, Jan Kara <jack@suse.cz>
Subject: Re: [PATCH v2 6/6] mm: track gup pages with page->dma_pinned_* fields
Date: Mon, 12 Nov 2018 14:58:11 +0100
Message-ID: <20181112135811.GF7175@quack2.suse.cz>
In-Reply-To: <20181110085041.10071-7-jhubbard@nvidia.com>


Just as a side note, can you please CC me on the whole series next time?
Because this time I had to look up e.g. the introductory email in the
mailing list... Thanks!

On Sat 10-11-18 00:50:41, john.hubbard@gmail.com wrote:
> From: John Hubbard <jhubbard@nvidia.com>
> 
> This patch sets and restores the new page->dma_pinned_flags and
> page->dma_pinned_count fields, but does not actually use them for
> anything yet.
> 
> In order to use these fields at all, the page must be removed from
> any LRU list that it's on. The patch also adds some precautions that
> prevent the page from getting moved back onto an LRU, once it is
> in this state.
> 
> This is in preparation to fix some problems that came up when using
> devices (NICs, GPUs, for example) that set up direct access to a chunk
> of system (CPU) memory, so that they can DMA to/from that memory.
> 
> Cc: Matthew Wilcox <willy@infradead.org>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Christopher Lameter <cl@linux.com>
> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Jan Kara <jack@suse.cz>
> Signed-off-by: John Hubbard <jhubbard@nvidia.com>
> ---
>  include/linux/mm.h | 19 +++++----------
>  mm/gup.c           | 55 +++++++++++++++++++++++++++++++++++++++++--
>  mm/memcontrol.c    |  8 +++++++
>  mm/swap.c          | 58 ++++++++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 125 insertions(+), 15 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 09fbb2c81aba..6c64b1e0b777 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -950,6 +950,10 @@ static inline void put_page(struct page *page)
>  {
>  	page = compound_head(page);
>  
> +	VM_BUG_ON_PAGE(PageDmaPinned(page) &&
> +		       page_ref_count(page) <
> +				atomic_read(&page->dma_pinned_count),
> +		       page);
>  	/*
>  	 * For devmap managed pages we need to catch refcount transition from
>  	 * 2 to 1, when refcount reach one it means the page is free and we
> @@ -964,21 +968,10 @@ static inline void put_page(struct page *page)
>  }
>  
>  /*
> - * put_user_page() - release a page that had previously been acquired via
> - * a call to one of the get_user_pages*() functions.
> - *
>   * Pages that were pinned via get_user_pages*() must be released via
> - * either put_user_page(), or one of the put_user_pages*() routines
> - * below. This is so that eventually, pages that are pinned via
> - * get_user_pages*() can be separately tracked and uniquely handled. In
> - * particular, interactions with RDMA and filesystems need special
> - * handling.
> + * one of these put_user_pages*() routines:
>   */
> -static inline void put_user_page(struct page *page)
> -{
> -	put_page(page);
> -}
> -
> +void put_user_page(struct page *page);
>  void put_user_pages_dirty(struct page **pages, unsigned long npages);
>  void put_user_pages_dirty_lock(struct page **pages, unsigned long npages);
>  void put_user_pages(struct page **pages, unsigned long npages);
> diff --git a/mm/gup.c b/mm/gup.c
> index 55a41dee0340..ec1b26591532 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -25,6 +25,50 @@ struct follow_page_context {
>  	unsigned int page_mask;
>  };
>  
> +static void pin_page_for_dma(struct page *page)
> +{
> +	int ret = 0;
> +	struct zone *zone;
> +
> +	page = compound_head(page);
> +	zone = page_zone(page);
> +
> +	spin_lock(zone_gup_lock(zone));

I think you'll need an irq-safe lock here, as get_user_pages_fast() can be
called from interrupt context in some cases. And so can put_user_page()...

<snip>

> +/*
> + * put_user_page() - release a page that had previously been acquired via
> + * a call to one of the get_user_pages*() functions.
> + *
> + * Usage: Pages that were pinned via get_user_pages*() must be released via
> + * either put_user_page(), or one of the put_user_pages*() routines
> + * below. This is so that eventually, pages that are pinned via
> + * get_user_pages*() can be separately tracked and uniquely handled. In
> + * particular, interactions with RDMA and filesystems need special
> + * handling.
> + */
> +void put_user_page(struct page *page)
> +{
> +	struct zone *zone = page_zone(page);
> +
> +	page = compound_head(page);
> +
> +	if (atomic_dec_and_test(&page->dma_pinned_count)) {
> +		spin_lock(zone_gup_lock(zone));
> +		/* Re-check while holding the lock, because
> +		 * pin_page_for_dma() or get_page() may have snuck in right
> +		 * after the atomic_dec_and_test, and raised the count
> +		 * above zero again. If so, just leave the flag set. And
> +		 * because the atomic_dec_and_test above already got the
> +		 * accounting correct, no other action is required.
> +		 */
> +		VM_BUG_ON_PAGE(PageLRU(page), page);
> +		VM_BUG_ON_PAGE(!PageDmaPinned(page), page);
> +
> +		if (atomic_read(&page->dma_pinned_count) == 0) {

We have atomic_dec_and_lock[_irqsave]() exactly for constructs like this.
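
To illustrate the pattern, here is a rough userspace analogue of what
atomic_dec_and_lock() does. It is only a sketch and not the kernel code:
dec_and_lock(), struct fake_page, gup_lock and unpin() are made-up names, and
a pthread mutex stands in for the zone spinlock. The _irqsave variant would
additionally disable interrupts while the lock is held, which cannot be shown
in userspace.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical userspace stand-in for atomic_dec_and_lock(): decrement
 * *cnt and return true, with *lock held, only if the count reached zero.
 */
static bool dec_and_lock(atomic_int *cnt, pthread_mutex_t *lock)
{
	int old = atomic_load(cnt);

	/* Fast path: the count stays above zero, so the lock is not taken. */
	while (old > 1) {
		if (atomic_compare_exchange_weak(cnt, &old, old - 1))
			return false;
		/* A failed compare-exchange reloaded 'old'; try again. */
	}

	/* Slow path: this may be the last reference, decrement under the lock. */
	pthread_mutex_lock(lock);
	if (atomic_fetch_sub(cnt, 1) == 1)
		return true;	/* count is now zero; caller must unlock */
	pthread_mutex_unlock(lock);
	return false;
}

/* Hypothetical stand-in for the page fields touched by the patch. */
struct fake_page {
	atomic_int pinned_count;
	bool pinned_flag;
};

static pthread_mutex_t gup_lock = PTHREAD_MUTEX_INITIALIZER;

static void unpin(struct fake_page *page)
{
	if (dec_and_lock(&page->pinned_count, &gup_lock)) {
		/* The count hit zero while the lock was held. */
		assert(page->pinned_flag);
		page->pinned_flag = false;
		pthread_mutex_unlock(&gup_lock);
	}
}
```

With this shape the open-coded re-check of the count goes away, assuming
that anything raising the count from zero (pin_page_for_dma() in the patch)
also does so under the same lock.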

> +			ClearPageDmaPinned(page);
> +
> +			if (PageDmaPinnedWasLru(page)) {
> +				ClearPageDmaPinnedWasLru(page);
> +				putback_lru_page(page);
> +			}
> +		}
> +
> +		spin_unlock(zone_gup_lock(zone));
> +	}
> +
> +	put_page(page);
> +}
> +EXPORT_SYMBOL(put_user_page);
> +

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR
