Linux RDMA and InfiniBand development

* Re: [PATCH 0/3] iopmem : A block device for PCIe memory
From: Christoph Hellwig @ 2016-10-27 12:32 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/,
	sbates-Rgftl6RXld5BDgjK7y7TUQ, Raj, Ashok,
	haggaie-VPRAkNaXOzVWk0Htik3J/w, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-nvdimm-hn68Rpc1hR1g9hUCZPvPmw@public.gmane.org,
	Jonathan Corbet, Dave Chinner,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	jim.macdonald-FgSLVYC75IpWk0Htik3J/w, Stephen Bates,
	Christoph Hellwig, Linux MM, linux-block-u79uwXL29TY76Z2rM5mHXA,
	Jens Axboe, David Woodhouse
In-Reply-To: <76e957c9-8002-5a46-8111-269bb0401718-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>

On Thu, Oct 27, 2016 at 01:22:49PM +0300, Sagi Grimberg wrote:
> Christoph, did you manage to leap to the future and solve the
> RDMA persistency hole? :)
> 
> E.g. what happens with O_DSYNC in this model? Or did you do
> a message exchange for commits?

Yes, pNFS calls this the layoutcommit.  That being said, once we get an RDMA
commit or flush operation we could easily make the layoutcommit optional
for some operations.  There already is a precedent for this in the
flexfiles layout specification.
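
As a rough illustration of the exchange discussed above, the following toy
model separates "data has been placed by a one-sided RDMA write" from "data
has been committed through a layoutcommit message". It is not NFS or kernel
code: struct demo_file_state and every demo_* function are hypothetical names
invented for this sketch, and a real server does far more work at commit time.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-file state tracked by the server in this toy model. */
struct demo_file_state {
	size_t written;	/* bytes placed by one-sided RDMA writes */
	size_t durable;	/* bytes covered by a completed commit exchange */
};

/* A one-sided RDMA write lands data but tells the server nothing. */
static void demo_rdma_write(struct demo_file_state *f, size_t len)
{
	f->written += len;
}

/* The layoutcommit message is the point at which the server can persist. */
static void demo_layoutcommit(struct demo_file_state *f)
{
	f->durable = f->written;
}

/* What an O_DSYNC-style writer could treat as safely committed. */
static size_t demo_durable_bytes(const struct demo_file_state *f)
{
	return f->durable;
}
```

In this model a writer that needs O_DSYNC-like semantics has to wait for
demo_layoutcommit() before reporting completion; a future RDMA commit or
flush operation would, under the same assumptions, take over that role.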

* Re: [PATCH 09/12] SRP transport: Move queuecommand() wait code to SCSI core
From: Sagi Grimberg @ 2016-10-27 12:20 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: Christoph Hellwig, James Bottomley, Martin K. Petersen,
	Mike Snitzer, Doug Ledford, Keith Busch, Ming Lei,
	Laurence Oberman, linux-block@vger.kernel.org,
	linux-scsi@vger.kernel.org, linux-rdma@vger.kernel.org,
	linux-nvme@lists.infradead.org
In-Reply-To: <e86cdaf9-6305-d2cb-6068-0a050c023d73@sandisk.com>

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>

* Re: [PATCH 11/12] nvme: Use BLK_MQ_S_STOPPED instead of QUEUE_FLAG_STOPPED in blk-mq code
From: Sagi Grimberg @ 2016-10-27 12:19 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: Christoph Hellwig, James Bottomley, Martin K. Petersen,
	Mike Snitzer, Doug Ledford, Keith Busch, Ming Lei,
	Laurence Oberman, linux-block@vger.kernel.org,
	linux-scsi@vger.kernel.org, linux-rdma@vger.kernel.org,
	linux-nvme@lists.infradead.org
In-Reply-To: <805e1911-cd10-0563-c76b-256d76054b08@sandisk.com>

Looks good,

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>

* Re: [PATCH 10/12] SRP transport, scsi-mq: Wait for .queue_rq() if necessary
From: Sagi Grimberg @ 2016-10-27 12:19 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: linux-block@vger.kernel.org, James Bottomley, Martin K. Petersen,
	Mike Snitzer, linux-rdma@vger.kernel.org, Ming Lei,
	linux-nvme@lists.infradead.org, Keith Busch, Doug Ledford,
	linux-scsi@vger.kernel.org, Laurence Oberman, Christoph Hellwig
In-Reply-To: <0cd77719-1f11-d5c3-3186-1c7c3cfd6886@sandisk.com>

Thanks for moving it,

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>

* Re: [PATCH 04/12] blk-mq: Move more code into blk_mq_direct_issue_request()
From: Sagi Grimberg @ 2016-10-27 12:16 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: Christoph Hellwig, James Bottomley, Martin K. Petersen,
	Mike Snitzer, Doug Ledford, Keith Busch, Ming Lei,
	Laurence Oberman,
	linux-block-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org
In-Reply-To: <32b0bd88-cb8e-754a-89fc-b1825778b05a-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>

Looks good,

Reviewed-by: Sagi Grimberg <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

* Re: [PATCH 02/12] blk-mq: Introduce blk_mq_hctx_stopped()
From: Sagi Grimberg @ 2016-10-27 12:15 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: Christoph Hellwig, James Bottomley, Martin K. Petersen,
	Mike Snitzer, Doug Ledford, Keith Busch, Ming Lei,
	Laurence Oberman,
	linux-block-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org
In-Reply-To: <0de50789-e3b7-0a07-73c1-4fb87b1f957e-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>

Looks fine,

Reviewed-by: Sagi Grimberg <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>

* Re: [PATCH 01/12] blk-mq: Do not invoke .queue_rq() for a stopped queue
From: Sagi Grimberg @ 2016-10-27 12:14 UTC (permalink / raw)
  To: Bart Van Assche, Jens Axboe
  Cc: Christoph Hellwig, James Bottomley, Martin K. Petersen,
	Mike Snitzer, Doug Ledford, Keith Busch, Ming Lin,
	Laurence Oberman, linux-block@vger.kernel.org,
	linux-scsi@vger.kernel.org, linux-rdma@vger.kernel.org,
	linux-nvme@lists.infradead.org
In-Reply-To: <1debcf7f-c950-308b-d297-3e48a77e08d7@sandisk.com>

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>

* Re: [PATCH 1/2] mm: add locked parameter to get_user_pages_remote()
From: Michal Hocko @ 2016-10-27 10:59 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: linux-mm, Linus Torvalds, Jan Kara, Hugh Dickins, Dave Hansen,
	Rik van Riel, Mel Gorman, Andrew Morton, Paolo Bonzini,
	Radim Krčmář, kvm, linux-kernel,
	linux-security-module, linux-rdma, dri-devel, linux-fsdevel
In-Reply-To: <20161027105527.GG6454@dhcp22.suse.cz>

On Thu 27-10-16 12:55:27, Michal Hocko wrote:
> On Thu 27-10-16 10:51:40, Lorenzo Stoakes wrote:
> > This patch adds an int *locked parameter to get_user_pages_remote() to allow
> > VM_FAULT_RETRY faulting behaviour similar to get_user_pages_[un]locked().
> > 
> > Taking into account the previous adjustments to get_user_pages*() functions
> > allowing for the passing of gup_flags, we are now in a position where
> > __get_user_pages_unlocked() need only be exported for its ability to allow
> > VM_FAULT_RETRY behaviour. This adjustment allows us to subsequently unexport
> > __get_user_pages_unlocked(), as well as allowing for future flexibility in the
> > use of get_user_pages_remote().
> 
> I would also add that this shouldn't introduce any functional change.

Forgot to mention that this also opens the door to changing other
get_user_pages_remote() callers to allow FAULT_RETRY logic.

-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
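
The calling convention described in this message can be sketched in
userspace. The code below is a mock, not kernel code: struct mock_sem stands
in for mm->mmap_sem, mock_gup_remote() is a hypothetical model of
get_user_pages_remote() with the new locked argument, and the needs_retry
flag is an invented knob that simulates a fault returning VM_FAULT_RETRY. It
only demonstrates the locking contract: the lock is held on entry, and the
callee may drop it only when the caller passes a non-NULL locked pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for mm->mmap_sem: counts read-side holders. */
struct mock_sem {
	int readers;
};

static void mock_down_read(struct mock_sem *sem)
{
	sem->readers++;
}

static void mock_up_read(struct mock_sem *sem)
{
	assert(sem->readers > 0);
	sem->readers--;
}

/*
 * Hypothetical model of the new convention. The lock must be held on
 * entry. If a retry is needed and the caller supplied a locked pointer,
 * the model drops the lock and reports that through *locked = 0. With
 * locked == NULL the lock is never dropped.
 */
static long mock_gup_remote(struct mock_sem *sem, long nr_pages,
			    int needs_retry, int *locked)
{
	assert(sem->readers > 0);
	if (locked)
		assert(*locked == 1);
	if (needs_retry && locked) {
		mock_up_read(sem);
		*locked = 0;
	}
	return nr_pages;
}

/* Caller that opts in to retry: unlock only if the lock is still held. */
static long pin_with_retry(struct mock_sem *sem, long nr_pages, int needs_retry)
{
	int locked = 1;
	long ret;

	mock_down_read(sem);
	ret = mock_gup_remote(sem, nr_pages, needs_retry, &locked);
	if (locked)
		mock_up_read(sem);
	return ret;
}

/* Caller that passes NULL: the lock is still held, so always unlock. */
static long pin_without_retry(struct mock_sem *sem, long nr_pages,
			      int needs_retry)
{
	long ret;

	mock_down_read(sem);
	ret = mock_gup_remote(sem, nr_pages, needs_retry, NULL);
	mock_up_read(sem);
	return ret;
}
```

Both caller styles leave the mock semaphore balanced whether or not a retry
occurred, which is the property a converted get_user_pages_remote() caller
would need to preserve.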

* Re: [PATCH 2/2] mm: unexport __get_user_pages_unlocked()
From: Michal Hocko @ 2016-10-27 10:57 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: linux-mm, Linus Torvalds, Jan Kara, Hugh Dickins, Dave Hansen,
	Rik van Riel, Mel Gorman, Andrew Morton, Paolo Bonzini,
	Radim Krčmář, kvm, linux-kernel,
	linux-security-module, linux-rdma, dri-devel, linux-fsdevel
In-Reply-To: <20161027095141.2569-3-lstoakes@gmail.com>

On Thu 27-10-16 10:51:41, Lorenzo Stoakes wrote:
> This patch unexports the low-level __get_user_pages_unlocked() function and
> replaces invocations with calls to more appropriate higher-level functions.
> 
> In hva_to_pfn_slow() we are able to replace __get_user_pages_unlocked() with
> get_user_pages_unlocked() since we can now pass gup_flags.
> 
> In async_pf_execute() and process_vm_rw_single_vec() we need to pass different
> tsk, mm arguments so get_user_pages_remote() is the sane replacement in these
> cases (having added manual acquisition and release of mmap_sem).
> 
> Additionally, get_user_pages_remote() reintroduces use of the FOLL_TOUCH
> flag. However, this flag was originally silently dropped by 1e9877902dc7e
> ("mm/gup: Introduce get_user_pages_remote()"), so this appears to have been
> unintentional and reintroducing it is therefore not an issue.

Looks good to me.
 
> Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>

Acked-by: Michal Hocko <mhocko@suse.com>

> ---
>  include/linux/mm.h     |  3 ---
>  mm/gup.c               |  8 ++++----
>  mm/nommu.c             |  7 +++----
>  mm/process_vm_access.c | 12 ++++++++----
>  virt/kvm/async_pf.c    | 10 +++++++---
>  virt/kvm/kvm_main.c    |  5 ++---
>  6 files changed, 24 insertions(+), 21 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index cc15445..7b2d14e 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1280,9 +1280,6 @@ long get_user_pages(unsigned long start, unsigned long nr_pages,
>  			    struct vm_area_struct **vmas);
>  long get_user_pages_locked(unsigned long start, unsigned long nr_pages,
>  		    unsigned int gup_flags, struct page **pages, int *locked);
> -long __get_user_pages_unlocked(struct task_struct *tsk, struct mm_struct *mm,
> -			       unsigned long start, unsigned long nr_pages,
> -			       struct page **pages, unsigned int gup_flags);
>  long get_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
>  		    struct page **pages, unsigned int gup_flags);
>  int get_user_pages_fast(unsigned long start, int nr_pages, int write,
> diff --git a/mm/gup.c b/mm/gup.c
> index 0567851..8028af1 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -866,9 +866,10 @@ EXPORT_SYMBOL(get_user_pages_locked);
>   * caller if required (just like with __get_user_pages). "FOLL_GET"
>   * is set implicitly if "pages" is non-NULL.
>   */
> -__always_inline long __get_user_pages_unlocked(struct task_struct *tsk, struct mm_struct *mm,
> -					       unsigned long start, unsigned long nr_pages,
> -					       struct page **pages, unsigned int gup_flags)
> +static __always_inline long __get_user_pages_unlocked(struct task_struct *tsk,
> +		struct mm_struct *mm, unsigned long start,
> +		unsigned long nr_pages, struct page **pages,
> +		unsigned int gup_flags)
>  {
>  	long ret;
>  	int locked = 1;
> @@ -880,7 +881,6 @@ __always_inline long __get_user_pages_unlocked(struct task_struct *tsk, struct m
>  		up_read(&mm->mmap_sem);
>  	return ret;
>  }
> -EXPORT_SYMBOL(__get_user_pages_unlocked);
> 
>  /*
>   * get_user_pages_unlocked() is suitable to replace the form:
> diff --git a/mm/nommu.c b/mm/nommu.c
> index 8b8faaf..669437b 100644
> --- a/mm/nommu.c
> +++ b/mm/nommu.c
> @@ -176,9 +176,9 @@ long get_user_pages_locked(unsigned long start, unsigned long nr_pages,
>  }
>  EXPORT_SYMBOL(get_user_pages_locked);
> 
> -long __get_user_pages_unlocked(struct task_struct *tsk, struct mm_struct *mm,
> -			       unsigned long start, unsigned long nr_pages,
> -			       struct page **pages, unsigned int gup_flags)
> +static long __get_user_pages_unlocked(struct task_struct *tsk, struct mm_struct *mm,
> +				      unsigned long start, unsigned long nr_pages,
> +			              struct page **pages, unsigned int gup_flags)
>  {
>  	long ret;
>  	down_read(&mm->mmap_sem);
> @@ -187,7 +187,6 @@ long __get_user_pages_unlocked(struct task_struct *tsk, struct mm_struct *mm,
>  	up_read(&mm->mmap_sem);
>  	return ret;
>  }
> -EXPORT_SYMBOL(__get_user_pages_unlocked);
> 
>  long get_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
>  			     struct page **pages, unsigned int gup_flags)
> diff --git a/mm/process_vm_access.c b/mm/process_vm_access.c
> index be8dc8d..84d0c7e 100644
> --- a/mm/process_vm_access.c
> +++ b/mm/process_vm_access.c
> @@ -88,7 +88,7 @@ static int process_vm_rw_single_vec(unsigned long addr,
>  	ssize_t rc = 0;
>  	unsigned long max_pages_per_loop = PVM_MAX_KMALLOC_PAGES
>  		/ sizeof(struct pages *);
> -	unsigned int flags = FOLL_REMOTE;
> +	unsigned int flags = 0;
> 
>  	/* Work out address and page range required */
>  	if (len == 0)
> @@ -100,15 +100,19 @@ static int process_vm_rw_single_vec(unsigned long addr,
> 
>  	while (!rc && nr_pages && iov_iter_count(iter)) {
>  		int pages = min(nr_pages, max_pages_per_loop);
> +		int locked = 1;
>  		size_t bytes;
> 
>  		/*
>  		 * Get the pages we're interested in.  We must
> -		 * add FOLL_REMOTE because task/mm might not
> +		 * access remotely because task/mm might not
>  		 * current/current->mm
>  		 */
> -		pages = __get_user_pages_unlocked(task, mm, pa, pages,
> -						  process_pages, flags);
> +		down_read(&mm->mmap_sem);
> +		pages = get_user_pages_remote(task, mm, pa, pages, flags,
> +					      process_pages, NULL, &locked);
> +		if (locked)
> +			up_read(&mm->mmap_sem);
>  		if (pages <= 0)
>  			return -EFAULT;
> 
> diff --git a/virt/kvm/async_pf.c b/virt/kvm/async_pf.c
> index 8035cc1..dab8b19 100644
> --- a/virt/kvm/async_pf.c
> +++ b/virt/kvm/async_pf.c
> @@ -76,16 +76,20 @@ static void async_pf_execute(struct work_struct *work)
>  	struct kvm_vcpu *vcpu = apf->vcpu;
>  	unsigned long addr = apf->addr;
>  	gva_t gva = apf->gva;
> +	int locked = 1;
> 
>  	might_sleep();
> 
>  	/*
>  	 * This work is run asynchromously to the task which owns
>  	 * mm and might be done in another context, so we must
> -	 * use FOLL_REMOTE.
> +	 * access remotely.
>  	 */
> -	__get_user_pages_unlocked(NULL, mm, addr, 1, NULL,
> -			FOLL_WRITE | FOLL_REMOTE);
> +	down_read(&mm->mmap_sem);
> +	get_user_pages_remote(NULL, mm, addr, 1, FOLL_WRITE, NULL, NULL,
> +			&locked);
> +	if (locked)
> +		up_read(&mm->mmap_sem);
> 
>  	kvm_async_page_present_sync(vcpu, apf);
> 
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 2907b7b..c45d951 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -1415,13 +1415,12 @@ static int hva_to_pfn_slow(unsigned long addr, bool *async, bool write_fault,
>  		npages = get_user_page_nowait(addr, write_fault, page);
>  		up_read(&current->mm->mmap_sem);
>  	} else {
> -		unsigned int flags = FOLL_TOUCH | FOLL_HWPOISON;
> +		unsigned int flags = FOLL_HWPOISON;
> 
>  		if (write_fault)
>  			flags |= FOLL_WRITE;
> 
> -		npages = __get_user_pages_unlocked(current, current->mm, addr, 1,
> -						   page, flags);
> +		npages = get_user_pages_unlocked(addr, 1, page, flags);
>  	}
>  	if (npages != 1)
>  		return npages;
> --
> 2.10.1

-- 
Michal Hocko
SUSE Labs

* Re: [PATCH 1/2] mm: add locked parameter to get_user_pages_remote()
From: Michal Hocko @ 2016-10-27 10:55 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: linux-mm, Linus Torvalds, Jan Kara, Hugh Dickins, Dave Hansen,
	Rik van Riel, Mel Gorman, Andrew Morton, Paolo Bonzini,
	Radim Krčmář, kvm, linux-kernel,
	linux-security-module, linux-rdma, dri-devel, linux-fsdevel
In-Reply-To: <20161027095141.2569-2-lstoakes@gmail.com>

On Thu 27-10-16 10:51:40, Lorenzo Stoakes wrote:
> This patch adds an int *locked parameter to get_user_pages_remote() to allow
> VM_FAULT_RETRY faulting behaviour similar to get_user_pages_[un]locked().
> 
> Taking into account the previous adjustments to get_user_pages*() functions
> allowing for the passing of gup_flags, we are now in a position where
> __get_user_pages_unlocked() need only be exported for its ability to allow
> VM_FAULT_RETRY behaviour. This adjustment allows us to subsequently unexport
> __get_user_pages_unlocked(), as well as allowing for future flexibility in the
> use of get_user_pages_remote().

I would also add that this shouldn't introduce any functional change.

> Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>

Acked-by: Michal Hocko <mhocko@suse.com>

> ---
>  drivers/gpu/drm/etnaviv/etnaviv_gem.c   |  2 +-
>  drivers/gpu/drm/i915/i915_gem_userptr.c |  2 +-
>  drivers/infiniband/core/umem_odp.c      |  2 +-
>  fs/exec.c                               |  2 +-
>  include/linux/mm.h                      |  2 +-
>  kernel/events/uprobes.c                 |  4 ++--
>  mm/gup.c                                | 12 ++++++++----
>  mm/memory.c                             |  2 +-
>  security/tomoyo/domain.c                |  2 +-
>  9 files changed, 17 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem.c b/drivers/gpu/drm/etnaviv/etnaviv_gem.c
> index 0370b84..0c69a97f 100644
> --- a/drivers/gpu/drm/etnaviv/etnaviv_gem.c
> +++ b/drivers/gpu/drm/etnaviv/etnaviv_gem.c
> @@ -763,7 +763,7 @@ static struct page **etnaviv_gem_userptr_do_get_pages(
>  	down_read(&mm->mmap_sem);
>  	while (pinned < npages) {
>  		ret = get_user_pages_remote(task, mm, ptr, npages - pinned,
> -					    flags, pvec + pinned, NULL);
> +					    flags, pvec + pinned, NULL, NULL);
>  		if (ret < 0)
>  			break;
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c b/drivers/gpu/drm/i915/i915_gem_userptr.c
> index c6f780f..836b525 100644
> --- a/drivers/gpu/drm/i915/i915_gem_userptr.c
> +++ b/drivers/gpu/drm/i915/i915_gem_userptr.c
> @@ -522,7 +522,7 @@ __i915_gem_userptr_get_pages_worker(struct work_struct *_work)
>  					 obj->userptr.ptr + pinned * PAGE_SIZE,
>  					 npages - pinned,
>  					 flags,
> -					 pvec + pinned, NULL);
> +					 pvec + pinned, NULL, NULL);
>  				if (ret < 0)
>  					break;
> 
> diff --git a/drivers/infiniband/core/umem_odp.c b/drivers/infiniband/core/umem_odp.c
> index 1f0fe32..6b079a3 100644
> --- a/drivers/infiniband/core/umem_odp.c
> +++ b/drivers/infiniband/core/umem_odp.c
> @@ -578,7 +578,7 @@ int ib_umem_odp_map_dma_pages(struct ib_umem *umem, u64 user_virt, u64 bcnt,
>  		 */
>  		npages = get_user_pages_remote(owning_process, owning_mm,
>  				user_virt, gup_num_pages,
> -				flags, local_page_list, NULL);
> +				flags, local_page_list, NULL, NULL);
>  		up_read(&owning_mm->mmap_sem);
> 
>  		if (npages < 0)
> diff --git a/fs/exec.c b/fs/exec.c
> index 4e497b9..2cf049d 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -209,7 +209,7 @@ static struct page *get_arg_page(struct linux_binprm *bprm, unsigned long pos,
>  	 * doing the exec and bprm->mm is the new process's mm.
>  	 */
>  	ret = get_user_pages_remote(current, bprm->mm, pos, 1, gup_flags,
> -			&page, NULL);
> +			&page, NULL, NULL);
>  	if (ret <= 0)
>  		return NULL;
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index a92c8d7..cc15445 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1274,7 +1274,7 @@ extern int access_remote_vm(struct mm_struct *mm, unsigned long addr,
>  long get_user_pages_remote(struct task_struct *tsk, struct mm_struct *mm,
>  			    unsigned long start, unsigned long nr_pages,
>  			    unsigned int gup_flags, struct page **pages,
> -			    struct vm_area_struct **vmas);
> +			    struct vm_area_struct **vmas, int *locked);
>  long get_user_pages(unsigned long start, unsigned long nr_pages,
>  			    unsigned int gup_flags, struct page **pages,
>  			    struct vm_area_struct **vmas);
> diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
> index f9ec9ad..215871b 100644
> --- a/kernel/events/uprobes.c
> +++ b/kernel/events/uprobes.c
> @@ -301,7 +301,7 @@ int uprobe_write_opcode(struct mm_struct *mm, unsigned long vaddr,
>  retry:
>  	/* Read the page with vaddr into memory */
>  	ret = get_user_pages_remote(NULL, mm, vaddr, 1, FOLL_FORCE, &old_page,
> -			&vma);
> +			&vma, NULL);
>  	if (ret <= 0)
>  		return ret;
> 
> @@ -1712,7 +1712,7 @@ static int is_trap_at_addr(struct mm_struct *mm, unsigned long vaddr)
>  	 * essentially a kernel access to the memory.
>  	 */
>  	result = get_user_pages_remote(NULL, mm, vaddr, 1, FOLL_FORCE, &page,
> -			NULL);
> +			NULL, NULL);
>  	if (result < 0)
>  		return result;
> 
> diff --git a/mm/gup.c b/mm/gup.c
> index ec4f827..0567851 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -920,6 +920,9 @@ EXPORT_SYMBOL(get_user_pages_unlocked);
>   *		only intends to ensure the pages are faulted in.
>   * @vmas:	array of pointers to vmas corresponding to each page.
>   *		Or NULL if the caller does not require them.
> + * @locked:	pointer to lock flag indicating whether lock is held and
> + *		subsequently whether VM_FAULT_RETRY functionality can be
> + *		utilised. Lock must initially be held.
>   *
>   * Returns number of pages pinned. This may be fewer than the number
>   * requested. If nr_pages is 0 or negative, returns 0. If no pages
> @@ -963,10 +966,10 @@ EXPORT_SYMBOL(get_user_pages_unlocked);
>  long get_user_pages_remote(struct task_struct *tsk, struct mm_struct *mm,
>  		unsigned long start, unsigned long nr_pages,
>  		unsigned int gup_flags, struct page **pages,
> -		struct vm_area_struct **vmas)
> +		struct vm_area_struct **vmas, int *locked)
>  {
>  	return __get_user_pages_locked(tsk, mm, start, nr_pages, pages, vmas,
> -				       NULL, false,
> +				       locked, true,
>  				       gup_flags | FOLL_TOUCH | FOLL_REMOTE);
>  }
>  EXPORT_SYMBOL(get_user_pages_remote);
> @@ -974,8 +977,9 @@ EXPORT_SYMBOL(get_user_pages_remote);
>  /*
>   * This is the same as get_user_pages_remote(), just with a
>   * less-flexible calling convention where we assume that the task
> - * and mm being operated on are the current task's.  We also
> - * obviously don't pass FOLL_REMOTE in here.
> + * and mm being operated on are the current task's and don't allow
> + * passing of a locked parameter.  We also obviously don't pass
> + * FOLL_REMOTE in here.
>   */
>  long get_user_pages(unsigned long start, unsigned long nr_pages,
>  		unsigned int gup_flags, struct page **pages,
> diff --git a/mm/memory.c b/mm/memory.c
> index e18c57b..2f3949b 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3883,7 +3883,7 @@ static int __access_remote_vm(struct task_struct *tsk, struct mm_struct *mm,
>  		struct page *page = NULL;
> 
>  		ret = get_user_pages_remote(tsk, mm, addr, 1,
> -				gup_flags, &page, &vma);
> +				gup_flags, &page, &vma, NULL);
>  		if (ret <= 0) {
>  #ifndef CONFIG_HAVE_IOREMAP_PROT
>  			break;
> diff --git a/security/tomoyo/domain.c b/security/tomoyo/domain.c
> index 682b73a..838ffa7 100644
> --- a/security/tomoyo/domain.c
> +++ b/security/tomoyo/domain.c
> @@ -881,7 +881,7 @@ bool tomoyo_dump_page(struct linux_binprm *bprm, unsigned long pos,
>  	 * the execve().
>  	 */
>  	if (get_user_pages_remote(current, bprm->mm, pos, 1,
> -				FOLL_FORCE, &page, NULL) <= 0)
> +				FOLL_FORCE, &page, NULL, NULL) <= 0)
>  		return false;
>  #else
>  	page = bprm->page[pos / PAGE_SIZE];
> --
> 2.10.1

-- 
Michal Hocko
SUSE Labs

* Re: [PATCH 0/3] iopmem : A block device for PCIe memory
From: Sagi Grimberg @ 2016-10-27 10:22 UTC (permalink / raw)
  To: Christoph Hellwig, Dave Chinner
  Cc: haggaie-VPRAkNaXOzVWk0Htik3J/w, sbates-Rgftl6RXld5BDgjK7y7TUQ,
	Raj, Ashok, Jonathan Corbet, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-nvdimm-hn68Rpc1hR1g9hUCZPvPmw@public.gmane.org,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	jim.macdonald-FgSLVYC75IpWk0Htik3J/w, Stephen Bates,
	jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/, Linux MM,
	linux-block-u79uwXL29TY76Z2rM5mHXA, Jens Axboe, David Woodhouse
In-Reply-To: <20161021095714.GA12209-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>


>> You do realise that local filesystems can silently change the
>> location of file data at any point in time, so there is no such
>> thing as a "stable mapping" of file data to block device addresses
>> in userspace?
>>
>> If you want remote access to the blocks owned and controlled by a
>> filesystem, then you need to use a filesystem with a remote locking
>> mechanism to allow co-ordinated, coherent access to the data in
>> those blocks. Anything else is just asking for ongoing, unfixable
>> filesystem corruption or data leakage problems (i.e.  security
>> issues).
>
> And at least for XFS we have such a mechanism :)  E.g. I have a
> prototype of a pNFS layout that uses XFS+DAX to allow clients to do
> RDMA directly to XFS files, with the same locking mechanism we use
> for the current block and scsi layout in xfs_pnfs.c.

Christoph, did you manage to leap to the future and solve the
RDMA persistency hole? :)

E.g. what happens with O_DSYNC in this model? Or did you do
a message exchange for commits?

* Re: [PATCH 0/2] mm: unexport __get_user_pages_unlocked()
From: Lorenzo Stoakes @ 2016-10-27  9:54 UTC (permalink / raw)
  To: linux-mm
  Cc: Michal Hocko, Linus Torvalds, Jan Kara, Hugh Dickins, Dave Hansen,
	Rik van Riel, Mel Gorman, Andrew Morton, Paolo Bonzini,
	Radim Krčmář, kvm, linux-kernel,
	linux-security-module, linux-rdma, dri-devel, linux-fsdevel
In-Reply-To: <20161027095141.2569-1-lstoakes@gmail.com>

On Thu, Oct 27, 2016 at 10:51:39AM +0100, Lorenzo Stoakes wrote:
> This patch series continues the cleanup of get_user_pages*() functions taking
> advantage of the fact we can now pass gup_flags as we please.

Note that this patch series has an unfortunate trivial dependency on my recent
'fix up get_user_pages* comments' patch, which means this series applies against
-mmots but not mainline at this point in time.

* [PATCH 2/2] mm: unexport __get_user_pages_unlocked()
From: Lorenzo Stoakes @ 2016-10-27  9:51 UTC (permalink / raw)
  To: linux-mm
  Cc: Michal Hocko, Linus Torvalds, Jan Kara, Hugh Dickins, Dave Hansen,
	Rik van Riel, Mel Gorman, Andrew Morton, Paolo Bonzini,
	Radim Krčmář, kvm, linux-kernel,
	linux-security-module, linux-rdma, dri-devel, linux-fsdevel,
	Lorenzo Stoakes
In-Reply-To: <20161027095141.2569-1-lstoakes@gmail.com>

This patch unexports the low-level __get_user_pages_unlocked() function and
replaces invocations with calls to more appropriate higher-level functions.

In hva_to_pfn_slow() we are able to replace __get_user_pages_unlocked() with
get_user_pages_unlocked() since we can now pass gup_flags.

In async_pf_execute() and process_vm_rw_single_vec() we need to pass different
tsk, mm arguments so get_user_pages_remote() is the sane replacement in these
cases (having added manual acquisition and release of mmap_sem).

Additionally, get_user_pages_remote() reintroduces use of the FOLL_TOUCH
flag. However, this flag was originally silently dropped by 1e9877902dc7e
("mm/gup: Introduce get_user_pages_remote()"), so this appears to have been
unintentional and reintroducing it is therefore not an issue.

Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
---
 include/linux/mm.h     |  3 ---
 mm/gup.c               |  8 ++++----
 mm/nommu.c             |  7 +++----
 mm/process_vm_access.c | 12 ++++++++----
 virt/kvm/async_pf.c    | 10 +++++++---
 virt/kvm/kvm_main.c    |  5 ++---
 6 files changed, 24 insertions(+), 21 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index cc15445..7b2d14e 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1280,9 +1280,6 @@ long get_user_pages(unsigned long start, unsigned long nr_pages,
 			    struct vm_area_struct **vmas);
 long get_user_pages_locked(unsigned long start, unsigned long nr_pages,
 		    unsigned int gup_flags, struct page **pages, int *locked);
-long __get_user_pages_unlocked(struct task_struct *tsk, struct mm_struct *mm,
-			       unsigned long start, unsigned long nr_pages,
-			       struct page **pages, unsigned int gup_flags);
 long get_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
 		    struct page **pages, unsigned int gup_flags);
 int get_user_pages_fast(unsigned long start, int nr_pages, int write,
diff --git a/mm/gup.c b/mm/gup.c
index 0567851..8028af1 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -866,9 +866,10 @@ EXPORT_SYMBOL(get_user_pages_locked);
  * caller if required (just like with __get_user_pages). "FOLL_GET"
  * is set implicitly if "pages" is non-NULL.
  */
-__always_inline long __get_user_pages_unlocked(struct task_struct *tsk, struct mm_struct *mm,
-					       unsigned long start, unsigned long nr_pages,
-					       struct page **pages, unsigned int gup_flags)
+static __always_inline long __get_user_pages_unlocked(struct task_struct *tsk,
+		struct mm_struct *mm, unsigned long start,
+		unsigned long nr_pages, struct page **pages,
+		unsigned int gup_flags)
 {
 	long ret;
 	int locked = 1;
@@ -880,7 +881,6 @@ __always_inline long __get_user_pages_unlocked(struct task_struct *tsk, struct m
 		up_read(&mm->mmap_sem);
 	return ret;
 }
-EXPORT_SYMBOL(__get_user_pages_unlocked);

 /*
  * get_user_pages_unlocked() is suitable to replace the form:
diff --git a/mm/nommu.c b/mm/nommu.c
index 8b8faaf..669437b 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -176,9 +176,9 @@ long get_user_pages_locked(unsigned long start, unsigned long nr_pages,
 }
 EXPORT_SYMBOL(get_user_pages_locked);

-long __get_user_pages_unlocked(struct task_struct *tsk, struct mm_struct *mm,
-			       unsigned long start, unsigned long nr_pages,
-			       struct page **pages, unsigned int gup_flags)
+static long __get_user_pages_unlocked(struct task_struct *tsk, struct mm_struct *mm,
+				      unsigned long start, unsigned long nr_pages,
+			              struct page **pages, unsigned int gup_flags)
 {
 	long ret;
 	down_read(&mm->mmap_sem);
@@ -187,7 +187,6 @@ long __get_user_pages_unlocked(struct task_struct *tsk, struct mm_struct *mm,
 	up_read(&mm->mmap_sem);
 	return ret;
 }
-EXPORT_SYMBOL(__get_user_pages_unlocked);

 long get_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
 			     struct page **pages, unsigned int gup_flags)
diff --git a/mm/process_vm_access.c b/mm/process_vm_access.c
index be8dc8d..84d0c7e 100644
--- a/mm/process_vm_access.c
+++ b/mm/process_vm_access.c
@@ -88,7 +88,7 @@ static int process_vm_rw_single_vec(unsigned long addr,
 	ssize_t rc = 0;
 	unsigned long max_pages_per_loop = PVM_MAX_KMALLOC_PAGES
 		/ sizeof(struct pages *);
-	unsigned int flags = FOLL_REMOTE;
+	unsigned int flags = 0;

 	/* Work out address and page range required */
 	if (len == 0)
@@ -100,15 +100,19 @@ static int process_vm_rw_single_vec(unsigned long addr,

 	while (!rc && nr_pages && iov_iter_count(iter)) {
 		int pages = min(nr_pages, max_pages_per_loop);
+		int locked = 1;
 		size_t bytes;

 		/*
 		 * Get the pages we're interested in.  We must
-		 * add FOLL_REMOTE because task/mm might not
+		 * access remotely because task/mm might not
 		 * current/current->mm
 		 */
-		pages = __get_user_pages_unlocked(task, mm, pa, pages,
-						  process_pages, flags);
+		down_read(&mm->mmap_sem);
+		pages = get_user_pages_remote(task, mm, pa, pages, flags,
+					      process_pages, NULL, &locked);
+		if (locked)
+			up_read(&mm->mmap_sem);
 		if (pages <= 0)
 			return -EFAULT;

diff --git a/virt/kvm/async_pf.c b/virt/kvm/async_pf.c
index 8035cc1..dab8b19 100644
--- a/virt/kvm/async_pf.c
+++ b/virt/kvm/async_pf.c
@@ -76,16 +76,20 @@ static void async_pf_execute(struct work_struct *work)
 	struct kvm_vcpu *vcpu = apf->vcpu;
 	unsigned long addr = apf->addr;
 	gva_t gva = apf->gva;
+	int locked = 1;

 	might_sleep();

 	/*
 	 * This work is run asynchromously to the task which owns
 	 * mm and might be done in another context, so we must
-	 * use FOLL_REMOTE.
+	 * access remotely.
 	 */
-	__get_user_pages_unlocked(NULL, mm, addr, 1, NULL,
-			FOLL_WRITE | FOLL_REMOTE);
+	down_read(&mm->mmap_sem);
+	get_user_pages_remote(NULL, mm, addr, 1, FOLL_WRITE, NULL, NULL,
+			&locked);
+	if (locked)
+		up_read(&mm->mmap_sem);

 	kvm_async_page_present_sync(vcpu, apf);

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 2907b7b..c45d951 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1415,13 +1415,12 @@ static int hva_to_pfn_slow(unsigned long addr, bool *async, bool write_fault,
 		npages = get_user_page_nowait(addr, write_fault, page);
 		up_read(&current->mm->mmap_sem);
 	} else {
-		unsigned int flags = FOLL_TOUCH | FOLL_HWPOISON;
+		unsigned int flags = FOLL_HWPOISON;

 		if (write_fault)
 			flags |= FOLL_WRITE;

-		npages = __get_user_pages_unlocked(current, current->mm, addr, 1,
-						   page, flags);
+		npages = get_user_pages_unlocked(addr, 1, page, flags);
 	}
 	if (npages != 1)
 		return npages;
--
2.10.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related

* [PATCH 1/2] mm: add locked parameter to get_user_pages_remote()
From: Lorenzo Stoakes @ 2016-10-27  9:51 UTC (permalink / raw)
  To: linux-mm
  Cc: Michal Hocko, Linus Torvalds, Jan Kara, Hugh Dickins, Dave Hansen,
	Rik van Riel, Mel Gorman, Andrew Morton, Paolo Bonzini,
	Radim Krčmář, kvm, linux-kernel,
	linux-security-module, linux-rdma, dri-devel, linux-fsdevel,
	Lorenzo Stoakes
In-Reply-To: <20161027095141.2569-1-lstoakes@gmail.com>

This patch adds an int *locked parameter to get_user_pages_remote() to allow
VM_FAULT_RETRY faulting behaviour similar to get_user_pages_[un]locked().

Taking into account the previous adjustments to get_user_pages*() functions
allowing for the passing of gup_flags, we are now in a position where
__get_user_pages_unlocked() need only be exported for its ability to allow
VM_FAULT_RETRY behaviour. This adjustment allows us to subsequently unexport
__get_user_pages_unlocked() as well as allowing for future flexibility in the
use of get_user_pages_remote().

Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
---
 drivers/gpu/drm/etnaviv/etnaviv_gem.c   |  2 +-
 drivers/gpu/drm/i915/i915_gem_userptr.c |  2 +-
 drivers/infiniband/core/umem_odp.c      |  2 +-
 fs/exec.c                               |  2 +-
 include/linux/mm.h                      |  2 +-
 kernel/events/uprobes.c                 |  4 ++--
 mm/gup.c                                | 12 ++++++++----
 mm/memory.c                             |  2 +-
 security/tomoyo/domain.c                |  2 +-
 9 files changed, 17 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem.c b/drivers/gpu/drm/etnaviv/etnaviv_gem.c
index 0370b84..0c69a97f 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_gem.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_gem.c
@@ -763,7 +763,7 @@ static struct page **etnaviv_gem_userptr_do_get_pages(
 	down_read(&mm->mmap_sem);
 	while (pinned < npages) {
 		ret = get_user_pages_remote(task, mm, ptr, npages - pinned,
-					    flags, pvec + pinned, NULL);
+					    flags, pvec + pinned, NULL, NULL);
 		if (ret < 0)
 			break;

diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c b/drivers/gpu/drm/i915/i915_gem_userptr.c
index c6f780f..836b525 100644
--- a/drivers/gpu/drm/i915/i915_gem_userptr.c
+++ b/drivers/gpu/drm/i915/i915_gem_userptr.c
@@ -522,7 +522,7 @@ __i915_gem_userptr_get_pages_worker(struct work_struct *_work)
 					 obj->userptr.ptr + pinned * PAGE_SIZE,
 					 npages - pinned,
 					 flags,
-					 pvec + pinned, NULL);
+					 pvec + pinned, NULL, NULL);
 				if (ret < 0)
 					break;

diff --git a/drivers/infiniband/core/umem_odp.c b/drivers/infiniband/core/umem_odp.c
index 1f0fe32..6b079a3 100644
--- a/drivers/infiniband/core/umem_odp.c
+++ b/drivers/infiniband/core/umem_odp.c
@@ -578,7 +578,7 @@ int ib_umem_odp_map_dma_pages(struct ib_umem *umem, u64 user_virt, u64 bcnt,
 		 */
 		npages = get_user_pages_remote(owning_process, owning_mm,
 				user_virt, gup_num_pages,
-				flags, local_page_list, NULL);
+				flags, local_page_list, NULL, NULL);
 		up_read(&owning_mm->mmap_sem);

 		if (npages < 0)
diff --git a/fs/exec.c b/fs/exec.c
index 4e497b9..2cf049d 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -209,7 +209,7 @@ static struct page *get_arg_page(struct linux_binprm *bprm, unsigned long pos,
 	 * doing the exec and bprm->mm is the new process's mm.
 	 */
 	ret = get_user_pages_remote(current, bprm->mm, pos, 1, gup_flags,
-			&page, NULL);
+			&page, NULL, NULL);
 	if (ret <= 0)
 		return NULL;

diff --git a/include/linux/mm.h b/include/linux/mm.h
index a92c8d7..cc15445 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1274,7 +1274,7 @@ extern int access_remote_vm(struct mm_struct *mm, unsigned long addr,
 long get_user_pages_remote(struct task_struct *tsk, struct mm_struct *mm,
 			    unsigned long start, unsigned long nr_pages,
 			    unsigned int gup_flags, struct page **pages,
-			    struct vm_area_struct **vmas);
+			    struct vm_area_struct **vmas, int *locked);
 long get_user_pages(unsigned long start, unsigned long nr_pages,
 			    unsigned int gup_flags, struct page **pages,
 			    struct vm_area_struct **vmas);
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index f9ec9ad..215871b 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -301,7 +301,7 @@ int uprobe_write_opcode(struct mm_struct *mm, unsigned long vaddr,
 retry:
 	/* Read the page with vaddr into memory */
 	ret = get_user_pages_remote(NULL, mm, vaddr, 1, FOLL_FORCE, &old_page,
-			&vma);
+			&vma, NULL);
 	if (ret <= 0)
 		return ret;

@@ -1712,7 +1712,7 @@ static int is_trap_at_addr(struct mm_struct *mm, unsigned long vaddr)
 	 * essentially a kernel access to the memory.
 	 */
 	result = get_user_pages_remote(NULL, mm, vaddr, 1, FOLL_FORCE, &page,
-			NULL);
+			NULL, NULL);
 	if (result < 0)
 		return result;

diff --git a/mm/gup.c b/mm/gup.c
index ec4f827..0567851 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -920,6 +920,9 @@ EXPORT_SYMBOL(get_user_pages_unlocked);
  *		only intends to ensure the pages are faulted in.
  * @vmas:	array of pointers to vmas corresponding to each page.
  *		Or NULL if the caller does not require them.
+ * @locked:	pointer to lock flag indicating whether lock is held and
+ *		subsequently whether VM_FAULT_RETRY functionality can be
+ *		utilised. Lock must initially be held.
  *
  * Returns number of pages pinned. This may be fewer than the number
  * requested. If nr_pages is 0 or negative, returns 0. If no pages
@@ -963,10 +966,10 @@ EXPORT_SYMBOL(get_user_pages_unlocked);
 long get_user_pages_remote(struct task_struct *tsk, struct mm_struct *mm,
 		unsigned long start, unsigned long nr_pages,
 		unsigned int gup_flags, struct page **pages,
-		struct vm_area_struct **vmas)
+		struct vm_area_struct **vmas, int *locked)
 {
 	return __get_user_pages_locked(tsk, mm, start, nr_pages, pages, vmas,
-				       NULL, false,
+				       locked, true,
 				       gup_flags | FOLL_TOUCH | FOLL_REMOTE);
 }
 EXPORT_SYMBOL(get_user_pages_remote);
@@ -974,8 +977,9 @@ EXPORT_SYMBOL(get_user_pages_remote);
 /*
  * This is the same as get_user_pages_remote(), just with a
  * less-flexible calling convention where we assume that the task
- * and mm being operated on are the current task's.  We also
- * obviously don't pass FOLL_REMOTE in here.
+ * and mm being operated on are the current task's and don't allow
+ * passing of a locked parameter.  We also obviously don't pass
+ * FOLL_REMOTE in here.
  */
 long get_user_pages(unsigned long start, unsigned long nr_pages,
 		unsigned int gup_flags, struct page **pages,
diff --git a/mm/memory.c b/mm/memory.c
index e18c57b..2f3949b 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3883,7 +3883,7 @@ static int __access_remote_vm(struct task_struct *tsk, struct mm_struct *mm,
 		struct page *page = NULL;

 		ret = get_user_pages_remote(tsk, mm, addr, 1,
-				gup_flags, &page, &vma);
+				gup_flags, &page, &vma, NULL);
 		if (ret <= 0) {
 #ifndef CONFIG_HAVE_IOREMAP_PROT
 			break;
diff --git a/security/tomoyo/domain.c b/security/tomoyo/domain.c
index 682b73a..838ffa7 100644
--- a/security/tomoyo/domain.c
+++ b/security/tomoyo/domain.c
@@ -881,7 +881,7 @@ bool tomoyo_dump_page(struct linux_binprm *bprm, unsigned long pos,
 	 * the execve().
 	 */
 	if (get_user_pages_remote(current, bprm->mm, pos, 1,
-				FOLL_FORCE, &page, NULL) <= 0)
+				FOLL_FORCE, &page, NULL, NULL) <= 0)
 		return false;
 #else
 	page = bprm->page[pos / PAGE_SIZE];
--
2.10.1


^ permalink raw reply related

* [PATCH 0/2] mm: unexport __get_user_pages_unlocked()
From: Lorenzo Stoakes @ 2016-10-27  9:51 UTC (permalink / raw)
  To: linux-mm
  Cc: Michal Hocko, Linus Torvalds, Jan Kara, Hugh Dickins, Dave Hansen,
	Rik van Riel, Mel Gorman, Andrew Morton, Paolo Bonzini,
	Radim Krčmář, kvm, linux-kernel,
	linux-security-module, linux-rdma, dri-devel, linux-fsdevel

This patch series continues the cleanup of get_user_pages*() functions taking
advantage of the fact we can now pass gup_flags as we please.

It firstly adds an additional 'locked' parameter to get_user_pages_remote() to
allow for its callers to utilise VM_FAULT_RETRY functionality. This is necessary
as the invocation of __get_user_pages_unlocked() in process_vm_rw_single_vec()
makes use of this and no other existing higher level function would allow it to
do so.

Secondly, existing callers of __get_user_pages_unlocked() are replaced with the
appropriate higher-level replacement: get_user_pages_unlocked() if the current
task and memory descriptor are referenced, or get_user_pages_remote() if other
task/memory descriptors are referenced (having acquired mmap_sem).
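
As a rough illustration of the calling convention described above, here is a
userspace model of the 'locked' handshake. It is only a sketch: mock_mm,
mock_down_read(), mock_up_read(), mock_gup_remote() and pin_remote() are
hypothetical stand-ins rather than kernel APIs, and the internal fault retry
is reduced to "the callee may drop the lock and clear *locked".

```c
#include <assert.h>

/* Hypothetical stand-in for an mm and its mmap_sem (assumption, not kernel code). */
struct mock_mm {
	int readers;       /* number of read-lock holders */
	int retry_faults;  /* upcoming calls that must drop the lock to fault */
};

static void mock_down_read(struct mock_mm *mm) { mm->readers++; }
static void mock_up_read(struct mock_mm *mm)   { mm->readers--; }

/*
 * Simplified model of get_user_pages_remote(..., int *locked).
 * With locked == NULL the lock is never dropped. Otherwise, if a fault
 * needed a retry, the lock has been released on return and *locked is 0.
 */
static long mock_gup_remote(struct mock_mm *mm, long nr_pages, int *locked)
{
	if (locked && mm->retry_faults > 0) {
		mm->retry_faults--;
		mock_up_read(mm);
		*locked = 0;
	}
	return nr_pages;
}

/* Caller pattern: take the lock, call, release only if it is still held. */
static long pin_remote(struct mock_mm *mm, long nr_pages)
{
	int locked = 1;
	long pinned;

	mock_down_read(mm);
	pinned = mock_gup_remote(mm, nr_pages, &locked);
	if (locked)
		mock_up_read(mm);
	return pinned;
}
```

In both paths the lock ends up released exactly once, which is the property
the converted callers rely on.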

Lorenzo Stoakes (2):
  mm: add locked parameter to get_user_pages_remote()
  mm: unexport __get_user_pages_unlocked()

 drivers/gpu/drm/etnaviv/etnaviv_gem.c   |  2 +-
 drivers/gpu/drm/i915/i915_gem_userptr.c |  2 +-
 drivers/infiniband/core/umem_odp.c      |  2 +-
 fs/exec.c                               |  2 +-
 include/linux/mm.h                      |  5 +----
 kernel/events/uprobes.c                 |  4 ++--
 mm/gup.c                                | 20 ++++++++++++--------
 mm/memory.c                             |  2 +-
 mm/nommu.c                              |  7 +++----
 mm/process_vm_access.c                  | 12 ++++++++----
 security/tomoyo/domain.c                |  2 +-
 virt/kvm/async_pf.c                     | 10 +++++++---
 virt/kvm/kvm_main.c                     |  5 ++---
 13 files changed, 41 insertions(+), 34 deletions(-)


^ permalink raw reply

* Re: A question regarding "multiple SGL"
From: Sagi Grimberg @ 2016-10-27  9:02 UTC (permalink / raw)
  To: Christoph Hellwig, Qiuxin (robert)
  Cc: Bart Van Assche, Jens Axboe,
	linux-block-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	James Bottomley, Martin K. Petersen, Mike Snitzer,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Ming Lei,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org,
	Keith Busch, Doug Ledford,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Laurence Oberman, Tiger zhao
In-Reply-To: <20161027071009.GA6434-jcswGhMUV9g@public.gmane.org>


> Hi Robert,

Hey Robert, Christoph,

> please explain your use cases that aren't handled.  The one and only
> reason to set MSDBD to 1 is to make the code a lot simpler given that
> there is no real use case for supporting more.
>
> RDMA uses memory registrations to register large and possibly
> discontiguous data regions for a single rkey, aka single SGL descriptor
> in NVMe terms.  There would be two reasons to support multiple SGL
> descriptors:  a) to support a larger I/O size than supported by a single
> MR, or b) to support a data region format not mappable by a single
> MR.
>
> iSER only supports a single rkey (or stag in IETF terminology) and has
> been doing fine on a) and mostly fine on b).   There are a few possible
> data layouts not supported by the traditional IB/iWarp FR WRs, but the
> limit is in fact exactly the same as imposed by the NVMe PRPs used for
> PCIe NVMe devices, so the Linux block layer has support to not generate
> them.  Also with modern Mellanox IB/RoCE hardware we can actually
> register completely arbitrary SGLs.  iSER supports using this registration
> mode already with a trivial code addition, but for NVMe we didn't have a
> pressing need yet.

Good explanation :)
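
To make the layout limit mentioned above concrete, here is a rough userspace
sketch of the PRP-style rule: every segment except the first must start on a
page boundary, and every segment except the last must end on one. The names
struct seg and single_mr_mappable() are made up for illustration, and the
check assumes a power-of-two page size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scatter/gather segment (illustration only). */
struct seg {
	uint64_t addr;  /* start address of the segment */
	uint32_t len;   /* length in bytes */
};

/*
 * Return true if the list has no "gaps" in the PRP sense, so that a
 * page-list based registration could describe it with a single MR.
 * page_size is assumed to be a power of two.
 */
static bool single_mr_mappable(const struct seg *sg, size_t n,
			       uint64_t page_size)
{
	uint64_t mask = page_size - 1;
	size_t i;

	for (i = 0; i < n; i++) {
		/* Only the first segment may start mid-page. */
		if (i > 0 && (sg[i].addr & mask))
			return false;
		/* Only the last segment may end mid-page. */
		if (i + 1 < n && ((sg[i].addr + sg[i].len) & mask))
			return false;
	}
	return true;
}
```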

The IO transfer size is a bit more pressing on some devices (e.g.
cxgb3/4) where the number of pages per MR can indeed be lower than
a reasonable transfer size (Steve can correct me if I'm wrong).

However, if there is a real demand for this we'll happily accept
patches :)

Just a note: having this feature in place can bring unexpected behavior
depending on how we implement it:
- If we can use multiple MRs per IO (for multiple SGLs), we can either
prepare for the worst case and allocate enough MRs to satisfy the
various IO patterns. This will be much heavier in terms of resource
allocation and can limit the scalability of the host driver.
- Or we can implement a shared MR pool with a reasonable number of MRs.
In this case each IO can consume one or more MRs at the expense of
other IOs, and we may need to requeue the IO later when we
have enough available MRs to satisfy it. This can yield some
unexpected performance gaps for some workloads.
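
The shared-pool option could look roughly like the userspace sketch below.
All names (struct mr_pool, mr_pool_init(), mr_pool_get(), mr_pool_put()) are
hypothetical and the pool only counts MRs; a real driver would also need
locking and actual MR objects.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical shared pool that only tracks how many MRs are free. */
struct mr_pool {
	unsigned int total;
	unsigned int free;
};

static void mr_pool_init(struct mr_pool *pool, unsigned int total)
{
	pool->total = total;
	pool->free = total;
}

/*
 * Reserve nr_mrs for one IO, all or nothing.
 * Returns 0 on success, -EAGAIN if the IO should be requeued until
 * other IOs return their MRs, or -EINVAL if it could never fit.
 */
static int mr_pool_get(struct mr_pool *pool, unsigned int nr_mrs)
{
	if (nr_mrs == 0 || nr_mrs > pool->total)
		return -EINVAL;
	if (nr_mrs > pool->free)
		return -EAGAIN;
	pool->free -= nr_mrs;
	return 0;
}

/* Return the MRs of a completed IO to the pool. */
static void mr_pool_put(struct mr_pool *pool, unsigned int nr_mrs)
{
	pool->free += nr_mrs;
}
```

The -EAGAIN path is where the performance gaps would come from: an IO that
needs several MRs can be held back by smaller IOs that keep draining the pool.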

Cheers,
Sagi.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: Trouble enabling iSER for ConnectX-4 Lx
From: Sagi Grimberg @ 2016-10-27  8:48 UTC (permalink / raw)
  To: Robert LeBlanc, linux-rdma
In-Reply-To: <CAANLjFr_we+33Nen-NYp1xQPzQ-wbR=GL4LBkEZb9azMUN-_=Q@mail.gmail.com>

Hi Robert,

AFAIK, MLNX_OFED includes isert only for specific distros.

This is probably a compat issue between the stock isert and the
MLNX-provided RDMA stack.

Any specific reason not to use the upstream (or stock 4.4.27) kernel?

^ permalink raw reply

* Re: [PATCH 06/12] blk-mq: Add a kick_requeue_list argument to blk_mq_requeue_request()
From: Johannes Thumshirn @ 2016-10-27  8:28 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, Christoph Hellwig, James Bottomley,
	Martin K. Petersen, Mike Snitzer, Doug Ledford, Keith Busch,
	Ming Lei, Laurence Oberman, linux-block@vger.kernel.org,
	linux-scsi@vger.kernel.org, linux-rdma@vger.kernel.org,
	linux-nvme@lists.infradead.org
In-Reply-To: <3944826d-bfde-f1e6-40ec-2c9a3c259e3a@sandisk.com>

On Wed, Oct 26, 2016 at 03:53:39PM -0700, Bart Van Assche wrote:
> Most blk_mq_requeue_request() and blk_mq_add_to_requeue_list() calls
> are followed by kicking the requeue list. Hence add an argument to
> these two functions that allows kicking the requeue list. This was
> proposed by Christoph Hellwig.
> 
> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
> Cc: Christoph Hellwig <hch@lst.de>

Looks good,
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>

-- 
Johannes Thumshirn                                          Storage
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply

* Re: [PATCH 07/12] dm: Use BLK_MQ_S_STOPPED instead of QUEUE_FLAG_STOPPED in blk-mq code
From: Johannes Thumshirn @ 2016-10-27  8:28 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, Christoph Hellwig, James Bottomley,
	Martin K. Petersen, Mike Snitzer, Doug Ledford, Keith Busch,
	Ming Lei, Laurence Oberman,
	linux-block-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org
In-Reply-To: <a0733adf-065e-2099-3850-cb1c55df1e35-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>

On Wed, Oct 26, 2016 at 03:54:07PM -0700, Bart Van Assche wrote:
> Instead of manipulating both QUEUE_FLAG_STOPPED and BLK_MQ_S_STOPPED
> in the dm start and stop queue functions, only manipulate the latter
> flag. Change blk_queue_stopped() tests into blk_mq_queue_stopped().
> 
> Signed-off-by: Bart Van Assche <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
> Reviewed-by: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>

Looks good,
Reviewed-by: Johannes Thumshirn <jthumshirn-l3A5Bk7waGM@public.gmane.org>

-- 
Johannes Thumshirn                                          Storage
jthumshirn-l3A5Bk7waGM@public.gmane.org                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply

* Re: [PATCH 09/12] SRP transport: Move queuecommand() wait code to SCSI core
From: Johannes Thumshirn @ 2016-10-27  8:27 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, Christoph Hellwig, James Bottomley,
	Martin K. Petersen, Mike Snitzer, Doug Ledford, Keith Busch,
	Ming Lei, Laurence Oberman,
	linux-block-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org
In-Reply-To: <e86cdaf9-6305-d2cb-6068-0a050c023d73-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>

On Wed, Oct 26, 2016 at 03:55:00PM -0700, Bart Van Assche wrote:
> Additionally, rename srp_wait_for_queuecommand() into
> scsi_wait_for_queuecommand() and add a comment about the
> queuecommand() call from scsi_send_eh_cmnd().
> 
> Signed-off-by: Bart Van Assche <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
> Reviewed-by: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>

Looks good,
Reviewed-by: Johannes Thumshirn <jthumshirn-l3A5Bk7waGM@public.gmane.org>

-- 
Johannes Thumshirn                                          Storage
jthumshirn-l3A5Bk7waGM@public.gmane.org                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply

* Re: [PATCH 10/12] SRP transport, scsi-mq: Wait for .queue_rq() if necessary
From: Johannes Thumshirn @ 2016-10-27  8:27 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, Christoph Hellwig, James Bottomley,
	Martin K. Petersen, Mike Snitzer, Doug Ledford, Keith Busch,
	Ming Lei, Laurence Oberman,
	linux-block-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org
In-Reply-To: <0cd77719-1f11-d5c3-3186-1c7c3cfd6886-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>

On Wed, Oct 26, 2016 at 03:55:34PM -0700, Bart Van Assche wrote:
> Ensure that, if scsi-mq is enabled, scsi_wait_for_queuecommand()
> waits until ongoing shost->hostt->queuecommand() calls have finished.
> 
> Signed-off-by: Bart Van Assche <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
> Reviewed-by: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>

Looks good,
Reviewed-by: Johannes Thumshirn <jthumshirn-l3A5Bk7waGM@public.gmane.org>

-- 
Johannes Thumshirn                                          Storage
jthumshirn-l3A5Bk7waGM@public.gmane.org                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply

* Re: [PATCH 04/12] blk-mq: Move more code into blk_mq_direct_issue_request()
From: Johannes Thumshirn @ 2016-10-27  8:18 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, Christoph Hellwig, James Bottomley,
	Martin K. Petersen, Mike Snitzer, Doug Ledford, Keith Busch,
	Ming Lei, Laurence Oberman, linux-block@vger.kernel.org,
	linux-scsi@vger.kernel.org, linux-rdma@vger.kernel.org,
	linux-nvme@lists.infradead.org
In-Reply-To: <32b0bd88-cb8e-754a-89fc-b1825778b05a@sandisk.com>

On Wed, Oct 26, 2016 at 03:52:35PM -0700, Bart Van Assche wrote:
> Move the "hctx stopped" test and the insert request calls into
> blk_mq_direct_issue_request(). Rename that function into
> blk_mq_try_issue_directly() to reflect its new semantics. Pass
> the hctx pointer to that function instead of looking it up a
> second time. These changes avoid having to duplicate code
> in the next patch.
> 
> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
> Cc: Christoph Hellwig <hch@lst.de>

Looks good,
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>

-- 
Johannes Thumshirn                                          Storage
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply

* Re: [PATCH 04/12] blk-mq: Move more code into blk_mq_direct_issue_request()
From: Johannes Thumshirn @ 2016-10-27  8:17 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, Christoph Hellwig, James Bottomley,
	Martin K. Petersen, Mike Snitzer, Doug Ledford, Keith Busch,
	Ming Lei, Laurence Oberman, linux-block@vger.kernel.org,
	linux-scsi@vger.kernel.org, linux-rdma@vger.kernel.org,
	linux-nvme@lists.infradead.org
In-Reply-To: <32b0bd88-cb8e-754a-89fc-b1825778b05a@sandisk.com>

On Wed, Oct 26, 2016 at 03:52:35PM -0700, Bart Van Assche wrote:
> Move the "hctx stopped" test and the insert request calls into
> blk_mq_direct_issue_request(). Rename that function into
> blk_mq_try_issue_directly() to reflect its new semantics. Pass
> the hctx pointer to that function instead of looking it up a
> second time. These changes avoid having to duplicate code
> in the next patch.
> 
> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
> Cc: Christoph Hellwig <hch@lst.de>

Looks good,
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>

-- 
Johannes Thumshirn                                          Storage
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply

* Re: [PATCH 03/12] blk-mq: Introduce blk_mq_queue_stopped()
From: Johannes Thumshirn @ 2016-10-27  8:16 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, Christoph Hellwig, James Bottomley,
	Martin K. Petersen, Mike Snitzer, Doug Ledford, Keith Busch,
	Ming Lei, Laurence Oberman,
	linux-block-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org
In-Reply-To: <f68b2997-8b0d-7aea-2859-5fbda4f6bf71-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>

On Wed, Oct 26, 2016 at 03:52:05PM -0700, Bart Van Assche wrote:
> The function blk_queue_stopped() allows testing whether or not a
> traditional request queue has been stopped. Introduce a helper
> function that allows block drivers to query easily whether or not
> one or more hardware contexts of a blk-mq queue have been stopped.
> 
> Signed-off-by: Bart Van Assche <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
> Reviewed-by: Hannes Reinecke <hare-IBi9RG/b67k@public.gmane.org>

Looks good,
Reviewed-by: Johannes Thumshirn <jthumshirn-l3A5Bk7waGM@public.gmane.org>
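
For readers skimming the thread, the helper described in the quoted commit
message amounts to an "any hardware context stopped" scan. A minimal
userspace sketch follows, assuming mock types and a mock flag; mock_hctx,
mock_queue and MOCK_S_STOPPED are stand-ins and not the blk-mq definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the BLK_MQ_S_STOPPED state bit. */
#define MOCK_S_STOPPED (1UL << 0)

struct mock_hctx {
	unsigned long state;
};

struct mock_queue {
	struct mock_hctx *hctx;   /* array of hardware contexts */
	size_t nr_hw_queues;
};

/* Per-context test, in the spirit of blk_mq_hctx_stopped(). */
static bool mock_hctx_stopped(const struct mock_hctx *hctx)
{
	return (hctx->state & MOCK_S_STOPPED) != 0;
}

/* Queue-wide test: true if one or more hardware contexts are stopped. */
static bool mock_queue_stopped(const struct mock_queue *q)
{
	size_t i;

	for (i = 0; i < q->nr_hw_queues; i++)
		if (mock_hctx_stopped(&q->hctx[i]))
			return true;
	return false;
}
```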

-- 
Johannes Thumshirn                                          Storage
jthumshirn-l3A5Bk7waGM@public.gmane.org                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply

* Re: [PATCH 02/12] blk-mq: Introduce blk_mq_hctx_stopped()
From: Johannes Thumshirn @ 2016-10-27  8:15 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, Christoph Hellwig, James Bottomley,
	Martin K. Petersen, Mike Snitzer, Doug Ledford, Keith Busch,
	Ming Lei, Laurence Oberman, linux-block@vger.kernel.org,
	linux-scsi@vger.kernel.org, linux-rdma@vger.kernel.org,
	linux-nvme@lists.infradead.org
In-Reply-To: <0de50789-e3b7-0a07-73c1-4fb87b1f957e@sandisk.com>

On Wed, Oct 26, 2016 at 03:51:33PM -0700, Bart Van Assche wrote:
> Multiple functions test the BLK_MQ_S_STOPPED bit so introduce
> a helper function that performs this test.
> 
> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
> Cc: Christoph Hellwig <hch@lst.de>

Looks good,
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>

-- 
Johannes Thumshirn                                          Storage
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

^ permalink raw reply
