Linux RDMA and InfiniBand development
* RE: [PATCH] rdma UAPI: Use __kernel_sockaddr_storage
From: Steve Wise @ 2016-10-27 17:47 UTC (permalink / raw)
  To: 'Jason Gunthorpe', 'Doug Ledford',
	linux-rdma-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1477587077-15410-1-git-send-email-jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>

> 
> The kernel side is #ifdef'd to this type, and the UAPI header
> should use it directly. It has slightly different alignment
> requirements from the usual user space version.
> 
> Signed-off-by: Jason Gunthorpe <jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>

If the alignment changed, does this break binary compatibility, i.e. a new
librdmacm with an old rdma_ucm.ko?

> ---
>  include/uapi/rdma/rdma_user_cm.h | 12 ++++++------
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/include/uapi/rdma/rdma_user_cm.h
> b/include/uapi/rdma/rdma_user_cm.h
> index 01923d463673..d71da36e3cd6 100644
> --- a/include/uapi/rdma/rdma_user_cm.h
> +++ b/include/uapi/rdma/rdma_user_cm.h
> @@ -110,7 +110,7 @@ struct rdma_ucm_bind {
>  	__u32 id;
>  	__u16 addr_size;
>  	__u16 reserved;
> -	struct sockaddr_storage addr;
> +	struct __kernel_sockaddr_storage addr;
>  };
> 
>  struct rdma_ucm_resolve_ip {
> @@ -126,8 +126,8 @@ struct rdma_ucm_resolve_addr {
>  	__u16 src_size;
>  	__u16 dst_size;
>  	__u32 reserved;
> -	struct sockaddr_storage src_addr;
> -	struct sockaddr_storage dst_addr;
> +	struct __kernel_sockaddr_storage src_addr;
> +	struct __kernel_sockaddr_storage dst_addr;
>  };
> 
>  struct rdma_ucm_resolve_route {
> @@ -164,8 +164,8 @@ struct rdma_ucm_query_addr_resp {
>  	__u16 pkey;
>  	__u16 src_size;
>  	__u16 dst_size;
> -	struct sockaddr_storage src_addr;
> -	struct sockaddr_storage dst_addr;
> +	struct __kernel_sockaddr_storage src_addr;
> +	struct __kernel_sockaddr_storage dst_addr;
>  };
> 
>  struct rdma_ucm_query_path_resp {
> @@ -257,7 +257,7 @@ struct rdma_ucm_join_mcast {
>  	__u32 id;
>  	__u16 addr_size;
>  	__u16 join_flags;
> -	struct sockaddr_storage addr;
> +	struct __kernel_sockaddr_storage addr;
>  };
> 
>  struct rdma_ucm_get_event {
> --
> 2.1.4
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


^ permalink raw reply

* Re: [PATCH] rdma UAPI: Use __kernel_sockaddr_storage
From: Jason Gunthorpe @ 2016-10-27 17:55 UTC (permalink / raw)
  To: Steve Wise; +Cc: 'Doug Ledford', linux-rdma-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <028b01d2307a$41454dd0$c3cfe970$@opengridcomputing.com>

On Thu, Oct 27, 2016 at 12:47:58PM -0500, Steve Wise wrote:
> > 
> > The kernel side is #ifdef'd to this type, and the UAPI header
> > should use it directly. It has slightly different alignment
> > requirements from the usual user space version.
> > 
> > Signed-off-by: Jason Gunthorpe <jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
> 
> If the alignment changed, does this break binary compatibility, i.e. a new
> librdmacm with an old rdma_ucm.ko?

The kernel side already uses __kernel_sockaddr_storage (via a
#define), so binary compatibility cannot break.
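A quick way to check this on a given ABI is to compare the two layouts of struct rdma_ucm_bind with offsetof()/sizeof(). The sketch below is an illustration, not part of the patch: kss_mirror is a hypothetical copy of the 4.x definition of struct __kernel_sockaddr_storage (128 bytes, pointer-aligned), used so the check builds without kernel headers installed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/socket.h>

/* Hypothetical mirror of the 4.x kernel's struct __kernel_sockaddr_storage
 * (include/uapi/linux/socket.h): 128 bytes, aligned like a pointer. */
struct kss_mirror {
	unsigned short ss_family;
	char data[128 - sizeof(unsigned short)];
} __attribute__((aligned(__alignof__(struct sockaddr *))));

_Static_assert(sizeof(struct kss_mirror) == 128, "mirror must be 128 bytes");

/* struct rdma_ucm_bind as user space built it before the patch ... */
struct bind_with_user_ss {
	uint32_t id;
	uint16_t addr_size;
	uint16_t reserved;
	struct sockaddr_storage addr;
};

/* ... and as the kernel already builds it. */
struct bind_with_kernel_ss {
	uint32_t id;
	uint16_t addr_size;
	uint16_t reserved;
	struct kss_mirror addr;
};

/* Returns 1 if addr sits at the same offset and both structs have the same
 * size on this ABI, i.e. the command bytes are identical either way. */
int bind_layouts_match(void)
{
	return offsetof(struct bind_with_user_ss, addr) ==
	       offsetof(struct bind_with_kernel_ss, addr) &&
	       sizeof(struct bind_with_user_ss) ==
	       sizeof(struct bind_with_kernel_ss);
}
```

If bind_layouts_match() returns 1 on an architecture, user space writes the same bytes for this command whichever type the header uses there.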

Jason

^ permalink raw reply

* Re: Trouble enabling iSER for ConnectX-4 Lx
From: Robert LeBlanc @ 2016-10-27 20:10 UTC (permalink / raw)
  To: Alaa Hleihel; +Cc: linux-rdma
In-Reply-To: <af629bfa-5355-0d14-9cc1-f5c4ef687f67-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>

Thanks for the push in the right direction. After installing OFED, I
was able to use iSER, and the comparison between Ethernet and RoCE is
quite impressive. I had a feeling that OFED would be needed, but since
it wasn't listed in the software for the adapter or mentioned in the
adapter documentation, it wasn't clear that it was required. One thing
I'm not 100% sure about is whether RoCE v2 is actually being used for
iSER after running `cma_roce_mode -d mlx5_0 -p 1 -m 2`. I'll have to
read up on doing a packet capture of RoCE to know for sure, unless you
have a better way to confirm.
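For the capture, a rule of thumb: RoCE v1 frames carry EtherType 0x8915, while RoCE v2 packets are UDP over IPv4/IPv6 with destination port 4791. A minimal classifier sketch for one captured frame, assuming untagged Ethernet (no VLAN header) and no IPv6 extension headers; classify_frame() and the constants are illustrative names, not an existing tool's API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum roce_kind { NOT_ROCE, ROCE_V1, ROCE_V2 };

#define ETH_HDR_LEN      14
#define ETHERTYPE_ROCE1  0x8915  /* RoCE v1: IB headers directly over Ethernet */
#define ETHERTYPE_IPV4   0x0800
#define ETHERTYPE_IPV6   0x86DD
#define IP_PROTO_UDP     17
#define ROCE2_UDP_PORT   4791    /* RoCE v2 UDP destination port */

/* Read a big-endian 16-bit value. */
static uint16_t be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/* Classify one untagged Ethernet frame of LEN bytes. */
enum roce_kind classify_frame(const uint8_t *f, size_t len)
{
	assert(f != NULL || len == 0);
	if (len < ETH_HDR_LEN)
		return NOT_ROCE;

	uint16_t type = be16(f + 12);
	const uint8_t *l3 = f + ETH_HDR_LEN;
	size_t l3len = len - ETH_HDR_LEN;
	const uint8_t *udp = NULL;

	if (type == ETHERTYPE_ROCE1)
		return ROCE_V1;

	if (type == ETHERTYPE_IPV4 && l3len >= 20) {
		size_t ihl = (size_t)(l3[0] & 0x0f) * 4;   /* IPv4 header length */
		if (l3[9] == IP_PROTO_UDP && ihl >= 20 && l3len >= ihl + 8)
			udp = l3 + ihl;
	} else if (type == ETHERTYPE_IPV6 && l3len >= 40 + 8) {
		if (l3[6] == IP_PROTO_UDP)                  /* next header, fixed 40-byte header */
			udp = l3 + 40;
	}

	if (udp && be16(udp + 2) == ROCE2_UDP_PORT)     /* UDP destination port */
		return ROCE_V2;
	return NOT_ROCE;
}
```

Note that RDMA traffic is handled by the adapter and bypasses the host network stack, so an ordinary capture on the host interface may not see it; a switch mirror port or a vendor sniffer mode may be needed.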

Thank you,
Robert LeBlanc
----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Thu, Oct 27, 2016 at 1:26 AM, Alaa Hleihel <alaa-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org> wrote:
> Hi Robert,
>
> You've installed mlnx-en package, which does not provide iSER modules.
> Instead, you should get MLNX_OFED from:
> http://www.mellanox.com/page/products_dyn?product_family=26&mtag=linux_sw_drivers
>
> Note that ib_isert in MLNX_OFED is not backported to all kernels, and it's
> enabled by default only for specific kernels.
> To force building the module against your kernel, use the following command
> for MLNX_OFED installation:
> # MLNX_EXTRA_FLAGS=--with-isert ./mlnxofedinstall --force \
>       --add-kernel-support --with-isert --skip-repo
>
> As for configuring RoCE, please refer to the MLNX_OFED User Manual:
> http://www.mellanox.com/related-docs/prod_software/Mellanox_OFED_Linux_User_Manual_v3.40.pdf
>
> Regards,
> Alaa
>
>
>
> On 10/27/2016 02:13, Robert LeBlanc wrote:
>
> We have some ConnectX-4 Lx cards that I'm trying to test RoCE and iSER
> on. I downloaded and installed the Mellanox drivers with VMA [0]. I
> was able to run the ib_read_bw tests over the adapters after
> installing the infiniband-diags and perftest RPMs. When I went to
> configure LIO for iSER, I'm getting the message "Cannot change iser"
> on step 6 in the procedure here [1] which I've done many times with
> InfiniBand without issues. I navigated to
> /sys/kernel/config/target/iscsi/{iqn}/tpgt_1/np/{portal_ip:port} and
> sure enough, I can't write '1' into iser. The kernel is not giving any
> messages and the ib_isert module is loaded. This is on 4.4.27,
> Mellanox driver 3.4-1.0.0.3 built with `./install --add-kernel-support
> --skip-repo --tmpdir /root/junk --vma`
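Since the tool only reports "Cannot change iser" and the kernel logs nothing, it can help to perform the write directly and look at the errno the kernel returns. A minimal sketch; write_attr() is a hypothetical helper, and the {iqn} and {portal_ip:port} path components are placeholders.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Write VAL to the configfs/sysfs attribute at PATH.
 * Returns 0 on success or a negative errno, so the caller sees exactly
 * what the kernel rejected instead of a generic tool message. */
int write_attr(const char *path, const char *val)
{
	assert(path != NULL && val != NULL);

	int fd = open(path, O_WRONLY);
	if (fd < 0)
		return -errno;

	size_t len = strlen(val);
	ssize_t n = write(fd, val, len);
	int err = 0;
	if (n < 0)
		err = -errno;          /* capture before close() can change errno */
	else if ((size_t)n != len)
		err = -EIO;            /* short write */

	close(fd);
	return err;
}
```

For example, call write_attr("/sys/kernel/config/target/iscsi/{iqn}/tpgt_1/np/{portal_ip:port}/iser", "1") with the placeholders filled in, and print strerror(-rc) when it fails.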
>
> # mstflint -d 4:00.0 q
> Image type:          FS3
> FW Version:          14.16.1020
> FW Release Date:     20.6.2016
> Rom Info:            type=UEFI version=14.10.16
>                     type=PXE version=3.4.812 devid=4117
> Description:         UID                GuidsNumber
> Base GUID:           0cc47a000089f706        4
> Base MAC:            00000cc47a89f706        4
> Image VSD:
> Device VSD:
> PSID:                SM_2001000001034
>
> # ibstatus
> Infiniband device 'mlx5_0' port 1 status:
>        default gid:     fe80:0000:0000:0000:0ec4:7aff:fe89:f706
>        base lid:        0x0
>        sm lid:          0x0
>        state:           4: ACTIVE
>        phys state:      5: LinkUp
>        rate:            25 Gb/sec (1X EDR)
>        link_layer:      Ethernet
>
> Infiniband device 'mlx5_1' port 1 status:
>        default gid:     fe80:0000:0000:0000:0ec4:7aff:fe89:f707
>        base lid:        0x0
>        sm lid:          0x0
>        state:           4: ACTIVE
>        phys state:      5: LinkUp
>        rate:            25 Gb/sec (1X EDR)
>        link_layer:      Ethernet
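"link_layer: Ethernet" means these ports run RoCE rather than native InfiniBand. The same attribute is exposed per port in sysfs, so a script or program can check it directly. A small sketch, assuming the standard /sys/class/infiniband layout; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build the sysfs path of a port's link_layer attribute, e.g.
 * /sys/class/infiniband/mlx5_0/ports/1/link_layer.
 * Returns 0 on success, -1 if BUF is too small. */
int link_layer_path(char *buf, size_t len, const char *dev, int port)
{
	int n = snprintf(buf, len, "/sys/class/infiniband/%s/ports/%d/link_layer",
			 dev, port);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Read the first line of PATH into OUT, without the trailing newline.
 * Returns 0 on success, -1 on error. */
int read_link_layer(const char *path, char *out, size_t outlen)
{
	assert(outlen > 1);
	FILE *f = fopen(path, "r");
	if (!f)
		return -1;
	char *ok = fgets(out, (int)outlen, f);
	fclose(f);
	if (!ok)
		return -1;
	out[strcspn(out, "\n")] = '\0';
	return 0;
}

/* "Ethernet" means RoCE; "InfiniBand" means a native IB port. */
int is_roce_port(const char *link_layer)
{
	return strcmp(link_layer, "Ethernet") == 0;
}
```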
>
> Any ideas of what I'm doing wrong here? I don't have any experience
> with RoCE, so I'm sure I'm doing something wrong. And the manual has
> nothing about configuring RoCE other than enabling --vma when
> installing the drivers [2].
>
> Thanks,
> Robert LeBlanc
>
> [0] http://www.mellanox.com/page/products_dyn?product_family=27
> [1] https://community.mellanox.com/docs/DOC-1472
> [2] http://www.mellanox.com/related-docs/prod_software/Mellanox_EN_for_Linux_User_Manual_v3_40.pdf
> ----------------
> Robert LeBlanc
> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>
>

^ permalink raw reply

* Re: Trouble enabling iSER for ConnectX-4 Lx
From: Robert LeBlanc @ 2016-10-27 20:15 UTC (permalink / raw)
  To: Sagi Grimberg; +Cc: linux-rdma
In-Reply-To: <ba970fe1-52cd-9f8d-c48d-5eac76b5bfa2-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>

Installing Mellanox OFED and not just the Ethernet drivers cleared
things up. We have a handful of patches that are required for products
we are developing, and a few patches that try to resolve iSER D-state
(uninterruptible sleep) issues we are seeing (those may not even be
needed when using iSER from Mellanox OFED).
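D state is the kernel's uninterruptible sleep, usually a task blocked on I/O. One way to find such tasks is the third field of /proc/<pid>/stat; since the command name in field 2 is wrapped in parentheses and may itself contain spaces or ')', the parser below scans from the last ')'. A hypothetical helper, not a standard API.

```c
#include <assert.h>
#include <string.h>

/* Return the state character from one /proc/<pid>/stat line
 * ('D' = uninterruptible sleep, 'S' = sleeping, 'R' = running, ...),
 * or '\0' if the line is malformed. */
char proc_stat_state(const char *line)
{
	assert(line != NULL);
	const char *rp = strrchr(line, ')');   /* end of the comm field */
	if (!rp)
		return '\0';
	rp++;
	while (*rp == ' ')
		rp++;
	return *rp;                            /* '\0' if nothing follows */
}
```

Usage: read /proc/<pid>/stat with fopen()/fgets() and pass the line; a task that keeps reporting 'D' is a candidate for the hang.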

Now I have to see whether there will be issues with the QLogic FastLinQ
card with Mellanox OFED installed. I don't expect us to run both cards in
production; we are just trying to settle on hardware and protocols.

Thank you,
Robert LeBlanc
----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Thu, Oct 27, 2016 at 2:48 AM, Sagi Grimberg <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org> wrote:
> Hi Robert,
>
> AFAIK, MLNX_OFED includes isert only for specific distros.
>
> This is probably a compat issue between stock isert and MLNX
> provided RDMA stack.
>
> Any specific reason not to use upstream (or stock 4.4.27) kernel?

^ permalink raw reply

* [PATCH v2 0/2] mm: unexport __get_user_pages_unlocked()
From: Lorenzo Stoakes @ 2016-10-27 20:34 UTC (permalink / raw)
  To: linux-mm
  Cc: Michal Hocko, Linus Torvalds, Jan Kara, Hugh Dickins, Dave Hansen,
	Rik van Riel, Mel Gorman, Andrew Morton, Paolo Bonzini,
	Radim Krčmář, kvm, linux-kernel,
	linux-security-module, linux-rdma, dri-devel, linux-fsdevel

This patch series continues the cleanup of the get_user_pages*() functions,
taking advantage of the fact that we can now pass gup_flags as we please.

First, it adds an additional 'locked' parameter to get_user_pages_remote() to
allow its callers to utilise VM_FAULT_RETRY functionality. This is necessary
because the invocation of __get_user_pages_unlocked() in
process_vm_rw_single_vec() makes use of this, and no other existing
higher-level function would allow it to do so.

Second, existing callers of __get_user_pages_unlocked() are replaced with the
appropriate higher-level replacement: get_user_pages_unlocked() if the current
task and memory descriptor are referenced, or get_user_pages_remote() if other
task/memory descriptors are referenced (after acquiring mmap_sem).

Lorenzo Stoakes (2):
  mm: add locked parameter to get_user_pages_remote()
  mm: unexport __get_user_pages_unlocked()

 drivers/gpu/drm/etnaviv/etnaviv_gem.c   |  2 +-
 drivers/gpu/drm/i915/i915_gem_userptr.c |  2 +-
 drivers/infiniband/core/umem_odp.c      |  2 +-
 fs/exec.c                               |  2 +-
 include/linux/mm.h                      |  5 +----
 kernel/events/uprobes.c                 |  4 ++--
 mm/gup.c                                | 20 ++++++++++++--------
 mm/memory.c                             |  2 +-
 mm/nommu.c                              |  7 +++----
 mm/process_vm_access.c                  | 12 ++++++++----
 security/tomoyo/domain.c                |  2 +-
 virt/kvm/async_pf.c                     | 10 +++++++---
 virt/kvm/kvm_main.c                     |  5 ++---
 13 files changed, 41 insertions(+), 34 deletions(-)



^ permalink raw reply

* [PATCH v2 1/2] mm: add locked parameter to get_user_pages_remote()
From: Lorenzo Stoakes @ 2016-10-27 20:34 UTC (permalink / raw)
  To: linux-mm
  Cc: Michal Hocko, Linus Torvalds, Jan Kara, Hugh Dickins, Dave Hansen,
	Rik van Riel, Mel Gorman, Andrew Morton, Paolo Bonzini,
	Radim Krčmář, kvm, linux-kernel,
	linux-security-module, linux-rdma, dri-devel, linux-fsdevel,
	Lorenzo Stoakes
In-Reply-To: <20161027203403.31708-1-lstoakes@gmail.com>

This patch adds an int *locked parameter to get_user_pages_remote() to allow
VM_FAULT_RETRY faulting behaviour similar to get_user_pages_[un]locked().

It additionally clears the way for __get_user_pages_unlocked() to be unexported
as its sole remaining useful characteristic was to allow for VM_FAULT_RETRY
behaviour when faulting in pages.

It should not introduce any functional changes; however, it does allow
subsequent changes to get_user_pages_remote() callers to take advantage of
VM_FAULT_RETRY.

Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
---
v2: updated description

 drivers/gpu/drm/etnaviv/etnaviv_gem.c   |  2 +-
 drivers/gpu/drm/i915/i915_gem_userptr.c |  2 +-
 drivers/infiniband/core/umem_odp.c      |  2 +-
 fs/exec.c                               |  2 +-
 include/linux/mm.h                      |  2 +-
 kernel/events/uprobes.c                 |  4 ++--
 mm/gup.c                                | 12 ++++++++----
 mm/memory.c                             |  2 +-
 security/tomoyo/domain.c                |  2 +-
 9 files changed, 17 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem.c b/drivers/gpu/drm/etnaviv/etnaviv_gem.c
index 0370b84..0c69a97f 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_gem.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_gem.c
@@ -763,7 +763,7 @@ static struct page **etnaviv_gem_userptr_do_get_pages(
 	down_read(&mm->mmap_sem);
 	while (pinned < npages) {
 		ret = get_user_pages_remote(task, mm, ptr, npages - pinned,
-					    flags, pvec + pinned, NULL);
+					    flags, pvec + pinned, NULL, NULL);
 		if (ret < 0)
 			break;

diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c b/drivers/gpu/drm/i915/i915_gem_userptr.c
index c6f780f..836b525 100644
--- a/drivers/gpu/drm/i915/i915_gem_userptr.c
+++ b/drivers/gpu/drm/i915/i915_gem_userptr.c
@@ -522,7 +522,7 @@ __i915_gem_userptr_get_pages_worker(struct work_struct *_work)
 					 obj->userptr.ptr + pinned * PAGE_SIZE,
 					 npages - pinned,
 					 flags,
-					 pvec + pinned, NULL);
+					 pvec + pinned, NULL, NULL);
 				if (ret < 0)
 					break;

diff --git a/drivers/infiniband/core/umem_odp.c b/drivers/infiniband/core/umem_odp.c
index 1f0fe32..6b079a3 100644
--- a/drivers/infiniband/core/umem_odp.c
+++ b/drivers/infiniband/core/umem_odp.c
@@ -578,7 +578,7 @@ int ib_umem_odp_map_dma_pages(struct ib_umem *umem, u64 user_virt, u64 bcnt,
 		 */
 		npages = get_user_pages_remote(owning_process, owning_mm,
 				user_virt, gup_num_pages,
-				flags, local_page_list, NULL);
+				flags, local_page_list, NULL, NULL);
 		up_read(&owning_mm->mmap_sem);

 		if (npages < 0)
diff --git a/fs/exec.c b/fs/exec.c
index 4e497b9..2cf049d 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -209,7 +209,7 @@ static struct page *get_arg_page(struct linux_binprm *bprm, unsigned long pos,
 	 * doing the exec and bprm->mm is the new process's mm.
 	 */
 	ret = get_user_pages_remote(current, bprm->mm, pos, 1, gup_flags,
-			&page, NULL);
+			&page, NULL, NULL);
 	if (ret <= 0)
 		return NULL;

diff --git a/include/linux/mm.h b/include/linux/mm.h
index a92c8d7..cc15445 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1274,7 +1274,7 @@ extern int access_remote_vm(struct mm_struct *mm, unsigned long addr,
 long get_user_pages_remote(struct task_struct *tsk, struct mm_struct *mm,
 			    unsigned long start, unsigned long nr_pages,
 			    unsigned int gup_flags, struct page **pages,
-			    struct vm_area_struct **vmas);
+			    struct vm_area_struct **vmas, int *locked);
 long get_user_pages(unsigned long start, unsigned long nr_pages,
 			    unsigned int gup_flags, struct page **pages,
 			    struct vm_area_struct **vmas);
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index f9ec9ad..215871b 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -301,7 +301,7 @@ int uprobe_write_opcode(struct mm_struct *mm, unsigned long vaddr,
 retry:
 	/* Read the page with vaddr into memory */
 	ret = get_user_pages_remote(NULL, mm, vaddr, 1, FOLL_FORCE, &old_page,
-			&vma);
+			&vma, NULL);
 	if (ret <= 0)
 		return ret;

@@ -1712,7 +1712,7 @@ static int is_trap_at_addr(struct mm_struct *mm, unsigned long vaddr)
 	 * essentially a kernel access to the memory.
 	 */
 	result = get_user_pages_remote(NULL, mm, vaddr, 1, FOLL_FORCE, &page,
-			NULL);
+			NULL, NULL);
 	if (result < 0)
 		return result;

diff --git a/mm/gup.c b/mm/gup.c
index ec4f827..0567851 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -920,6 +920,9 @@ EXPORT_SYMBOL(get_user_pages_unlocked);
  *		only intends to ensure the pages are faulted in.
  * @vmas:	array of pointers to vmas corresponding to each page.
  *		Or NULL if the caller does not require them.
+ * @locked:	pointer to lock flag indicating whether lock is held and
+ *		subsequently whether VM_FAULT_RETRY functionality can be
+ *		utilised. Lock must initially be held.
  *
  * Returns number of pages pinned. This may be fewer than the number
  * requested. If nr_pages is 0 or negative, returns 0. If no pages
@@ -963,10 +966,10 @@ EXPORT_SYMBOL(get_user_pages_unlocked);
 long get_user_pages_remote(struct task_struct *tsk, struct mm_struct *mm,
 		unsigned long start, unsigned long nr_pages,
 		unsigned int gup_flags, struct page **pages,
-		struct vm_area_struct **vmas)
+		struct vm_area_struct **vmas, int *locked)
 {
 	return __get_user_pages_locked(tsk, mm, start, nr_pages, pages, vmas,
-				       NULL, false,
+				       locked, true,
 				       gup_flags | FOLL_TOUCH | FOLL_REMOTE);
 }
 EXPORT_SYMBOL(get_user_pages_remote);
@@ -974,8 +977,9 @@ EXPORT_SYMBOL(get_user_pages_remote);
 /*
  * This is the same as get_user_pages_remote(), just with a
  * less-flexible calling convention where we assume that the task
- * and mm being operated on are the current task's.  We also
- * obviously don't pass FOLL_REMOTE in here.
+ * and mm being operated on are the current task's and don't allow
+ * passing of a locked parameter.  We also obviously don't pass
+ * FOLL_REMOTE in here.
  */
 long get_user_pages(unsigned long start, unsigned long nr_pages,
 		unsigned int gup_flags, struct page **pages,
diff --git a/mm/memory.c b/mm/memory.c
index e18c57b..2f3949b 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3883,7 +3883,7 @@ static int __access_remote_vm(struct task_struct *tsk, struct mm_struct *mm,
 		struct page *page = NULL;

 		ret = get_user_pages_remote(tsk, mm, addr, 1,
-				gup_flags, &page, &vma);
+				gup_flags, &page, &vma, NULL);
 		if (ret <= 0) {
 #ifndef CONFIG_HAVE_IOREMAP_PROT
 			break;
diff --git a/security/tomoyo/domain.c b/security/tomoyo/domain.c
index 682b73a..838ffa7 100644
--- a/security/tomoyo/domain.c
+++ b/security/tomoyo/domain.c
@@ -881,7 +881,7 @@ bool tomoyo_dump_page(struct linux_binprm *bprm, unsigned long pos,
 	 * the execve().
 	 */
 	if (get_user_pages_remote(current, bprm->mm, pos, 1,
-				FOLL_FORCE, &page, NULL) <= 0)
+				FOLL_FORCE, &page, NULL, NULL) <= 0)
 		return false;
 #else
 	page = bprm->page[pos / PAGE_SIZE];
--
2.10.1

^ permalink raw reply related

* [PATCH v2 2/2] mm: unexport __get_user_pages_unlocked()
From: Lorenzo Stoakes @ 2016-10-27 20:34 UTC (permalink / raw)
  To: linux-mm
  Cc: Michal Hocko, Linus Torvalds, Jan Kara, Hugh Dickins, Dave Hansen,
	Rik van Riel, Mel Gorman, Andrew Morton, Paolo Bonzini,
	Radim Krčmář, kvm, linux-kernel,
	linux-security-module, linux-rdma, dri-devel, linux-fsdevel,
	Lorenzo Stoakes
In-Reply-To: <20161027203403.31708-1-lstoakes@gmail.com>

This patch unexports the low-level __get_user_pages_unlocked() function and
replaces invocations with calls to more appropriate higher-level functions.

In hva_to_pfn_slow() we are able to replace __get_user_pages_unlocked() with
get_user_pages_unlocked() since we can now pass gup_flags.

In async_pf_execute() and process_vm_rw_single_vec() we need to pass different
tsk, mm arguments, so get_user_pages_remote() is the sane replacement in these
cases (with mmap_sem now acquired and released manually by the caller).

Additionally, get_user_pages_remote() reintroduces use of the FOLL_TOUCH
flag. This flag was originally silently dropped by 1e9877902dc7e
("mm/gup: Introduce get_user_pages_remote()"), which appears to have been
unintentional, so reintroducing it is not an issue.
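The flag change can be checked with a little arithmetic on what each wrapper adds: get_user_pages_unlocked() ORs in FOLL_TOUCH, get_user_pages_remote() ORs in FOLL_TOUCH | FOLL_REMOTE (as the mm/gup.c hunk in patch 1/2 shows), and the removed __get_user_pages_unlocked() passed its flags through unchanged. The bit values below are believed to match v4.9's include/linux/mm.h but should be treated as illustrative; the three helper functions are hypothetical.

```c
#include <assert.h>   /* used by the checks that exercise these helpers */

/* Illustrative FOLL_* bit values. */
#define FOLL_WRITE     0x0001u
#define FOLL_TOUCH     0x0002u
#define FOLL_HWPOISON  0x0100u
#define FOLL_REMOTE    0x2000u

/* Removed __get_user_pages_unlocked(): flags passed through unchanged. */
unsigned int old_unlocked_flags(unsigned int gup_flags)
{
	return gup_flags;
}

/* get_user_pages_unlocked(): adds FOLL_TOUCH. */
unsigned int unlocked_flags(unsigned int gup_flags)
{
	return gup_flags | FOLL_TOUCH;
}

/* get_user_pages_remote(): adds FOLL_TOUCH | FOLL_REMOTE. */
unsigned int remote_flags(unsigned int gup_flags)
{
	return gup_flags | FOLL_TOUCH | FOLL_REMOTE;
}
```

Under these assumptions hva_to_pfn_slow() ends up with exactly the same flags as before, while async_pf_execute() and process_vm_rw_single_vec() keep FOLL_REMOTE and gain FOLL_TOUCH, which is the reintroduction described above.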

Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
---
v2: updated patch to apply against mainline rather than -mmots

 include/linux/mm.h     |  3 ---
 mm/gup.c               |  8 ++++----
 mm/nommu.c             |  7 +++----
 mm/process_vm_access.c | 12 ++++++++----
 virt/kvm/async_pf.c    | 10 +++++++---
 virt/kvm/kvm_main.c    |  5 ++---
 6 files changed, 24 insertions(+), 21 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index cc15445..7b2d14e 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1280,9 +1280,6 @@ long get_user_pages(unsigned long start, unsigned long nr_pages,
 			    struct vm_area_struct **vmas);
 long get_user_pages_locked(unsigned long start, unsigned long nr_pages,
 		    unsigned int gup_flags, struct page **pages, int *locked);
-long __get_user_pages_unlocked(struct task_struct *tsk, struct mm_struct *mm,
-			       unsigned long start, unsigned long nr_pages,
-			       struct page **pages, unsigned int gup_flags);
 long get_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
 		    struct page **pages, unsigned int gup_flags);
 int get_user_pages_fast(unsigned long start, int nr_pages, int write,
diff --git a/mm/gup.c b/mm/gup.c
index 0567851..8028af1 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -866,9 +866,10 @@ EXPORT_SYMBOL(get_user_pages_locked);
  * according to the parameters "pages", "write", "force"
  * respectively.
  */
-__always_inline long __get_user_pages_unlocked(struct task_struct *tsk, struct mm_struct *mm,
-					       unsigned long start, unsigned long nr_pages,
-					       struct page **pages, unsigned int gup_flags)
+static __always_inline long __get_user_pages_unlocked(struct task_struct *tsk,
+		struct mm_struct *mm, unsigned long start,
+		unsigned long nr_pages, struct page **pages,
+		unsigned int gup_flags)
 {
 	long ret;
 	int locked = 1;
@@ -880,7 +881,6 @@ __always_inline long __get_user_pages_unlocked(struct task_struct *tsk, struct m
 		up_read(&mm->mmap_sem);
 	return ret;
 }
-EXPORT_SYMBOL(__get_user_pages_unlocked);

 /*
  * get_user_pages_unlocked() is suitable to replace the form:
diff --git a/mm/nommu.c b/mm/nommu.c
index 8b8faaf..669437b 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -176,9 +176,9 @@ long get_user_pages_locked(unsigned long start, unsigned long nr_pages,
 }
 EXPORT_SYMBOL(get_user_pages_locked);

-long __get_user_pages_unlocked(struct task_struct *tsk, struct mm_struct *mm,
-			       unsigned long start, unsigned long nr_pages,
-			       struct page **pages, unsigned int gup_flags)
+static long __get_user_pages_unlocked(struct task_struct *tsk, struct mm_struct *mm,
+				      unsigned long start, unsigned long nr_pages,
+			              struct page **pages, unsigned int gup_flags)
 {
 	long ret;
 	down_read(&mm->mmap_sem);
@@ -187,7 +187,6 @@ long __get_user_pages_unlocked(struct task_struct *tsk, struct mm_struct *mm,
 	up_read(&mm->mmap_sem);
 	return ret;
 }
-EXPORT_SYMBOL(__get_user_pages_unlocked);

 long get_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
 			     struct page **pages, unsigned int gup_flags)
diff --git a/mm/process_vm_access.c b/mm/process_vm_access.c
index be8dc8d..84d0c7e 100644
--- a/mm/process_vm_access.c
+++ b/mm/process_vm_access.c
@@ -88,7 +88,7 @@ static int process_vm_rw_single_vec(unsigned long addr,
 	ssize_t rc = 0;
 	unsigned long max_pages_per_loop = PVM_MAX_KMALLOC_PAGES
 		/ sizeof(struct pages *);
-	unsigned int flags = FOLL_REMOTE;
+	unsigned int flags = 0;

 	/* Work out address and page range required */
 	if (len == 0)
@@ -100,15 +100,19 @@ static int process_vm_rw_single_vec(unsigned long addr,

 	while (!rc && nr_pages && iov_iter_count(iter)) {
 		int pages = min(nr_pages, max_pages_per_loop);
+		int locked = 1;
 		size_t bytes;

 		/*
 		 * Get the pages we're interested in.  We must
-		 * add FOLL_REMOTE because task/mm might not
+		 * access remotely because task/mm might not
 		 * current/current->mm
 		 */
-		pages = __get_user_pages_unlocked(task, mm, pa, pages,
-						  process_pages, flags);
+		down_read(&mm->mmap_sem);
+		pages = get_user_pages_remote(task, mm, pa, pages, flags,
+					      process_pages, NULL, &locked);
+		if (locked)
+			up_read(&mm->mmap_sem);
 		if (pages <= 0)
 			return -EFAULT;

diff --git a/virt/kvm/async_pf.c b/virt/kvm/async_pf.c
index 8035cc1..dab8b19 100644
--- a/virt/kvm/async_pf.c
+++ b/virt/kvm/async_pf.c
@@ -76,16 +76,20 @@ static void async_pf_execute(struct work_struct *work)
 	struct kvm_vcpu *vcpu = apf->vcpu;
 	unsigned long addr = apf->addr;
 	gva_t gva = apf->gva;
+	int locked = 1;

 	might_sleep();

 	/*
 	 * This work is run asynchromously to the task which owns
 	 * mm and might be done in another context, so we must
-	 * use FOLL_REMOTE.
+	 * access remotely.
 	 */
-	__get_user_pages_unlocked(NULL, mm, addr, 1, NULL,
-			FOLL_WRITE | FOLL_REMOTE);
+	down_read(&mm->mmap_sem);
+	get_user_pages_remote(NULL, mm, addr, 1, FOLL_WRITE, NULL, NULL,
+			&locked);
+	if (locked)
+		up_read(&mm->mmap_sem);

 	kvm_async_page_present_sync(vcpu, apf);

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 2907b7b..c45d951 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1415,13 +1415,12 @@ static int hva_to_pfn_slow(unsigned long addr, bool *async, bool write_fault,
 		npages = get_user_page_nowait(addr, write_fault, page);
 		up_read(&current->mm->mmap_sem);
 	} else {
-		unsigned int flags = FOLL_TOUCH | FOLL_HWPOISON;
+		unsigned int flags = FOLL_HWPOISON;

 		if (write_fault)
 			flags |= FOLL_WRITE;

-		npages = __get_user_pages_unlocked(current, current->mm, addr, 1,
-						   page, flags);
+		npages = get_user_pages_unlocked(addr, 1, page, flags);
 	}
 	if (npages != 1)
 		return npages;
--
2.10.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related

* Re: [PATCH rdma-core v2 4/4] redhat/spec: build split rpm packages
From: Jason Gunthorpe @ 2016-10-27 21:10 UTC (permalink / raw)
  To: Jarod Wilson; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20161020153357.27286-5-jarod-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

On Thu, Oct 20, 2016 at 11:33:57AM -0400, Jarod Wilson wrote:
> @@ -7,10 +7,11 @@ Summary: RDMA core userspace libraries and daemons
>  #  providers/ipathverbs/ Dual licensed using a BSD license with an extra patent clause
>  #  providers/rxe/ Incorporates code from ipathverbs and contains the patent clause
>  #  providers/hfi1verbs Uses the 3 Clause BSD license
> -License: (GPLv2 or BSD) and (GPLv2 or PathScale-BSD)
> +License: GPLv2 or BSD

Is this OK? The Fedora guidelines I read suggested that the PathScale
license would need to be assigned its own short tag, and given the patent
clause I'd be surprised if 'BSD' is the right tag.

>  Url: http://openfabrics.org/

I guess we should change this url to
https://github.com/linux-rdma/rdma-core ?

>  Source: rdma-core-%{version}.tgz
> -BuildRoot: %{_tmppath}/%{name}-%{version}-%{release}-root
> +# https://github.com/linux-rdma/rdma-core
> +BuildRoot: %(mktemp -ud %{_tmppath}/%{name}-%{version}-%{release}-XXXXXX)

I always wondered why there was so much variability in spec files
here. I followed the Fedora guidelines; should we copy the above into
the other spec file?

> @@ -19,20 +20,15 @@ BuildRequires: pkgconfig
>  BuildRequires: pkgconfig(libnl-3.0)
>  BuildRequires: pkgconfig(libnl-route-3.0)
>  BuildRequires: valgrind-devel
> +BuildRequires: libnl3-devel

?

Isn't pkgconfig(libnl-3.0) the same thing?

>%define systemd_dep systemd-units
>%if 0%{?fedora} >= 18
>%define systemd_dep systemd
>%endif

The source package probably doesn't even build on Fedora 18, so this
can probably be removed.

> +Summary: InfiniBand Communication Manager Assistant
> +Requires(post): %{systemd_dep}
> +Requires(preun): %{systemd_dep}
> +Requires(postun): %{systemd_dep}

I suppose we need these (and the related bits) in the other spec file
too? It looks like this spec file isn't going to work on C6, so you can
probably drop the other systemd compat stuff:

--- a/redhat/rdma-core.spec
+++ b/redhat/rdma-core.spec
@@ -202,13 +202,6 @@ discover and use SCSI devices via the SCSI RDMA Protocol over InfiniBand.
 
 %build
 
-# Detect if systemd is supported on this system
-%if 0%{?_unitdir:1}
-%define my_unitdir %{_unitdir}
-%else
-%define my_unitdir /tmp/
-%endif
-
 # New RPM defines _rundir, usually as /run
 %if 0%{?_rundir:1}
 %else
@@ -228,7 +221,7 @@ discover and use SCSI devices via the SCSI RDMA Protocol over InfiniBand.
          -DCMAKE_INSTALL_INFODIR:PATH=%{_infodir} \
          -DCMAKE_INSTALL_MANDIR:PATH=%{_mandir} \
          -DCMAKE_INSTALL_SYSCONFDIR:PATH=%{_sysconfdir} \
-        -DCMAKE_INSTALL_SYSTEMD_SERVICEDIR:PATH=%{my_unitdir} \
+        -DCMAKE_INSTALL_SYSTEMD_SERVICEDIR:PATH=%{_unitdir} \
         -DCMAKE_INSTALL_INITDDIR:PATH=%{_initrddir} \
         -DCMAKE_INSTALL_RUNDIR:PATH=%{_rundir} \
         -DCMAKE_INSTALL_DOCDIR:PATH=%{_docdir}/%{name}-%{version}
@@ -276,8 +269,6 @@ install -D -m0644 redhat/srp_daemon.service %{buildroot}%{_unitdir}/
 
 %if 0%{?_unitdir:1}
 rm -rf %{buildroot}/%{_initrddir}/
-%else
-rm -rf %{buildroot}/%{my_unitdir}/
 %endif
 
 %post -p /sbin/ldconfig

> +%package -n librdmacm-utils
> +Summary: Examples for the librdmacm library
> +Requires: librdmacm%{?_isa} = %{version}-%{release}

Why the requires? Shouldn't auto shlib dependencies take care of that?

Anyhow, this all looks fine to me. I put a branch here, with one
change to make the Debian packaging work after the README.md change:

https://github.com/jgunthorpe/rdma-plumbing/tree/redhat-packaging

If you want to make any final adjustments, let me know; otherwise I
will send this on.

Jason

^ permalink raw reply

* [PATCH rdma-core 0/4] Migrate to use kernel uAPI headers
From: Jason Gunthorpe @ 2016-10-27 23:06 UTC (permalink / raw)
  To: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA

This is a start on this process; verbs will be quite challenging.

https://github.com/linux-rdma/rdma-core/pull/29

Jason Gunthorpe (4):
  Support -DKERNEL_DIR to use kernel UAPI headers directly
  Move rdma_netlink compat into CMake
  verbs: Replace infiniband/sa-kern-abi.h with the kernel's
    uapi/rdma/ib_user_sa.h
  ibcm: Replace infiniband/cm_abi.h with the kernel's
    uapi/rdma/ib_user_cm.h

 CMakeLists.txt                             |  11 +-
 buildlib/RDMA_LinuxHeaders.cmake           |  85 +++++++
 buildlib/fixup-include/rdma-rdma_netlink.h | 225 ++++++++++++++++++
 ibacm/src/acm.c                            |   3 -
 ibacm/src/acm_netlink.h                    | 128 -----------
 iwpmd/iwarp_pm.h                           |   2 +-
 iwpmd/iwarp_pm_common.c                    |   5 -
 iwpmd/iwarp_pm_server.c                    |   4 +-
 iwpmd/iwpm_netlink.h                       | 214 ------------------
 libibcm/cm.c                               |  78 ++++---
 libibcm/cm_abi.h                           | 351 ++++-------------------------
 libibverbs/marshall.c                      |   4 +-
 libibverbs/marshall.h                      |   6 +-
 libibverbs/sa-kern-abi.h                   |  34 +--
 librdmacm/rdma_cma_abi.h                   |   4 +-
 15 files changed, 416 insertions(+), 738 deletions(-)
 create mode 100644 buildlib/RDMA_LinuxHeaders.cmake
 create mode 100644 buildlib/fixup-include/rdma-rdma_netlink.h
 delete mode 100644 ibacm/src/acm_netlink.h
 delete mode 100644 iwpmd/iwpm_netlink.h

-- 
2.1.4

* [PATCH rdma-core 1/4] Support -DKERNEL_DIR to use kernel UAPI headers directly
From: Jason Gunthorpe @ 2016-10-27 23:06 UTC
  To: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1477609570-8087-1-git-send-email-jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>

This makes it painless to test whether the kernel headers work with
the build. All the kernel header shimming moves into
buildlib/RDMA_LinuxHeaders.cmake; new headers can be added to the list
there.

Signed-off-by: Jason Gunthorpe <jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
---
 CMakeLists.txt                   | 11 ++----
 buildlib/RDMA_LinuxHeaders.cmake | 85 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 89 insertions(+), 7 deletions(-)
 create mode 100644 buildlib/RDMA_LinuxHeaders.cmake
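
The `rdma_canon_header()` function in the new buildlib/RDMA_LinuxHeaders.cmake derives each `CHECK_C_SOURCE_COMPILES` result variable from the header path, so `rdma/rdma_user_rxe.h` is now tracked as `HAVE_RDMA_RDMA_USER_RXE_H` instead of the `HAVE_RDMA_USER_RXE` name the old CMakeLists.txt used. A minimal C sketch of the same mapping, for illustration only (`canon_header()` is a hypothetical name, not an rdma-core function):

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical C mirror of rdma_canon_header(): prefix "HAVE_", upper-case
 * the header path and turn ' ', '/' and '.' into '_'.  Returns 0 on
 * success, or -1 if out_len is too small for the result plus its NUL.
 */
static int canon_header(const char *path, char *out, size_t out_len)
{
	static const char prefix[] = "HAVE_";
	const size_t plen = sizeof(prefix) - 1;
	size_t n, i;

	assert(path != NULL && out != NULL);
	n = strlen(path);
	if (plen + n + 1 > out_len)
		return -1;

	memcpy(out, prefix, plen);
	for (i = 0; i < n; i++) {
		unsigned char c = (unsigned char)path[i];

		out[plen + i] = (c == ' ' || c == '/' || c == '.') ?
				'_' : (char)toupper(c);
	}
	out[plen + n] = '\0';
	return 0;
}
```

For example, `rdma/ib_user_sa.h` maps to `HAVE_RDMA_IB_USER_SA_H`.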

diff --git a/CMakeLists.txt b/CMakeLists.txt
index 230aab5ee01f..7abaa895c173 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -19,6 +19,8 @@
 #      Produce static libraries along with the usual shared libraries.
 #  -DVERBS_PROVIDER_DIR='' (default /usr/lib.../libibverbs)
 #      Use the historical search path for providers, in the standard system library.
+#  -DKERNEL_DIR='.../linux' (default '')
+#      If set use the kernel UAPI headers from this kernel source tree.
 
 cmake_minimum_required(VERSION 2.8.11 FATAL_ERROR)
 project(RDMA C)
@@ -241,10 +243,7 @@ endif()
 # should rely on this.
 check_type_size("long" SIZEOF_LONG BUILTIN_TYPES_ONLY LANGUAGE C)
 
-# Are our kernel headers new enough?
-# If not replace them with built-in copies so we can continue to build.
-CHECK_INCLUDE_FILE("rdma/rdma_user_rxe.h" HAVE_RDMA_USER_RXE)
-RDMA_DoFixup("${HAVE_RDMA_USER_RXE}" "rdma/rdma_user_rxe.h")
+include(RDMA_LinuxHeaders)
 
 #-------------------------
 # Apply fixups
@@ -375,9 +374,7 @@ else()
     message(STATUS " netlink/route/link.h and net/if.h NOT co-includable (old headers)")
   endif()
 endif()
-if (NOT HAVE_RDMA_USER_RXE)
-  message(STATUS " rdma/rdma_user_rxe.h NOT found (old system kernel headers)")
-endif()
+rdma_report_missing_kheaders()
 if (NOT HAVE_C_WARNINGS)
   message(STATUS " extended C warnings NOT supported")
 endif()
diff --git a/buildlib/RDMA_LinuxHeaders.cmake b/buildlib/RDMA_LinuxHeaders.cmake
new file mode 100644
index 000000000000..bd16d8deca72
--- /dev/null
+++ b/buildlib/RDMA_LinuxHeaders.cmake
@@ -0,0 +1,85 @@
+# COPYRIGHT (c) 2016 Obsidian Research Corporation. See COPYING file
+
+# Check that the system kernel headers are new enough, if not replace the
+# headers with our internal copy.
+
+set(DEFAULT_TEST "int main(int argc,const char *argv[]) {return 1;}")
+set(MISSING_HEADERS "")
+
+function(rdma_canon_header PATH OUT_VAR)
+  string(TOUPPER "${PATH}" HAVE)
+  string(REPLACE " " "_" HAVE "${HAVE}")
+  string(REPLACE "/" "_" HAVE "${HAVE}")
+  string(REPLACE "." "_" HAVE "${HAVE}")
+  set("${OUT_VAR}" "HAVE_${HAVE}" PARENT_SCOPE)
+endfunction()
+
+function(rdma_check_kheader PATH C_TEST)
+  rdma_canon_header("${PATH}" HAVE)
+
+  if(KERNEL_DIR)
+    # Drop a symlink back to the kernel into our include/ directory
+    if (EXISTS "${KERNEL_DIR}/include/uapi/${PATH}")
+      set(DEST "${BUILD_INCLUDE}/${PATH}")
+
+      if(CMAKE_VERSION VERSION_LESS "2.8.12")
+	get_filename_component(DIR ${DEST} PATH)
+      else()
+	get_filename_component(DIR ${DEST} DIRECTORY)
+      endif()
+      file(MAKE_DIRECTORY "${DIR}")
+
+      # We cannot just -I the kernel UAPI dir, it depends on some
+      # post-processing of things like linux/stddef.h. Instead we symlink the
+      # kernel headers into our tree and rely on the distro's fixup of
+      # non-rdma headers.  The RDMA headers are all compatible with this
+      # scheme.
+      execute_process(COMMAND "${CMAKE_COMMAND}" "-E" "create_symlink"
+	"${KERNEL_DIR}/include/uapi/${PATH}"
+	"${DEST}")
+    else()
+      message(FATAL_ERROR "Kernel tree does not contain expected UAPI header: "
+	"${KERNEL_DIR}/include/uapi/${PATH}")
+    endif()
+
+    set(CMAKE_REQUIRED_INCLUDES "${BUILD_INCLUDE}")
+  endif()
+
+  # Note: The RDMA kernel headers use sockaddr{_in,_in6,}/etc so we have to
+  # include system headers to define sockaddrs before testing any of them.
+  CHECK_C_SOURCE_COMPILES("
+ #include <sys/socket.h>
+ #include <netinet/in.h>
+ #include <${PATH}>
+${C_TEST}" "${HAVE}")
+
+  if(KERNEL_DIR)
+    if (NOT "${${HAVE}}")
+      # Run the compile test against the linked kernel header, this is to help
+      # make sure the compile tests work before the headers hit the distro
+      message(FATAL_ERROR "Kernel UAPI header failed compile test: "
+	"${PATH}")
+    endif()
+  else()
+    RDMA_DoFixup("${${HAVE}}" "${PATH}")
+    if (NOT "${${HAVE}}")
+      list(APPEND MISSING_HEADERS "${PATH}")
+      set(MISSING_HEADERS "${MISSING_HEADERS}" PARENT_SCOPE)
+    endif()
+  endif()
+endfunction()
+
+function(rdma_report_missing_kheaders)
+  foreach(I IN LISTS MISSING_HEADERS)
+    message(STATUS " ${I} NOT found (old system kernel headers)")
+  endforeach()
+endfunction()
+
+# This list is topologically sorted
+rdma_check_kheader("rdma/ib_user_verbs.h" "${DEFAULT_TEST}")
+rdma_check_kheader("rdma/ib_user_sa.h" "${DEFAULT_TEST}")
+rdma_check_kheader("rdma/ib_user_cm.h" "${DEFAULT_TEST}")
+rdma_check_kheader("rdma/ib_user_mad.h" "${DEFAULT_TEST}")
+rdma_check_kheader("rdma/rdma_netlink.h" "${DEFAULT_TEST}")
+rdma_check_kheader("rdma/rdma_user_cm.h" "${DEFAULT_TEST}")
+rdma_check_kheader("rdma/rdma_user_rxe.h" "${DEFAULT_TEST}")
-- 
2.1.4

* [PATCH rdma-core 2/4] Move rdma_netlink compat into CMake
From: Jason Gunthorpe @ 2016-10-27 23:06 UTC (permalink / raw)
  To: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Steve Wise
In-Reply-To: <1477609570-8087-1-git-send-email-jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>

Detect whether the distro's rdma_netlink.h is new enough and, if not,
replace it with the built-in copy. Also eliminate the two loose copies
of the header, ibacm/src/acm_netlink.h and iwpmd/iwpm_netlink.h.

The built-in copy is from kernel v4.8.

Signed-off-by: Jason Gunthorpe <jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
---
 buildlib/RDMA_LinuxHeaders.cmake           |   2 +-
 buildlib/fixup-include/rdma-rdma_netlink.h | 225 +++++++++++++++++++++++++++++
 ibacm/src/acm.c                            |   3 -
 ibacm/src/acm_netlink.h                    | 128 ----------------
 iwpmd/iwarp_pm.h                           |   2 +-
 iwpmd/iwarp_pm_common.c                    |   5 -
 iwpmd/iwarp_pm_server.c                    |   4 +-
 iwpmd/iwpm_netlink.h                       | 214 ---------------------------
 8 files changed, 229 insertions(+), 354 deletions(-)
 create mode 100644 buildlib/fixup-include/rdma-rdma_netlink.h
 delete mode 100644 ibacm/src/acm_netlink.h
 delete mode 100644 iwpmd/iwpm_netlink.h

Steve,

Can you check if the changes to iwpmd/iwarp_pm_server.c make sense?
Should we do something to fix the kernel header?
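
The iwpmd change hinges on the client numbering in the v4.8 rdma_netlink.h copy added below: `RDMA_NL_IWCM` is 2 and `RDMA_NL_IWCM+1` is `RDMA_NL_RSVD` (3), the slot the patch comments as the legacy `RDMA_NL_C4IW`. A netlink message type packs the client into bits 10-15 and the op into bits 0-9 (`RDMA_NL_GET_TYPE` and friends). A sketch of both, using hypothetical helper functions (`nl_type()`, `nl_client()`, `nl_op()`, `iwpm_clients()`) instead of the copied macros, which do not parenthesize their arguments (`RDMA_NL_GET_OP(a | b)` masks only `b`):

```c
#include <assert.h>
#include <stdint.h>

/* Client and op numbers copied from the v4.8 rdma_netlink.h added below. */
enum {
	RDMA_NL_RDMA_CM = 1,
	RDMA_NL_IWCM,		/* 2 */
	RDMA_NL_RSVD,		/* 3: RDMA_NL_IWCM + 1, legacy RDMA_NL_C4IW slot */
	RDMA_NL_LS,		/* 4 */
	RDMA_NL_I40IW,
};

enum {
	RDMA_NL_IWPM_REG_PID = 0,
	RDMA_NL_IWPM_ADD_MAPPING,
	RDMA_NL_IWPM_QUERY_MAPPING,
	RDMA_NL_IWPM_REMOVE_MAPPING,
	RDMA_NL_IWPM_REMOTE_INFO,	/* 4 */
};

/* Hypothetical helpers with the RDMA_NL_GET_TYPE/_CLIENT/_OP layout:
 * client in bits 10-15, op in bits 0-9. */
static inline uint16_t nl_type(unsigned int client, unsigned int op)
{
	assert(client < (1u << 6) && op < (1u << 10));
	return (uint16_t)((client << 10) + op);
}

static inline unsigned int nl_client(uint16_t type)
{
	return (type >> 10) & ((1u << 6) - 1);
}

static inline unsigned int nl_op(uint16_t type)
{
	return type & ((1u << 10) - 1);
}

/* Same client list as init_iwpm_clients() after this patch. */
static int iwpm_clients(uint32_t clients[2])
{
	clients[0] = RDMA_NL_IWCM;
	clients[1] = RDMA_NL_IWCM + 1;	/* legacy RDMA_NL_C4IW on old kernels */
	return 2;
}
```

The new compile test `return RDMA_NL_IWPM_REMOTE_INFO && RDMA_NL_IWCM;` only builds when both names are defined, so an older distro header without `RDMA_NL_IWCM` fails the check and the built-in copy is used instead.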

diff --git a/buildlib/RDMA_LinuxHeaders.cmake b/buildlib/RDMA_LinuxHeaders.cmake
index bd16d8deca72..c67b0a6113d2 100644
--- a/buildlib/RDMA_LinuxHeaders.cmake
+++ b/buildlib/RDMA_LinuxHeaders.cmake
@@ -80,6 +80,6 @@ rdma_check_kheader("rdma/ib_user_verbs.h" "${DEFAULT_TEST}")
 rdma_check_kheader("rdma/ib_user_sa.h" "${DEFAULT_TEST}")
 rdma_check_kheader("rdma/ib_user_cm.h" "${DEFAULT_TEST}")
 rdma_check_kheader("rdma/ib_user_mad.h" "${DEFAULT_TEST}")
-rdma_check_kheader("rdma/rdma_netlink.h" "${DEFAULT_TEST}")
+rdma_check_kheader("rdma/rdma_netlink.h" "int main(int argc,const char *argv[]) { return RDMA_NL_IWPM_REMOTE_INFO && RDMA_NL_IWCM; }")
 rdma_check_kheader("rdma/rdma_user_cm.h" "${DEFAULT_TEST}")
 rdma_check_kheader("rdma/rdma_user_rxe.h" "${DEFAULT_TEST}")
diff --git a/buildlib/fixup-include/rdma-rdma_netlink.h b/buildlib/fixup-include/rdma-rdma_netlink.h
new file mode 100644
index 000000000000..02fe8390c18f
--- /dev/null
+++ b/buildlib/fixup-include/rdma-rdma_netlink.h
@@ -0,0 +1,225 @@
+#ifndef _UAPI_RDMA_NETLINK_H
+#define _UAPI_RDMA_NETLINK_H
+
+#include <linux/types.h>
+
+enum {
+	RDMA_NL_RDMA_CM = 1,
+	RDMA_NL_IWCM,
+	RDMA_NL_RSVD,
+	RDMA_NL_LS,	/* RDMA Local Services */
+	RDMA_NL_I40IW,
+	RDMA_NL_NUM_CLIENTS
+};
+
+enum {
+	RDMA_NL_GROUP_CM = 1,
+	RDMA_NL_GROUP_IWPM,
+	RDMA_NL_GROUP_LS,
+	RDMA_NL_NUM_GROUPS
+};
+
+#define RDMA_NL_GET_CLIENT(type) ((type & (((1 << 6) - 1) << 10)) >> 10)
+#define RDMA_NL_GET_OP(type) (type & ((1 << 10) - 1))
+#define RDMA_NL_GET_TYPE(client, op) ((client << 10) + op)
+
+enum {
+	RDMA_NL_RDMA_CM_ID_STATS = 0,
+	RDMA_NL_RDMA_CM_NUM_OPS
+};
+
+enum {
+	RDMA_NL_RDMA_CM_ATTR_SRC_ADDR = 1,
+	RDMA_NL_RDMA_CM_ATTR_DST_ADDR,
+	RDMA_NL_RDMA_CM_NUM_ATTR,
+};
+
+/* iwarp port mapper op-codes */
+enum {
+	RDMA_NL_IWPM_REG_PID = 0,
+	RDMA_NL_IWPM_ADD_MAPPING,
+	RDMA_NL_IWPM_QUERY_MAPPING,
+	RDMA_NL_IWPM_REMOVE_MAPPING,
+	RDMA_NL_IWPM_REMOTE_INFO,
+	RDMA_NL_IWPM_HANDLE_ERR,
+	RDMA_NL_IWPM_MAPINFO,
+	RDMA_NL_IWPM_MAPINFO_NUM,
+	RDMA_NL_IWPM_NUM_OPS
+};
+
+struct rdma_cm_id_stats {
+	__u32	qp_num;
+	__u32	bound_dev_if;
+	__u32	port_space;
+	__s32	pid;
+	__u8	cm_state;
+	__u8	node_type;
+	__u8	port_num;
+	__u8	qp_type;
+};
+
+enum {
+	IWPM_NLA_REG_PID_UNSPEC = 0,
+	IWPM_NLA_REG_PID_SEQ,
+	IWPM_NLA_REG_IF_NAME,
+	IWPM_NLA_REG_IBDEV_NAME,
+	IWPM_NLA_REG_ULIB_NAME,
+	IWPM_NLA_REG_PID_MAX
+};
+
+enum {
+	IWPM_NLA_RREG_PID_UNSPEC = 0,
+	IWPM_NLA_RREG_PID_SEQ,
+	IWPM_NLA_RREG_IBDEV_NAME,
+	IWPM_NLA_RREG_ULIB_NAME,
+	IWPM_NLA_RREG_ULIB_VER,
+	IWPM_NLA_RREG_PID_ERR,
+	IWPM_NLA_RREG_PID_MAX
+
+};
+
+enum {
+	IWPM_NLA_MANAGE_MAPPING_UNSPEC = 0,
+	IWPM_NLA_MANAGE_MAPPING_SEQ,
+	IWPM_NLA_MANAGE_ADDR,
+	IWPM_NLA_MANAGE_MAPPED_LOC_ADDR,
+	IWPM_NLA_RMANAGE_MAPPING_ERR,
+	IWPM_NLA_RMANAGE_MAPPING_MAX
+};
+
+#define IWPM_NLA_MANAGE_MAPPING_MAX 3
+#define IWPM_NLA_QUERY_MAPPING_MAX  4
+#define IWPM_NLA_MAPINFO_SEND_MAX   3
+
+enum {
+	IWPM_NLA_QUERY_MAPPING_UNSPEC = 0,
+	IWPM_NLA_QUERY_MAPPING_SEQ,
+	IWPM_NLA_QUERY_LOCAL_ADDR,
+	IWPM_NLA_QUERY_REMOTE_ADDR,
+	IWPM_NLA_RQUERY_MAPPED_LOC_ADDR,
+	IWPM_NLA_RQUERY_MAPPED_REM_ADDR,
+	IWPM_NLA_RQUERY_MAPPING_ERR,
+	IWPM_NLA_RQUERY_MAPPING_MAX
+};
+
+enum {
+	IWPM_NLA_MAPINFO_REQ_UNSPEC = 0,
+	IWPM_NLA_MAPINFO_ULIB_NAME,
+	IWPM_NLA_MAPINFO_ULIB_VER,
+	IWPM_NLA_MAPINFO_REQ_MAX
+};
+
+enum {
+	IWPM_NLA_MAPINFO_UNSPEC = 0,
+	IWPM_NLA_MAPINFO_LOCAL_ADDR,
+	IWPM_NLA_MAPINFO_MAPPED_ADDR,
+	IWPM_NLA_MAPINFO_MAX
+};
+
+enum {
+	IWPM_NLA_MAPINFO_NUM_UNSPEC = 0,
+	IWPM_NLA_MAPINFO_SEQ,
+	IWPM_NLA_MAPINFO_SEND_NUM,
+	IWPM_NLA_MAPINFO_ACK_NUM,
+	IWPM_NLA_MAPINFO_NUM_MAX
+};
+
+enum {
+	IWPM_NLA_ERR_UNSPEC = 0,
+	IWPM_NLA_ERR_SEQ,
+	IWPM_NLA_ERR_CODE,
+	IWPM_NLA_ERR_MAX
+};
+
+/*
+ * Local service operations:
+ *   RESOLVE - The client requests the local service to resolve a path.
+ *   SET_TIMEOUT - The local service requests the client to set the timeout.
+ *   IP_RESOLVE - The client requests the local service to resolve an IP to GID.
+ */
+enum {
+	RDMA_NL_LS_OP_RESOLVE = 0,
+	RDMA_NL_LS_OP_SET_TIMEOUT,
+	RDMA_NL_LS_OP_IP_RESOLVE,
+	RDMA_NL_LS_NUM_OPS
+};
+
+/* Local service netlink message flags */
+#define RDMA_NL_LS_F_ERR	0x0100	/* Failed response */
+
+/*
+ * Local service resolve operation family header.
+ * The layout for the resolve operation:
+ *    nlmsg header
+ *    family header
+ *    attributes
+ */
+
+/*
+ * Local service path use:
+ * Specify how the path(s) will be used.
+ *   ALL - For connected CM operation (6 pathrecords)
+ *   UNIDIRECTIONAL - For unidirectional UD (1 pathrecord)
+ *   GMP - For miscellaneous GMP like operation (at least 1 reversible
+ *         pathrecord)
+ */
+enum {
+	LS_RESOLVE_PATH_USE_ALL = 0,
+	LS_RESOLVE_PATH_USE_UNIDIRECTIONAL,
+	LS_RESOLVE_PATH_USE_GMP,
+	LS_RESOLVE_PATH_USE_MAX
+};
+
+#define LS_DEVICE_NAME_MAX 64
+
+struct rdma_ls_resolve_header {
+	__u8 device_name[LS_DEVICE_NAME_MAX];
+	__u8 port_num;
+	__u8 path_use;
+};
+
+struct rdma_ls_ip_resolve_header {
+	__u32 ifindex;
+};
+
+/* Local service attribute type */
+#define RDMA_NLA_F_MANDATORY	(1 << 13)
+#define RDMA_NLA_TYPE_MASK	(~(NLA_F_NESTED | NLA_F_NET_BYTEORDER | \
+				  RDMA_NLA_F_MANDATORY))
+
+/*
+ * Local service attributes:
+ *   Attr Name       Size                       Byte order
+ *   -----------------------------------------------------
+ *   PATH_RECORD     struct ib_path_rec_data
+ *   TIMEOUT         u32                        cpu
+ *   SERVICE_ID      u64                        cpu
+ *   DGID            u8[16]                     BE
+ *   SGID            u8[16]                     BE
+ *   TCLASS          u8
+ *   PKEY            u16                        cpu
+ *   QOS_CLASS       u16                        cpu
+ *   IPV4            u32                        BE
+ *   IPV6            u8[16]                     BE
+ */
+enum {
+	LS_NLA_TYPE_UNSPEC = 0,
+	LS_NLA_TYPE_PATH_RECORD,
+	LS_NLA_TYPE_TIMEOUT,
+	LS_NLA_TYPE_SERVICE_ID,
+	LS_NLA_TYPE_DGID,
+	LS_NLA_TYPE_SGID,
+	LS_NLA_TYPE_TCLASS,
+	LS_NLA_TYPE_PKEY,
+	LS_NLA_TYPE_QOS_CLASS,
+	LS_NLA_TYPE_IPV4,
+	LS_NLA_TYPE_IPV6,
+	LS_NLA_TYPE_MAX
+};
+
+/* Local service DGID/SGID attribute: big endian */
+struct rdma_nla_ls_gid {
+	__u8		gid[16];
+};
+
+#endif /* _UAPI_RDMA_NETLINK_H */
diff --git a/ibacm/src/acm.c b/ibacm/src/acm.c
index cc7dd065f69c..5f4068f619b4 100644
--- a/ibacm/src/acm.c
+++ b/ibacm/src/acm.c
@@ -61,9 +61,6 @@
 #include <ccan/list.h>
 #include "acm_mad.h"
 #include "acm_util.h"
-#if !defined(RDMA_NL_LS_F_ERR)
-	#include "acm_netlink.h"
-#endif
 
 #define src_out     data[0]
 #define src_index   data[1]
diff --git a/ibacm/src/acm_netlink.h b/ibacm/src/acm_netlink.h
deleted file mode 100644
index 867ae8c838fc..000000000000
diff --git a/iwpmd/iwarp_pm.h b/iwpmd/iwarp_pm.h
index b5a5a457a423..fc09e4fd752a 100644
--- a/iwpmd/iwarp_pm.h
+++ b/iwpmd/iwarp_pm.h
@@ -53,7 +53,7 @@
 #include <syslog.h>
 #include <netlink/msg.h>
 #include <ccan/list.h>
-#include "iwpm_netlink.h"
+#include <rdma/rdma_netlink.h>
 
 #define IWARP_PM_PORT          3935
 #define IWARP_PM_VER_SHIFT     6
diff --git a/iwpmd/iwarp_pm_common.c b/iwpmd/iwarp_pm_common.c
index 58b1089a1998..941e0406ade7 100644
--- a/iwpmd/iwarp_pm_common.c
+++ b/iwpmd/iwarp_pm_common.c
@@ -33,11 +33,6 @@
 
 #include "iwarp_pm.h"
 
-/* Necessary only for SLES11 */
-#if !defined (NETLINK_RDMA)
-	#define NETLINK_RDMA	        20
-#endif
-
 /* iwpm config params */
 static const char * iwpm_param_names[IWPM_PARAM_NUM] =
 	{ "nl_sock_rbuf_size" };
diff --git a/iwpmd/iwarp_pm_server.c b/iwpmd/iwarp_pm_server.c
index ab90c6c4b077..ef541c8175ed 100644
--- a/iwpmd/iwarp_pm_server.c
+++ b/iwpmd/iwarp_pm_server.c
@@ -1214,8 +1214,8 @@ static int init_iwpm_clients(__u32 iwarp_clients[])
 {
 	int client_num = 2;
 
-	iwarp_clients[0] = RDMA_NL_NES;
-	iwarp_clients[1] = RDMA_NL_C4IW;
+	iwarp_clients[0] = RDMA_NL_IWCM;
+	iwarp_clients[1] = RDMA_NL_IWCM+1; /* Legacy RDMA_NL_C4IW for old kernels */
 
 	return client_num;
 }
diff --git a/iwpmd/iwpm_netlink.h b/iwpmd/iwpm_netlink.h
deleted file mode 100644
index 0edcb620de99..000000000000
-- 
2.1.4

* [PATCH rdma-core 3/4] verbs: Replace infiniband/sa-kern-abi.h with the kernel's uapi/rdma/ib_user_sa.h
From: Jason Gunthorpe @ 2016-10-27 23:06 UTC
  To: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1477609570-8087-1-git-send-email-jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>

Using the system header from the kernel is now the expected way to export
definitions to user space. This is a tree-wide update to switch from the
local header and deal with the name changes.

Unfortunately this was exposed as a public installed header, so for
now drop in a compat header with a #warning not to use it. Some
day we can delete it.

Applications are also expected to migrate to rdma/ib_user_sa.h as
their source for this information.

Signed-off-by: Jason Gunthorpe <jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
---
 libibcm/cm.c             |  8 ++++----
 libibverbs/marshall.c    |  4 ++--
 libibverbs/marshall.h    |  6 +++---
 libibverbs/sa-kern-abi.h | 34 +++++-----------------------------
 librdmacm/rdma_cma_abi.h |  4 ++--
 5 files changed, 16 insertions(+), 40 deletions(-)
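
The compat sa-kern-abi.h keeps old code compiling by mapping the legacy struct tags onto the kernel's name with `#define`. A self-contained sketch of that technique; the local `struct ib_user_path_rec` here is a stand-in copied from the members of the removed `ibv_kern_path_rec` (real code should include <rdma/ib_user_sa.h>), and `legacy_set_dgid()` is a hypothetical legacy caller:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in with the same members as the removed ibv_kern_path_rec.
 * Real code includes <rdma/ib_user_sa.h> instead. */
struct ib_user_path_rec {
	uint8_t  dgid[16];
	uint8_t  sgid[16];
	uint16_t dlid;
	uint16_t slid;
	uint32_t raw_traffic;
	uint32_t flow_label;
	uint32_t reversible;
	uint32_t mtu;
	uint16_t pkey;
	uint8_t  hop_limit;
	uint8_t  traffic_class;
	uint8_t  numb_path;
	uint8_t  sl;
	uint8_t  mtu_selector;
	uint8_t  rate_selector;
	uint8_t  rate;
	uint8_t  packet_life_time_selector;
	uint8_t  packet_life_time;
	uint8_t  preference;
};

/* The compat header's trick: legacy tags become macros for the new one. */
#define ib_kern_path_rec ib_user_path_rec
#define ibv_kern_path_rec ib_user_path_rec

/* Hypothetical legacy caller, still written against the old name. */
static void legacy_set_dgid(struct ibv_kern_path_rec *rec,
			    const uint8_t gid[16])
{
	assert(rec != NULL && gid != NULL);
	memcpy(rec->dgid, gid, sizeof(rec->dgid));
}
```

Because marshall.h and rdma_cma_abi.h now include <rdma/ib_user_sa.h> directly, only external users of infiniband/sa-kern-abi.h should see the #warning.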

diff --git a/libibcm/cm.c b/libibcm/cm.c
index f775923aa73c..5bc521be5e3a 100644
--- a/libibcm/cm.c
+++ b/libibcm/cm.c
@@ -380,8 +380,8 @@ int ib_cm_listen(struct ib_cm_id *cm_id,
 
 int ib_cm_send_req(struct ib_cm_id *cm_id, struct ib_cm_req_param *param)
 {
-	struct ibv_kern_path_rec p_path;
-	struct ibv_kern_path_rec *a_path;
+	struct ib_user_path_rec p_path;
+	struct ib_user_path_rec *a_path;
 	struct cm_abi_req *cmd;
 	void *msg;
 	int result;
@@ -646,7 +646,7 @@ int ib_cm_send_lap(struct ib_cm_id *cm_id,
 		   void *private_data,
 		   uint8_t private_data_len)
 {
-	struct ibv_kern_path_rec abi_path;
+	struct ib_user_path_rec abi_path;
 	struct cm_abi_lap *cmd;
 	void *msg;
 	int result;
@@ -673,7 +673,7 @@ int ib_cm_send_lap(struct ib_cm_id *cm_id,
 int ib_cm_send_sidr_req(struct ib_cm_id *cm_id,
 			struct ib_cm_sidr_req_param *param)
 {
-	struct ibv_kern_path_rec abi_path;
+	struct ib_user_path_rec abi_path;
 	struct cm_abi_sidr_req *cmd;
 	void *msg;
 	int result;
diff --git a/libibverbs/marshall.c b/libibverbs/marshall.c
index a33048404d35..5b0260832ca7 100644
--- a/libibverbs/marshall.c
+++ b/libibverbs/marshall.c
@@ -90,7 +90,7 @@ void ibv_copy_qp_attr_from_kern(struct ibv_qp_attr *dst,
 }
 
 void ibv_copy_path_rec_from_kern(struct ibv_sa_path_rec *dst,
-				 struct ibv_kern_path_rec *src)
+				 struct ib_user_path_rec *src)
 {
 	memcpy(dst->dgid.raw, src->dgid, sizeof dst->dgid);
 	memcpy(dst->sgid.raw, src->sgid, sizeof dst->sgid);
@@ -114,7 +114,7 @@ void ibv_copy_path_rec_from_kern(struct ibv_sa_path_rec *dst,
 	dst->packet_life_time_selector = src->packet_life_time_selector;
 }
 
-void ibv_copy_path_rec_to_kern(struct ibv_kern_path_rec *dst,
+void ibv_copy_path_rec_to_kern(struct ib_user_path_rec *dst,
 			       struct ibv_sa_path_rec *src)
 {
 	memcpy(dst->dgid, src->dgid.raw, sizeof src->dgid);
diff --git a/libibverbs/marshall.h b/libibverbs/marshall.h
index 8be76c5444d2..1dab1114a58c 100644
--- a/libibverbs/marshall.h
+++ b/libibverbs/marshall.h
@@ -36,7 +36,7 @@
 #include <infiniband/verbs.h>
 #include <infiniband/sa.h>
 #include <infiniband/kern-abi.h>
-#include <infiniband/sa-kern-abi.h>
+#include <rdma/ib_user_sa.h>
 
 #ifdef __cplusplus
 #  define BEGIN_C_DECLS extern "C" {
@@ -55,9 +55,9 @@ void ibv_copy_ah_attr_from_kern(struct ibv_ah_attr *dst,
 				struct ibv_kern_ah_attr *src);
 
 void ibv_copy_path_rec_from_kern(struct ibv_sa_path_rec *dst,
-				 struct ibv_kern_path_rec *src);
+				 struct ib_user_path_rec *src);
 
-void ibv_copy_path_rec_to_kern(struct ibv_kern_path_rec *dst,
+void ibv_copy_path_rec_to_kern(struct ib_user_path_rec *dst,
 			       struct ibv_sa_path_rec *src);
 
 END_C_DECLS
diff --git a/libibverbs/sa-kern-abi.h b/libibverbs/sa-kern-abi.h
index 4927d114ea0f..134aeccb4a0a 100644
--- a/libibverbs/sa-kern-abi.h
+++ b/libibverbs/sa-kern-abi.h
@@ -1,6 +1,4 @@
 /*
- * Copyright (c) 2005 Intel Corporation.  All rights reserved.
- *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
  * General Public License (GPL) Version 2, available from the file
@@ -33,33 +31,11 @@
 #ifndef INFINIBAND_SA_KERN_ABI_H
 #define INFINIBAND_SA_KERN_ABI_H
 
-#include <linux/types.h>
+#warning "This header is obsolete, use rdma/ib_user_sa.h instead"
 
-/*
- * Obsolete, deprecated names.  Will be removed in libibverbs 1.1.
- */
-#define ib_kern_path_rec	ibv_kern_path_rec
+#include <rdma/ib_user_sa.h>
 
-struct ibv_kern_path_rec {
-	__u8  dgid[16];
-	__u8  sgid[16];
-	__u16 dlid;
-	__u16 slid;
-	__u32 raw_traffic;
-	__u32 flow_label;
-	__u32 reversible;
-	__u32 mtu;
-	__u16 pkey;
-	__u8  hop_limit;
-	__u8  traffic_class;
-	__u8  numb_path;
-	__u8  sl;
-	__u8  mtu_selector;
-	__u8  rate_selector;
-	__u8  rate;
-	__u8  packet_life_time_selector;
-	__u8  packet_life_time;
-	__u8  preference;
-};
+#define ib_kern_path_rec ib_user_path_rec
+#define ibv_kern_path_rec ib_user_path_rec
 
-#endif /* INFINIBAND_SA_KERN_ABI_H */
+#endif
diff --git a/librdmacm/rdma_cma_abi.h b/librdmacm/rdma_cma_abi.h
index b72f33080b42..71b93f888cc8 100644
--- a/librdmacm/rdma_cma_abi.h
+++ b/librdmacm/rdma_cma_abi.h
@@ -34,7 +34,7 @@
 #define RDMA_CMA_ABI_H
 
 #include <infiniband/kern-abi.h>
-#include <infiniband/sa-kern-abi.h>
+#include <rdma/ib_user_sa.h>
 #include <infiniband/sa.h>
 
 /*
@@ -173,7 +173,7 @@ struct ucma_abi_query {
 
 struct ucma_abi_query_route_resp {
 	__u64 node_guid;
-	struct ibv_kern_path_rec ib_route[2];
+	struct ib_user_path_rec ib_route[2];
 	struct sockaddr_in6 src_addr;
 	struct sockaddr_in6 dst_addr;
 	__u32 num_paths;
-- 
2.1.4

* [PATCH rdma-core 4/4] ibcm: Replace infiniband/cm_abi.h with the kernel's uapi/rdma/ib_user_cm.h
From: Jason Gunthorpe @ 2016-10-27 23:06 UTC
  To: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA; +Cc: Sean Hefty
In-Reply-To: <1477609570-8087-1-git-send-email-jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>

Using the system header from the kernel is now the expected way to export
definitions to user space.

Unfortunately this was exposed as a public installed header, so for
now drop in a compat header with a #warning not to use it. Some
day we can delete it.

Signed-off-by: Jason Gunthorpe <jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
---
 libibcm/cm.c     |  70 ++++++-----
 libibcm/cm_abi.h | 351 +++++++------------------------------------------------
 2 files changed, 83 insertions(+), 338 deletions(-)
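
Patch 4 keeps ABI 4 working by defining the old `ESTABLISH` command and its one-field payload locally in `cm_establish()`. A minimal sketch of the resulting message layout, a command header followed by the payload, with stand-in types copied from the definitions removed from cm_abi.h and a hypothetical `build_notify()` helper. It assumes `in` carries the payload size and `out` the response size, which the field names suggest but this patch does not show:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define IB_USER_CM_MIN_ABI_VERSION 4
#define IB_USER_CM_MAX_ABI_VERSION 5

/* Stand-ins mirroring the definitions removed from cm_abi.h. */
enum {
	IB_USER_CM_CMD_CREATE_ID,
	IB_USER_CM_CMD_DESTROY_ID,
	IB_USER_CM_CMD_ATTR_ID,
	IB_USER_CM_CMD_LISTEN,
	IB_USER_CM_CMD_NOTIFY,			/* == 4 */
};
#define IB_USER_CM_CMD_ESTABLISH IB_USER_CM_CMD_NOTIFY	/* ABI 4 name */

struct ib_ucm_cmd_hdr {
	uint32_t cmd;
	uint16_t in;	/* assumed: payload size */
	uint16_t out;	/* assumed: response size, 0 when none */
};

struct cm_abi_establish {	/* ABI 4 payload */
	uint32_t id;
};

struct ib_ucm_notify {		/* ABI 5 payload */
	uint32_t id;
	uint32_t event;
};

/* Hypothetical helper: write header + payload into buf and return the
 * message length, or 0 if abi_ver is unsupported or buf is too small. */
static size_t build_notify(void *buf, size_t len, int abi_ver,
			   uint32_t id, uint32_t event)
{
	struct ib_ucm_cmd_hdr hdr = { 0 };
	struct cm_abi_establish est = { .id = id };
	struct ib_ucm_notify ntf = { .id = id, .event = event };
	const void *payload;
	size_t plen;

	assert(buf != NULL);
	if (abi_ver < IB_USER_CM_MIN_ABI_VERSION ||
	    abi_ver > IB_USER_CM_MAX_ABI_VERSION)
		return 0;

	if (abi_ver == 4) {
		/* ABI 4 has no event field, so event is dropped. */
		hdr.cmd = IB_USER_CM_CMD_ESTABLISH;
		payload = &est;
		plen = sizeof(est);
	} else {
		hdr.cmd = IB_USER_CM_CMD_NOTIFY;
		payload = &ntf;
		plen = sizeof(ntf);
	}
	hdr.in = (uint16_t)plen;

	if (sizeof(hdr) + plen > len)
		return 0;
	memcpy(buf, &hdr, sizeof(hdr));
	memcpy((char *)buf + sizeof(hdr), payload, plen);
	return sizeof(hdr) + plen;
}
```

libibcm itself builds such messages with the `CM_CREATE_MSG_CMD()` macro, which sizes them as `sizeof(*hdr) + sizeof(*cmd)` and allocates with `alloca()`; the caller-supplied buffer here just keeps the sketch testable.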

diff --git a/libibcm/cm.c b/libibcm/cm.c
index 5bc521be5e3a..433d6e88f8ac 100644
--- a/libibcm/cm.c
+++ b/libibcm/cm.c
@@ -45,7 +45,7 @@
 #include <stddef.h>
 
 #include <infiniband/cm.h>
-#include <infiniband/cm_abi.h>
+#include <rdma/ib_user_cm.h>
 #include <infiniband/driver.h>
 #include <infiniband/marshall.h>
 
@@ -53,6 +53,9 @@
 
 #define PFX "libibcm: "
 
+#define IB_USER_CM_MIN_ABI_VERSION     4
+#define IB_USER_CM_MAX_ABI_VERSION     5
+
 static int abi_ver;
 static pthread_mutex_t mut = PTHREAD_MUTEX_INITIALIZER;
 
@@ -69,7 +72,7 @@ static inline int ERR(int err)
 
 #define CM_CREATE_MSG_CMD_RESP(msg, cmd, resp, type, size) \
 do {                                        \
-	struct cm_abi_cmd_hdr *hdr;         \
+	struct ib_ucm_cmd_hdr *hdr;         \
                                             \
 	size = sizeof(*hdr) + sizeof(*cmd); \
 	msg = alloca(size);                 \
@@ -89,7 +92,7 @@ do {                                        \
 
 #define CM_CREATE_MSG_CMD(msg, cmd, type, size) \
 do {                                        \
-	struct cm_abi_cmd_hdr *hdr;         \
+	struct ib_ucm_cmd_hdr *hdr;         \
                                             \
 	size = sizeof(*hdr) + sizeof(*cmd); \
 	msg = alloca(size);                 \
@@ -244,8 +247,8 @@ err:	ib_cm_free_id(cm_id_priv);
 int ib_cm_create_id(struct ib_cm_device *device,
 		    struct ib_cm_id **cm_id, void *context)
 {
-	struct cm_abi_create_id_resp *resp;
-	struct cm_abi_create_id *cmd;
+	struct ib_ucm_create_id_resp *resp;
+	struct ib_ucm_create_id *cmd;
 	struct cm_id_private *cm_id_priv;
 	void *msg;
 	int result;
@@ -274,8 +277,8 @@ err:	ib_cm_free_id(cm_id_priv);
 
 int ib_cm_destroy_id(struct ib_cm_id *cm_id)
 {
-	struct cm_abi_destroy_id_resp *resp;
-	struct cm_abi_destroy_id *cmd;
+	struct ib_ucm_destroy_id_resp *resp;
+	struct ib_ucm_destroy_id *cmd;
 	struct cm_id_private *cm_id_priv;
 	void *msg;
 	int result;
@@ -303,8 +306,8 @@ int ib_cm_destroy_id(struct ib_cm_id *cm_id)
 
 int ib_cm_attr_id(struct ib_cm_id *cm_id, struct ib_cm_attr_param *param)
 {
-	struct cm_abi_attr_id_resp *resp;
-	struct cm_abi_attr_id *cmd;
+	struct ib_ucm_attr_id_resp *resp;
+	struct ib_ucm_attr_id *cmd;
 	void *msg;
 	int result;
 	int size;
@@ -333,7 +336,7 @@ int ib_cm_init_qp_attr(struct ib_cm_id *cm_id,
 		       int *qp_attr_mask)
 {
 	struct ibv_kern_qp_attr *resp;
-	struct cm_abi_init_qp_attr *cmd;
+	struct ib_ucm_init_qp_attr *cmd;
 	void *msg;
 	int result;
 	int size;
@@ -361,7 +364,7 @@ int ib_cm_listen(struct ib_cm_id *cm_id,
 		 uint64_t service_id,
 		 uint64_t service_mask)
 {
-	struct cm_abi_listen *cmd;
+	struct ib_ucm_listen *cmd;
 	void *msg;
 	int result;
 	int size;
@@ -382,7 +385,7 @@ int ib_cm_send_req(struct ib_cm_id *cm_id, struct ib_cm_req_param *param)
 {
 	struct ib_user_path_rec p_path;
 	struct ib_user_path_rec *a_path;
-	struct cm_abi_req *cmd;
+	struct ib_ucm_req *cmd;
 	void *msg;
 	int result;
 	int size;
@@ -433,7 +436,7 @@ int ib_cm_send_req(struct ib_cm_id *cm_id, struct ib_cm_req_param *param)
 
 int ib_cm_send_rep(struct ib_cm_id *cm_id, struct ib_cm_rep_param *param)
 {
-	struct cm_abi_rep *cmd;
+	struct ib_ucm_rep *cmd;
 	void *msg;
 	int result;
 	int size;
@@ -471,7 +474,7 @@ static inline int cm_send_private_data(struct ib_cm_id *cm_id,
 				       void *private_data,
 				       uint8_t private_data_len)
 {
-	struct cm_abi_private_data *cmd;
+	struct ib_ucm_private_data *cmd;
 	void *msg;
 	int result;
 	int size;
@@ -517,6 +520,14 @@ int ib_cm_send_drep(struct ib_cm_id *cm_id,
 
 static int cm_establish(struct ib_cm_id *cm_id)
 {
+	/* Kernel ABI 5 repurposed the ABI 4 ESTABLISH command as NOTIFY and
+	   gave it an extra field. For some reason the compat definitions were
+	   deleted from the uapi headers :( */
+#define IB_USER_CM_CMD_ESTABLISH IB_USER_CM_CMD_NOTIFY
+	struct cm_abi_establish { /* ABI 4 support */
+		__u32 id;
+	};
+
 	struct cm_abi_establish *cmd;
 	void *msg;
 	int result;
@@ -534,7 +545,7 @@ static int cm_establish(struct ib_cm_id *cm_id)
 
 int ib_cm_notify(struct ib_cm_id *cm_id, enum ibv_event_type event)
 {
-	struct cm_abi_notify *cmd;
+	struct ib_ucm_notify *cmd;
 	void *msg;
 	int result;
 	int size;
@@ -565,7 +576,7 @@ static inline int cm_send_status(struct ib_cm_id *cm_id,
 				 void *private_data,
 				 uint8_t private_data_len)
 {
-	struct cm_abi_info *cmd;
+	struct ib_ucm_info *cmd;
 	void *msg;
 	int result;
 	int size;
@@ -620,7 +631,7 @@ int ib_cm_send_mra(struct ib_cm_id *cm_id,
 		   void *private_data,
 		   uint8_t private_data_len)
 {
-	struct cm_abi_mra *cmd;
+	struct ib_ucm_mra *cmd;
 	void *msg;
 	int result;
 	int size;
@@ -647,7 +658,7 @@ int ib_cm_send_lap(struct ib_cm_id *cm_id,
 		   uint8_t private_data_len)
 {
 	struct ib_user_path_rec abi_path;
-	struct cm_abi_lap *cmd;
+	struct ib_ucm_lap *cmd;
 	void *msg;
 	int result;
 	int size;
@@ -674,7 +685,7 @@ int ib_cm_send_sidr_req(struct ib_cm_id *cm_id,
 			struct ib_cm_sidr_req_param *param)
 {
 	struct ib_user_path_rec abi_path;
-	struct cm_abi_sidr_req *cmd;
+	struct ib_ucm_sidr_req *cmd;
 	void *msg;
 	int result;
 	int size;
@@ -686,7 +697,6 @@ int ib_cm_send_sidr_req(struct ib_cm_id *cm_id,
 	cmd->id             = cm_id->handle;
 	cmd->sid            = param->service_id;
 	cmd->timeout        = param->timeout_ms;
-	cmd->pkey           = param->path->pkey;
 	cmd->max_cm_retries = param->max_cm_retries;
 
 	ibv_copy_path_rec_to_kern(&abi_path, param->path);
@@ -707,7 +717,7 @@ int ib_cm_send_sidr_req(struct ib_cm_id *cm_id,
 int ib_cm_send_sidr_rep(struct ib_cm_id *cm_id,
 			struct ib_cm_sidr_rep_param *param)
 {
-	struct cm_abi_sidr_rep *cmd;
+	struct ib_ucm_sidr_rep *cmd;
 	void *msg;
 	int result;
 	int size;
@@ -739,7 +749,7 @@ int ib_cm_send_sidr_rep(struct ib_cm_id *cm_id,
 }
 
 static void cm_event_req_get(struct ib_cm_req_event_param *ureq,
-			     struct cm_abi_req_event_resp *kreq)
+			     struct ib_ucm_req_event_resp *kreq)
 {
 	ureq->remote_ca_guid             = kreq->remote_ca_guid;
 	ureq->remote_qkey                = kreq->remote_qkey;
@@ -763,7 +773,7 @@ static void cm_event_req_get(struct ib_cm_req_event_param *ureq,
 }
 
 static void cm_event_rep_get(struct ib_cm_rep_event_param *urep,
-			     struct cm_abi_rep_event_resp *krep)
+			     struct ib_ucm_rep_event_resp *krep)
 {
 	urep->remote_ca_guid      = krep->remote_ca_guid;
 	urep->remote_qkey         = krep->remote_qkey;
@@ -779,7 +789,7 @@ static void cm_event_rep_get(struct ib_cm_rep_event_param *urep,
 }
 
 static void cm_event_sidr_rep_get(struct ib_cm_sidr_rep_event_param *urep,
-				  struct cm_abi_sidr_rep_event_resp *krep)
+				  struct ib_ucm_sidr_rep_event_resp *krep)
 {
 	urep->status = krep->status;
 	urep->qkey   = krep->qkey;
@@ -789,9 +799,9 @@ static void cm_event_sidr_rep_get(struct ib_cm_sidr_rep_event_param *urep,
 int ib_cm_get_event(struct ib_cm_device *device, struct ib_cm_event **event)
 {
 	struct cm_id_private *cm_id_priv;
-	struct cm_abi_cmd_hdr *hdr;
-	struct cm_abi_event_get *cmd;
-	struct cm_abi_event_resp *resp;
+	struct ib_ucm_cmd_hdr *hdr;
+	struct ib_ucm_event_get *cmd;
+	struct ib_ucm_event_resp *resp;
 	struct ib_cm_event *evt = NULL;
 	struct ibv_sa_path_rec *path_a = NULL;
 	struct ibv_sa_path_rec *path_b = NULL;
@@ -861,7 +871,7 @@ int ib_cm_get_event(struct ib_cm_device *device, struct ib_cm_event **event)
 	evt->cm_id = (void *) (uintptr_t) resp->uid;
 	evt->event = resp->event;
 
-	if (resp->present & CM_ABI_PRES_PRIMARY) {
+	if (resp->present & IB_UCM_PRES_PRIMARY) {
 		path_a = malloc(sizeof(*path_a));
 		if (!path_a) {
 			result = ERR(ENOMEM);
@@ -869,7 +879,7 @@ int ib_cm_get_event(struct ib_cm_device *device, struct ib_cm_event **event)
 		}
 	}
 
-	if (resp->present & CM_ABI_PRES_ALTERNATE) {
+	if (resp->present & IB_UCM_PRES_ALTERNATE) {
 		path_b = malloc(sizeof(*path_b));
 		if (!path_b) {
 			result = ERR(ENOMEM);
@@ -940,7 +950,7 @@ int ib_cm_get_event(struct ib_cm_device *device, struct ib_cm_event **event)
 		break;
 	}
 
-	if (resp->present & CM_ABI_PRES_DATA) {
+	if (resp->present & IB_UCM_PRES_DATA) {
 		evt->private_data = data;
 		data = NULL;
 	}
diff --git a/libibcm/cm_abi.h b/libibcm/cm_abi.h
index 8fd10dd56150..8b76dc1fba78 100644
--- a/libibcm/cm_abi.h
+++ b/libibcm/cm_abi.h
@@ -1,7 +1,4 @@
 /*
- * Copyright (c) 2005 Topspin Communications.  All rights reserved.
- * Copyright (c) 2005 Intel Corporation.  All rights reserved.
- *
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
  * General Public License (GPL) Version 2, available from the file
@@ -29,310 +26,48 @@
  * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
  * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
  * SOFTWARE.
- *
- * $Id$
- */
-
-#ifndef CM_ABI_H
-#define CM_ABI_H
-
-#include <linux/types.h>
-#include <infiniband/sa.h>
-#include <infiniband/marshall.h>
-
-/*
- * This file must be kept in sync with the kernel's version of ib_user_cm.h
  */
 
-#define IB_USER_CM_MIN_ABI_VERSION	4
-#define IB_USER_CM_MAX_ABI_VERSION	5
-
-enum {
-	IB_USER_CM_CMD_CREATE_ID,
-	IB_USER_CM_CMD_DESTROY_ID,
-	IB_USER_CM_CMD_ATTR_ID,
-
-	IB_USER_CM_CMD_LISTEN,
-	IB_USER_CM_CMD_NOTIFY,
-	IB_USER_CM_CMD_ESTABLISH = IB_USER_CM_CMD_NOTIFY, /* ABI 4 support */
-	
-	IB_USER_CM_CMD_SEND_REQ,
-	IB_USER_CM_CMD_SEND_REP,
-	IB_USER_CM_CMD_SEND_RTU,
-	IB_USER_CM_CMD_SEND_DREQ,
-	IB_USER_CM_CMD_SEND_DREP,
-	IB_USER_CM_CMD_SEND_REJ,
-	IB_USER_CM_CMD_SEND_MRA,
-	IB_USER_CM_CMD_SEND_LAP,
-	IB_USER_CM_CMD_SEND_APR,
-	IB_USER_CM_CMD_SEND_SIDR_REQ,
-	IB_USER_CM_CMD_SEND_SIDR_REP,
-
-	IB_USER_CM_CMD_EVENT,
-	IB_USER_CM_CMD_INIT_QP_ATTR,
-};
-/*
- * command ABI structures.
- */
-struct cm_abi_cmd_hdr {
-	__u32 cmd;
-	__u16 in;
-	__u16 out;
-};
-
-struct cm_abi_create_id {
-	__u64 uid;
-	__u64 response;
-};
-
-struct cm_abi_create_id_resp {
-	__u32 id;
-};
-
-struct cm_abi_destroy_id {
-	__u64 response;
-	__u32 id;
-	__u32 reserved;
-};
-
-struct cm_abi_destroy_id_resp {
-	__u32 events_reported;
-};
-
-struct cm_abi_attr_id {
-	__u64 response;
-	__u32 id;
-	__u32 reserved;
-};
-
-struct cm_abi_attr_id_resp {
-	__u64 service_id;
-	__u64 service_mask;
-	__u32 local_id;
-	__u32 remote_id;
-};
-
-struct cm_abi_init_qp_attr {
-	__u64 response;
-	__u32 id;
-	__u32 qp_state;
-};
-
-struct cm_abi_listen {
-	__u64 service_id;
-	__u64 service_mask;
-	__u32 id;
-	__u32 reserved;
-};
-
-struct cm_abi_establish {	/* ABI 4 support */
-	__u32 id;
-};
-
-struct cm_abi_notify {
-	__u32 id;
-	__u32 event;
-};
-
-struct cm_abi_private_data {
-	__u64 data;
-	__u32 id;
-	__u8  len;
-	__u8  reserved[3];
-};
-
-struct cm_abi_req {
-	__u32 id;
-	__u32 qpn;
-	__u32 qp_type;
-	__u32 psn;
-	__u64 sid;
-	__u64 data;
-	__u64 primary_path;
-	__u64 alternate_path;
-	__u8  len;
-	__u8  peer_to_peer;
-	__u8  responder_resources;
-	__u8  initiator_depth;
-	__u8  remote_cm_response_timeout;
-	__u8  flow_control;
-	__u8  local_cm_response_timeout;
-	__u8  retry_count;
-	__u8  rnr_retry_count;
-	__u8  max_cm_retries;
-	__u8  srq;
-	__u8  reserved[5];
-};
-
-struct cm_abi_rep {
-	__u64 uid;
-	__u64 data;
-	__u32 id;
-	__u32 qpn;
-	__u32 psn;
-	__u8  len;
-	__u8  responder_resources;
-	__u8  initiator_depth;
-	__u8  target_ack_delay;
-	__u8  failover_accepted;
-	__u8  flow_control;
-	__u8  rnr_retry_count;
-	__u8  srq;
-	__u8  reserved[4];
-};
-
-struct cm_abi_info {
-	__u32 id;
-	__u32 status;
-	__u64 info;
-	__u64 data;
-	__u8  info_len;
-	__u8  data_len;
-	__u8  reserved[6];
-};
-
-struct cm_abi_mra {
-	__u64 data;
-	__u32 id;
-	__u8  len;
-	__u8  timeout;
-	__u8  reserved[2];
-};
-
-struct cm_abi_lap {
-	__u64 path;
-	__u64 data;
-	__u32 id;
-	__u8  len;
-	__u8  reserved[3];
-};
-
-struct cm_abi_sidr_req {
-	__u32 id;
-	__u32 timeout;
-	__u64 sid;
-	__u64 data;
-	__u64 path;
-	__u16 pkey;
-	__u8  len;
-	__u8  max_cm_retries;
-	__u8  reserved[4];
-};
-
-struct cm_abi_sidr_rep {
-	__u32 id;
-	__u32 qpn;
-	__u32 qkey;
-	__u32 status;
-	__u64 info;
-	__u64 data;
-	__u8  info_len;
-	__u8  data_len;
-	__u8  reserved[6];
-};
-/*
- * event notification ABI structures.
- */
-struct cm_abi_event_get {
-	__u64 response;
-	__u64 data;
-	__u64 info;
-	__u8  data_len;
-	__u8  info_len;
-	__u8  reserved[6];
-};
-
-struct cm_abi_req_event_resp {
-	struct ibv_kern_path_rec primary_path;
-	struct ibv_kern_path_rec alternate_path;
-	__u64                  remote_ca_guid;
-	__u32                  remote_qkey;
-	__u32                  remote_qpn;
-	__u32                  qp_type;
-	__u32                  starting_psn;
-	__u8  responder_resources;
-	__u8  initiator_depth;
-	__u8  local_cm_response_timeout;
-	__u8  flow_control;
-	__u8  remote_cm_response_timeout;
-	__u8  retry_count;
-	__u8  rnr_retry_count;
-	__u8  srq;
-	__u8  port;
-	__u8  reserved[7];
-};
-
-struct cm_abi_rep_event_resp {
-	__u64 remote_ca_guid;
-	__u32 remote_qkey;
-	__u32 remote_qpn;
-	__u32 starting_psn;
-	__u8  responder_resources;
-	__u8  initiator_depth;
-	__u8  target_ack_delay;
-	__u8  failover_accepted;
-	__u8  flow_control;
-	__u8  rnr_retry_count;
-	__u8  srq;
-	__u8  reserved[5];
-};
-
-struct cm_abi_rej_event_resp {
-	__u32 reason;
-	/* ari in cm_abi_event_get info field. */
-};
-
-struct cm_abi_mra_event_resp {
-	__u8  timeout;
-	__u8  reserved[3];
-};
-
-struct cm_abi_lap_event_resp {
-	struct ibv_kern_path_rec path;
-};
-
-struct cm_abi_apr_event_resp {
-	__u32 status;
-	/* apr info in cm_abi_event_get info field. */
-};
-
-struct cm_abi_sidr_req_event_resp {
-	__u16 pkey;
-	__u8  port;
-	__u8  reserved;
-};
-
-struct cm_abi_sidr_rep_event_resp {
-	__u32 status;
-	__u32 qkey;
-	__u32 qpn;
-	/* info in cm_abi_event_get info field. */
-};
-
-#define CM_ABI_PRES_DATA      0x01
-#define CM_ABI_PRES_INFO      0x02
-#define CM_ABI_PRES_PRIMARY   0x04
-#define CM_ABI_PRES_ALTERNATE 0x08
-
-struct cm_abi_event_resp {
-	__u64 uid;
-	__u32 id;
-	__u32 event;
-	__u32 present;
-	__u32 reserved;
-	union {
-		struct cm_abi_req_event_resp req_resp;
-		struct cm_abi_rep_event_resp rep_resp;
-		struct cm_abi_rej_event_resp rej_resp;
-		struct cm_abi_mra_event_resp mra_resp;
-		struct cm_abi_lap_event_resp lap_resp;
-		struct cm_abi_apr_event_resp apr_resp;
-
-		struct cm_abi_sidr_req_event_resp sidr_req_resp;
-		struct cm_abi_sidr_rep_event_resp sidr_rep_resp;
-
-		__u32                             send_status;
-	} u;
-};
-
-#endif /* CM_ABI_H */
+#ifndef INFINIBAND_CM_ABI_H
+#define INFINIBAND_CM_ABI_H
+
+#warning "This header is obsolete, use rdma/ib_user_cm.h instead"
+
+#include <rdma/ib_user_cm.h>
+
+#define cm_abi_cmd_hdr ib_ucm_cmd_hdr
+#define cm_abi_create_id ib_ucm_create_id
+#define cm_abi_create_id_resp ib_ucm_create_id_resp
+#define cm_abi_destroy_id ib_ucm_destroy_id
+#define cm_abi_destroy_id_resp ib_ucm_destroy_id_resp
+#define cm_abi_attr_id ib_ucm_attr_id
+#define cm_abi_attr_id_resp ib_ucm_attr_id_resp
+#define cm_abi_init_qp_attr ib_ucm_init_qp_attr
+#define cm_abi_listen ib_ucm_listen
+#define cm_abi_establish ib_ucm_establish
+#define cm_abi_notify ib_ucm_notify
+#define cm_abi_private_data ib_ucm_private_data
+#define cm_abi_req ib_ucm_req
+#define cm_abi_rep ib_ucm_rep
+#define cm_abi_info ib_ucm_info
+#define cm_abi_mra ib_ucm_mra
+#define cm_abi_lap ib_ucm_lap
+#define cm_abi_sidr_req ib_ucm_sidr_req
+#define cm_abi_sidr_rep ib_ucm_sidr_rep
+#define cm_abi_event_get ib_ucm_event_get
+#define cm_abi_req_event_resp ib_ucm_req_event_resp
+#define cm_abi_rep_event_resp ib_ucm_rep_event_resp
+#define cm_abi_rej_event_resp ib_ucm_rej_event_resp
+#define cm_abi_mra_event_resp ib_ucm_mra_event_resp
+#define cm_abi_lap_event_resp ib_ucm_lap_event_resp
+#define cm_abi_apr_event_resp ib_ucm_apr_event_resp
+#define cm_abi_sidr_req_event_resp ib_ucm_sidr_req_event_resp
+#define cm_abi_sidr_rep_event_resp ib_ucm_sidr_rep_event_resp
+#define cm_abi_event_resp ib_ucm_event_resp
+
+#define CM_ABI_PRES_DATA IB_UCM_PRES_DATA
+#define CM_ABI_PRES_INFO IB_UCM_PRES_INFO
+#define CM_ABI_PRES_PRIMARY IB_UCM_PRES_PRIMARY
+#define CM_ABI_PRES_ALTERNATE IB_UCM_PRES_ALTERNATE
+
+#endif
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* Re: [PATCH 2/3] iopmem : Add a block device driver for PCIe attached IO memory.
From: Christoph Hellwig @ 2016-10-28  6:45 UTC (permalink / raw)
  To: Stephen Bates
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-nvdimm-y27Ovi1pjclAfugRpC6u6w,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-block-u79uwXL29TY76Z2rM5mHXA,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
	dan.j.williams-ral2JQCrhuEAvxtiuMwx3w,
	ross.zwisler-VuQAYsv1563Yd54FQh9/CA, willy-VuQAYsv1563Yd54FQh9/CA,
	jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/,
	haggaie-VPRAkNaXOzVWk0Htik3J/w, hch-wEGCiKHe2LqWVfeAwA7xHQ,
	axboe-b10kYP2dOMg, corbet-T1hC0tSOHrs,
	jim.macdonald-FgSLVYC75IpWk0Htik3J/w,
	sbates-Rgftl6RXld5BDgjK7y7TUQ, logang-OTvnGxWRz7hWk0Htik3J/w
In-Reply-To: <1476826937-20665-3-git-send-email-sbates-pv7U853sEMVWk0Htik3J/w@public.gmane.org>

> Signed-off-by: Stephen Bates <sbates-pv7U853sEMVWk0Htik3J/w@public.gmane.org>

FYI, that address has bounced throughout the whole thread for me, so
I'm replacing it with a known good one for now.


> + * This driver is heavily based on drivers/block/pmem.c.
> + * Copyright (c) 2014, Intel Corporation.
> + * Copyright (C) 2007 Nick Piggin
> + * Copyright (C) 2007 Novell Inc.

Is there anything left of it, actually?  I didn't spot anything
obvious.  Never mind that we don't have a file with that name anymore :)

> +  /*
> +   * We can only access the iopmem device with full 32-bit word
> +   * accesses which cannot be gaurantee'd by the regular memcpy
> +   */

Odd comment formatting. 

> +static void memcpy_from_iopmem(void *dst, const void *src, size_t sz)
> +{
> +	u64 *wdst = dst;
> +	const u64 *wsrc = src;
> +	u64 tmp;
> +
> +	while (sz >= sizeof(*wdst)) {
> +		*wdst++ = *wsrc++;
> +		sz -= sizeof(*wdst);
> +	}
> +
> +	if (!sz)
> +		return;
> +
> +	tmp = *wsrc;
> +	memcpy(wdst, &tmp, sz);
> +}

And then we do a memcpy here anyway.  And there's no volatile whatsoever,
so the compiler could do anything to it.  I definitely feel a bit uneasy
about having this in the driver as well.  Can we define the exact
semantics for this and provide it at the system level, possibly in an
arch-specific way?
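To make the "full 32-bit word accesses" requirement concrete, here is a userspace sketch. It keeps the quoted helper's structure (a word loop, then one full-word read for the tail), but it reads through a volatile pointer so the compiler emits each load as written, which in practice means one aligned 32-bit load each. It also uses 32-bit words to match the comment, whereas the quoted code uses u64. The function name is hypothetical. Inside the kernel, the MMIO accessors such as ioread32() would be the more natural building block.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not the driver's API): copy sz bytes out of
 * device memory using only full, aligned 32-bit reads. volatile stops
 * the compiler from merging, splitting or dropping the loads.
 * Assumes src is 4-byte aligned and readable up to the next word
 * boundary past sz.
 */
static void copy_from_io_words(void *dst, const volatile void *src, size_t sz)
{
	const volatile uint32_t *wsrc = src;
	unsigned char *bdst = dst;
	uint32_t w;

	while (sz >= sizeof(w)) {
		w = *wsrc++;                 /* one full-word device read */
		memcpy(bdst, &w, sizeof(w)); /* dst needs no alignment */
		bdst += sizeof(w);
		sz -= sizeof(w);
	}

	if (sz) {
		/* Tail: still do a full-word read, keep only sz bytes. */
		w = *wsrc;
		memcpy(bdst, &w, sz);
	}
}
```

Only the source side needs special treatment. The destination is ordinary RAM, so memcpy() is fine there.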

> +static void iopmem_do_bvec(struct iopmem_device *iopmem, struct page *page,
> +			   unsigned int len, unsigned int off, bool is_write,
> +			   sector_t sector)
> +{
> +	phys_addr_t iopmem_off = sector * 512;
> +	void *iopmem_addr = iopmem->virt_addr + iopmem_off;
> +
> +	if (!is_write) {
> +		read_iopmem(page, off, iopmem_addr, len);
> +		flush_dcache_page(page);
> +	} else {
> +		flush_dcache_page(page);
> +		write_iopmem(iopmem_addr, page, off, len);
> +	}

How about moving the address and offset calculation, as well as the
cache flushing, into read_iopmem/write_iopmem and removing this function?
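Below is a userspace model of that suggestion. The names are hypothetical: struct toy_iopmem and the toy_* helpers are not the driver's code. The sector-to-offset translation and the bounds check live in one helper that both the read path and the write path call, so no separate do_bvec function is needed. Comments mark where the kernel version would call flush_dcache_page().

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace stand-in for struct iopmem_device. */
struct toy_iopmem {
	unsigned char *virt_addr; /* start of mapped device memory */
	size_t size;              /* device size in bytes */
};

#define TOY_SECTOR_SHIFT 9 /* 512-byte sectors, i.e. sector * 512 */

/* Translate (sector, len) into a device pointer, or NULL if the
 * range does not fit. Shared by the read and write paths. */
static unsigned char *toy_iopmem_addr(struct toy_iopmem *dev,
				      unsigned long long sector, size_t len)
{
	unsigned long long off = sector << TOY_SECTOR_SHIFT;

	if (off > dev->size || len > dev->size - off)
		return NULL;
	return dev->virt_addr + off;
}

static int toy_read_iopmem(struct toy_iopmem *dev, unsigned char *page,
			   size_t page_off, size_t len,
			   unsigned long long sector)
{
	unsigned char *src = toy_iopmem_addr(dev, sector, len);

	if (!src)
		return -1;
	memcpy(page + page_off, src, len);
	/* the kernel version would flush_dcache_page(page) here */
	return 0;
}

static int toy_write_iopmem(struct toy_iopmem *dev, const unsigned char *page,
			    size_t page_off, size_t len,
			    unsigned long long sector)
{
	unsigned char *dst = toy_iopmem_addr(dev, sector, len);

	if (!dst)
		return -1;
	/* the kernel version would flush_dcache_page(page) first */
	memcpy(dst, page + page_off, len);
	return 0;
}
```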

> +static blk_qc_t iopmem_make_request(struct request_queue *q, struct bio *bio)
> +{
> +	struct iopmem_device *iopmem = q->queuedata;
> +	struct bio_vec bvec;
> +	struct bvec_iter iter;
> +
> +	bio_for_each_segment(bvec, bio, iter) {
> +		iopmem_do_bvec(iopmem, bvec.bv_page, bvec.bv_len,
> +			    bvec.bv_offset, op_is_write(bio_op(bio)),
> +			    iter.bi_sector);

op_is_write just checks the data direction.  I'd feel much more
comfortable with a switch on the op, e.g.

	switch (bio_op(bio)) {
	case REQ_OP_READ:
		bio_for_each_segment(bvec, bio, iter)
			read_iopmem(iopmem, bvec, iter.bi_sector);
		break;
	case REQ_OP_WRITE:
		bio_for_each_segment(bvec, bio, iter)
			write_iopmem(iopmem, bvec, iter.bi_sector);
		break;
	default:
		WARN_ON_ONCE(1);
		bio->bi_error = -EIO;
		break;
	}

> +static long iopmem_direct_access(struct block_device *bdev, sector_t sector,
> +			       void **kaddr, pfn_t *pfn, long size)
> +{
> +	struct iopmem_device *iopmem = bdev->bd_queue->queuedata;
> +	resource_size_t offset = sector * 512;
> +
> +	if (!iopmem)
> +		return -ENODEV;

I don't think this can ever happen, can it?

> +static DEFINE_IDA(iopmem_instance_ida);
> +static DEFINE_SPINLOCK(ida_lock);
> +
> +static int iopmem_set_instance(struct iopmem_device *iopmem)
> +{
> +	int instance, error;
> +
> +	do {
> +		if (!ida_pre_get(&iopmem_instance_ida, GFP_KERNEL))
> +			return -ENODEV;
> +
> +		spin_lock(&ida_lock);
> +		error = ida_get_new(&iopmem_instance_ida, &instance);
> +		spin_unlock(&ida_lock);
> +
> +	} while (error == -EAGAIN);
> +
> +	if (error)
> +		return -ENODEV;
> +
> +	iopmem->instance = instance;
> +	return 0;
> +}
> +
> +static void iopmem_release_instance(struct iopmem_device *iopmem)
> +{
> +	spin_lock(&ida_lock);
> +	ida_remove(&iopmem_instance_ida, iopmem->instance);
> +	spin_unlock(&ida_lock);
> +}
> +

Just use ida_simple_get/ida_simple_remove instead to take care
of the locking and preloading, and get rid of these two functions.


> +static int iopmem_attach_disk(struct iopmem_device *iopmem)
> +{
> +	struct gendisk *disk;
> +	int nid = dev_to_node(iopmem->dev);
> +	struct request_queue *q = iopmem->queue;
> +
> +	blk_queue_write_cache(q, true, true);

You don't handle flush commands or the fua bit in make_request, so
this setting seems wrong.

> +	int err = 0;
> +	int nid = dev_to_node(&pdev->dev);
> +
> +	if (pci_enable_device_mem(pdev) < 0) {

propagate the actual error code, please.


^ permalink raw reply

* Re: [PATCH 3/3] iopmem : Add documentation for iopmem driver
From: Christoph Hellwig @ 2016-10-28  6:46 UTC (permalink / raw)
  To: Stephen Bates
  Cc: jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/,
	sbates-Rgftl6RXld5BDgjK7y7TUQ, haggaie-VPRAkNaXOzVWk0Htik3J/w,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linux-nvdimm-y27Ovi1pjclAfugRpC6u6w, corbet-T1hC0tSOHrs,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	jim.macdonald-FgSLVYC75IpWk0Htik3J/w,
	linux-block-u79uwXL29TY76Z2rM5mHXA,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg, axboe-b10kYP2dOMg,
	hch-wEGCiKHe2LqWVfeAwA7xHQ
In-Reply-To: <1476826937-20665-4-git-send-email-sbates-pv7U853sEMVWk0Htik3J/w@public.gmane.org>

I'd say please fold this into the previous patch.

^ permalink raw reply

* Re: [PATCH rdma-core 3/4] verbs: Replace infiniband/sa-kern-abi.h with the kernel's uapi/rdma/ib_user_sa.h
From: Christoph Hellwig @ 2016-10-28  6:53 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1477609570-8087-4-git-send-email-jgunthorpe-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>

I can't see how this is supposed to work, there is no copy at all
of ib_user_sa.h in the tree.

Having to rely on system headers is a sure way to make the build
break most of the time.

What we need is a canonical copy of the kernel headers in the rdma-core
tree, with the option of just pointing to a kernel tree instead.

E.g. by default use headers from rdma-core/kernel/headers, but
optionally allow the build systems to use those from a kernel tree
explicitly specified.

^ permalink raw reply

* Re: [RFC ABI V5 07/10] IB/core: Support getting IOCTL header/SGEs from kernel space
From: Christoph Hellwig @ 2016-10-28  6:59 UTC (permalink / raw)
  To: Matan Barak
  Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Doug Ledford, Jason Gunthorpe,
	Sean Hefty, Christoph Lameter, Liran Liss, Haggai Eran,
	Majd Dibbiny, Tal Alon, Leon Romanovsky
In-Reply-To: <1477579398-6875-8-git-send-email-matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

> +	mm_segment_t currentfs = get_fs();
>  
>  	if (!ib_dev)
>  		return -EIO;
> @@ -240,8 +242,10 @@ static long ib_uverbs_cmd_verbs(struct ib_device *ib_dev,
>  		goto out;
>  	}
>  
> +	set_fs(oldfs);
>  	err = uverbs_handle_action(buf, ctx->uattrs, hdr->num_attrs, ib_dev,
>  				   file, action, ctx->uverbs_attr_array);
> +	set_fs(currentfs);

Adding this magic in new code is not acceptable.  Any given API
must take either a kernel or a user pointer.

^ permalink raw reply

* Re: [PATCH rdma-core 1/7] libhns: Add initial main frame
From: oulijun @ 2016-10-28  7:28 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linuxarm-hv44wF8Li93QT0dZR+AlfA
In-Reply-To: <20161027145139.GD6818-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>

On 2016/10/27 22:51, Jason Gunthorpe wrote:
> On Thu, Oct 27, 2016 at 11:41:35AM +0800, oulijun wrote:
> 
>> when startup by DT, the content of device/modalias is
>> of:NinfinibandT<NULL>Chisilicon,hns-roce-v1.  it is long and
>> complex.
> 
> If you want to match the hardware then properly parsing the mod alias
> is the right way to do it.
> 
>> the content of device/of_node/compatible is hisilicon,hns-roce-v1
> 
> No, it is more complex than that, future DTs may have a list for
> instance, just a string match is not good enough
> 
>> when startup by APCI, the content of device/modalias is
>> acpi:HISI00D1:
> 
> Also may be more complex..
> 
>>   if (ibv_read_sysfs-file(uverbs_sys_path, "device/vendor", value, sizeof(value)) > 0)
>> 	...
>>   if (ibv_read_sysfs-file(uverbs_sys_path, "device/vendor", value, sizeof(value)) > 0)
>>  	...
> 
> You can also match PCI with modalias.
> 
>>> But I wonder if this isn't generically better to be
>>>
>>>  last_dir(readlink("device/driver")) == "hns"
>>>
> 
>> I think it is not insteaded. because it will be find the hns in the
>> path(device/driver)
> 
> I said readlink, which will return something like '../../../../bus/platform/drivers/hhns'
> 
> So just detect the driver is calld hhns and let the kernel deal with
> figuring it out.
> 
> Jason
> 
> .
> 
Hi, Jason
  I have tried it according to your advice and looked into readlink(). I think it needs
a symlink to exist, as follows:
   xx -> /xxxx/../../

but there is none:
root@(none)$ cd /sys/class/infiniband/hns_0/device/driver/module/driver
root@(none)$ ls
platform:hns_roce

I tried to read the result of readlink(), but it fails:
readlink("/sys/class/infiniband/hns_0/device/driver/module/driver", buf, sizeof *buf)

In addition, last() does not exist in the C library.
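For reference, Jason's suggested last_dir(readlink("device/driver")) can be written with plain POSIX calls. Call readlink() on the device's own driver symlink, not on device/driver/module/driver, which is a directory. Then take the text after the last '/'. The sketch below is minimal, and the helper name sysfs_driver_name() is hypothetical. readlink() does not NUL-terminate its output, so the code does that explicitly. Also, `sizeof *buf` is 1 for a char buffer, so the length argument in the call above would limit the result to a single byte.

```c
#define _POSIX_C_SOURCE 200809L
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: resolve <dev_dir>/driver and copy the last
 * path component of the link target into name.
 * Returns 0 on success, -1 on error.
 */
static int sysfs_driver_name(const char *dev_dir, char *name, size_t name_len)
{
	char link_path[PATH_MAX];
	char target[PATH_MAX];
	const char *base;
	ssize_t n;
	int r;

	r = snprintf(link_path, sizeof(link_path), "%s/driver", dev_dir);
	if (r < 0 || (size_t)r >= sizeof(link_path))
		return -1;

	/* readlink() does not NUL-terminate, so leave room for it. */
	n = readlink(link_path, target, sizeof(target) - 1);
	if (n < 0)
		return -1;
	target[n] = '\0';

	/* "last_dir": everything after the final '/'. */
	base = strrchr(target, '/');
	base = base ? base + 1 : target;

	if (strlen(base) >= name_len)
		return -1;
	strcpy(name, base);
	return 0;
}
```

On the hip06 system shown later in this thread, passing "/sys/class/infiniband/hns_0/device" should yield "hns_roce". The caller can then compare that string against the expected driver name.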

Lijun Ou



^ permalink raw reply

* Re: [PATCH rdma-core 1/7] libhns: Add initial main frame
From: oulijun @ 2016-10-28  7:59 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	linuxarm-hv44wF8Li93QT0dZR+AlfA
In-Reply-To: <20161027145139.GD6818-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>

On 2016/10/27 22:51, Jason Gunthorpe wrote:
> On Thu, Oct 27, 2016 at 11:41:35AM +0800, oulijun wrote:
> 
>> when startup by DT, the content of device/modalias is
>> of:NinfinibandT<NULL>Chisilicon,hns-roce-v1.  it is long and
>> complex.
> 
> If you want to match the hardware then properly parsing the mod alias
> is the right way to do it.
> 
>> the content of device/of_node/compatible is hisilicon,hns-roce-v1
> 
> No, it is more complex than that, future DTs may have a list for
> instance, just a string match is not good enough
> 
>> when startup by APCI, the content of device/modalias is
>> acpi:HISI00D1:
> 
> Also may be more complex..
> 
>>   if (ibv_read_sysfs-file(uverbs_sys_path, "device/vendor", value, sizeof(value)) > 0)
>> 	...
>>   if (ibv_read_sysfs-file(uverbs_sys_path, "device/vendor", value, sizeof(value)) > 0)
>>  	...
> 
> You can also match PCI with modalias.
> 
>>> But I wonder if this isn't generically better to be
>>>
>>>  last_dir(readlink("device/driver")) == "hns"
>>>
> 
>> I think it is not insteaded. because it will be find the hns in the
>> path(device/driver)
> 
> I said readlink, which will return something like '../../../../bus/platform/drivers/hhns'
> 
> So just detect the driver is calld hhns and let the kernel deal with
> figuring it out.
> 
> Jason
> 
> .
> 
Hi, Jason
  My understanding in my previous email was wrong. The link really does exist, as
follows:

root@(none)$ cd /sys/class/infiniband/hns_0/device/
root@(none)$ ls -l
total 0
lrwxrwxrwx    1 root     root             0 Oct 27 11:07 driver -> ../../../../bus/platform/drivers/hns_roce

But I think it is the standard approach, because my device (hip06) is only a platform device, while the other devices (hip07/hip0x) will
be PCIe devices, so they will be distinguished separately.
Hence, we adopt the original approach.

Lijun Ou


^ permalink raw reply

* [PATCH] IBcore/CM: Issue DREQ when receiving REQ/REP for stale QP
From: Hans Westgaard Ry @ 2016-10-28 11:14 UTC (permalink / raw)
  To: Doug Ledford, Sean Hefty, Hal Rosenstock, Matan Barak,
	Erez Shitrit, Bart Van Assche, Ira Weiny, Or Gerlitz, Hakon Bugge,
	Yuval Shaia, linux-rdma, linux-kernel

From the InfiniBand Architecture Specification, Volume 1:

  A QP is said to have a stale connection when only one side has
  connection information. A stale connection may result if the remote CM
  had dropped the connection and sent a DREQ but the DREQ was never
  received by the local CM. Alternatively the remote CM may have lost
  all record of past connections because its node crashed and rebooted,
  while the local CM did not become aware of the remote node's reboot
  and therefore did not clean up stale connections.

and:

   A local CM may receive a REQ/REP for a stale connection. It shall
   abort the connection issuing REJ to the REQ/REP. It shall then issue
   DREQ with "DREQ:remote QPN" set to the remote QPN from the REQ/REP.

This patch solves a problem with QPN reuse. The current code base (that
is, IPoIB) relies on a reap mechanism to clean up the structures in the
CM. The problem with this is the time constants governing the
mechanism: they are up to 768 seconds, and the interface may look
unresponsive during that period. Issuing a DREQ (and receiving a DREP)
does the necessary cleanup, and the interface comes up.
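The stale-connection rule quoted from the specification says to REJ the REQ/REP and then send a DREQ carrying the remote QPN. That rule can be modeled in a few lines of userspace C. This is a toy sketch, not the kernel CM: the table, enum and function names are hypothetical. It also assumes the DREQ/DREP exchange succeeds, so the stale entry is freed at once instead of waiting out the timewait/reap period.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; this is not the kernel CM's API. */
enum toy_cm_action {
	TOY_CM_ACCEPT,           /* new connection recorded */
	TOY_CM_REJ_THEN_DREQ,    /* stale: REJ the REQ, then DREQ old conn */
	TOY_CM_REJ_NO_RESOURCES, /* connection table full */
};

#define TOY_CM_MAX_CONNS 8

struct toy_cm {
	struct {
		uint32_t remote_qpn;
		int in_use;
	} conns[TOY_CM_MAX_CONNS];
};

static enum toy_cm_action toy_cm_handle_req(struct toy_cm *cm,
					    uint32_t remote_qpn,
					    uint32_t *dreq_remote_qpn)
{
	size_t i, free_slot = TOY_CM_MAX_CONNS;

	for (i = 0; i < TOY_CM_MAX_CONNS; i++) {
		if (!cm->conns[i].in_use) {
			if (free_slot == TOY_CM_MAX_CONNS)
				free_slot = i;
			continue;
		}
		if (cm->conns[i].remote_qpn == remote_qpn) {
			/* Stale: this remote QPN is still bound locally.
			 * The DREQ carries it in "DREQ:remote QPN". */
			*dreq_remote_qpn = remote_qpn;
			/* Model assumes DREQ/DREP completes and frees the
			 * entry now, rather than after a reap timeout. */
			cm->conns[i].in_use = 0;
			return TOY_CM_REJ_THEN_DREQ;
		}
	}
	if (free_slot == TOY_CM_MAX_CONNS)
		return TOY_CM_REJ_NO_RESOURCES;
	cm->conns[free_slot].in_use = 1;
	cm->conns[free_slot].remote_qpn = remote_qpn;
	return TOY_CM_ACCEPT;
}
```

The model shows why the patch helps. The REQ that hits the stale entry is still rejected, but the peer's next attempt succeeds immediately. Without the DREQ, the local side would keep rejecting that peer until the reap timer fires.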

Signed-off-by: Hans Westgaard Ry <hans.westgaard.ry@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
---
 drivers/infiniband/core/cm.c | 24 +++++++++++++++++++++++-
 1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index c995255..c97e4d5 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -1519,6 +1519,7 @@ static struct cm_id_private * cm_match_req(struct cm_work *work,
 	struct cm_id_private *listen_cm_id_priv, *cur_cm_id_priv;
 	struct cm_timewait_info *timewait_info;
 	struct cm_req_msg *req_msg;
+	struct ib_cm_id *cm_id;
 
 	req_msg = (struct cm_req_msg *)work->mad_recv_wc->recv_buf.mad;
 
@@ -1540,10 +1541,18 @@ static struct cm_id_private * cm_match_req(struct cm_work *work,
 	timewait_info = cm_insert_remote_qpn(cm_id_priv->timewait_info);
 	if (timewait_info) {
 		cm_cleanup_timewait(cm_id_priv->timewait_info);
+		cur_cm_id_priv = cm_get_id(timewait_info->work.local_id,
+					   timewait_info->work.remote_id);
+
 		spin_unlock_irq(&cm.lock);
 		cm_issue_rej(work->port, work->mad_recv_wc,
 			     IB_CM_REJ_STALE_CONN, CM_MSG_RESPONSE_REQ,
 			     NULL, 0);
+		if (cur_cm_id_priv) {
+			cm_id = &cur_cm_id_priv->id;
+			ib_send_cm_dreq(cm_id, NULL, 0);
+			cm_deref_id(cur_cm_id_priv);
+		}
 		return NULL;
 	}
 
@@ -1919,6 +1928,9 @@ static int cm_rep_handler(struct cm_work *work)
 	struct cm_id_private *cm_id_priv;
 	struct cm_rep_msg *rep_msg;
 	int ret;
+	struct cm_id_private *cur_cm_id_priv;
+	struct ib_cm_id *cm_id;
+	struct cm_timewait_info *timewait_info;
 
 	rep_msg = (struct cm_rep_msg *)work->mad_recv_wc->recv_buf.mad;
 	cm_id_priv = cm_acquire_id(rep_msg->remote_comm_id, 0);
@@ -1953,16 +1965,26 @@ static int cm_rep_handler(struct cm_work *work)
 		goto error;
 	}
 	/* Check for a stale connection. */
-	if (cm_insert_remote_qpn(cm_id_priv->timewait_info)) {
+	timewait_info = cm_insert_remote_qpn(cm_id_priv->timewait_info);
+	if (timewait_info) {
 		rb_erase(&cm_id_priv->timewait_info->remote_id_node,
 			 &cm.remote_id_table);
 		cm_id_priv->timewait_info->inserted_remote_id = 0;
+		cur_cm_id_priv = cm_get_id(timewait_info->work.local_id,
+					   timewait_info->work.remote_id);
+
 		spin_unlock(&cm.lock);
 		spin_unlock_irq(&cm_id_priv->lock);
 		cm_issue_rej(work->port, work->mad_recv_wc,
 			     IB_CM_REJ_STALE_CONN, CM_MSG_RESPONSE_REP,
 			     NULL, 0);
 		ret = -EINVAL;
+		if (cur_cm_id_priv) {
+			cm_id = &cur_cm_id_priv->id;
+			ib_send_cm_dreq(cm_id, NULL, 0);
+			cm_deref_id(cur_cm_id_priv);
+		}
+
 		goto error;
 	}
 	spin_unlock(&cm.lock);
-- 
2.7.4

^ permalink raw reply related

* Re: [PATCH rdma-rc 01/12] IB/mlx5: Replace numerical constant with predefined MACRO
From: Or Gerlitz @ 2016-10-28 12:54 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Max Gurtovoy
In-Reply-To: <1477575407-20562-2-git-send-email-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>

On Thu, Oct 27, 2016 at 4:36 PM, Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
> From: Max Gurtovoy <maxg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>
> Replace the pre-defined macro signifying inline umr instead
> of the numerical constant.

Leon,

By no means is this a fix. If you have been telling the VMware and
Broadcom people here for months how to write their code, let's stop
for a minute, look in the mirror, and drop this patch.

Or.

^ permalink raw reply

* Re: [PATCH rdma-rc 3/6] IB/core: Set routable RoCE gid type for ipv4/ipv6 networks
From: Or Gerlitz @ 2016-10-28 12:57 UTC (permalink / raw)
  To: Mark Bloch, Maor Gottlieb
  Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <1477575391-20134-4-git-send-email-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>

On Thu, Oct 27, 2016 at 4:36 PM, Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
> From: Mark Bloch <markb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

M/M, spelling...

> If the underlying netowrk type is ipv4 or ipv6 and the device supports

netowrk...

> routable RoCE, prefer it so the traffic could cross subnets.
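Here is a minimal sketch of the selection rule this commit message describes. The enums are hypothetical stand-ins, not the kernel's ib_gid_type or network-type definitions. The sketch chooses the routable, UDP/IP-encapsulated RoCE v2 GID type when the network is IPv4 or IPv6 and the device supports it. Otherwise it falls back to a non-routable type, assumed here to be RoCE v1.

```c
#include <stdbool.h>

/* Hypothetical enums, not the kernel's definitions. */
enum toy_gid_type { TOY_GID_ROCE_V1, TOY_GID_ROCE_V2 };
enum toy_net_type { TOY_NET_ETH_L2, TOY_NET_IPV4, TOY_NET_IPV6 };

static enum toy_gid_type toy_pick_gid_type(enum toy_net_type net,
					   bool dev_supports_v2)
{
	/* RoCE v2 is carried over UDP/IP, so routers can forward it
	 * across subnets; prefer it for IP networks when possible. */
	if ((net == TOY_NET_IPV4 || net == TOY_NET_IPV6) && dev_supports_v2)
		return TOY_GID_ROCE_V2;
	return TOY_GID_ROCE_V1;
}
```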

^ permalink raw reply

* Re: [PATCH rdma-rc 5/6] IB/core: Save QP in ib_flow structure
From: Or Gerlitz @ 2016-10-28 13:00 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <1477575391-20134-6-git-send-email-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>

On Thu, Oct 27, 2016 at 4:36 PM, Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:

> This bug wasn't seen in the wild because there are no kernel consumers
> currently in the kernel.

Indeed, it's nice to avoid a future bug, but there's no point in putting
it into an rc fix, as no new kernel consumers are going to be added in
4.9, agreed? It's a -next thing.

^ permalink raw reply

* Re: [PATCH rdma-rc 12/12] IB/mlx5: Limit mkey page size to 2GB
From: Or Gerlitz @ 2016-10-28 13:02 UTC (permalink / raw)
  To: Majd Dibbiny
  Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Maor Gottlieb
In-Reply-To: <1477575407-20562-13-git-send-email-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>

On Thu, Oct 27, 2016 at 4:36 PM, Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
> From: Majd Dibbiny <majd-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>
> The maximum page size in the mkey context is 2GB.
>
> Until today, we didn't enforce this requirement in the code,
> and therefore, if we got a page size larger than 2GB, we

Got a page size larger than 2GB? Which pages are those?

> have passed zeros in the log_page_shift instead of the actual value
> and the registration failed.
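The clamp the commit message describes can be sketched as follows. 2GB is 2^31 bytes, so any page shift above 31 is capped at 31. The constant and function names are hypothetical. Clamping is only safe if the shift was derived from the contiguity and alignment of the registered region, which this sketch assumes. Under that assumption it holds because a region aligned to 2^k bytes is also aligned to every smaller power of two.

```c
/* Hypothetical constant: 2GB == 1 << 31 is the largest page size the
 * mkey context accepts, per the commit message. */
#define TOY_MAX_MKEY_PAGE_SHIFT 31

static unsigned int toy_mkey_page_shift(unsigned int page_shift)
{
	/* Describe larger blocks as several 2GB pages instead of
	 * passing an out-of-range value to the hardware. */
	return page_shift > TOY_MAX_MKEY_PAGE_SHIFT ? TOY_MAX_MKEY_PAGE_SHIFT
						    : page_shift;
}
```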

^ permalink raw reply

* Re: [PATCH rdma-rc 08/12] IB/mlx5: Wait for all async command completions to complete
From: Or Gerlitz @ 2016-10-28 13:04 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Doug Ledford, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <1477575407-20562-9-git-send-email-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>

On Thu, Oct 27, 2016 at 4:36 PM, Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:


> +static void wait_for_async_commands(struct mlx5_ib_dev *dev)
> +{
> +       struct mlx5_mr_cache *cache = &dev->cache;
> +       struct mlx5_cache_ent *ent;
> +       int total = 0;
> +       int i;
> +       int j;
> +
> +       for (i = 0; i < MAX_MR_CACHE_ENTRIES; i++) {
> +               ent = &cache->ent[i];
> +               for (j = 0 ; j < 1000; j++) {
> +                       if (!ent->pending)
> +                               break;
> +                       msleep(50);
> +               }

You had another patch in this series which changes a hard-coded
constant into a define, so why does this patch add two new hard-coded
constants? All in all, we're not making progress on the
no-hard-coded-constants front... better decide where you want to go.
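One way to address this review point is to give the two magic numbers names. Below is a userspace sketch of the same bounded polling loop with named constants. The macro names are hypothetical. nanosleep() stands in for the kernel's msleep(), and a C11 atomic stands in for ent->pending. As in the quoted code, the budget is 1000 polls of 50 ms, which comes to roughly 50 seconds per cache entry.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

#define TOY_ASYNC_WAIT_TRIES    1000 /* hypothetical name for "1000" */
#define TOY_ASYNC_WAIT_SLEEP_MS 50   /* hypothetical name for "50" */

/* Poll until *pending reaches zero or the retry budget is used up.
 * Returns true if all pending commands completed. */
static bool toy_wait_pending(atomic_int *pending)
{
	const struct timespec delay = {
		.tv_sec = 0,
		.tv_nsec = TOY_ASYNC_WAIT_SLEEP_MS * 1000000L,
	};
	int i;

	for (i = 0; i < TOY_ASYNC_WAIT_TRIES; i++) {
		if (atomic_load(pending) == 0)
			return true;
		nanosleep(&delay, NULL); /* stands in for msleep() */
	}
	return atomic_load(pending) == 0;
}
```

Returning a status also lets the caller warn when the budget runs out, rather than continuing silently.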

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox