* Re: Enabling peer to peer device transactions for PCIe devices
From: Serguei Sagalovitch @ 2016-11-28 14:48 UTC (permalink / raw)
To: Haggai Eran, Jason Gunthorpe, Christian König
Cc: Bridgman, John,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-nvdimm-hn68Rpc1hR1g9hUCZPvPmw@public.gmane.org,
Kuehling, Felix,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org,
Blinzer, Paul, Suthikulpanit, Suravee,
linux-pci-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
Deucher, Alexander, Max Gurtovoy, Sander, Ben,
Linux-media-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <d9e064a0-9c47-3e41-3154-cece8c70a119-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
On 2016-11-27 09:02 AM, Haggai Eran wrote
> On PeerDirect, we have some kind of a middle-ground solution for pinning
> GPU memory. We create a non-ODP MR pointing to VRAM but rely on
> user-space and the GPU not to migrate it. If they do, the MR gets
> destroyed immediately. This should work on legacy devices without ODP
> support, and allows the system to safely terminate a process that
> misbehaves. The downside of course is that it cannot transparently
> migrate memory but I think for user-space RDMA doing that transparently
> requires hardware support for paging, via something like HMM.
>
> ...
May be I am wrong but my understanding is that PeerDirect logic basically
follow "RDMA register MR" logic so basically nothing prevent to "terminate"
process for "MMU notifier" case when we are very low on memory
not making it similar (not worse) then PeerDirect case.
>> I'm hearing most people say ZONE_DEVICE is the way to handle this,
>> which means the missing remaing piece for RDMA is some kind of DMA
>> core support for p2p address translation..
> Yes, this is definitely something we need. I think Will Davis's patches
> are a good start.
>
> Another thing I think is that while HMM is good for user-space
> applications, for kernel p2p use there is no need for that.
About HMM: I do not think that in the current form HMM would fit in
requirement for generic P2P transfer case. My understanding is that at
the current stage HMM is good for "caching" system memory
in device memory for fast GPU access but in RDMA MR non-ODP case
it will not work because the location of memory should not be
changed so memory should be allocated directly in PCIe memory.
> Using ZONE_DEVICE with or without something like DMA-BUF to pin and unpin
> pages for the short duration as you wrote above could work fine for
> kernel uses in which we can guarantee they are short.
Potentially there is another issue related to pin/unpin. If memory could
be used a lot of time then there is no sense to rebuild and program
s/g tables each time if location of memory was not changed.
^ permalink raw reply
* [PATCH 2/2] IB/rxe: Hold refs when running tasklets
From: Andrew Boyer @ 2016-11-28 14:37 UTC (permalink / raw)
To: monis-VPRAkNaXOzVWk0Htik3J/w, yonatanc-VPRAkNaXOzVWk0Htik3J/w,
linux-rdma-u79uwXL29TY76Z2rM5mHXA
Cc: Andrew Boyer
In-Reply-To: <1480343844-8381-1-git-send-email-andrew.boyer-8PEkshWhKlo@public.gmane.org>
It might be possible for all of a QP's references to be dropped
while one of that QP's tasklets is running.
For example, the completer might run during QP destroy.
If qp->valid is false, it will drop all of the packets on
the resp_pkts list, potentially removing the last reference.
Then it tries to advance the SQ consumer pointer. If the
SQ's buffer has already been destroyed, the system will
panic.
To be safe, hold a reference on the QP for the duration
of each tasklet.
Signed-off-by: Andrew Boyer <andrew.boyer-8PEkshWhKlo@public.gmane.org>
---
drivers/infiniband/sw/rxe/rxe_comp.c | 4 ++++
drivers/infiniband/sw/rxe/rxe_req.c | 4 ++++
drivers/infiniband/sw/rxe/rxe_resp.c | 3 +++
3 files changed, 11 insertions(+)
diff --git a/drivers/infiniband/sw/rxe/rxe_comp.c b/drivers/infiniband/sw/rxe/rxe_comp.c
index 6c5e29d..3687fcd 100644
--- a/drivers/infiniband/sw/rxe/rxe_comp.c
+++ b/drivers/infiniband/sw/rxe/rxe_comp.c
@@ -510,6 +510,8 @@ int rxe_completer(void *arg)
struct rxe_pkt_info *pkt = NULL;
enum comp_state state;
+ rxe_add_ref(qp);
+
if (!qp->valid) {
while ((skb = skb_dequeue(&qp->resp_pkts))) {
rxe_drop_ref(qp);
@@ -739,11 +741,13 @@ int rxe_completer(void *arg)
/* we come here if we are done with processing and want the task to
* exit from the loop calling us
*/
+ rxe_drop_ref(qp);
return -EAGAIN;
done:
/* we come here if we have processed a packet we want the task to call
* us again to see if there is anything else to do
*/
+ rxe_drop_ref(qp);
return 0;
}
diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c
index 22bd963..035cb90 100644
--- a/drivers/infiniband/sw/rxe/rxe_req.c
+++ b/drivers/infiniband/sw/rxe/rxe_req.c
@@ -596,6 +596,8 @@ int rxe_requester(void *arg)
struct rxe_qp rollback_qp;
struct rxe_send_wqe rollback_wqe;
+ rxe_add_ref(qp);
+
next_wqe:
if (unlikely(!qp->valid || qp->req.state == QP_STATE_ERROR))
goto exit;
@@ -756,8 +758,10 @@ int rxe_requester(void *arg)
*/
wqe->wr.send_flags |= IB_SEND_SIGNALED;
__rxe_do_task(&qp->comp.task);
+ rxe_drop_ref(qp);
return -EAGAIN;
exit:
+ rxe_drop_ref(qp);
return -EAGAIN;
}
diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
index dd3d88a..337a1cb 100644
--- a/drivers/infiniband/sw/rxe/rxe_resp.c
+++ b/drivers/infiniband/sw/rxe/rxe_resp.c
@@ -1198,6 +1198,8 @@ int rxe_responder(void *arg)
struct rxe_pkt_info *pkt = NULL;
int ret = 0;
+ rxe_add_ref(qp);
+
qp->resp.aeth_syndrome = AETH_ACK_UNLIMITED;
if (!qp->valid) {
@@ -1386,5 +1388,6 @@ int rxe_responder(void *arg)
exit:
ret = -EAGAIN;
done:
+ rxe_drop_ref(qp);
return ret;
}
--
1.8.3.1
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply related
* [PATCH 1/2] IB/rxe: Wait for tasklets to finish before tearing down QP
From: Andrew Boyer @ 2016-11-28 14:37 UTC (permalink / raw)
To: monis-VPRAkNaXOzVWk0Htik3J/w, yonatanc-VPRAkNaXOzVWk0Htik3J/w,
linux-rdma-u79uwXL29TY76Z2rM5mHXA
Cc: Andrew Boyer
In-Reply-To: <1480343844-8381-1-git-send-email-andrew.boyer-8PEkshWhKlo@public.gmane.org>
The system may crash when a malformed request is received and
the error is detected by the responder.
NodeA: $ ibv_rc_pingpong -g 0 -d rxe0 -i 1 -n 1 -s 50000
NodeB: $ ibv_rc_pingpong -g 0 -d rxe0 -i 1 -n 1 -s 1024 <NodeA_ip>
The responder generates a receive error on node B since the incoming
SEND is oversized. If the client tears down the QP before the responder
or the completer finish running, a page fault may occur.
The fix makes the destroy operation spin until the tasks complete, which
appears to be original intent of the design.
Signed-off-by: Andrew Boyer <andrew.boyer-8PEkshWhKlo@public.gmane.org>
---
drivers/infiniband/sw/rxe/rxe_task.c | 19 +++++++++++++++++++
drivers/infiniband/sw/rxe/rxe_task.h | 1 +
2 files changed, 20 insertions(+)
diff --git a/drivers/infiniband/sw/rxe/rxe_task.c b/drivers/infiniband/sw/rxe/rxe_task.c
index 1e19bf8..1e9a28f 100644
--- a/drivers/infiniband/sw/rxe/rxe_task.c
+++ b/drivers/infiniband/sw/rxe/rxe_task.c
@@ -121,6 +121,7 @@ int rxe_init_task(void *obj, struct rxe_task *task,
task->arg = arg;
task->func = func;
snprintf(task->name, sizeof(task->name), "%s", name);
+ task->destroyed = false;
tasklet_init(&task->tasklet, rxe_do_task, (unsigned long)task);
@@ -132,11 +133,29 @@ int rxe_init_task(void *obj, struct rxe_task *task,
void rxe_cleanup_task(struct rxe_task *task)
{
+ unsigned long flags;
+ bool idle = false;
+
+ /*
+ * Mark the task, then wait for it to finish. It might be
+ * running in a non-tasklet (direct call) context.
+ */
+ task->destroyed = true;
+
+ do {
+ spin_lock_irqsave(&task->state_lock, flags);
+ idle = (task->state == TASK_STATE_START);
+ spin_unlock_irqrestore(&task->state_lock, flags);
+ } while (!idle);
+
tasklet_kill(&task->tasklet);
}
void rxe_run_task(struct rxe_task *task, int sched)
{
+ if (task->destroyed)
+ return;
+
if (sched)
tasklet_schedule(&task->tasklet);
else
diff --git a/drivers/infiniband/sw/rxe/rxe_task.h b/drivers/infiniband/sw/rxe/rxe_task.h
index d14aa6d..08ff42d 100644
--- a/drivers/infiniband/sw/rxe/rxe_task.h
+++ b/drivers/infiniband/sw/rxe/rxe_task.h
@@ -54,6 +54,7 @@ struct rxe_task {
int (*func)(void *arg);
int ret;
char name[16];
+ bool destroyed;
};
/*
--
1.8.3.1
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply related
* [PATCH 0/2] Fix kernel panics when tearing down QPs
From: Andrew Boyer @ 2016-11-28 14:37 UTC (permalink / raw)
To: monis-VPRAkNaXOzVWk0Htik3J/w, yonatanc-VPRAkNaXOzVWk0Htik3J/w,
linux-rdma-u79uwXL29TY76Z2rM5mHXA
Cc: Andrew Boyer
This is a set of two patches that prevent kernel panics seen when tearing
down QPs. The second patch (holding refs in tasklets) might or might not be
needed once the first patch (waiting for tasklets to finish) is applied.
Feedback welcomed.
Andrew Boyer (2):
IB/rxe: Wait for tasklets to finish before tearing down QP
IB/rxe: Hold refs when running tasklets
drivers/infiniband/sw/rxe/rxe_comp.c | 4 ++++
drivers/infiniband/sw/rxe/rxe_req.c | 4 ++++
drivers/infiniband/sw/rxe/rxe_resp.c | 3 +++
drivers/infiniband/sw/rxe/rxe_task.c | 19 +++++++++++++++++++
drivers/infiniband/sw/rxe/rxe_task.h | 1 +
5 files changed, 31 insertions(+)
--
1.8.3.1
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* [PATCH] ethernet :mellanox :mlx5: Replace pci_pool_alloc by pci_pool_zalloc
From: Souptick Joarder @ 2016-11-28 13:37 UTC (permalink / raw)
To: saeedm, matanb, leonro; +Cc: netdev, linux-rdma, sahu.rameshwar73
In alloc_cmd_box(), pci_pool_alloc() followed by memset will be
replaced by pci_pool_zalloc()
Signed-off-by: Souptick joarder <jrdr.linux@gmail.com>
---
drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
index 1e639f8..d96ebd4 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c
@@ -1063,14 +1063,13 @@ static struct mlx5_cmd_mailbox *alloc_cmd_box(struct mlx5_core_dev *dev,
if (!mailbox)
return ERR_PTR(-ENOMEM);
- mailbox->buf = pci_pool_alloc(dev->cmd.pool, flags,
+ mailbox->buf = pci_pool_zalloc(dev->cmd.pool, flags,
&mailbox->dma);
if (!mailbox->buf) {
mlx5_core_dbg(dev, "failed allocation\n");
kfree(mailbox);
return ERR_PTR(-ENOMEM);
}
- memset(mailbox->buf, 0, sizeof(struct mlx5_cmd_prot_block));
mailbox->next = NULL;
return mailbox;
--
1.9.1
^ permalink raw reply related
* [PATCH] ethernet :mellanox :mlx4: Replace pci_pool_alloc by pci_pool_zalloc
From: Souptick Joarder @ 2016-11-28 13:28 UTC (permalink / raw)
To: yishaih-VPRAkNaXOzVWk0Htik3J/w
Cc: netdev-u79uwXL29TY76Z2rM5mHXA, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
sahu.rameshwar73-Re5JQEeQqe8AvxtiuMwx3w
In mlx4_alloc_cmd_mailbox(), pci_pool_alloc() followed by memset will be
replaced by pci_pool_zalloc()
Signed-off-by: Souptick joarder <jrdr.linux-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
---
drivers/net/ethernet/mellanox/mlx4/cmd.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx4/cmd.c b/drivers/net/ethernet/mellanox/mlx4/cmd.c
index e36bebc..ee3bd76 100644
--- a/drivers/net/ethernet/mellanox/mlx4/cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx4/cmd.c
@@ -2679,14 +2679,13 @@ struct mlx4_cmd_mailbox *mlx4_alloc_cmd_mailbox(struct mlx4_dev *dev)
if (!mailbox)
return ERR_PTR(-ENOMEM);
- mailbox->buf = pci_pool_alloc(mlx4_priv(dev)->cmd.pool, GFP_KERNEL,
+ mailbox->buf = pci_pool_zalloc(mlx4_priv(dev)->cmd.pool, GFP_KERNEL,
&mailbox->dma);
if (!mailbox->buf) {
kfree(mailbox);
return ERR_PTR(-ENOMEM);
}
- memset(mailbox->buf, 0, MLX4_MAILBOX_SIZE);
return mailbox;
}
--
1.9.1
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply related
* [PATCH] infiniband: mthca: Replace pci_pool_alloc by pci_pool_zalloc
From: Souptick Joarder @ 2016-11-28 12:51 UTC (permalink / raw)
To: dledford-H+wXaHxf7aLQT0dZR+AlfA,
sean.hefty-ral2JQCrhuEAvxtiuMwx3w,
hal.rosenstock-Re5JQEeQqe8AvxtiuMwx3w
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
sahu.rameshwar73-Re5JQEeQqe8AvxtiuMwx3w
In mthca_create_ah(), pci_pool_alloc() followed by memset will be
replaced by pci_pool_zalloc()
Signed-off-by: Souptick joarder <jrdr.linux-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
---
drivers/infiniband/hw/mthca/mthca_av.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/infiniband/hw/mthca/mthca_av.c b/drivers/infiniband/hw/mthca/mthca_av.c
index bcac294..c202c89 100644
--- a/drivers/infiniband/hw/mthca/mthca_av.c
+++ b/drivers/infiniband/hw/mthca/mthca_av.c
@@ -186,7 +186,7 @@ int mthca_create_ah(struct mthca_dev *dev,
on_hca_fail:
if (ah->type == MTHCA_AH_PCI_POOL) {
- ah->av = pci_pool_alloc(dev->av_table.pool,
+ ah->av = pci_pool_zalloc(dev->av_table.pool,
GFP_ATOMIC, &ah->avdma);
if (!ah->av)
return -ENOMEM;
@@ -196,7 +196,6 @@ int mthca_create_ah(struct mthca_dev *dev,
ah->key = pd->ntmr.ibmr.lkey;
- memset(av, 0, MTHCA_AV_SIZE);
av->port_pd = cpu_to_be32(pd->pd_num | (ah_attr->port_num << 24));
av->g_slid = ah_attr->src_path_bits;
--
1.9.1
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply related
* Re: [PATCH rdma-next 1/4] IB/mlx4: Fix out-of-range array index in destroy qp flow
From: Leon Romanovsky @ 2016-11-28 8:31 UTC (permalink / raw)
To: Bart Van Assche
Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
Jack Morgenstein
In-Reply-To: <BLUPR02MB1683B2E832867815771A4B2C818B0-Y8PPn9RqzNfZ9ihocuPUdanrV9Ap65cLvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
[-- Attachment #1: Type: text/plain, Size: 1002 bytes --]
On Sun, Nov 27, 2016 at 04:56:19PM +0000, Bart Van Assche wrote:
> On 11/27/16 05:18, Leon Romanovsky wrote:
> > Fixes: 9433c188915c ("IB/mlx4: Invoke UPDATE_QP for proxy QP1 on MAC changes")
> > Signed-off-by: Jack Morgenstein <jackm-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
> > Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
>
> Hello Leon,
>
> Commit ID 9433c188915c refers to a patch that was merged in kernel
> version v3.15. Shouldn't a "Cc: stable" tag be added to this patch and
> also to the other patches in this series?
I'm extra cautions with stable tags and prefer to finalize my stable
checker in my submissions scripts first, before adding it manually and
I have plans to use it next kernel release.
Thanks
>
> Bart.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]
^ permalink raw reply
* RE: BALANCE PAYMENT
From: coral-Yvmzju5463hBWQWeTLFoew @ 2016-11-28 6:05 UTC (permalink / raw)
Dear Sir/s,
Please see attached.
Thanks and regards,
Accounts Department
Al Omraniya Trading Co. LLC
P.O. Box: 10757, Al Khabaisi Area,
Deira 2, Dubai, U.A.E.
Tel: +971 4 268 2730 / Fax: +971 4 268 4117
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: Enabling peer to peer device transactions for PCIe devices
From: zhoucm1 @ 2016-11-28 5:31 UTC (permalink / raw)
To: Christian König, Haggai Eran, Jason Gunthorpe, Yu, Qiang
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-nvdimm-hn68Rpc1hR1g9hUCZPvPmw@public.gmane.org,
Kuehling, Felix, Serguei Sagalovitch,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org,
Sander, Ben, Suthikulpanit, Suravee,
linux-pci-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
Deucher, Alexander, Max Gurtovoy, Blinzer, Paul,
Linux-media-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <0087fba9-7bcb-8bb3-c26e-4ef3e4970c34-5C7GfCeVMHo@public.gmane.org>
+Qiang, who is working on it.
On 2016年11月27日 22:07, Christian König wrote:
> Am 27.11.2016 um 15:02 schrieb Haggai Eran:
>> On 11/25/2016 9:32 PM, Jason Gunthorpe wrote:
>>> On Fri, Nov 25, 2016 at 02:22:17PM +0100, Christian König wrote:
>>>
>>>>> Like you say below we have to handle short lived in the usual way,
>>>>> and
>>>>> that covers basically every device except IB MRs, including the
>>>>> command queue on a NVMe drive.
>>>> Well a problem which wasn't mentioned so far is that while GPUs do
>>>> have a
>>>> page table to mirror the CPU page table, they usually can't recover
>>>> from
>>>> page faults.
>>>> So what we do is making sure that all memory accessed by the GPU
>>>> Jobs stays
>>>> in place while those jobs run (pretty much the same pinning you do
>>>> for the
>>>> DMA).
>>> Yes, it is DMA, so this is a valid approach.
>>>
>>> But, you don't need page faults from the GPU to do proper coherent
>>> page table mirroring. Basically when the driver submits the work to
>>> the GPU it 'faults' the pages into the CPU and mirror translation
>>> table (instead of pinning).
>>>
>>> Like in ODP, MMU notifiers/HMM are used to monitor for translation
>>> changes. If a change comes in the GPU driver checks if an executing
>>> command is touching those pages and blocks the MMU notifier until the
>>> command flushes, then unfaults the page (blocking future commands) and
>>> unblocks the mmu notifier.
>> I think blocking mmu notifiers against something that is basically
>> controlled by user-space can be problematic. This can block things like
>> memory reclaim. If you have user-space access to the device's queues,
>> user-space can block the mmu notifier forever.
> Really good point.
>
> I think this means the bare minimum if we don't have recoverable page
> faults is to have preemption support like Felix described in his
> answer as well.
>
> Going to keep that in mind,
> Christian.
>
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm
^ permalink raw reply
* Re: [PATCH rdma-next 1/4] IB/mlx4: Fix out-of-range array index in destroy qp flow
From: Bart Van Assche @ 2016-11-27 16:56 UTC (permalink / raw)
To: Leon Romanovsky, dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
Jack Morgenstein
In-Reply-To: <1480252702-8005-2-git-send-email-leon@kernel.org>
On 11/27/16 05:18, Leon Romanovsky wrote:
> Fixes: 9433c188915c ("IB/mlx4: Invoke UPDATE_QP for proxy QP1 on MAC changes")
> Signed-off-by: Jack Morgenstein <jackm-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
> Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Hello Leon,
Commit ID 9433c188915c refers to a patch that was merged in kernel
version v3.15. Shouldn't a "Cc: stable" tag be added to this patch and
also to the other patches in this series?
Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* [PATCH rdma-next 10/10] IB/mlx5: Support RAW Ethernet when RoCE is disabled
From: Leon Romanovsky @ 2016-11-27 14:51 UTC (permalink / raw)
To: dledford-H+wXaHxf7aLQT0dZR+AlfA
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Steve Wise, Mike Marciniszyn,
Dennis Dalessandro, Lijun Ou, Wei Hu(Xavier), Faisal Latif,
Yishai Hadas, Selvin Xavier, Devesh Sharma, Mitesh Ahuja,
Christian Benvenuti, Dave Goodell, Moni Shoua, Or Gerlitz
In-Reply-To: <1480258296-27032-1-git-send-email-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
From: Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
On some environments, such as certain SRIOV VF configurations, RoCE is
not supported for mlx5 Ethernet ports. Currently, the driver will not
open IB device on that port.
This is problematic, since we do want user-space RAW Ethernet (RAW_PACKET
QPs) functionality to remain in place. For that end, enhance the relevant
driver flows such that we do create a device instance in that case.
Signed-off-by: Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reviewed-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
drivers/infiniband/hw/mlx5/main.c | 22 +++++++++++++---------
1 file changed, 13 insertions(+), 9 deletions(-)
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index b811c70..c0462d1 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -2724,6 +2724,8 @@ static int mlx5_port_immutable(struct ib_device *ibdev, u8 port_num,
struct ib_port_immutable *immutable)
{
struct ib_port_attr attr;
+ struct mlx5_ib_dev *dev = to_mdev(ibdev);
+ enum rdma_link_layer ll = mlx5_ib_port_link_layer(ibdev, port_num);
int err;
immutable->core_cap_flags = get_core_cap_flags(ibdev);
@@ -2734,7 +2736,8 @@ static int mlx5_port_immutable(struct ib_device *ibdev, u8 port_num,
immutable->pkey_tbl_len = attr.pkey_tbl_len;
immutable->gid_tbl_len = attr.gid_tbl_len;
- immutable->max_mad_size = IB_MGMT_MAD_SIZE;
+ if ((ll == IB_LINK_LAYER_INFINIBAND) || MLX5_CAP_GEN(dev->mdev, roce))
+ immutable->max_mad_size = IB_MGMT_MAD_SIZE;
return 0;
}
@@ -2819,9 +2822,11 @@ static int mlx5_enable_eth(struct mlx5_ib_dev *dev)
if (err)
return err;
- err = mlx5_nic_vport_enable_roce(dev->mdev);
- if (err)
- goto err_unregister_netdevice_notifier;
+ if (MLX5_CAP_GEN(dev->mdev, roce)) {
+ err = mlx5_nic_vport_enable_roce(dev->mdev);
+ if (err)
+ goto err_unregister_netdevice_notifier;
+ }
err = mlx5_eth_lag_init(dev);
if (err)
@@ -2830,7 +2835,8 @@ static int mlx5_enable_eth(struct mlx5_ib_dev *dev)
return 0;
err_disable_roce:
- mlx5_nic_vport_disable_roce(dev->mdev);
+ if (MLX5_CAP_GEN(dev->mdev, roce))
+ mlx5_nic_vport_disable_roce(dev->mdev);
err_unregister_netdevice_notifier:
mlx5_remove_netdev_notifier(dev);
@@ -2840,7 +2846,8 @@ err_unregister_netdevice_notifier:
static void mlx5_disable_eth(struct mlx5_ib_dev *dev)
{
mlx5_eth_lag_cleanup(dev);
- mlx5_nic_vport_disable_roce(dev->mdev);
+ if (MLX5_CAP_GEN(dev->mdev, roce))
+ mlx5_nic_vport_disable_roce(dev->mdev);
}
static void mlx5_ib_dealloc_q_counters(struct mlx5_ib_dev *dev)
@@ -2962,9 +2969,6 @@ static void *mlx5_ib_add(struct mlx5_core_dev *mdev)
port_type_cap = MLX5_CAP_GEN(mdev, port_type);
ll = mlx5_port_type_cap_to_rdma_ll(port_type_cap);
- if ((ll == IB_LINK_LAYER_ETHERNET) && !MLX5_CAP_GEN(mdev, roce))
- return NULL;
-
printk_once(KERN_INFO "%s", mlx5_version);
dev = (struct mlx5_ib_dev *)ib_alloc_device(sizeof(*dev));
--
2.7.4
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply related
* [PATCH rdma-next 09/10] IB/mlx5: Rename RoCE related helpers to reflect being Eth ones
From: Leon Romanovsky @ 2016-11-27 14:51 UTC (permalink / raw)
To: dledford-H+wXaHxf7aLQT0dZR+AlfA
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Steve Wise, Mike Marciniszyn,
Dennis Dalessandro, Lijun Ou, Wei Hu(Xavier), Faisal Latif,
Yishai Hadas, Selvin Xavier, Devesh Sharma, Mitesh Ahuja,
Christian Benvenuti, Dave Goodell, Moni Shoua, Or Gerlitz
In-Reply-To: <1480258296-27032-1-git-send-email-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
From: Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
This is a pre-step towards having mlx5 IB device also over Eth ports where
RoCE is not supported. We change the roce enable/disable and roce_lag
init/fini function names to have _eth instead of _roce.
This patch doesn't change any functionality.
Signed-off-by: Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reviewed-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
drivers/infiniband/hw/mlx5/main.c | 22 +++++++++++-----------
1 file changed, 11 insertions(+), 11 deletions(-)
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 91921f1..b811c70 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -2748,7 +2748,7 @@ static void get_dev_fw_str(struct ib_device *ibdev, char *str,
fw_rev_min(dev->mdev), fw_rev_sub(dev->mdev));
}
-static int mlx5_roce_lag_init(struct mlx5_ib_dev *dev)
+static int mlx5_eth_lag_init(struct mlx5_ib_dev *dev)
{
struct mlx5_core_dev *mdev = dev->mdev;
struct mlx5_flow_namespace *ns = mlx5_get_flow_namespace(mdev,
@@ -2777,7 +2777,7 @@ err_destroy_vport_lag:
return err;
}
-static void mlx5_roce_lag_cleanup(struct mlx5_ib_dev *dev)
+static void mlx5_eth_lag_cleanup(struct mlx5_ib_dev *dev)
{
struct mlx5_core_dev *mdev = dev->mdev;
@@ -2811,7 +2811,7 @@ static void mlx5_remove_netdev_notifier(struct mlx5_ib_dev *dev)
}
}
-static int mlx5_enable_roce(struct mlx5_ib_dev *dev)
+static int mlx5_enable_eth(struct mlx5_ib_dev *dev)
{
int err;
@@ -2823,7 +2823,7 @@ static int mlx5_enable_roce(struct mlx5_ib_dev *dev)
if (err)
goto err_unregister_netdevice_notifier;
- err = mlx5_roce_lag_init(dev);
+ err = mlx5_eth_lag_init(dev);
if (err)
goto err_disable_roce;
@@ -2837,9 +2837,9 @@ err_unregister_netdevice_notifier:
return err;
}
-static void mlx5_disable_roce(struct mlx5_ib_dev *dev)
+static void mlx5_disable_eth(struct mlx5_ib_dev *dev)
{
- mlx5_roce_lag_cleanup(dev);
+ mlx5_eth_lag_cleanup(dev);
mlx5_nic_vport_disable_roce(dev->mdev);
}
@@ -3143,14 +3143,14 @@ static void *mlx5_ib_add(struct mlx5_core_dev *mdev)
spin_lock_init(&dev->reset_flow_resource_lock);
if (ll == IB_LINK_LAYER_ETHERNET) {
- err = mlx5_enable_roce(dev);
+ err = mlx5_enable_eth(dev);
if (err)
goto err_free_port;
}
err = create_dev_resources(&dev->devr);
if (err)
- goto err_disable_roce;
+ goto err_disable_eth;
err = mlx5_ib_odp_init_one(dev);
if (err)
@@ -3194,9 +3194,9 @@ err_odp:
err_rsrc:
destroy_dev_resources(&dev->devr);
-err_disable_roce:
+err_disable_eth:
if (ll == IB_LINK_LAYER_ETHERNET) {
- mlx5_disable_roce(dev);
+ mlx5_disable_eth(dev);
mlx5_remove_netdev_notifier(dev);
}
@@ -3221,7 +3221,7 @@ static void mlx5_ib_remove(struct mlx5_core_dev *mdev, void *context)
mlx5_ib_odp_remove_one(dev);
destroy_dev_resources(&dev->devr);
if (ll == IB_LINK_LAYER_ETHERNET)
- mlx5_disable_roce(dev);
+ mlx5_disable_eth(dev);
kfree(dev->port);
ib_dealloc_device(&dev->ib_dev);
}
--
2.7.4
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply related
* [PATCH rdma-next 08/10] IB/mlx5: Refactor registration to netdev notifier
From: Leon Romanovsky @ 2016-11-27 14:51 UTC (permalink / raw)
To: dledford-H+wXaHxf7aLQT0dZR+AlfA
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Steve Wise, Mike Marciniszyn,
Dennis Dalessandro, Lijun Ou, Wei Hu(Xavier), Faisal Latif,
Yishai Hadas, Selvin Xavier, Devesh Sharma, Mitesh Ahuja,
Christian Benvenuti, Dave Goodell, Moni Shoua, Or Gerlitz
In-Reply-To: <1480258296-27032-1-git-send-email-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
From: Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Refactor the netdev notifier registration into a small helper function.
This is a pre-step towards having mlx5 IB device over an Ethernet port
which doesn't support RoCE. Also, renamed the de-registration helper
and the new helper as netdev notifier and not roce, to make it clear
this is not only used with roce.
This patch doesn't change any functionality.
Signed-off-by: Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reviewed-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
drivers/infiniband/hw/mlx5/main.c | 29 ++++++++++++++++++++---------
1 file changed, 20 insertions(+), 9 deletions(-)
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index a137254..91921f1 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -2789,7 +2789,21 @@ static void mlx5_roce_lag_cleanup(struct mlx5_ib_dev *dev)
}
}
-static void mlx5_remove_roce_notifier(struct mlx5_ib_dev *dev)
+static int mlx5_add_netdev_notifier(struct mlx5_ib_dev *dev)
+{
+ int err;
+
+ dev->roce.nb.notifier_call = mlx5_netdev_event;
+ err = register_netdevice_notifier(&dev->roce.nb);
+ if (err) {
+ dev->roce.nb.notifier_call = NULL;
+ return err;
+ }
+
+ return 0;
+}
+
+static void mlx5_remove_netdev_notifier(struct mlx5_ib_dev *dev)
{
if (dev->roce.nb.notifier_call) {
unregister_netdevice_notifier(&dev->roce.nb);
@@ -2801,12 +2815,9 @@ static int mlx5_enable_roce(struct mlx5_ib_dev *dev)
{
int err;
- dev->roce.nb.notifier_call = mlx5_netdev_event;
- err = register_netdevice_notifier(&dev->roce.nb);
- if (err) {
- dev->roce.nb.notifier_call = NULL;
+ err = mlx5_add_netdev_notifier(dev);
+ if (err)
return err;
- }
err = mlx5_nic_vport_enable_roce(dev->mdev);
if (err)
@@ -2822,7 +2833,7 @@ err_disable_roce:
mlx5_nic_vport_disable_roce(dev->mdev);
err_unregister_netdevice_notifier:
- mlx5_remove_roce_notifier(dev);
+ mlx5_remove_netdev_notifier(dev);
return err;
}
@@ -3186,7 +3197,7 @@ err_rsrc:
err_disable_roce:
if (ll == IB_LINK_LAYER_ETHERNET) {
mlx5_disable_roce(dev);
- mlx5_remove_roce_notifier(dev);
+ mlx5_remove_netdev_notifier(dev);
}
err_free_port:
@@ -3203,7 +3214,7 @@ static void mlx5_ib_remove(struct mlx5_core_dev *mdev, void *context)
struct mlx5_ib_dev *dev = context;
enum rdma_link_layer ll = mlx5_ib_port_link_layer(&dev->ib_dev, 1);
- mlx5_remove_roce_notifier(dev);
+ mlx5_remove_netdev_notifier(dev);
ib_unregister_device(&dev->ib_dev);
mlx5_ib_dealloc_q_counters(dev);
destroy_umrc_res(dev);
--
2.7.4
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply related
* [PATCH rdma-next 07/10] IB/uverbs: Propagate supported QP types to user-space
From: Leon Romanovsky @ 2016-11-27 14:51 UTC (permalink / raw)
To: dledford-H+wXaHxf7aLQT0dZR+AlfA
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Steve Wise, Mike Marciniszyn,
Dennis Dalessandro, Lijun Ou, Wei Hu(Xavier), Faisal Latif,
Yishai Hadas, Selvin Xavier, Devesh Sharma, Mitesh Ahuja,
Christian Benvenuti, Dave Goodell, Moni Shoua, Or Gerlitz
In-Reply-To: <1480258296-27032-1-git-send-email-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
From: Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Propagate supported qp types to user-space when they issue a query
port call. From user-space point of view, zero means that the query
is not supported, under the reasoning that it doesn't make sense
to have user-space devices that don't support any qp type. Note that
this will only happen when they run over older kernels.
Make sure to filter out qp types which are not supported for
user-space (SMI, GSI, etc).
Signed-off-by: Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reviewed-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
drivers/infiniband/core/uverbs_cmd.c | 2 ++
include/uapi/rdma/ib_user_verbs.h | 2 +-
2 files changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index cb3f515a..1c386bc 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -526,6 +526,8 @@ ssize_t ib_uverbs_query_port(struct ib_uverbs_file *file,
resp.phys_state = attr.phys_state;
resp.link_layer = rdma_port_get_link_layer(ib_dev,
cmd.port_num);
+ /* don't expose to user-space QPTs they don't know */
+ resp.qp_type_cap = attr.qp_type_cap & ~(BIT(IB_QPT_SMI) | BIT(IB_QPT_GSI));
if (copy_to_user((void __user *) (unsigned long) cmd.response,
&resp, sizeof resp))
diff --git a/include/uapi/rdma/ib_user_verbs.h b/include/uapi/rdma/ib_user_verbs.h
index 25225eb..a50604f 100644
--- a/include/uapi/rdma/ib_user_verbs.h
+++ b/include/uapi/rdma/ib_user_verbs.h
@@ -276,7 +276,7 @@ struct ib_uverbs_query_port_resp {
__u8 active_speed;
__u8 phys_state;
__u8 link_layer;
- __u8 reserved[2];
+ __u16 qp_type_cap;
};
struct ib_uverbs_alloc_pd {
--
2.7.4
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply related
* [PATCH rdma-next 06/10] IB/core: Enable to query QP types supported by IB device on a port
From: Leon Romanovsky @ 2016-11-27 14:51 UTC (permalink / raw)
To: dledford-H+wXaHxf7aLQT0dZR+AlfA
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Steve Wise, Mike Marciniszyn,
Dennis Dalessandro, Lijun Ou, Wei Hu(Xavier), Faisal Latif,
Yishai Hadas, Selvin Xavier, Devesh Sharma, Mitesh Ahuja,
Christian Benvenuti, Dave Goodell, Moni Shoua, Or Gerlitz
In-Reply-To: <1480258296-27032-1-git-send-email-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
From: Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Add qp_type_cap port attribute which is a bit field representation
of the ib_qp_type enum. This will allow applications to query what QP
types are supported by an IB device instance on a specific port.
The qp_type_cap port attribute is set by the core according to the
protocol supported for the device port. This holds for all the
providers with the exception of two RoCE drivers that don't implement
UD and UC. To handle that, they (hns and qedr) are patched to remove
these QPs from what's the core has set for them as supported.
Signed-off-by: Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reviewed-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
drivers/infiniband/core/device.c | 28 ++++++++++++++++++++++++++++
drivers/infiniband/hw/hns/hns_roce_main.c | 3 +++
drivers/infiniband/hw/qedr/verbs.c | 2 ++
include/rdma/ib_verbs.h | 1 +
4 files changed, 34 insertions(+)
diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index 760ef60..f7abde2 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -646,6 +646,31 @@ void ib_dispatch_event(struct ib_event *event)
}
EXPORT_SYMBOL(ib_dispatch_event);
+static void get_port_qp_types(const struct ib_device *device, u8 port_num,
+ struct ib_port_attr *port_attr)
+{
+ if (rdma_cap_ib_smi(device, port_num))
+ port_attr->qp_type_cap |= BIT(IB_QPT_SMI);
+
+ if (rdma_cap_ib_cm(device, port_num))
+ port_attr->qp_type_cap |= BIT(IB_QPT_GSI);
+
+ if (rdma_ib_or_roce(device, port_num)) {
+ port_attr->qp_type_cap |= (BIT(IB_QPT_RC) | BIT(IB_QPT_UD) | BIT(IB_QPT_UC));
+ if (device->attrs.device_cap_flags & IB_DEVICE_XRC)
+ port_attr->qp_type_cap |= (BIT(IB_QPT_XRC_INI) | BIT(IB_QPT_XRC_TGT));
+ }
+
+ if (rdma_protocol_iwarp(device, port_num))
+ port_attr->qp_type_cap |= BIT(IB_QPT_RC);
+
+ if (rdma_protocol_raw_packet(device, port_num))
+ port_attr->qp_type_cap |= BIT(IB_QPT_RAW_PACKET);
+
+ if (rdma_protocol_usnic(device, port_num))
+ port_attr->qp_type_cap |= BIT(IB_QPT_UD);
+}
+
/**
* ib_query_port - Query IB port attributes
* @device:Device to query
@@ -666,6 +691,9 @@ int ib_query_port(struct ib_device *device,
return -EINVAL;
memset(port_attr, 0, sizeof(*port_attr));
+
+ get_port_qp_types(device, port_num, port_attr);
+
err = device->query_port(device, port_num, port_attr);
if (err || port_attr->subnet_prefix)
return err;
diff --git a/drivers/infiniband/hw/hns/hns_roce_main.c b/drivers/infiniband/hw/hns/hns_roce_main.c
index c6b5779..22de534 100644
--- a/drivers/infiniband/hw/hns/hns_roce_main.c
+++ b/drivers/infiniband/hw/hns/hns_roce_main.c
@@ -430,6 +430,9 @@ static int hns_roce_query_port(struct ib_device *ib_dev, u8 port_num,
IB_PORT_ACTIVE : IB_PORT_DOWN;
props->phys_state = (props->state == IB_PORT_ACTIVE) ? 5 : 3;
+ /* mark that UD and UC aren't supported */
+ props->qp_type_cap &= ~(BIT(IB_QPT_UD) | BIT(IB_QPT_UC));
+
spin_unlock_irqrestore(&hr_dev->iboe.lock, flags);
return 0;
diff --git a/drivers/infiniband/hw/qedr/verbs.c b/drivers/infiniband/hw/qedr/verbs.c
index 53207ff..33d0219 100644
--- a/drivers/infiniband/hw/qedr/verbs.c
+++ b/drivers/infiniband/hw/qedr/verbs.c
@@ -263,6 +263,8 @@ int qedr_query_port(struct ib_device *ibdev, u8 port, struct ib_port_attr *attr)
attr->max_msg_sz = rdma_port->max_msg_size;
attr->max_vl_num = 4;
+ /* mark that UD and UC aren't supported */
+ attr->qp_type_cap &= ~(BIT(IB_QPT_UD) | BIT(IB_QPT_UC));
return 0;
}
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 485b725..0b839e4 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -536,6 +536,7 @@ struct ib_port_attr {
u8 active_speed;
u8 phys_state;
bool grh_required;
+ u16 qp_type_cap;
};
enum ib_device_modify_flags {
--
2.7.4
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply related
* [PATCH rdma-next 05/10] IB: Query port through the core instead of directly calling the driver handler
From: Leon Romanovsky @ 2016-11-27 14:51 UTC (permalink / raw)
To: dledford-H+wXaHxf7aLQT0dZR+AlfA
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Steve Wise, Mike Marciniszyn,
Dennis Dalessandro, Lijun Ou, Wei Hu(Xavier), Faisal Latif,
Yishai Hadas, Selvin Xavier, Devesh Sharma, Mitesh Ahuja,
Christian Benvenuti, Dave Goodell, Moni Shoua, Or Gerlitz
In-Reply-To: <1480258296-27032-1-git-send-email-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
From: Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Change the drivers to call ib_query_port in their get port
immutable handler instead of their own query port handler.
Doing this required to set the core cap flags of this device
before the ib_query_port call is made, since the IB core might
need these caps to serve the port query.
Drivers are ensured by the IB core that the port attributes passed
to the port query verb implementation are zero, and hence we
removed the zeroing from the drivers.
A pre-step for allowing drivers to modify some port attributes (in
their query port handler) which are set by the IB core before
calling them.
This patch doesn't add any new functionality.
Signed-off-by: Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reviewed-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
drivers/infiniband/hw/cxgb3/iwch_provider.c | 7 ++++---
drivers/infiniband/hw/cxgb4/provider.c | 8 ++++----
drivers/infiniband/hw/hfi1/verbs.c | 1 +
drivers/infiniband/hw/hns/hns_roce_main.c | 7 ++++---
drivers/infiniband/hw/i40iw/i40iw_verbs.c | 8 ++++----
drivers/infiniband/hw/mlx4/alias_GUID.c | 1 +
drivers/infiniband/hw/mlx4/main.c | 18 +++++++++---------
drivers/infiniband/hw/mlx4/sysfs.c | 1 +
drivers/infiniband/hw/mlx5/mad.c | 2 +-
drivers/infiniband/hw/mlx5/main.c | 12 +++++++-----
drivers/infiniband/hw/mthca/mthca_provider.c | 9 +++++----
drivers/infiniband/hw/nes/nes_verbs.c | 5 +++--
drivers/infiniband/hw/ocrdma/ocrdma_main.c | 9 +++++----
drivers/infiniband/hw/ocrdma/ocrdma_verbs.c | 1 +
drivers/infiniband/hw/qedr/verbs.c | 9 +++++----
drivers/infiniband/hw/qib/qib_verbs.c | 1 +
drivers/infiniband/hw/usnic/usnic_ib_main.c | 5 +++--
drivers/infiniband/hw/usnic/usnic_ib_verbs.c | 2 +-
drivers/infiniband/sw/rdmavt/vt.c | 7 ++++---
drivers/infiniband/sw/rxe/rxe_verbs.c | 6 ++++--
20 files changed, 68 insertions(+), 51 deletions(-)
diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c
index cba57bb..fcaca76 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_provider.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c
@@ -1132,7 +1132,7 @@ static int iwch_query_port(struct ib_device *ibdev,
dev = to_iwch_dev(ibdev);
netdev = dev->rdev.port_info.lldevs[port-1];
- memset(props, 0, sizeof(struct ib_port_attr));
+ /* props being zeroed by the caller, avoid zeroing it here */
props->max_mtu = IB_MTU_4096;
if (netdev->mtu >= 4096)
props->active_mtu = IB_MTU_4096;
@@ -1337,13 +1337,14 @@ static int iwch_port_immutable(struct ib_device *ibdev, u8 port_num,
struct ib_port_attr attr;
int err;
- err = iwch_query_port(ibdev, port_num, &attr);
+ immutable->core_cap_flags = RDMA_CORE_PORT_IWARP;
+
+ err = ib_query_port(ibdev, port_num, &attr);
if (err)
return err;
immutable->pkey_tbl_len = attr.pkey_tbl_len;
immutable->gid_tbl_len = attr.gid_tbl_len;
- immutable->core_cap_flags = RDMA_CORE_PORT_IWARP;
return 0;
}
diff --git a/drivers/infiniband/hw/cxgb4/provider.c b/drivers/infiniband/hw/cxgb4/provider.c
index 645e606..84f0885 100644
--- a/drivers/infiniband/hw/cxgb4/provider.c
+++ b/drivers/infiniband/hw/cxgb4/provider.c
@@ -356,8 +356,7 @@ static int c4iw_query_port(struct ib_device *ibdev, u8 port,
dev = to_c4iw_dev(ibdev);
netdev = dev->rdev.lldi.ports[port-1];
-
- memset(props, 0, sizeof(struct ib_port_attr));
+ /* props being zeroed by the caller, avoid zeroing it here */
props->max_mtu = IB_MTU_4096;
if (netdev->mtu >= 4096)
props->active_mtu = IB_MTU_4096;
@@ -503,13 +502,14 @@ static int c4iw_port_immutable(struct ib_device *ibdev, u8 port_num,
struct ib_port_attr attr;
int err;
- err = c4iw_query_port(ibdev, port_num, &attr);
+ immutable->core_cap_flags = RDMA_CORE_PORT_IWARP;
+
+ err = ib_query_port(ibdev, port_num, &attr);
if (err)
return err;
immutable->pkey_tbl_len = attr.pkey_tbl_len;
immutable->gid_tbl_len = attr.gid_tbl_len;
- immutable->core_cap_flags = RDMA_CORE_PORT_IWARP;
return 0;
}
diff --git a/drivers/infiniband/hw/hfi1/verbs.c b/drivers/infiniband/hw/hfi1/verbs.c
index 4b7a16c..699ee54 100644
--- a/drivers/infiniband/hw/hfi1/verbs.c
+++ b/drivers/infiniband/hw/hfi1/verbs.c
@@ -1399,6 +1399,7 @@ static int query_port(struct rvt_dev_info *rdi, u8 port_num,
struct hfi1_pportdata *ppd = &dd->pport[port_num - 1];
u16 lid = ppd->lid;
+ /* props being zeroed by the caller, avoid zeroing it here */
props->lid = lid ? lid : 0;
props->lmc = ppd->lmc;
/* OPA logical states match IB logical states */
diff --git a/drivers/infiniband/hw/hns/hns_roce_main.c b/drivers/infiniband/hw/hns/hns_roce_main.c
index 764e35a..c6b5779 100644
--- a/drivers/infiniband/hw/hns/hns_roce_main.c
+++ b/drivers/infiniband/hw/hns/hns_roce_main.c
@@ -403,7 +403,7 @@ static int hns_roce_query_port(struct ib_device *ib_dev, u8 port_num,
assert(port_num > 0);
port = port_num - 1;
- memset(props, 0, sizeof(*props));
+ /* props being zeroed by the caller, avoid zeroing it here */
props->max_mtu = hr_dev->caps.max_mtu;
props->gid_tbl_len = hr_dev->caps.gid_table_len[port];
@@ -572,14 +572,15 @@ static int hns_roce_port_immutable(struct ib_device *ib_dev, u8 port_num,
struct ib_port_attr attr;
int ret;
- ret = hns_roce_query_port(ib_dev, port_num, &attr);
+ immutable->core_cap_flags = RDMA_CORE_PORT_IBA_ROCE;
+
+ ret = ib_query_port(ib_dev, port_num, &attr);
if (ret)
return ret;
immutable->pkey_tbl_len = attr.pkey_tbl_len;
immutable->gid_tbl_len = attr.gid_tbl_len;
- immutable->core_cap_flags = RDMA_CORE_PORT_IBA_ROCE;
immutable->max_mad_size = IB_MGMT_MAD_SIZE;
return 0;
diff --git a/drivers/infiniband/hw/i40iw/i40iw_verbs.c b/drivers/infiniband/hw/i40iw/i40iw_verbs.c
index 6329c97..84ae990 100644
--- a/drivers/infiniband/hw/i40iw/i40iw_verbs.c
+++ b/drivers/infiniband/hw/i40iw/i40iw_verbs.c
@@ -96,8 +96,7 @@ static int i40iw_query_port(struct ib_device *ibdev,
struct i40iw_device *iwdev = to_iwdev(ibdev);
struct net_device *netdev = iwdev->netdev;
- memset(props, 0, sizeof(*props));
-
+ /* props being zeroed by the caller, avoid zeroing it here */
props->max_mtu = IB_MTU_4096;
if (netdev->mtu >= 4096)
props->active_mtu = IB_MTU_4096;
@@ -2359,14 +2358,15 @@ static int i40iw_port_immutable(struct ib_device *ibdev, u8 port_num,
struct ib_port_attr attr;
int err;
- err = i40iw_query_port(ibdev, port_num, &attr);
+ immutable->core_cap_flags = RDMA_CORE_PORT_IWARP;
+
+ err = ib_query_port(ibdev, port_num, &attr);
if (err)
return err;
immutable->pkey_tbl_len = attr.pkey_tbl_len;
immutable->gid_tbl_len = attr.gid_tbl_len;
- immutable->core_cap_flags = RDMA_CORE_PORT_IWARP;
return 0;
}
diff --git a/drivers/infiniband/hw/mlx4/alias_GUID.c b/drivers/infiniband/hw/mlx4/alias_GUID.c
index 5e99390..0678da5 100644
--- a/drivers/infiniband/hw/mlx4/alias_GUID.c
+++ b/drivers/infiniband/hw/mlx4/alias_GUID.c
@@ -499,6 +499,7 @@ static int set_guid_rec(struct ib_device *ibdev,
struct list_head *head =
&dev->sriov.alias_guid.ports_guid[port - 1].cb_list;
+ memset(&attr, 0, sizeof(struct ib_port_attr));
err = __mlx4_ib_query_port(ibdev, port, &attr, 1);
if (err) {
pr_debug("mlx4_ib_query_port failed (err: %d), port: %d\n",
diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index 3902832..c0b8789f 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -737,7 +737,7 @@ int __mlx4_ib_query_port(struct ib_device *ibdev, u8 port,
{
int err;
- memset(props, 0, sizeof *props);
+ /* props being zeroed by the caller, avoid zeroing it here */
err = mlx4_ib_port_link_layer(ibdev, port) == IB_LINK_LAYER_INFINIBAND ?
ib_link_query_port(ibdev, port, props, netw_view) :
@@ -1010,7 +1010,7 @@ static int mlx4_ib_modify_port(struct ib_device *ibdev, u8 port, int mask,
mutex_lock(&mdev->cap_mask_mutex);
- err = mlx4_ib_query_port(ibdev, port, &attr);
+ err = ib_query_port(ibdev, port, &attr);
if (err)
goto out;
@@ -2523,13 +2523,6 @@ static int mlx4_port_immutable(struct ib_device *ibdev, u8 port_num,
struct mlx4_ib_dev *mdev = to_mdev(ibdev);
int err;
- err = mlx4_ib_query_port(ibdev, port_num, &attr);
- if (err)
- return err;
-
- immutable->pkey_tbl_len = attr.pkey_tbl_len;
- immutable->gid_tbl_len = attr.gid_tbl_len;
-
if (mlx4_ib_port_link_layer(ibdev, port_num) == IB_LINK_LAYER_INFINIBAND) {
immutable->core_cap_flags = RDMA_CORE_PORT_IBA_IB;
immutable->max_mad_size = IB_MGMT_MAD_SIZE;
@@ -2545,6 +2538,13 @@ static int mlx4_port_immutable(struct ib_device *ibdev, u8 port_num,
immutable->max_mad_size = IB_MGMT_MAD_SIZE;
}
+ err = ib_query_port(ibdev, port_num, &attr);
+ if (err)
+ return err;
+
+ immutable->pkey_tbl_len = attr.pkey_tbl_len;
+ immutable->gid_tbl_len = attr.gid_tbl_len;
+
return 0;
}
diff --git a/drivers/infiniband/hw/mlx4/sysfs.c b/drivers/infiniband/hw/mlx4/sysfs.c
index 69fb5ba..5835165 100644
--- a/drivers/infiniband/hw/mlx4/sysfs.c
+++ b/drivers/infiniband/hw/mlx4/sysfs.c
@@ -226,6 +226,7 @@ static int add_port_entries(struct mlx4_ib_dev *device, int port_num)
int ret = 0 ;
struct ib_port_attr attr;
+ memset(&attr, 0, sizeof(struct ib_port_attr));
/* get the physical gid and pkey table sizes.*/
ret = __mlx4_ib_query_port(&device->ib_dev, port_num, &attr, 1);
if (ret)
diff --git a/drivers/infiniband/hw/mlx5/mad.c b/drivers/infiniband/hw/mlx5/mad.c
index 39e5848..4e58b8f 100644
--- a/drivers/infiniband/hw/mlx5/mad.c
+++ b/drivers/infiniband/hw/mlx5/mad.c
@@ -515,7 +515,7 @@ int mlx5_query_mad_ifc_port(struct ib_device *ibdev, u8 port,
if (!in_mad || !out_mad)
goto out;
- memset(props, 0, sizeof(*props));
+ /* props being zeroed by the caller, avoid zeroing it here */
init_query_mad(in_mad);
in_mad->attr_id = IB_SMP_ATTR_PORT_INFO;
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 019a7a4..a137254 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -174,7 +174,7 @@ static int mlx5_query_port_roce(struct ib_device *device, u8 port_num,
enum ib_mtu ndev_ib_mtu;
u16 qkey_viol_cntr;
- memset(props, 0, sizeof(*props));
+ /* props being zeroed by the caller, avoid zeroing it here */
props->port_cap_flags |= IB_PORT_CM_SUP;
props->port_cap_flags |= IB_PORT_IP_BASED_GIDS;
@@ -793,7 +793,7 @@ static int mlx5_query_hca_port(struct ib_device *ibdev, u8 port,
goto out;
}
- memset(props, 0, sizeof(*props));
+ /* props being zeroed by the caller, avoid zeroing it here */
err = mlx5_query_hca_vport_context(mdev, 0, port, 0, rep);
if (err)
@@ -941,7 +941,7 @@ static int mlx5_ib_modify_port(struct ib_device *ibdev, u8 port, int mask,
mutex_lock(&dev->cap_mask_mutex);
- err = mlx5_ib_query_port(ibdev, port, &attr);
+ err = ib_query_port(ibdev, port, &attr);
if (err)
goto out;
@@ -2408,6 +2408,7 @@ static int get_port_caps(struct mlx5_ib_dev *dev)
}
for (port = 1; port <= MLX5_CAP_GEN(dev->mdev, num_ports); port++) {
+ memset(pprops, 0, sizeof(struct ib_port_attr));
err = mlx5_ib_query_port(&dev->ib_dev, port, pprops);
if (err) {
mlx5_ib_warn(dev, "query_port %d failed %d\n",
@@ -2725,13 +2726,14 @@ static int mlx5_port_immutable(struct ib_device *ibdev, u8 port_num,
struct ib_port_attr attr;
int err;
- err = mlx5_ib_query_port(ibdev, port_num, &attr);
+ immutable->core_cap_flags = get_core_cap_flags(ibdev);
+
+ err = ib_query_port(ibdev, port_num, &attr);
if (err)
return err;
immutable->pkey_tbl_len = attr.pkey_tbl_len;
immutable->gid_tbl_len = attr.gid_tbl_len;
- immutable->core_cap_flags = get_core_cap_flags(ibdev);
immutable->max_mad_size = IB_MGMT_MAD_SIZE;
return 0;
diff --git a/drivers/infiniband/hw/mthca/mthca_provider.c b/drivers/infiniband/hw/mthca/mthca_provider.c
index 358930a4..2b464fb 100644
--- a/drivers/infiniband/hw/mthca/mthca_provider.c
+++ b/drivers/infiniband/hw/mthca/mthca_provider.c
@@ -146,7 +146,7 @@ static int mthca_query_port(struct ib_device *ibdev,
if (!in_mad || !out_mad)
goto out;
- memset(props, 0, sizeof *props);
+ /* props being zeroed by the caller, avoid zeroing it here */
init_query_mad(in_mad);
in_mad->attr_id = IB_SMP_ATTR_PORT_INFO;
@@ -212,7 +212,7 @@ static int mthca_modify_port(struct ib_device *ibdev,
if (mutex_lock_interruptible(&to_mdev(ibdev)->cap_mask_mutex))
return -ERESTARTSYS;
- err = mthca_query_port(ibdev, port, &attr);
+ err = ib_query_port(ibdev, port, &attr);
if (err)
goto out;
@@ -1164,13 +1164,14 @@ static int mthca_port_immutable(struct ib_device *ibdev, u8 port_num,
struct ib_port_attr attr;
int err;
- err = mthca_query_port(ibdev, port_num, &attr);
+ immutable->core_cap_flags = RDMA_CORE_PORT_IBA_IB;
+
+ err = ib_query_port(ibdev, port_num, &attr);
if (err)
return err;
immutable->pkey_tbl_len = attr.pkey_tbl_len;
immutable->gid_tbl_len = attr.gid_tbl_len;
- immutable->core_cap_flags = RDMA_CORE_PORT_IBA_IB;
immutable->max_mad_size = IB_MGMT_MAD_SIZE;
return 0;
diff --git a/drivers/infiniband/hw/nes/nes_verbs.c b/drivers/infiniband/hw/nes/nes_verbs.c
index bd69125..5256b07 100644
--- a/drivers/infiniband/hw/nes/nes_verbs.c
+++ b/drivers/infiniband/hw/nes/nes_verbs.c
@@ -475,7 +475,7 @@ static int nes_query_port(struct ib_device *ibdev, u8 port, struct ib_port_attr
struct nes_vnic *nesvnic = to_nesvnic(ibdev);
struct net_device *netdev = nesvnic->netdev;
- memset(props, 0, sizeof(*props));
+ /* props being zeroed by the caller, avoid zeroing it here */
props->max_mtu = IB_MTU_4096;
@@ -3673,13 +3673,14 @@ static int nes_port_immutable(struct ib_device *ibdev, u8 port_num,
struct ib_port_attr attr;
int err;
+ immutable->core_cap_flags = RDMA_CORE_PORT_IWARP;
+
err = nes_query_port(ibdev, port_num, &attr);
if (err)
return err;
immutable->pkey_tbl_len = attr.pkey_tbl_len;
immutable->gid_tbl_len = attr.gid_tbl_len;
- immutable->core_cap_flags = RDMA_CORE_PORT_IWARP;
return 0;
}
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_main.c b/drivers/infiniband/hw/ocrdma/ocrdma_main.c
index 8960715..3e43bdc 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_main.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_main.c
@@ -93,15 +93,16 @@ static int ocrdma_port_immutable(struct ib_device *ibdev, u8 port_num,
int err;
dev = get_ocrdma_dev(ibdev);
- err = ocrdma_query_port(ibdev, port_num, &attr);
+ immutable->core_cap_flags = RDMA_CORE_PORT_IBA_ROCE;
+ if (ocrdma_is_udp_encap_supported(dev))
+ immutable->core_cap_flags |= RDMA_CORE_CAP_PROT_ROCE_UDP_ENCAP;
+
+ err = ib_query_port(ibdev, port_num, &attr);
if (err)
return err;
immutable->pkey_tbl_len = attr.pkey_tbl_len;
immutable->gid_tbl_len = attr.gid_tbl_len;
- immutable->core_cap_flags = RDMA_CORE_PORT_IBA_ROCE;
- if (ocrdma_is_udp_encap_supported(dev))
- immutable->core_cap_flags |= RDMA_CORE_CAP_PROT_ROCE_UDP_ENCAP;
immutable->max_mad_size = IB_MGMT_MAD_SIZE;
return 0;
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
index 6af44f8..013d15c 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
@@ -210,6 +210,7 @@ int ocrdma_query_port(struct ib_device *ibdev,
struct ocrdma_dev *dev;
struct net_device *netdev;
+ /* props being zeroed by the caller, avoid zeroing it here */
dev = get_ocrdma_dev(ibdev);
if (port > 1) {
pr_err("%s(%d) invalid_port=0x%x\n", __func__,
diff --git a/drivers/infiniband/hw/qedr/verbs.c b/drivers/infiniband/hw/qedr/verbs.c
index a615142..53207ff 100644
--- a/drivers/infiniband/hw/qedr/verbs.c
+++ b/drivers/infiniband/hw/qedr/verbs.c
@@ -238,8 +238,8 @@ int qedr_query_port(struct ib_device *ibdev, u8 port, struct ib_port_attr *attr)
}
rdma_port = dev->ops->rdma_query_port(dev->rdma_ctx);
- memset(attr, 0, sizeof(*attr));
+ /* *attr being zeroed by the caller, avoid zeroing it here */
if (rdma_port->port_state == QED_RDMA_PORT_UP) {
attr->state = IB_PORT_ACTIVE;
attr->phys_state = 5;
@@ -3533,14 +3533,15 @@ int qedr_port_immutable(struct ib_device *ibdev, u8 port_num,
struct ib_port_attr attr;
int err;
- err = qedr_query_port(ibdev, port_num, &attr);
+ immutable->core_cap_flags = RDMA_CORE_PORT_IBA_ROCE |
+ RDMA_CORE_PORT_IBA_ROCE_UDP_ENCAP;
+
+ err = ib_query_port(ibdev, port_num, &attr);
if (err)
return err;
immutable->pkey_tbl_len = attr.pkey_tbl_len;
immutable->gid_tbl_len = attr.gid_tbl_len;
- immutable->core_cap_flags = RDMA_CORE_PORT_IBA_ROCE |
- RDMA_CORE_PORT_IBA_ROCE_UDP_ENCAP;
immutable->max_mad_size = IB_MGMT_MAD_SIZE;
return 0;
diff --git a/drivers/infiniband/hw/qib/qib_verbs.c b/drivers/infiniband/hw/qib/qib_verbs.c
index 954f150..2204578 100644
--- a/drivers/infiniband/hw/qib/qib_verbs.c
+++ b/drivers/infiniband/hw/qib/qib_verbs.c
@@ -1320,6 +1320,7 @@ static int qib_query_port(struct rvt_dev_info *rdi, u8 port_num,
enum ib_mtu mtu;
u16 lid = ppd->lid;
+ /* props being zeroed by the caller, avoid zeroing it here */
props->lid = lid ? lid : be16_to_cpu(IB_LID_PERMISSIVE);
props->lmc = ppd->lmc;
props->state = dd->f_iblink_state(ppd->lastibcstat);
diff --git a/drivers/infiniband/hw/usnic/usnic_ib_main.c b/drivers/infiniband/hw/usnic/usnic_ib_main.c
index dde0b23..4f5a45d 100644
--- a/drivers/infiniband/hw/usnic/usnic_ib_main.c
+++ b/drivers/infiniband/hw/usnic/usnic_ib_main.c
@@ -321,11 +321,12 @@ static int usnic_port_immutable(struct ib_device *ibdev, u8 port_num,
struct ib_port_attr attr;
int err;
- err = usnic_ib_query_port(ibdev, port_num, &attr);
+ immutable->core_cap_flags = RDMA_CORE_PORT_USNIC;
+
+ err = ib_query_port(ibdev, port_num, &attr);
if (err)
return err;
- immutable->core_cap_flags = RDMA_CORE_PORT_USNIC;
immutable->pkey_tbl_len = attr.pkey_tbl_len;
immutable->gid_tbl_len = attr.gid_tbl_len;
diff --git a/drivers/infiniband/hw/usnic/usnic_ib_verbs.c b/drivers/infiniband/hw/usnic/usnic_ib_verbs.c
index a5bfbba..797bfe4 100644
--- a/drivers/infiniband/hw/usnic/usnic_ib_verbs.c
+++ b/drivers/infiniband/hw/usnic/usnic_ib_verbs.c
@@ -330,7 +330,7 @@ int usnic_ib_query_port(struct ib_device *ibdev, u8 port,
mutex_lock(&us_ibdev->usdev_lock);
__ethtool_get_link_ksettings(us_ibdev->netdev, &cmd);
- memset(props, 0, sizeof(*props));
+ /* props being zeroed by the caller, avoid zeroing it here */
props->lid = 0;
props->lmc = 1;
diff --git a/drivers/infiniband/sw/rdmavt/vt.c b/drivers/infiniband/sw/rdmavt/vt.c
index d430c2f..1165639 100644
--- a/drivers/infiniband/sw/rdmavt/vt.c
+++ b/drivers/infiniband/sw/rdmavt/vt.c
@@ -165,7 +165,7 @@ static int rvt_query_port(struct ib_device *ibdev, u8 port_num,
return -EINVAL;
rvp = rdi->ports[port_index];
- memset(props, 0, sizeof(*props));
+ /* props being zeroed by the caller, avoid zeroing it here */
props->sm_lid = rvp->sm_lid;
props->sm_sl = rvp->sm_sl;
props->port_cap_flags = rvp->port_cap_flags;
@@ -326,13 +326,14 @@ static int rvt_get_port_immutable(struct ib_device *ibdev, u8 port_num,
if (port_index < 0)
return -EINVAL;
- err = rvt_query_port(ibdev, port_num, &attr);
+ immutable->core_cap_flags = rdi->dparms.core_cap_flags;
+
+ err = ib_query_port(ibdev, port_num, &attr);
if (err)
return err;
immutable->pkey_tbl_len = attr.pkey_tbl_len;
immutable->gid_tbl_len = attr.gid_tbl_len;
- immutable->core_cap_flags = rdi->dparms.core_cap_flags;
immutable->max_mad_size = rdi->dparms.max_mad_size;
return 0;
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c
index 19841c8..e0f4a53 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.c
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.c
@@ -86,6 +86,7 @@ static int rxe_query_port(struct ib_device *dev,
port = &rxe->port;
+ /* *attr being zeroed by the caller, avoid zeroing it here */
*attr = port->attr;
mutex_lock(&rxe->usdev_lock);
@@ -261,13 +262,14 @@ static int rxe_port_immutable(struct ib_device *dev, u8 port_num,
int err;
struct ib_port_attr attr;
- err = rxe_query_port(dev, port_num, &attr);
+ immutable->core_cap_flags = RDMA_CORE_PORT_IBA_ROCE_UDP_ENCAP;
+
+ err = ib_query_port(dev, port_num, &attr);
if (err)
return err;
immutable->pkey_tbl_len = attr.pkey_tbl_len;
immutable->gid_tbl_len = attr.gid_tbl_len;
- immutable->core_cap_flags = RDMA_CORE_PORT_IBA_ROCE_UDP_ENCAP;
immutable->max_mad_size = IB_MGMT_MAD_SIZE;
return 0;
--
2.7.4
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply related
* [PATCH rdma-next 04/10] IB: Add protocol for USNIC
From: Leon Romanovsky @ 2016-11-27 14:51 UTC (permalink / raw)
To: dledford-H+wXaHxf7aLQT0dZR+AlfA
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Steve Wise, Mike Marciniszyn,
Dennis Dalessandro, Lijun Ou, Wei Hu(Xavier), Faisal Latif,
Yishai Hadas, Selvin Xavier, Devesh Sharma, Mitesh Ahuja,
Christian Benvenuti, Dave Goodell, Moni Shoua, Or Gerlitz
In-Reply-To: <1480258296-27032-1-git-send-email-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
From: Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Add protocol definition for the proprietary the USNIC driver, to be
used in downstream patches that query supported qp types.
Signed-off-by: Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reviewed-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
drivers/infiniband/hw/usnic/usnic_ib_main.c | 1 +
include/rdma/ib_verbs.h | 8 ++++++++
2 files changed, 9 insertions(+)
diff --git a/drivers/infiniband/hw/usnic/usnic_ib_main.c b/drivers/infiniband/hw/usnic/usnic_ib_main.c
index 0a89a95..dde0b23 100644
--- a/drivers/infiniband/hw/usnic/usnic_ib_main.c
+++ b/drivers/infiniband/hw/usnic/usnic_ib_main.c
@@ -325,6 +325,7 @@ static int usnic_port_immutable(struct ib_device *ibdev, u8 port_num,
if (err)
return err;
+ immutable->core_cap_flags = RDMA_CORE_PORT_USNIC;
immutable->pkey_tbl_len = attr.pkey_tbl_len;
immutable->gid_tbl_len = attr.gid_tbl_len;
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 0cb3194..485b725 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -486,6 +486,7 @@ static inline struct rdma_hw_stats *rdma_alloc_hw_stats_struct(
#define RDMA_CORE_CAP_PROT_IWARP 0x00400000
#define RDMA_CORE_CAP_PROT_ROCE_UDP_ENCAP 0x00800000
#define RDMA_CORE_CAP_PROT_RAW_PACKET 0x01000000
+#define RDMA_CORE_CAP_PROT_USNIC 0x02000000
#define RDMA_CORE_PORT_IBA_IB (RDMA_CORE_CAP_PROT_IB \
| RDMA_CORE_CAP_IB_MAD \
@@ -511,6 +512,8 @@ static inline struct rdma_hw_stats *rdma_alloc_hw_stats_struct(
#define RDMA_CORE_PORT_RAW_PACKET (RDMA_CORE_CAP_PROT_RAW_PACKET)
+#define RDMA_CORE_PORT_USNIC (RDMA_CORE_CAP_PROT_USNIC)
+
struct ib_port_attr {
u64 subnet_prefix;
enum ib_port_state state;
@@ -2294,6 +2297,11 @@ static inline bool rdma_protocol_raw_packet(const struct ib_device *device, u8 p
return device->port_immutable[port_num].core_cap_flags & RDMA_CORE_CAP_PROT_RAW_PACKET;
}
+static inline bool rdma_protocol_usnic(const struct ib_device *device, u8 port_num)
+{
+ return device->port_immutable[port_num].core_cap_flags & RDMA_CORE_CAP_PROT_USNIC;
+}
+
/**
* rdma_cap_ib_mad - Check if the port of a device supports Infiniband
* Management Datagrams.
--
2.7.4
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply related
* [PATCH rdma-next 03/10] IB/mlx4: Support raw packet protocol
From: Leon Romanovsky @ 2016-11-27 14:51 UTC (permalink / raw)
To: dledford-H+wXaHxf7aLQT0dZR+AlfA
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Steve Wise, Mike Marciniszyn,
Dennis Dalessandro, Lijun Ou, Wei Hu(Xavier), Faisal Latif,
Yishai Hadas, Selvin Xavier, Devesh Sharma, Mitesh Ahuja,
Christian Benvenuti, Dave Goodell, Moni Shoua, Or Gerlitz
In-Reply-To: <1480258296-27032-1-git-send-email-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
From: Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Mark support for the new raw packet protocol on Eth ports.
Signed-off-by: Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reviewed-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
drivers/infiniband/hw/mlx4/main.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index b597e82..3902832 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -2532,16 +2532,19 @@ static int mlx4_port_immutable(struct ib_device *ibdev, u8 port_num,
if (mlx4_ib_port_link_layer(ibdev, port_num) == IB_LINK_LAYER_INFINIBAND) {
immutable->core_cap_flags = RDMA_CORE_PORT_IBA_IB;
+ immutable->max_mad_size = IB_MGMT_MAD_SIZE;
} else {
if (mdev->dev->caps.flags & MLX4_DEV_CAP_FLAG_IBOE)
immutable->core_cap_flags = RDMA_CORE_PORT_IBA_ROCE;
if (mdev->dev->caps.flags2 & MLX4_DEV_CAP_FLAG2_ROCE_V1_V2)
immutable->core_cap_flags = RDMA_CORE_PORT_IBA_ROCE |
RDMA_CORE_PORT_IBA_ROCE_UDP_ENCAP;
+ immutable->core_cap_flags |= RDMA_CORE_PORT_RAW_PACKET;
+ if (immutable->core_cap_flags & (RDMA_CORE_PORT_IBA_ROCE |
+ RDMA_CORE_PORT_IBA_ROCE_UDP_ENCAP))
+ immutable->max_mad_size = IB_MGMT_MAD_SIZE;
}
- immutable->max_mad_size = IB_MGMT_MAD_SIZE;
-
return 0;
}
--
2.7.4
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply related
* [PATCH rdma-next 02/10] IB/mlx5: Support raw packet protocol
From: Leon Romanovsky @ 2016-11-27 14:51 UTC (permalink / raw)
To: dledford-H+wXaHxf7aLQT0dZR+AlfA
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Steve Wise, Mike Marciniszyn,
Dennis Dalessandro, Lijun Ou, Wei Hu(Xavier), Faisal Latif,
Yishai Hadas, Selvin Xavier, Devesh Sharma, Mitesh Ahuja,
Christian Benvenuti, Dave Goodell, Moni Shoua, Or Gerlitz
In-Reply-To: <1480258296-27032-1-git-send-email-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
From: Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Mark support for the new raw packet protocol on Eth ports.
Signed-off-by: Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reviewed-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
drivers/infiniband/hw/mlx5/main.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 1e47999..019a7a4 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -2702,11 +2702,13 @@ static u32 get_core_cap_flags(struct ib_device *ibdev)
if (ll == IB_LINK_LAYER_INFINIBAND)
return RDMA_CORE_PORT_IBA_IB;
+ ret = RDMA_CORE_PORT_RAW_PACKET;
+
if (!(l3_type_cap & MLX5_ROCE_L3_TYPE_IPV4_CAP))
- return 0;
+ return ret;
if (!(l3_type_cap & MLX5_ROCE_L3_TYPE_IPV6_CAP))
- return 0;
+ return ret;
if (roce_version_cap & MLX5_ROCE_VERSION_1_CAP)
ret |= RDMA_CORE_PORT_IBA_ROCE;
--
2.7.4
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply related
* [PATCH rdma-next 01/10] IB/core: Add raw packet protocol
From: Leon Romanovsky @ 2016-11-27 14:51 UTC (permalink / raw)
To: dledford-H+wXaHxf7aLQT0dZR+AlfA
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Steve Wise, Mike Marciniszyn,
Dennis Dalessandro, Lijun Ou, Wei Hu(Xavier), Faisal Latif,
Yishai Hadas, Selvin Xavier, Devesh Sharma, Mitesh Ahuja,
Christian Benvenuti, Dave Goodell, Moni Shoua, Or Gerlitz
In-Reply-To: <1480258296-27032-1-git-send-email-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
From: Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Define raw packet protocol which comes to denote this port supports
the RAW_PACKET QP type. To be used in downstream patches where the
IB core serves a query on the supported QP types for device/port.
Signed-off-by: Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reviewed-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
include/rdma/ib_verbs.h | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 5ad43a4..0cb3194 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -485,6 +485,7 @@ static inline struct rdma_hw_stats *rdma_alloc_hw_stats_struct(
#define RDMA_CORE_CAP_PROT_ROCE 0x00200000
#define RDMA_CORE_CAP_PROT_IWARP 0x00400000
#define RDMA_CORE_CAP_PROT_ROCE_UDP_ENCAP 0x00800000
+#define RDMA_CORE_CAP_PROT_RAW_PACKET 0x01000000
#define RDMA_CORE_PORT_IBA_IB (RDMA_CORE_CAP_PROT_IB \
| RDMA_CORE_CAP_IB_MAD \
@@ -508,6 +509,8 @@ static inline struct rdma_hw_stats *rdma_alloc_hw_stats_struct(
#define RDMA_CORE_PORT_INTEL_OPA (RDMA_CORE_PORT_IBA_IB \
| RDMA_CORE_CAP_OPA_MAD)
+#define RDMA_CORE_PORT_RAW_PACKET (RDMA_CORE_CAP_PROT_RAW_PACKET)
+
struct ib_port_attr {
u64 subnet_prefix;
enum ib_port_state state;
@@ -2286,6 +2289,11 @@ static inline bool rdma_ib_or_roce(const struct ib_device *device, u8 port_num)
rdma_protocol_roce(device, port_num);
}
+static inline bool rdma_protocol_raw_packet(const struct ib_device *device, u8 port_num)
+{
+ return device->port_immutable[port_num].core_cap_flags & RDMA_CORE_CAP_PROT_RAW_PACKET;
+}
+
/**
* rdma_cap_ib_mad - Check if the port of a device supports Infiniband
* Management Datagrams.
--
2.7.4
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply related
* [PATCH rdma-next 00/10] Support RAW Ethernet when RoCE is disabled
From: Leon Romanovsky @ 2016-11-27 14:51 UTC (permalink / raw)
To: dledford-H+wXaHxf7aLQT0dZR+AlfA
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Steve Wise, Mike Marciniszyn,
Dennis Dalessandro, Lijun Ou, Wei Hu(Xavier), Faisal Latif,
Yishai Hadas, Selvin Xavier, Devesh Sharma, Mitesh Ahuja,
Christian Benvenuti, Dave Goodell, Moni Shoua
Hi Doug,
Please find below the patch set from Or. I didn't add version notation to this
patch set, because it was enriched extensively over initial version [1], from
one mlx5-only patch to native IB core support.
It is tested against v4.9-rc6.
From: Or Gerlitz <ogerlitz-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
On some environments, such as certain SRIOV VF configurations, RoCE is
not supported for mlx5 Ethernet ports. Currently, the driver will not
open IB device on that port.
This is problematic, since we do want user-space RAW Ethernet (RAW_PACKET QPs)
functionality to remain in place. For that end, we change the relevant driver
flows such that an IB device instance is created in that case as well.
Following the previous post [1], Doug wanted us to enable a way for
applications to query what QP types are actually supported on a device.
This series adds that functionality on device/port granularity, since
some drivers (mlx4) could support different protocols per port and
hence different QP types per port.
QP types are basically derived from the protocol/s supported on the port. We
added two protocols (raw packet and usnic) to have a protocol set which is
consistent with what is currently upstream.
Patches 1-7 deal with the new query and patches 8-10 with the mlx5 specific
changes.
To make it clear, the upstream kernel IB core is fully functional without the
new port attribute, in the sense that no errors are seen from components that
e.g use SMI and GSI services. The query is mostly targeted to user-space.
Thanks, Or.
[1] https://patchwork.kernel.org/patch/9096351/
---------------------------------------------------------------------------------------
Available in the "topic/raw-ibdev" topic branch of this git repo:
git://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma.git
Or for browsing:
https://git.kernel.org/cgit/linux/kernel/git/leon/linux-rdma.git/log/?h=topic/raw-ibdev
---------------------------------------------------------------------------------------
CC: Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
CC: Steve Wise <swise-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>
CC: Mike Marciniszyn <mike.marciniszyn-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
CC: Dennis Dalessandro <dennis.dalessandro-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
CC: Lijun Ou <oulijun-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
CC: "Wei Hu(Xavier)" <xavier.huwei-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
CC: Faisal Latif <faisal.latif-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
CC: Yishai Hadas <yishaih-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
CC: Selvin Xavier <selvin.xavier-1wcpHE2jlwO1Z/+hSey0Gg@public.gmane.org>
CC: Devesh Sharma <devesh.sharma-1wcpHE2jlwO1Z/+hSey0Gg@public.gmane.org>
CC: Mitesh Ahuja <mitesh.ahuja-1wcpHE2jlwO1Z/+hSey0Gg@public.gmane.org>
CC: Christian Benvenuti <benve-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>
CC: Dave Goodell <dgoodell-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>
CC: Moni Shoua <monis-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Or Gerlitz (10):
IB/core: Add raw packet protocol
IB/mlx5: Support raw packet protocol
IB/mlx4: Support raw packet protocol
IB: Add protocol for USNIC
IB: Query port through the core instead of directly calling the driver
handler
IB/core: Enable to query QP types supported by IB device on a port
IB/uverbs: Propagate supported QP types to user-space
IB/mlx5: Refactor registration to netdev notifier
IB/mlx5: Rename RoCE related helpers to reflect being Eth ones
IB/mlx5: Support RAW Ethernet when RoCE is disabled
drivers/infiniband/core/device.c | 28 +++++++++
drivers/infiniband/core/uverbs_cmd.c | 2 +
drivers/infiniband/hw/cxgb3/iwch_provider.c | 7 ++-
drivers/infiniband/hw/cxgb4/provider.c | 8 +--
drivers/infiniband/hw/hfi1/verbs.c | 1 +
drivers/infiniband/hw/hns/hns_roce_main.c | 10 ++-
drivers/infiniband/hw/i40iw/i40iw_verbs.c | 8 +--
drivers/infiniband/hw/mlx4/alias_GUID.c | 1 +
drivers/infiniband/hw/mlx4/main.c | 23 ++++---
drivers/infiniband/hw/mlx4/sysfs.c | 1 +
drivers/infiniband/hw/mlx5/mad.c | 2 +-
drivers/infiniband/hw/mlx5/main.c | 91 +++++++++++++++++-----------
drivers/infiniband/hw/mthca/mthca_provider.c | 9 +--
drivers/infiniband/hw/nes/nes_verbs.c | 5 +-
drivers/infiniband/hw/ocrdma/ocrdma_main.c | 9 +--
drivers/infiniband/hw/ocrdma/ocrdma_verbs.c | 1 +
drivers/infiniband/hw/qedr/verbs.c | 11 ++--
drivers/infiniband/hw/qib/qib_verbs.c | 1 +
drivers/infiniband/hw/usnic/usnic_ib_main.c | 4 +-
drivers/infiniband/hw/usnic/usnic_ib_verbs.c | 2 +-
drivers/infiniband/sw/rdmavt/vt.c | 7 ++-
drivers/infiniband/sw/rxe/rxe_verbs.c | 6 +-
include/rdma/ib_verbs.h | 17 ++++++
include/uapi/rdma/ib_user_verbs.h | 2 +-
24 files changed, 173 insertions(+), 83 deletions(-)
--
2.7.4
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: Enabling peer to peer device transactions for PCIe devices
From: Christian König @ 2016-11-27 14:07 UTC (permalink / raw)
To: Haggai Eran, Jason Gunthorpe
Cc: Bridgman, John,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-nvdimm-hn68Rpc1hR1g9hUCZPvPmw@public.gmane.org,
Kuehling, Felix, Serguei Sagalovitch,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org,
Blinzer, Paul, Suthikulpanit, Suravee,
linux-pci-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
Deucher, Alexander, Max Gurtovoy, Sander, Ben,
Linux-media-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <d9e064a0-9c47-3e41-3154-cece8c70a119-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Am 27.11.2016 um 15:02 schrieb Haggai Eran:
> On 11/25/2016 9:32 PM, Jason Gunthorpe wrote:
>> On Fri, Nov 25, 2016 at 02:22:17PM +0100, Christian König wrote:
>>
>>>> Like you say below we have to handle short lived in the usual way, and
>>>> that covers basically every device except IB MRs, including the
>>>> command queue on a NVMe drive.
>>> Well a problem which wasn't mentioned so far is that while GPUs do have a
>>> page table to mirror the CPU page table, they usually can't recover from
>>> page faults.
>>> So what we do is making sure that all memory accessed by the GPU Jobs stays
>>> in place while those jobs run (pretty much the same pinning you do for the
>>> DMA).
>> Yes, it is DMA, so this is a valid approach.
>>
>> But, you don't need page faults from the GPU to do proper coherent
>> page table mirroring. Basically when the driver submits the work to
>> the GPU it 'faults' the pages into the CPU and mirror translation
>> table (instead of pinning).
>>
>> Like in ODP, MMU notifiers/HMM are used to monitor for translation
>> changes. If a change comes in the GPU driver checks if an executing
>> command is touching those pages and blocks the MMU notifier until the
>> command flushes, then unfaults the page (blocking future commands) and
>> unblocks the mmu notifier.
> I think blocking mmu notifiers against something that is basically
> controlled by user-space can be problematic. This can block things like
> memory reclaim. If you have user-space access to the device's queues,
> user-space can block the mmu notifier forever.
Really good point.
I think this means the bare minimum if we don't have recoverable page
faults is to have preemption support like Felix described in his answer
as well.
Going to keep that in mind,
Christian.
^ permalink raw reply
* Re: Enabling peer to peer device transactions for PCIe devices
From: Haggai Eran @ 2016-11-27 14:02 UTC (permalink / raw)
To: Jason Gunthorpe, Christian König
Cc: Bridgman, John,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-nvdimm-hn68Rpc1hR1g9hUCZPvPmw@public.gmane.org,
Kuehling, Felix, Serguei Sagalovitch,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org,
Blinzer, Paul, Suthikulpanit, Suravee,
linux-pci-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
Deucher, Alexander, Max Gurtovoy, Sander, Ben,
Linux-media-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <20161125193252.GC16504-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
On 11/25/2016 9:32 PM, Jason Gunthorpe wrote:
> On Fri, Nov 25, 2016 at 02:22:17PM +0100, Christian König wrote:
>
>>> Like you say below we have to handle short lived in the usual way, and
>>> that covers basically every device except IB MRs, including the
>>> command queue on a NVMe drive.
>>
>> Well a problem which wasn't mentioned so far is that while GPUs do have a
>> page table to mirror the CPU page table, they usually can't recover from
>> page faults.
>
>> So what we do is making sure that all memory accessed by the GPU Jobs stays
>> in place while those jobs run (pretty much the same pinning you do for the
>> DMA).
>
> Yes, it is DMA, so this is a valid approach.
>
> But, you don't need page faults from the GPU to do proper coherent
> page table mirroring. Basically when the driver submits the work to
> the GPU it 'faults' the pages into the CPU and mirror translation
> table (instead of pinning).
>
> Like in ODP, MMU notifiers/HMM are used to monitor for translation
> changes. If a change comes in the GPU driver checks if an executing
> command is touching those pages and blocks the MMU notifier until the
> command flushes, then unfaults the page (blocking future commands) and
> unblocks the mmu notifier.
I think blocking mmu notifiers against something that is basically
controlled by user-space can be problematic. This can block things like
memory reclaim. If you have user-space access to the device's queues,
user-space can block the mmu notifier forever.
On PeerDirect, we have some kind of a middle-ground solution for pinning
GPU memory. We create a non-ODP MR pointing to VRAM but rely on
user-space and the GPU not to migrate it. If they do, the MR gets
destroyed immediately. This should work on legacy devices without ODP
support, and allows the system to safely terminate a process that
misbehaves. The downside of course is that it cannot transparently
migrate memory but I think for user-space RDMA doing that transparently
requires hardware support for paging, via something like HMM.
...
> I'm hearing most people say ZONE_DEVICE is the way to handle this,
> which means the missing remaing piece for RDMA is some kind of DMA
> core support for p2p address translation..
Yes, this is definitely something we need. I think Will Davis's patches
are a good start.
Another thing I think is that while HMM is good for user-space
applications, for kernel p2p use there is no need for that. Using
ZONE_DEVICE with or without something like DMA-BUF to pin and unpin
pages for the short duration as you wrote above could work fine for
kernel uses in which we can guarantee they are short.
Haggai
^ permalink raw reply
* [PATCH rdma-next 4/4] IB/mlx5: Use u64 for UMR length
From: Leon Romanovsky @ 2016-11-27 13:18 UTC (permalink / raw)
To: dledford-H+wXaHxf7aLQT0dZR+AlfA
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Maor Gottlieb
In-Reply-To: <1480252702-8005-1-git-send-email-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
From: Maor Gottlieb <maorg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
The fast_registration length is used to convey length for memory
registrations through UMR which can be of any size up to 2^64.
Change the length type to be u64.
Fixes: 968e78dd9644 ('IB/mlx5: Enhance UMR support to allow partial page table update')
Signed-off-by: Maor Gottlieb <maorg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
drivers/infiniband/hw/mlx5/mlx5_ib.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index 7d68990..0b938ad 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -418,7 +418,7 @@ struct mlx5_umr_wr {
struct ib_pd *pd;
unsigned int page_shift;
unsigned int npages;
- u32 length;
+ u64 length;
int access_flags;
u32 mkey;
};
--
2.7.4
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox