* RE: [PATCH 1/2] i40iw: Remove variable flush_code and check to set qp->sq_flush
From: Orosco, Henry @ 2016-11-10 4:17 UTC (permalink / raw)
To: dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
e1000-rdma-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org
In-Reply-To: <20161110034207.18016-1-henry.orosco-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Please disregard this one. Accidently sent this one as part of a series.
-----Original Message-----
From: Orosco, Henry
Sent: Wednesday, November 09, 2016 9:42 PM
To: dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; e1000-rdma-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org; Orosco, Henry <henry.orosco-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Subject: [PATCH 1/2] i40iw: Remove variable flush_code and check to set qp->sq_flush
The flush_code variable in i40iw_bld_terminate_hdr() is obsolete and the check to set qp->sq_flush is unreachable. Currently flush code is populated in setup_term_hdr() and both SQ and RQ are flushed always as part of the tear down flow.
Signed-off-by: Shiraz Saleem <shiraz.saleem-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Henry Orosco <henry.orosco-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
---
drivers/infiniband/hw/i40iw/i40iw_ctrl.c | 4 ----
1 file changed, 4 deletions(-)
diff --git a/drivers/infiniband/hw/i40iw/i40iw_ctrl.c b/drivers/infiniband/hw/i40iw/i40iw_ctrl.c
index cd71444..57a4236 100644
--- a/drivers/infiniband/hw/i40iw/i40iw_ctrl.c
+++ b/drivers/infiniband/hw/i40iw/i40iw_ctrl.c
@@ -4045,7 +4045,6 @@ static int i40iw_bld_terminate_hdr(struct i40iw_sc_qp *qp,
u16 ddp_seg_len;
int copy_len = 0;
u8 is_tagged = 0;
- enum i40iw_flush_opcode flush_code = FLUSH_INVALID;
u32 opcode;
struct i40iw_terminate_hdr *termhdr;
@@ -4218,9 +4217,6 @@ static int i40iw_bld_terminate_hdr(struct i40iw_sc_qp *qp,
if (copy_len)
memcpy(termhdr + 1, pkt, copy_len);
- if (flush_code && !info->in_rdrsp_wr)
- qp->sq_flush = (info->sq) ? true : false;
-
return sizeof(struct i40iw_terminate_hdr) + copy_len; }
--
1.8.3.1
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply related
* [PATCH] i40iw: Remove variable flush_code and check to set qp->sq_flush
From: Henry Orosco @ 2016-11-10 4:20 UTC (permalink / raw)
To: dledford-H+wXaHxf7aLQT0dZR+AlfA
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA,
e1000-rdma-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, Henry Orosco
The flush_code variable in i40iw_bld_terminate_hdr() is obsolete and
the check to set qp->sq_flush is unreachable. Currently flush code is
populated in setup_term_hdr() and both SQ and RQ are flushed always
as part of the tear down flow.
Signed-off-by: Shiraz Saleem <shiraz.saleem-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Henry Orosco <henry.orosco-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
---
drivers/infiniband/hw/i40iw/i40iw_ctrl.c | 4 ----
1 file changed, 4 deletions(-)
diff --git a/drivers/infiniband/hw/i40iw/i40iw_ctrl.c b/drivers/infiniband/hw/i40iw/i40iw_ctrl.c
index cd71444..57a4236 100644
--- a/drivers/infiniband/hw/i40iw/i40iw_ctrl.c
+++ b/drivers/infiniband/hw/i40iw/i40iw_ctrl.c
@@ -4045,7 +4045,6 @@ static int i40iw_bld_terminate_hdr(struct i40iw_sc_qp *qp,
u16 ddp_seg_len;
int copy_len = 0;
u8 is_tagged = 0;
- enum i40iw_flush_opcode flush_code = FLUSH_INVALID;
u32 opcode;
struct i40iw_terminate_hdr *termhdr;
@@ -4218,9 +4217,6 @@ static int i40iw_bld_terminate_hdr(struct i40iw_sc_qp *qp,
if (copy_len)
memcpy(termhdr + 1, pkt, copy_len);
- if (flush_code && !info->in_rdrsp_wr)
- qp->sq_flush = (info->sq) ? true : false;
-
return sizeof(struct i40iw_terminate_hdr) + copy_len;
}
--
1.8.3.1
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply related
* Re: [PATCH rdma-rc 0/9] mlx4 fixes for 4.9
From: Doug Ledford @ 2016-11-10 5:02 UTC (permalink / raw)
To: Leon Romanovsky, Yuval Shaia; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20161108085021.GA27883-2ukJVAZIZ/Y@public.gmane.org>
[-- Attachment #1.1: Type: text/plain, Size: 326 bytes --]
On 11/8/2016 3:50 AM, Leon Romanovsky wrote:
> On Sat, Nov 05, 2016 at 09:57:13PM +0200, Leon Romanovsky wrote:
>> Hi Doug,
>
> Please drop this patch series, I'll respin it once I'll return to
> office.
Okie dokie.
--
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
GPG Key ID: 0E572FDD
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 884 bytes --]
^ permalink raw reply
* Re: [PATCH rdma-next 0/4] Add packet pacing support for IB verbs
From: Leon Romanovsky @ 2016-11-10 7:22 UTC (permalink / raw)
To: Hefty, Sean, Bodong Wang
Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <1828884A29C6694DAF28B7E6B8A82373AB0A7F0A-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
[-- Attachment #1: Type: text/plain, Size: 1241 bytes --]
On Wed, Nov 09, 2016 at 05:06:52PM +0000, Hefty, Sean wrote:
> > On Tue, Nov 08, 2016 at 05:49:26PM +0000, Hefty, Sean wrote:
> > > > When sending from a 10G host to a 1G host, it is easy to overrun
> > the
> > > > receiver,
> > > > leading to packet loss and traffic backing off. Similar problems
> > occur
> > > > when
> > > > a 10G host sends data to a sub-10G virtual circuit, or a 40G host
> > > > sending
> > > > to a 10G host. Packet pacing could control packet injection rate
> > and
> > > > reduces
> > > > network congestion to maximize throughput & minimize network
> > latency.
> > >
> > > Why isn't the path record data and existing mechanisms sufficient to
> > handle this?
> > >
> >
> > Packet pacing allows different combinations of traffic shaping: per-
> > CPU,
> > per-flow and their combinations with better and steady QoS requirements
> > without involving subnet management.
>
> The patch adds this as a QP attribute, and we already have a rate for that. I still don't see why the standard mechanisms are insufficient or couldn't be adapted.
I'll let to Bodong to elaborate on it more, but as far as I see, the AH
attribute is relevant to UD QP only, while the packet pacing is intended
for all types of QPs.
Thanks
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]
^ permalink raw reply
* Re: [PATCHv12 0/3] rdmacg: IB/core: rdma controller support
From: Parav Pandit @ 2016-11-10 7:41 UTC (permalink / raw)
To: Liran Liss
Cc: Leon Romanovsky, Tejun Heo,
cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-rdma,
Li Zefan, Johannes Weiner, Doug Ledford, Christoph Hellwig,
Hefty, Sean, Jason Gunthorpe, Haggai Eran,
james.l.morris-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org,
Or Gerlitz, Matan Barak
In-Reply-To: <HE1PR0501MB2812298C05431B08B0F408EEB1A60-692Kmc8YnlIVrnpjwTCbp8DSnupUy6xnnBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
Hi Leon, Tejun, Christoph, Liran, Doug, Matan,
So are you ok with below proposal?
1. Define two resources by rdmacg.
(a) hca_handles (covers doorbell pages)
(b) hca_resources (mr, pd, qp, srq, vendor defined, all consolidated count)
Both cannot be combined as explained in [1].
2. User configures absolute count for above two resources (similar to
today's file descriptors, pid cgroup controller max limit)
Leon,
Let us know if you have any further discussions during LPC on
questions of [2] in using percentage based scheme or otherwise.
Parav
[1] https://www.spinics.net/lists/linux-rdma/msg42771.html
[2] https://www.spinics.net/lists/linux-rdma/msg42768.html
On Tue, Nov 8, 2016 at 1:42 PM, Liran Liss <liranl-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> wrote:
>> From: Parav Pandit [mailto:pandit.parav-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org]
>
>> >
>> > Hmm..
>> > I guess that you are right.
>> >
>> > So we can add another count for "HCA handles",
>> I prefer this. This keeps it vendor agnostic and clean if we don't go percentage
>> route.
>
> OK; let's do it.
>
>> Would indirection table also fall in this category?
>>
>
> No. It's just another HCA resource...
>
>> > or alternatively, each provider will restrict the number of handles
>> > per device to a reasonable small number (which won't be treated as one of the
>> "HCA resources").
>> This would require vendor drivers to get the understanding of cgroup object
>> and pid and that breaks the modular approach. I like to avoid this.
>>
>> > Typically, a process shouldn't need to open more than a single handle...
>> Right. well behaved application won't do multiple handles.
^ permalink raw reply
* [PATCH rdma-rc] IB/IPoIB: Remove can't use GFP_NOIO warning
From: Leon Romanovsky @ 2016-11-10 8:16 UTC (permalink / raw)
To: dledford-H+wXaHxf7aLQT0dZR+AlfA,
yuval.shaia-QHcLZuEGTsvQT0dZR+AlfA
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Kamal Heib
From: Kamal Heib <kamalh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Remove the warning print of "can't use of GFP_NOIO" to avoid prints in
each QP creation when devices aren't supporting IB_QP_CREATE_USE_GFP_NOIO.
This print become more annoying when the IPoIB interface is configured
to work in connected mode.
Fixes: 09b93088d750 ('IB: Add a QP creation flag to use GFP_NOIO allocations')
Signed-off-by: Kamal Heib <kamalh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
drivers/infiniband/ulp/ipoib/ipoib_cm.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
index 4ad297d..50c9772 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
@@ -1053,8 +1053,6 @@ static struct ib_qp *ipoib_cm_create_tx_qp(struct net_device *dev, struct ipoib_
tx_qp = ib_create_qp(priv->pd, &attr);
if (PTR_ERR(tx_qp) == -EINVAL) {
- ipoib_warn(priv, "can't use GFP_NOIO for QPs on device %s, using GFP_KERNEL\n",
- priv->ca->name);
attr.create_flags &= ~IB_QP_CREATE_USE_GFP_NOIO;
tx_qp = ib_create_qp(priv->pd, &attr);
}
--
2.7.4
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply related
* Re: [RFC ABI V5 02/10] RDMA/core: Add support for custom types
From: Matan Barak @ 2016-11-10 8:29 UTC (permalink / raw)
To: Hefty, Sean, Matan Barak
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Doug Ledford,
Jason Gunthorpe, Christoph Lameter, Liran Liss, Haggai Eran,
Majd Dibbiny, Tal Alon, Leon Romanovsky
In-Reply-To: <1828884A29C6694DAF28B7E6B8A82373AB0A8000-P5GAC/sN6hkd3b2yrw5b5LfspsVTdybXVpNB7YpNyf8@public.gmane.org>
On 09/11/2016 20:00, Hefty, Sean wrote:
>> I had thought about that, but the user could initialize its part of
>> the object in the function handler. It can't allocate the object as we
>> need it in order to allocate an IDR entry and co. The assumption here
>> is that the "unlock" stage can't fail.
>
> This is creating a generic OO type of framework, so just add constructor/destructor functions and have all objects inherit from a base ioctl object class.
>
Adding a constructor and destructor to every object would make the
infrastructure slower, it'll open code the locks (which are more related
to the actions themselves and not to the types), it'll duplicate more
code. Anyway, examining our use, I don't really see good value for ctors
and dtors for our usage right now.
>>> In fact, it would be great if we could just cleanup the list in the
>> reverse order that items were created. Maybe this requires supporting
>> a pre-cleanup handler, so that the driver can pluck items out of the
>> list that may need to be destroyed out of order.
>>>
>>
>> So that's essentially one layer of ordering. Why do you consider a
>> driver iterating over all objects simpler than this model?
>
> This problem is a verbs specific issue, and one that only involves MW . We have reference counts that can provide the same functionality. I want to avoid the amount of meta-data that needs to be used to describe objects.
>
It's currently happens in the verbs MW/MRs. However, it could happen
with any types whose bindings happen in the user-space or hardware.
>>>> Adding an object is done in two parts.
>>>> First, an object is allocated and added to IDR/fd table. Then, the
>>>> command's handlers (in downstream patches) could work on this object
>>>> and fill in its required details.
>>>> After a successful command, ib_uverbs_uobject_enable is called and
>>>> this user objects becomes ucontext visible.
>>>
>>> If you have a way to mark that an object is used for exclusive
>> access, you may be able to use that instead of introducing a new
>> variable. (I.e. acquire the object's write lock). I think we want to
>> make an effort to minimize the size of the kernel structure needed to
>> track every user space object (within reason).
>>>
>>
>> I didn't really follow. A command attribute states the nature of the
>> locking (for example, in MODIFY_QP the QP could be exclusively locked,
>> but in QUERY_QP it's only locked for reading). I don't want to really
>> grab a lock, as if I were I could face a dead-lock (user-space could
>> pass parameters in a colliding order), It could be solved by sorting
>> the handles, but that would degrade performance without a good reasob.
>
> I'm suggesting that the locking attribute and command be separate. This allows the framework to acquire the proper type of lock independent of what function it will invoke.
>
The locking is tied to what you want to do on that type. If you query
something, read locks are probably enough. I don't think separating them
will make the code more readable.
> The framework doesn't need to hold locks. It should be able to mark access to an object. If that access is not available, it can abort. This pushes more complex synchronization and thread handling to user space.
>
The try_{read,write} locks achieve exactly that and are simple enough.
>>>> Removing an uboject is done by calling ib_uverbs_uobject_remove.
>>>>
>>>> We should make sure IDR (per-device) and list (per-ucontext) could
>>>> be accessed concurrently without corrupting them.
>>>>
>>>> Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>>>> Signed-off-by: Haggai Eran <haggaie-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>>>> Signed-off-by: Leon Romanovsky <leonro-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>>>> ---
>>>
>>> As a general comment, I do have concerns that the resulting
>> generalized parsing of everything will negatively impact performance
>> for operations that do have to transition into the kernel. Not all
>> devices offload all operations to user space. Plus the resulting code
>> is extremely difficult to read and non-trivial to use. It's equivalent
>> to reading C++ code that has 4 layers of inheritance with overrides to
>> basic operators...
>>
>> There are two parts here. I think the handlers themselves are simpler,
>> easier to read and less error-prone. They contain less code
>> duplications. The macro based define language explicitly declare all
>> attributes, their types, size, etc.
>> The model here is a bit more complex as we want to achieve both code
>> resue and add/override of new types/actions/attributes.
>>
>>
>>>
>>> Pre and post operators per command that can do straightforward
>> validation seem like a better option.
>>>
>>>
>>
>> I think that would duplicate a lot of code and will be more
>> error-prone than one infrastrucutre that automates all that work for
>> you.
>
> I think that's a toss-up. Either you have to write the code correctly or write the rules correctly. Reading code is straightforward, manually converting rules into code is not.
>
> In any case, the two approaches are not exclusive. By forcing the rule language into the framework, everything is forced to deal with it. By leaving it out, each ioctl provider can decide if they need this or not. If you want verbs to process all ioctl's using a single pre-validation function that operates based on these rules you can. Nothing prevents that. But ioctl providers that want better performance can elect for a more straightforward validation model.
>
Please see Jason's response.
>>>> drivers/infiniband/core/Makefile | 3 +-
>>>> drivers/infiniband/core/device.c | 1 +
>>>> drivers/infiniband/core/rdma_core.c | 489
>>>> ++++++++++++++++++++++++++++++++++
>>>> drivers/infiniband/core/rdma_core.h | 75 ++++++
>>>> drivers/infiniband/core/uverbs.h | 1 +
>>>> drivers/infiniband/core/uverbs_main.c | 2 +-
>>>> include/rdma/ib_verbs.h | 28 +-
>>>> include/rdma/uverbs_ioctl.h | 195 ++++++++++++++
>>>> 8 files changed, 789 insertions(+), 5 deletions(-)
>>>> create mode 100644 drivers/infiniband/core/rdma_core.c
>>>> create mode 100644 drivers/infiniband/core/rdma_core.h
>>>> create mode 100644 include/rdma/uverbs_ioctl.h
>>>>
>>>> diff --git a/drivers/infiniband/core/Makefile
>>>> b/drivers/infiniband/core/Makefile
>>>> index edaae9f..1819623 100644
>>>> --- a/drivers/infiniband/core/Makefile
>>>> +++ b/drivers/infiniband/core/Makefile
>>>> @@ -28,4 +28,5 @@ ib_umad-y := user_mad.o
>>>>
>>>> ib_ucm-y := ucm.o
>>>>
>>>> -ib_uverbs-y := uverbs_main.o uverbs_cmd.o
>>>> uverbs_marshall.o
>>>> +ib_uverbs-y := uverbs_main.o uverbs_cmd.o
>>>> uverbs_marshall.o \
>>>> + rdma_core.o
>>>> diff --git a/drivers/infiniband/core/device.c
>>>> b/drivers/infiniband/core/device.c
>>>> index c3b68f5..43994b1 100644
>>>> --- a/drivers/infiniband/core/device.c
>>>> +++ b/drivers/infiniband/core/device.c
>>>> @@ -243,6 +243,7 @@ struct ib_device *ib_alloc_device(size_t size)
>>>> spin_lock_init(&device->client_data_lock);
>>>> INIT_LIST_HEAD(&device->client_data_list);
>>>> INIT_LIST_HEAD(&device->port_list);
>>>> + INIT_LIST_HEAD(&device->type_list);
>>>>
>>>> return device;
>>>> }
>>>> diff --git a/drivers/infiniband/core/rdma_core.c
>>>> b/drivers/infiniband/core/rdma_core.c
>>>> new file mode 100644
>>>> index 0000000..337abc2
>>>> --- /dev/null
>>>> +++ b/drivers/infiniband/core/rdma_core.c
>>>> @@ -0,0 +1,489 @@
>>>> +/*
>>>> + * Copyright (c) 2016, Mellanox Technologies inc. All rights
>>>> reserved.
>>>> + *
>>>> + * This software is available to you under a choice of one of two
>>>> + * licenses. You may choose to be licensed under the terms of the
>> GNU
>>>> + * General Public License (GPL) Version 2, available from the file
>>>> + * COPYING in the main directory of this source tree, or the
>>>> + * OpenIB.org BSD license below:
>>>> + *
>>>> + * Redistribution and use in source and binary forms, with or
>>>> + * without modification, are permitted provided that the
>> following
>>>> + * conditions are met:
>>>> + *
>>>> + * - Redistributions of source code must retain the above
>>>> + * copyright notice, this list of conditions and the
>> following
>>>> + * disclaimer.
>>>> + *
>>>> + * - Redistributions in binary form must reproduce the above
>>>> + * copyright notice, this list of conditions and the
>> following
>>>> + * disclaimer in the documentation and/or other materials
>>>> + * provided with the distribution.
>>>> + *
>>>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
>>>> + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
>> OF
>>>> + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
>>>> + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
>> HOLDERS
>>>> + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN
>> AN
>>>> + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR
>> IN
>>>> + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
>>>> + * SOFTWARE.
>>>> + */
>>>> +
>>>> +#include <linux/file.h>
>>>> +#include <linux/anon_inodes.h>
>>>> +#include <rdma/ib_verbs.h>
>>>> +#include "uverbs.h"
>>>> +#include "rdma_core.h"
>>>> +#include <rdma/uverbs_ioctl.h>
>>>> +
>>>> +const struct uverbs_type *uverbs_get_type(const struct ib_device
>>>> *ibdev,
>>>> + uint16_t type)
>>>> +{
>>>> + const struct uverbs_types_group *groups = ibdev->types_group;
>>>> + const struct uverbs_types *types;
>>>> + int ret = groups->dist(&type, groups->priv);
>>>> +
>>>> + if (ret >= groups->num_groups)
>>>> + return NULL;
>>>> +
>>>> + types = groups->type_groups[ret];
>>>> +
>>>> + if (type >= types->num_types)
>>>> + return NULL;
>>>> +
>>>> + return types->types[type];
>>>> +}
>>>> +
>>>> +static int uverbs_lock_object(struct ib_uobject *uobj,
>>>> + enum uverbs_idr_access access)
>>>> +{
>>>> + if (access == UVERBS_IDR_ACCESS_READ)
>>>> + return down_read_trylock(&uobj->usecnt) == 1 ? 0 : -
>> EBUSY;
>>>> +
>>>> + /* lock is either WRITE or DESTROY - should be exclusive */
>>>> + return down_write_trylock(&uobj->usecnt) == 1 ? 0 : -EBUSY;
>>>
>>> This function could take the lock type directly (read or write),
>> versus inferring it based on some other access type.
>>>
>>
>> We can, but since we use these enums in the attribute specifications,
>> I thought it could be more convinient.
>>
>>>> +}
>>>> +
>>>> +static struct ib_uobject *get_uobj(int id, struct ib_ucontext
>>>> *context)
>>>> +{
>>>> + struct ib_uobject *uobj;
>>>> +
>>>> + rcu_read_lock();
>>>> + uobj = idr_find(&context->device->idr, id);
>>>> + if (uobj && uobj->live) {
>>>> + if (uobj->context != context)
>>>> + uobj = NULL;
>>>> + }
>>>> + rcu_read_unlock();
>>>> +
>>>> + return uobj;
>>>> +}
>>>> +
>>>> +struct ib_ucontext_lock {
>>>> + struct kref ref;
>>>> + /* locking the uobjects_list */
>>>> + struct mutex lock;
>>>> +};
>>>> +
>>>> +static void init_uobjects_list_lock(struct ib_ucontext_lock *lock)
>>>> +{
>>>> + mutex_init(&lock->lock);
>>>> + kref_init(&lock->ref);
>>>> +}
>>>> +
>>>> +static void release_uobjects_list_lock(struct kref *ref)
>>>> +{
>>>> + struct ib_ucontext_lock *lock = container_of(ref,
>>>> + struct
>> ib_ucontext_lock,
>>>> + ref);
>>>> +
>>>> + kfree(lock);
>>>> +}
>>>> +
>>>> +static void init_uobj(struct ib_uobject *uobj, u64 user_handle,
>>>> + struct ib_ucontext *context)
>>>> +{
>>>> + init_rwsem(&uobj->usecnt);
>>>> + uobj->user_handle = user_handle;
>>>> + uobj->context = context;
>>>> + uobj->live = 0;
>>>> +}
>>>> +
>>>> +static int add_uobj(struct ib_uobject *uobj)
>>>> +{
>>>> + int ret;
>>>> +
>>>> + idr_preload(GFP_KERNEL);
>>>> + spin_lock(&uobj->context->device->idr_lock);
>>>> +
>>>> + ret = idr_alloc(&uobj->context->device->idr, uobj, 0, 0,
>>>> GFP_NOWAIT);
>>>> + if (ret >= 0)
>>>> + uobj->id = ret;
>>>> +
>>>> + spin_unlock(&uobj->context->device->idr_lock);
>>>> + idr_preload_end();
>>>> +
>>>> + return ret < 0 ? ret : 0;
>>>> +}
>>>> +
>>>> +static void remove_uobj(struct ib_uobject *uobj)
>>>> +{
>>>> + spin_lock(&uobj->context->device->idr_lock);
>>>> + idr_remove(&uobj->context->device->idr, uobj->id);
>>>> + spin_unlock(&uobj->context->device->idr_lock);
>>>> +}
>>>> +
>>>> +static void put_uobj(struct ib_uobject *uobj)
>>>> +{
>>>> + kfree_rcu(uobj, rcu);
>>>> +}
>>>> +
>>>> +static struct ib_uobject *get_uobject_from_context(struct
>> ib_ucontext
>>>> *ucontext,
>>>> + const struct
>>>> uverbs_type_alloc_action *type,
>>>> + u32 idr,
>>>> + enum
>> uverbs_idr_access access)
>>>> +{
>>>> + struct ib_uobject *uobj;
>>>> + int ret;
>>>> +
>>>> + rcu_read_lock();
>>>> + uobj = get_uobj(idr, ucontext);
>>>> + if (!uobj)
>>>> + goto free;
>>>> +
>>>> + if (uobj->type != type) {
>>>> + uobj = NULL;
>>>> + goto free;
>>>> + }
>>>> +
>>>> + ret = uverbs_lock_object(uobj, access);
>>>> + if (ret)
>>>> + uobj = ERR_PTR(ret);
>>>> +free:
>>>> + rcu_read_unlock();
>>>> + return uobj;
>>>> +
>>>> + return NULL;
>>>> +}
>>>> +
>>>> +static int ib_uverbs_uobject_add(struct ib_uobject *uobject,
>>>> + const struct uverbs_type_alloc_action
>>>> *uobject_type)
>>>> +{
>>>> + uobject->type = uobject_type;
>>>> + return add_uobj(uobject);
>>>> +}
>>>> +
>>>> +struct ib_uobject *uverbs_get_type_from_idr(const struct
>>>> uverbs_type_alloc_action *type,
>>>> + struct ib_ucontext
>> *ucontext,
>>>> + enum uverbs_idr_access
>> access,
>>>> + uint32_t idr)
>>>> +{
>>>> + struct ib_uobject *uobj;
>>>> + int ret;
>>>> +
>>>> + if (access == UVERBS_IDR_ACCESS_NEW) {
>>>> + uobj = kmalloc(type->obj_size, GFP_KERNEL);
>>>> + if (!uobj)
>>>> + return ERR_PTR(-ENOMEM);
>>>> +
>>>> + init_uobj(uobj, 0, ucontext);
>>>> +
>>>> + /* lock idr */
>>>
>>> Command to lock idr, but no lock is obtained.
>>>
>>
>> ib_uverbs_uobject_add calls add_uobj which locks the IDR.
>>
>>>> + ret = ib_uverbs_uobject_add(uobj, type);
>>>> + if (ret) {
>>>> + kfree(uobj);
>>>> + return ERR_PTR(ret);
>>>> + }
>>>> +
>>>> + } else {
>>>> + uobj = get_uobject_from_context(ucontext, type, idr,
>>>> + access);
>>>> +
>>>> + if (!uobj)
>>>> + return ERR_PTR(-ENOENT);
>>>> + }
>>>> +
>>>> + return uobj;
>>>> +}
>>>> +
>>>> +struct ib_uobject *uverbs_get_type_from_fd(const struct
>>>> uverbs_type_alloc_action *type,
>>>> + struct ib_ucontext
>> *ucontext,
>>>> + enum uverbs_idr_access
>> access,
>>>> + int fd)
>>>> +{
>>>> + if (access == UVERBS_IDR_ACCESS_NEW) {
>>>> + int _fd;
>>>> + struct ib_uobject *uobj = NULL;
>>>> + struct file *filp;
>>>> +
>>>> + _fd = get_unused_fd_flags(O_CLOEXEC);
>>>> + if (_fd < 0 || WARN_ON(type->obj_size < sizeof(struct
>>>> ib_uobject)))
>>>> + return ERR_PTR(_fd);
>>>> +
>>>> + uobj = kmalloc(type->obj_size, GFP_KERNEL);
>>>> + init_uobj(uobj, 0, ucontext);
>>>> +
>>>> + if (!uobj)
>>>> + return ERR_PTR(-ENOMEM);
>>>> +
>>>> + filp = anon_inode_getfile(type->fd.name, type-
>>> fd.fops,
>>>> + uobj + 1, type->fd.flags);
>>>> + if (IS_ERR(filp)) {
>>>> + put_unused_fd(_fd);
>>>> + kfree(uobj);
>>>> + return (void *)filp;
>>>> + }
>>>> +
>>>> + uobj->type = type;
>>>> + uobj->id = _fd;
>>>> + uobj->object = filp;
>>>> +
>>>> + return uobj;
>>>> + } else if (access == UVERBS_IDR_ACCESS_READ) {
>>>> + struct file *f = fget(fd);
>>>> + struct ib_uobject *uobject;
>>>> +
>>>> + if (!f)
>>>> + return ERR_PTR(-EBADF);
>>>> +
>>>> + uobject = f->private_data - sizeof(struct ib_uobject);
>>>> + if (f->f_op != type->fd.fops ||
>>>> + !uobject->live) {
>>>> + fput(f);
>>>> + return ERR_PTR(-EBADF);
>>>> + }
>>>> +
>>>> + /*
>>>> + * No need to protect it with a ref count, as fget
>>>> increases
>>>> + * f_count.
>>>> + */
>>>> + return uobject;
>>>> + } else {
>>>> + return ERR_PTR(-EOPNOTSUPP);
>>>> + }
>>>> +}
>>>> +
>>>> +static void ib_uverbs_uobject_enable(struct ib_uobject *uobject)
>>>> +{
>>>> + mutex_lock(&uobject->context->uobjects_lock->lock);
>>>> + list_add(&uobject->list, &uobject->context->uobjects);
>>>> + mutex_unlock(&uobject->context->uobjects_lock->lock);
>>>
>>> Why not just insert the object into the list on creation?
>>>
>>>> + uobject->live = 1;
>>>
>>> See my comments above on removing the live field.
>>>
>>
>> Seems that the list could suffice, but I'll look into that.
>>
>>>> +}
>>>> +
>>>> +static void ib_uverbs_uobject_remove(struct ib_uobject *uobject,
>> bool
>>>> lock)
>>>> +{
>>>> + /*
>>>> + * Calling remove requires exclusive access, so it's not
>> possible
>>>> + * another thread will use our object.
>>>> + */
>>>> + uobject->live = 0;
>>>> + uobject->type->free_fn(uobject->type, uobject);
>>>> + if (lock)
>>>> + mutex_lock(&uobject->context->uobjects_lock->lock);
>>>> + list_del(&uobject->list);
>>>> + if (lock)
>>>> + mutex_unlock(&uobject->context->uobjects_lock->lock);
>>>> + remove_uobj(uobject);
>>>> + put_uobj(uobject);
>>>> +}
>>>> +
>>>> +static void uverbs_unlock_idr(struct ib_uobject *uobj,
>>>> + enum uverbs_idr_access access,
>>>> + bool success)
>>>> +{
>>>> + switch (access) {
>>>> + case UVERBS_IDR_ACCESS_READ:
>>>> + up_read(&uobj->usecnt);
>>>> + break;
>>>> + case UVERBS_IDR_ACCESS_NEW:
>>>> + if (success) {
>>>> + ib_uverbs_uobject_enable(uobj);
>>>> + } else {
>>>> + remove_uobj(uobj);
>>>> + put_uobj(uobj);
>>>> + }
>>>> + break;
>>>> + case UVERBS_IDR_ACCESS_WRITE:
>>>> + up_write(&uobj->usecnt);
>>>> + break;
>>>> + case UVERBS_IDR_ACCESS_DESTROY:
>>>> + if (success)
>>>> + ib_uverbs_uobject_remove(uobj, true);
>>>> + else
>>>> + up_write(&uobj->usecnt);
>>>> + break;
>>>> + }
>>>> +}
>>>> +
>>>> +static void uverbs_unlock_fd(struct ib_uobject *uobj,
>>>> + enum uverbs_idr_access access,
>>>> + bool success)
>>>> +{
>>>> + struct file *filp = uobj->object;
>>>> +
>>>> + if (access == UVERBS_IDR_ACCESS_NEW) {
>>>> + if (success) {
>>>> + kref_get(&uobj->context->ufile->ref);
>>>> + uobj->uobjects_lock = uobj->context-
>>> uobjects_lock;
>>>> + kref_get(&uobj->uobjects_lock->ref);
>>>> + ib_uverbs_uobject_enable(uobj);
>>>> + fd_install(uobj->id, uobj->object);
>>>
>>> I don't get this. The function is unlocking something, but there are
>> calls to get krefs?
>>>
>>
>> Before invoking the user's callback, we're first locking all objects
>> and afterwards we're unlocking them. When we need to create a new
>> object, the lock becomes object creation and the unlock could become
>> (assuming the user's callback succeeded) enabling this new object.
>> When you add a new object (or fd in this case), we take a reference
>> count to both the uverbs_file and the locking context.
>>
>>>> + } else {
>>>> + fput(uobj->object);
>>>> + put_unused_fd(uobj->id);
>>>> + kfree(uobj);
>>>> + }
>>>> + } else {
>>>> + fput(filp);
>>>> + }
>>>> +}
>>>> +
>>>> +void uverbs_unlock_object(struct ib_uobject *uobj,
>>>> + enum uverbs_idr_access access,
>>>> + bool success)
>>>> +{
>>>> + if (uobj->type->type == UVERBS_ATTR_TYPE_IDR)
>>>> + uverbs_unlock_idr(uobj, access, success);
>>>> + else if (uobj->type->type == UVERBS_ATTR_TYPE_FD)
>>>> + uverbs_unlock_fd(uobj, access, success);
>>>> + else
>>>> + WARN_ON(true);
>>>> +}
>>>> +
>>>> +static void ib_uverbs_remove_fd(struct ib_uobject *uobject)
>>>> +{
>>>> + /*
>>>> + * user should release the uobject in the release
>>>> + * callback.
>>>> + */
>>>> + if (uobject->live) {
>>>> + uobject->live = 0;
>>>> + list_del(&uobject->list);
>>>> + uobject->type->free_fn(uobject->type, uobject);
>>>> + kref_put(&uobject->context->ufile->ref,
>>>> ib_uverbs_release_file);
>>>> + uobject->context = NULL;
>>>> + }
>>>> +}
>>>> +
>>>> +void ib_uverbs_close_fd(struct file *f)
>>>> +{
>>>> + struct ib_uobject *uobject = f->private_data - sizeof(struct
>>>> ib_uobject);
>>>> +
>>>> + mutex_lock(&uobject->uobjects_lock->lock);
>>>> + if (uobject->live) {
>>>> + uobject->live = 0;
>>>> + list_del(&uobject->list);
>>>> + kref_put(&uobject->context->ufile->ref,
>>>> ib_uverbs_release_file);
>>>> + uobject->context = NULL;
>>>> + }
>>>> + mutex_unlock(&uobject->uobjects_lock->lock);
>>>> + kref_put(&uobject->uobjects_lock->ref,
>>>> release_uobjects_list_lock);
>>>> +}
>>>> +
>>>> +void ib_uverbs_cleanup_fd(void *private_data)
>>>> +{
>>>> + struct ib_uboject *uobject = private_data - sizeof(struct
>>>> ib_uobject);
>>>> +
>>>> + kfree(uobject);
>>>> +}
>>>> +
>>>> +void uverbs_unlock_objects(struct uverbs_attr_array *attr_array,
>>>> + size_t num,
>>>> + const struct uverbs_action_spec *spec,
>>>> + bool success)
>>>> +{
>>>> + unsigned int i;
>>>> +
>>>> + for (i = 0; i < num; i++) {
>>>> + struct uverbs_attr_array *attr_spec_array =
>> &attr_array[i];
>>>> + const struct uverbs_attr_group_spec *group_spec =
>>>> + spec->attr_groups[i];
>>>> + unsigned int j;
>>>> +
>>>> + for (j = 0; j < attr_spec_array->num_attrs; j++) {
>>>> + struct uverbs_attr *attr = &attr_spec_array-
>>>>> attrs[j];
>>>> + struct uverbs_attr_spec *spec = &group_spec-
>>>>> attrs[j];
>>>> +
>>>> + if (!attr->valid)
>>>> + continue;
>>>> +
>>>> + if (spec->type == UVERBS_ATTR_TYPE_IDR ||
>>>> + spec->type == UVERBS_ATTR_TYPE_FD)
>>>> + /*
>>>> + * refcounts should be handled at the
>> object
>>>> + * level and not at the uobject level.
>>>> + */
>>>> + uverbs_unlock_object(attr-
>>> obj_attr.uobject,
>>>> + spec->obj.access,
>> success);
>>>> + }
>>>> + }
>>>> +}
>>>> +
>>>> +static unsigned int get_type_orders(const struct uverbs_types_group
>>>> *types_group)
>>>> +{
>>>> + unsigned int i;
>>>> + unsigned int max = 0;
>>>> +
>>>> + for (i = 0; i < types_group->num_groups; i++) {
>>>> + unsigned int j;
>>>> + const struct uverbs_types *types = types_group-
>>>>> type_groups[i];
>>>> +
>>>> + for (j = 0; j < types->num_types; j++) {
>>>> + if (!types->types[j] || !types->types[j]-
>>> alloc)
>>>> + continue;
>>>> + if (types->types[j]->alloc->order > max)
>>>> + max = types->types[j]->alloc->order;
>>>> + }
>>>> + }
>>>> +
>>>> + return max;
>>>> +}
>>>> +
>>>> +void ib_uverbs_uobject_type_cleanup_ucontext(struct ib_ucontext
>>>> *ucontext,
>>>> + const struct
>> uverbs_types_group
>>>> *types_group)
>>>> +{
>>>> + unsigned int num_orders = get_type_orders(types_group);
>>>> + unsigned int i;
>>>> +
>>>> + for (i = 0; i <= num_orders; i++) {
>>>> + struct ib_uobject *obj, *next_obj;
>>>> +
>>>> + /*
>>>> + * No need to take lock here, as cleanup should be
>> called
>>>> + * after all commands finished executing. Newly
>> executed
>>>> + * commands should fail.
>>>> + */
>>>> + mutex_lock(&ucontext->uobjects_lock->lock);
>>>
>>> It's really confusing to see a comment about 'no need to take lock'
>> immediately followed by a call to lock.
>>>
>>
>> Yeah :) That was before adding the fd. I'll delete the comment.
>>
>>>> + list_for_each_entry_safe(obj, next_obj, &ucontext-
>>>>> uobjects,
>>>> + list)
>>>> + if (obj->type->order == i) {
>>>> + if (obj->type->type ==
>> UVERBS_ATTR_TYPE_IDR)
>>>> + ib_uverbs_uobject_remove(obj,
>> false);
>>>> + else
>>>> + ib_uverbs_remove_fd(obj);
>>>> + }
>>>> + mutex_unlock(&ucontext->uobjects_lock->lock);
>>>> + }
>>>> + kref_put(&ucontext->uobjects_lock->ref,
>>>> release_uobjects_list_lock);
>>>> +}
>>>> +
>>>> +int ib_uverbs_uobject_type_initialize_ucontext(struct ib_ucontext
>>>> *ucontext)
>>>
>>> Please work on the function names. This is horrendously long and
>> still doesn't help describe what it does.
>>>
>>
>> This just initialized the types part of the ucontext. Any suggestions?
>>
>>>> +{
>>>> + ucontext->uobjects_lock = kmalloc(sizeof(*ucontext-
>>>>> uobjects_lock),
>>>> + GFP_KERNEL);
>>>> + if (!ucontext->uobjects_lock)
>>>> + return -ENOMEM;
>>>> +
>>>> + init_uobjects_list_lock(ucontext->uobjects_lock);
>>>> + INIT_LIST_HEAD(&ucontext->uobjects);
>>>> +
>>>> + return 0;
>>>> +}
>>>> +
>>>> +void ib_uverbs_uobject_type_release_ucontext(struct ib_ucontext
>>>> *ucontext)
>>>> +{
>>>> + kfree(ucontext->uobjects_lock);
>>>> +}
>>>
>>> No need to wrap a call to 'free'.
>>>
>>
>> In order to abstract away the ucontext type data structure.
>>
>>>> +
>>>> diff --git a/drivers/infiniband/core/rdma_core.h
>>>> b/drivers/infiniband/core/rdma_core.h
>>>> new file mode 100644
>>>> index 0000000..8990115
>>>> --- /dev/null
>>>> +++ b/drivers/infiniband/core/rdma_core.h
>>>> @@ -0,0 +1,75 @@
>>>> +/*
>>>> + * Copyright (c) 2005 Topspin Communications. All rights reserved.
>>>> + * Copyright (c) 2005, 2006 Cisco Systems. All rights reserved.
>>>> + * Copyright (c) 2005-2016 Mellanox Technologies. All rights
>> reserved.
>>>> + * Copyright (c) 2005 Voltaire, Inc. All rights reserved.
>>>> + * Copyright (c) 2005 PathScale, Inc. All rights reserved.
>>>> + *
>>>> + * This software is available to you under a choice of one of two
>>>> + * licenses. You may choose to be licensed under the terms of the
>> GNU
>>>> + * General Public License (GPL) Version 2, available from the file
>>>> + * COPYING in the main directory of this source tree, or the
>>>> + * OpenIB.org BSD license below:
>>>> + *
>>>> + * Redistribution and use in source and binary forms, with or
>>>> + * without modification, are permitted provided that the
>> following
>>>> + * conditions are met:
>>>> + *
>>>> + * - Redistributions of source code must retain the above
>>>> + * copyright notice, this list of conditions and the
>> following
>>>> + * disclaimer.
>>>> + *
>>>> + * - Redistributions in binary form must reproduce the above
>>>> + * copyright notice, this list of conditions and the
>> following
>>>> + * disclaimer in the documentation and/or other materials
>>>> + * provided with the distribution.
>>>> + *
>>>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
>>>> + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
>> OF
>>>> + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
>>>> + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
>> HOLDERS
>>>> + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN
>> AN
>>>> + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR
>> IN
>>>> + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
>>>> + * SOFTWARE.
>>>> + */
>>>> +
>>>> +#ifndef UOBJECT_H
>>>> +#define UOBJECT_H
>>>> +
>>>> +#include <linux/idr.h>
>>>> +#include <rdma/uverbs_ioctl.h>
>>>> +#include <rdma/ib_verbs.h>
>>>> +#include <linux/mutex.h>
>>>> +
>>>> +const struct uverbs_type *uverbs_get_type(const struct ib_device
>>>> *ibdev,
>>>> + uint16_t type);
>>>> +struct ib_uobject *uverbs_get_type_from_idr(const struct
>>>> uverbs_type_alloc_action *type,
>>>> + struct ib_ucontext
>> *ucontext,
>>>> + enum uverbs_idr_access
>> access,
>>>> + uint32_t idr);
>>>> +struct ib_uobject *uverbs_get_type_from_fd(const struct
>>>> uverbs_type_alloc_action *type,
>>>> + struct ib_ucontext
>> *ucontext,
>>>> + enum uverbs_idr_access
>> access,
>>>> + int fd);
>>>> +void uverbs_unlock_object(struct ib_uobject *uobj,
>>>> + enum uverbs_idr_access access,
>>>> + bool success);
>>>> +void uverbs_unlock_objects(struct uverbs_attr_array *attr_array,
>>>> + size_t num,
>>>> + const struct uverbs_action_spec *spec,
>>>> + bool success);
>>>> +
>>>> +void ib_uverbs_uobject_type_cleanup_ucontext(struct ib_ucontext
>>>> *ucontext,
>>>> + const struct
>> uverbs_types_group
>>>> *types_group);
>>>> +int ib_uverbs_uobject_type_initialize_ucontext(struct ib_ucontext
>>>> *ucontext);
>>>> +void ib_uverbs_uobject_type_release_ucontext(struct ib_ucontext
>>>> *ucontext);
>>>> +void ib_uverbs_close_fd(struct file *f);
>>>> +void ib_uverbs_cleanup_fd(void *private_data);
>>>> +
>>>> +static inline void *uverbs_fd_to_priv(struct ib_uobject *uobj)
>>>> +{
>>>> + return uobj + 1;
>>>> +}
>>>
>>> This seems like a rather useless function.
>>>
>>
>> Why? The user sholdn't know or care how we put our structs together.
>>
>>>> +
>>>> +#endif /* UIDR_H */
>>>> diff --git a/drivers/infiniband/core/uverbs.h
>>>> b/drivers/infiniband/core/uverbs.h
>>>> index 8074705..ae7d4b8 100644
>>>> --- a/drivers/infiniband/core/uverbs.h
>>>> +++ b/drivers/infiniband/core/uverbs.h
>>>> @@ -180,6 +180,7 @@ void idr_remove_uobj(struct ib_uobject *uobj);
>>>> struct file *ib_uverbs_alloc_event_file(struct ib_uverbs_file
>>>> *uverbs_file,
>>>> struct ib_device *ib_dev,
>>>> int is_async);
>>>> +void ib_uverbs_release_file(struct kref *ref);
>>>> void ib_uverbs_free_async_event_file(struct ib_uverbs_file
>>>> *uverbs_file);
>>>> struct ib_uverbs_event_file *ib_uverbs_lookup_comp_file(int fd);
>>>>
>>>> diff --git a/drivers/infiniband/core/uverbs_main.c
>>>> b/drivers/infiniband/core/uverbs_main.c
>>>> index f783723..e63357a 100644
>>>> --- a/drivers/infiniband/core/uverbs_main.c
>>>> +++ b/drivers/infiniband/core/uverbs_main.c
>>>> @@ -341,7 +341,7 @@ static void ib_uverbs_comp_dev(struct
>>>> ib_uverbs_device *dev)
>>>> complete(&dev->comp);
>>>> }
>>>>
>>>> -static void ib_uverbs_release_file(struct kref *ref)
>>>> +void ib_uverbs_release_file(struct kref *ref)
>>>> {
>>>> struct ib_uverbs_file *file =
>>>> container_of(ref, struct ib_uverbs_file, ref);
>>>> diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
>>>> index b5d2075..7240615 100644
>>>> --- a/include/rdma/ib_verbs.h
>>>> +++ b/include/rdma/ib_verbs.h
>>>> @@ -1329,8 +1329,11 @@ struct ib_fmr_attr {
>>>>
>>>> struct ib_umem;
>>>>
>>>> +struct ib_ucontext_lock;
>>>> +
>>>> struct ib_ucontext {
>>>> struct ib_device *device;
>>>> + struct ib_uverbs_file *ufile;
>>>> struct list_head pd_list;
>>>> struct list_head mr_list;
>>>> struct list_head mw_list;
>>>> @@ -1344,6 +1347,10 @@ struct ib_ucontext {
>>>> struct list_head rwq_ind_tbl_list;
>>>> int closing;
>>>>
>>>> + /* lock for uobjects list */
>>>> + struct ib_ucontext_lock *uobjects_lock;
>>>> + struct list_head uobjects;
>>>> +
>>>> struct pid *tgid;
>>>> #ifdef CONFIG_INFINIBAND_ON_DEMAND_PAGING
>>>> struct rb_root umem_tree;
>>>> @@ -1363,16 +1370,28 @@ struct ib_ucontext {
>>>> #endif
>>>> };
>>>>
>>>> +struct uverbs_object_list;
>>>> +
>>>> +#define OLD_ABI_COMPAT
>>>> +
>>>> struct ib_uobject {
>>>> u64 user_handle; /* handle given to us
>> by userspace
>>>> */
>>>> struct ib_ucontext *context; /* associated user
>> context
>>>> */
>>>> void *object; /* containing object
>> */
>>>> struct list_head list; /* link to context's
>> list */
>>>> - int id; /* index into kernel
>> idr */
>>>> - struct kref ref;
>>>> - struct rw_semaphore mutex; /* protects .live */
>>>> + int id; /* index into kernel
>> idr/fd */
>>>> +#ifdef OLD_ABI_COMPAT
>>>> + struct kref ref;
>>>> +#endif
>>>> + struct rw_semaphore usecnt; /* protects exclusive
>>>> access */
>>>> +#ifdef OLD_ABI_COMPAT
>>>> + struct rw_semaphore mutex; /* protects .live */
>>>> +#endif
>>>> struct rcu_head rcu; /* kfree_rcu()
>> overhead */
>>>> int live;
>>>> +
>>>> + const struct uverbs_type_alloc_action *type;
>>>> + struct ib_ucontext_lock *uobjects_lock;
>>>> };
>>>>
>>>> struct ib_udata {
>>>> @@ -2101,6 +2120,9 @@ struct ib_device {
>>>> */
>>>> int (*get_port_immutable)(struct ib_device *, u8, struct
>>>> ib_port_immutable *);
>>>> void (*get_dev_fw_str)(struct ib_device *, char *str, size_t
>>>> str_len);
>>>> + struct list_head type_list;
>>>> +
>>>> + const struct uverbs_types_group *types_group;
>>>> };
>>>>
>>>> struct ib_client {
>>>> diff --git a/include/rdma/uverbs_ioctl.h
>> b/include/rdma/uverbs_ioctl.h
>>>> new file mode 100644
>>>> index 0000000..2f50045
>>>> --- /dev/null
>>>> +++ b/include/rdma/uverbs_ioctl.h
>>>> @@ -0,0 +1,195 @@
>>>> +/*
>>>> + * Copyright (c) 2016, Mellanox Technologies inc. All rights
>>>> reserved.
>>>> + *
>>>> + * This software is available to you under a choice of one of two
>>>> + * licenses. You may choose to be licensed under the terms of the
>> GNU
>>>> + * General Public License (GPL) Version 2, available from the file
>>>> + * COPYING in the main directory of this source tree, or the
>>>> + * OpenIB.org BSD license below:
>>>> + *
>>>> + * Redistribution and use in source and binary forms, with or
>>>> + * without modification, are permitted provided that the
>> following
>>>> + * conditions are met:
>>>> + *
>>>> + * - Redistributions of source code must retain the above
>>>> + * copyright notice, this list of conditions and the
>> following
>>>> + * disclaimer.
>>>> + *
>>>> + * - Redistributions in binary form must reproduce the above
>>>> + * copyright notice, this list of conditions and the
>> following
>>>> + * disclaimer in the documentation and/or other materials
>>>> + * provided with the distribution.
>>>> + *
>>>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
>>>> + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
>> OF
>>>> + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
>>>> + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
>> HOLDERS
>>>> + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN
>> AN
>>>> + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR
>> IN
>>>> + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
>>>> + * SOFTWARE.
>>>> + */
>>>> +
>>>> +#ifndef _UVERBS_IOCTL_
>>>> +#define _UVERBS_IOCTL_
>>>> +
>>>> +#include <linux/kernel.h>
>>>> +
>>>> +struct uverbs_object_type;
>>>> +struct ib_ucontext;
>>>> +struct ib_uobject;
>>>> +struct ib_device;
>>>> +struct uverbs_uobject_type;
>>>> +
>>>> +/*
>>>> + * =======================================
>>>> + * Verbs action specifications
>>>> + * =======================================
>>>> + */
>>>
>>> I intentionally used urdma (though condensed to 3 letters that I
>> don't recall atm), rather than uverbs. This will need to work with
>> non-verbs devices and interfaces -- again, consider how this fits with
>> the rdma cm. Verbs has a very specific meaning, which gets lost if we
>> start referring to everything as 'verbs'. It's bad enough that we're
>> stuck with 'drivers/infiniband' and 'rdma', such that 'infiniband' also
>> means ethernet and rdma means nothing.
>>>
>>
>> IMHO - let's agree on the concept of this infrastructure. One we
>> decide its scope, we could generalize it (i.e - ioctl_provider and
>> ioctl_context) and implement it to rdma-cm as well.
>>
>>>> +
>>>> +enum uverbs_attr_type {
>>>> + UVERBS_ATTR_TYPE_PTR_IN,
>>>> + UVERBS_ATTR_TYPE_PTR_OUT,
>>>> + UVERBS_ATTR_TYPE_IDR,
>>>> + UVERBS_ATTR_TYPE_FD,
>>>> +};
>>>> +
>>>> +enum uverbs_idr_access {
>>>> + UVERBS_IDR_ACCESS_READ,
>>>> + UVERBS_IDR_ACCESS_WRITE,
>>>> + UVERBS_IDR_ACCESS_NEW,
>>>> + UVERBS_IDR_ACCESS_DESTROY
>>>> +};
>>>> +
>>>> +struct uverbs_attr_spec {
>>>> + u16 len;
>>>> + enum uverbs_attr_type type;
>>>> + struct {
>>>> + u16 obj_type;
>>>> + u8 access;
>>>
>>> Is access intended to be an enum uverbs_idr_access value?
>>>
>>
>> Yeah, worth using this enum. Thanks.
>>
>>>> + } obj;
>>>
>>> I would remove (flatten) the substructure and re-order the fields for
>> better alignment.
>>>
>>
>> I noticed there are several places which aren't aliged. It's in my todo
>> list.
>>
>>>> +};
>>>> +
>>>> +struct uverbs_attr_group_spec {
>>>> + struct uverbs_attr_spec *attrs;
>>>> + size_t num_attrs;
>>>> +};
>>>> +
>>>> +struct uverbs_action_spec {
>>>> + const struct uverbs_attr_group_spec **attr_groups;
>>>> + /* if > 0 -> validator, otherwise, error */
>>>
>>> ? not sure what this comment means
>>>
>>>> + int (*dist)(__u16 *attr_id, void *priv);
>>>
>>> What does 'dist' stand for?
>>>
>>
>> dist = distribution function.
>> It maps the attributes you got from the user-space to your groups. You
>> could think of each group as a namespace - where its attributes (or
>> types/actions) starts from zero in the sake of compactness.
>> So, for example, it gets an attribute 0x8010 and maps to to "group 1"
>> (provider) and attribute 0x10.
>>
>>>> + void *priv;
>>>> + size_t num_groups;
>>>> +};
>>>> +
>>>> +struct uverbs_attr_array;
>>>> +struct ib_uverbs_file;
>>>> +
>>>> +struct uverbs_action {
>>>> + struct uverbs_action_spec spec;
>>>> + void *priv;
>>>> + int (*handler)(struct ib_device *ib_dev, struct ib_uverbs_file
>>>> *ufile,
>>>> + struct uverbs_attr_array *ctx, size_t num, void
>>>> *priv);
>>>> +};
>>>> +
>>>> +struct uverbs_type_alloc_action;
>>>> +typedef void (*free_type)(const struct uverbs_type_alloc_action
>>>> *uobject_type,
>>>> + struct ib_uobject *uobject);
>>>> +
>>>> +struct uverbs_type_alloc_action {
>>>> + enum uverbs_attr_type type;
>>>> + int order;
>>>
>>> I think this is being used as destroy order, in which case I would
>> rename it to clarify the intent. Though I'd prefer we come up with a
>> more efficient destruction mechanism than the repeated nested looping.
>>>
>>
>> In one of the earlier revisions I used a sorted list, which was
>> efficient. I recall that Jason didn't like its complexity and
>> re-thinking about that - he's right. Most of your types are "order
>> number" 0 anyway. So you'll probably iterate very few objects in the
>> next round (in verbs, everything but MRs and PDs).
>>
>>>> + size_t obj_size;
>>>
>>> This can be alloc_fn
>>>
>>>> + free_type free_fn;
>>>> + struct {
>>>> + const struct file_operations *fops;
>>>> + const char *name;
>>>> + int flags;
>>>> + } fd;
>>>> +};
>>>> +
>>>> +struct uverbs_type_actions_group {
>>>> + size_t num_actions;
>>>> + const struct uverbs_action **actions;
>>>> +};
>>>> +
>>>> +struct uverbs_type {
>>>> + size_t num_groups;
>>>> + const struct uverbs_type_actions_group **action_groups;
>>>> + const struct uverbs_type_alloc_action *alloc;
>>>> + int (*dist)(__u16 *action_id, void *priv);
>>>> + void *priv;
>>>> +};
>>>> +
>>>> +struct uverbs_types {
>>>> + size_t num_types;
>>>> + const struct uverbs_type **types;
>>>> +};
>>>> +
>>>> +struct uverbs_types_group {
>>>> + const struct uverbs_types **type_groups;
>>>> + size_t num_groups;
>>>> + int (*dist)(__u16 *type_id, void *priv);
>>>> + void *priv;
>>>> +};
>>>> +
>>>> +/* =================================================
>>>> + * Parsing infrastructure
>>>> + * =================================================
>>>> + */
>>>> +
>>>> +struct uverbs_ptr_attr {
>>>> + void * __user ptr;
>>>> + __u16 len;
>>>> +};
>>>> +
>>>> +struct uverbs_fd_attr {
>>>> + int fd;
>>>> +};
>>>> +
>>>> +struct uverbs_uobj_attr {
>>>> + /* idr handle */
>>>> + __u32 idr;
>>>> +};
>>>> +
>>>> +struct uverbs_obj_attr {
>>>> + /* pointer to the kernel descriptor -> type, access, etc */
>>>> + const struct uverbs_attr_spec *val;
>>>> + struct ib_uverbs_attr __user *uattr;
>>>> + const struct uverbs_type_alloc_action *type;
>>>> + struct ib_uobject *uobject;
>>>> + union {
>>>> + struct uverbs_fd_attr fd;
>>>> + struct uverbs_uobj_attr uobj;
>>>> + };
>>>> +};
>>>> +
>>>> +struct uverbs_attr {
>>>> + bool valid; > >> + union {
>>>> + struct uverbs_ptr_attr cmd_attr;
>>>> + struct uverbs_obj_attr obj_attr;
>>>> + };
>>>> +};
>>>
>>> It's odd to have a union that's part of a structure without some
>> field to indicate which union field is accessible.
>>>
>>
>> You index this array but the attribute id from the user's callback
>> funciton. The user should know what's the type of the attribute, as
>> [s]he declared the specification.
>>
>>>> +
>>>> +/* output of one validator */
>>>> +struct uverbs_attr_array {
>>>> + size_t num_attrs;
>>>> + /* arrays of attrubytes, index is the id i.e SEND_CQ */
>>>> + struct uverbs_attr *attrs;
>>>> +};
>>>> +
>>>> +/* =================================================
>>>> + * Types infrastructure
>>>> + * =================================================
>>>> + */
>>>> +
>>>> +int ib_uverbs_uobject_type_add(struct list_head *head,
>>>> + void (*free)(struct uverbs_uobject_type
>> *type,
>>>> + struct ib_uobject
>> *uobject,
>>>> + struct ib_ucontext
>> *ucontext),
>>>> + uint16_t obj_type);
>>>> +void ib_uverbs_uobject_types_remove(struct ib_device *ib_dev);
>>>> +
>>>> +#endif
>>>> --
>>>> 2.7.4
>
> Matan, please re-look at the architecture that I proposed:
>
> https://patchwork.kernel.org/patch/9178991/
>
> including the terminology (and consider using common OOP terms). The *core* of the ioctl framework is to simply invoke a function dispatch table. IMO, that's where we should start. Anything beyond that is extra that we should have a strong reason for including. (Yes, I think we need more.) Starting simple and adding necessary functionality should let us get something upstream quicker and re-use more of the existing code.
Actually, I've looked at your proposal. Some of the ideas are blend
between your proposal and my original proposal.
One of our goals is to get rid of the commonalities between handlers and
push the validation and locking to a common code which could validated
once to be correct. It tends to be a lot easier (and correct) than to
re-examine every function call.
>
> If we're going to re-create netlink as part of the rdma ioctl interface, then why don't we just use netlink directly?
>
We had a netlink based proposal first, but:
(a) We want one call dispatching (as opposed to send-receive).
(b) We don't want to copy the driver's data from the user-space to the
libibverbs buffer.
(c) We don't need the complexity of nesting.
(d) Bare netlink is considered unreliable.
(e) We need semantics of objects.
(f) By using pointers, we could eliminate some copies.
Thanks for looking at this patch.
Matan
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: [PATCH rdma-rc] IB/IPoIB: Remove can't use GFP_NOIO warning
From: Yuval Shaia @ 2016-11-10 8:33 UTC (permalink / raw)
To: Leon Romanovsky
Cc: dledford-H+wXaHxf7aLQT0dZR+AlfA,
linux-rdma-u79uwXL29TY76Z2rM5mHXA, Kamal Heib
In-Reply-To: <1478765808-12517-1-git-send-email-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Reviewed-by: Yuval Shaia <yuval.shaia-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
On Thu, Nov 10, 2016 at 10:16:48AM +0200, Leon Romanovsky wrote:
> From: Kamal Heib <kamalh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>
> Remove the warning print of "can't use of GFP_NOIO" to avoid prints in
> each QP creation when devices aren't supporting IB_QP_CREATE_USE_GFP_NOIO.
>
> This print become more annoying when the IPoIB interface is configured
> to work in connected mode.
>
> Fixes: 09b93088d750 ('IB: Add a QP creation flag to use GFP_NOIO allocations')
> Signed-off-by: Kamal Heib <kamalh-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
> ---
> drivers/infiniband/ulp/ipoib/ipoib_cm.c | 2 --
> 1 file changed, 2 deletions(-)
>
> diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
> index 4ad297d..50c9772 100644
> --- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
> +++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
> @@ -1053,8 +1053,6 @@ static struct ib_qp *ipoib_cm_create_tx_qp(struct net_device *dev, struct ipoib_
>
> tx_qp = ib_create_qp(priv->pd, &attr);
> if (PTR_ERR(tx_qp) == -EINVAL) {
> - ipoib_warn(priv, "can't use GFP_NOIO for QPs on device %s, using GFP_KERNEL\n",
> - priv->ca->name);
> attr.create_flags &= ~IB_QP_CREATE_USE_GFP_NOIO;
> tx_qp = ib_create_qp(priv->pd, &attr);
> }
> --
> 2.7.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* [PATCH rdma-rc V1 0/9] mlx4 fixes for 4.9
From: Leon Romanovsky @ 2016-11-10 9:30 UTC (permalink / raw)
To: dledford-H+wXaHxf7aLQT0dZR+AlfA; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA
Hi Doug,
Please find below second version of fixes for mlx4 driver.
This patchset was generated against v4.9-rc3.
Available in the "topic/mlx4-fixes-4.9" topic branch of this git repo:
git://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma.git
Or for browsing:
https://git.kernel.org/cgit/linux/kernel/git/leon/linux-rdma.git/log/?h=topic/mlx4-fixes-4.9
Changes v0 -> v1:
Patch 1:
* No change
Patch 2:
* Add Yuval's ROB tag
Patch 3:
* Remove function name from title
Patch 4:
* No change
Patch 5:
* Use the port subnet prefix instead of default
Patch 6:
* Add Yuval's ROB tag
Patch 7:
* No change
Patch 8:
* Remove function name from title
* Reword commit message
Patch 9:
* No change
Thanks,
Daniel and Leon.
Daniel Jurgens (1):
IB/mlx4: Check gid_index return value
Eran Ben Elisha (2):
IB/mlx4: Fail to allocate NET_IF QPs when DMFS for IPoIB in
unavailable
IB/mlx4: Check if GRH is available before using it in modify QP
Jack Morgenstein (1):
IB/mlx4: Handle well-known-gid in mad_demux processing
Maor Gottlieb (2):
IB/mlx4: Set traffic class in AH
IB/mlx4: Put non zero value in max_ah device attribute
Matan Barak (1):
IB/mlx4: Fix create CQ error flow
Moni Shoua (1):
IB/mlx4: Handle IPv4 header when demultiplexing MAD
Saeed Mahameed (1):
IB/mlx4: Fix port query for 56Gb Ethernet links
drivers/infiniband/core/verbs.c | 16 +++++++------
drivers/infiniband/hw/mlx4/ah.c | 11 ++++++---
drivers/infiniband/hw/mlx4/cq.c | 5 +++-
drivers/infiniband/hw/mlx4/mad.c | 49 +++++++++++++++++++++++++++++++++------
drivers/infiniband/hw/mlx4/main.c | 30 +++++++++++++++---------
drivers/infiniband/hw/mlx4/qp.c | 4 ++--
include/rdma/ib_verbs.h | 19 +++++++++++++++
7 files changed, 103 insertions(+), 31 deletions(-)
--
2.7.4
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* [PATCH rdma-rc V1 1/9] IB/mlx4: Set traffic class in AH
From: Leon Romanovsky @ 2016-11-10 9:30 UTC (permalink / raw)
To: dledford-H+wXaHxf7aLQT0dZR+AlfA
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Maor Gottlieb, Daniel Jurgens
In-Reply-To: <1478770261-5775-1-git-send-email-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
From: Maor Gottlieb <maorg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Set traffic class within sl_tclass_flowlabel when create iboe AH.
Without this the TOS value will be empty when running VLAN tagged
traffic, because the TOS value is taken from the traffic class in the
address handle attributes.
Fixes: 9106c4106974 ('IB/mlx4: Fix SL to 802.1Q priority-bits mapping for IBoE')
Signed-off-by: Maor Gottlieb <maorg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reviewed-by: Mark Bloch <markb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
drivers/infiniband/hw/mlx4/ah.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/infiniband/hw/mlx4/ah.c b/drivers/infiniband/hw/mlx4/ah.c
index 5fc6233..6be7dc3 100644
--- a/drivers/infiniband/hw/mlx4/ah.c
+++ b/drivers/infiniband/hw/mlx4/ah.c
@@ -111,7 +111,9 @@ static struct ib_ah *create_iboe_ah(struct ib_pd *pd, struct ib_ah_attr *ah_attr
!(1 << ah->av.eth.stat_rate & dev->caps.stat_rate_support))
--ah->av.eth.stat_rate;
}
-
+ ah->av.eth.sl_tclass_flowlabel |=
+ cpu_to_be32((ah_attr->grh.traffic_class << 20) |
+ ah_attr->grh.flow_label);
/*
* HW requires multicast LID so we just choose one.
*/
@@ -119,7 +121,7 @@ static struct ib_ah *create_iboe_ah(struct ib_pd *pd, struct ib_ah_attr *ah_attr
ah->av.ib.dlid = cpu_to_be16(0xc000);
memcpy(ah->av.eth.dgid, ah_attr->grh.dgid.raw, 16);
- ah->av.eth.sl_tclass_flowlabel = cpu_to_be32(ah_attr->sl << 29);
+ ah->av.eth.sl_tclass_flowlabel |= cpu_to_be32(ah_attr->sl << 29);
return &ah->ibah;
}
--
2.7.4
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply related
* [PATCH rdma-rc V1 2/9] IB/mlx4: Check gid_index return value
From: Leon Romanovsky @ 2016-11-10 9:30 UTC (permalink / raw)
To: dledford-H+wXaHxf7aLQT0dZR+AlfA
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Daniel Jurgens
In-Reply-To: <1478770261-5775-1-git-send-email-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
From: Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Check the returned GID index value and return an error if it is invalid.
Fixes: 5070cd2239bd ('IB/mlx4: Replace mechanism for RoCE GID management')
Signed-off-by: Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reviewed-by: Mark Bloch <markb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reviewed-by: Yuval Shaia <yuval.shaia-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
drivers/infiniband/hw/mlx4/ah.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/infiniband/hw/mlx4/ah.c b/drivers/infiniband/hw/mlx4/ah.c
index 6be7dc3..8dfc76f 100644
--- a/drivers/infiniband/hw/mlx4/ah.c
+++ b/drivers/infiniband/hw/mlx4/ah.c
@@ -102,7 +102,10 @@ static struct ib_ah *create_iboe_ah(struct ib_pd *pd, struct ib_ah_attr *ah_attr
if (vlan_tag < 0x1000)
vlan_tag |= (ah_attr->sl & 7) << 13;
ah->av.eth.port_pd = cpu_to_be32(to_mpd(pd)->pdn | (ah_attr->port_num << 24));
- ah->av.eth.gid_index = mlx4_ib_gid_index_to_real_index(ibdev, ah_attr->port_num, ah_attr->grh.sgid_index);
+ ret = mlx4_ib_gid_index_to_real_index(ibdev, ah_attr->port_num, ah_attr->grh.sgid_index);
+ if (ret < 0)
+ return ERR_PTR(ret);
+ ah->av.eth.gid_index = ret;
ah->av.eth.vlan = cpu_to_be16(vlan_tag);
ah->av.eth.hop_limit = ah_attr->grh.hop_limit;
if (ah_attr->static_rate) {
--
2.7.4
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply related
* [PATCH rdma-rc V1 3/9] IB/mlx4: Fix create CQ error flow
From: Leon Romanovsky @ 2016-11-10 9:30 UTC (permalink / raw)
To: dledford-H+wXaHxf7aLQT0dZR+AlfA
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Matan Barak, Daniel Jurgens
In-Reply-To: <1478770261-5775-1-git-send-email-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
From: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Currently, if ib_copy_to_udata fails, the CQ
won't be deleted from the radix tree and the HW (HW2SW).
Fixes: 225c7b1feef1 ('IB/mlx4: Add a driver Mellanox ConnectX InfiniBand adapters')
Signed-off-by: Matan Barak <matanb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reviewed-by: Mark Bloch <markb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
drivers/infiniband/hw/mlx4/cq.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/infiniband/hw/mlx4/cq.c b/drivers/infiniband/hw/mlx4/cq.c
index 1ea686b..6a0fec3 100644
--- a/drivers/infiniband/hw/mlx4/cq.c
+++ b/drivers/infiniband/hw/mlx4/cq.c
@@ -253,11 +253,14 @@ struct ib_cq *mlx4_ib_create_cq(struct ib_device *ibdev,
if (context)
if (ib_copy_to_udata(udata, &cq->mcq.cqn, sizeof (__u32))) {
err = -EFAULT;
- goto err_dbmap;
+ goto err_cq_free;
}
return &cq->ibcq;
+err_cq_free:
+ mlx4_cq_free(dev->dev, &cq->mcq);
+
err_dbmap:
if (context)
mlx4_ib_db_unmap_user(to_mucontext(context), &cq->db);
--
2.7.4
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply related
* [PATCH rdma-rc V1 4/9] IB/mlx4: Handle IPv4 header when demultiplexing MAD
From: Leon Romanovsky @ 2016-11-10 9:30 UTC (permalink / raw)
To: dledford-H+wXaHxf7aLQT0dZR+AlfA
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Moni Shoua, Daniel Jurgens
In-Reply-To: <1478770261-5775-1-git-send-email-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
From: Moni Shoua <monis-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
When MAD arrives to the hypervisor, we need to identify which slave it
should be sent by destination GID. When L3 protocol is IPv4 the
GRH is replaced by an IPv4 header. This patch detects when IPv4 header
needs to be parsed instead of GRH.
Fixes: b6ffaeffaea4 ('mlx4: In RoCE allow guests to have multiple GIDS')
Signed-off-by: Moni Shoua <monis-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reviewed-by: Mark Bloch <markb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
drivers/infiniband/core/verbs.c | 16 +++++++++-------
drivers/infiniband/hw/mlx4/mad.c | 33 ++++++++++++++++++++++++++++++---
include/rdma/ib_verbs.h | 19 +++++++++++++++++++
3 files changed, 58 insertions(+), 10 deletions(-)
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 8368764..b3a915f 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -328,7 +328,7 @@ struct ib_ah *ib_create_ah(struct ib_pd *pd, struct ib_ah_attr *ah_attr)
}
EXPORT_SYMBOL(ib_create_ah);
-static int ib_get_header_version(const union rdma_network_hdr *hdr)
+int ib_get_rdma_header_version(const union rdma_network_hdr *hdr)
{
const struct iphdr *ip4h = (struct iphdr *)&hdr->roce4grh;
struct iphdr ip4h_checked;
@@ -359,6 +359,7 @@ static int ib_get_header_version(const union rdma_network_hdr *hdr)
return 4;
return 6;
}
+EXPORT_SYMBOL(ib_get_rdma_header_version);
static enum rdma_network_type ib_get_net_type_by_grh(struct ib_device *device,
u8 port_num,
@@ -369,7 +370,7 @@ static enum rdma_network_type ib_get_net_type_by_grh(struct ib_device *device,
if (rdma_protocol_ib(device, port_num))
return RDMA_NETWORK_IB;
- grh_version = ib_get_header_version((union rdma_network_hdr *)grh);
+ grh_version = ib_get_rdma_header_version((union rdma_network_hdr *)grh);
if (grh_version == 4)
return RDMA_NETWORK_IPV4;
@@ -415,9 +416,9 @@ static int get_sgid_index_from_eth(struct ib_device *device, u8 port_num,
&context, gid_index);
}
-static int get_gids_from_rdma_hdr(union rdma_network_hdr *hdr,
- enum rdma_network_type net_type,
- union ib_gid *sgid, union ib_gid *dgid)
+int ib_get_gids_from_rdma_hdr(const union rdma_network_hdr *hdr,
+ enum rdma_network_type net_type,
+ union ib_gid *sgid, union ib_gid *dgid)
{
struct sockaddr_in src_in;
struct sockaddr_in dst_in;
@@ -447,6 +448,7 @@ static int get_gids_from_rdma_hdr(union rdma_network_hdr *hdr,
return -EINVAL;
}
}
+EXPORT_SYMBOL(ib_get_gids_from_rdma_hdr);
int ib_init_ah_from_wc(struct ib_device *device, u8 port_num,
const struct ib_wc *wc, const struct ib_grh *grh,
@@ -469,8 +471,8 @@ int ib_init_ah_from_wc(struct ib_device *device, u8 port_num,
net_type = ib_get_net_type_by_grh(device, port_num, grh);
gid_type = ib_network_to_gid_type(net_type);
}
- ret = get_gids_from_rdma_hdr((union rdma_network_hdr *)grh, net_type,
- &sgid, &dgid);
+ ret = ib_get_gids_from_rdma_hdr((union rdma_network_hdr *)grh, net_type,
+ &sgid, &dgid);
if (ret)
return ret;
diff --git a/drivers/infiniband/hw/mlx4/mad.c b/drivers/infiniband/hw/mlx4/mad.c
index 1672907..b8e9013 100644
--- a/drivers/infiniband/hw/mlx4/mad.c
+++ b/drivers/infiniband/hw/mlx4/mad.c
@@ -39,6 +39,8 @@
#include <linux/mlx4/cmd.h>
#include <linux/gfp.h>
#include <rdma/ib_pma.h>
+#include <linux/ip.h>
+#include <net/ipv6.h>
#include <linux/mlx4/driver.h>
#include "mlx4_ib.h"
@@ -480,6 +482,23 @@ static int find_slave_port_pkey_ix(struct mlx4_ib_dev *dev, int slave,
return -EINVAL;
}
+static int get_gids_from_l3_hdr(struct ib_grh *grh, union ib_gid *sgid,
+ union ib_gid *dgid)
+{
+ int version = ib_get_rdma_header_version((const union rdma_network_hdr *)grh);
+ enum rdma_network_type net_type;
+
+ if (version == 4)
+ net_type = RDMA_NETWORK_IPV4;
+ else if (version == 6)
+ net_type = RDMA_NETWORK_IPV6;
+ else
+ return -EINVAL;
+
+ return ib_get_gids_from_rdma_hdr((union rdma_network_hdr *)grh, net_type,
+ sgid, dgid);
+}
+
int mlx4_ib_send_to_slave(struct mlx4_ib_dev *dev, int slave, u8 port,
enum ib_qp_type dest_qpt, struct ib_wc *wc,
struct ib_grh *grh, struct ib_mad *mad)
@@ -538,7 +557,10 @@ int mlx4_ib_send_to_slave(struct mlx4_ib_dev *dev, int slave, u8 port,
memset(&attr, 0, sizeof attr);
attr.port_num = port;
if (is_eth) {
- memcpy(&attr.grh.dgid.raw[0], &grh->dgid.raw[0], 16);
+ union ib_gid sgid;
+
+ if (get_gids_from_l3_hdr(grh, &sgid, &attr.grh.dgid))
+ return -EINVAL;
attr.ah_flags = IB_AH_GRH;
}
ah = ib_create_ah(tun_ctx->pd, &attr);
@@ -651,6 +673,11 @@ static int mlx4_ib_demux_mad(struct ib_device *ibdev, u8 port,
is_eth = 1;
if (is_eth) {
+ union ib_gid dgid;
+ union ib_gid sgid;
+
+ if (get_gids_from_l3_hdr(grh, &sgid, &dgid))
+ return -EINVAL;
if (!(wc->wc_flags & IB_WC_GRH)) {
mlx4_ib_warn(ibdev, "RoCE grh not present.\n");
return -EINVAL;
@@ -659,10 +686,10 @@ static int mlx4_ib_demux_mad(struct ib_device *ibdev, u8 port,
mlx4_ib_warn(ibdev, "RoCE mgmt class is not CM\n");
return -EINVAL;
}
- err = mlx4_get_slave_from_roce_gid(dev->dev, port, grh->dgid.raw, &slave);
+ err = mlx4_get_slave_from_roce_gid(dev->dev, port, dgid.raw, &slave);
if (err && mlx4_is_mf_bonded(dev->dev)) {
other_port = (port == 1) ? 2 : 1;
- err = mlx4_get_slave_from_roce_gid(dev->dev, other_port, grh->dgid.raw, &slave);
+ err = mlx4_get_slave_from_roce_gid(dev->dev, other_port, dgid.raw, &slave);
if (!err) {
port = other_port;
pr_debug("resolved slave %d from gid %pI6 wire port %d other %d\n",
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 5ad43a4..467a4b4 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -2581,6 +2581,24 @@ void ib_dealloc_pd(struct ib_pd *pd);
struct ib_ah *ib_create_ah(struct ib_pd *pd, struct ib_ah_attr *ah_attr);
/**
+ * ib_get_gids_from_rdma_hdr - Get sgid and dgid from GRH or IPv4 header
+ * work completion.
+ * @hdr: the L3 header to parse
+ * @net_type: type of header to parse
+ * @sgid: place to store source gid
+ * @dgid: place to store destination gid
+ */
+int ib_get_gids_from_rdma_hdr(const union rdma_network_hdr *hdr,
+ enum rdma_network_type net_type,
+ union ib_gid *sgid, union ib_gid *dgid);
+
+/**
+ * ib_get_rdma_header_version - Get the header version
+ * @hdr: the L3 header to parse
+ */
+int ib_get_rdma_header_version(const union rdma_network_hdr *hdr);
+
+/**
* ib_init_ah_from_wc - Initializes address handle attributes from a
* work completion.
* @device: Device on which the received message arrived.
@@ -3357,4 +3375,5 @@ int ib_sg_to_pages(struct ib_mr *mr, struct scatterlist *sgl, int sg_nents,
void ib_drain_rq(struct ib_qp *qp);
void ib_drain_sq(struct ib_qp *qp);
void ib_drain_qp(struct ib_qp *qp);
+
#endif /* IB_VERBS_H */
--
2.7.4
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply related
* [PATCH rdma-rc V1 5/9] IB/mlx4: Handle well-known-gid in mad_demux processing
From: Leon Romanovsky @ 2016-11-10 9:30 UTC (permalink / raw)
To: dledford-H+wXaHxf7aLQT0dZR+AlfA
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Jack Morgenstein,
Daniel Jurgens
In-Reply-To: <1478770261-5775-1-git-send-email-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
From: Jack Morgenstein <jackm-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
If OpenSM runs over a ConnectX-3, and there are ConnectX-4 or Connect-IB
VFs active on the network, the OpenSM will receive QP1 packets containing
a GRH where the destination GID is the "Well-Known GID" -- which is not a
GID in the HCA Port's GID Table.
This GID must be tested-for separately -- and packets which contain
this destination GID should be routed to slave 0 (the PF).
Fixes: 37bfc7c1e83f ('IB/mlx4: SR-IOV multiplex and demultiplex MADs')
Signed-off-by: Jack Morgenstein <jackm-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
Signed-off-by: Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
drivers/infiniband/hw/mlx4/mad.c | 16 ++++++++++++----
1 file changed, 12 insertions(+), 4 deletions(-)
diff --git a/drivers/infiniband/hw/mlx4/mad.c b/drivers/infiniband/hw/mlx4/mad.c
index b8e9013..7bcae41 100644
--- a/drivers/infiniband/hw/mlx4/mad.c
+++ b/drivers/infiniband/hw/mlx4/mad.c
@@ -729,10 +729,18 @@ static int mlx4_ib_demux_mad(struct ib_device *ibdev, u8 port,
/* If a grh is present, we demux according to it */
if (wc->wc_flags & IB_WC_GRH) {
- slave = mlx4_ib_find_real_gid(ibdev, port, grh->dgid.global.interface_id);
- if (slave < 0) {
- mlx4_ib_warn(ibdev, "failed matching grh\n");
- return -ENOENT;
+ if (grh->dgid.global.interface_id ==
+ cpu_to_be64(IB_SA_WELL_KNOWN_GUID) &&
+ grh->dgid.global.subnet_prefix == cpu_to_be64(
+ atomic64_read(&dev->sriov.demux[port - 1].subnet_prefix))) {
+ slave = 0;
+ } else {
+ slave = mlx4_ib_find_real_gid(ibdev, port,
+ grh->dgid.global.interface_id);
+ if (slave < 0) {
+ mlx4_ib_warn(ibdev, "failed matching grh\n");
+ return -ENOENT;
+ }
}
}
/* Class-specific handling */
--
2.7.4
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply related
* [PATCH rdma-rc V1 6/9] IB/mlx4: Put non zero value in max_ah device attribute
From: Leon Romanovsky @ 2016-11-10 9:30 UTC (permalink / raw)
To: dledford-H+wXaHxf7aLQT0dZR+AlfA
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Maor Gottlieb, Daniel Jurgens
In-Reply-To: <1478770261-5775-1-git-send-email-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
From: Maor Gottlieb <maorg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Use INT_MAX since this is the max value the attribute can hold, though
hardware capability is unlimited.
Fixes: 225c7b1feef1 ('IB/mlx4: Add a driver Mellanox ConnectX InfiniBand adapters')
Signed-off-by: Maor Gottlieb <maorg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reviewed-by: Mark Bloch <markb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reviewed-by: Yuval Shaia <yuval.shaia-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
drivers/infiniband/hw/mlx4/main.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index b597e82..05ab3cb 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -547,6 +547,7 @@ static int mlx4_ib_query_device(struct ib_device *ibdev,
props->max_map_per_fmr = dev->dev->caps.max_fmr_maps;
props->hca_core_clock = dev->dev->caps.hca_core_clock * 1000UL;
props->timestamp_mask = 0xFFFFFFFFFFFFULL;
+ props->max_ah = INT_MAX;
if (!mlx4_is_slave(dev->dev))
err = mlx4_get_internal_clock_params(dev->dev, &clock_params);
--
2.7.4
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply related
* [PATCH rdma-rc V1 7/9] IB/mlx4: Fix port query for 56Gb Ethernet links
From: Leon Romanovsky @ 2016-11-10 9:30 UTC (permalink / raw)
To: dledford-H+wXaHxf7aLQT0dZR+AlfA
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Saeed Mahameed, Yishai Hadas,
Daniel Jurgens
In-Reply-To: <1478770261-5775-1-git-send-email-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
From: Saeed Mahameed <saeedm-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Report the correct speed in the port attributes when using a 56Gbps
ethernet link. Without this change the field is incorrectly set to 10.
Fixes: a9c766bb75ee ('IB/mlx4: Fix info returned when querying IBoE ports')
Fixes: 2e96691c31ec ('IB: Use central enum for speed instead of hard-coded values')
Signed-off-by: Saeed Mahameed <saeedm-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Yishai Hadas <yishaih-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
drivers/infiniband/hw/mlx4/main.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index 05ab3cb..4054a1b 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -698,9 +698,11 @@ static int eth_link_query_port(struct ib_device *ibdev, u8 port,
if (err)
goto out;
- props->active_width = (((u8 *)mailbox->buf)[5] == 0x40) ?
- IB_WIDTH_4X : IB_WIDTH_1X;
- props->active_speed = IB_SPEED_QDR;
+ props->active_width = (((u8 *)mailbox->buf)[5] == 0x40) ||
+ (((u8 *)mailbox->buf)[5] == 0x20 /*56Gb*/) ?
+ IB_WIDTH_4X : IB_WIDTH_1X;
+ props->active_speed = (((u8 *)mailbox->buf)[5] == 0x20 /*56Gb*/) ?
+ IB_SPEED_FDR : IB_SPEED_QDR;
props->port_cap_flags = IB_PORT_CM_SUP | IB_PORT_IP_BASED_GIDS;
props->gid_tbl_len = mdev->dev->caps.gid_table_len[port];
props->max_msg_sz = mdev->dev->caps.max_msg_sz;
--
2.7.4
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply related
* [PATCH rdma-rc V1 8/9] IB/mlx4: Fail to allocate NET_IF QPs when DMFS for IPoIB in unavailable
From: Leon Romanovsky @ 2016-11-10 9:31 UTC (permalink / raw)
To: dledford-H+wXaHxf7aLQT0dZR+AlfA
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Eran Ben Elisha,
Daniel Jurgens
In-Reply-To: <1478770261-5775-1-git-send-email-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
From: Eran Ben Elisha <eranbe-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
According to the firmware spec, FLOW_STEERING_IB_UC_QP_RANGE command is
supported only if dmfs_ipoib bit is set.
If it isn't set we want to ensure allocating NET_IF QPs fail. We do so
by filling out the allocation bitmap. By thus, the NET_IF QPs allocating
function won't find any free QP and will fail.
Fixes: c1c98501121e ('IB/mlx4: Add support for steerable IB UD QPs')
Signed-off-by: Eran Ben Elisha <eranbe-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reviewed-by: Mark Bloch <markb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
drivers/infiniband/hw/mlx4/main.c | 21 +++++++++++++--------
1 file changed, 13 insertions(+), 8 deletions(-)
diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index 4054a1b..4a67ffc 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -2823,14 +2823,19 @@ static void *mlx4_ib_add(struct mlx4_dev *dev)
goto err_steer_qp_release;
}
- bitmap_zero(ibdev->ib_uc_qpns_bitmap, ibdev->steer_qpn_count);
-
- err = mlx4_FLOW_STEERING_IB_UC_QP_RANGE(
- dev, ibdev->steer_qpn_base,
- ibdev->steer_qpn_base +
- ibdev->steer_qpn_count - 1);
- if (err)
- goto err_steer_free_bitmap;
+ if (dev->caps.flags2 & MLX4_DEV_CAP_FLAG2_DMFS_IPOIB) {
+ bitmap_zero(ibdev->ib_uc_qpns_bitmap,
+ ibdev->steer_qpn_count);
+ err = mlx4_FLOW_STEERING_IB_UC_QP_RANGE(
+ dev, ibdev->steer_qpn_base,
+ ibdev->steer_qpn_base +
+ ibdev->steer_qpn_count - 1);
+ if (err)
+ goto err_steer_free_bitmap;
+ } else {
+ bitmap_fill(ibdev->ib_uc_qpns_bitmap,
+ ibdev->steer_qpn_count);
+ }
}
for (j = 1; j <= ibdev->dev->caps.num_ports; j++)
--
2.7.4
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply related
* [PATCH rdma-rc V1 9/9] IB/mlx4: Check if GRH is available before using it in modify QP
From: Leon Romanovsky @ 2016-11-10 9:31 UTC (permalink / raw)
To: dledford-H+wXaHxf7aLQT0dZR+AlfA
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Eran Ben Elisha,
Daniel Jurgens
In-Reply-To: <1478770261-5775-1-git-send-email-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
From: Eran Ben Elisha <eranbe-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Before reading GRH attributes, need to make sure AH contains GRH,
and in addition, initialize GID type.
Fixes: dbf727de7440 ('IB/core: Use GID table in AH creation and dmac resolution')
Signed-off-by: Eran Ben Elisha <eranbe-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reviewed-by: Mark Bloch <markb-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
drivers/infiniband/hw/mlx4/qp.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c
index 570bc86..0914731 100644
--- a/drivers/infiniband/hw/mlx4/qp.c
+++ b/drivers/infiniband/hw/mlx4/qp.c
@@ -1764,14 +1764,14 @@ static int __mlx4_ib_modify_qp(struct ib_qp *ibqp,
u8 port_num = mlx4_is_bonded(to_mdev(ibqp->device)->dev) ? 1 :
attr_mask & IB_QP_PORT ? attr->port_num : qp->port;
union ib_gid gid;
- struct ib_gid_attr gid_attr;
+ struct ib_gid_attr gid_attr = {.gid_type = IB_GID_TYPE_IB};
u16 vlan = 0xffff;
u8 smac[ETH_ALEN];
int status = 0;
int is_eth = rdma_cap_eth_ah(&dev->ib_dev, port_num) &&
attr->ah_attr.ah_flags & IB_AH_GRH;
- if (is_eth) {
+ if (is_eth && attr->ah_attr.ah_flags & IB_AH_GRH) {
int index = attr->ah_attr.grh.sgid_index;
status = ib_get_cached_gid(ibqp->device, port_num,
--
2.7.4
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply related
* [PATCH v3 rdma-core 0/7] libhns: userspace library for hns
From: Lijun Ou @ 2016-11-10 12:46 UTC (permalink / raw)
To: dledford-H+wXaHxf7aLQT0dZR+AlfA,
linux-rdma-u79uwXL29TY76Z2rM5mHXA
Cc: linuxarm-hv44wF8Li93QT0dZR+AlfA
This patch series introduces userspace library for hns RoCE driver.
changes v2 -> v3:
1. Fix the code style, for example, if (addr == NULL)
2. Fix the bug for hns_roce_u_reg_mr
changes v1 -> v2:
1. Delete the min() definition and instead of ccan header
2. Delete the CHECK_C_SOURCE_COMPILES
3. sort the c file in rdma_provider()
4. Delete the unused code in hns_roce_u_db.h
Lijun Ou (7):
libhns: Add initial main frame
libhns: Add verbs of querying device and querying port
libhns: Add verbs of pd and mr support
libhns: Add verbs of cq support
libhns: Add verbs of qp support
libhns: Add verbs of post_send and post_recv support
libhns: Add consolidated repo for userspace library of hns
CMakeLists.txt | 1 +
MAINTAINERS | 6 +
README.md | 1 +
providers/hns/CMakeLists.txt | 6 +
providers/hns/hns_roce_u.c | 228 +++++++++++
providers/hns/hns_roce_u.h | 255 ++++++++++++
providers/hns/hns_roce_u_abi.h | 69 ++++
providers/hns/hns_roce_u_buf.c | 61 +++
providers/hns/hns_roce_u_db.h | 54 +++
providers/hns/hns_roce_u_hw_v1.c | 839 +++++++++++++++++++++++++++++++++++++++
providers/hns/hns_roce_u_hw_v1.h | 242 +++++++++++
providers/hns/hns_roce_u_verbs.c | 525 ++++++++++++++++++++++++
12 files changed, 2287 insertions(+)
create mode 100644 providers/hns/CMakeLists.txt
create mode 100644 providers/hns/hns_roce_u.c
create mode 100644 providers/hns/hns_roce_u.h
create mode 100644 providers/hns/hns_roce_u_abi.h
create mode 100644 providers/hns/hns_roce_u_buf.c
create mode 100644 providers/hns/hns_roce_u_db.h
create mode 100644 providers/hns/hns_roce_u_hw_v1.c
create mode 100644 providers/hns/hns_roce_u_hw_v1.h
create mode 100644 providers/hns/hns_roce_u_verbs.c
--
1.9.1
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* [PATCH v3 rdma-core 1/7] libhns: Add initial main frame
From: Lijun Ou @ 2016-11-10 12:46 UTC (permalink / raw)
To: dledford-H+wXaHxf7aLQT0dZR+AlfA,
linux-rdma-u79uwXL29TY76Z2rM5mHXA
Cc: linuxarm-hv44wF8Li93QT0dZR+AlfA
In-Reply-To: <1478781977-116803-1-git-send-email-oulijun-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
This patch mainly introduces initial main frame for
userspace library of hns.
Signed-off-by: Lijun Ou <oulijun-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
Signed-off-by: Wei Hu <xavier.huwei-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
---
v3/v2:
- No change over the v1
v1:
- The initial submit
---
providers/hns/hns_roce_u.c | 163 +++++++++++++++++++++++++++++++++++++++++
providers/hns/hns_roce_u.h | 86 ++++++++++++++++++++++
providers/hns/hns_roce_u_abi.h | 43 +++++++++++
3 files changed, 292 insertions(+)
create mode 100644 providers/hns/hns_roce_u.c
create mode 100644 providers/hns/hns_roce_u.h
create mode 100644 providers/hns/hns_roce_u_abi.h
diff --git a/providers/hns/hns_roce_u.c b/providers/hns/hns_roce_u.c
new file mode 100644
index 0000000..bda4dd8
--- /dev/null
+++ b/providers/hns/hns_roce_u.c
@@ -0,0 +1,163 @@
+/*
+ * Copyright (c) 2016 Hisilicon Limited.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <pthread.h>
+#include <sys/mman.h>
+#include <fcntl.h>
+#include <unistd.h>
+
+#include "hns_roce_u.h"
+#include "hns_roce_u_abi.h"
+
+#define HID_LEN 15
+#define DEV_MATCH_LEN 128
+
+static const struct {
+ char hid[HID_LEN];
+} acpi_table[] = {
+ {"acpi:HISI00D1:"},
+ {},
+};
+
+static const struct {
+ char compatible[DEV_MATCH_LEN];
+} dt_table[] = {
+ {"hisilicon,hns-roce-v1"},
+ {},
+};
+
+static struct ibv_context *hns_roce_alloc_context(struct ibv_device *ibdev,
+ int cmd_fd)
+{
+ int i;
+ struct ibv_get_context cmd;
+ struct ibv_device_attr dev_attrs;
+ struct hns_roce_context *context;
+ struct hns_roce_alloc_ucontext_resp resp;
+ struct hns_roce_device *hr_dev = to_hr_dev(ibdev);
+
+ context = calloc(1, sizeof(*context));
+ if (!context)
+ return NULL;
+
+ context->ibv_ctx.cmd_fd = cmd_fd;
+ if (ibv_cmd_get_context(&context->ibv_ctx, &cmd, sizeof(cmd),
+ &resp.ibv_resp, sizeof(resp)))
+ goto err_free;
+
+ context->num_qps = resp.qp_tab_size;
+ context->qp_table_shift = ffs(context->num_qps) - 1 -
+ HNS_ROCE_QP_TABLE_BITS;
+ context->qp_table_mask = (1 << context->qp_table_shift) - 1;
+
+ pthread_mutex_init(&context->qp_table_mutex, NULL);
+ for (i = 0; i < HNS_ROCE_QP_TABLE_SIZE; ++i)
+ context->qp_table[i].refcnt = 0;
+
+ context->uar = mmap(NULL, to_hr_dev(ibdev)->page_size,
+ PROT_READ | PROT_WRITE, MAP_SHARED, cmd_fd, 0);
+ if (context->uar == MAP_FAILED) {
+ fprintf(stderr, PFX "Warning: failed to mmap() uar page.\n");
+ goto err_free;
+ }
+
+ pthread_spin_init(&context->uar_lock, PTHREAD_PROCESS_PRIVATE);
+
+ context->max_qp_wr = dev_attrs.max_qp_wr;
+ context->max_sge = dev_attrs.max_sge;
+ context->max_cqe = dev_attrs.max_cqe;
+
+ return &context->ibv_ctx;
+
+err_free:
+ free(context);
+ return NULL;
+}
+
+static void hns_roce_free_context(struct ibv_context *ibctx)
+{
+ struct hns_roce_context *context = to_hr_ctx(ibctx);
+
+ munmap(context->uar, to_hr_dev(ibctx->device)->page_size);
+
+ context->uar = NULL;
+
+ free(context);
+ context = NULL;
+}
+
+static struct ibv_device_ops hns_roce_dev_ops = {
+ .alloc_context = hns_roce_alloc_context,
+ .free_context = hns_roce_free_context
+};
+
+static struct ibv_device *hns_roce_driver_init(const char *uverbs_sys_path,
+ int abi_version)
+{
+ struct hns_roce_device *dev;
+ char value[128];
+ int i;
+
+ if (ibv_read_sysfs_file(uverbs_sys_path, "device/modalias",
+ value, sizeof(value)) > 0)
+ for (i = 0; i < sizeof(acpi_table) / sizeof(acpi_table[0]); ++i)
+ if (!strcmp(value, acpi_table[i].hid))
+ goto found;
+
+ if (ibv_read_sysfs_file(uverbs_sys_path, "device/of_node/compatible",
+ value, sizeof(value)) > 0)
+ for (i = 0; i < sizeof(dt_table) / sizeof(dt_table[0]); ++i)
+ if (!strcmp(value, dt_table[i].compatible))
+ goto found;
+
+ return NULL;
+
+found:
+ dev = malloc(sizeof(struct hns_roce_device));
+ if (!dev) {
+ fprintf(stderr, PFX "Fatal: couldn't allocate device for %s\n",
+ uverbs_sys_path);
+ return NULL;
+ }
+
+ dev->ibv_dev.ops = hns_roce_dev_ops;
+ dev->page_size = sysconf(_SC_PAGESIZE);
+ return &dev->ibv_dev;
+}
+
+static __attribute__((constructor)) void hns_roce_register_driver(void)
+{
+ ibv_register_driver("hns", hns_roce_driver_init);
+}
diff --git a/providers/hns/hns_roce_u.h b/providers/hns/hns_roce_u.h
new file mode 100644
index 0000000..3eef171
--- /dev/null
+++ b/providers/hns/hns_roce_u.h
@@ -0,0 +1,86 @@
+/*
+ * Copyright (c) 2016 Hisilicon Limited.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef _HNS_ROCE_U_H
+#define _HNS_ROCE_U_H
+
+#include <stddef.h>
+
+#include <infiniband/driver.h>
+#include <infiniband/arch.h>
+#include <infiniband/verbs.h>
+#include <ccan/container_of.h>
+
+#define HNS_ROCE_HW_VER1 ('h' << 24 | 'i' << 16 | '0' << 8 | '6')
+
+#define PFX "hns: "
+
+enum {
+ HNS_ROCE_QP_TABLE_BITS = 8,
+ HNS_ROCE_QP_TABLE_SIZE = 1 << HNS_ROCE_QP_TABLE_BITS,
+};
+
+struct hns_roce_device {
+ struct ibv_device ibv_dev;
+ int page_size;
+};
+
+struct hns_roce_context {
+ struct ibv_context ibv_ctx;
+ void *uar;
+ pthread_spinlock_t uar_lock;
+
+ struct {
+ int refcnt;
+ } qp_table[HNS_ROCE_QP_TABLE_SIZE];
+
+ pthread_mutex_t qp_table_mutex;
+
+ int num_qps;
+ int qp_table_shift;
+ int qp_table_mask;
+ unsigned int max_qp_wr;
+ unsigned int max_sge;
+ int max_cqe;
+};
+
+static inline struct hns_roce_device *to_hr_dev(struct ibv_device *ibv_dev)
+{
+ return container_of(ibv_dev, struct hns_roce_device, ibv_dev);
+}
+
+static inline struct hns_roce_context *to_hr_ctx(struct ibv_context *ibv_ctx)
+{
+ return container_of(ibv_ctx, struct hns_roce_context, ibv_ctx);
+}
+
+#endif /* _HNS_ROCE_U_H */
diff --git a/providers/hns/hns_roce_u_abi.h b/providers/hns/hns_roce_u_abi.h
new file mode 100644
index 0000000..4bfc8fa
--- /dev/null
+++ b/providers/hns/hns_roce_u_abi.h
@@ -0,0 +1,43 @@
+/*
+ * Copyright (c) 2016 Hisilicon Limited.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef _HNS_ROCE_U_ABI_H
+#define _HNS_ROCE_U_ABI_H
+
+#include <infiniband/kern-abi.h>
+
+struct hns_roce_alloc_ucontext_resp {
+ struct ibv_get_context_resp ibv_resp;
+ __u32 qp_tab_size;
+};
+
+#endif /* _HNS_ROCE_U_ABI_H */
--
1.9.1
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply related
* [PATCH v3 rdma-core 2/7] libhns: Add verbs of querying device and querying port
From: Lijun Ou @ 2016-11-10 12:46 UTC (permalink / raw)
To: dledford-H+wXaHxf7aLQT0dZR+AlfA,
linux-rdma-u79uwXL29TY76Z2rM5mHXA
Cc: linuxarm-hv44wF8Li93QT0dZR+AlfA
In-Reply-To: <1478781977-116803-1-git-send-email-oulijun-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
This patch mainly introduces query verbs for querying device
and querying port.
Signed-off-by: Lijun Ou <oulijun-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
Signed-off-by: Wei Hu <xavier.huwei-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
---
v3/v2:
- No change over the v1
v1:
- The initial submit
---
providers/hns/hns_roce_u.c | 7 ++++
providers/hns/hns_roce_u.h | 4 +++
providers/hns/hns_roce_u_verbs.c | 73 ++++++++++++++++++++++++++++++++++++++++
3 files changed, 84 insertions(+)
create mode 100644 providers/hns/hns_roce_u_verbs.c
diff --git a/providers/hns/hns_roce_u.c b/providers/hns/hns_roce_u.c
index bda4dd8..c0f6fe9 100644
--- a/providers/hns/hns_roce_u.c
+++ b/providers/hns/hns_roce_u.c
@@ -95,12 +95,19 @@ static struct ibv_context *hns_roce_alloc_context(struct ibv_device *ibdev,
pthread_spin_init(&context->uar_lock, PTHREAD_PROCESS_PRIVATE);
+ context->ibv_ctx.ops.query_device = hns_roce_u_query_device;
+ context->ibv_ctx.ops.query_port = hns_roce_u_query_port;
+
+ if (hns_roce_u_query_device(&context->ibv_ctx, &dev_attrs))
+ goto tptr_free;
+
context->max_qp_wr = dev_attrs.max_qp_wr;
context->max_sge = dev_attrs.max_sge;
context->max_cqe = dev_attrs.max_cqe;
return &context->ibv_ctx;
+tptr_free:
err_free:
free(context);
return NULL;
diff --git a/providers/hns/hns_roce_u.h b/providers/hns/hns_roce_u.h
index 3eef171..aa58ee6 100644
--- a/providers/hns/hns_roce_u.h
+++ b/providers/hns/hns_roce_u.h
@@ -83,4 +83,8 @@ static inline struct hns_roce_context *to_hr_ctx(struct ibv_context *ibv_ctx)
return container_of(ibv_ctx, struct hns_roce_context, ibv_ctx);
}
+int hns_roce_u_query_device(struct ibv_context *context,
+ struct ibv_device_attr *attr);
+int hns_roce_u_query_port(struct ibv_context *context, uint8_t port,
+ struct ibv_port_attr *attr);
#endif /* _HNS_ROCE_U_H */
diff --git a/providers/hns/hns_roce_u_verbs.c b/providers/hns/hns_roce_u_verbs.c
new file mode 100644
index 0000000..be55fe8
--- /dev/null
+++ b/providers/hns/hns_roce_u_verbs.c
@@ -0,0 +1,73 @@
+/*
+ * Copyright (c) 2016 Hisilicon Limited.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <errno.h>
+#include <pthread.h>
+#include <sys/mman.h>
+#include <fcntl.h>
+#include <unistd.h>
+
+#include "hns_roce_u.h"
+
+int hns_roce_u_query_device(struct ibv_context *context,
+ struct ibv_device_attr *attr)
+{
+ int ret;
+ struct ibv_query_device cmd;
+ unsigned long raw_fw_ver;
+ unsigned int major, minor, sub_minor;
+
+ ret = ibv_cmd_query_device(context, attr, &raw_fw_ver, &cmd,
+ sizeof(cmd));
+ if (ret)
+ return ret;
+
+ major = (raw_fw_ver >> 32) & 0xffff;
+ minor = (raw_fw_ver >> 16) & 0xffff;
+ sub_minor = raw_fw_ver & 0xffff;
+
+ snprintf(attr->fw_ver, sizeof(attr->fw_ver), "%d.%d.%03d", major, minor,
+ sub_minor);
+
+ return 0;
+}
+
+int hns_roce_u_query_port(struct ibv_context *context, uint8_t port,
+ struct ibv_port_attr *attr)
+{
+ struct ibv_query_port cmd;
+
+ return ibv_cmd_query_port(context, port, attr, &cmd, sizeof(cmd));
+}
--
1.9.1
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply related
* [PATCH v3 rdma-core 3/7] libhns: Add verbs of pd and mr support
From: Lijun Ou @ 2016-11-10 12:46 UTC (permalink / raw)
To: dledford-H+wXaHxf7aLQT0dZR+AlfA,
linux-rdma-u79uwXL29TY76Z2rM5mHXA
Cc: linuxarm-hv44wF8Li93QT0dZR+AlfA
In-Reply-To: <1478781977-116803-1-git-send-email-oulijun-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
This patch mainly introduces the verbs with pd and mr,
included alloc_pd, dealloc_pd, reg_mr and dereg_mr.
Signed-off-by: Lijun Ou <oulijun-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
Signed-off-by: Wei Hu <xavier.huwei-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
---
v3:
This fixes the comments given by Leon Romanovsky on PATCHv2
v2:
- No change over v1
v1:
- The initial submit
---
providers/hns/hns_roce_u.c | 4 ++
providers/hns/hns_roce_u.h | 18 +++++++++
providers/hns/hns_roce_u_abi.h | 6 +++
providers/hns/hns_roce_u_verbs.c | 79 ++++++++++++++++++++++++++++++++++++++++
4 files changed, 107 insertions(+)
diff --git a/providers/hns/hns_roce_u.c b/providers/hns/hns_roce_u.c
index c0f6fe9..53e2720 100644
--- a/providers/hns/hns_roce_u.c
+++ b/providers/hns/hns_roce_u.c
@@ -97,6 +97,10 @@ static struct ibv_context *hns_roce_alloc_context(struct ibv_device *ibdev,
context->ibv_ctx.ops.query_device = hns_roce_u_query_device;
context->ibv_ctx.ops.query_port = hns_roce_u_query_port;
+ context->ibv_ctx.ops.alloc_pd = hns_roce_u_alloc_pd;
+ context->ibv_ctx.ops.dealloc_pd = hns_roce_u_free_pd;
+ context->ibv_ctx.ops.reg_mr = hns_roce_u_reg_mr;
+ context->ibv_ctx.ops.dereg_mr = hns_roce_u_dereg_mr;
if (hns_roce_u_query_device(&context->ibv_ctx, &dev_attrs))
goto tptr_free;
diff --git a/providers/hns/hns_roce_u.h b/providers/hns/hns_roce_u.h
index aa58ee6..5b73794 100644
--- a/providers/hns/hns_roce_u.h
+++ b/providers/hns/hns_roce_u.h
@@ -73,6 +73,11 @@ struct hns_roce_context {
int max_cqe;
};
+struct hns_roce_pd {
+ struct ibv_pd ibv_pd;
+ unsigned int pdn;
+};
+
static inline struct hns_roce_device *to_hr_dev(struct ibv_device *ibv_dev)
{
return container_of(ibv_dev, struct hns_roce_device, ibv_dev);
@@ -83,8 +88,21 @@ static inline struct hns_roce_context *to_hr_ctx(struct ibv_context *ibv_ctx)
return container_of(ibv_ctx, struct hns_roce_context, ibv_ctx);
}
+static inline struct hns_roce_pd *to_hr_pd(struct ibv_pd *ibv_pd)
+{
+ return container_of(ibv_pd, struct hns_roce_pd, ibv_pd);
+}
+
int hns_roce_u_query_device(struct ibv_context *context,
struct ibv_device_attr *attr);
int hns_roce_u_query_port(struct ibv_context *context, uint8_t port,
struct ibv_port_attr *attr);
+
+struct ibv_pd *hns_roce_u_alloc_pd(struct ibv_context *context);
+int hns_roce_u_free_pd(struct ibv_pd *pd);
+
+struct ibv_mr *hns_roce_u_reg_mr(struct ibv_pd *pd, void *addr, size_t length,
+ int access);
+int hns_roce_u_dereg_mr(struct ibv_mr *mr);
+
#endif /* _HNS_ROCE_U_H */
diff --git a/providers/hns/hns_roce_u_abi.h b/providers/hns/hns_roce_u_abi.h
index 4bfc8fa..0a0cd0c 100644
--- a/providers/hns/hns_roce_u_abi.h
+++ b/providers/hns/hns_roce_u_abi.h
@@ -40,4 +40,10 @@ struct hns_roce_alloc_ucontext_resp {
__u32 qp_tab_size;
};
+struct hns_roce_alloc_pd_resp {
+ struct ibv_alloc_pd_resp ibv_resp;
+ __u32 pdn;
+ __u32 reserved;
+};
+
#endif /* _HNS_ROCE_U_ABI_H */
diff --git a/providers/hns/hns_roce_u_verbs.c b/providers/hns/hns_roce_u_verbs.c
index be55fe8..476b6ba 100644
--- a/providers/hns/hns_roce_u_verbs.c
+++ b/providers/hns/hns_roce_u_verbs.c
@@ -71,3 +71,82 @@ int hns_roce_u_query_port(struct ibv_context *context, uint8_t port,
return ibv_cmd_query_port(context, port, attr, &cmd, sizeof(cmd));
}
+
+struct ibv_pd *hns_roce_u_alloc_pd(struct ibv_context *context)
+{
+ struct ibv_alloc_pd cmd;
+ struct hns_roce_pd *pd;
+ struct hns_roce_alloc_pd_resp resp;
+
+ pd = (struct hns_roce_pd *)malloc(sizeof(*pd));
+ if (!pd)
+ return NULL;
+
+ if (ibv_cmd_alloc_pd(context, &pd->ibv_pd, &cmd, sizeof(cmd),
+ &resp.ibv_resp, sizeof(resp))) {
+ free(pd);
+ return NULL;
+ }
+
+ pd->pdn = resp.pdn;
+
+ return &pd->ibv_pd;
+}
+
+int hns_roce_u_free_pd(struct ibv_pd *pd)
+{
+ int ret;
+
+ ret = ibv_cmd_dealloc_pd(pd);
+ if (ret)
+ return ret;
+
+ free(to_hr_pd(pd));
+
+ return ret;
+}
+
+struct ibv_mr *hns_roce_u_reg_mr(struct ibv_pd *pd, void *addr, size_t length,
+ int access)
+{
+ int ret;
+ struct ibv_mr *mr;
+ struct ibv_reg_mr cmd;
+ struct ibv_reg_mr_resp resp;
+
+ if (!addr) {
+ fprintf(stderr, "2nd parm addr is NULL!\n");
+ return NULL;
+ }
+
+ if (!length) {
+ fprintf(stderr, "3st parm length is 0!\n");
+ return NULL;
+ }
+
+ mr = malloc(sizeof(*mr));
+ if (!mr)
+ return NULL;
+
+ ret = ibv_cmd_reg_mr(pd, addr, length, (uintptr_t) addr, access, mr,
+ &cmd, sizeof(cmd), &resp, sizeof(resp));
+ if (ret) {
+ free(mr);
+ return NULL;
+ }
+
+ return mr;
+}
+
+int hns_roce_u_dereg_mr(struct ibv_mr *mr)
+{
+ int ret;
+
+ ret = ibv_cmd_dereg_mr(mr);
+ if (ret)
+ return ret;
+
+ free(mr);
+
+ return ret;
+}
--
1.9.1
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply related
* [PATCH v3 rdma-core 4/7] libhns: Add verbs of cq support
From: Lijun Ou @ 2016-11-10 12:46 UTC (permalink / raw)
To: dledford-H+wXaHxf7aLQT0dZR+AlfA,
linux-rdma-u79uwXL29TY76Z2rM5mHXA
Cc: linuxarm-hv44wF8Li93QT0dZR+AlfA
In-Reply-To: <1478781977-116803-1-git-send-email-oulijun-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
This patch mainly introduces the relatived cq verbs for userspace
of hns, include:
1. create_cq
2. poll_cq
3. req_notify_cq
4. cq_event
5. destroy_cq
Signed-off-by: Lijun Ou <oulijun-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
Signed-off-by: Wei Hu <xavier.huwei-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
---
v3:
- No change over the v2
v2:
- Delete the unused code
v1:
- The initial submit
---
providers/hns/hns_roce_u.c | 57 +++++-
providers/hns/hns_roce_u.h | 94 ++++++++++
providers/hns/hns_roce_u_abi.h | 12 ++
providers/hns/hns_roce_u_buf.c | 61 +++++++
providers/hns/hns_roce_u_db.h | 54 ++++++
providers/hns/hns_roce_u_hw_v1.c | 370 +++++++++++++++++++++++++++++++++++++++
providers/hns/hns_roce_u_hw_v1.h | 163 +++++++++++++++++
providers/hns/hns_roce_u_verbs.c | 116 ++++++++++++
8 files changed, 922 insertions(+), 5 deletions(-)
create mode 100644 providers/hns/hns_roce_u_buf.c
create mode 100644 providers/hns/hns_roce_u_db.h
create mode 100644 providers/hns/hns_roce_u_hw_v1.c
create mode 100644 providers/hns/hns_roce_u_hw_v1.h
diff --git a/providers/hns/hns_roce_u.c b/providers/hns/hns_roce_u.c
index 53e2720..e435bea 100644
--- a/providers/hns/hns_roce_u.c
+++ b/providers/hns/hns_roce_u.c
@@ -46,15 +46,19 @@
static const struct {
char hid[HID_LEN];
+ void *data;
+ int version;
} acpi_table[] = {
- {"acpi:HISI00D1:"},
- {},
+ {"acpi:HISI00D1:", &hns_roce_u_hw_v1, HNS_ROCE_HW_VER1},
+ {},
};
static const struct {
char compatible[DEV_MATCH_LEN];
+ void *data;
+ int version;
} dt_table[] = {
- {"hisilicon,hns-roce-v1"},
+ {"hisilicon,hns-roce-v1", &hns_roce_u_hw_v1, HNS_ROCE_HW_VER1},
{},
};
@@ -93,6 +97,21 @@ static struct ibv_context *hns_roce_alloc_context(struct ibv_device *ibdev,
goto err_free;
}
+ if (hr_dev->hw_version == HNS_ROCE_HW_VER1) {
+ /*
+ * when vma->vm_pgoff is 1, the cq_tptr_base includes 64K CQ,
+ * a pointer of CQ need 2B size
+ */
+ context->cq_tptr_base = mmap(NULL, HNS_ROCE_CQ_DB_BUF_SIZE,
+ PROT_READ | PROT_WRITE, MAP_SHARED,
+ cmd_fd, HNS_ROCE_TPTR_OFFSET);
+ if (context->cq_tptr_base == MAP_FAILED) {
+ fprintf(stderr,
+ PFX "Warning: Failed to mmap cq_tptr page.\n");
+ goto db_free;
+ }
+ }
+
pthread_spin_init(&context->uar_lock, PTHREAD_PROCESS_PRIVATE);
context->ibv_ctx.ops.query_device = hns_roce_u_query_device;
@@ -102,6 +121,12 @@ static struct ibv_context *hns_roce_alloc_context(struct ibv_device *ibdev,
context->ibv_ctx.ops.reg_mr = hns_roce_u_reg_mr;
context->ibv_ctx.ops.dereg_mr = hns_roce_u_dereg_mr;
+ context->ibv_ctx.ops.create_cq = hns_roce_u_create_cq;
+ context->ibv_ctx.ops.poll_cq = hr_dev->u_hw->poll_cq;
+ context->ibv_ctx.ops.req_notify_cq = hr_dev->u_hw->arm_cq;
+ context->ibv_ctx.ops.cq_event = hns_roce_u_cq_event;
+ context->ibv_ctx.ops.destroy_cq = hns_roce_u_destroy_cq;
+
if (hns_roce_u_query_device(&context->ibv_ctx, &dev_attrs))
goto tptr_free;
@@ -112,6 +137,16 @@ static struct ibv_context *hns_roce_alloc_context(struct ibv_device *ibdev,
return &context->ibv_ctx;
tptr_free:
+ if (hr_dev->hw_version == HNS_ROCE_HW_VER1) {
+ if (munmap(context->cq_tptr_base, HNS_ROCE_CQ_DB_BUF_SIZE))
+ fprintf(stderr, PFX "Warning: Munmap tptr failed.\n");
+ context->cq_tptr_base = NULL;
+ }
+
+db_free:
+ munmap(context->uar, to_hr_dev(ibdev)->page_size);
+ context->uar = NULL;
+
err_free:
free(context);
return NULL;
@@ -122,6 +157,8 @@ static void hns_roce_free_context(struct ibv_context *ibctx)
struct hns_roce_context *context = to_hr_ctx(ibctx);
munmap(context->uar, to_hr_dev(ibctx->device)->page_size);
+ if (to_hr_dev(ibctx->device)->hw_version == HNS_ROCE_HW_VER1)
+ munmap(context->cq_tptr_base, HNS_ROCE_CQ_DB_BUF_SIZE);
context->uar = NULL;
@@ -140,18 +177,26 @@ static struct ibv_device *hns_roce_driver_init(const char *uverbs_sys_path,
struct hns_roce_device *dev;
char value[128];
int i;
+ void *u_hw;
+ int hw_version;
if (ibv_read_sysfs_file(uverbs_sys_path, "device/modalias",
value, sizeof(value)) > 0)
for (i = 0; i < sizeof(acpi_table) / sizeof(acpi_table[0]); ++i)
- if (!strcmp(value, acpi_table[i].hid))
+ if (!strcmp(value, acpi_table[i].hid)) {
+ u_hw = acpi_table[i].data;
+ hw_version = acpi_table[i].version;
goto found;
+ }
if (ibv_read_sysfs_file(uverbs_sys_path, "device/of_node/compatible",
value, sizeof(value)) > 0)
for (i = 0; i < sizeof(dt_table) / sizeof(dt_table[0]); ++i)
- if (!strcmp(value, dt_table[i].compatible))
+ if (!strcmp(value, dt_table[i].compatible)) {
+ u_hw = dt_table[i].data;
+ hw_version = dt_table[i].version;
goto found;
+ }
return NULL;
@@ -164,6 +209,8 @@ found:
}
dev->ibv_dev.ops = hns_roce_dev_ops;
+ dev->u_hw = (struct hns_roce_u_hw *)u_hw;
+ dev->hw_version = hw_version;
dev->page_size = sysconf(_SC_PAGESIZE);
return &dev->ibv_dev;
}
diff --git a/providers/hns/hns_roce_u.h b/providers/hns/hns_roce_u.h
index 5b73794..c3e364d 100644
--- a/providers/hns/hns_roce_u.h
+++ b/providers/hns/hns_roce_u.h
@@ -40,18 +40,53 @@
#include <infiniband/verbs.h>
#include <ccan/container_of.h>
+#define HNS_ROCE_CQE_ENTRY_SIZE 0x20
+
+#define HNS_ROCE_MAX_CQ_NUM 0x10000
+#define HNS_ROCE_MIN_CQE_NUM 0x40
+#define HNS_ROCE_CQ_DB_BUF_SIZE ((HNS_ROCE_MAX_CQ_NUM >> 11) << 12)
+#define HNS_ROCE_TPTR_OFFSET 0x1000
#define HNS_ROCE_HW_VER1 ('h' << 24 | 'i' << 16 | '0' << 8 | '6')
#define PFX "hns: "
+#define roce_get_field(origin, mask, shift) \
+ (((origin) & (mask)) >> (shift))
+
+#define roce_get_bit(origin, shift) \
+ roce_get_field((origin), (1ul << (shift)), (shift))
+
+#define roce_set_field(origin, mask, shift, val) \
+ do { \
+ (origin) &= (~(mask)); \
+ (origin) |= (((unsigned int)(val) << (shift)) & (mask)); \
+ } while (0)
+
+#define roce_set_bit(origin, shift, val) \
+ roce_set_field((origin), (1ul << (shift)), (shift), (val))
+
enum {
HNS_ROCE_QP_TABLE_BITS = 8,
HNS_ROCE_QP_TABLE_SIZE = 1 << HNS_ROCE_QP_TABLE_BITS,
};
+/* operation type list */
+enum {
+ /* rq&srq operation */
+ HNS_ROCE_OPCODE_SEND_DATA_RECEIVE = 0x06,
+ HNS_ROCE_OPCODE_RDMA_WITH_IMM_RECEIVE = 0x07,
+};
+
struct hns_roce_device {
struct ibv_device ibv_dev;
int page_size;
+ struct hns_roce_u_hw *u_hw;
+ int hw_version;
+};
+
+struct hns_roce_buf {
+ void *buf;
+ unsigned int length;
};
struct hns_roce_context {
@@ -59,7 +94,10 @@ struct hns_roce_context {
void *uar;
pthread_spinlock_t uar_lock;
+ void *cq_tptr_base;
+
struct {
+ struct hns_roce_qp **table;
int refcnt;
} qp_table[HNS_ROCE_QP_TABLE_SIZE];
@@ -78,6 +116,44 @@ struct hns_roce_pd {
unsigned int pdn;
};
+struct hns_roce_cq {
+ struct ibv_cq ibv_cq;
+ struct hns_roce_buf buf;
+ pthread_spinlock_t lock;
+ unsigned int cqn;
+ unsigned int cq_depth;
+ unsigned int cons_index;
+ unsigned int *set_ci_db;
+ unsigned int *arm_db;
+ int arm_sn;
+};
+
+struct hns_roce_wq {
+ unsigned long *wrid;
+ unsigned int wqe_cnt;
+ unsigned int tail;
+ int wqe_shift;
+ int offset;
+};
+
+struct hns_roce_qp {
+ struct ibv_qp ibv_qp;
+ struct hns_roce_buf buf;
+ unsigned int sq_signal_bits;
+ struct hns_roce_wq sq;
+ struct hns_roce_wq rq;
+};
+
+struct hns_roce_u_hw {
+ int (*poll_cq)(struct ibv_cq *ibvcq, int ne, struct ibv_wc *wc);
+ int (*arm_cq)(struct ibv_cq *ibvcq, int solicited);
+};
+
+static inline unsigned long align(unsigned long val, unsigned long align)
+{
+ return (val + align - 1) & ~(align - 1);
+}
+
static inline struct hns_roce_device *to_hr_dev(struct ibv_device *ibv_dev)
{
return container_of(ibv_dev, struct hns_roce_device, ibv_dev);
@@ -93,6 +169,11 @@ static inline struct hns_roce_pd *to_hr_pd(struct ibv_pd *ibv_pd)
return container_of(ibv_pd, struct hns_roce_pd, ibv_pd);
}
+static inline struct hns_roce_cq *to_hr_cq(struct ibv_cq *ibv_cq)
+{
+ return container_of(ibv_cq, struct hns_roce_cq, ibv_cq);
+}
+
int hns_roce_u_query_device(struct ibv_context *context,
struct ibv_device_attr *attr);
int hns_roce_u_query_port(struct ibv_context *context, uint8_t port,
@@ -105,4 +186,17 @@ struct ibv_mr *hns_roce_u_reg_mr(struct ibv_pd *pd, void *addr, size_t length,
int access);
int hns_roce_u_dereg_mr(struct ibv_mr *mr);
+struct ibv_cq *hns_roce_u_create_cq(struct ibv_context *context, int cqe,
+ struct ibv_comp_channel *channel,
+ int comp_vector);
+
+int hns_roce_u_destroy_cq(struct ibv_cq *cq);
+void hns_roce_u_cq_event(struct ibv_cq *cq);
+
+int hns_roce_alloc_buf(struct hns_roce_buf *buf, unsigned int size,
+ int page_size);
+void hns_roce_free_buf(struct hns_roce_buf *buf);
+
+extern struct hns_roce_u_hw hns_roce_u_hw_v1;
+
#endif /* _HNS_ROCE_U_H */
diff --git a/providers/hns/hns_roce_u_abi.h b/providers/hns/hns_roce_u_abi.h
index 0a0cd0c..1e62a7e 100644
--- a/providers/hns/hns_roce_u_abi.h
+++ b/providers/hns/hns_roce_u_abi.h
@@ -46,4 +46,16 @@ struct hns_roce_alloc_pd_resp {
__u32 reserved;
};
+struct hns_roce_create_cq {
+ struct ibv_create_cq ibv_cmd;
+ __u64 buf_addr;
+ __u64 db_addr;
+};
+
+struct hns_roce_create_cq_resp {
+ struct ibv_create_cq_resp ibv_resp;
+ __u32 cqn;
+ __u32 reserved;
+};
+
#endif /* _HNS_ROCE_U_ABI_H */
diff --git a/providers/hns/hns_roce_u_buf.c b/providers/hns/hns_roce_u_buf.c
new file mode 100644
index 0000000..f92ea65
--- /dev/null
+++ b/providers/hns/hns_roce_u_buf.c
@@ -0,0 +1,61 @@
+/*
+ * Copyright (c) 2016 Hisilicon Limited.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include <errno.h>
+#include <sys/mman.h>
+
+#include "hns_roce_u.h"
+
+int hns_roce_alloc_buf(struct hns_roce_buf *buf, unsigned int size,
+ int page_size)
+{
+ int ret;
+
+ buf->length = align(size, page_size);
+ buf->buf = mmap(NULL, buf->length, PROT_READ | PROT_WRITE,
+ MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+ if (buf->buf == MAP_FAILED)
+ return errno;
+
+ ret = ibv_dontfork_range(buf->buf, size);
+ if (ret)
+ munmap(buf->buf, buf->length);
+
+ return ret;
+}
+
+void hns_roce_free_buf(struct hns_roce_buf *buf)
+{
+ ibv_dofork_range(buf->buf, buf->length);
+
+ munmap(buf->buf, buf->length);
+}
diff --git a/providers/hns/hns_roce_u_db.h b/providers/hns/hns_roce_u_db.h
new file mode 100644
index 0000000..76d13ce
--- /dev/null
+++ b/providers/hns/hns_roce_u_db.h
@@ -0,0 +1,54 @@
+/*
+ * Copyright (c) 2016 Hisilicon Limited.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include <linux/types.h>
+
+#include "hns_roce_u.h"
+
+#ifndef _HNS_ROCE_U_DB_H
+#define _HNS_ROCE_U_DB_H
+
+#if __BYTE_ORDER == __LITTLE_ENDIAN
+#define HNS_ROCE_PAIR_TO_64(val) ((uint64_t) val[1] << 32 | val[0])
+#elif __BYTE_ORDER == __BIG_ENDIAN
+#define HNS_ROCE_PAIR_TO_64(val) ((uint64_t) val[0] << 32 | val[1])
+#else
+#error __BYTE_ORDER not defined
+#endif
+
+static inline void hns_roce_write64(uint32_t val[2],
+ struct hns_roce_context *ctx, int offset)
+{
+ *(volatile uint64_t *) (ctx->uar + offset) = HNS_ROCE_PAIR_TO_64(val);
+}
+
+#endif /* _HNS_ROCE_U_DB_H */
diff --git a/providers/hns/hns_roce_u_hw_v1.c b/providers/hns/hns_roce_u_hw_v1.c
new file mode 100644
index 0000000..2676021
--- /dev/null
+++ b/providers/hns/hns_roce_u_hw_v1.c
@@ -0,0 +1,370 @@
+/*
+ * Copyright (c) 2016 Hisilicon Limited.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include <stdio.h>
+#include <string.h>
+#include <malloc.h>
+#include "hns_roce_u_db.h"
+#include "hns_roce_u_hw_v1.h"
+#include "hns_roce_u.h"
+
+static void hns_roce_update_cq_cons_index(struct hns_roce_context *ctx,
+ struct hns_roce_cq *cq)
+{
+ struct hns_roce_cq_db cq_db;
+
+ cq_db.u32_4 = 0;
+ cq_db.u32_8 = 0;
+
+ roce_set_bit(cq_db.u32_8, CQ_DB_U32_8_HW_SYNC_S, 1);
+ roce_set_field(cq_db.u32_8, CQ_DB_U32_8_CMD_M, CQ_DB_U32_8_CMD_S, 3);
+ roce_set_field(cq_db.u32_8, CQ_DB_U32_8_CMD_MDF_M,
+ CQ_DB_U32_8_CMD_MDF_S, 0);
+ roce_set_field(cq_db.u32_8, CQ_DB_U32_8_CQN_M, CQ_DB_U32_8_CQN_S,
+ cq->cqn);
+ roce_set_field(cq_db.u32_4, CQ_DB_U32_4_CONS_IDX_M,
+ CQ_DB_U32_4_CONS_IDX_S,
+ cq->cons_index & ((cq->cq_depth << 1) - 1));
+
+ hns_roce_write64((uint32_t *)&cq_db, ctx, ROCEE_DB_OTHERS_L_0_REG);
+}
+
+static void hns_roce_handle_error_cqe(struct hns_roce_cqe *cqe,
+ struct ibv_wc *wc)
+{
+ switch (roce_get_field(cqe->cqe_byte_4,
+ CQE_BYTE_4_STATUS_OF_THE_OPERATION_M,
+ CQE_BYTE_4_STATUS_OF_THE_OPERATION_S) &
+ HNS_ROCE_CQE_STATUS_MASK) {
+ fprintf(stderr, PFX "error cqe!\n");
+ case HNS_ROCE_CQE_SYNDROME_LOCAL_LENGTH_ERR:
+ wc->status = IBV_WC_LOC_LEN_ERR;
+ break;
+ case HNS_ROCE_CQE_SYNDROME_LOCAL_QP_OP_ERR:
+ wc->status = IBV_WC_LOC_QP_OP_ERR;
+ break;
+ case HNS_ROCE_CQE_SYNDROME_LOCAL_PROT_ERR:
+ wc->status = IBV_WC_LOC_PROT_ERR;
+ break;
+ case HNS_ROCE_CQE_SYNDROME_WR_FLUSH_ERR:
+ wc->status = IBV_WC_WR_FLUSH_ERR;
+ break;
+ case HNS_ROCE_CQE_SYNDROME_MEM_MANAGE_OPERATE_ERR:
+ wc->status = IBV_WC_MW_BIND_ERR;
+ break;
+ case HNS_ROCE_CQE_SYNDROME_BAD_RESP_ERR:
+ wc->status = IBV_WC_BAD_RESP_ERR;
+ break;
+ case HNS_ROCE_CQE_SYNDROME_LOCAL_ACCESS_ERR:
+ wc->status = IBV_WC_LOC_ACCESS_ERR;
+ break;
+ case HNS_ROCE_CQE_SYNDROME_REMOTE_INVAL_REQ_ERR:
+ wc->status = IBV_WC_REM_INV_REQ_ERR;
+ break;
+ case HNS_ROCE_CQE_SYNDROME_REMOTE_ACCESS_ERR:
+ wc->status = IBV_WC_REM_ACCESS_ERR;
+ break;
+ case HNS_ROCE_CQE_SYNDROME_REMOTE_OP_ERR:
+ wc->status = IBV_WC_REM_OP_ERR;
+ break;
+ case HNS_ROCE_CQE_SYNDROME_TRANSPORT_RETRY_EXC_ERR:
+ wc->status = IBV_WC_RETRY_EXC_ERR;
+ break;
+ case HNS_ROCE_CQE_SYNDROME_RNR_RETRY_EXC_ERR:
+ wc->status = IBV_WC_RNR_RETRY_EXC_ERR;
+ break;
+ default:
+ wc->status = IBV_WC_GENERAL_ERR;
+ break;
+ }
+}
+
+static struct hns_roce_cqe *get_cqe(struct hns_roce_cq *cq, int entry)
+{
+ return cq->buf.buf + entry * HNS_ROCE_CQE_ENTRY_SIZE;
+}
+
+static void *get_sw_cqe(struct hns_roce_cq *cq, int n)
+{
+ struct hns_roce_cqe *cqe = get_cqe(cq, n & cq->ibv_cq.cqe);
+
+ return (!!(roce_get_bit(cqe->cqe_byte_4, CQE_BYTE_4_OWNER_S)) ^
+ !!(n & (cq->ibv_cq.cqe + 1))) ? cqe : NULL;
+}
+
+static struct hns_roce_cqe *next_cqe_sw(struct hns_roce_cq *cq)
+{
+ return get_sw_cqe(cq, cq->cons_index);
+}
+
+static void *get_send_wqe(struct hns_roce_qp *qp, int n)
+{
+ if ((n < 0) || (n > qp->sq.wqe_cnt)) {
+ printf("sq wqe index:%d,sq wqe cnt:%d\r\n", n, qp->sq.wqe_cnt);
+ return NULL;
+ }
+
+ return (void *)((uint64_t)(qp->buf.buf) + qp->sq.offset +
+ (n << qp->sq.wqe_shift));
+}
+
+static struct hns_roce_qp *hns_roce_find_qp(struct hns_roce_context *ctx,
+ uint32_t qpn)
+{
+ int tind = (qpn & (ctx->num_qps - 1)) >> ctx->qp_table_shift;
+
+ if (ctx->qp_table[tind].refcnt) {
+ return ctx->qp_table[tind].table[qpn & ctx->qp_table_mask];
+ } else {
+ printf("hns_roce_find_qp fail!\n");
+ return NULL;
+ }
+}
+
+static int hns_roce_v1_poll_one(struct hns_roce_cq *cq,
+ struct hns_roce_qp **cur_qp, struct ibv_wc *wc)
+{
+ uint32_t qpn;
+ int is_send;
+ uint16_t wqe_ctr;
+ uint32_t local_qpn;
+ struct hns_roce_wq *wq = NULL;
+ struct hns_roce_cqe *cqe = NULL;
+ struct hns_roce_wqe_ctrl_seg *sq_wqe = NULL;
+
+ /* According to CI, find the relative cqe */
+ cqe = next_cqe_sw(cq);
+ if (!cqe)
+ return CQ_EMPTY;
+
+ /* Get the next cqe, CI will be added gradually */
+ ++cq->cons_index;
+
+ rmb();
+
+ qpn = roce_get_field(cqe->cqe_byte_16, CQE_BYTE_16_LOCAL_QPN_M,
+ CQE_BYTE_16_LOCAL_QPN_S);
+
+ is_send = (roce_get_bit(cqe->cqe_byte_4, CQE_BYTE_4_SQ_RQ_FLAG_S) ==
+ HNS_ROCE_CQE_IS_SQ);
+
+ local_qpn = roce_get_field(cqe->cqe_byte_16, CQE_BYTE_16_LOCAL_QPN_M,
+ CQE_BYTE_16_LOCAL_QPN_S);
+
+ /* if qp is zero, it will not get the correct qpn */
+ if (!*cur_qp ||
+ (local_qpn & HNS_ROCE_CQE_QPN_MASK) != (*cur_qp)->ibv_qp.qp_num) {
+
+ *cur_qp = hns_roce_find_qp(to_hr_ctx(cq->ibv_cq.context),
+ qpn & 0xffffff);
+ if (!*cur_qp) {
+ fprintf(stderr, PFX "can't find qp!\n");
+ return CQ_POLL_ERR;
+ }
+ }
+ wc->qp_num = qpn & 0xffffff;
+
+ if (is_send) {
+ wq = &(*cur_qp)->sq;
+ /*
+ * if sq_signal_bits is 1, the tail pointer first update to
+ * the wqe corresponding the current cqe
+ */
+ if ((*cur_qp)->sq_signal_bits) {
+ wqe_ctr = (uint16_t)(roce_get_field(cqe->cqe_byte_4,
+ CQE_BYTE_4_WQE_INDEX_M,
+ CQE_BYTE_4_WQE_INDEX_S));
+ /*
+ * wq->tail will plus a positive number every time,
+ * when wq->tail exceeds 32b, it is 0 and acc
+ */
+ wq->tail += (wqe_ctr - (uint16_t) wq->tail) &
+ (wq->wqe_cnt - 1);
+ }
+ /* write the wr_id of wq into the wc */
+ wc->wr_id = wq->wrid[wq->tail & (wq->wqe_cnt - 1)];
+ ++wq->tail;
+ } else {
+ wq = &(*cur_qp)->rq;
+ wc->wr_id = wq->wrid[wq->tail & (wq->wqe_cnt - 1)];
+ ++wq->tail;
+ }
+
+ /*
+ * HW maintains wc status, set the err type and directly return, after
+ * generated the incorrect CQE
+ */
+ if (roce_get_field(cqe->cqe_byte_4,
+ CQE_BYTE_4_STATUS_OF_THE_OPERATION_M,
+ CQE_BYTE_4_STATUS_OF_THE_OPERATION_S) != HNS_ROCE_CQE_SUCCESS) {
+ hns_roce_handle_error_cqe(cqe, wc);
+ return CQ_OK;
+ }
+ wc->status = IBV_WC_SUCCESS;
+
+ /*
+ * According to the opcode type of cqe, mark the opcode and other
+ * information of wc
+ */
+ if (is_send) {
+ /* Get opcode and flag before update the tail point for send */
+ sq_wqe = (struct hns_roce_wqe_ctrl_seg *)
+ (uint64_t)get_send_wqe(*cur_qp,
+ roce_get_field(cqe->cqe_byte_4,
+ CQE_BYTE_4_WQE_INDEX_M,
+ CQE_BYTE_4_WQE_INDEX_S));
+ switch (sq_wqe->flag & HNS_ROCE_WQE_OPCODE_MASK) {
+ case HNS_ROCE_WQE_OPCODE_SEND:
+ wc->opcode = IBV_WC_SEND;
+ break;
+ case HNS_ROCE_WQE_OPCODE_RDMA_READ:
+ wc->opcode = IBV_WC_RDMA_READ;
+ wc->byte_len = cqe->byte_cnt;
+ break;
+ case HNS_ROCE_WQE_OPCODE_RDMA_WRITE:
+ wc->opcode = IBV_WC_RDMA_WRITE;
+ break;
+ case HNS_ROCE_WQE_OPCODE_BIND_MW2:
+ wc->opcode = IBV_WC_BIND_MW;
+ break;
+ default:
+ wc->status = IBV_WC_GENERAL_ERR;
+ break;
+ }
+ wc->wc_flags = (sq_wqe->flag & HNS_ROCE_WQE_IMM ?
+ IBV_WC_WITH_IMM : 0);
+ } else {
+ /* Get opcode and flag in rq&srq */
+ wc->byte_len = (cqe->byte_cnt);
+
+ switch (roce_get_field(cqe->cqe_byte_4,
+ CQE_BYTE_4_OPERATION_TYPE_M,
+ CQE_BYTE_4_OPERATION_TYPE_S) &
+ HNS_ROCE_CQE_OPCODE_MASK) {
+ case HNS_ROCE_OPCODE_RDMA_WITH_IMM_RECEIVE:
+ wc->opcode = IBV_WC_RECV_RDMA_WITH_IMM;
+ wc->wc_flags = IBV_WC_WITH_IMM;
+ wc->imm_data = cqe->immediate_data;
+ break;
+ case HNS_ROCE_OPCODE_SEND_DATA_RECEIVE:
+ if (roce_get_bit(cqe->cqe_byte_4,
+ CQE_BYTE_4_IMMEDIATE_DATA_FLAG_S)) {
+ wc->opcode = IBV_WC_RECV;
+ wc->wc_flags = IBV_WC_WITH_IMM;
+ wc->imm_data = cqe->immediate_data;
+ } else {
+ wc->opcode = IBV_WC_RECV;
+ wc->wc_flags = 0;
+ }
+ break;
+ default:
+ wc->status = IBV_WC_GENERAL_ERR;
+ break;
+ }
+ }
+
+ return CQ_OK;
+}
+
+static int hns_roce_u_v1_poll_cq(struct ibv_cq *ibvcq, int ne,
+ struct ibv_wc *wc)
+{
+ int npolled;
+ int err = CQ_OK;
+ struct hns_roce_qp *qp = NULL;
+ struct hns_roce_cq *cq = to_hr_cq(ibvcq);
+ struct hns_roce_context *ctx = to_hr_ctx(ibvcq->context);
+ struct hns_roce_device *dev = to_hr_dev(ibvcq->context->device);
+
+ pthread_spin_lock(&cq->lock);
+
+ for (npolled = 0; npolled < ne; ++npolled) {
+ err = hns_roce_v1_poll_one(cq, &qp, wc + npolled);
+ if (err != CQ_OK)
+ break;
+ }
+
+ if (npolled) {
+ if (dev->hw_version == HNS_ROCE_HW_VER1) {
+ *cq->set_ci_db = (unsigned short)(cq->cons_index &
+ ((cq->cq_depth << 1) - 1));
+ mb();
+ }
+
+ hns_roce_update_cq_cons_index(ctx, cq);
+ }
+
+ pthread_spin_unlock(&cq->lock);
+
+ return err == CQ_POLL_ERR ? err : npolled;
+}
+
+/**
+ * hns_roce_u_v1_arm_cq - request completion notification on a CQ
+ * @ibvcq: The completion queue to request notification for.
+ * @solicited: If non-zero, a event will be generated only for
+ * the next solicited CQ entry. If zero, any CQ entry,
+ * solicited or not, will generate an event
+ */
+static int hns_roce_u_v1_arm_cq(struct ibv_cq *ibvcq, int solicited)
+{
+ uint32_t ci;
+ uint32_t solicited_flag;
+ struct hns_roce_cq_db cq_db;
+ struct hns_roce_cq *cq = to_hr_cq(ibvcq);
+
+ ci = cq->cons_index & ((cq->cq_depth << 1) - 1);
+ solicited_flag = solicited ? HNS_ROCE_CQ_DB_REQ_SOL :
+ HNS_ROCE_CQ_DB_REQ_NEXT;
+
+ cq_db.u32_4 = 0;
+ cq_db.u32_8 = 0;
+
+ roce_set_bit(cq_db.u32_8, CQ_DB_U32_8_HW_SYNC_S, 1);
+ roce_set_field(cq_db.u32_8, CQ_DB_U32_8_CMD_M, CQ_DB_U32_8_CMD_S, 3);
+ roce_set_field(cq_db.u32_8, CQ_DB_U32_8_CMD_MDF_M,
+ CQ_DB_U32_8_CMD_MDF_S, 1);
+ roce_set_bit(cq_db.u32_8, CQ_DB_U32_8_NOTIFY_TYPE_S, solicited_flag);
+ roce_set_field(cq_db.u32_8, CQ_DB_U32_8_CQN_M, CQ_DB_U32_8_CQN_S,
+ cq->cqn);
+ roce_set_field(cq_db.u32_4, CQ_DB_U32_4_CONS_IDX_M,
+ CQ_DB_U32_4_CONS_IDX_S, ci);
+
+ hns_roce_write64((uint32_t *)&cq_db, to_hr_ctx(ibvcq->context),
+ ROCEE_DB_OTHERS_L_0_REG);
+ return 0;
+}
+
+struct hns_roce_u_hw hns_roce_u_hw_v1 = {
+ .poll_cq = hns_roce_u_v1_poll_cq,
+ .arm_cq = hns_roce_u_v1_arm_cq,
+};
diff --git a/providers/hns/hns_roce_u_hw_v1.h b/providers/hns/hns_roce_u_hw_v1.h
new file mode 100644
index 0000000..b249f54
--- /dev/null
+++ b/providers/hns/hns_roce_u_hw_v1.h
@@ -0,0 +1,163 @@
+/*
+ * Copyright (c) 2016 Hisilicon Limited.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef _HNS_ROCE_U_HW_V1_H
+#define _HNS_ROCE_U_HW_V1_H
+
+#define HNS_ROCE_CQ_DB_REQ_SOL 1
+#define HNS_ROCE_CQ_DB_REQ_NEXT 0
+
+#define HNS_ROCE_CQE_IS_SQ 0
+
+#define HNS_ROCE_RC_WQE_INLINE_DATA_MAX_LEN 32
+
+enum {
+ HNS_ROCE_WQE_IMM = 1 << 23,
+ HNS_ROCE_WQE_OPCODE_SEND = 0 << 16,
+ HNS_ROCE_WQE_OPCODE_RDMA_READ = 1 << 16,
+ HNS_ROCE_WQE_OPCODE_RDMA_WRITE = 2 << 16,
+ HNS_ROCE_WQE_OPCODE_BIND_MW2 = 6 << 16,
+ HNS_ROCE_WQE_OPCODE_MASK = 15 << 16,
+};
+
+struct hns_roce_wqe_ctrl_seg {
+ __be32 sgl_pa_h;
+ __be32 flag;
+};
+
+enum {
+ CQ_OK = 0,
+ CQ_EMPTY = -1,
+ CQ_POLL_ERR = -2,
+};
+
+enum {
+ HNS_ROCE_CQE_QPN_MASK = 0x3ffff,
+ HNS_ROCE_CQE_STATUS_MASK = 0x1f,
+ HNS_ROCE_CQE_OPCODE_MASK = 0xf,
+};
+
+enum {
+ HNS_ROCE_CQE_SUCCESS,
+ HNS_ROCE_CQE_SYNDROME_LOCAL_LENGTH_ERR,
+ HNS_ROCE_CQE_SYNDROME_LOCAL_QP_OP_ERR,
+ HNS_ROCE_CQE_SYNDROME_LOCAL_PROT_ERR,
+ HNS_ROCE_CQE_SYNDROME_WR_FLUSH_ERR,
+ HNS_ROCE_CQE_SYNDROME_MEM_MANAGE_OPERATE_ERR,
+ HNS_ROCE_CQE_SYNDROME_BAD_RESP_ERR,
+ HNS_ROCE_CQE_SYNDROME_LOCAL_ACCESS_ERR,
+ HNS_ROCE_CQE_SYNDROME_REMOTE_INVAL_REQ_ERR,
+ HNS_ROCE_CQE_SYNDROME_REMOTE_ACCESS_ERR,
+ HNS_ROCE_CQE_SYNDROME_REMOTE_OP_ERR,
+ HNS_ROCE_CQE_SYNDROME_TRANSPORT_RETRY_EXC_ERR,
+ HNS_ROCE_CQE_SYNDROME_RNR_RETRY_EXC_ERR,
+};
+
+struct hns_roce_cq_db {
+ unsigned int u32_4;
+ unsigned int u32_8;
+};
+#define CQ_DB_U32_4_CONS_IDX_S 0
+#define CQ_DB_U32_4_CONS_IDX_M (((1UL << 16) - 1) << CQ_DB_U32_4_CONS_IDX_S)
+
+#define CQ_DB_U32_8_CQN_S 0
+#define CQ_DB_U32_8_CQN_M (((1UL << 16) - 1) << CQ_DB_U32_8_CQN_S)
+
+#define CQ_DB_U32_8_NOTIFY_TYPE_S 16
+
+#define CQ_DB_U32_8_CMD_MDF_S 24
+#define CQ_DB_U32_8_CMD_MDF_M (((1UL << 4) - 1) << CQ_DB_U32_8_CMD_MDF_S)
+
+#define CQ_DB_U32_8_CMD_S 28
+#define CQ_DB_U32_8_CMD_M (((1UL << 3) - 1) << CQ_DB_U32_8_CMD_S)
+
+#define CQ_DB_U32_8_HW_SYNC_S 31
+
+struct hns_roce_cqe {
+ unsigned int cqe_byte_4;
+ union {
+ unsigned int r_key;
+ unsigned int immediate_data;
+ };
+ unsigned int byte_cnt;
+ unsigned int cqe_byte_16;
+ unsigned int cqe_byte_20;
+ unsigned int s_mac_l;
+ unsigned int cqe_byte_28;
+ unsigned int reserved;
+};
+#define CQE_BYTE_4_OPERATION_TYPE_S 0
+#define CQE_BYTE_4_OPERATION_TYPE_M \
+ (((1UL << 4) - 1) << CQE_BYTE_4_OPERATION_TYPE_S)
+
+#define CQE_BYTE_4_OWNER_S 7
+
+#define CQE_BYTE_4_STATUS_OF_THE_OPERATION_S 8
+#define CQE_BYTE_4_STATUS_OF_THE_OPERATION_M \
+ (((1UL << 5) - 1) << CQE_BYTE_4_STATUS_OF_THE_OPERATION_S)
+
+#define CQE_BYTE_4_SQ_RQ_FLAG_S 14
+
+#define CQE_BYTE_4_IMMEDIATE_DATA_FLAG_S 15
+
+#define CQE_BYTE_4_WQE_INDEX_S 16
+#define CQE_BYTE_4_WQE_INDEX_M (((1UL << 14) - 1) << CQE_BYTE_4_WQE_INDEX_S)
+
+#define CQE_BYTE_16_LOCAL_QPN_S 0
+#define CQE_BYTE_16_LOCAL_QPN_M (((1UL << 24) - 1) << CQE_BYTE_16_LOCAL_QPN_S)
+
+#define ROCEE_DB_SQ_L_0_REG 0x230
+
+#define ROCEE_DB_OTHERS_L_0_REG 0x238
+
+struct hns_roce_rc_send_wqe {
+ unsigned int sgl_ba_31_0;
+ unsigned int u32_1;
+ union {
+ unsigned int r_key;
+ unsigned int immediate_data;
+ };
+ unsigned int msg_length;
+ unsigned int rvd_3;
+ unsigned int rvd_4;
+ unsigned int rvd_5;
+ unsigned int rvd_6;
+ uint64_t va0;
+ unsigned int l_key0;
+ unsigned int length0;
+
+ uint64_t va1;
+ unsigned int l_key1;
+ unsigned int length1;
+};
+
+#endif /* _HNS_ROCE_U_HW_V1_H */
diff --git a/providers/hns/hns_roce_u_verbs.c b/providers/hns/hns_roce_u_verbs.c
index 476b6ba..61f48f1 100644
--- a/providers/hns/hns_roce_u_verbs.c
+++ b/providers/hns/hns_roce_u_verbs.c
@@ -40,6 +40,8 @@
#include <unistd.h>
#include "hns_roce_u.h"
+#include "hns_roce_u_abi.h"
+#include "hns_roce_u_hw_v1.h"
int hns_roce_u_query_device(struct ibv_context *context,
struct ibv_device_attr *attr)
@@ -150,3 +152,117 @@ int hns_roce_u_dereg_mr(struct ibv_mr *mr)
return ret;
}
+
+static int align_cq_size(int req)
+{
+ int nent;
+
+ for (nent = HNS_ROCE_MIN_CQE_NUM; nent < req; nent <<= 1)
+ ;
+
+ return nent;
+}
+
+static int hns_roce_verify_cq(int *cqe, struct hns_roce_context *context)
+{
+ if (*cqe < HNS_ROCE_MIN_CQE_NUM) {
+ fprintf(stderr, "cqe = %d, less than minimum CQE number.\n",
+ *cqe);
+ *cqe = HNS_ROCE_MIN_CQE_NUM;
+ }
+
+ if (*cqe > context->max_cqe)
+ return -1;
+
+ return 0;
+}
+
+static int hns_roce_alloc_cq_buf(struct hns_roce_device *dev,
+ struct hns_roce_buf *buf, int nent)
+{
+ if (hns_roce_alloc_buf(buf,
+ align(nent * HNS_ROCE_CQE_ENTRY_SIZE, dev->page_size),
+ dev->page_size))
+ return -1;
+ memset(buf->buf, 0, nent * HNS_ROCE_CQE_ENTRY_SIZE);
+
+ return 0;
+}
+
+struct ibv_cq *hns_roce_u_create_cq(struct ibv_context *context, int cqe,
+ struct ibv_comp_channel *channel,
+ int comp_vector)
+{
+ struct hns_roce_create_cq cmd;
+ struct hns_roce_create_cq_resp resp;
+ struct hns_roce_cq *cq;
+ int ret;
+
+ if (hns_roce_verify_cq(&cqe, to_hr_ctx(context)))
+ return NULL;
+
+ cq = malloc(sizeof(*cq));
+ if (!cq)
+ return NULL;
+
+ cq->cons_index = 0;
+
+ if (pthread_spin_init(&cq->lock, PTHREAD_PROCESS_PRIVATE))
+ goto err;
+
+ cqe = align_cq_size(cqe);
+
+ if (hns_roce_alloc_cq_buf(to_hr_dev(context->device), &cq->buf, cqe))
+ goto err;
+
+ cmd.buf_addr = (uintptr_t) cq->buf.buf;
+
+ ret = ibv_cmd_create_cq(context, cqe, channel, comp_vector,
+ &cq->ibv_cq, &cmd.ibv_cmd, sizeof(cmd),
+ &resp.ibv_resp, sizeof(resp));
+ if (ret)
+ goto err_db;
+
+ cq->cqn = resp.cqn;
+ cq->cq_depth = cqe;
+
+ if (to_hr_dev(context->device)->hw_version == HNS_ROCE_HW_VER1)
+ cq->set_ci_db = to_hr_ctx(context)->cq_tptr_base + cq->cqn * 2;
+ else
+ cq->set_ci_db = to_hr_ctx(context)->uar +
+ ROCEE_DB_OTHERS_L_0_REG;
+
+ cq->arm_db = cq->set_ci_db;
+ cq->arm_sn = 1;
+ *(cq->set_ci_db) = 0;
+ *(cq->arm_db) = 0;
+
+ return &cq->ibv_cq;
+
+err_db:
+ hns_roce_free_buf(&cq->buf);
+
+err:
+ free(cq);
+
+ return NULL;
+}
+
+void hns_roce_u_cq_event(struct ibv_cq *cq)
+{
+ to_hr_cq(cq)->arm_sn++;
+}
+
+int hns_roce_u_destroy_cq(struct ibv_cq *cq)
+{
+ int ret;
+
+ ret = ibv_cmd_destroy_cq(cq);
+ if (ret)
+ return ret;
+
+ hns_roce_free_buf(&to_hr_cq(cq)->buf);
+ free(to_hr_cq(cq));
+
+ return ret;
+}
--
1.9.1
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply related
* [PATCH v3 rdma-core 5/7] libhns: Add verbs of qp support
From: Lijun Ou @ 2016-11-10 12:46 UTC (permalink / raw)
To: dledford-H+wXaHxf7aLQT0dZR+AlfA,
linux-rdma-u79uwXL29TY76Z2rM5mHXA
Cc: linuxarm-hv44wF8Li93QT0dZR+AlfA
In-Reply-To: <1478781977-116803-1-git-send-email-oulijun-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
This patch mainly introduces the relatived qp verbs for userspace
library of hns, include:
1. create_qp
2. query_qp
3. modify_qp
4. destroy_qp
Signed-off-by: Lijun Ou <oulijun-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
Signed-off-by: Wei Hu <xavier.huwei-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
---
v3:
- No change over v2
v2:
- Delete the min() and use the ccan header
v1:
- The initial submit
---
providers/hns/hns_roce_u.c | 5 +
providers/hns/hns_roce_u.h | 45 +++++++
providers/hns/hns_roce_u_abi.h | 8 ++
providers/hns/hns_roce_u_hw_v1.c | 155 +++++++++++++++++++++++
providers/hns/hns_roce_u_verbs.c | 259 ++++++++++++++++++++++++++++++++++++++-
5 files changed, 471 insertions(+), 1 deletion(-)
diff --git a/providers/hns/hns_roce_u.c b/providers/hns/hns_roce_u.c
index e435bea..30f8678 100644
--- a/providers/hns/hns_roce_u.c
+++ b/providers/hns/hns_roce_u.c
@@ -127,6 +127,11 @@ static struct ibv_context *hns_roce_alloc_context(struct ibv_device *ibdev,
context->ibv_ctx.ops.cq_event = hns_roce_u_cq_event;
context->ibv_ctx.ops.destroy_cq = hns_roce_u_destroy_cq;
+ context->ibv_ctx.ops.create_qp = hns_roce_u_create_qp;
+ context->ibv_ctx.ops.query_qp = hns_roce_u_query_qp;
+ context->ibv_ctx.ops.modify_qp = hr_dev->u_hw->modify_qp;
+ context->ibv_ctx.ops.destroy_qp = hr_dev->u_hw->destroy_qp;
+
if (hns_roce_u_query_device(&context->ibv_ctx, &dev_attrs))
goto tptr_free;
diff --git a/providers/hns/hns_roce_u.h b/providers/hns/hns_roce_u.h
index c3e364d..02b9251 100644
--- a/providers/hns/hns_roce_u.h
+++ b/providers/hns/hns_roce_u.h
@@ -44,6 +44,7 @@
#define HNS_ROCE_MAX_CQ_NUM 0x10000
#define HNS_ROCE_MIN_CQE_NUM 0x40
+#define HNS_ROCE_MIN_WQE_NUM 0x20
#define HNS_ROCE_CQ_DB_BUF_SIZE ((HNS_ROCE_MAX_CQ_NUM >> 11) << 12)
#define HNS_ROCE_TPTR_OFFSET 0x1000
#define HNS_ROCE_HW_VER1 ('h' << 24 | 'i' << 16 | '0' << 8 | '6')
@@ -128,10 +129,29 @@ struct hns_roce_cq {
int arm_sn;
};
+struct hns_roce_srq {
+ struct ibv_srq ibv_srq;
+ struct hns_roce_buf buf;
+ pthread_spinlock_t lock;
+ unsigned long *wrid;
+ unsigned int srqn;
+ int max;
+ unsigned int max_gs;
+ int wqe_shift;
+ int head;
+ int tail;
+ unsigned int *db;
+ unsigned short counter;
+};
+
struct hns_roce_wq {
unsigned long *wrid;
+ pthread_spinlock_t lock;
unsigned int wqe_cnt;
+ int max_post;
+ unsigned int head;
unsigned int tail;
+ unsigned int max_gs;
int wqe_shift;
int offset;
};
@@ -139,14 +159,21 @@ struct hns_roce_wq {
struct hns_roce_qp {
struct ibv_qp ibv_qp;
struct hns_roce_buf buf;
+ int max_inline_data;
+ int buf_size;
unsigned int sq_signal_bits;
struct hns_roce_wq sq;
struct hns_roce_wq rq;
+ int port_num;
+ int sl;
};
struct hns_roce_u_hw {
int (*poll_cq)(struct ibv_cq *ibvcq, int ne, struct ibv_wc *wc);
int (*arm_cq)(struct ibv_cq *ibvcq, int solicited);
+ int (*modify_qp)(struct ibv_qp *qp, struct ibv_qp_attr *attr,
+ int attr_mask);
+ int (*destroy_qp)(struct ibv_qp *ibqp);
};
static inline unsigned long align(unsigned long val, unsigned long align)
@@ -174,6 +201,16 @@ static inline struct hns_roce_cq *to_hr_cq(struct ibv_cq *ibv_cq)
return container_of(ibv_cq, struct hns_roce_cq, ibv_cq);
}
+static inline struct hns_roce_srq *to_hr_srq(struct ibv_srq *ibv_srq)
+{
+ return container_of(ibv_srq, struct hns_roce_srq, ibv_srq);
+}
+
+static inline struct hns_roce_qp *to_hr_qp(struct ibv_qp *ibv_qp)
+{
+ return container_of(ibv_qp, struct hns_roce_qp, ibv_qp);
+}
+
int hns_roce_u_query_device(struct ibv_context *context,
struct ibv_device_attr *attr);
int hns_roce_u_query_port(struct ibv_context *context, uint8_t port,
@@ -193,10 +230,18 @@ struct ibv_cq *hns_roce_u_create_cq(struct ibv_context *context, int cqe,
int hns_roce_u_destroy_cq(struct ibv_cq *cq);
void hns_roce_u_cq_event(struct ibv_cq *cq);
+struct ibv_qp *hns_roce_u_create_qp(struct ibv_pd *pd,
+ struct ibv_qp_init_attr *attr);
+
+int hns_roce_u_query_qp(struct ibv_qp *ibqp, struct ibv_qp_attr *attr,
+ int attr_mask, struct ibv_qp_init_attr *init_attr);
+
int hns_roce_alloc_buf(struct hns_roce_buf *buf, unsigned int size,
int page_size);
void hns_roce_free_buf(struct hns_roce_buf *buf);
+void hns_roce_init_qp_indices(struct hns_roce_qp *qp);
+
extern struct hns_roce_u_hw hns_roce_u_hw_v1;
#endif /* _HNS_ROCE_U_H */
diff --git a/providers/hns/hns_roce_u_abi.h b/providers/hns/hns_roce_u_abi.h
index 1e62a7e..e78f967 100644
--- a/providers/hns/hns_roce_u_abi.h
+++ b/providers/hns/hns_roce_u_abi.h
@@ -58,4 +58,12 @@ struct hns_roce_create_cq_resp {
__u32 reserved;
};
+struct hns_roce_create_qp {
+ struct ibv_create_qp ibv_cmd;
+ __u64 buf_addr;
+ __u8 log_sq_bb_count;
+ __u8 log_sq_stride;
+ __u8 reserved[5];
+};
+
#endif /* _HNS_ROCE_U_ABI_H */
diff --git a/providers/hns/hns_roce_u_hw_v1.c b/providers/hns/hns_roce_u_hw_v1.c
index 2676021..fb81634 100644
--- a/providers/hns/hns_roce_u_hw_v1.c
+++ b/providers/hns/hns_roce_u_hw_v1.c
@@ -150,6 +150,16 @@ static struct hns_roce_qp *hns_roce_find_qp(struct hns_roce_context *ctx,
}
}
+static void hns_roce_clear_qp(struct hns_roce_context *ctx, uint32_t qpn)
+{
+ int tind = (qpn & (ctx->num_qps - 1)) >> ctx->qp_table_shift;
+
+ if (!--ctx->qp_table[tind].refcnt)
+ free(ctx->qp_table[tind].table);
+ else
+ ctx->qp_table[tind].table[qpn & ctx->qp_table_mask] = NULL;
+}
+
static int hns_roce_v1_poll_one(struct hns_roce_cq *cq,
struct hns_roce_qp **cur_qp, struct ibv_wc *wc)
{
@@ -364,7 +374,152 @@ static int hns_roce_u_v1_arm_cq(struct ibv_cq *ibvcq, int solicited)
return 0;
}
+static void __hns_roce_v1_cq_clean(struct hns_roce_cq *cq, uint32_t qpn,
+ struct hns_roce_srq *srq)
+{
+ int nfreed = 0;
+ uint32_t prod_index;
+ uint8_t owner_bit = 0;
+ struct hns_roce_cqe *cqe, *dest;
+ struct hns_roce_context *ctx = to_hr_ctx(cq->ibv_cq.context);
+
+ for (prod_index = cq->cons_index; get_sw_cqe(cq, prod_index);
+ ++prod_index)
+ if (prod_index == cq->cons_index + cq->ibv_cq.cqe)
+ break;
+
+ while ((int) --prod_index - (int) cq->cons_index >= 0) {
+ cqe = get_cqe(cq, prod_index & cq->ibv_cq.cqe);
+ if ((roce_get_field(cqe->cqe_byte_16, CQE_BYTE_16_LOCAL_QPN_M,
+ CQE_BYTE_16_LOCAL_QPN_S) & 0xffffff) == qpn) {
+ ++nfreed;
+ } else if (nfreed) {
+ dest = get_cqe(cq,
+ (prod_index + nfreed) & cq->ibv_cq.cqe);
+ owner_bit = roce_get_bit(dest->cqe_byte_4,
+ CQE_BYTE_4_OWNER_S);
+ memcpy(dest, cqe, sizeof(*cqe));
+ roce_set_bit(dest->cqe_byte_4, CQE_BYTE_4_OWNER_S,
+ owner_bit);
+ }
+ }
+
+ if (nfreed) {
+ cq->cons_index += nfreed;
+ wmb();
+ hns_roce_update_cq_cons_index(ctx, cq);
+ }
+}
+
+static void hns_roce_v1_cq_clean(struct hns_roce_cq *cq, unsigned int qpn,
+ struct hns_roce_srq *srq)
+{
+ pthread_spin_lock(&cq->lock);
+ __hns_roce_v1_cq_clean(cq, qpn, srq);
+ pthread_spin_unlock(&cq->lock);
+}
+
+static int hns_roce_u_v1_modify_qp(struct ibv_qp *qp, struct ibv_qp_attr *attr,
+ int attr_mask)
+{
+ int ret;
+ struct ibv_modify_qp cmd;
+ struct hns_roce_qp *hr_qp = to_hr_qp(qp);
+
+ ret = ibv_cmd_modify_qp(qp, attr, attr_mask, &cmd, sizeof(cmd));
+
+ if (!ret && (attr_mask & IBV_QP_STATE) &&
+ attr->qp_state == IBV_QPS_RESET) {
+ hns_roce_v1_cq_clean(to_hr_cq(qp->recv_cq), qp->qp_num,
+ qp->srq ? to_hr_srq(qp->srq) : NULL);
+ if (qp->send_cq != qp->recv_cq)
+ hns_roce_v1_cq_clean(to_hr_cq(qp->send_cq), qp->qp_num,
+ NULL);
+
+ hns_roce_init_qp_indices(to_hr_qp(qp));
+ }
+
+ if (!ret && (attr_mask & IBV_QP_PORT)) {
+ hr_qp->port_num = attr->port_num;
+ printf("hr_qp->port_num= 0x%x\n", hr_qp->port_num);
+ }
+
+ hr_qp->sl = attr->ah_attr.sl;
+
+ return ret;
+}
+
+static void hns_roce_lock_cqs(struct ibv_qp *qp)
+{
+ struct hns_roce_cq *send_cq = to_hr_cq(qp->send_cq);
+ struct hns_roce_cq *recv_cq = to_hr_cq(qp->recv_cq);
+
+ if (send_cq == recv_cq) {
+ pthread_spin_lock(&send_cq->lock);
+ } else if (send_cq->cqn < recv_cq->cqn) {
+ pthread_spin_lock(&send_cq->lock);
+ pthread_spin_lock(&recv_cq->lock);
+ } else {
+ pthread_spin_lock(&recv_cq->lock);
+ pthread_spin_lock(&send_cq->lock);
+ }
+}
+
+static void hns_roce_unlock_cqs(struct ibv_qp *qp)
+{
+ struct hns_roce_cq *send_cq = to_hr_cq(qp->send_cq);
+ struct hns_roce_cq *recv_cq = to_hr_cq(qp->recv_cq);
+
+ if (send_cq == recv_cq) {
+ pthread_spin_unlock(&send_cq->lock);
+ } else if (send_cq->cqn < recv_cq->cqn) {
+ pthread_spin_unlock(&recv_cq->lock);
+ pthread_spin_unlock(&send_cq->lock);
+ } else {
+ pthread_spin_unlock(&send_cq->lock);
+ pthread_spin_unlock(&recv_cq->lock);
+ }
+}
+
+static int hns_roce_u_v1_destroy_qp(struct ibv_qp *ibqp)
+{
+ int ret;
+ struct hns_roce_qp *qp = to_hr_qp(ibqp);
+
+ pthread_mutex_lock(&to_hr_ctx(ibqp->context)->qp_table_mutex);
+ ret = ibv_cmd_destroy_qp(ibqp);
+ if (ret) {
+ pthread_mutex_unlock(&to_hr_ctx(ibqp->context)->qp_table_mutex);
+ return ret;
+ }
+
+ hns_roce_lock_cqs(ibqp);
+
+ __hns_roce_v1_cq_clean(to_hr_cq(ibqp->recv_cq), ibqp->qp_num,
+ ibqp->srq ? to_hr_srq(ibqp->srq) : NULL);
+
+ if (ibqp->send_cq != ibqp->recv_cq)
+ __hns_roce_v1_cq_clean(to_hr_cq(ibqp->send_cq), ibqp->qp_num,
+ NULL);
+
+ hns_roce_clear_qp(to_hr_ctx(ibqp->context), ibqp->qp_num);
+
+ hns_roce_unlock_cqs(ibqp);
+ pthread_mutex_unlock(&to_hr_ctx(ibqp->context)->qp_table_mutex);
+
+ free(qp->sq.wrid);
+ if (qp->rq.wqe_cnt)
+ free(qp->rq.wrid);
+
+ hns_roce_free_buf(&qp->buf);
+ free(qp);
+
+ return ret;
+}
+
struct hns_roce_u_hw hns_roce_u_hw_v1 = {
.poll_cq = hns_roce_u_v1_poll_cq,
.arm_cq = hns_roce_u_v1_arm_cq,
+ .modify_qp = hns_roce_u_v1_modify_qp,
+ .destroy_qp = hns_roce_u_v1_destroy_qp,
};
diff --git a/providers/hns/hns_roce_u_verbs.c b/providers/hns/hns_roce_u_verbs.c
index 61f48f1..d54b345 100644
--- a/providers/hns/hns_roce_u_verbs.c
+++ b/providers/hns/hns_roce_u_verbs.c
@@ -38,11 +38,19 @@
#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>
-
+#include <ccan/minmax.h>
#include "hns_roce_u.h"
#include "hns_roce_u_abi.h"
#include "hns_roce_u_hw_v1.h"
+void hns_roce_init_qp_indices(struct hns_roce_qp *qp)
+{
+ qp->sq.head = 0;
+ qp->sq.tail = 0;
+ qp->rq.head = 0;
+ qp->rq.tail = 0;
+}
+
int hns_roce_u_query_device(struct ibv_context *context,
struct ibv_device_attr *attr)
{
@@ -163,6 +171,29 @@ static int align_cq_size(int req)
return nent;
}
+static int align_qp_size(int req)
+{
+ int nent;
+
+ for (nent = HNS_ROCE_MIN_WQE_NUM; nent < req; nent <<= 1)
+ ;
+
+ return nent;
+}
+
+static void hns_roce_set_sq_sizes(struct hns_roce_qp *qp,
+ struct ibv_qp_cap *cap, enum ibv_qp_type type)
+{
+ struct hns_roce_context *ctx = to_hr_ctx(qp->ibv_qp.context);
+
+ qp->sq.max_gs = 2;
+ cap->max_send_sge = min(ctx->max_sge, qp->sq.max_gs);
+ qp->sq.max_post = min(ctx->max_qp_wr, qp->sq.wqe_cnt);
+ cap->max_send_wr = qp->sq.max_post;
+ qp->max_inline_data = 32;
+ cap->max_inline_data = qp->max_inline_data;
+}
+
static int hns_roce_verify_cq(int *cqe, struct hns_roce_context *context)
{
if (*cqe < HNS_ROCE_MIN_CQE_NUM) {
@@ -189,6 +220,17 @@ static int hns_roce_alloc_cq_buf(struct hns_roce_device *dev,
return 0;
}
+static void hns_roce_calc_sq_wqe_size(struct ibv_qp_cap *cap,
+ enum ibv_qp_type type,
+ struct hns_roce_qp *qp)
+{
+ int size = sizeof(struct hns_roce_rc_send_wqe);
+
+ for (qp->sq.wqe_shift = 6; 1 << qp->sq.wqe_shift < size;
+ qp->sq.wqe_shift++)
+ ;
+}
+
struct ibv_cq *hns_roce_u_create_cq(struct ibv_context *context, int cqe,
struct ibv_comp_channel *channel,
int comp_vector)
@@ -266,3 +308,218 @@ int hns_roce_u_destroy_cq(struct ibv_cq *cq)
return ret;
}
+
+static int hns_roce_verify_qp(struct ibv_qp_init_attr *attr,
+ struct hns_roce_context *context)
+{
+ if (attr->cap.max_send_wr < HNS_ROCE_MIN_WQE_NUM) {
+ fprintf(stderr,
+ "max_send_wr = %d, less than minimum WQE number.\n",
+ attr->cap.max_send_wr);
+ attr->cap.max_send_wr = HNS_ROCE_MIN_WQE_NUM;
+ }
+
+ if (attr->cap.max_recv_wr < HNS_ROCE_MIN_WQE_NUM) {
+ fprintf(stderr,
+ "max_recv_wr = %d, less than minimum WQE number.\n",
+ attr->cap.max_recv_wr);
+ attr->cap.max_recv_wr = HNS_ROCE_MIN_WQE_NUM;
+ }
+
+ if (attr->cap.max_recv_sge < 1)
+ attr->cap.max_recv_sge = 1;
+ if (attr->cap.max_send_wr > context->max_qp_wr ||
+ attr->cap.max_recv_wr > context->max_qp_wr ||
+ attr->cap.max_send_sge > context->max_sge ||
+ attr->cap.max_recv_sge > context->max_sge)
+ return -1;
+
+ if ((attr->qp_type != IBV_QPT_RC) && (attr->qp_type != IBV_QPT_UD))
+ return -1;
+
+ if ((attr->qp_type == IBV_QPT_RC) &&
+ (attr->cap.max_inline_data > HNS_ROCE_RC_WQE_INLINE_DATA_MAX_LEN))
+ return -1;
+
+ if (attr->qp_type == IBV_QPT_UC)
+ return -1;
+
+ return 0;
+}
+
+static int hns_roce_alloc_qp_buf(struct ibv_pd *pd, struct ibv_qp_cap *cap,
+ enum ibv_qp_type type, struct hns_roce_qp *qp)
+{
+ qp->sq.wrid =
+ (unsigned long *)malloc(qp->sq.wqe_cnt * sizeof(uint64_t));
+ if (!qp->sq.wrid)
+ return -1;
+
+ if (qp->rq.wqe_cnt) {
+ qp->rq.wrid = malloc(qp->rq.wqe_cnt * sizeof(uint64_t));
+ if (!qp->rq.wrid) {
+ free(qp->sq.wrid);
+ return -1;
+ }
+ }
+
+ for (qp->rq.wqe_shift = 4;
+ 1 << qp->rq.wqe_shift < sizeof(struct hns_roce_rc_send_wqe);
+ qp->rq.wqe_shift++)
+ ;
+
+ qp->buf_size = align((qp->sq.wqe_cnt << qp->sq.wqe_shift), 0x1000) +
+ (qp->rq.wqe_cnt << qp->rq.wqe_shift);
+
+ if (qp->rq.wqe_shift > qp->sq.wqe_shift) {
+ qp->rq.offset = 0;
+ qp->sq.offset = qp->rq.wqe_cnt << qp->rq.wqe_shift;
+ } else {
+ qp->rq.offset = align((qp->sq.wqe_cnt << qp->sq.wqe_shift),
+ 0x1000);
+ qp->sq.offset = 0;
+ }
+
+ if (hns_roce_alloc_buf(&qp->buf, align(qp->buf_size, 0x1000),
+ to_hr_dev(pd->context->device)->page_size)) {
+ free(qp->sq.wrid);
+ free(qp->rq.wrid);
+ return -1;
+ }
+
+ memset(qp->buf.buf, 0, qp->buf_size);
+
+ return 0;
+}
+
+static int hns_roce_store_qp(struct hns_roce_context *ctx, uint32_t qpn,
+ struct hns_roce_qp *qp)
+{
+ int tind = (qpn & (ctx->num_qps - 1)) >> ctx->qp_table_shift;
+
+ if (!ctx->qp_table[tind].refcnt) {
+ ctx->qp_table[tind].table = calloc(ctx->qp_table_mask + 1,
+ sizeof(struct hns_roce_qp *));
+ if (!ctx->qp_table[tind].table)
+ return -1;
+ }
+
+ ++ctx->qp_table[tind].refcnt;
+ ctx->qp_table[tind].table[qpn & ctx->qp_table_mask] = qp;
+
+ return 0;
+}
+
+struct ibv_qp *hns_roce_u_create_qp(struct ibv_pd *pd,
+ struct ibv_qp_init_attr *attr)
+{
+ int ret;
+ struct hns_roce_qp *qp = NULL;
+ struct hns_roce_create_qp cmd;
+ struct ibv_create_qp_resp resp;
+ struct hns_roce_context *context = to_hr_ctx(pd->context);
+
+ if (hns_roce_verify_qp(attr, context)) {
+ fprintf(stderr, "hns_roce_verify_sizes failed!\n");
+ return NULL;
+ }
+
+ qp = malloc(sizeof(*qp));
+ if (!qp) {
+ fprintf(stderr, "malloc failed!\n");
+ return NULL;
+ }
+
+ hns_roce_calc_sq_wqe_size(&attr->cap, attr->qp_type, qp);
+ qp->sq.wqe_cnt = align_qp_size(attr->cap.max_send_wr);
+ qp->rq.wqe_cnt = align_qp_size(attr->cap.max_recv_wr);
+
+ if (hns_roce_alloc_qp_buf(pd, &attr->cap, attr->qp_type, qp)) {
+ fprintf(stderr, "hns_roce_alloc_qp_buf failed!\n");
+ goto err;
+ }
+
+ hns_roce_init_qp_indices(qp);
+
+ if (pthread_spin_init(&qp->sq.lock, PTHREAD_PROCESS_PRIVATE) ||
+ pthread_spin_init(&qp->rq.lock, PTHREAD_PROCESS_PRIVATE)) {
+ fprintf(stderr, "pthread_spin_init failed!\n");
+ goto err_free;
+ }
+
+ cmd.buf_addr = (uintptr_t) qp->buf.buf;
+ cmd.log_sq_stride = qp->sq.wqe_shift;
+ for (cmd.log_sq_bb_count = 0; qp->sq.wqe_cnt > 1 << cmd.log_sq_bb_count;
+ ++cmd.log_sq_bb_count)
+ ;
+
+ memset(cmd.reserved, 0, sizeof(cmd.reserved));
+
+ pthread_mutex_lock(&to_hr_ctx(pd->context)->qp_table_mutex);
+
+ ret = ibv_cmd_create_qp(pd, &qp->ibv_qp, attr, &cmd.ibv_cmd,
+ sizeof(cmd), &resp, sizeof(resp));
+ if (ret) {
+ fprintf(stderr, "ibv_cmd_create_qp failed!\n");
+ goto err_rq_db;
+ }
+
+ ret = hns_roce_store_qp(to_hr_ctx(pd->context), qp->ibv_qp.qp_num, qp);
+ if (ret) {
+ fprintf(stderr, "hns_roce_store_qp failed!\n");
+ goto err_destroy;
+ }
+ pthread_mutex_unlock(&to_hr_ctx(pd->context)->qp_table_mutex);
+
+ qp->rq.wqe_cnt = attr->cap.max_recv_wr;
+ qp->rq.max_gs = attr->cap.max_recv_sge;
+
+ /* adjust rq maxima to not exceed reported device maxima */
+ attr->cap.max_recv_wr = min(context->max_qp_wr, attr->cap.max_recv_wr);
+ attr->cap.max_recv_sge = min(context->max_sge, attr->cap.max_recv_sge);
+
+ qp->rq.max_post = attr->cap.max_recv_wr;
+ hns_roce_set_sq_sizes(qp, &attr->cap, attr->qp_type);
+
+ qp->sq_signal_bits = attr->sq_sig_all ? 0 : 1;
+
+ return &qp->ibv_qp;
+
+err_destroy:
+ ibv_cmd_destroy_qp(&qp->ibv_qp);
+
+err_rq_db:
+ pthread_mutex_unlock(&to_hr_ctx(pd->context)->qp_table_mutex);
+
+err_free:
+ free(qp->sq.wrid);
+ if (qp->rq.wqe_cnt)
+ free(qp->rq.wrid);
+ hns_roce_free_buf(&qp->buf);
+
+err:
+ free(qp);
+
+ return NULL;
+}
+
+int hns_roce_u_query_qp(struct ibv_qp *ibqp, struct ibv_qp_attr *attr,
+ int attr_mask, struct ibv_qp_init_attr *init_attr)
+{
+ int ret;
+ struct ibv_query_qp cmd;
+ struct hns_roce_qp *qp = to_hr_qp(ibqp);
+
+ ret = ibv_cmd_query_qp(ibqp, attr, attr_mask, init_attr, &cmd,
+ sizeof(cmd));
+ if (ret)
+ return ret;
+
+ init_attr->cap.max_send_wr = qp->sq.max_post;
+ init_attr->cap.max_send_sge = qp->sq.max_gs;
+ init_attr->cap.max_inline_data = qp->max_inline_data;
+
+ attr->cap = init_attr->cap;
+
+ return ret;
+}
--
1.9.1
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply related
* [PATCH v3 rdma-core 6/7] libhns: Add verbs of post_send and post_recv support
From: Lijun Ou @ 2016-11-10 12:46 UTC (permalink / raw)
To: dledford-H+wXaHxf7aLQT0dZR+AlfA,
linux-rdma-u79uwXL29TY76Z2rM5mHXA
Cc: linuxarm-hv44wF8Li93QT0dZR+AlfA
In-Reply-To: <1478781977-116803-1-git-send-email-oulijun-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
This patch mainly introduces the verbs of posting send
and psoting recv.
Signed-off-by: Lijun Ou <oulijun-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
Signed-off-by: Wei Hu <xavier.huwei-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
---
v3/v2:
- No change over the v1
v1:
- The initial submit
---
providers/hns/hns_roce_u.c | 2 +
providers/hns/hns_roce_u.h | 8 +
providers/hns/hns_roce_u_hw_v1.c | 314 +++++++++++++++++++++++++++++++++++++++
providers/hns/hns_roce_u_hw_v1.h | 79 ++++++++++
4 files changed, 403 insertions(+)
diff --git a/providers/hns/hns_roce_u.c b/providers/hns/hns_roce_u.c
index 30f8678..bceed84 100644
--- a/providers/hns/hns_roce_u.c
+++ b/providers/hns/hns_roce_u.c
@@ -131,6 +131,8 @@ static struct ibv_context *hns_roce_alloc_context(struct ibv_device *ibdev,
context->ibv_ctx.ops.query_qp = hns_roce_u_query_qp;
context->ibv_ctx.ops.modify_qp = hr_dev->u_hw->modify_qp;
context->ibv_ctx.ops.destroy_qp = hr_dev->u_hw->destroy_qp;
+ context->ibv_ctx.ops.post_send = hr_dev->u_hw->post_send;
+ context->ibv_ctx.ops.post_recv = hr_dev->u_hw->post_recv;
if (hns_roce_u_query_device(&context->ibv_ctx, &dev_attrs))
goto tptr_free;
diff --git a/providers/hns/hns_roce_u.h b/providers/hns/hns_roce_u.h
index 02b9251..4a6ed8e 100644
--- a/providers/hns/hns_roce_u.h
+++ b/providers/hns/hns_roce_u.h
@@ -51,6 +51,10 @@
#define PFX "hns: "
+#ifndef likely
+#define likely(x) __builtin_expect(!!(x), 1)
+#endif
+
#define roce_get_field(origin, mask, shift) \
(((origin) & (mask)) >> (shift))
@@ -171,6 +175,10 @@ struct hns_roce_qp {
struct hns_roce_u_hw {
int (*poll_cq)(struct ibv_cq *ibvcq, int ne, struct ibv_wc *wc);
int (*arm_cq)(struct ibv_cq *ibvcq, int solicited);
+ int (*post_send)(struct ibv_qp *ibvqp, struct ibv_send_wr *wr,
+ struct ibv_send_wr **bad_wr);
+ int (*post_recv)(struct ibv_qp *ibvqp, struct ibv_recv_wr *wr,
+ struct ibv_recv_wr **bad_wr);
int (*modify_qp)(struct ibv_qp *qp, struct ibv_qp_attr *attr,
int attr_mask);
int (*destroy_qp)(struct ibv_qp *ibqp);
diff --git a/providers/hns/hns_roce_u_hw_v1.c b/providers/hns/hns_roce_u_hw_v1.c
index fb81634..a3aad1c 100644
--- a/providers/hns/hns_roce_u_hw_v1.c
+++ b/providers/hns/hns_roce_u_hw_v1.c
@@ -37,6 +37,59 @@
#include "hns_roce_u_hw_v1.h"
#include "hns_roce_u.h"
+static inline void set_raddr_seg(struct hns_roce_wqe_raddr_seg *rseg,
+ uint64_t remote_addr, uint32_t rkey)
+{
+ rseg->raddr = remote_addr;
+ rseg->rkey = rkey;
+ rseg->len = 0;
+}
+
+static void set_data_seg(struct hns_roce_wqe_data_seg *dseg, struct ibv_sge *sg)
+{
+
+ dseg->lkey = sg->lkey;
+ dseg->addr = sg->addr;
+ dseg->len = sg->length;
+}
+
+static void hns_roce_update_rq_head(struct hns_roce_context *ctx,
+ unsigned int qpn, unsigned int rq_head)
+{
+ struct hns_roce_rq_db rq_db;
+
+ rq_db.u32_4 = 0;
+ rq_db.u32_8 = 0;
+
+ roce_set_field(rq_db.u32_4, RQ_DB_U32_4_RQ_HEAD_M,
+ RQ_DB_U32_4_RQ_HEAD_S, rq_head);
+ roce_set_field(rq_db.u32_8, RQ_DB_U32_8_QPN_M, RQ_DB_U32_8_QPN_S, qpn);
+ roce_set_field(rq_db.u32_8, RQ_DB_U32_8_CMD_M, RQ_DB_U32_8_CMD_S, 1);
+ roce_set_bit(rq_db.u32_8, RQ_DB_U32_8_HW_SYNC_S, 1);
+
+ hns_roce_write64((uint32_t *)&rq_db, ctx, ROCEE_DB_OTHERS_L_0_REG);
+}
+
+static void hns_roce_update_sq_head(struct hns_roce_context *ctx,
+ unsigned int qpn, unsigned int port,
+ unsigned int sl, unsigned int sq_head)
+{
+ struct hns_roce_sq_db sq_db;
+
+ sq_db.u32_4 = 0;
+ sq_db.u32_8 = 0;
+
+ roce_set_field(sq_db.u32_4, SQ_DB_U32_4_SQ_HEAD_M,
+ SQ_DB_U32_4_SQ_HEAD_S, sq_head);
+ roce_set_field(sq_db.u32_4, SQ_DB_U32_4_PORT_M, SQ_DB_U32_4_PORT_S,
+ port);
+ roce_set_field(sq_db.u32_4, SQ_DB_U32_4_SL_M, SQ_DB_U32_4_SL_S, sl);
+ roce_set_field(sq_db.u32_8, SQ_DB_U32_8_QPN_M, SQ_DB_U32_8_QPN_S, qpn);
+ roce_set_bit(sq_db.u32_8, SQ_DB_U32_8_HW_SYNC, 1);
+
+ hns_roce_write64((uint32_t *)&sq_db, ctx, ROCEE_DB_SQ_L_0_REG);
+}
+
static void hns_roce_update_cq_cons_index(struct hns_roce_context *ctx,
struct hns_roce_cq *cq)
{
@@ -126,6 +179,16 @@ static struct hns_roce_cqe *next_cqe_sw(struct hns_roce_cq *cq)
return get_sw_cqe(cq, cq->cons_index);
}
+static void *get_recv_wqe(struct hns_roce_qp *qp, int n)
+{
+ if ((n < 0) || (n > qp->rq.wqe_cnt)) {
+ printf("rq wqe index:%d,rq wqe cnt:%d\r\n", n, qp->rq.wqe_cnt);
+ return NULL;
+ }
+
+ return qp->buf.buf + qp->rq.offset + (n << qp->rq.wqe_shift);
+}
+
static void *get_send_wqe(struct hns_roce_qp *qp, int n)
{
if ((n < 0) || (n > qp->sq.wqe_cnt)) {
@@ -137,6 +200,26 @@ static void *get_send_wqe(struct hns_roce_qp *qp, int n)
(n << qp->sq.wqe_shift));
}
+static int hns_roce_wq_overflow(struct hns_roce_wq *wq, int nreq,
+ struct hns_roce_cq *cq)
+{
+ unsigned int cur;
+
+ cur = wq->head - wq->tail;
+ if (cur + nreq < wq->max_post)
+ return 0;
+
+ /* While the num of wqe exceeds cap of the device, cq will be locked */
+ pthread_spin_lock(&cq->lock);
+ cur = wq->head - wq->tail;
+ pthread_spin_unlock(&cq->lock);
+
+ printf("wq:(head = %d, tail = %d, max_post = %d), nreq = 0x%x\n",
+ wq->head, wq->tail, wq->max_post, nreq);
+
+ return cur + nreq >= wq->max_post;
+}
+
static struct hns_roce_qp *hns_roce_find_qp(struct hns_roce_context *ctx,
uint32_t qpn)
{
@@ -374,6 +457,144 @@ static int hns_roce_u_v1_arm_cq(struct ibv_cq *ibvcq, int solicited)
return 0;
}
+static int hns_roce_u_v1_post_send(struct ibv_qp *ibvqp, struct ibv_send_wr *wr,
+ struct ibv_send_wr **bad_wr)
+{
+ unsigned int ind;
+ void *wqe;
+ int nreq;
+ int ps_opcode, i;
+ int ret = 0;
+ struct hns_roce_wqe_ctrl_seg *ctrl = NULL;
+ struct hns_roce_wqe_data_seg *dseg = NULL;
+ struct hns_roce_qp *qp = to_hr_qp(ibvqp);
+ struct hns_roce_context *ctx = to_hr_ctx(ibvqp->context);
+
+ pthread_spin_lock(&qp->sq.lock);
+
+ /* check that state is OK to post send */
+ ind = qp->sq.head;
+
+ for (nreq = 0; wr; ++nreq, wr = wr->next) {
+ if (hns_roce_wq_overflow(&qp->sq, nreq,
+ to_hr_cq(qp->ibv_qp.send_cq))) {
+ ret = -1;
+ *bad_wr = wr;
+ goto out;
+ }
+ if (wr->num_sge > qp->sq.max_gs) {
+ ret = -1;
+ *bad_wr = wr;
+ printf("wr->num_sge(<=%d) = %d, check failed!\r\n",
+ qp->sq.max_gs, wr->num_sge);
+ goto out;
+ }
+
+ ctrl = wqe = get_send_wqe(qp, ind & (qp->sq.wqe_cnt - 1));
+ memset(ctrl, 0, sizeof(struct hns_roce_wqe_ctrl_seg));
+
+ qp->sq.wrid[ind & (qp->sq.wqe_cnt - 1)] = wr->wr_id;
+ for (i = 0; i < wr->num_sge; i++)
+ ctrl->msg_length += wr->sg_list[i].length;
+
+
+ ctrl->flag |= ((wr->send_flags & IBV_SEND_SIGNALED) ?
+ HNS_ROCE_WQE_CQ_NOTIFY : 0) |
+ (wr->send_flags & IBV_SEND_SOLICITED ?
+ HNS_ROCE_WQE_SE : 0) |
+ ((wr->opcode == IBV_WR_SEND_WITH_IMM ||
+ wr->opcode == IBV_WR_RDMA_WRITE_WITH_IMM) ?
+ HNS_ROCE_WQE_IMM : 0) |
+ (wr->send_flags & IBV_SEND_FENCE ?
+ HNS_ROCE_WQE_FENCE : 0);
+
+ if (wr->opcode == IBV_WR_SEND_WITH_IMM ||
+ wr->opcode == IBV_WR_RDMA_WRITE_WITH_IMM)
+ ctrl->imm_data = wr->imm_data;
+
+ wqe += sizeof(struct hns_roce_wqe_ctrl_seg);
+
+ /* set remote addr segment */
+ switch (ibvqp->qp_type) {
+ case IBV_QPT_RC:
+ switch (wr->opcode) {
+ case IBV_WR_RDMA_READ:
+ ps_opcode = HNS_ROCE_WQE_OPCODE_RDMA_READ;
+ set_raddr_seg(wqe, wr->wr.rdma.remote_addr,
+ wr->wr.rdma.rkey);
+ break;
+ case IBV_WR_RDMA_WRITE:
+ case IBV_WR_RDMA_WRITE_WITH_IMM:
+ ps_opcode = HNS_ROCE_WQE_OPCODE_RDMA_WRITE;
+ set_raddr_seg(wqe, wr->wr.rdma.remote_addr,
+ wr->wr.rdma.rkey);
+ break;
+ case IBV_WR_SEND:
+ case IBV_WR_SEND_WITH_IMM:
+ ps_opcode = HNS_ROCE_WQE_OPCODE_SEND;
+ break;
+ case IBV_WR_ATOMIC_CMP_AND_SWP:
+ case IBV_WR_ATOMIC_FETCH_AND_ADD:
+ default:
+ ps_opcode = HNS_ROCE_WQE_OPCODE_MASK;
+ break;
+ }
+ ctrl->flag |= (ps_opcode);
+ wqe += sizeof(struct hns_roce_wqe_raddr_seg);
+ break;
+ case IBV_QPT_UC:
+ case IBV_QPT_UD:
+ default:
+ break;
+ }
+
+ dseg = wqe;
+
+ /* Inline */
+ if (wr->send_flags & IBV_SEND_INLINE && wr->num_sge) {
+ if (ctrl->msg_length > qp->max_inline_data) {
+ ret = -1;
+ *bad_wr = wr;
+ printf("inline data len(1-32)=%d, send_flags = 0x%x, check failed!\r\n",
+ wr->send_flags, ctrl->msg_length);
+ return ret;
+ }
+
+ for (i = 0; i < wr->num_sge; i++) {
+ memcpy(wqe,
+ ((void *) (uintptr_t) wr->sg_list[i].addr),
+ wr->sg_list[i].length);
+ wqe = wqe + wr->sg_list[i].length;
+ }
+
+ ctrl->flag |= HNS_ROCE_WQE_INLINE;
+ } else {
+ /* set sge */
+ for (i = 0; i < wr->num_sge; i++)
+ set_data_seg(dseg+i, wr->sg_list + i);
+
+ ctrl->flag |= wr->num_sge << HNS_ROCE_WQE_SGE_NUM_BIT;
+ }
+
+ ind++;
+ }
+
+out:
+ /* Set DB return */
+ if (likely(nreq)) {
+ qp->sq.head += nreq;
+ wmb();
+
+ hns_roce_update_sq_head(ctx, qp->ibv_qp.qp_num,
+ qp->port_num - 1, qp->sl,
+ qp->sq.head & ((qp->sq.wqe_cnt << 1) - 1));
+ }
+
+ pthread_spin_unlock(&qp->sq.lock);
+
+ return ret;
+}
+
static void __hns_roce_v1_cq_clean(struct hns_roce_cq *cq, uint32_t qpn,
struct hns_roce_srq *srq)
{
@@ -517,9 +738,102 @@ static int hns_roce_u_v1_destroy_qp(struct ibv_qp *ibqp)
return ret;
}
+static int hns_roce_u_v1_post_recv(struct ibv_qp *ibvqp, struct ibv_recv_wr *wr,
+ struct ibv_recv_wr **bad_wr)
+{
+ int ret = 0;
+ int nreq;
+ int ind;
+ struct ibv_sge *sg;
+ struct hns_roce_rc_rq_wqe *rq_wqe;
+ struct hns_roce_qp *qp = to_hr_qp(ibvqp);
+ struct hns_roce_context *ctx = to_hr_ctx(ibvqp->context);
+
+ pthread_spin_lock(&qp->rq.lock);
+
+ /* check that state is OK to post receive */
+ ind = qp->rq.head & (qp->rq.wqe_cnt - 1);
+
+ for (nreq = 0; wr; ++nreq, wr = wr->next) {
+ if (hns_roce_wq_overflow(&qp->rq, nreq,
+ to_hr_cq(qp->ibv_qp.recv_cq))) {
+ ret = -1;
+ *bad_wr = wr;
+ goto out;
+ }
+
+ if (wr->num_sge > qp->rq.max_gs) {
+ ret = -1;
+ *bad_wr = wr;
+ goto out;
+ }
+
+ rq_wqe = get_recv_wqe(qp, ind);
+ if (wr->num_sge > HNS_ROCE_RC_RQ_WQE_MAX_SGE_NUM) {
+ ret = -1;
+ *bad_wr = wr;
+ goto out;
+ }
+
+ if (wr->num_sge == HNS_ROCE_RC_RQ_WQE_MAX_SGE_NUM) {
+ roce_set_field(rq_wqe->u32_2,
+ RC_RQ_WQE_NUMBER_OF_DATA_SEG_M,
+ RC_RQ_WQE_NUMBER_OF_DATA_SEG_S,
+ HNS_ROCE_RC_RQ_WQE_MAX_SGE_NUM);
+ sg = wr->sg_list;
+
+ rq_wqe->va0 = (sg->addr);
+ rq_wqe->l_key0 = (sg->lkey);
+ rq_wqe->length0 = (sg->length);
+
+ sg = wr->sg_list + 1;
+
+ rq_wqe->va1 = (sg->addr);
+ rq_wqe->l_key1 = (sg->lkey);
+ rq_wqe->length1 = (sg->length);
+ } else if (wr->num_sge == HNS_ROCE_RC_RQ_WQE_MAX_SGE_NUM - 1) {
+ roce_set_field(rq_wqe->u32_2,
+ RC_RQ_WQE_NUMBER_OF_DATA_SEG_M,
+ RC_RQ_WQE_NUMBER_OF_DATA_SEG_S,
+ HNS_ROCE_RC_RQ_WQE_MAX_SGE_NUM - 1);
+ sg = wr->sg_list;
+
+ rq_wqe->va0 = (sg->addr);
+ rq_wqe->l_key0 = (sg->lkey);
+ rq_wqe->length0 = (sg->length);
+
+ } else if (wr->num_sge == HNS_ROCE_RC_RQ_WQE_MAX_SGE_NUM - 2) {
+ roce_set_field(rq_wqe->u32_2,
+ RC_RQ_WQE_NUMBER_OF_DATA_SEG_M,
+ RC_RQ_WQE_NUMBER_OF_DATA_SEG_S,
+ HNS_ROCE_RC_RQ_WQE_MAX_SGE_NUM - 2);
+ }
+
+ qp->rq.wrid[ind] = wr->wr_id;
+
+ ind = (ind + 1) & (qp->rq.wqe_cnt - 1);
+ }
+
+out:
+ if (nreq) {
+ qp->rq.head += nreq;
+
+ wmb();
+
+ hns_roce_update_rq_head(ctx, qp->ibv_qp.qp_num,
+ qp->rq.head & ((qp->rq.wqe_cnt << 1) - 1));
+ }
+
+ pthread_spin_unlock(&qp->rq.lock);
+
+ return ret;
+}
+
struct hns_roce_u_hw hns_roce_u_hw_v1 = {
.poll_cq = hns_roce_u_v1_poll_cq,
.arm_cq = hns_roce_u_v1_arm_cq,
+ .post_send = hns_roce_u_v1_post_send,
+ .post_recv = hns_roce_u_v1_post_recv,
.modify_qp = hns_roce_u_v1_modify_qp,
.destroy_qp = hns_roce_u_v1_destroy_qp,
};
diff --git a/providers/hns/hns_roce_u_hw_v1.h b/providers/hns/hns_roce_u_hw_v1.h
index b249f54..128c66f 100644
--- a/providers/hns/hns_roce_u_hw_v1.h
+++ b/providers/hns/hns_roce_u_hw_v1.h
@@ -39,9 +39,15 @@
#define HNS_ROCE_CQE_IS_SQ 0
#define HNS_ROCE_RC_WQE_INLINE_DATA_MAX_LEN 32
+#define HNS_ROCE_RC_RQ_WQE_MAX_SGE_NUM 2
enum {
+ HNS_ROCE_WQE_INLINE = 1 << 31,
+ HNS_ROCE_WQE_SE = 1 << 30,
+ HNS_ROCE_WQE_SGE_NUM_BIT = 24,
HNS_ROCE_WQE_IMM = 1 << 23,
+ HNS_ROCE_WQE_FENCE = 1 << 21,
+ HNS_ROCE_WQE_CQ_NOTIFY = 1 << 20,
HNS_ROCE_WQE_OPCODE_SEND = 0 << 16,
HNS_ROCE_WQE_OPCODE_RDMA_READ = 1 << 16,
HNS_ROCE_WQE_OPCODE_RDMA_WRITE = 2 << 16,
@@ -52,6 +58,20 @@ enum {
struct hns_roce_wqe_ctrl_seg {
__be32 sgl_pa_h;
__be32 flag;
+ __be32 imm_data;
+ __be32 msg_length;
+};
+
+struct hns_roce_wqe_data_seg {
+ __be64 addr;
+ __be32 lkey;
+ __be32 len;
+};
+
+struct hns_roce_wqe_raddr_seg {
+ __be32 rkey;
+ __be32 len;
+ __be64 raddr;
};
enum {
@@ -102,6 +122,43 @@ struct hns_roce_cq_db {
#define CQ_DB_U32_8_HW_SYNC_S 31
+struct hns_roce_rq_db {
+ unsigned int u32_4;
+ unsigned int u32_8;
+};
+
+#define RQ_DB_U32_4_RQ_HEAD_S 0
+#define RQ_DB_U32_4_RQ_HEAD_M (((1UL << 15) - 1) << RQ_DB_U32_4_RQ_HEAD_S)
+
+#define RQ_DB_U32_8_QPN_S 0
+#define RQ_DB_U32_8_QPN_M (((1UL << 24) - 1) << RQ_DB_U32_8_QPN_S)
+
+#define RQ_DB_U32_8_CMD_S 28
+#define RQ_DB_U32_8_CMD_M (((1UL << 3) - 1) << RQ_DB_U32_8_CMD_S)
+
+#define RQ_DB_U32_8_HW_SYNC_S 31
+
+struct hns_roce_sq_db {
+ unsigned int u32_4;
+ unsigned int u32_8;
+};
+
+#define SQ_DB_U32_4_SQ_HEAD_S 0
+#define SQ_DB_U32_4_SQ_HEAD_M (((1UL << 15) - 1) << SQ_DB_U32_4_SQ_HEAD_S)
+
+#define SQ_DB_U32_4_SL_S 16
+#define SQ_DB_U32_4_SL_M (((1UL << 2) - 1) << SQ_DB_U32_4_SL_S)
+
+#define SQ_DB_U32_4_PORT_S 18
+#define SQ_DB_U32_4_PORT_M (((1UL << 3) - 1) << SQ_DB_U32_4_PORT_S)
+
+#define SQ_DB_U32_4_DIRECT_WQE_S 31
+
+#define SQ_DB_U32_8_QPN_S 0
+#define SQ_DB_U32_8_QPN_M (((1UL << 24) - 1) << SQ_DB_U32_8_QPN_S)
+
+#define SQ_DB_U32_8_HW_SYNC 31
+
struct hns_roce_cqe {
unsigned int cqe_byte_4;
union {
@@ -160,4 +217,26 @@ struct hns_roce_rc_send_wqe {
unsigned int length1;
};
+struct hns_roce_rc_rq_wqe {
+ unsigned int u32_0;
+ unsigned int sgl_ba_31_0;
+ unsigned int u32_2;
+ unsigned int rvd_5;
+ unsigned int rvd_6;
+ unsigned int rvd_7;
+ unsigned int rvd_8;
+ unsigned int rvd_9;
+
+ uint64_t va0;
+ unsigned int l_key0;
+ unsigned int length0;
+
+ uint64_t va1;
+ unsigned int l_key1;
+ unsigned int length1;
+};
+#define RC_RQ_WQE_NUMBER_OF_DATA_SEG_S 16
+#define RC_RQ_WQE_NUMBER_OF_DATA_SEG_M \
+ (((1UL << 6) - 1) << RC_RQ_WQE_NUMBER_OF_DATA_SEG_S)
+
#endif /* _HNS_ROCE_U_HW_V1_H */
--
1.9.1
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox