* [PATCH v6 0/9] SELinux support for Infiniband RDMA
From: Dan Jurgens @ 2016-11-23 14:17 UTC (permalink / raw)
To: chrisw-69jw2NvuJkxg9hUCZPvPmw, paul-r2n+y4ga6xFZroRs9YW3xA,
sds-+05T5uksL2qpZYMLLGbcSA, eparis-FjpueFixGhCM4zKIHC2jIg,
dledford-H+wXaHxf7aLQT0dZR+AlfA,
sean.hefty-ral2JQCrhuEAvxtiuMwx3w,
hal.rosenstock-Re5JQEeQqe8AvxtiuMwx3w
Cc: selinux-+05T5uksL2qpZYMLLGbcSA,
linux-security-module-u79uwXL29TY76Z2rM5mHXA,
linux-rdma-u79uwXL29TY76Z2rM5mHXA,
yevgenyp-VPRAkNaXOzVWk0Htik3J/w, Daniel Jurgens
From: Daniel Jurgens <danielj-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Infiniband applications access HW from user-space -- traffic is generated
directly by HW, bypassing the kernel. Consequently, Infiniband Partitions,
which are associated directly with HW transport endpoints, are a natural
choice for enforcing granular mandatory access control for Infiniband. QPs may
only send or receive packets tagged with the corresponding partition key
(PKey). The PKey is not a cryptographic key; it's a 16-bit number identifying
the partition.
Every Infiniband fabric is controlled by a central Subnet Manager (SM). The SM
provisions the partitions by assigning each port with the partitions it can
access. In addition, the SM tags each port with a subnet prefix, which
identifies the subnet. Determining which users are allowed to access which
partition keys on a given subnet forms an effective policy for isolating users
on the fabric. Any application that attempts to send traffic on a given subnet
is automatically subject to the policy, regardless of which device and port it
uses. SM software configures the subnet through a privileged Subnet Management
Interface (SMI), which is presented by each Infiniband port. Thus, the SMI must
also be controlled to prevent unauthorized changes to fabric configuration and
partitioning.
To support access control for IB partitions and subnet management, security
contexts must be provided for two new types of objects - PKeys and IB ports.
A PKey label consists of a subnet prefix and a range of PKey values and is
similar to the labeling mechanism for netports. Each Infiniband port can reside
on a different subnet. So labeling the PKey values for specific subnet prefixes
provides the user maximum flexibility, as PKey values may be determined
independently for different subnets. There is a single access vector for PKeys
called "access".
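The PKey labeling described above can be pictured as a table lookup keyed by
subnet prefix and PKey range. What follows is only a minimal user-space
sketch, not the code from this series: struct pkey_label, pkey_label_lookup()
and the numeric SID values are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical label entry: one subnet prefix plus an inclusive PKey range. */
struct pkey_label {
	uint64_t subnet_prefix;
	uint16_t low_pkey;
	uint16_t high_pkey;
	uint32_t sid;		/* security identifier assigned to the range */
};

/*
 * Return the SID of the first entry matching (subnet_prefix, pkey),
 * or default_sid if no entry matches.
 */
uint32_t pkey_label_lookup(const struct pkey_label *tbl, size_t n,
			   uint64_t subnet_prefix, uint16_t pkey,
			   uint32_t default_sid)
{
	size_t i;

	assert(tbl != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (tbl[i].subnet_prefix == subnet_prefix &&
		    pkey >= tbl[i].low_pkey && pkey <= tbl[i].high_pkey)
			return tbl[i].sid;
	}
	return default_sid;
}
```

Because the subnet prefix is part of the key, the same PKey value can map to
different labels on different subnets, which is the flexibility the paragraph
above refers to.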
An Infiniband port is labeled by device name and port number. There is a single
access vector for IB ports called "manage_subnet".
Because RDMA allows kernel bypass, enforcement must be done during connection
setup. Communication over RDMA requires a send and receive queue, collectively
known as a Queue Pair (QP). A QP must be initialized by privileged system calls
before it can be used to send or receive data. During initialization the user
must provide the PKey and port the QP will use; at this time access control can
be enforced.
Because there is a possibility that the enforcement settings or security
policy can change, a means of notifying the ib_core module of such changes
is required. To facilitate this a generic notification callback mechanism
is added to the LSM. One callback is registered for checking the QP PKey
associations when the policy changes. MAD agents also register a callback;
they cache the permission to send and receive SMPs to avoid another
per-packet call to the LSM.
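The caching idea can be sketched in plain user-space C as follows. This is an
illustration under stated assumptions, not the LSM interface added by this
series: register_policy_notifier(), set_policy(), struct mad_agent and the
single policy_allows_smp flag are all hypothetical stand-ins.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_NOTIFIERS 8

/* Hypothetical callback type: invoked whenever the policy changes. */
typedef void (*policy_change_cb)(void *data);

struct notifier {
	policy_change_cb cb;
	void *data;
};

static struct notifier notifiers[MAX_NOTIFIERS];
static size_t notifier_count;

/* Stand-in for the real policy: true if sending SMPs is allowed. */
static bool policy_allows_smp;

/* Register a callback; returns 0 on success, -1 if the table is full. */
int register_policy_notifier(policy_change_cb cb, void *data)
{
	assert(cb != NULL);
	if (notifier_count == MAX_NOTIFIERS)
		return -1;
	notifiers[notifier_count].cb = cb;
	notifiers[notifier_count].data = data;
	notifier_count++;
	return 0;
}

/* Change the policy and notify every registered listener. */
void set_policy(bool allow_smp)
{
	size_t i;

	policy_allows_smp = allow_smp;
	for (i = 0; i < notifier_count; i++)
		notifiers[i].cb(notifiers[i].data);
}

/* Hypothetical MAD agent that caches its SMP permission. */
struct mad_agent {
	bool smp_allowed;	/* cached result, consulted per packet */
};

/* Callback: refresh the cached permission from the current policy. */
void mad_agent_policy_changed(void *data)
{
	struct mad_agent *agent = data;

	agent->smp_allowed = policy_allows_smp;
}
```

The per-packet path then only reads agent->smp_allowed; the more expensive
policy query runs once per policy change rather than once per packet.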
Because frequent accesses to the same PKey's SID are expected, a cache is
implemented that is very similar to the netport cache.
In order to properly enforce security when changes to the PKey table or
security policy or enforcement occur, ib_core must track which QPs are
using which port, pkey index, and alternate path for every IB device.
This makes operations that used to be atomic transactional.
When modifying a QP, ib_core must associate it with the PKey index, port,
and alternate path specified. If the QP was already associated with
different settings, the QP is added to the new list prior to the
modification. If the modify succeeds then the old listing is removed. If
the modify fails the new listing is removed and the old listing remains
unchanged.
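The add-new, modify, then remove-old sequence can be sketched as below. This
is a simplified user-space model, not the ib_core implementation: the lists
are reduced to per-(port, pkey index) counters, alternate paths are left out,
and every name (listed, qp_track(), qp_modify_tracked(), hw_modify_ok(),
hw_modify_fail()) is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_PORTS	2
#define NUM_PKEY_IDX	4

/* Hypothetical tracking table: number of QPs listed per (port, index). */
int listed[NUM_PORTS][NUM_PKEY_IDX];

struct qp_setting {
	unsigned int port;
	unsigned int pkey_index;
};

struct qp {
	struct qp_setting cur;	/* setting the QP is currently listed under */
};

/* Stand-in for the device driver's modify operation. */
typedef bool (*hw_modify_fn)(struct qp *qp, struct qp_setting next);

bool hw_modify_ok(struct qp *qp, struct qp_setting next)
{
	(void)qp;
	(void)next;
	return true;
}

bool hw_modify_fail(struct qp *qp, struct qp_setting next)
{
	(void)qp;
	(void)next;
	return false;
}

/* List a QP under its initial setting. */
void qp_track(struct qp *qp, struct qp_setting s)
{
	assert(s.port < NUM_PORTS && s.pkey_index < NUM_PKEY_IDX);
	listed[s.port][s.pkey_index]++;
	qp->cur = s;
}

/* Returns 0 if the modify succeeded, -1 if it failed and was rolled back. */
int qp_modify_tracked(struct qp *qp, struct qp_setting next,
		      hw_modify_fn hw_modify)
{
	assert(next.port < NUM_PORTS && next.pkey_index < NUM_PKEY_IDX);

	/* List the QP under the new setting before touching the hardware. */
	listed[next.port][next.pkey_index]++;

	if (!hw_modify(qp, next)) {
		/* Modify failed: drop the new listing, keep the old one. */
		listed[next.port][next.pkey_index]--;
		return -1;
	}

	/* Modify succeeded: drop the old listing and commit the new one. */
	listed[qp->cur.port][qp->cur.pkey_index]--;
	qp->cur = next;
	return 0;
}
```

Listing the QP under the new setting first means a concurrent policy or PKey
table change always finds the QP under at least one of its possible settings.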
When destroying a QP the ib_qp structure is freed by the device-specific
driver (e.g. mlx4_ib) if the 'destroy' is successful. This requires storing
security related information in a separate structure. When a 'destroy'
request is in process the ib_qp structure is in an undefined state so if
there are changes to the security policy or PKey table, the security checks
cannot reset the QP if it doesn't have permission for the new setting. If
the 'destroy' fails, security for that QP must be enforced again and its
status in the list is restored. If the 'destroy' succeeds the security info
can be cleaned up and freed.
There are a number of locks required to protect the QP security structure
and the QP to device/port/pkey index lists. If multiple locks are required,
the safe locking order is: QP security structure mutex first, followed by
any list locks needed, which are sorted first by port followed by pkey
index.
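One way to obtain the list-lock order described above is to sort the lock
keys before acquiring anything. The sketch below only computes that order; it
takes no real locks, and struct list_lock_key, list_lock_cmp() and
sort_list_locks() are hypothetical names, not symbols from this series.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical key identifying one per-(port, pkey index) list lock. */
struct list_lock_key {
	unsigned int port;
	unsigned int pkey_index;
};

/* Order by port first, then by pkey index. */
int list_lock_cmp(const void *a, const void *b)
{
	const struct list_lock_key *x = a;
	const struct list_lock_key *y = b;

	if (x->port != y->port)
		return x->port < y->port ? -1 : 1;
	if (x->pkey_index != y->pkey_index)
		return x->pkey_index < y->pkey_index ? -1 : 1;
	return 0;
}

/*
 * Sort the keys into acquisition order. The caller is expected to hold
 * the QP security structure mutex already, then take the list locks in
 * the resulting order.
 */
void sort_list_locks(struct list_lock_key *keys, size_t n)
{
	assert(keys != NULL || n == 0);
	if (n > 1)
		qsort(keys, n, sizeof(*keys), list_lock_cmp);
}
```

Since every path acquires the list locks in the same total order, two paths
that need overlapping sets of locks cannot deadlock against each other.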
---
v2:
- Use void* blobs in the LSM hooks. Paul Moore
- Make the policy change callback generic. Yuval Shaia, Paul Moore
- Squash LSM changes into the patches where the calls are added. Paul Moore
- Don't add new initial SIDs. Stephen Smalley
- Squash MAD agent PKey and SMI patches and move logic to IB security. Dan Jurgens
- Changed ib_end_port to ib_port. Paul Moore
- Changed ib_port access vector from smp to manage_subnet. Paul Moore
- Added pkey and ib_port details to the audit log. Paul Moore
- See individual patches for more detail.
v3:
- ib_port -> ib_endport. Paul Moore
- use notifier chains for LSM notifications. Paul Moore
- reorder parameters in hooks to put security blob first. Paul Moore
- Don't treat device name as untrusted string in audit log. Paul Moore
v4:
- Added separate AVC callback for LSM notifier. Paul Moore
- Removed unneeded braces in ocontext_read. Paul Moore
v5:
- Fix link error when CONFIG_SECURITY is not set. Build Robot
- Strip issue and Gerrit-Id: Leon Romanovsky
v6:
- Whitespace and bracket cleanup. James Morris
- Cleanup error flow in sel_pkey_sid_slow. James Morris
Daniel Jurgens (9):
IB/core: IB cache enhancements to support Infiniband security
IB/core: Enforce PKey security on QPs
selinux lsm IB/core: Implement LSM notification system
IB/core: Enforce security on management datagrams
selinux: Create policydb version for Infiniband support
selinux: Allocate and free infiniband security hooks
selinux: Implement Infiniband PKey "Access" access vector
selinux: Add IB Port SMP access vector
selinux: Add a cache for quicker retreival of PKey SIDs
drivers/infiniband/core/Makefile | 3 +-
drivers/infiniband/core/cache.c | 57 ++-
drivers/infiniband/core/core_priv.h | 115 ++++++
drivers/infiniband/core/device.c | 86 +++++
drivers/infiniband/core/mad.c | 52 ++-
drivers/infiniband/core/security.c | 709 +++++++++++++++++++++++++++++++++++
drivers/infiniband/core/uverbs_cmd.c | 20 +-
drivers/infiniband/core/verbs.c | 27 +-
include/linux/lsm_audit.h | 15 +
include/linux/lsm_hooks.h | 35 ++
include/linux/security.h | 50 +++
include/rdma/ib_mad.h | 4 +
include/rdma/ib_verbs.h | 49 +++
security/Kconfig | 9 +
security/lsm_audit.c | 16 +
security/security.c | 59 +++
security/selinux/Makefile | 2 +-
security/selinux/hooks.c | 86 ++++-
security/selinux/ibpkey.c | 245 ++++++++++++
security/selinux/include/classmap.h | 4 +
security/selinux/include/ibpkey.h | 31 ++
security/selinux/include/objsec.h | 11 +
security/selinux/include/security.h | 7 +-
security/selinux/selinuxfs.c | 2 +
security/selinux/ss/policydb.c | 129 ++++++-
security/selinux/ss/policydb.h | 27 +-
security/selinux/ss/services.c | 81 ++++
27 files changed, 1886 insertions(+), 45 deletions(-)
create mode 100644 drivers/infiniband/core/security.c
create mode 100644 security/selinux/ibpkey.c
create mode 100644 security/selinux/include/ibpkey.h
--
2.7.4
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* [bug report] qedr: Add support for QP verbs
From: Dan Carpenter @ 2016-11-23 10:58 UTC (permalink / raw)
To: Ram.Amrani-YGCgFSpz5w/QT0dZR+AlfA; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA
Hello Ram Amrani,
The patch cecbcddf6461: "qedr: Add support for QP verbs" from Oct 10,
2016, leads to the following static checker warning:
drivers/infiniband/hw/qedr/verbs.c:1494 qedr_create_qp()
warn: possible memory leak of 'qp'
drivers/infiniband/hw/qedr/verbs.c
1484
1485 rc = qedr_check_qp_attrs(ibpd, dev, attrs);
1486 if (rc)
1487 return ERR_PTR(rc);
1488
1489 qp = kzalloc(sizeof(*qp), GFP_KERNEL);
1490 if (!qp)
1491 return ERR_PTR(-ENOMEM);
1492
1493 if (attrs->srq)
1494 return ERR_PTR(-EINVAL);
You should move this in front of the allocation to avoid the memory
leak.
1495
1496 DP_DEBUG(dev, QEDR_MSG_QP,
1497 "create qp: sq_cq=%p, sq_icid=%d, rq_cq=%p, rq_icid=%d\n",
1498 get_qedr_cq(attrs->send_cq),
1499 get_qedr_cq(attrs->send_cq)->icid,
1500 get_qedr_cq(attrs->recv_cq),
1501 get_qedr_cq(attrs->recv_cq)->icid);
1502
1503 qedr_set_qp_init_params(dev, qp, pd, attrs);
1504
1505 if (attrs->qp_type == IB_QPT_GSI) {
1506 if (udata) {
1507 DP_ERR(dev,
1508 "create qp: unexpected udata when creating GSI QP\n");
1509 goto err0;
Ugh... GW-BASIC style numbered labels... What does goto err0 do???
Imagine if instead of function names we used numbers like:
one();
two();
five();
Use meaningful label names like "goto free_qp;"
1510 }
1511 return qedr_create_gsi_qp(dev, attrs, qp);
We should free qp if qedr_create_gsi_qp() fails as well.
1512 }
1513
1514 memset(&in_params, 0, sizeof(in_params));
1515
1516 if (udata) {
1517 if (!(udata && ibpd->uobject && ibpd->uobject->context))
1518 goto err0;
1519
regards,
dan carpenter
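
The two suggestions in this report (validate before allocating, and use a
descriptive cleanup label) can be illustrated with a small user-space sketch.
It is not the qedr driver code: struct qp_attrs, struct qp, live_qps and
create_qp_sketch() are hypothetical, and calloc()/free() stand in for
kzalloc()/kfree().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

struct qp_attrs {
	bool srq;	/* unsupported in this sketch */
	bool gsi;	/* request for a GSI QP */
	bool udata;	/* request came from user space */
};

struct qp {
	int id;
};

/* Counts live allocations so that a leak is observable in a test. */
int live_qps;

/* Returns 0 and stores the new QP in *out, or a negative errno value. */
int create_qp_sketch(const struct qp_attrs *attrs, struct qp **out)
{
	struct qp *qp;
	int rc;

	assert(attrs != NULL && out != NULL);

	/* Reject unsupported attributes before allocating anything. */
	if (attrs->srq)
		return -EINVAL;

	qp = calloc(1, sizeof(*qp));
	if (!qp)
		return -ENOMEM;
	live_qps++;

	if (attrs->gsi && attrs->udata) {
		rc = -EINVAL;
		goto free_qp;	/* the label says what the cleanup does */
	}

	*out = qp;
	return 0;

free_qp:
	free(qp);
	live_qps--;
	return rc;
}
```

Every error path taken after the allocation goes through free_qp, so no exit
from the function leaves the allocation behind.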
* [PATCH] IB/iser: extended CDB support
From: Vladimir Neyelov @ 2016-11-23 8:52 UTC (permalink / raw)
To: linux-rdma-u79uwXL29TY76Z2rM5mHXA, sagi-NQWnxTmZq1alnMjI0IkVqw,
maxg-VPRAkNaXOzVWk0Htik3J/w
Cc: Vladimir Neyelov
Add support for vendor-specific CDBs in iSER. SCSI supports a maximum
command size of SCSI_MAX_VARLEN_CDB_SIZE (260 bytes), but iSER currently
supports SCSI commands of at most 16 bytes. This commit raises the iSER
limit to SCSI_MAX_VARLEN_CDB_SIZE, to align with iscsi/tcp.
Signed-off-by: Vladimir Neyelov <vladimirn-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reviewed-by: Max Gurtovoy <maxg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
drivers/infiniband/ulp/iser/iscsi_iser.c | 17 +++++++++++++----
drivers/infiniband/ulp/iser/iscsi_iser.h | 1 +
2 files changed, 14 insertions(+), 4 deletions(-)
diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.c b/drivers/infiniband/ulp/iser/iscsi_iser.c
index 64b3d11..865ce48 100644
--- a/drivers/infiniband/ulp/iser/iscsi_iser.c
+++ b/drivers/infiniband/ulp/iser/iscsi_iser.c
@@ -163,7 +163,8 @@ iscsi_iser_pdu_alloc(struct iscsi_task *task, uint8_t opcode)
struct iscsi_iser_task *iser_task = task->dd_data;
task->hdr = (struct iscsi_hdr *)&iser_task->desc.iscsi_header;
- task->hdr_max = sizeof(iser_task->desc.iscsi_header);
+ task->hdr_max = sizeof(iser_task->desc.iscsi_header) +
+ sizeof(iser_task->desc.iscsi_ecdb_header);
return 0;
}
@@ -189,6 +190,14 @@ iser_initialize_task_headers(struct iscsi_task *task,
u64 dma_addr;
const bool mgmt_task = !task->sc && !in_interrupt();
int ret = 0;
+ int headers_len = ISER_HEADERS_LEN;
+ struct scsi_cmnd *sc = task->sc;
+
+ if (sc) {
+ if (sc->cmd_len > ISCSI_CDB_SIZE)
+ headers_len += offsetof(struct iscsi_ecdb_ahdr, ecdb) +
+ (sc->cmd_len - ISCSI_CDB_SIZE);
+ }
if (unlikely(mgmt_task))
mutex_lock(&iser_conn->state_mutex);
@@ -199,7 +208,7 @@ iser_initialize_task_headers(struct iscsi_task *task,
}
dma_addr = ib_dma_map_single(device->ib_device, (void *)tx_desc,
- ISER_HEADERS_LEN, DMA_TO_DEVICE);
+ headers_len, DMA_TO_DEVICE);
if (ib_dma_mapping_error(device->ib_device, dma_addr)) {
ret = -ENOMEM;
goto out;
@@ -209,7 +218,7 @@ iser_initialize_task_headers(struct iscsi_task *task,
tx_desc->mapped = true;
tx_desc->dma_addr = dma_addr;
tx_desc->tx_sg[0].addr = tx_desc->dma_addr;
- tx_desc->tx_sg[0].length = ISER_HEADERS_LEN;
+ tx_desc->tx_sg[0].length = headers_len;
tx_desc->tx_sg[0].lkey = device->pd->local_dma_lkey;
iser_task->iser_conn = iser_conn;
@@ -623,7 +632,7 @@ iscsi_iser_session_create(struct iscsi_endpoint *ep,
shost->max_lun = iscsi_max_lun;
shost->max_id = 0;
shost->max_channel = 0;
- shost->max_cmd_len = 16;
+ shost->max_cmd_len = SCSI_MAX_VARLEN_CDB_SIZE;
/*
* older userspace tools (before 2.0-870) did not pass us
diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.h b/drivers/infiniband/ulp/iser/iscsi_iser.h
index 0be6a7c..e1ae9b8 100644
--- a/drivers/infiniband/ulp/iser/iscsi_iser.h
+++ b/drivers/infiniband/ulp/iser/iscsi_iser.h
@@ -254,6 +254,7 @@ enum iser_desc_type {
struct iser_tx_desc {
struct iser_ctrl iser_header;
struct iscsi_hdr iscsi_header;
+ struct iscsi_ecdb_ahdr iscsi_ecdb_header;
enum iser_desc_type type;
u64 dma_addr;
struct ib_sge tx_sg[2];
--
2.4.3
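
The header-length arithmetic in the patch above can be checked in isolation
with a small sketch. Assumptions: struct ecdb_ahdr_sketch is a stand-in that
only approximates the layout of struct iscsi_ecdb_ahdr, headers_len_for_cmd()
is a hypothetical helper, and the base header length is passed in as a
parameter rather than taken from ISER_HEADERS_LEN.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CDB_SIZE		16	/* CDB bytes carried in the basic header */
#define MAX_VARLEN_CDB_SIZE	260	/* largest variable-length CDB */

/* Stand-in for the extended-CDB additional header segment. */
struct ecdb_ahdr_sketch {
	uint16_t ahslength;
	uint8_t ahstype;
	uint8_t reserved;
	uint8_t ecdb[MAX_VARLEN_CDB_SIZE - CDB_SIZE];
};

/*
 * Length to map for a command of cmd_len bytes: the base headers, plus
 * the fixed part of the extended header and the overflow CDB bytes when
 * the command does not fit in the basic header.
 */
size_t headers_len_for_cmd(size_t base_len, size_t cmd_len)
{
	size_t len = base_len;

	assert(cmd_len <= MAX_VARLEN_CDB_SIZE);
	if (cmd_len > CDB_SIZE)
		len += offsetof(struct ecdb_ahdr_sketch, ecdb) +
		       (cmd_len - CDB_SIZE);
	return len;
}
```

Commands of 16 bytes or fewer leave the mapped length unchanged, so only
commands that actually need the extended header pay for the larger mapping.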
* Re: Enabling peer to peer device transactions for PCIe devices
From: Christian König @ 2016-11-23 8:51 UTC (permalink / raw)
To: Dan Williams, Serguei Sagalovitch, Dave Hansen,
linux-nvdimm@lists.01.org, linux-rdma@vger.kernel.org,
linux-pci@vger.kernel.org, Kuehling, Felix,
linux-kernel@vger.kernel.org, dri-devel@lists.freedesktop.org,
Sander, Ben, Suthikulpanit, Suravee, Deucher, Alexander,
Blinzer, Paul, Linux-media@vger.kernel.org
In-Reply-To: <20161123074902.ph7a5cmlw3pclugx@phenom.ffwll.local>
On 23.11.2016 at 08:49, Daniel Vetter wrote:
> On Tue, Nov 22, 2016 at 01:21:03PM -0800, Dan Williams wrote:
>> On Tue, Nov 22, 2016 at 1:03 PM, Daniel Vetter <daniel@ffwll.ch> wrote:
>>> On Tue, Nov 22, 2016 at 9:35 PM, Serguei Sagalovitch
>>> <serguei.sagalovitch@amd.com> wrote:
>>>> On 2016-11-22 03:10 PM, Daniel Vetter wrote:
>>>>> On Tue, Nov 22, 2016 at 9:01 PM, Dan Williams <dan.j.williams@intel.com>
>>>>> wrote:
>>>>>> On Tue, Nov 22, 2016 at 10:59 AM, Serguei Sagalovitch
>>>>>> <serguei.sagalovitch@amd.com> wrote:
>>>>>>> I personally like "device-DAX" idea but my concerns are:
>>>>>>>
>>>>>>> - How well it will co-exists with the DRM infrastructure /
>>>>>>> implementations
>>>>>>> in part dealing with CPU pointers?
>>>>>> Inside the kernel a device-DAX range is "just memory" in the sense
>>>>>> that you can perform pfn_to_page() on it and issue I/O, but the vma is
>>>>>> not migratable. To be honest I do not know how well that co-exists
>>>>>> with drm infrastructure.
>>>>>>
>>>>>>> - How well we will be able to handle case when we need to
>>>>>>> "move"/"evict"
>>>>>>> memory/data to the new location so CPU pointer should point to the
>>>>>>> new
>>>>>>> physical location/address
>>>>>>> (and may be not in PCI device memory at all)?
>>>>>> So, device-DAX deliberately avoids support for in-kernel migration or
>>>>>> overcommit. Those cases are left to the core mm or drm. The device-dax
>>>>>> interface is for cases where all that is needed is a direct-mapping to
>>>>>> a statically-allocated physical-address range be it persistent memory
>>>>>> or some other special reserved memory range.
>>>>> For some of the fancy use-cases (e.g. to be comparable to what HMM can
>>>>> pull off) I think we want all the magic in core mm, i.e. migration and
>>>>> overcommit. At least that seems to be the very strong drive in all
>>>>> general-purpose gpu abstractions and implementations, where memory is
>>>>> allocated with malloc, and then mapped/moved into vram/gpu address
>>>>> space through some magic,
>>>> It is possible that there is other way around: memory is requested to be
>>>> allocated and should be kept in vram for performance reason but due
>>>> to possible overcommit case we need at least temporally to "move" such
>>>> allocation to system memory.
>>> With migration I meant migrating both ways of course. And with stuff
>>> like numactl we can also influence where exactly the malloc'ed memory
>>> is allocated originally, at least if we'd expose the vram range as a
>>> very special numa node that happens to be far away and not hold any
>>> cpu cores.
>> I don't think we should be using numa distance to reverse engineer a
>> certain allocation behavior. The latency data should be truthful, but
>> you're right we'll need a mechanism to keep general purpose
>> allocations out of that range by default. Btw, strict isolation is
>> another design point of device-dax, but I think in this case we're
>> describing something between the two extremes of full isolation and
>> full compatibility with existing numactl apis.
> Yes, agreed. My idea with exposing vram sections using numa nodes wasn't
> to reuse all the existing allocation policies directly, those won't work.
> So at boot-up your default numa policy would exclude any vram nodes.
>
> But I think (as an -mm layman) that numa gives us a lot of the tools and
> policy interface that we need to implement what we want for gpus.
Agree completely. From a ten-mile-high view our GPUs are just command
processors with local memory as well.
Basically this is also the whole idea of what AMD has been pushing with HSA
for a while.
It's just that a lot of problems start to pop up when you look at all
the nasty details. For example, only part of the GPU memory is usually
accessible by the CPU.
So even if numa nodes provide a good foundation for this, I think there
is still a lot of code to write.
BTW: I should probably start to read into the numa code of the kernel.
Any good pointers for that?
Regards,
Christian.
> Wrt isolation: There's a sliding scale of what different users expect,
> from full auto everything, including migrating pages around if needed to
> full isolation all seems to be on the table. As long as we keep vram nodes
> out of any default allocation numasets, full isolation should be possible.
> -Daniel
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
* RE: [PATCH] qede: fix general protection fault may occur on probe
From: Amrani, Ram @ 2016-11-23 8:05 UTC (permalink / raw)
To: dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org
Cc: Elior, Ariel, Kalderon, Michal, Mintz, Yuval,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> The recent introduction of qedr driver support in qede causes a GPF when
> probing the driver in a server without a RoCE enabled QLogic NIC. This fix avoids
> using an uninitialized pointer in such a case. Caught by the kernel test robot.
>
> ...
Hi Doug,
Note that I previously sent this patch mistakenly as [PATCH rdma-core].
I guess this made you miss it.
It's an important bug fix that I hope will make it into 4.9.
Thanks,
Ram
* [PATCH] qede: fix general protection fault may occur on probe
From: Amrani, Ram @ 2016-11-23 8:03 UTC (permalink / raw)
To: dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org
Cc: Elior, Ariel, Kalderon, Michal, Mintz, Yuval,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
The recent introduction of qedr driver support in qede causes a GPF when probing the driver in a server without a RoCE enabled QLogic NIC. This fix avoids using an uninitialized pointer in such a case. Caught by the kernel test robot.
Signed-off-by: Ram Amrani <Ram.Amrani-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org>
---
drivers/net/ethernet/qlogic/qede/qede_roce.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/qlogic/qede/qede_roce.c b/drivers/net/ethernet/qlogic/qede/qede_roce.c
index 9867f96..4927271 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_roce.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_roce.c
@@ -191,8 +191,8 @@ int qede_roce_register_driver(struct qedr_driver *drv)
}
mutex_unlock(&qedr_dev_list_lock);
- DP_INFO(edev, "qedr: discovered and registered %d RoCE funcs\n",
- qedr_counter);
+ pr_notice("qedr: discovered and registered %d RoCE funcs\n",
+ qedr_counter);
return 0;
}
--
1.8.3.1
* RE: [PATCH] qede: fix general protection fault may occur on probe
From: Amrani, Ram @ 2016-11-23 8:01 UTC (permalink / raw)
To: dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org
Cc: Elior, Ariel, Kalderon, Michal, Mintz, Yuval,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> The recent introduction of qedr driver support in qede causes a GPF when
> probing the driver in a server without a RoCE enabled QLogic NIC. This fix avoids
> using an uninitialized pointer in such a case. Caught by the kernel test robot.
>
> ...
This e-mail was wrongly sent as HTML.
I'll resend correctly.
Ram
* Re: Enabling peer to peer device transactions for PCIe devices
From: Daniel Vetter @ 2016-11-23 7:49 UTC (permalink / raw)
To: Dan Williams
Cc: Dave Hansen, Suthikulpanit, Suravee,
linux-nvdimm-hn68Rpc1hR1g9hUCZPvPmw@public.gmane.org,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-pci-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
Kuehling, Felix, Serguei Sagalovitch, Blinzer, Paul,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org,
Sander, Ben, Daniel Vetter, Deucher, Alexander, Koenig, Christian,
Linux-media-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <CAPcyv4ind0fxek7g25MX=49rDfT5X151tb4=TYudMBmUJFZZNQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
On Tue, Nov 22, 2016 at 01:21:03PM -0800, Dan Williams wrote:
> On Tue, Nov 22, 2016 at 1:03 PM, Daniel Vetter <daniel-/w4YWyX8dFk@public.gmane.org> wrote:
> > On Tue, Nov 22, 2016 at 9:35 PM, Serguei Sagalovitch
> > <serguei.sagalovitch-5C7GfCeVMHo@public.gmane.org> wrote:
> >>
> >> On 2016-11-22 03:10 PM, Daniel Vetter wrote:
> >>>
> >>> On Tue, Nov 22, 2016 at 9:01 PM, Dan Williams <dan.j.williams-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
> >>> wrote:
> >>>>
> >>>> On Tue, Nov 22, 2016 at 10:59 AM, Serguei Sagalovitch
> >>>> <serguei.sagalovitch-5C7GfCeVMHo@public.gmane.org> wrote:
> >>>>>
> >>>>> I personally like "device-DAX" idea but my concerns are:
> >>>>>
> >>>>> - How well it will co-exists with the DRM infrastructure /
> >>>>> implementations
> >>>>> in part dealing with CPU pointers?
> >>>>
> >>>> Inside the kernel a device-DAX range is "just memory" in the sense
> >>>> that you can perform pfn_to_page() on it and issue I/O, but the vma is
> >>>> not migratable. To be honest I do not know how well that co-exists
> >>>> with drm infrastructure.
> >>>>
> >>>>> - How well we will be able to handle case when we need to
> >>>>> "move"/"evict"
> >>>>> memory/data to the new location so CPU pointer should point to the
> >>>>> new
> >>>>> physical location/address
> >>>>> (and may be not in PCI device memory at all)?
> >>>>
> >>>> So, device-DAX deliberately avoids support for in-kernel migration or
> >>>> overcommit. Those cases are left to the core mm or drm. The device-dax
> >>>> interface is for cases where all that is needed is a direct-mapping to
> >>>> a statically-allocated physical-address range be it persistent memory
> >>>> or some other special reserved memory range.
> >>>
> >>> For some of the fancy use-cases (e.g. to be comparable to what HMM can
> >>> pull off) I think we want all the magic in core mm, i.e. migration and
> >>> overcommit. At least that seems to be the very strong drive in all
> >>> general-purpose gpu abstractions and implementations, where memory is
> >>> allocated with malloc, and then mapped/moved into vram/gpu address
> >>> space through some magic,
> >>
> >> It is possible that there is other way around: memory is requested to be
> >> allocated and should be kept in vram for performance reason but due
> >> to possible overcommit case we need at least temporally to "move" such
> >> allocation to system memory.
> >
> > With migration I meant migrating both ways of course. And with stuff
> > like numactl we can also influence where exactly the malloc'ed memory
> > is allocated originally, at least if we'd expose the vram range as a
> > very special numa node that happens to be far away and not hold any
> > cpu cores.
>
> I don't think we should be using numa distance to reverse engineer a
> certain allocation behavior. The latency data should be truthful, but
> you're right we'll need a mechanism to keep general purpose
> allocations out of that range by default. Btw, strict isolation is
> another design point of device-dax, but I think in this case we're
> describing something between the two extremes of full isolation and
> full compatibility with existing numactl apis.
Yes, agreed. My idea with exposing vram sections using numa nodes wasn't
to reuse all the existing allocation policies directly, those won't work.
So at boot-up your default numa policy would exclude any vram nodes.
But I think (as an -mm layman) that numa gives us a lot of the tools and
policy interface that we need to implement what we want for gpus.
Wrt isolation: There's a sliding scale of what different users expect,
from full auto everything, including migrating pages around if needed to
full isolation all seems to be on the table. As long as we keep vram nodes
out of any default allocation numasets, full isolation should be possible.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
* [PATCH rdma-next 5/5] IB/mlx5: Make create/destroy address handle available to userspace
From: Leon Romanovsky @ 2016-11-23 6:23 UTC (permalink / raw)
To: dledford-H+wXaHxf7aLQT0dZR+AlfA
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Moni Shoua
In-Reply-To: <1479882206-31212-1-git-send-email-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
From: Moni Shoua <monis-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Advertise that the create_ah and destroy_ah verbs are accessible from the
uverbs interface.
Signed-off-by: Moni Shoua <monis-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reviewed-by: Yishai Hadas <yishaih-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
drivers/infiniband/hw/mlx5/main.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 527e4f5..ce5c350 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -2993,6 +2993,8 @@ static void *mlx5_ib_add(struct mlx5_core_dev *mdev)
(1ull << IB_USER_VERBS_CMD_QUERY_PORT) |
(1ull << IB_USER_VERBS_CMD_ALLOC_PD) |
(1ull << IB_USER_VERBS_CMD_DEALLOC_PD) |
+ (1ull << IB_USER_VERBS_CMD_CREATE_AH) |
+ (1ull << IB_USER_VERBS_CMD_DESTROY_AH) |
(1ull << IB_USER_VERBS_CMD_REG_MR) |
(1ull << IB_USER_VERBS_CMD_REREG_MR) |
(1ull << IB_USER_VERBS_CMD_DEREG_MR) |
--
2.7.4
* [PATCH rdma-next 4/5] IB/mlx5: Use kernel driver to help userspace create address handle
From: Leon Romanovsky @ 2016-11-23 6:23 UTC (permalink / raw)
To: dledford-H+wXaHxf7aLQT0dZR+AlfA
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Moni Shoua
In-Reply-To: <1479882206-31212-1-git-send-email-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
From: Moni Shoua <monis-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Resolving a MAC address for a given IP address in userspace is inefficient.
This patch lets the mlx5 user driver use the kernel driver to resolve the MAC
and get the answer in the private section of the response.
Signed-off-by: Moni Shoua <monis-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reviewed-by: Yishai Hadas <yishaih-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
drivers/infiniband/hw/mlx5/ah.c | 21 +++++++++++++++++++++
include/uapi/rdma/mlx5-abi.h | 6 ++++++
2 files changed, 27 insertions(+)
diff --git a/drivers/infiniband/hw/mlx5/ah.c b/drivers/infiniband/hw/mlx5/ah.c
index ecac9ea..d090e96 100644
--- a/drivers/infiniband/hw/mlx5/ah.c
+++ b/drivers/infiniband/hw/mlx5/ah.c
@@ -77,6 +77,27 @@ struct ib_ah *mlx5_ib_create_ah(struct ib_pd *pd, struct ib_ah_attr *ah_attr,
if (ll == IB_LINK_LAYER_ETHERNET && !(ah_attr->ah_flags & IB_AH_GRH))
return ERR_PTR(-EINVAL);
+ if (ll == IB_LINK_LAYER_ETHERNET && udata) {
+ int err;
+ struct mlx5_ib_create_ah_resp resp = {};
+ u32 min_resp_len = offsetof(typeof(resp), dmac) +
+ sizeof(resp.dmac);
+
+ if (udata->outlen < min_resp_len)
+ return ERR_PTR(-EINVAL);
+
+ resp.response_length = min_resp_len;
+
+ err = ib_resolve_eth_dmac(pd->device, ah_attr);
+ if (err)
+ return ERR_PTR(err);
+
+ memcpy(resp.dmac, ah_attr->dmac, ETH_ALEN);
+ err = ib_copy_to_udata(udata, &resp, resp.response_length);
+ if (err)
+ return ERR_PTR(err);
+ }
+
ah = kzalloc(sizeof(*ah), GFP_ATOMIC);
if (!ah)
return ERR_PTR(-ENOMEM);
diff --git a/include/uapi/rdma/mlx5-abi.h b/include/uapi/rdma/mlx5-abi.h
index ef05a98..1b7326f 100644
--- a/include/uapi/rdma/mlx5-abi.h
+++ b/include/uapi/rdma/mlx5-abi.h
@@ -233,6 +233,12 @@ struct mlx5_ib_create_wq {
__u32 reserved;
};
+struct mlx5_ib_create_ah_resp {
+ __u32 response_length;
+ __u8 dmac[ETH_ALEN];
+ __u8 reserved[6];
+};
+
struct mlx5_ib_create_wq_resp {
__u32 response_length;
__u32 reserved;
--
2.7.4
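
The minimum-response-length check in the patch above (offsetof() of the last
required field plus its size) can be tried out with the sketch below. It is a
user-space illustration only: struct create_ah_resp_sketch is a stand-in for
the response layout, and build_create_ah_resp() is a hypothetical helper in
which memcpy() stands in for ib_copy_to_udata().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6	/* length of an Ethernet MAC address in bytes */

/* Stand-in for the response layout added by the patch above. */
struct create_ah_resp_sketch {
	uint32_t response_length;
	uint8_t dmac[ETH_ALEN];
	uint8_t reserved[6];
};

/* Bytes needed to hold every field up to and including dmac. */
#define MIN_RESP_LEN \
	(offsetof(struct create_ah_resp_sketch, dmac) + \
	 sizeof(((struct create_ah_resp_sketch *)0)->dmac))

/*
 * Copy a truncated response into a caller buffer of outlen bytes.
 * Returns the number of bytes written, or -1 if the buffer is too small.
 */
int build_create_ah_resp(void *out, size_t outlen, const uint8_t *dmac)
{
	struct create_ah_resp_sketch resp = { 0 };

	assert(out != NULL && dmac != NULL);
	if (outlen < MIN_RESP_LEN)
		return -1;

	resp.response_length = MIN_RESP_LEN;
	memcpy(resp.dmac, dmac, ETH_ALEN);
	memcpy(out, &resp, resp.response_length);
	return (int)resp.response_length;
}
```

Reporting response_length lets a consumer tell how many bytes of the
structure were actually filled in, which is what allows trailing fields to be
added later without breaking callers that pass a shorter buffer.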
* [PATCH rdma-next 3/5] IB/core: Let the verb create_ah return extended response to user
From: Leon Romanovsky @ 2016-11-23 6:23 UTC (permalink / raw)
To: dledford-H+wXaHxf7aLQT0dZR+AlfA
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Moni Shoua, Knut Omang
In-Reply-To: <1479882206-31212-1-git-send-email-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
From: Moni Shoua <monis-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Add struct ib_udata to the signature of create_ah callback that is
implemented by IB device drivers. This allows HW drivers to return extra
data to the userspace library.
This patch prepares the ground for the mlx5 driver to resolve the destination
MAC address for a given GID and return it to userspace.
This patch was previously submitted by Knut Omang as a part of the
patch set to support Oracle's Infiniband HCA (SIF).
Signed-off-by: Knut Omang <knut.omang-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
Signed-off-by: Moni Shoua <monis-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reviewed-by: Yishai Hadas <yishaih-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
drivers/infiniband/core/uverbs_cmd.c | 11 ++++++++++-
drivers/infiniband/core/verbs.c | 2 +-
drivers/infiniband/hw/cxgb3/iwch_provider.c | 3 ++-
drivers/infiniband/hw/cxgb4/provider.c | 4 +++-
drivers/infiniband/hw/hns/hns_roce_ah.c | 3 ++-
drivers/infiniband/hw/hns/hns_roce_device.h | 3 ++-
drivers/infiniband/hw/i40iw/i40iw_verbs.c | 4 +++-
drivers/infiniband/hw/mlx4/ah.c | 4 +++-
drivers/infiniband/hw/mlx4/mlx4_ib.h | 3 ++-
drivers/infiniband/hw/mlx5/ah.c | 4 +++-
drivers/infiniband/hw/mlx5/mlx5_ib.h | 3 ++-
drivers/infiniband/hw/mthca/mthca_provider.c | 4 +++-
drivers/infiniband/hw/nes/nes_verbs.c | 3 ++-
drivers/infiniband/hw/ocrdma/ocrdma_ah.c | 3 ++-
drivers/infiniband/hw/ocrdma/ocrdma_ah.h | 4 +++-
drivers/infiniband/hw/qedr/verbs.c | 3 ++-
drivers/infiniband/hw/qedr/verbs.h | 3 ++-
drivers/infiniband/hw/usnic/usnic_ib_verbs.c | 4 +++-
drivers/infiniband/hw/usnic/usnic_ib_verbs.h | 4 +++-
drivers/infiniband/sw/rxe/rxe_verbs.c | 4 +++-
include/rdma/ib_verbs.h | 3 ++-
21 files changed, 58 insertions(+), 21 deletions(-)
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 790af84..1bebc66 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -2877,6 +2877,7 @@ ssize_t ib_uverbs_create_ah(struct ib_uverbs_file *file,
struct ib_ah *ah;
struct ib_ah_attr attr;
int ret;
+ struct ib_udata udata;
if (out_len < sizeof resp)
return -ENOSPC;
@@ -2884,6 +2885,10 @@ ssize_t ib_uverbs_create_ah(struct ib_uverbs_file *file,
if (copy_from_user(&cmd, buf, sizeof cmd))
return -EFAULT;
+ INIT_UDATA(&udata, buf + sizeof(cmd),
+ (unsigned long)cmd.response + sizeof(resp),
+ in_len - sizeof(cmd), out_len - sizeof(resp));
+
uobj = kmalloc(sizeof *uobj, GFP_KERNEL);
if (!uobj)
return -ENOMEM;
@@ -2910,12 +2915,16 @@ ssize_t ib_uverbs_create_ah(struct ib_uverbs_file *file,
memset(&attr.dmac, 0, sizeof(attr.dmac));
memcpy(attr.grh.dgid.raw, cmd.attr.grh.dgid, 16);
- ah = ib_create_ah(pd, &attr);
+ ah = pd->device->create_ah(pd, &attr, &udata);
+
if (IS_ERR(ah)) {
ret = PTR_ERR(ah);
goto err_put;
}
+ ah->device = pd->device;
+ ah->pd = pd;
+ atomic_inc(&pd->usecnt);
ah->uobject = uobj;
uobj->object = ah;
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 83d01ef..650ce27 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -315,7 +315,7 @@ struct ib_ah *ib_create_ah(struct ib_pd *pd, struct ib_ah_attr *ah_attr)
{
struct ib_ah *ah;
- ah = pd->device->create_ah(pd, ah_attr);
+ ah = pd->device->create_ah(pd, ah_attr, NULL);
if (!IS_ERR(ah)) {
ah->device = pd->device;
diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c
index cba57bb..9d5fe18 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_provider.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c
@@ -62,7 +62,8 @@
#include "common.h"
static struct ib_ah *iwch_ah_create(struct ib_pd *pd,
- struct ib_ah_attr *ah_attr)
+ struct ib_ah_attr *ah_attr,
+ struct ib_udata *udata)
{
return ERR_PTR(-ENOSYS);
}
diff --git a/drivers/infiniband/hw/cxgb4/provider.c b/drivers/infiniband/hw/cxgb4/provider.c
index 645e606..49b51b7 100644
--- a/drivers/infiniband/hw/cxgb4/provider.c
+++ b/drivers/infiniband/hw/cxgb4/provider.c
@@ -59,7 +59,9 @@ module_param(fastreg_support, int, 0644);
MODULE_PARM_DESC(fastreg_support, "Advertise fastreg support (default=1)");
static struct ib_ah *c4iw_ah_create(struct ib_pd *pd,
- struct ib_ah_attr *ah_attr)
+ struct ib_ah_attr *ah_attr,
+ struct ib_udata *udata)
+
{
return ERR_PTR(-ENOSYS);
}
diff --git a/drivers/infiniband/hw/hns/hns_roce_ah.c b/drivers/infiniband/hw/hns/hns_roce_ah.c
index 24f79ee..0ac294d 100644
--- a/drivers/infiniband/hw/hns/hns_roce_ah.c
+++ b/drivers/infiniband/hw/hns/hns_roce_ah.c
@@ -39,7 +39,8 @@
#define HNS_ROCE_VLAN_SL_BIT_MASK 7
#define HNS_ROCE_VLAN_SL_SHIFT 13
-struct ib_ah *hns_roce_create_ah(struct ib_pd *ibpd, struct ib_ah_attr *ah_attr)
+struct ib_ah *hns_roce_create_ah(struct ib_pd *ibpd, struct ib_ah_attr *ah_attr,
+ struct ib_udata *udata)
{
struct hns_roce_dev *hr_dev = to_hr_dev(ibpd->device);
struct device *dev = &hr_dev->pdev->dev;
diff --git a/drivers/infiniband/hw/hns/hns_roce_device.h b/drivers/infiniband/hw/hns/hns_roce_device.h
index 3417315..470615f 100644
--- a/drivers/infiniband/hw/hns/hns_roce_device.h
+++ b/drivers/infiniband/hw/hns/hns_roce_device.h
@@ -667,7 +667,8 @@ int hns_roce_bitmap_alloc_range(struct hns_roce_bitmap *bitmap, int cnt,
void hns_roce_bitmap_free_range(struct hns_roce_bitmap *bitmap,
unsigned long obj, int cnt);
-struct ib_ah *hns_roce_create_ah(struct ib_pd *pd, struct ib_ah_attr *ah_attr);
+struct ib_ah *hns_roce_create_ah(struct ib_pd *pd, struct ib_ah_attr *ah_attr,
+ struct ib_udata *udata);
int hns_roce_query_ah(struct ib_ah *ibah, struct ib_ah_attr *ah_attr);
int hns_roce_destroy_ah(struct ib_ah *ah);
diff --git a/drivers/infiniband/hw/i40iw/i40iw_verbs.c b/drivers/infiniband/hw/i40iw/i40iw_verbs.c
index 6329c97..f03fc15 100644
--- a/drivers/infiniband/hw/i40iw/i40iw_verbs.c
+++ b/drivers/infiniband/hw/i40iw/i40iw_verbs.c
@@ -2562,7 +2562,9 @@ static int i40iw_query_pkey(struct ib_device *ibdev,
* @ah_attr: address handle attributes
*/
static struct ib_ah *i40iw_create_ah(struct ib_pd *ibpd,
- struct ib_ah_attr *attr)
+ struct ib_ah_attr *attr,
+ struct ib_udata *udata)
+
{
return ERR_PTR(-ENOSYS);
}
diff --git a/drivers/infiniband/hw/mlx4/ah.c b/drivers/infiniband/hw/mlx4/ah.c
index 5fc6233..0b288a7 100644
--- a/drivers/infiniband/hw/mlx4/ah.c
+++ b/drivers/infiniband/hw/mlx4/ah.c
@@ -124,7 +124,9 @@ static struct ib_ah *create_iboe_ah(struct ib_pd *pd, struct ib_ah_attr *ah_attr
return &ah->ibah;
}
-struct ib_ah *mlx4_ib_create_ah(struct ib_pd *pd, struct ib_ah_attr *ah_attr)
+struct ib_ah *mlx4_ib_create_ah(struct ib_pd *pd, struct ib_ah_attr *ah_attr,
+ struct ib_udata *udata)
+
{
struct mlx4_ib_ah *ah;
struct ib_ah *ret;
diff --git a/drivers/infiniband/hw/mlx4/mlx4_ib.h b/drivers/infiniband/hw/mlx4/mlx4_ib.h
index 35141f4..7f3d976 100644
--- a/drivers/infiniband/hw/mlx4/mlx4_ib.h
+++ b/drivers/infiniband/hw/mlx4/mlx4_ib.h
@@ -742,7 +742,8 @@ int mlx4_ib_arm_cq(struct ib_cq *cq, enum ib_cq_notify_flags flags);
void __mlx4_ib_cq_clean(struct mlx4_ib_cq *cq, u32 qpn, struct mlx4_ib_srq *srq);
void mlx4_ib_cq_clean(struct mlx4_ib_cq *cq, u32 qpn, struct mlx4_ib_srq *srq);
-struct ib_ah *mlx4_ib_create_ah(struct ib_pd *pd, struct ib_ah_attr *ah_attr);
+struct ib_ah *mlx4_ib_create_ah(struct ib_pd *pd, struct ib_ah_attr *ah_attr,
+ struct ib_udata *udata);
int mlx4_ib_query_ah(struct ib_ah *ibah, struct ib_ah_attr *ah_attr);
int mlx4_ib_destroy_ah(struct ib_ah *ah);
diff --git a/drivers/infiniband/hw/mlx5/ah.c b/drivers/infiniband/hw/mlx5/ah.c
index 745efa4..ecac9ea 100644
--- a/drivers/infiniband/hw/mlx5/ah.c
+++ b/drivers/infiniband/hw/mlx5/ah.c
@@ -64,7 +64,9 @@ static struct ib_ah *create_ib_ah(struct mlx5_ib_dev *dev,
return &ah->ibah;
}
-struct ib_ah *mlx5_ib_create_ah(struct ib_pd *pd, struct ib_ah_attr *ah_attr)
+struct ib_ah *mlx5_ib_create_ah(struct ib_pd *pd, struct ib_ah_attr *ah_attr,
+ struct ib_udata *udata)
+
{
struct mlx5_ib_ah *ah;
struct mlx5_ib_dev *dev = to_mdev(pd->device);
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index d5d0077..b67eaf0 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -737,7 +737,8 @@ void mlx5_ib_free_srq_wqe(struct mlx5_ib_srq *srq, int wqe_index);
int mlx5_MAD_IFC(struct mlx5_ib_dev *dev, int ignore_mkey, int ignore_bkey,
u8 port, const struct ib_wc *in_wc, const struct ib_grh *in_grh,
const void *in_mad, void *response_mad);
-struct ib_ah *mlx5_ib_create_ah(struct ib_pd *pd, struct ib_ah_attr *ah_attr);
+struct ib_ah *mlx5_ib_create_ah(struct ib_pd *pd, struct ib_ah_attr *ah_attr,
+ struct ib_udata *udata);
int mlx5_ib_query_ah(struct ib_ah *ibah, struct ib_ah_attr *ah_attr);
int mlx5_ib_destroy_ah(struct ib_ah *ah);
struct ib_srq *mlx5_ib_create_srq(struct ib_pd *pd,
diff --git a/drivers/infiniband/hw/mthca/mthca_provider.c b/drivers/infiniband/hw/mthca/mthca_provider.c
index 358930a4..d317087 100644
--- a/drivers/infiniband/hw/mthca/mthca_provider.c
+++ b/drivers/infiniband/hw/mthca/mthca_provider.c
@@ -410,7 +410,9 @@ static int mthca_dealloc_pd(struct ib_pd *pd)
}
static struct ib_ah *mthca_ah_create(struct ib_pd *pd,
- struct ib_ah_attr *ah_attr)
+ struct ib_ah_attr *ah_attr,
+ struct ib_udata *udata)
+
{
int err;
struct mthca_ah *ah;
diff --git a/drivers/infiniband/hw/nes/nes_verbs.c b/drivers/infiniband/hw/nes/nes_verbs.c
index bd69125..0bb857c 100644
--- a/drivers/infiniband/hw/nes/nes_verbs.c
+++ b/drivers/infiniband/hw/nes/nes_verbs.c
@@ -771,7 +771,8 @@ static int nes_dealloc_pd(struct ib_pd *ibpd)
/**
* nes_create_ah
*/
-static struct ib_ah *nes_create_ah(struct ib_pd *pd, struct ib_ah_attr *ah_attr)
+static struct ib_ah *nes_create_ah(struct ib_pd *pd, struct ib_ah_attr *ah_attr,
+ struct ib_udata *udata)
{
return ERR_PTR(-ENOSYS);
}
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_ah.c b/drivers/infiniband/hw/ocrdma/ocrdma_ah.c
index 797362a..14d33b0 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_ah.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_ah.c
@@ -154,7 +154,8 @@ static inline int set_av_attr(struct ocrdma_dev *dev, struct ocrdma_ah *ah,
return status;
}
-struct ib_ah *ocrdma_create_ah(struct ib_pd *ibpd, struct ib_ah_attr *attr)
+struct ib_ah *ocrdma_create_ah(struct ib_pd *ibpd, struct ib_ah_attr *attr,
+ struct ib_udata *udata)
{
u32 *ahid_addr;
int status;
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_ah.h b/drivers/infiniband/hw/ocrdma/ocrdma_ah.h
index 3856dd4..0704a24 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_ah.h
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_ah.h
@@ -50,7 +50,9 @@ enum {
OCRDMA_AH_L3_TYPE_MASK = 0x03,
OCRDMA_AH_L3_TYPE_SHIFT = 0x1D /* 29 bits */
};
-struct ib_ah *ocrdma_create_ah(struct ib_pd *, struct ib_ah_attr *);
+
+struct ib_ah *ocrdma_create_ah(struct ib_pd *, struct ib_ah_attr *,
+ struct ib_udata *);
int ocrdma_destroy_ah(struct ib_ah *);
int ocrdma_query_ah(struct ib_ah *, struct ib_ah_attr *);
int ocrdma_modify_ah(struct ib_ah *, struct ib_ah_attr *);
diff --git a/drivers/infiniband/hw/qedr/verbs.c b/drivers/infiniband/hw/qedr/verbs.c
index a615142..ccff6c6 100644
--- a/drivers/infiniband/hw/qedr/verbs.c
+++ b/drivers/infiniband/hw/qedr/verbs.c
@@ -2094,7 +2094,8 @@ int qedr_destroy_qp(struct ib_qp *ibqp)
return rc;
}
-struct ib_ah *qedr_create_ah(struct ib_pd *ibpd, struct ib_ah_attr *attr)
+struct ib_ah *qedr_create_ah(struct ib_pd *ibpd, struct ib_ah_attr *attr,
+ struct ib_udata *udata)
{
struct qedr_ah *ah;
diff --git a/drivers/infiniband/hw/qedr/verbs.h b/drivers/infiniband/hw/qedr/verbs.h
index a9b5e67..070677c 100644
--- a/drivers/infiniband/hw/qedr/verbs.h
+++ b/drivers/infiniband/hw/qedr/verbs.h
@@ -70,7 +70,8 @@ int qedr_query_qp(struct ib_qp *, struct ib_qp_attr *qp_attr,
int qp_attr_mask, struct ib_qp_init_attr *);
int qedr_destroy_qp(struct ib_qp *ibqp);
-struct ib_ah *qedr_create_ah(struct ib_pd *ibpd, struct ib_ah_attr *attr);
+struct ib_ah *qedr_create_ah(struct ib_pd *ibpd, struct ib_ah_attr *attr,
+ struct ib_udata *udata);
int qedr_destroy_ah(struct ib_ah *ibah);
int qedr_dereg_mr(struct ib_mr *);
diff --git a/drivers/infiniband/hw/usnic/usnic_ib_verbs.c b/drivers/infiniband/hw/usnic/usnic_ib_verbs.c
index a5bfbba..fd2a50e 100644
--- a/drivers/infiniband/hw/usnic/usnic_ib_verbs.c
+++ b/drivers/infiniband/hw/usnic/usnic_ib_verbs.c
@@ -738,7 +738,9 @@ int usnic_ib_mmap(struct ib_ucontext *context,
/* In ib callbacks section - Start of stub funcs */
struct ib_ah *usnic_ib_create_ah(struct ib_pd *pd,
- struct ib_ah_attr *ah_attr)
+ struct ib_ah_attr *ah_attr,
+ struct ib_udata *udata)
+
{
usnic_dbg("\n");
return ERR_PTR(-EPERM);
diff --git a/drivers/infiniband/hw/usnic/usnic_ib_verbs.h b/drivers/infiniband/hw/usnic/usnic_ib_verbs.h
index 0d9d2e6a..0ed8e07 100644
--- a/drivers/infiniband/hw/usnic/usnic_ib_verbs.h
+++ b/drivers/infiniband/hw/usnic/usnic_ib_verbs.h
@@ -75,7 +75,9 @@ int usnic_ib_dealloc_ucontext(struct ib_ucontext *ibcontext);
int usnic_ib_mmap(struct ib_ucontext *context,
struct vm_area_struct *vma);
struct ib_ah *usnic_ib_create_ah(struct ib_pd *pd,
- struct ib_ah_attr *ah_attr);
+ struct ib_ah_attr *ah_attr,
+ struct ib_udata *udata);
+
int usnic_ib_destroy_ah(struct ib_ah *ah);
int usnic_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
struct ib_send_wr **bad_wr);
diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c
index 19841c8..187d85c 100644
--- a/drivers/infiniband/sw/rxe/rxe_verbs.c
+++ b/drivers/infiniband/sw/rxe/rxe_verbs.c
@@ -316,7 +316,9 @@ static int rxe_init_av(struct rxe_dev *rxe, struct ib_ah_attr *attr,
return err;
}
-static struct ib_ah *rxe_create_ah(struct ib_pd *ibpd, struct ib_ah_attr *attr)
+static struct ib_ah *rxe_create_ah(struct ib_pd *ibpd, struct ib_ah_attr *attr,
+ struct ib_udata *udata)
+
{
int err;
struct rxe_dev *rxe = to_rdev(ibpd->device);
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index a2f21e0..9bc3812 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1933,7 +1933,8 @@ struct ib_device {
struct ib_udata *udata);
int (*dealloc_pd)(struct ib_pd *pd);
struct ib_ah * (*create_ah)(struct ib_pd *pd,
- struct ib_ah_attr *ah_attr);
+ struct ib_ah_attr *ah_attr,
+ struct ib_udata *udata);
int (*modify_ah)(struct ib_ah *ah,
struct ib_ah_attr *ah_attr);
int (*query_ah)(struct ib_ah *ah,
--
2.7.4
^ permalink raw reply related
* [PATCH rdma-next 2/5] IB/mlx5: Report that device supports user data response in create_ah
From: Leon Romanovsky @ 2016-11-23 6:23 UTC (permalink / raw)
To: dledford-H+wXaHxf7aLQT0dZR+AlfA
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Moni Shoua
In-Reply-To: <1479882206-31212-1-git-send-email-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
From: Moni Shoua <monis-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
To make the mlx5 user driver aware of whether the kernel driver returns
the dmac in the user data response, add a new flag that is returned to
user-space through alloc_ucontext.
Signed-off-by: Moni Shoua <monis-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reviewed-by: Yishai Hadas <yishaih-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
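A sketch of how a user-space driver might test the new capability bit.
The bit values match the enum in this patch; the _SKETCH names, the
helper kernel_returns_dmac() and the assumption that cmds_supp_uhw is a
32-bit mask are illustrative only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same bit positions as enum mlx5_user_cmds_supp_uhw in this patch. */
enum {
	UHW_QUERY_DEVICE_SKETCH = 1 << 0,
	UHW_CREATE_AH_SKETCH    = 1 << 1,
};

/*
 * Hypothetical helper: true when the kernel advertised that create_ah
 * fills in the dmac in its user data response.
 */
static bool kernel_returns_dmac(uint32_t cmds_supp_uhw)
{
	return (cmds_supp_uhw & UHW_CREATE_AH_SKETCH) != 0;
}
```

An older kernel that only sets the QUERY_DEVICE bit would make the helper
return false, and the library would presumably fall back to its existing
resolution path.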
drivers/infiniband/hw/mlx5/main.c | 3 ++-
include/uapi/rdma/mlx5-abi.h | 1 +
2 files changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 8e0dbd5..527e4f5 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -1093,7 +1093,8 @@ static struct ib_ucontext *mlx5_ib_alloc_ucontext(struct ib_device *ibdev,
resp.response_length += sizeof(resp.cqe_version);
if (field_avail(typeof(resp), cmds_supp_uhw, udata->outlen)) {
- resp.cmds_supp_uhw |= MLX5_USER_CMDS_SUPP_UHW_QUERY_DEVICE;
+ resp.cmds_supp_uhw |= MLX5_USER_CMDS_SUPP_UHW_QUERY_DEVICE |
+ MLX5_USER_CMDS_SUPP_UHW_CREATE_AH;
resp.response_length += sizeof(resp.cmds_supp_uhw);
}
diff --git a/include/uapi/rdma/mlx5-abi.h b/include/uapi/rdma/mlx5-abi.h
index f5d0f4e..ef05a98 100644
--- a/include/uapi/rdma/mlx5-abi.h
+++ b/include/uapi/rdma/mlx5-abi.h
@@ -82,6 +82,7 @@ enum mlx5_ib_alloc_ucontext_resp_mask {
enum mlx5_user_cmds_supp_uhw {
MLX5_USER_CMDS_SUPP_UHW_QUERY_DEVICE = 1 << 0,
+ MLX5_USER_CMDS_SUPP_UHW_CREATE_AH = 1 << 1,
};
struct mlx5_ib_alloc_ucontext_resp {
--
2.7.4
^ permalink raw reply related
* [PATCH rdma-next 1/5] IB/core: Enhance ib_resolve_eth_dmac to be usable for creating AH
From: Leon Romanovsky @ 2016-11-23 6:23 UTC (permalink / raw)
To: dledford-H+wXaHxf7aLQT0dZR+AlfA
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, Moni Shoua
In-Reply-To: <1479882206-31212-1-git-send-email-leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
From: Moni Shoua <monis-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
The function ib_resolve_eth_dmac() takes a struct ib_qp_attr * and a
pointer to qp_attr_mask as parameters, but it is also useful for resolving
the dmac for address handles. This patch changes the signature of the
function so it can be used in the flow of creating an address handle.
Signed-off-by: Moni Shoua <monis-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reviewed-by: Yishai Hadas <yishaih-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
---
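The shape of the refactoring, as a userspace sketch under stated
assumptions: the resolver no longer looks at the attribute mask, so each
caller tests the address-vector bit itself. All _sketch names, the port
range and the mask value are stand-ins for illustration, not the kernel
definitions.

```c
#include <errno.h>
#include <stdint.h>

#define QP_AV_SKETCH   (1 << 7)	/* stand-in for IB_QP_AV */
#define NUM_PORTS_SKETCH 2	/* hypothetical port count */

struct ah_attr_sketch {
	uint8_t port_num;
	uint8_t dmac[6];
};

struct qp_attr_sketch {
	struct ah_attr_sketch ah_attr;
};

static int resolve_calls;	/* counts resolver invocations */

/* After the patch: operates on the AH attributes only, no mask. */
static int resolve_eth_dmac_sketch(struct ah_attr_sketch *ah)
{
	resolve_calls++;
	if (ah->port_num < 1 || ah->port_num > NUM_PORTS_SKETCH)
		return -EINVAL;
	return 0;
}

/* The caller now owns the mask check. */
static int modify_qp_sketch(struct qp_attr_sketch *attr, int mask)
{
	if (mask & QP_AV_SKETCH) {
		int ret = resolve_eth_dmac_sketch(&attr->ah_attr);

		if (ret)
			return ret;
	}
	return 0;
}
```

An address handle creation path can then call the resolver directly with
its own attributes, which is what the new signature is meant to allow.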
drivers/infiniband/core/core_priv.h | 3 --
drivers/infiniband/core/uverbs_cmd.c | 8 ++--
drivers/infiniband/core/verbs.c | 84 ++++++++++++++++++------------------
include/rdma/ib_verbs.h | 3 ++
4 files changed, 50 insertions(+), 48 deletions(-)
diff --git a/drivers/infiniband/core/core_priv.h b/drivers/infiniband/core/core_priv.h
index 19d499d..1acc95b 100644
--- a/drivers/infiniband/core/core_priv.h
+++ b/drivers/infiniband/core/core_priv.h
@@ -72,9 +72,6 @@ void ib_device_unregister_sysfs(struct ib_device *device);
void ib_cache_setup(void);
void ib_cache_cleanup(void);
-int ib_resolve_eth_dmac(struct ib_qp *qp,
- struct ib_qp_attr *qp_attr, int *qp_attr_mask);
-
typedef void (*roce_netdev_callback)(struct ib_device *device, u8 port,
struct net_device *idev, void *cookie);
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index cb3f515a..790af84 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -2402,9 +2402,11 @@ ssize_t ib_uverbs_modify_qp(struct ib_uverbs_file *file,
attr->alt_ah_attr.port_num = cmd.alt_dest.port_num;
if (qp->real_qp == qp) {
- ret = ib_resolve_eth_dmac(qp, attr, &cmd.attr_mask);
- if (ret)
- goto release_qp;
+ if (cmd.attr_mask & IB_QP_AV) {
+ ret = ib_resolve_eth_dmac(qp->device, &attr->ah_attr);
+ if (ret)
+ goto release_qp;
+ }
ret = qp->device->modify_qp(qp, attr,
modify_qp_mask(qp->qp_type, cmd.attr_mask), &udata);
} else {
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 8368764..83d01ef 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -1196,66 +1196,66 @@ int ib_modify_qp_is_ok(enum ib_qp_state cur_state, enum ib_qp_state next_state,
}
EXPORT_SYMBOL(ib_modify_qp_is_ok);
-int ib_resolve_eth_dmac(struct ib_qp *qp,
- struct ib_qp_attr *qp_attr, int *qp_attr_mask)
+int ib_resolve_eth_dmac(struct ib_device *device,
+ struct ib_ah_attr *ah_attr)
{
int ret = 0;
- if (*qp_attr_mask & IB_QP_AV) {
- if (qp_attr->ah_attr.port_num < rdma_start_port(qp->device) ||
- qp_attr->ah_attr.port_num > rdma_end_port(qp->device))
- return -EINVAL;
-
- if (!rdma_cap_eth_ah(qp->device, qp_attr->ah_attr.port_num))
- return 0;
-
- if (rdma_link_local_addr((struct in6_addr *)qp_attr->ah_attr.grh.dgid.raw)) {
- rdma_get_ll_mac((struct in6_addr *)qp_attr->ah_attr.grh.dgid.raw,
- qp_attr->ah_attr.dmac);
- } else {
- union ib_gid sgid;
- struct ib_gid_attr sgid_attr;
- int ifindex;
- int hop_limit;
-
- ret = ib_query_gid(qp->device,
- qp_attr->ah_attr.port_num,
- qp_attr->ah_attr.grh.sgid_index,
- &sgid, &sgid_attr);
-
- if (ret || !sgid_attr.ndev) {
- if (!ret)
- ret = -ENXIO;
- goto out;
- }
+ if (ah_attr->port_num < rdma_start_port(device) ||
+ ah_attr->port_num > rdma_end_port(device))
+ return -EINVAL;
- ifindex = sgid_attr.ndev->ifindex;
+ if (!rdma_cap_eth_ah(device, ah_attr->port_num))
+ return 0;
- ret = rdma_addr_find_l2_eth_by_grh(&sgid,
- &qp_attr->ah_attr.grh.dgid,
- qp_attr->ah_attr.dmac,
- NULL, &ifindex, &hop_limit);
+ if (rdma_link_local_addr((struct in6_addr *)ah_attr->grh.dgid.raw)) {
+ rdma_get_ll_mac((struct in6_addr *)ah_attr->grh.dgid.raw,
+ ah_attr->dmac);
+ } else {
+ union ib_gid sgid;
+ struct ib_gid_attr sgid_attr;
+ int ifindex;
+ int hop_limit;
+
+ ret = ib_query_gid(device,
+ ah_attr->port_num,
+ ah_attr->grh.sgid_index,
+ &sgid, &sgid_attr);
+
+ if (ret || !sgid_attr.ndev) {
+ if (!ret)
+ ret = -ENXIO;
+ goto out;
+ }
- dev_put(sgid_attr.ndev);
+ ifindex = sgid_attr.ndev->ifindex;
- qp_attr->ah_attr.grh.hop_limit = hop_limit;
- }
+ ret = rdma_addr_find_l2_eth_by_grh(&sgid,
+ &ah_attr->grh.dgid,
+ ah_attr->dmac,
+ NULL, &ifindex, &hop_limit);
+
+ dev_put(sgid_attr.ndev);
+
+ ah_attr->grh.hop_limit = hop_limit;
}
out:
return ret;
}
EXPORT_SYMBOL(ib_resolve_eth_dmac);
-
int ib_modify_qp(struct ib_qp *qp,
struct ib_qp_attr *qp_attr,
int qp_attr_mask)
{
- int ret;
- ret = ib_resolve_eth_dmac(qp, qp_attr, &qp_attr_mask);
- if (ret)
- return ret;
+ if (qp_attr_mask & IB_QP_AV) {
+ int ret;
+
+ ret = ib_resolve_eth_dmac(qp->device, &qp_attr->ah_attr);
+ if (ret)
+ return ret;
+ }
return qp->device->modify_qp(qp->real_qp, qp_attr, qp_attr_mask, NULL);
}
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 5ad43a4..a2f21e0 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -3357,4 +3357,7 @@ int ib_sg_to_pages(struct ib_mr *mr, struct scatterlist *sgl, int sg_nents,
void ib_drain_rq(struct ib_qp *qp);
void ib_drain_sq(struct ib_qp *qp);
void ib_drain_qp(struct ib_qp *qp);
+
+int ib_resolve_eth_dmac(struct ib_device *device,
+ struct ib_ah_attr *ah_attr);
#endif /* IB_VERBS_H */
--
2.7.4
^ permalink raw reply related
* [PATCH rdma-next 0/5] Optimize RoCE address handle creation for userspace
From: Leon Romanovsky @ 2016-11-23 6:23 UTC (permalink / raw)
To: dledford-H+wXaHxf7aLQT0dZR+AlfA; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA
Hi Doug,
Please find below the patchset from Moni.
--------------------------------------------------------------------------
Creating a UD address handle (user or kernel) when the link layer is Ethernet
requires resolving the remote L3 address (GID) to an L2 address (MAC/VLAN).
Doing it in the kernel is easy with an interface that the ib_addr module
provides. In userspace such an interface does not exist and kernel help
is required.
Until now the way to resolve a GID (which is the remote IP or a function
of it) to a MAC was with an interface supplied by libnl. The implementation
of this interface is heavy and fails under a heavy load of requests to create
an address handle.
This series of patches provides infrastructure that lets user drivers that
care to do so optimize the resolution of L3 to L2 addresses through the
uverbs interface.
------------------------------------------------------------------------
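For a link-local destination GID the L3 to L2 step needs no neighbour
lookup at all: patch 1/5 takes the rdma_get_ll_mac() branch in that case.
The sketch below shows the arithmetic as I understand it, assuming the
GID carries a modified EUI-64 interface ID in its low 8 bytes. The name
ll_gid_to_mac() is hypothetical and this is userspace code, not the
kernel helper.

```c
#include <stdint.h>
#include <string.h>

/*
 * Derive a MAC from a link-local GID whose interface ID is a modified
 * EUI-64: drop the ff:fe filler in bytes 11-12 and flip the
 * universal/local bit back.
 */
static void ll_gid_to_mac(const uint8_t gid[16], uint8_t mac[6])
{
	memcpy(mac, &gid[8], 3);	/* first three MAC bytes */
	memcpy(mac + 3, &gid[13], 3);	/* last three MAC bytes */
	mac[0] ^= 2;			/* undo the EUI-64 bit flip */
}
```

For example, fe80::0202:c9ff:fe12:3456 maps to 00:02:c9:12:34:56. Any
other GID still needs the neighbour lookup, which is the part that this
series moves behind the uverbs interface.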
Patch #3 was originally posted by Knut Omang, and Moni extended it to
support all available drivers along with an enhanced commit message. We are
sending the same patch as it exists in our review system, but feel free to
change authorship to Knut if it matters.
Thanks
Available in the "topic/create_ah" topic branch of this git repo:
git://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma.git
Or for browsing:
https://git.kernel.org/cgit/linux/kernel/git/leon/linux-rdma.git/log/?h=topic/create_ah
Moni Shoua (5):
IB/core: Enhance ib_resolve_eth_dmac to be usable for creating AH
IB/mlx5: Report that device supports user data response in create_ah
IB/core: Let the verb create_ah return extended response to user
IB/mlx5: Use kernel driver to help userspace create address handle
IB/mlx5: Make create/destroy address handle available to userspace
drivers/infiniband/core/core_priv.h | 3 -
drivers/infiniband/core/uverbs_cmd.c | 19 ++++--
drivers/infiniband/core/verbs.c | 86 ++++++++++++++--------------
drivers/infiniband/hw/cxgb3/iwch_provider.c | 3 +-
drivers/infiniband/hw/cxgb4/provider.c | 4 +-
drivers/infiniband/hw/hns/hns_roce_ah.c | 3 +-
drivers/infiniband/hw/hns/hns_roce_device.h | 3 +-
drivers/infiniband/hw/i40iw/i40iw_verbs.c | 4 +-
drivers/infiniband/hw/mlx4/ah.c | 4 +-
drivers/infiniband/hw/mlx4/mlx4_ib.h | 3 +-
drivers/infiniband/hw/mlx5/ah.c | 25 +++++++-
drivers/infiniband/hw/mlx5/main.c | 5 +-
drivers/infiniband/hw/mlx5/mlx5_ib.h | 3 +-
drivers/infiniband/hw/mthca/mthca_provider.c | 4 +-
drivers/infiniband/hw/nes/nes_verbs.c | 3 +-
drivers/infiniband/hw/ocrdma/ocrdma_ah.c | 3 +-
drivers/infiniband/hw/ocrdma/ocrdma_ah.h | 4 +-
drivers/infiniband/hw/qedr/verbs.c | 3 +-
drivers/infiniband/hw/qedr/verbs.h | 3 +-
drivers/infiniband/hw/usnic/usnic_ib_verbs.c | 4 +-
drivers/infiniband/hw/usnic/usnic_ib_verbs.h | 4 +-
drivers/infiniband/sw/rxe/rxe_verbs.c | 4 +-
include/rdma/ib_verbs.h | 6 +-
include/uapi/rdma/mlx5-abi.h | 7 +++
24 files changed, 140 insertions(+), 70 deletions(-)
--
2.7.4
^ permalink raw reply
* Re: [PATCH v2 02/11] IB/core: Change port_attr.sm_lid from 16 to 32 bits
From: Chandramouli, Dasaratharaman @ 2016-11-23 5:43 UTC (permalink / raw)
To: Jason Gunthorpe; +Cc: Ira Weiny, Don Hiatt, linux-rdma, Doug Ledford
In-Reply-To: <20161122212411.GC6484-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
On 11/22/2016 1:24 PM, Jason Gunthorpe wrote:
> On Tue, Nov 22, 2016 at 01:13:26PM -0800, Chandramouli, Dasaratharaman wrote:
>>
>>
>> On 11/22/2016 12:57 PM, Jason Gunthorpe wrote:
>>> On Tue, Nov 22, 2016 at 02:38:43PM -0500, Dasaratharaman Chandramouli wrote:
>>>> +++ b/drivers/infiniband/core/sa_query.c
>>>> @@ -958,7 +958,7 @@ static void update_sm_ah(struct work_struct *work)
>>>> pr_err("Couldn't find index for default PKey\n");
>>>>
>>>> memset(&ah_attr, 0, sizeof ah_attr);
>>>> - ah_attr.dlid = port_attr.sm_lid;
>>>> + ah_attr.dlid = (u16)port_attr.sm_lid;
>>>
>>> Why are we dropping bits here?
>>
>> Patch 3 increases ah_attr.dlid to 32 bits and the typecast is dropped.
>> We added it in this patch just to avoid compiler warnings, if any.
>
> It would be a lot better to fix this series so adding typecasts and
> then dropping them didn't happen so extensively. Just reordering some
> patches would do the trick. That would make it a lot less churny and
> easier to read..
Yes, will re-order them. Thanks
>
>>>> - sport->sm_lid = port_attr.sm_lid;
>>>> + sport->sm_lid = (u16)port_attr.sm_lid;
>>>> sport->lid = port_attr.lid;
>>>
>>> And here..
>
>> Patch 11 increases lid and sm_lid fields in struct srpt_port and the
>> typecast is dropped. We added it in this patch just to avoid
>> compiler warnings, if any.
>
> Eg if you do patch 11 first this hunk goes away.
>
> I also don't really care for all the added u16 casts, the compiler
> doesn't warn on implicit demotion without a higher warning level...
>
It's that higher warning level that we were a little concerned about.
If you feel very strongly about not having these casts, they can be
removed, but I would prefer leaving them there to silence certain
oddball compiler configurations.
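To make the "dropping bits" concern concrete, here is a small userspace
sketch of what the (u16) cast does to a LID that no longer fits in 16
bits. The function name is hypothetical. As far as I know, GCC only
reports the implicit form of this narrowing when -Wconversion is enabled,
which is the kind of higher warning level meant above.

```c
#include <stdint.h>

/*
 * Hypothetical helper: narrow a 32-bit LID to 16 bits the way the
 * explicit cast in the patch does. The upper 16 bits are discarded.
 */
static uint16_t lid_to_u16_sketch(uint32_t lid)
{
	return (uint16_t)lid;
}
```

A LID of 0x12345 comes out as 0x2345, so the cast silences the warning
without making the assignment any safer.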
>>>
>>>> +#define OPA_TO_IB_UCAST_LID(x) (((x) >= be16_to_cpu(IB_MULTICAST_LID_BASE)) \
>>>> + ? 0 : x)
>>>
>>> static inline function please.
>> Will change. Thanks.
>
> All of them please.
Will change.
>
> And I think you should re-think how this is being used, the pattern
> around OPA_TO_IB_UCAST_LID is copied too many times for my liking, and
> isn't this UAPI compatibility only?
They are only used when there is a user-space query into the kernel for
lid and sm_lid.
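A sketch of the requested static inline form of OPA_TO_IB_UCAST_LID, in
userspace C. It assumes the host-order value of IB_MULTICAST_LID_BASE is
0xC000; the constant and function names below are stand-ins, not the
final kernel names.

```c
#include <stdint.h>

/* Assumption: host-order value of IB_MULTICAST_LID_BASE. */
#define MULTICAST_LID_BASE_SKETCH 0xC000u

/*
 * Report a 32-bit OPA LID through a 16-bit IB field: LIDs at or above
 * the IB multicast base cannot be represented as IB unicast LIDs and
 * are reported as 0.
 */
static inline uint16_t opa_to_ib_ucast_lid_sketch(uint32_t lid)
{
	return lid >= MULTICAST_LID_BASE_SKETCH ? 0 : (uint16_t)lid;
}
```

Unlike the macro, the function evaluates its argument once and gives the
result a definite 16-bit type, so callers do not need their own casts.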
>
> Jason
>
^ permalink raw reply
* Re: [RFC 02/10] IB/hfi-vnic: Virtual Network Interface Controller (VNIC) Bus driver
From: Vishwanathapura, Niranjana @ 2016-11-23 0:53 UTC (permalink / raw)
To: Christoph Lameter
Cc: Jason Gunthorpe, Doug Ledford, linux-rdma, netdev,
Dennis Dalessandro
In-Reply-To: <alpine.DEB.2.20.1611221656390.8214@east.gentwo.org>
On Tue, Nov 22, 2016 at 05:04:37PM -0600, Christoph Lameter wrote:
>On Tue, 22 Nov 2016, Vishwanathapura, Niranjana wrote:
>
>> Ok, I do understand Jason's point that we should probably not put this driver
>> under drivers/infiniband/sw/.., as this driver is not a HCA.
>> It is a ULP similar to ipoib, built on top of Omni-path irrespective of
>> whether we register a hfi_vnic_bus or a direct custom interface with HFI1.
>> This ULP will transmit and receive Omni-path packets over the fabric, and is
>> dependent on IB MAD interface and the HFI1 driver.
>
>This is something that encapsulates IP (v4 right?) in something else.
>Would belong into
>
> linux/net/ipv4
>
>You already have similar implementations there
>
>See f.e. ipip.c, ip_tunnel.c and lots more (try
> ls linux/net/ipv4/*tunnel*
>
>)
>
>If this is more like a device then it would belong into
>
>linux/drivers/net/hfi or so (see also linux/drivers/net/ppp, plip,
>loopback, etc etc)
>
It is an Ethernet packet encapsulated in an Omni-path header by the hfi_vnic
driver. The packets are sent and received over the wire by the HFI1 device,
which is driven by the HFI1 driver. The encapsulation information is obtained
via the IB MAD control interface.
Niranjana
>
>
^ permalink raw reply
* Re: [RFC 02/10] IB/hfi-vnic: Virtual Network Interface Controller (VNIC) Bus driver
From: Jason Gunthorpe @ 2016-11-23 0:49 UTC (permalink / raw)
To: ira.weiny
Cc: Vishwanathapura, Niranjana, Doug Ledford, linux-rdma, netdev,
Dennis Dalessandro
In-Reply-To: <20161123000502.GA27968@phlsvsds.ph.intel.com>
On Tue, Nov 22, 2016 at 07:05:05PM -0500, ira.weiny wrote:
> On Tue, Nov 22, 2016 at 10:04:07AM -0700, Jason Gunthorpe wrote:
> > On Mon, Nov 21, 2016 at 05:53:04PM -0800, Vishwanathapura, Niranjana wrote:
> > > There are many example drivers in kernel which are using bus_register() in
> > > an initcall.
> >
> > There really are not, certainly not in major subsystems.
>
> I see 2 drivers in the Block subsystem which do this:
>
>
> 19 5354 /nfs/site/home/iweiny/linux-stable/drivers/block/cciss.c <<cciss_init>>
> err = bus_register(&cciss_bus_type);
> 20 6447 /nfs/site/home/iweiny/linux-stable/drivers/block/rbd.c <<rbd_sysfs_init>>
> ret = bus_register(&rbd_bus_type);
>
> 2 drivers in the drm subsystem which do this:
>
>
> 29 1155 /nfs/site/home/iweiny/linux-stable/drivers/gpu/drm/drm_mipi_dsi.c <<mipi_dsi_bus_init>>
> return bus_register(&mipi_dsi_bus_type);
> 30 242 /nfs/site/home/iweiny/linux-stable/drivers/gpu/host1x/dev.c <<tegra_host1x_init>>
> err = bus_register(&host1x_bus_type);
IMHO this is all obscure or legacy stuff (eg cciss lost its bus when
it was reworked into hpsa). Who knows about that SOC stuff, maybe
there really is a special on-chip bus under those drivers.
The point is using a bus as a generic interconnect between two driver
modules seems very rare, and is not what we have historically ever
done in drivers/infiniband - all our split drivers use a trivial
register scheme. eg see cxgb4_register_uld/mlx4_register_interface/etc.
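For readers who have not seen one, a trivial register scheme of this kind
can be sketched in a few lines of userspace C. This is not the cxgb4 or
mlx4 API; every name here (lower_dev, upper_ops, lower_register_upper(),
the vnic_* callbacks) is hypothetical, and there is no locking.

```c
#include <stddef.h>

#define MAX_DEVS 4

struct lower_dev {
	int id;
};

/* Callbacks the upper module hands to the lower module. */
struct upper_ops {
	void *(*add)(struct lower_dev *dev);
	void (*remove)(struct lower_dev *dev, void *ctx);
};

static struct lower_dev *devs[MAX_DEVS];
static void *ctxs[MAX_DEVS];
static const struct upper_ops *upper;

/* Lower module found a device: tell the upper module if one is bound. */
int lower_add_dev(struct lower_dev *dev)
{
	for (size_t i = 0; i < MAX_DEVS; i++) {
		if (!devs[i]) {
			devs[i] = dev;
			ctxs[i] = upper ? upper->add(dev) : NULL;
			return 0;
		}
	}
	return -1;
}

/* Upper module loads: replay add() for devices that already exist. */
int lower_register_upper(const struct upper_ops *ops)
{
	if (upper)
		return -1;
	upper = ops;
	for (size_t i = 0; i < MAX_DEVS; i++)
		if (devs[i])
			ctxs[i] = ops->add(devs[i]);
	return 0;
}

/* Upper module unloads: tear down every per-device context. */
void lower_unregister_upper(void)
{
	if (!upper)
		return;
	for (size_t i = 0; i < MAX_DEVS; i++) {
		if (devs[i]) {
			upper->remove(devs[i], ctxs[i]);
			ctxs[i] = NULL;
		}
	}
	upper = NULL;
}

/* Example upper module that only counts live contexts. */
static int live_ctx;

static void *vnic_add(struct lower_dev *dev)
{
	live_ctx++;
	return dev;
}

static void vnic_remove(struct lower_dev *dev, void *ctx)
{
	(void)dev;
	(void)ctx;
	live_ctx--;
}

static const struct upper_ops vnic_ops = {
	.add = vnic_add,
	.remove = vnic_remove,
};
```

Nothing here is visible to userspace, which is the contrast with a bus:
no sysfs entries and no uapi, but also no module auto-loading.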
Should a multi-function driver use a bus or class to connect its
parts? Who knows. Maybe Greg KH/etc has an opinion. But that is not
what we have been doing, it doesn't seem very simplifying, and
this series doesn't even make module auto-loading work...
Since doing this creates a bunch of uapis (again, from a driver, ugh) it
seems like a bad idea without more support as 'the right way'
.. and yes, it would be nice to have a lightweight mechanism to
replace those register functions that could handle module auto loading
too, and maybe that is a 'multi part driver bus/class' or somesuch
... This is really a topic for the device core maintainers, IMHO.
> > > We could add a custom Interface between HFI1 driver and hfi_vnic drivers
> > > without involving a bus.
> >
> > hfi is already registering on the infiniband class, just use that.
>
> I don't understand what you mean here?
Get the struct ib_device for the hfi and then do something to get hfi
specific function calls.
Or work it backwards with a _register function..
> [*] As an aside why does the ib_core not use this methodology? It dawned on
> me that this may be a better way to fix our module load problems. However, I
> have not looked into details.
ib_core is a class, which is appropriate. RDMA devices are not busses.
Jason
* Re: [PATCH] infiniband: hw: hfi1: constify mmu_notifier_ops structure
From: ira.weiny @ 2016-11-23 0:43 UTC (permalink / raw)
To: Bhumika Goyal
Cc: julia.lawall, mike.marciniszyn, dennis.dalessandro, dledford,
sean.hefty, hal.rosenstock, linux-rdma, linux-kernel
In-Reply-To: <1479548868-13563-1-git-send-email-bhumirks@gmail.com>
On Sat, Nov 19, 2016 at 03:17:48PM +0530, Bhumika Goyal wrote:
> Declare the structure mmu_notifier_ops as const as it is only stored in
> the ops field of a mmu_notifier structure. The ops field is of type
> const struct mmu_notifier_ops *, so mmu_notifier_ops structures having
> this property can be declared as const.
> Done using coccinelle:
> @r1 disable optional_qualifier @
> identifier i;
> position p;
> @@
> static struct mmu_notifier_ops i@p = {...};
>
> @ok1@
> identifier r1.i;
> position p;
> struct mmu_rb_handler handler;
> @@
> handler.mn.ops=&i@p
>
> @bad@
> position p!={r1.p,ok1.p};
> identifier r1.i;
> @@
> i@p
>
> @depends on !bad disable optional_qualifier@
> identifier r1.i;
> @@
> static
> +const
> struct mmu_notifier_ops i={...};
>
> @depends on !bad disable optional_qualifier@
> identifier r1.i;
> @@
> +const
> struct mmu_notifier_ops i;
>
> File size before:
>    text    data     bss     dec     hex filename
>    3566      72      16    3654     e46 drivers/infiniband/hw/hfi1/mmu_rb.o
>
> File size after:
>    text    data     bss     dec     hex filename
>    3658       0      16    3674     e5a drivers/infiniband/hw/hfi1/mmu_rb.o
>
> Signed-off-by: Bhumika Goyal <bhumirks@gmail.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
> ---
> drivers/infiniband/hw/hfi1/mmu_rb.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/infiniband/hw/hfi1/mmu_rb.c b/drivers/infiniband/hw/hfi1/mmu_rb.c
> index 7ad3089..ccbf52c 100644
> --- a/drivers/infiniband/hw/hfi1/mmu_rb.c
> +++ b/drivers/infiniband/hw/hfi1/mmu_rb.c
> @@ -81,7 +81,7 @@ static void do_remove(struct mmu_rb_handler *handler,
>  		      struct list_head *del_list);
>  static void handle_remove(struct work_struct *work);
>
> -static struct mmu_notifier_ops mn_opts = {
> +static const struct mmu_notifier_ops mn_opts = {
>  	.invalidate_page = mmu_notifier_page,
>  	.invalidate_range_start = mmu_notifier_range_start,
>  };
> --
> 1.9.1
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
* [PATCH 2/2] SRP transport, scsi-mq: Wait for .queue_rq() if necessary
From: Bart Van Assche @ 2016-11-23 0:17 UTC (permalink / raw)
To: James Bottomley, Martin K. Petersen
Cc: Doug Ledford, Christoph Hellwig, Sagi Grimberg, Max Gurtovoy,
linux-scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <39d8cb23-0406-e8c3-6e3a-a467ebe41470-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
Ensure that, if scsi-mq is enabled, scsi_internal_device_block()
waits until ongoing shost->hostt->queuecommand() calls have finished.
Signed-off-by: Bart Van Assche <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
Reviewed-by: Sagi Grimberg <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
Reviewed-by: Martin K. Petersen <martin.petersen-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
Cc: James Bottomley <jejb-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
Cc: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
Cc: Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
---
drivers/scsi/scsi_lib.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 84c9e61..11d082d 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -2872,7 +2872,7 @@ scsi_internal_device_block(struct scsi_device *sdev)
 	 * request queue.
 	 */
 	if (q->mq_ops) {
-		blk_mq_stop_hw_queues(q);
+		blk_mq_quiesce_queue(q);
 	} else {
 		spin_lock_irqsave(q->queue_lock, flags);
 		blk_stop_queue(q);
--
2.10.2
* [PATCH 1/2] SRP transport: Move queuecommand() wait code to SCSI core
From: Bart Van Assche @ 2016-11-23 0:17 UTC (permalink / raw)
To: James Bottomley, Martin K. Petersen
Cc: Doug Ledford, Christoph Hellwig, Sagi Grimberg, Max Gurtovoy,
linux-scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <39d8cb23-0406-e8c3-6e3a-a467ebe41470-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
Additionally, rename srp_wait_for_queuecommand() into
scsi_wait_for_queuecommand() and add a comment about the
queuecommand() call from scsi_send_eh_cmnd().
Note: this patch changes scsi_internal_device_block from a function
that did not sleep into a function that may sleep. This is fine for
all callers of this function:
* scsi_internal_device_block() is called from the mpt3sas driver while
that driver holds the ioc->dm_cmds.mutex. This means that the mpt3sas
driver calls this function from thread context.
* scsi_target_block() is called by __iscsi_block_session() from
kernel thread context and with IRQs enabled.
* The SRP transport code also calls scsi_target_block() from kernel
thread context while sleeping is allowed.
* The snic driver also calls scsi_target_block() from a context from
which sleeping is allowed. The scsi_target_block() call namely occurs
immediately after a scsi_flush_work() call.
Signed-off-by: Bart Van Assche <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
Reviewed-by: Sagi Grimberg <sagi-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
Reviewed-by: Martin K. Petersen <martin.petersen-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
Cc: James Bottomley <jejb-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
Cc: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
Cc: Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
---
drivers/scsi/scsi_lib.c | 41 +++++++++++++++++++++++++++++++++++++--
drivers/scsi/scsi_transport_srp.c | 41 ++++++---------------------------------
2 files changed, 45 insertions(+), 37 deletions(-)
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index b4f682c..84c9e61 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -2721,6 +2721,39 @@ void sdev_evt_send_simple(struct scsi_device *sdev,
 EXPORT_SYMBOL_GPL(sdev_evt_send_simple);
 
 /**
+ * scsi_request_fn_active() - number of kernel threads inside scsi_request_fn()
+ * @sdev: SCSI device to count the number of scsi_request_fn() callers for.
+ */
+static int scsi_request_fn_active(struct scsi_device *sdev)
+{
+	struct request_queue *q = sdev->request_queue;
+	int request_fn_active;
+
+	WARN_ON_ONCE(sdev->host->use_blk_mq);
+
+	spin_lock_irq(q->queue_lock);
+	request_fn_active = q->request_fn_active;
+	spin_unlock_irq(q->queue_lock);
+
+	return request_fn_active;
+}
+
+/**
+ * scsi_wait_for_queuecommand() - wait for ongoing queuecommand() calls
+ * @sdev: SCSI device pointer.
+ *
+ * Wait until the ongoing shost->hostt->queuecommand() calls that are
+ * invoked from scsi_request_fn() have finished.
+ */
+static void scsi_wait_for_queuecommand(struct scsi_device *sdev)
+{
+	WARN_ON_ONCE(sdev->host->use_blk_mq);
+
+	while (scsi_request_fn_active(sdev))
+		msleep(20);
+}
+
+/**
  * scsi_device_quiesce - Block user issued commands.
  * @sdev: scsi device to quiesce.
  *
@@ -2804,8 +2837,7 @@ EXPORT_SYMBOL(scsi_target_resume);
  * @sdev: device to block
  *
  * Block request made by scsi lld's to temporarily stop all
- * scsi commands on the specified device. Called from interrupt
- * or normal process context.
+ * scsi commands on the specified device. May sleep.
  *
  * Returns zero if successful or error if not
  *
@@ -2814,6 +2846,10 @@ EXPORT_SYMBOL(scsi_target_resume);
  * (which must be a legal transition). When the device is in this
  * state, all commands are deferred until the scsi lld reenables
  * the device with scsi_device_unblock or device_block_tmo fires.
+ *
+ * To do: avoid that scsi_send_eh_cmnd() calls queuecommand() after
+ * scsi_internal_device_block() has blocked a SCSI device and also
+ * remove the rport mutex lock and unlock calls from srp_queuecommand().
  */
 int
 scsi_internal_device_block(struct scsi_device *sdev)
@@ -2841,6 +2877,7 @@ scsi_internal_device_block(struct scsi_device *sdev)
 		spin_lock_irqsave(q->queue_lock, flags);
 		blk_stop_queue(q);
 		spin_unlock_irqrestore(q->queue_lock, flags);
+		scsi_wait_for_queuecommand(sdev);
 	}
 
 	return 0;
diff --git a/drivers/scsi/scsi_transport_srp.c b/drivers/scsi/scsi_transport_srp.c
index e3cd3ec..b48328a 100644
--- a/drivers/scsi/scsi_transport_srp.c
+++ b/drivers/scsi/scsi_transport_srp.c
@@ -24,7 +24,6 @@
 #include <linux/err.h>
 #include <linux/slab.h>
 #include <linux/string.h>
-#include <linux/delay.h>
 
 #include <scsi/scsi.h>
 #include <scsi/scsi_cmnd.h>
@@ -402,36 +401,6 @@ static void srp_reconnect_work(struct work_struct *work)
 	}
 }
 
-/**
- * scsi_request_fn_active() - number of kernel threads inside scsi_request_fn()
- * @shost: SCSI host for which to count the number of scsi_request_fn() callers.
- *
- * To do: add support for scsi-mq in this function.
- */
-static int scsi_request_fn_active(struct Scsi_Host *shost)
-{
-	struct scsi_device *sdev;
-	struct request_queue *q;
-	int request_fn_active = 0;
-
-	shost_for_each_device(sdev, shost) {
-		q = sdev->request_queue;
-
-		spin_lock_irq(q->queue_lock);
-		request_fn_active += q->request_fn_active;
-		spin_unlock_irq(q->queue_lock);
-	}
-
-	return request_fn_active;
-}
-
-/* Wait until ongoing shost->hostt->queuecommand() calls have finished. */
-static void srp_wait_for_queuecommand(struct Scsi_Host *shost)
-{
-	while (scsi_request_fn_active(shost))
-		msleep(20);
-}
-
 static void __rport_fail_io_fast(struct srp_rport *rport)
 {
 	struct Scsi_Host *shost = rport_to_shost(rport);
@@ -441,14 +410,17 @@ static void __rport_fail_io_fast(struct srp_rport *rport)
 
 	if (srp_rport_set_state(rport, SRP_RPORT_FAIL_FAST))
 		return;
+	/*
+	 * Call scsi_target_block() to wait for ongoing shost->queuecommand()
+	 * calls before invoking i->f->terminate_rport_io().
+	 */
+	scsi_target_block(rport->dev.parent);
 	scsi_target_unblock(rport->dev.parent, SDEV_TRANSPORT_OFFLINE);
 
 	/* Involve the LLD if possible to terminate all I/O on the rport. */
 	i = to_srp_internal(shost->transportt);
-	if (i->f->terminate_rport_io) {
-		srp_wait_for_queuecommand(shost);
+	if (i->f->terminate_rport_io)
 		i->f->terminate_rport_io(rport);
-	}
 }
 
 /**
@@ -576,7 +548,6 @@ int srp_reconnect_rport(struct srp_rport *rport)
 	if (res)
 		goto out;
 	scsi_target_block(&shost->shost_gendev);
-	srp_wait_for_queuecommand(shost);
 	res = rport->state != SRP_RPORT_LOST ? i->f->reconnect(rport) : -ENODEV;
 	pr_debug("%s (state %d): transport.reconnect() returned %d\n",
 		 dev_name(&shost->shost_gendev), rport->state, res);
--
2.10.2
* [PATCH 0/2] SRP transport, scsi-mq: Wait for .queue_rq() if necessary
From: Bart Van Assche @ 2016-11-23 0:16 UTC (permalink / raw)
To: James Bottomley, Martin K. Petersen
Cc: Doug Ledford, Christoph Hellwig, Sagi Grimberg, Max Gurtovoy,
linux-scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Hello James and Martin,
The SRP transport code must wait until ongoing .queuecommand() /
.queue_rq() callback function invocations have finished before
reconnecting at the transport layer level and also before invoking
.terminate_rport_io(). This is already the case for the single queue
path but not yet for the scsi-mq path. This patch series realizes the
proper serialization for the scsi-mq path. Compared to last time these
patches were posted, only the patch descriptions and one comment have
been changed.
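The ordering requirement described above (stop new dispatches first, then wait for the ones already in flight) can be sketched in userspace C as follows. This is only a single-threaded model under stated assumptions, not the kernel implementation: struct model_queue and all model_*() functions are hypothetical, the idle() callback stands in for msleep(20), and a real implementation would read the counter under q->queue_lock as the patch in this series does.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one request queue; not a kernel structure. */
struct model_queue {
	bool stopped;	/* set once the queue has been blocked */
	int active;	/* dispatch calls currently in progress */
};

/* Dispatch path: refuse new work once the queue is stopped. */
static bool model_dispatch_begin(struct model_queue *q)
{
	if (q->stopped)
		return false;
	q->active++;
	return true;
}

static void model_dispatch_end(struct model_queue *q)
{
	assert(q->active > 0);
	q->active--;
}

/*
 * Block path: stop the queue first so that no new dispatch can start, then
 * poll until the dispatches that were already running have drained.
 * idle() stands in for msleep(20). Returns the number of polls needed.
 */
static int model_block_and_wait(struct model_queue *q,
				void (*idle)(struct model_queue *q))
{
	int polls = 0;

	q->stopped = true;
	while (q->active > 0) {
		idle(q);
		polls++;
	}
	return polls;
}

/* Demo helper: pretend one in-flight dispatch finishes per poll interval. */
static void model_finish_one(struct model_queue *q)
{
	model_dispatch_end(q);
}
```

With two dispatches in flight, model_block_and_wait(&q, model_finish_one) polls twice and returns with q.active == 0, after which model_dispatch_begin() refuses new work. Only then would it be safe, in this model, to reconnect or to terminate outstanding I/O.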
See also "[PATCH v5 0/14] Fix race conditions related to stopping block
layer queues"
(https://www.mail-archive.com/linux-block-u79uwXL29TY76Z2rM5mHXA@public.gmane.org/msg01830.html)
for a previous post of these patches.
Bart.
* Re: [RFC 02/10] IB/hfi-vnic: Virtual Network Interface Controller (VNIC) Bus driver
From: Andrew Lunn @ 2016-11-23 0:06 UTC (permalink / raw)
To: Vishwanathapura, Niranjana
Cc: Jason Gunthorpe, Doug Ledford, linux-rdma, netdev,
Dennis Dalessandro
In-Reply-To: <20161122194918.GA69241@knc-06.sc.intel.com>
On Tue, Nov 22, 2016 at 11:49:18AM -0800, Vishwanathapura, Niranjana wrote:
> Ok, I do understand Jason's point that we should probably not put
> this driver under drivers/infiniband/sw/.., as this driver is not a
> HCA.
> It is a ULP similar to ipoib, built on top of Omni-path
> irrespective of whether we register a hfi_vnic_bus or a direct
> custom interface with HFI1.
> This ULP will transmit and receive Omni-path packets over the
> fabric, and is dependent on IB MAD interface and the HFI1 driver.
>
> Doug,
> Will it be acceptable if we put it under 'drivers/infiniband/ulp/hfi_vnic'?
How about turning this whole discussion around.
This is a network driver, so ask the network maintainer where he
wants it. Send the patch to David Miller <davem@davemloft.net> and
netdev with the question: where does this code belong?
Andrew
* Re: [RFC 02/10] IB/hfi-vnic: Virtual Network Interface Controller (VNIC) Bus driver
From: ira.weiny @ 2016-11-23 0:05 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Vishwanathapura, Niranjana, Doug Ledford,
linux-rdma-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
Dennis Dalessandro
In-Reply-To: <20161122170407.GE3956-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
On Tue, Nov 22, 2016 at 10:04:07AM -0700, Jason Gunthorpe wrote:
> On Mon, Nov 21, 2016 at 05:53:04PM -0800, Vishwanathapura, Niranjana wrote:
> > There are many example drivers in kernel which are using bus_register() in
> > an initcall.
>
> There really are not, certainly not in major subsystems.
I see 2 drivers in the Block subsystem which do this:
19 5354 /nfs/site/home/iweiny/linux-stable/drivers/block/cciss.c <<cciss_init>>
err = bus_register(&cciss_bus_type);
20 6447 /nfs/site/home/iweiny/linux-stable/drivers/block/rbd.c <<rbd_sysfs_init>>
ret = bus_register(&rbd_bus_type);
2 drivers in the drm subsystem which do this:
29 1155 /nfs/site/home/iweiny/linux-stable/drivers/gpu/drm/drm_mipi_dsi.c <<mipi_dsi_bus_init>>
return bus_register(&mipi_dsi_bus_type);
30 242 /nfs/site/home/iweiny/linux-stable/drivers/gpu/host1x/dev.c <<tegra_host1x_init>>
err = bus_register(&host1x_bus_type);
And I think there are a couple others.
I'm not sure what these devices/buses do but they are registering their own bus
while being in another major subsystem. Is what we are doing really so
crazy/wrong?
>
> > We could add a custom Interface between HFI1 driver and hfi_vnic drivers
> > without involving a bus.
>
> hfi is already registering on the infiniband class, just use that.
>
I don't understand what you mean here?
The bus_register provides a really clean way for the hfi1 driver and hfi_vnic
driver to find each other. This includes being able to support hfi1 with or
without hfi_vnic being loaded. Note that without configuration from the "EM"
Ethernet Manager the hfi_vnic does not export a net device.
Why wouldn't we use this core kernel support?[*]
> > But using the existing bus model gave a lot of in-built flexibility in
> > decoupling devices from the drivers.
>
> If you want to have your own bus then you need your own hfi
> subsystem. drivers/infiniband is not a dumping ground..
>
We don't consider drivers/infiniband a "dumping ground". There is a
requirement on ib_mad from the hfi_vnic driver.
Ira
[*] As an aside why does the ib_core not use this methodology? It dawned on
me that this may be a better way to fix our module load problems. However, I
have not looked into details.
> Jason
* Re: [PATCH v5 2/9] IB/core: Enforce PKey security on QPs
From: Daniel Jurgens @ 2016-11-22 23:39 UTC (permalink / raw)
To: James Morris
Cc: chrisw@sous-sol.org, paul@paul-moore.com, sds@tycho.nsa.gov,
eparis@parisplace.org, dledford@redhat.com, sean.hefty@intel.com,
hal.rosenstock@gmail.com, selinux@tycho.nsa.gov,
linux-security-module@vger.kernel.org, linux-rdma@vger.kernel.org,
Yevgeny Petrilin
In-Reply-To: <alpine.LRH.2.20.1611230923240.3895@namei.org>
On 11/22/2016 5:24 PM, James Morris wrote:
> On Tue, 22 Nov 2016, Dan Jurgens wrote:
>
>> From: Daniel Jurgens <danielj@mellanox.com>
>>
>> Add new LSM hooks to allocate and free security contexts and check for
>> permission to access a PKey.
> I guess Doug's is the best tree for these patches?
Maybe? In earlier versions I had kept the LSM, RDMA, and SELinux changes in separate patches, but Paul Moore thought it was best to squash them together functionally. Once everyone agrees they can be merged, they could all go to one tree. They apply cleanly to both Paul's and Doug's trees for sure.
>
>> ---
>> v2:
>> - Squashed LSM hook additions. Paul Moore
>> - Changed security blobs to void*. Paul Moore
>>
>> v3:
>> - Change parameter order of pkey_access hook. Paul Moore
>> ---
>> drivers/infiniband/core/Makefile | 3 +-
>> drivers/infiniband/core/cache.c | 21 +-
>> drivers/infiniband/core/core_priv.h | 77 +++++
>> drivers/infiniband/core/device.c | 33 ++
>> drivers/infiniband/core/security.c | 617 +++++++++++++++++++++++++++++++++++
>> drivers/infiniband/core/uverbs_cmd.c | 20 +-
>> drivers/infiniband/core/verbs.c | 27 +-
>> include/linux/lsm_hooks.h | 27 ++
>> include/linux/security.h | 21 ++
>> include/rdma/ib_verbs.h | 48 +++
>> security/Kconfig | 9 +
>> security/security.c | 31 ++
>> 12 files changed, 925 insertions(+), 9 deletions(-)
>> create mode 100644 drivers/infiniband/core/security.c
>
> Acked-by: James Morris <james.l.morris@oracle.com>
>
>
* Re: [PATCH 1/7] IB/rxe: Allocate enough space for an IPv6 addr
From: Bart Van Assche @ 2016-11-22 23:28 UTC (permalink / raw)
To: Yonatan Cohen, Andrew Boyer, monis-VPRAkNaXOzVWk0Htik3J/w,
linux-rdma-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <975a89bc-033e-14ab-72f2-4244c0205e59-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
On 11/22/2016 07:21 AM, Yonatan Cohen wrote:
> On 11/18/2016 4:36 PM, Andrew Boyer wrote:
>> Avoid smashing the stack when an ICRC error occurs on an IPv6 network.
>>
>> Signed-off-by: Andrew Boyer <andrew.boyer-8PEkshWhKlo@public.gmane.org>
>> ---
>> drivers/infiniband/sw/rxe/rxe_recv.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/infiniband/sw/rxe/rxe_recv.c
>> b/drivers/infiniband/sw/rxe/rxe_recv.c
>> index 46f0628..b40ab8d 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_recv.c
>> +++ b/drivers/infiniband/sw/rxe/rxe_recv.c
>> @@ -391,7 +391,7 @@ int rxe_rcv(struct sk_buff *skb)
>> payload_size(pkt));
>> calc_icrc = cpu_to_be32(~calc_icrc);
>> if (unlikely(calc_icrc != pack_icrc)) {
>> - char saddr[sizeof(struct in6_addr)];
>> + char saddr[64];
>>
>> if (skb->protocol == htons(ETH_P_IPV6))
>> sprintf(saddr, "%pI6", &ipv6_hdr(skb)->saddr);
>>
> You fixed a bug here, but I think the following would be better
> than hard-coding 64 bytes on the stack:
>
> --- a/drivers/infiniband/sw/rxe/rxe_recv.c
> +++ b/drivers/infiniband/sw/rxe/rxe_recv.c
> @@ -391,16 +391,14 @@ int rxe_rcv(struct sk_buff *skb)
> payload_size(pkt));
> calc_icrc = cpu_to_be32(~calc_icrc);
> if (unlikely(calc_icrc != pack_icrc)) {
> - char saddr[sizeof(struct in6_addr)];
>
> if (skb->protocol == htons(ETH_P_IPV6))
> - sprintf(saddr, "%pI6", &ipv6_hdr(skb)->saddr);
> + pr_warn_ratelimited("bad ICRC from %pI6\n",
> &ipv6_hdr(skb)->saddr);
> else if (skb->protocol == htons(ETH_P_IP))
> - sprintf(saddr, "%pI4", &ip_hdr(skb)->saddr);
> + pr_warn_ratelimited("bad ICRC from %pI4\n",
> &ip_hdr(skb)->saddr);
> else
> - sprintf(saddr, "unknown");
> + pr_warn_ratelimited("bad ICRC from unknown\n");
>
> - pr_warn_ratelimited("bad ICRC from %s\n", saddr);
> goto drop;
> }
Hello Yonatan,
Apparently our e-mails crossed. Anyway, have you considered using %pIS
instead of %pI4 / %pI6? See also
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=1067964305df.
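For context on the bug under discussion: sizeof(struct in6_addr) is the size of the binary address (16 bytes), while the text form that %pI6 produces is considerably longer, so a buffer sized with sizeof(struct in6_addr) overflows. The following userspace C sketch shows the arithmetic. The helper format_ipv6_full() and the macro IPV6_FULL_TEXT_LEN are hypothetical names, and snprintf() is used here instead of the kernel's %pI6 to produce the same uncompressed layout of eight colon-separated groups.

```c
#include <assert.h>
#include <netinet/in.h>
#include <stdio.h>

/* Uncompressed text form: 8 groups of 4 hex digits, 7 colons, plus the NUL. */
#define IPV6_FULL_TEXT_LEN (8 * 4 + 7 + 1)

/*
 * Hypothetical helper: write the uncompressed text form of addr into buf.
 * Returns the number of characters the full text needs (excluding the NUL),
 * as snprintf() does, so the caller can detect truncation.
 */
static int format_ipv6_full(const struct in6_addr *addr, char *buf, size_t len)
{
	const unsigned char *b = addr->s6_addr;

	assert(buf != NULL || len == 0);
	return snprintf(buf, len,
			"%02x%02x:%02x%02x:%02x%02x:%02x%02x:"
			"%02x%02x:%02x%02x:%02x%02x:%02x%02x",
			b[0], b[1], b[2], b[3], b[4], b[5], b[6], b[7],
			b[8], b[9], b[10], b[11], b[12], b[13], b[14], b[15]);
}
```

The formatted text needs 39 characters plus the terminating NUL, more than twice the 16 bytes that sizeof(struct in6_addr) reserves. Printing the address directly from the format string, as proposed in this thread, avoids the intermediate buffer altogether.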
Bart.