* [PATCH RFC v2 00/10] Introduce Signature feature
@ 2013-10-31 12:24 Sagi Grimberg
From: Sagi Grimberg @ 2013-10-31 12:24 UTC
To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
Cc: oren-VPRAkNaXOzVWk0Htik3J/w, tzahio-VPRAkNaXOzVWk0Htik3J/w
This patchset introduces verbs-level support for the signature handover
feature. Signature is intended to implement end-to-end data integrity
on a transactional basis in a completely offloaded manner.
There are several end-to-end data integrity methods used today in various
applications and/or upper layer protocols such as T10-DIF defined by SCSI
specifications (SBC), CRC32, XOR8 and more. This patchset adds verbs
support only for T10-DIF. The proposed framework allows adding more
signature methods in the future.
In T10-DIF, when a series of 512-byte data blocks is transferred, each
block is followed by an 8-byte guard. The guard consists of a CRC that
protects the integrity of the data in the block, and some other tags
that protect against misdirected I/Os.
Data can be protected when transferred over the wire, but can also be
protected in the memory of the sender/receiver. This allows true end-
to-end protection against bits flipping either over the wire, through
gateways, in memory, over PCI, etc.
While T10-DIF clearly defines that over the wire protection guards are
interleaved into the data stream (each 512-Byte block followed by 8-byte
guard), when in memory, the protection guards may reside in a buffer
separated from the data. Depending on the application, it is usually
easier to handle the data when it is contiguous. In this case the data
buffer will be of size 512xN and the protection buffer will be of size
8xN (where N is the number of blocks in the transaction).
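As a rough illustration of the two layouts described above, the sizes and
offsets can be sketched as follows. This is only a sketch: the macro and
helper names are hypothetical and are not part of this patchset.

```c
#include <stddef.h>

#define DIF_BLOCK_SIZE 512u /* data bytes per block */
#define DIF_GUARD_SIZE 8u   /* protection bytes per block */

/* Interleaved (wire) layout: total length for nblocks blocks. */
static size_t dif_interleaved_len(size_t nblocks)
{
	return nblocks * (DIF_BLOCK_SIZE + DIF_GUARD_SIZE);
}

/* Interleaved layout: byte offset of the guard that follows block idx. */
static size_t dif_interleaved_guard_off(size_t idx)
{
	return idx * (DIF_BLOCK_SIZE + DIF_GUARD_SIZE) + DIF_BLOCK_SIZE;
}

/* Separated (memory) layout: size of the contiguous data buffer. */
static size_t dif_data_buf_len(size_t nblocks)
{
	return nblocks * DIF_BLOCK_SIZE;
}

/* Separated (memory) layout: size of the protection buffer. */
static size_t dif_prot_buf_len(size_t nblocks)
{
	return nblocks * DIF_GUARD_SIZE;
}
```

For N = 8 this gives a 4096-byte data buffer plus a 64-byte protection
buffer in memory, versus 4160 bytes when interleaved on the wire.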
There are 3 kinds of signature handover operation:
1. Take unprotected data (from wire or memory) and ADD protection
guards.
2. Take protected data (from wire or memory), validate the data
integrity against the protection guards and STRIP the protection
guards.
3. Take protected data (from wire or memory), validate the data
integrity against the protection guards and PASS the data with
the guards as-is.
This translates to telling the HCA whether and how data protection exists
in the memory domain, and whether and how it exists in the wire domain.
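One way to read the three operations is as a function of whether guards
exist on the input side and on the output side of the handover. The sketch
below assumes exactly that reading; the enum and function names are
hypothetical and do not appear in the patches.

```c
#include <stdbool.h>

/* Hypothetical labels for the handover operations listed above. */
enum sig_op {
	SIG_OP_NONE,  /* no protection on either side: plain transfer */
	SIG_OP_ADD,   /* generate guards for unprotected input */
	SIG_OP_STRIP, /* validate guards, then drop them */
	SIG_OP_PASS,  /* validate guards, then forward them as-is */
};

/*
 * input_protected:  guards exist in the domain the data is read from.
 * output_protected: guards exist in the domain the data is written to.
 */
static enum sig_op sig_op_select(bool input_protected, bool output_protected)
{
	if (!input_protected && !output_protected)
		return SIG_OP_NONE;
	if (!input_protected)
		return SIG_OP_ADD;
	if (!output_protected)
		return SIG_OP_STRIP;
	return SIG_OP_PASS;
}
```

Whether "input" is the memory domain or the wire domain depends on the
direction of the transfer; the same table applies either way.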
The way that data integrity is performed is by using a new kind of
memory region: signature-enabled MR, and a new kind of work request:
REG_SIG_MR. The REG_SIG_MR WR operates on the signature-enabled MR,
and defines all the needed information for the signature handover
(data buffer, protection buffer if needed and signature attributes).
The result is an MR that can be used for data transfer as usual,
that will also add/validate/strip/pass protection guards.
When the data transfer is successfully completed, it does not mean
that there are no integrity errors. The user must afterwards check
the signature status of the handover operation using a new light-weight
verb.
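The point that a successful completion is not the same as a clean
integrity check can be sketched as a two-step decision. The structure and
names below are hypothetical stand-ins, not the verbs API; the 0/1 status
convention is taken from the ib_check_sig_status comment in patch 02/10.

```c
#include <stdbool.h>

/* Hypothetical summary of one signature-enabled transfer. */
struct sig_xfer_result {
	bool wc_success; /* the work completion reported success */
	int sig_status;  /* 0: no signature error, 1: signature error pending */
};

enum sig_io_verdict {
	SIG_IO_OK,
	SIG_IO_TRANSPORT_ERROR,
	SIG_IO_INTEGRITY_ERROR,
};

/* The signature status is only meaningful once the transfer completed. */
static enum sig_io_verdict sig_io_verdict(const struct sig_xfer_result *res)
{
	if (!res->wc_success)
		return SIG_IO_TRANSPORT_ERROR;
	if (res->sig_status != 0)
		return SIG_IO_INTEGRITY_ERROR;
	return SIG_IO_OK;
}
```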
This feature shall be used in storage upper layer protocols iSER/SRP
implementing end-to-end data integrity T10-DIF. Following this patchset,
we will soon submit krping patches which will demonstrate the usage of
these signature verbs.
Patchset summary:
- Introduce verbs for create/destroy memory regions supporting signature.
- Introduce IB core signature verbs API.
- Implement mr create/destroy verbs in mlx5 driver.
- Preparation patches for signature support in mlx5 driver.
- Implement signature handover work request in mlx5 driver.
- Implement signature error collection and handling in mlx5 driver.
Changes from v1:
- IB/core: Reduced sizeof ib_send_wr by using wr->sg_list for data
and dedicated ib_sge for protection guards buffer.
Currently sig_handover extension does not increase sizeof ib_send_wr
- IB/core: Change enum to int for container variables.
- IB/mlx5: Validate wr->num_sge=1 for REG_SIG_MR work request.
Changes from v0:
- Commit messages: Added more detailed explanation for signature work request.
- IB/core: Remove indirect memory registration enablement from create_mr.
Keep only signature enablement.
- IB/mlx5: Changed signature error processing via MR radix lookup.
Sagi Grimberg (10):
IB/core: Introduce protected memory regions
IB/core: Introduce Signature Verbs API
IB/mlx5, mlx5_core: Support for create_mr and destroy_mr
IB/mlx5: Initialize mlx5_ib_qp signature related
IB/mlx5: Break wqe handling to begin & finish routines
IB/mlx5: remove MTT access mode from umr flags helper function
IB/mlx5: Keep mlx5 MRs in a radix tree under device
IB/mlx5: Support IB_WR_REG_SIG_MR
IB/mlx5: Collect signature error completion
IB/mlx5: Publish support in signature feature
drivers/infiniband/core/verbs.c | 47 +++
drivers/infiniband/hw/mlx5/cq.c | 53 +++
drivers/infiniband/hw/mlx5/main.c | 12 +
drivers/infiniband/hw/mlx5/mlx5_ib.h | 14 +
drivers/infiniband/hw/mlx5/mr.c | 138 +++++++
drivers/infiniband/hw/mlx5/qp.c | 525 ++++++++++++++++++++++--
drivers/net/ethernet/mellanox/mlx5/core/main.c | 1 +
drivers/net/ethernet/mellanox/mlx5/core/mr.c | 84 ++++
include/linux/mlx5/cq.h | 1 +
include/linux/mlx5/device.h | 43 ++
include/linux/mlx5/driver.h | 35 ++
include/linux/mlx5/qp.h | 62 +++
include/rdma/ib_verbs.h | 165 ++++++++-
13 files changed, 1141 insertions(+), 39 deletions(-)
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* [PATCH RFC v2 01/10] IB/core: Introduce protected memory regions
@ 2013-10-31 12:24 Sagi Grimberg
From: Sagi Grimberg @ 2013-10-31 12:24 UTC
To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
Cc: oren-VPRAkNaXOzVWk0Htik3J/w, tzahio-VPRAkNaXOzVWk0Htik3J/w

This commit introduces verbs for creating/destroying memory
regions which will allow new types of memory key operations such
as protected memory registration.

Indirect memory registration is registering several (one
or more) pre-registered memory regions in a specific layout.
The indirect region may potentially describe several regions
and some repetition format between them.

Protected memory registration is registering a memory region
with various data integrity attributes which will describe protection
schemes that will be handled by the HCA in an offloaded manner.
These memory regions will be applicable for a new REG_SIG_MR
work request introduced later in this patchset.

In the future these routines may replace or implement current memory
region creation routines existing today:
- ib_reg_user_mr
- ib_alloc_fast_reg_mr
- ib_get_dma_mr
- ib_dereg_mr

Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/verbs.c | 39 +++++++++++++++++++++++++++++++++++++++
 include/rdma/ib_verbs.h         | 38 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 77 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 22192de..1d94a5c 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -1052,6 +1052,45 @@ int ib_dereg_mr(struct ib_mr *mr)
 }
 EXPORT_SYMBOL(ib_dereg_mr);
 
+struct ib_mr *ib_create_mr(struct ib_pd *pd,
+			   struct ib_mr_init_attr *mr_init_attr)
+{
+	struct ib_mr *mr;
+
+	if (!pd->device->create_mr)
+		return ERR_PTR(-ENOSYS);
+
+	mr = pd->device->create_mr(pd, mr_init_attr);
+
+	if (!IS_ERR(mr)) {
+		mr->device = pd->device;
+		mr->pd = pd;
+		mr->uobject = NULL;
+		atomic_inc(&pd->usecnt);
+		atomic_set(&mr->usecnt, 0);
+	}
+
+	return mr;
+}
+EXPORT_SYMBOL(ib_create_mr);
+
+int ib_destroy_mr(struct ib_mr *mr)
+{
+	struct ib_pd *pd;
+	int ret;
+
+	if (atomic_read(&mr->usecnt))
+		return -EBUSY;
+
+	pd = mr->pd;
+	ret = mr->device->destroy_mr(mr);
+	if (!ret)
+		atomic_dec(&pd->usecnt);
+
+	return ret;
+}
+EXPORT_SYMBOL(ib_destroy_mr);
+
 struct ib_mr *ib_alloc_fast_reg_mr(struct ib_pd *pd, int max_page_list_len)
 {
 	struct ib_mr *mr;
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 645c3ce..53f065d 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -925,6 +925,22 @@ enum ib_mr_rereg_flags {
 	IB_MR_REREG_ACCESS = (1<<2)
 };
 
+enum ib_mr_create_flags {
+	IB_MR_SIGNATURE_EN = 1,
+};
+
+/**
+ * ib_mr_init_attr - Memory region init attributes passed to routine
+ *     ib_create_mr.
+ * @max_reg_descriptors: max number of registration units that
+ *     may be used with UMR work requests.
+ * @flags: MR creation flags bit mask.
+ */
+struct ib_mr_init_attr {
+	int max_reg_descriptors;
+	int flags;
+};
+
 /**
  * struct ib_mw_bind - Parameters for a type 1 memory window bind operation.
  * @wr_id: Work request id.
@@ -1257,6 +1273,9 @@ struct ib_device {
 	int (*query_mr)(struct ib_mr *mr,
 			struct ib_mr_attr *mr_attr);
 	int (*dereg_mr)(struct ib_mr *mr);
+	int (*destroy_mr)(struct ib_mr *mr);
+	struct ib_mr * (*create_mr)(struct ib_pd *pd,
+				    struct ib_mr_init_attr *mr_init_attr);
 	struct ib_mr * (*alloc_fast_reg_mr)(struct ib_pd *pd,
 					    int max_page_list_len);
 	struct ib_fast_reg_page_list * (*alloc_fast_reg_page_list)(struct ib_device *device,
@@ -2092,6 +2111,25 @@ int ib_query_mr(struct ib_mr *mr, struct ib_mr_attr *mr_attr);
  */
 int ib_dereg_mr(struct ib_mr *mr);
 
+
+/**
+ * ib_create_mr - creates memory region that may be used for
+ *     direct or indirect registration models via UMR WR.
+ * @pd: The protection domain associated with the region.
+ * @mr_init_attr: memory region init attributes.
+ */
+struct ib_mr *ib_create_mr(struct ib_pd *pd,
+			   struct ib_mr_init_attr *mr_init_attr);
+
+/**
+ * ib_destroy_mr - Destroys a memory region that was created using
+ *     ib_create_mr and removes it from HW translation tables.
+ * @mr: The memory region to destroy.
+ *
+ * This function can fail, if the memory region has memory windows bound to it.
+ */
+int ib_destroy_mr(struct ib_mr *mr);
+
 /**
  * ib_alloc_fast_reg_mr - Allocates memory region usable with the
  *     IB_WR_FAST_REG_MR send work request.
-- 
1.7.1
* Re: [PATCH RFC v2 01/10] IB/core: Introduce protected memory regions
@ 2013-11-01 17:09 Bart Van Assche
From: Bart Van Assche @ 2013-11-01 17:09 UTC
To: Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA
Cc: oren-VPRAkNaXOzVWk0Htik3J/w, tzahio-VPRAkNaXOzVWk0Htik3J/w

On 31/10/2013 5:24, Sagi Grimberg wrote:
> +/**
> + * ib_mr_init_attr - Memory region init attributes passed to routine
> + *     ib_create_mr.
> + * @max_reg_descriptors: max number of registration units that
> + *     may be used with UMR work requests.
> + * @flags: MR creation flags bit mask.
> + */
> +struct ib_mr_init_attr {
> +	int max_reg_descriptors;
> +	int flags;
> +};

Is this the first patch that adds the abbreviation "UMR" to a header file
in include/rdma? If so, I think it's a good idea not only to mention the
abbreviation but also what UMR stands for.

Bart.
* Re: [PATCH RFC v2 01/10] IB/core: Introduce protected memory regions
@ 2013-11-03 12:14 Sagi Grimberg
From: Sagi Grimberg @ 2013-11-03 12:14 UTC
To: Bart Van Assche, linux-rdma-u79uwXL29TY76Z2rM5mHXA
Cc: oren-VPRAkNaXOzVWk0Htik3J/w, tzahio-VPRAkNaXOzVWk0Htik3J/w

On 11/1/2013 7:09 PM, Bart Van Assche wrote:
> On 31/10/2013 5:24, Sagi Grimberg wrote:
>> +/**
>> + * ib_mr_init_attr - Memory region init attributes passed to routine
>> + *     ib_create_mr.
>> + * @max_reg_descriptors: max number of registration units that
>> + *     may be used with UMR work requests.
>> + * @flags: MR creation flags bit mask.
>> + */
>> +struct ib_mr_init_attr {
>> +	int max_reg_descriptors;
>> +	int flags;
>> +};
>
> Is this the first patch that add the abbreviation "UMR" to a header
> file in include/rdma ? If so, I think it's a good idea not only to
> mention the abbreviation but also what UMR stands for.
>
> Bart.
>

You are correct, I prefer to remove this abbreviation UMR as it is not
tightly related to signature. The max_reg_descriptors parameter is the
equivalent of max_page_list_len of ib_alloc_fast_reg_mr(). The difference
is that this memory region can also register indirect memory descriptors
{key, addr, len} rather than u64 physical addresses. For example, a
signature-enabled memory region may register 2 descriptors: data and
protection.

I'll modify the explanation here in v3.

Sagi.
* [PATCH RFC v2 02/10] IB/core: Introduce Signature Verbs API
@ 2013-10-31 12:24 Sagi Grimberg
From: Sagi Grimberg @ 2013-10-31 12:24 UTC
To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
Cc: oren-VPRAkNaXOzVWk0Htik3J/w, tzahio-VPRAkNaXOzVWk0Htik3J/w

This commit introduces the verbs interface for signature-related
operations. A signature handover operation shall configure the
layouts of data and protection attributes both in memory and wire
domains.

Signature operations are:
- INSERT:
  Generate and insert protection information when handing over
  data from input space to output space.
- validate and STRIP:
  Validate protection information and remove it when handing over
  data from input space to output space.
- validate and PASS:
  Validate protection information and pass it when handing over
  data from input space to output space.

Once the signature handover operation is done, the HCA will
offload data integrity generation/validation while performing
the actual data transfer.

Additions:
1. HCA signature capabilities in device attributes
   Verbs provider supporting signature handover operations shall
   fill relevant fields in device attributes structure returned
   by ib_query_device.

2. QP creation flag IB_QP_CREATE_SIGNATURE_EN
   Creating a QP that will carry signature handover operations
   may require some special preparations from the verbs provider.
   So we add QP creation flag IB_QP_CREATE_SIGNATURE_EN to declare
   that the created QP may carry out signature handover operations.
   Expose signature support to verbs layer (no support for now).

3. New send work request IB_WR_REG_SIG_MR
   Signature handover work request. This WR will define the signature
   handover properties of the memory/wire domains as well as the domains
   layout. The purpose of this work request is to bind all the needed
   information for the signature operation:
   - data to be transferred: wr->sg_list.
     * The raw data, pre-registered to a single MR (normally, before
       signature, this MR would have been used directly for the data
       transfer). The user will pass the data sge via the existing
       sg_list member.
   - data protection guards: sig_handover.prot.
     * The data protection buffer, pre-registered to a single MR, which
       contains the data integrity guards of the raw data blocks.
       Note that it may not always exist, only in cases where the user is
       interested in storing protection guards in memory.
   - signature operation attributes: sig_handover.sig_attrs.
     * Tells the HCA how to validate/generate the protection information.

   Once the work request is executed, the memory region which
   will describe the signature transaction will be the sig_mr. The
   application can now go ahead and send the sig_mr.rkey or use the
   sig_mr.lkey for data transfer.

4. New Verb ib_check_sig_status
   check_sig_status Verb shall check if any signature errors
   are pending for a specific signature-enabled ib_mr.
   This Verb is a lightweight check and is allowed to be taken
   from interrupt context. Application must call this verb after
   it is known that the actual data transfer has finished.

Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/core/verbs.c |   8 +++
 include/rdma/ib_verbs.h         | 127 ++++++++++++++++++++++++++++++++++++++-
 2 files changed, 134 insertions(+), 1 deletions(-)

diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 1d94a5c..5636d65 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -1293,3 +1293,11 @@ int ib_dealloc_xrcd(struct ib_xrcd *xrcd)
 	return xrcd->device->dealloc_xrcd(xrcd);
 }
 EXPORT_SYMBOL(ib_dealloc_xrcd);
+
+int ib_check_sig_status(struct ib_mr *sig_mr,
+			struct ib_sig_err *sig_err)
+{
+	return sig_mr->device->check_sig_status ?
+		sig_mr->device->check_sig_status(sig_mr, sig_err) : -ENOSYS;
+}
+EXPORT_SYMBOL(ib_check_sig_status);
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 53f065d..19b37eb 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -116,7 +116,19 @@ enum ib_device_cap_flags {
 	IB_DEVICE_MEM_MGT_EXTENSIONS = (1<<21),
 	IB_DEVICE_BLOCK_MULTICAST_LOOPBACK = (1<<22),
 	IB_DEVICE_MEM_WINDOW_TYPE_2A = (1<<23),
-	IB_DEVICE_MEM_WINDOW_TYPE_2B = (1<<24)
+	IB_DEVICE_MEM_WINDOW_TYPE_2B = (1<<24),
+	IB_DEVICE_SIGNATURE_HANDOVER = (1<<25),
+};
+
+enum ib_signature_prot_cap {
+	IB_PROT_T10DIF_TYPE_1 = 1,
+	IB_PROT_T10DIF_TYPE_2 = 1 << 1,
+	IB_PROT_T10DIF_TYPE_3 = 1 << 2,
+};
+
+enum ib_signature_guard_cap {
+	IB_GUARD_T10DIF_CRC = 1,
+	IB_GUARD_T10DIF_CSUM = 1 << 1,
 };
 
 enum ib_atomic_cap {
@@ -166,6 +178,8 @@ struct ib_device_attr {
 	unsigned int max_fast_reg_page_list_len;
 	u16 max_pkeys;
 	u8 local_ca_ack_delay;
+	int sig_prot_cap;
+	int sig_guard_cap;
 };
 
 enum ib_mtu {
@@ -630,6 +644,7 @@ enum ib_qp_type {
 enum ib_qp_create_flags {
 	IB_QP_CREATE_IPOIB_UD_LSO = 1 << 0,
 	IB_QP_CREATE_BLOCK_MULTICAST_LOOPBACK = 1 << 1,
+	IB_QP_CREATE_SIGNATURE_EN = 1 << 2,
 	/* reserve bits 26-31 for low level drivers' internal use */
 	IB_QP_CREATE_RESERVED_START = 1 << 26,
 	IB_QP_CREATE_RESERVED_END = 1 << 31,
@@ -780,6 +795,7 @@ enum ib_wr_opcode {
 	IB_WR_MASKED_ATOMIC_CMP_AND_SWP,
 	IB_WR_MASKED_ATOMIC_FETCH_AND_ADD,
 	IB_WR_BIND_MW,
+	IB_WR_REG_SIG_MR,
 	/* reserve values for low level drivers' internal use.
 	 * These values will not be used at all in the ib core layer.
 	 */
@@ -885,6 +901,12 @@ struct ib_send_wr {
 			u32 rkey;
 			struct ib_mw_bind_info bind_info;
 		} bind_mw;
+		struct {
+			struct ib_sig_attrs *sig_attrs;
+			struct ib_mr *sig_mr;
+			int access_flags;
+			struct ib_sge *prot;
+		} sig_handover;
 	} wr;
 	u32 xrc_remote_srq_num; /* XRC TGT QPs only */
 };
@@ -941,6 +963,93 @@ struct ib_mr_init_attr {
 	int flags;
 };
 
+enum ib_signature_type {
+	IB_SIG_TYPE_T10_DIF,
+};
+
+/**
+ * T10-DIF Signature types
+ * T10-DIF types are defined by SCSI
+ * specifications.
+ */
+enum ib_t10_dif_type {
+	IB_T10DIF_NONE,
+	IB_T10DIF_TYPE1,
+	IB_T10DIF_TYPE2,
+	IB_T10DIF_TYPE3
+};
+
+/**
+ * Signature T10-DIF block-guard types
+ */
+enum ib_t10_dif_bg_type {
+	IB_T10DIF_CRC,
+	IB_T10DIF_CSUM
+};
+
+/**
+ * struct ib_sig_domain - Parameters specific for T10-DIF
+ *     domain.
+ * @sig_type: specific signauture type
+ * @sig: union of all signature domain attributes that may
+ *     be used to set domain layout.
+ * @dif:
+ * @type: T10-DIF type (0|1|2|3)
+ * @bg_type: T10-DIF block guard type (CRC|CSUM)
+ * @block_size: block size in signature domain.
+ * @app_tag: if app_tag is owned be the user,
+ *     HCA will take this value to be app_tag.
+ * @ref_tag: initial ref_tag of signature handover.
+ * @type3_inc_reftag: T10-DIF type 3 does not state
+ *     about the reference tag, it is the user
+ *     choice to increment it or not.
+ */
+struct ib_sig_domain {
+	enum ib_signature_type sig_type;
+	union {
+		struct {
+			enum ib_t10_dif_type type;
+			enum ib_t10_dif_bg_type bg_type;
+			u16 block_size;
+			u16 bg;
+			u16 app_tag;
+			u32 ref_tag;
+			bool type3_inc_reftag;
+		} dif;
+	} sig;
+};
+
+/**
+ * struct ib_sig_attrs - Parameters for signature handover operation
+ * @check_mask: bitmask for signature byte check (8 bytes)
+ * @mem: memory domain layout desciptor.
+ * @wire: wire domain layout desciptor.
+ */
+struct ib_sig_attrs {
+	u8 check_mask;
+	struct ib_sig_domain mem;
+	struct ib_sig_domain wire;
+};
+
+enum ib_sig_err_type {
+	IB_SIG_BAD_CRC,
+	IB_SIG_BAD_REFTAG,
+	IB_SIG_BAD_APPTAG,
+};
+
+/**
+ * struct ib_sig_err - signature error descriptor
+ */
+struct ib_sig_err {
+	enum ib_sig_err_type err_type;
+	u16 expected_guard;
+	u16 actual_guard;
+	u32 expected_logical_block;
+	u32 actual_logical_block;
+	u64 sig_err_offset;
+	u32 key;
+};
+
 /**
  * struct ib_mw_bind - Parameters for a type 1 memory window bind operation.
  * @wr_id: Work request id.
@@ -1319,6 +1428,8 @@ struct ib_device {
 				      struct ib_ucontext *ucontext,
 				      struct ib_udata *udata);
 	int (*dealloc_xrcd)(struct ib_xrcd *xrcd);
+	int (*check_sig_status)(struct ib_mr *sig_mr,
+				struct ib_sig_err *sig_err);
 
 	struct ib_dma_mapping_ops *dma_ops;
 
@@ -2298,4 +2409,18 @@ struct ib_xrcd *ib_alloc_xrcd(struct ib_device *device);
  */
 int ib_dealloc_xrcd(struct ib_xrcd *xrcd);
 
+/**
+ * ib_check_sig_result: lightweight check of signature result
+ *     on specific signature enabled MR and QP.
+ * Return value:
+ *   - 0 for signature status SUCCESS.
+ *   - 1 for signature status FAILURE.
+ *
+ * @sig_mr: The signature enabled MR that describes the
+ *     protected domain.
+ * @sig_err: The container of the signature error in
+ *     case of signature error indeed occured.
+ */
+int ib_check_sig_status(struct ib_mr *sig_mr, struct ib_sig_err *sig_err);
+
 #endif /* IB_VERBS_H */
-- 
1.7.1
* Re: [PATCH RFC v2 02/10] IB/core: Introduce Signature Verbs API
@ 2013-11-01 15:13 Bart Van Assche
From: Bart Van Assche @ 2013-11-01 15:13 UTC
To: Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA
Cc: oren-VPRAkNaXOzVWk0Htik3J/w, tzahio-VPRAkNaXOzVWk0Htik3J/w

On 31/10/2013 5:24, Sagi Grimberg wrote:
> +/**
> + * struct ib_sig_domain - Parameters specific for T10-DIF
> + *     domain.
> + * @sig_type: specific signauture type
> + * @sig: union of all signature domain attributes that may
> + *     be used to set domain layout.
> + * @dif:
> + * @type: T10-DIF type (0|1|2|3)
> + * @bg_type: T10-DIF block guard type (CRC|CSUM)
> + * @block_size: block size in signature domain.
> + * @app_tag: if app_tag is owned be the user,
> + *     HCA will take this value to be app_tag.
> + * @ref_tag: initial ref_tag of signature handover.
> + * @type3_inc_reftag: T10-DIF type 3 does not state
> + *     about the reference tag, it is the user
> + *     choice to increment it or not.
> + */
> +struct ib_sig_domain {
> +	enum ib_signature_type sig_type;
> +	union {
> +		struct {
> +			enum ib_t10_dif_type type;
> +			enum ib_t10_dif_bg_type bg_type;
> +			u16 block_size;
> +			u16 bg;
> +			u16 app_tag;
> +			u32 ref_tag;
> +			bool type3_inc_reftag;
> +		} dif;
> +	} sig;
> +};

My understanding from SPC-4 is that when using protection information,
such information is inserted after every protection interval. A protection
interval can be smaller than a logical block. Shouldn't the name
"block_size" be changed into something like "pi_interval" to avoid
confusion with the logical block size?

Bart.
* Re: [PATCH RFC v2 02/10] IB/core: Introduce Signature Verbs API
@ 2013-11-03 12:15 Sagi Grimberg
From: Sagi Grimberg @ 2013-11-03 12:15 UTC
To: Bart Van Assche, linux-rdma-u79uwXL29TY76Z2rM5mHXA
Cc: oren-VPRAkNaXOzVWk0Htik3J/w, tzahio-VPRAkNaXOzVWk0Htik3J/w

On 11/1/2013 5:13 PM, Bart Van Assche wrote:
> On 31/10/2013 5:24, Sagi Grimberg wrote:
>> +/**
>> + * struct ib_sig_domain - Parameters specific for T10-DIF
>> + *     domain.
>> + * @sig_type: specific signauture type
>> + * @sig: union of all signature domain attributes that may
>> + *     be used to set domain layout.
>> + * @dif:
>> + * @type: T10-DIF type (0|1|2|3)
>> + * @bg_type: T10-DIF block guard type (CRC|CSUM)
>> + * @block_size: block size in signature domain.
>> + * @app_tag: if app_tag is owned be the user,
>> + *     HCA will take this value to be app_tag.
>> + * @ref_tag: initial ref_tag of signature handover.
>> + * @type3_inc_reftag: T10-DIF type 3 does not state
>> + *     about the reference tag, it is the user
>> + *     choice to increment it or not.
>> + */
>> +struct ib_sig_domain {
>> +	enum ib_signature_type sig_type;
>> +	union {
>> +		struct {
>> +			enum ib_t10_dif_type type;
>> +			enum ib_t10_dif_bg_type bg_type;
>> +			u16 block_size;
>> +			u16 bg;
>> +			u16 app_tag;
>> +			u32 ref_tag;
>> +			bool type3_inc_reftag;
>> +		} dif;
>> +	} sig;
>> +};
>
> My understanding from SPC-4 is that in that when using protection
> information such information is inserted after every protection
> interval. A protection interval can be smaller than a logical block.
> Shouldn't the name "block_size" be changed into something like
> "pi_interval" to avoid confusion with the logical block size ?
>
> Bart.
>

True, for DIF types 2 and 3 the protection interval is not restricted to
the logical block length and may be smaller. I agree with the pi_interval
naming.

Note that pi_intervals smaller than 512 bytes are not supported at the
moment.

Sagi.
* Re: [PATCH RFC v2 02/10] IB/core: Introduce Signature Verbs API
@ 2013-11-01 18:46 Bart Van Assche
From: Bart Van Assche @ 2013-11-01 18:46 UTC
To: Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA
Cc: oren-VPRAkNaXOzVWk0Htik3J/w, tzahio-VPRAkNaXOzVWk0Htik3J/w

On 31/10/2013 5:24, Sagi Grimberg wrote:
> +/**
> + * Signature T10-DIF block-guard types
> + */
> +enum ib_t10_dif_bg_type {
> +	IB_T10DIF_CRC,
> +	IB_T10DIF_CSUM
> +};

In SPC-4 paragraph 4.22.4 I found that the T10-PI guard is the CRC
computed from the generator polynomial x^16 + x^15 + x^11 + x^9 + x^8 +
x^7 + x^5 + x^4 + x^2 + x + 1. Could you tell me where I can find which
guard computation method IB_T10DIF_CSUM corresponds to?

Bart.
* Re: [PATCH RFC v2 02/10] IB/core: Introduce Signature Verbs API
@ 2013-11-03 12:15 Sagi Grimberg
From: Sagi Grimberg @ 2013-11-03 12:15 UTC
To: Bart Van Assche, linux-rdma-u79uwXL29TY76Z2rM5mHXA
Cc: oren-VPRAkNaXOzVWk0Htik3J/w, tzahio-VPRAkNaXOzVWk0Htik3J/w

On 11/1/2013 8:46 PM, Bart Van Assche wrote:
> On 31/10/2013 5:24, Sagi Grimberg wrote:
>> +/**
>> + * Signature T10-DIF block-guard types
>> + */
>> +enum ib_t10_dif_bg_type {
>> +	IB_T10DIF_CRC,
>> +	IB_T10DIF_CSUM
>> +};
>
> In SPC-4 paragraph 4.22.4 I found that the T10-PI guard is the CRC
> computed from the generator polynomial x^16 + x^15 + x^11 + x^9 + x^8
> + x^7 + x^5 + x^4 + x^2 + x + 1. Could you tell me where I can find
> which guard computation method IB_T10DIF_CSUM corresponds to ?
>
> Bart.

The IB_T10DIF_CSUM computation method corresponds to IP checksum rules.
This is aligned with the SHOST_DIX_GUARD_IP guard type.

Sagi.
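For readers comparing the two guard methods discussed above, here is a
minimal userspace sketch of both computations. It is not code from this
patchset: the function names are hypothetical, the CRC uses the polynomial
quoted from SPC-4 (0x8BB7, with a zero initial value and no bit reflection
assumed), and the checksum is the generic 16-bit one's-complement Internet
checksum. How the result is stored in the guard field (byte order and so
on) is not covered in this thread and is left out.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * CRC16 over buf using the generator polynomial
 * x^16 + x^15 + x^11 + x^9 + x^8 + x^7 + x^5 + x^4 + x^2 + x + 1 (0x8BB7).
 * Assumes initial value 0, MSB-first processing, no final XOR.
 */
static uint16_t t10dif_crc16(const uint8_t *buf, size_t len)
{
	uint16_t crc = 0;

	for (size_t i = 0; i < len; i++) {
		crc ^= (uint16_t)(buf[i] << 8);
		for (int bit = 0; bit < 8; bit++) {
			if (crc & 0x8000)
				crc = (uint16_t)((crc << 1) ^ 0x8BB7);
			else
				crc = (uint16_t)(crc << 1);
		}
	}
	return crc;
}

/*
 * 16-bit one's-complement Internet checksum: sum big-endian 16-bit
 * words, fold the carries back in, and return the complement.
 */
static uint16_t ip_csum16(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)((buf[i] << 8) | buf[i + 1]);
	if (i < len) /* odd trailing byte is padded with zero */
		sum += (uint32_t)(buf[i] << 8);
	while (sum >> 16)
		sum = (sum & 0xFFFF) + (sum >> 16);
	return (uint16_t)~sum;
}
```

The checksum needs only additions per 16-bit word, whereas the bitwise CRC
above does eight shift/XOR steps per byte, which is consistent with the
performance remark in the scsi_host.h comment quoted later in the thread.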
* Re: [PATCH RFC v2 02/10] IB/core: Introduce Signature Verbs API [not found] ` <52763E68.2040605-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> @ 2013-11-03 14:41 ` Bart Van Assche [not found] ` <5276608D.2020605-HInyCGIudOg@public.gmane.org> 0 siblings, 1 reply; 45+ messages in thread From: Bart Van Assche @ 2013-11-03 14:41 UTC (permalink / raw) To: Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA Cc: oren-VPRAkNaXOzVWk0Htik3J/w, tzahio-VPRAkNaXOzVWk0Htik3J/w On 3/11/2013 4:15, Sagi Grimberg wrote: > On 11/1/2013 8:46 PM, Bart Van Assche wrote: >> On 31/10/2013 5:24, Sagi Grimberg wrote: >>> +/** >>> + * Signature T10-DIF block-guard types >>> + */ >>> +enum ib_t10_dif_bg_type { >>> + IB_T10DIF_CRC, >>> + IB_T10DIF_CSUM >>> +}; >> >> In SPC-4 paragraph 4.22.4 I found that the T10-PI guard is the CRC >> computed from the generator polynomial x^16 + x^15 + x^11 + x^9 + x^8 >> + x^7 + x^5 + x^4 + x^2 + x + 1. Could you tell me where I can find >> which guard computation method IB_T10DIF_CSUM corresponds to ? >> >> Bart. > > The IB_T10DIF_CSUM computation method corresponds to IP checksum rules. > this is aligned with SHOST_DIX_GUARD_IP guard type. Since the declarations added in <rdma/ib_verbs.h> constitute an interface definition I think it would help if it would be made more clear what these two symbols stand for. How about mentioning the names of the standards these two guard computation methods come from ? An alternative is to add a comment like the one above scsi_host_guard_type in <scsi/scsi_host.h> which explains the two guard computation methods well: /* * All DIX-capable initiators must support the T10-mandated CRC * checksum. Controllers can optionally implement the IP checksum * scheme which has much lower impact on system performance. Note * that the main rationale for the checksum is to match integrity * metadata with data. Detecting bit errors are a job for ECC memory * and buses. */ Bart. 
-- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 45+ messages in thread
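For readers following the guard-type question, the two computation methods named in this exchange can be sketched in plain userspace C. This is only an illustration, not code from the patchset: the names enum t10_dif_bg_type_sketch, guard_crc_t10dif() and guard_ip_csum() are hypothetical, the CRC is assumed to start from 0 with no bit reflection, and the IP checksum is assumed to be taken over big-endian 16-bit words and returned complemented.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the enum under review, carrying the kind of
 * per-value comment requested in the thread. */
enum t10_dif_bg_type_sketch {
	SKETCH_T10DIF_CRC,	/* CRC16 with the T10-DIF generator polynomial */
	SKETCH_T10DIF_CSUM	/* 16-bit one's-complement IP checksum */
};

/* Bitwise CRC16 over the polynomial quoted from SPC-4:
 * x^16 + x^15 + x^11 + x^9 + x^8 + x^7 + x^5 + x^4 + x^2 + x + 1,
 * which is 0x8BB7 once the implicit x^16 term is dropped.
 * Assumption: initial value 0, no input or output reflection. */
static uint16_t guard_crc_t10dif(const uint8_t *buf, size_t len)
{
	uint16_t crc = 0;
	size_t i;
	int bit;

	for (i = 0; i < len; i++) {
		crc ^= (uint16_t)(buf[i] << 8);
		for (bit = 0; bit < 8; bit++)
			crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x8BB7)
					     : (uint16_t)(crc << 1);
	}
	return crc;
}

/* One's-complement sum of big-endian 16-bit words, folded and inverted.
 * Intended for small buffers such as a 512-byte block. */
static uint16_t guard_ip_csum(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
	if (len & 1)
		sum += (uint32_t)buf[len - 1] << 8;	/* zero-pad an odd tail */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);	/* fold the carries back in */
	return (uint16_t)~sum;
}

/* Select the guard computation from the (hypothetical) type value. */
static uint16_t compute_guard(enum t10_dif_bg_type_sketch type,
			      const uint8_t *buf, size_t len)
{
	return type == SKETCH_T10DIF_CRC ? guard_crc_t10dif(buf, len)
					 : guard_ip_csum(buf, len);
}
```

The CRC costs eight shift/xor steps per byte in this naive form, while the checksum is a single addition per 16-bit word, which is consistent with the "much lower impact on system performance" remark in the scsi_host.h comment quoted above.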
[parent not found: <5276608D.2020605-HInyCGIudOg@public.gmane.org>]
* Re: [PATCH RFC v2 02/10] IB/core: Introduce Signature Verbs API [not found] ` <5276608D.2020605-HInyCGIudOg@public.gmane.org> @ 2013-11-03 16:30 ` Sagi Grimberg 0 siblings, 0 replies; 45+ messages in thread From: Sagi Grimberg @ 2013-11-03 16:30 UTC (permalink / raw) To: Bart Van Assche, linux-rdma-u79uwXL29TY76Z2rM5mHXA Cc: oren-VPRAkNaXOzVWk0Htik3J/w, tzahio-VPRAkNaXOzVWk0Htik3J/w On 11/3/2013 4:41 PM, Bart Van Assche wrote: > On 3/11/2013 4:15, Sagi Grimberg wrote: >> On 11/1/2013 8:46 PM, Bart Van Assche wrote: >>> On 31/10/2013 5:24, Sagi Grimberg wrote: >>>> +/** >>>> + * Signature T10-DIF block-guard types >>>> + */ >>>> +enum ib_t10_dif_bg_type { >>>> + IB_T10DIF_CRC, >>>> + IB_T10DIF_CSUM >>>> +}; >>> >>> In SPC-4 paragraph 4.22.4 I found that the T10-PI guard is the CRC >>> computed from the generator polynomial x^16 + x^15 + x^11 + x^9 + x^8 >>> + x^7 + x^5 + x^4 + x^2 + x + 1. Could you tell me where I can find >>> which guard computation method IB_T10DIF_CSUM corresponds to ? >>> >>> Bart. >> >> The IB_T10DIF_CSUM computation method corresponds to IP checksum rules. >> this is aligned with SHOST_DIX_GUARD_IP guard type. > > Since the declarations added in <rdma/ib_verbs.h> constitute an > interface definition I think it would help if it would be made more > clear what these two symbols stand for. How about mentioning the names > of the standards these two guard computation methods come from ? An > alternative is to add a comment like the one above > scsi_host_guard_type in <scsi/scsi_host.h> which explains the two > guard computation methods well: > > /* > * All DIX-capable initiators must support the T10-mandated CRC > * checksum. Controllers can optionally implement the IP checksum > * scheme which has much lower impact on system performance. Note > * that the main rationale for the checksum is to match integrity > * metadata with data. Detecting bit errors are a job for ECC memory > * and buses. > */ > > Bart. 
> Agreed, I'll comment on each type correspondence (T10-DIF CRC checksum and IP checksum). Sagi. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH RFC v2 02/10] IB/core: Introduce Signature Verbs API [not found] ` <1383222255-22699-3-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> 2013-11-01 15:13 ` Bart Van Assche 2013-11-01 18:46 ` Bart Van Assche @ 2013-11-01 22:23 ` Bart Van Assche [not found] ` <527429E7.7010705-HInyCGIudOg@public.gmane.org> 2 siblings, 1 reply; 45+ messages in thread From: Bart Van Assche @ 2013-11-01 22:23 UTC (permalink / raw) To: Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA Cc: oren-VPRAkNaXOzVWk0Htik3J/w, tzahio-VPRAkNaXOzVWk0Htik3J/w On 31/10/2013 5:24, Sagi Grimberg wrote: > + * @type3_inc_reftag: T10-DIF type 3 does not state > + * about the reference tag, it is the user > + * choice to increment it or not. Can you explain this further ? Does this mean that the HCA can check whether the reference tags are increasing when receiving data for TYPE 3 protection mode ? My understanding of SPC-4 is that the application is free to use the reference tag in any way when using TYPE 3 protection and hence that the HCA must not check whether the reference tag is increasing for TYPE 3 protection. See e.g. sd_dif_type3_get_tag() in drivers/scsi/sd_dif.c. Bart. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 45+ messages in thread
[parent not found: <527429E7.7010705-HInyCGIudOg@public.gmane.org>]
* Re: [PATCH RFC v2 02/10] IB/core: Introduce Signature Verbs API [not found] ` <527429E7.7010705-HInyCGIudOg@public.gmane.org> @ 2013-11-03 12:16 ` Sagi Grimberg 0 siblings, 0 replies; 45+ messages in thread From: Sagi Grimberg @ 2013-11-03 12:16 UTC (permalink / raw) To: Bart Van Assche, linux-rdma-u79uwXL29TY76Z2rM5mHXA Cc: oren-VPRAkNaXOzVWk0Htik3J/w, tzahio-VPRAkNaXOzVWk0Htik3J/w On 11/2/2013 12:23 AM, Bart Van Assche wrote: > On 31/10/2013 5:24, Sagi Grimberg wrote: >> + * @type3_inc_reftag: T10-DIF type 3 does not state >> + * about the reference tag, it is the user >> + * choice to increment it or not. > > Can you explain this further ? Does this mean that the HCA can check > whether the reference tags are increasing when receiving data for TYPE > 3 protection mode ? My understanding of SPC-4 is that the application > is free to use the reference tag in any way when using TYPE 3 > protection and hence that the HCA must not check whether the reference > tag is increasing for TYPE 3 protection. See e.g. > sd_dif_type3_get_tag() in drivers/scsi/sd_dif.c. > > Bart. As I understand TYPE 3, the reference tag is free for the application to use - which may choose to inc it each PI or not. This option allows the application to inc ref_tag in type 3. The DIF check is determined via check_mask. As I see it, correct use in case of DIF TYPE 3 is not to validate reference tag i.e. set REF_TAG bits in check_mask to zero. Sagi. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 45+ messages in thread
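The interaction between type3_inc_reftag and check_mask described in this reply can be sketched as follows. This is a rough userspace model under stated assumptions: the SK_CHECK_* bits, struct sk_dif_tuple and both helper functions are hypothetical names invented for illustration, and the real check_mask encoding is whatever the Signature Verbs API patch defines.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check-mask bits; the real encoding is not shown here. */
enum {
	SK_CHECK_GUARD  = 1 << 0,
	SK_CHECK_APPTAG = 1 << 1,
	SK_CHECK_REFTAG = 1 << 2,
};

/* The three fields of one 8-byte protection tuple. */
struct sk_dif_tuple {
	uint16_t guard;
	uint16_t app_tag;
	uint32_t ref_tag;
};

/* Reference tag expected for a given block: incremented per block when
 * the caller asked for it, otherwise held constant. */
static uint32_t sk_expected_ref_tag(uint32_t base, uint32_t block, bool inc)
{
	return inc ? base + block : base;
}

/* Compare only the fields selected by check_mask. */
static bool sk_dif_verify(const struct sk_dif_tuple *got,
			  const struct sk_dif_tuple *exp,
			  unsigned int check_mask)
{
	if ((check_mask & SK_CHECK_GUARD) && got->guard != exp->guard)
		return false;
	if ((check_mask & SK_CHECK_APPTAG) && got->app_tag != exp->app_tag)
		return false;
	if ((check_mask & SK_CHECK_REFTAG) && got->ref_tag != exp->ref_tag)
		return false;
	return true;
}
```

Under this model, a Type 3 user would leave SK_CHECK_REFTAG out of the mask, so an application-defined reference tag never causes a verification failure regardless of whether incrementing was requested.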
* [PATCH RFC v2 03/10] IB/mlx5, mlx5_core: Support for create_mr and destroy_mr [not found] ` <1383222255-22699-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> 2013-10-31 12:24 ` [PATCH RFC v2 01/10] IB/core: Introduce protected memory regions Sagi Grimberg 2013-10-31 12:24 ` [PATCH RFC v2 02/10] IB/core: Introduce Signature Verbs API Sagi Grimberg @ 2013-10-31 12:24 ` Sagi Grimberg [not found] ` <1383222255-22699-4-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> 2013-10-31 12:24 ` [PATCH RFC v2 04/10] IB/mlx5: Initialize mlx5_ib_qp signature related Sagi Grimberg ` (9 subsequent siblings) 12 siblings, 1 reply; 45+ messages in thread From: Sagi Grimberg @ 2013-10-31 12:24 UTC (permalink / raw) To: linux-rdma-u79uwXL29TY76Z2rM5mHXA Cc: oren-VPRAkNaXOzVWk0Htik3J/w, tzahio-VPRAkNaXOzVWk0Htik3J/w Support create_mr and destroy_mr verbs. Creating ib_mr may be done for either ib_mr that will register regular page lists like alloc_fast_reg_mr routine, or indirect ib_mr's that can register other (pre-registered) ib_mr's in an indirect manner. In addition user may request signature enable, that will mean that the created ib_mr may be attached with signature attributes (BSF, PSVs). Currently we only allow direct/indirect registration modes. 
Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> --- drivers/infiniband/hw/mlx5/main.c | 2 + drivers/infiniband/hw/mlx5/mlx5_ib.h | 4 + drivers/infiniband/hw/mlx5/mr.c | 109 ++++++++++++++++++++++++++ drivers/net/ethernet/mellanox/mlx5/core/mr.c | 64 +++++++++++++++ include/linux/mlx5/device.h | 25 ++++++ include/linux/mlx5/driver.h | 19 +++++ 6 files changed, 223 insertions(+), 0 deletions(-) diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c index 3f831de..2e67a37 100644 --- a/drivers/infiniband/hw/mlx5/main.c +++ b/drivers/infiniband/hw/mlx5/main.c @@ -1401,9 +1401,11 @@ static int init_one(struct pci_dev *pdev, dev->ib_dev.get_dma_mr = mlx5_ib_get_dma_mr; dev->ib_dev.reg_user_mr = mlx5_ib_reg_user_mr; dev->ib_dev.dereg_mr = mlx5_ib_dereg_mr; + dev->ib_dev.destroy_mr = mlx5_ib_destroy_mr; dev->ib_dev.attach_mcast = mlx5_ib_mcg_attach; dev->ib_dev.detach_mcast = mlx5_ib_mcg_detach; dev->ib_dev.process_mad = mlx5_ib_process_mad; + dev->ib_dev.create_mr = mlx5_ib_create_mr; dev->ib_dev.alloc_fast_reg_mr = mlx5_ib_alloc_fast_reg_mr; dev->ib_dev.alloc_fast_reg_page_list = mlx5_ib_alloc_fast_reg_page_list; dev->ib_dev.free_fast_reg_page_list = mlx5_ib_free_fast_reg_page_list; diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h index 836be91..45d7424 100644 --- a/drivers/infiniband/hw/mlx5/mlx5_ib.h +++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h @@ -262,6 +262,7 @@ struct mlx5_ib_mr { int npages; struct completion done; enum ib_wc_status status; + struct mlx5_core_sig_ctx *sig; }; struct mlx5_ib_fast_reg_page_list { @@ -489,6 +490,9 @@ struct ib_mr *mlx5_ib_reg_user_mr(struct ib_pd *pd, u64 start, u64 length, u64 virt_addr, int access_flags, struct ib_udata *udata); int mlx5_ib_dereg_mr(struct ib_mr *ibmr); +int mlx5_ib_destroy_mr(struct ib_mr *ibmr); +struct ib_mr *mlx5_ib_create_mr(struct ib_pd *pd, + struct ib_mr_init_attr *mr_init_attr); struct ib_mr 
*mlx5_ib_alloc_fast_reg_mr(struct ib_pd *pd, int max_page_list_len); struct ib_fast_reg_page_list *mlx5_ib_alloc_fast_reg_page_list(struct ib_device *ibdev, diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c index bd41df9..44f7e46 100644 --- a/drivers/infiniband/hw/mlx5/mr.c +++ b/drivers/infiniband/hw/mlx5/mr.c @@ -921,6 +921,115 @@ int mlx5_ib_dereg_mr(struct ib_mr *ibmr) return 0; } +struct ib_mr *mlx5_ib_create_mr(struct ib_pd *pd, + struct ib_mr_init_attr *mr_init_attr) +{ + struct mlx5_ib_dev *dev = to_mdev(pd->device); + struct mlx5_create_mkey_mbox_in *in; + struct mlx5_ib_mr *mr; + int access_mode, err; + int ndescs = roundup(mr_init_attr->max_reg_descriptors, 4); + + mr = kzalloc(sizeof(*mr), GFP_KERNEL); + if (!mr) + return ERR_PTR(-ENOMEM); + + in = kzalloc(sizeof(*in), GFP_KERNEL); + if (!in) { + err = -ENOMEM; + goto err_free; + } + + in->seg.status = 1 << 6; /* free */ + in->seg.xlt_oct_size = cpu_to_be32(ndescs); + in->seg.qpn_mkey7_0 = cpu_to_be32(0xffffff << 8); + in->seg.flags_pd = cpu_to_be32(to_mpd(pd)->pdn); + access_mode = MLX5_ACCESS_MODE_MTT; + + if (mr_init_attr->flags & IB_MR_SIGNATURE_EN) { + u32 psv_index[2]; + + in->seg.flags_pd = cpu_to_be32(be32_to_cpu(in->seg.flags_pd) | + MLX5_MKEY_BSF_EN); + in->seg.bsfs_octo_size = cpu_to_be32(MLX5_MKEY_BSF_OCTO_SIZE); + mr->sig = kzalloc(sizeof(*mr->sig), GFP_KERNEL); + if (!mr->sig) { + err = -ENOMEM; + goto err_free; + } + + /* create mem & wire PSVs */ + err = mlx5_core_create_psv(&dev->mdev, to_mpd(pd)->pdn, + 2, psv_index); + if (err) + goto err_free_sig; + + access_mode = MLX5_ACCESS_MODE_KLM; + mr->sig->psv_memory.psv_idx = psv_index[0]; + mr->sig->psv_wire.psv_idx = psv_index[1]; + } + + in->seg.flags = MLX5_PERM_UMR_EN | access_mode; + err = mlx5_core_create_mkey(&dev->mdev, &mr->mmr, in, sizeof(*in)); + kfree(in); + if (err) + goto err_destroy_psv; + + mr->ibmr.lkey = mr->mmr.key; + mr->ibmr.rkey = mr->mmr.key; + mr->umem = NULL; + + return &mr->ibmr; + 
+err_destroy_psv: + if (mr->sig) { + if (mlx5_core_destroy_psv(&dev->mdev, + mr->sig->psv_memory.psv_idx)) + mlx5_ib_warn(dev, "failed to destroy mem psv %d\n", + mr->sig->psv_memory.psv_idx); + if (mlx5_core_destroy_psv(&dev->mdev, + mr->sig->psv_wire.psv_idx)) + mlx5_ib_warn(dev, "failed to destroy wire psv %d\n", + mr->sig->psv_wire.psv_idx); + } +err_free_sig: + if (mr->sig) + kfree(mr->sig); +err_free: + kfree(mr); + return ERR_PTR(err); +} + +int mlx5_ib_destroy_mr(struct ib_mr *ibmr) +{ + struct mlx5_ib_dev *dev = to_mdev(ibmr->device); + struct mlx5_ib_mr *mr = to_mmr(ibmr); + int err; + + if (mr->sig) { + if (mlx5_core_destroy_psv(&dev->mdev, + mr->sig->psv_memory.psv_idx)) + mlx5_ib_warn(dev, "failed to destroy mem psv %d\n", + mr->sig->psv_memory.psv_idx); + if (mlx5_core_destroy_psv(&dev->mdev, + mr->sig->psv_wire.psv_idx)) + mlx5_ib_warn(dev, "failed to destroy wire psv %d\n", + mr->sig->psv_wire.psv_idx); + kfree(mr->sig); + } + + err = mlx5_core_destroy_mkey(&dev->mdev, &mr->mmr); + if (err) { + mlx5_ib_warn(dev, "failed to destroy mkey 0x%x (%d)\n", + mr->mmr.key, err); + return err; + } + + kfree(mr); + + return err; +} + struct ib_mr *mlx5_ib_alloc_fast_reg_mr(struct ib_pd *pd, int max_page_list_len) { diff --git a/drivers/net/ethernet/mellanox/mlx5/core/mr.c b/drivers/net/ethernet/mellanox/mlx5/core/mr.c index 5b44e2e..2ade604 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/mr.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/mr.c @@ -134,3 +134,67 @@ int mlx5_core_dump_fill_mkey(struct mlx5_core_dev *dev, struct mlx5_core_mr *mr, return err; } EXPORT_SYMBOL(mlx5_core_dump_fill_mkey); + +int mlx5_core_create_psv(struct mlx5_core_dev *dev, u32 pdn, + int npsvs, u32 *sig_index) +{ + struct mlx5_allocate_psv_in in; + struct mlx5_allocate_psv_out out; + int i, err; + + if (npsvs > MLX5_MAX_PSVS) { + err = -EINVAL; + goto out; + } + + memset(&in, 0, sizeof(in)); + memset(&out, 0, sizeof(out)); + + in.hdr.opcode = 
cpu_to_be16(MLX5_CMD_OP_CREATE_PSV); + in.npsv_pd = cpu_to_be32((npsvs << 28) | pdn); + err = mlx5_cmd_exec(dev, &in, sizeof(in), &out, sizeof(out)); + if (err) { + mlx5_core_err(dev, "cmd exec failed %d\n", err); + return err; + } + + if (out.hdr.status) { + mlx5_core_err(dev, "create_psv bad status %d\n", out.hdr.status); + return mlx5_cmd_status_to_err(&out.hdr); + } + + for (i = 0; i < npsvs; i++) + sig_index[i] = be32_to_cpu(out.psv_idx[i]) & 0xffffff; + +out: + return err; +} +EXPORT_SYMBOL(mlx5_core_create_psv); + +int mlx5_core_destroy_psv(struct mlx5_core_dev *dev, int psv_num) +{ + struct mlx5_destroy_psv_in in; + struct mlx5_destroy_psv_out out; + int err; + + memset(&in, 0, sizeof(in)); + memset(&out, 0, sizeof(out)); + + in.psv_number = cpu_to_be32(psv_num); + in.hdr.opcode = cpu_to_be16(MLX5_CMD_OP_DESTROY_PSV); + err = mlx5_cmd_exec(dev, &in, sizeof(in), &out, sizeof(out)); + if (err) { + mlx5_core_err(dev, "destroy_psv cmd exec failed %d\n", err); + goto out; + } + + if (out.hdr.status) { + mlx5_core_err(dev, "destroy_psv bad status %d\n", out.hdr.status); + err = mlx5_cmd_status_to_err(&out.hdr); + goto out; + } + +out: + return err; +} +EXPORT_SYMBOL(mlx5_core_destroy_psv); diff --git a/include/linux/mlx5/device.h b/include/linux/mlx5/device.h index 68029b3..aef7eed 100644 --- a/include/linux/mlx5/device.h +++ b/include/linux/mlx5/device.h @@ -48,6 +48,8 @@ enum { MLX5_MAX_COMMANDS = 32, MLX5_CMD_DATA_BLOCK_SIZE = 512, MLX5_PCI_CMD_XPORT = 7, + MLX5_MKEY_BSF_OCTO_SIZE = 4, + MLX5_MAX_PSVS = 4, }; enum { @@ -908,4 +910,27 @@ enum { MLX_EXT_PORT_CAP_FLAG_EXTENDED_PORT_INFO = 1 << 0 }; +struct mlx5_allocate_psv_in { + struct mlx5_inbox_hdr hdr; + __be32 npsv_pd; + __be32 rsvd_psv0; +}; + +struct mlx5_allocate_psv_out { + struct mlx5_outbox_hdr hdr; + u8 rsvd[8]; + __be32 psv_idx[4]; +}; + +struct mlx5_destroy_psv_in { + struct mlx5_inbox_hdr hdr; + __be32 psv_number; + u8 rsvd[4]; +}; + +struct mlx5_destroy_psv_out { + struct mlx5_outbox_hdr hdr; + 
u8 rsvd[8]; +}; + #endif /* MLX5_DEVICE_H */ diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h index 8888381..7c33487 100644 --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -398,6 +398,22 @@ struct mlx5_eq { struct mlx5_rsc_debug *dbg; }; +struct mlx5_core_psv { + u32 psv_idx; + struct psv_layout { + u32 pd; + u16 syndrome; + u16 reserved; + u16 bg; + u16 app_tag; + u32 ref_tag; + } psv; +}; + +struct mlx5_core_sig_ctx { + struct mlx5_core_psv psv_memory; + struct mlx5_core_psv psv_wire; +}; struct mlx5_core_mr { u64 iova; @@ -734,6 +750,9 @@ void mlx5_db_free(struct mlx5_core_dev *dev, struct mlx5_db *db); const char *mlx5_command_str(int command); int mlx5_cmdif_debugfs_init(struct mlx5_core_dev *dev); void mlx5_cmdif_debugfs_cleanup(struct mlx5_core_dev *dev); +int mlx5_core_create_psv(struct mlx5_core_dev *dev, u32 pdn, + int npsvs, u32 *sig_index); +int mlx5_core_destroy_psv(struct mlx5_core_dev *dev, int psv_num); static inline u32 mlx5_mkey_to_idx(u32 mkey) { -- 1.7.1 -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply related [flat|nested] 45+ messages in thread
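The field packing used by mlx5_core_create_psv() in this patch can be illustrated in isolation. The sketch below is a userspace approximation, not driver code: sk_pack_npsv_pd() and sk_psv_index() are hypothetical helpers, the cpu_to_be32()/be32_to_cpu() conversions are left out, and it assumes the PD number fits below bit 28 as the patch implicitly does.

```c
#include <stdint.h>

#define SK_MAX_PSVS 4	/* mirrors MLX5_MAX_PSVS from the patch */

/* Build the npsv_pd word: PSV count in the top 4 bits, PD number below.
 * Returns 0 on success, -1 if more PSVs are requested than supported. */
static int sk_pack_npsv_pd(uint32_t npsvs, uint32_t pdn, uint32_t *out)
{
	if (npsvs > SK_MAX_PSVS)
		return -1;
	*out = (npsvs << 28) | pdn;
	return 0;
}

/* Extract the 24-bit PSV index from a word returned by the device. */
static uint32_t sk_psv_index(uint32_t raw)
{
	return raw & 0xffffff;
}
```

In mlx5_ib_create_mr() the count is fixed at 2, one PSV for the memory domain and one for the wire domain.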
[parent not found: <1383222255-22699-4-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>]
* Re: [PATCH RFC v2 03/10] IB/mlx5, mlx5_core: Support for create_mr and destroy_mr [not found] ` <1383222255-22699-4-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> @ 2013-10-31 12:52 ` Jack Wang [not found] ` <52725299.7020105-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> 0 siblings, 1 reply; 45+ messages in thread From: Jack Wang @ 2013-10-31 12:52 UTC (permalink / raw) To: Sagi Grimberg Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, oren-VPRAkNaXOzVWk0Htik3J/w, tzahio-VPRAkNaXOzVWk0Htik3J/w On 10/31/2013 01:24 PM, Sagi Grimberg wrote: > Support create_mr and destroy_mr verbs. > Creating ib_mr may be done for either ib_mr that will > register regular page lists like alloc_fast_reg_mr routine, > or indirect ib_mr's that can register other (pre-registered) > ib_mr's in an indirect manner. > > In addition user may request signature enable, that will mean > that the created ib_mr may be attached with signature attributes > (BSF, PSVs). > > Currently we only allow direct/indirect registration modes. 
> > Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> > --- > drivers/infiniband/hw/mlx5/main.c | 2 + > drivers/infiniband/hw/mlx5/mlx5_ib.h | 4 + > drivers/infiniband/hw/mlx5/mr.c | 109 ++++++++++++++++++++++++++ > drivers/net/ethernet/mellanox/mlx5/core/mr.c | 64 +++++++++++++++ > include/linux/mlx5/device.h | 25 ++++++ > include/linux/mlx5/driver.h | 19 +++++ > 6 files changed, 223 insertions(+), 0 deletions(-) > > diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c > index 3f831de..2e67a37 100644 > --- a/drivers/infiniband/hw/mlx5/main.c > +++ b/drivers/infiniband/hw/mlx5/main.c > @@ -1401,9 +1401,11 @@ static int init_one(struct pci_dev *pdev, > dev->ib_dev.get_dma_mr = mlx5_ib_get_dma_mr; > dev->ib_dev.reg_user_mr = mlx5_ib_reg_user_mr; > dev->ib_dev.dereg_mr = mlx5_ib_dereg_mr; > + dev->ib_dev.destroy_mr = mlx5_ib_destroy_mr; > dev->ib_dev.attach_mcast = mlx5_ib_mcg_attach; > dev->ib_dev.detach_mcast = mlx5_ib_mcg_detach; > dev->ib_dev.process_mad = mlx5_ib_process_mad; > + dev->ib_dev.create_mr = mlx5_ib_create_mr; > dev->ib_dev.alloc_fast_reg_mr = mlx5_ib_alloc_fast_reg_mr; > dev->ib_dev.alloc_fast_reg_page_list = mlx5_ib_alloc_fast_reg_page_list; > dev->ib_dev.free_fast_reg_page_list = mlx5_ib_free_fast_reg_page_list; > diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h > index 836be91..45d7424 100644 > --- a/drivers/infiniband/hw/mlx5/mlx5_ib.h > +++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h > @@ -262,6 +262,7 @@ struct mlx5_ib_mr { > int npages; > struct completion done; > enum ib_wc_status status; > + struct mlx5_core_sig_ctx *sig; > }; > > struct mlx5_ib_fast_reg_page_list { > @@ -489,6 +490,9 @@ struct ib_mr *mlx5_ib_reg_user_mr(struct ib_pd *pd, u64 start, u64 length, > u64 virt_addr, int access_flags, > struct ib_udata *udata); > int mlx5_ib_dereg_mr(struct ib_mr *ibmr); > +int mlx5_ib_destroy_mr(struct ib_mr *ibmr); > +struct ib_mr 
*mlx5_ib_create_mr(struct ib_pd *pd, > + struct ib_mr_init_attr *mr_init_attr); > struct ib_mr *mlx5_ib_alloc_fast_reg_mr(struct ib_pd *pd, > int max_page_list_len); > struct ib_fast_reg_page_list *mlx5_ib_alloc_fast_reg_page_list(struct ib_device *ibdev, > diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c > index bd41df9..44f7e46 100644 > --- a/drivers/infiniband/hw/mlx5/mr.c > +++ b/drivers/infiniband/hw/mlx5/mr.c > @@ -921,6 +921,115 @@ int mlx5_ib_dereg_mr(struct ib_mr *ibmr) > return 0; > } > > +struct ib_mr *mlx5_ib_create_mr(struct ib_pd *pd, > + struct ib_mr_init_attr *mr_init_attr) > +{ > + struct mlx5_ib_dev *dev = to_mdev(pd->device); > + struct mlx5_create_mkey_mbox_in *in; > + struct mlx5_ib_mr *mr; > + int access_mode, err; > + int ndescs = roundup(mr_init_attr->max_reg_descriptors, 4); > + > + mr = kzalloc(sizeof(*mr), GFP_KERNEL); > + if (!mr) > + return ERR_PTR(-ENOMEM); > + > + in = kzalloc(sizeof(*in), GFP_KERNEL); > + if (!in) { > + err = -ENOMEM; > + goto err_free; > + } > + > + in->seg.status = 1 << 6; /* free */ > + in->seg.xlt_oct_size = cpu_to_be32(ndescs); > + in->seg.qpn_mkey7_0 = cpu_to_be32(0xffffff << 8); > + in->seg.flags_pd = cpu_to_be32(to_mpd(pd)->pdn); > + access_mode = MLX5_ACCESS_MODE_MTT; > + > + if (mr_init_attr->flags & IB_MR_SIGNATURE_EN) { > + u32 psv_index[2]; > + > + in->seg.flags_pd = cpu_to_be32(be32_to_cpu(in->seg.flags_pd) | > + MLX5_MKEY_BSF_EN); > + in->seg.bsfs_octo_size = cpu_to_be32(MLX5_MKEY_BSF_OCTO_SIZE); > + mr->sig = kzalloc(sizeof(*mr->sig), GFP_KERNEL); > + if (!mr->sig) { > + err = -ENOMEM; > + goto err_free; > + } > + > + /* create mem & wire PSVs */ > + err = mlx5_core_create_psv(&dev->mdev, to_mpd(pd)->pdn, > + 2, psv_index); > + if (err) > + goto err_free_sig; > + > + access_mode = MLX5_ACCESS_MODE_KLM; > + mr->sig->psv_memory.psv_idx = psv_index[0]; > + mr->sig->psv_wire.psv_idx = psv_index[1]; > + } > + > + in->seg.flags = MLX5_PERM_UMR_EN | access_mode; > + err = 
mlx5_core_create_mkey(&dev->mdev, &mr->mmr, in, sizeof(*in)); > + kfree(in); > + if (err) > + goto err_destroy_psv; > + > + mr->ibmr.lkey = mr->mmr.key; > + mr->ibmr.rkey = mr->mmr.key; > + mr->umem = NULL; > + > + return &mr->ibmr; > + > +err_destroy_psv: > + if (mr->sig) { > + if (mlx5_core_destroy_psv(&dev->mdev, > + mr->sig->psv_memory.psv_idx)) > + mlx5_ib_warn(dev, "failed to destroy mem psv %d\n", > + mr->sig->psv_memory.psv_idx); > + if (mlx5_core_destroy_psv(&dev->mdev, > + mr->sig->psv_wire.psv_idx)) > + mlx5_ib_warn(dev, "failed to destroy wire psv %d\n", > + mr->sig->psv_wire.psv_idx); > + } > +err_free_sig: > + if (mr->sig) > + kfree(mr->sig); > +err_free: > + kfree(mr); > + return ERR_PTR(err); > +} > +

There is a memory leak in this function: you forgot to kfree(in) in the error cases.

Jack
[parent not found: <52725299.7020105-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>]
* Re: [PATCH RFC v2 03/10] IB/mlx5, mlx5_core: Support for create_mr and destroy_mr [not found] ` <52725299.7020105-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> @ 2013-10-31 12:59 ` Sagi Grimberg 0 siblings, 0 replies; 45+ messages in thread From: Sagi Grimberg @ 2013-10-31 12:59 UTC (permalink / raw) To: Jack Wang Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, oren-VPRAkNaXOzVWk0Htik3J/w, tzahio-VPRAkNaXOzVWk0Htik3J/w On 10/31/2013 2:52 PM, Jack Wang wrote: > On 10/31/2013 01:24 PM, Sagi Grimberg wrote: >> Support create_mr and destroy_mr verbs. >> Creating ib_mr may be done for either ib_mr that will >> register regular page lists like alloc_fast_reg_mr routine, >> or indirect ib_mr's that can register other (pre-registered) >> ib_mr's in an indirect manner. >> >> In addition user may request signature enable, that will mean >> that the created ib_mr may be attached with signature attributes >> (BSF, PSVs). >> >> Currently we only allow direct/indirect registration modes. >> >> Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> >> --- >> drivers/infiniband/hw/mlx5/main.c | 2 + >> drivers/infiniband/hw/mlx5/mlx5_ib.h | 4 + >> drivers/infiniband/hw/mlx5/mr.c | 109 ++++++++++++++++++++++++++ >> drivers/net/ethernet/mellanox/mlx5/core/mr.c | 64 +++++++++++++++ >> include/linux/mlx5/device.h | 25 ++++++ >> include/linux/mlx5/driver.h | 19 +++++ >> 6 files changed, 223 insertions(+), 0 deletions(-) >> >> diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c >> index 3f831de..2e67a37 100644 >> --- a/drivers/infiniband/hw/mlx5/main.c >> +++ b/drivers/infiniband/hw/mlx5/main.c >> @@ -1401,9 +1401,11 @@ static int init_one(struct pci_dev *pdev, >> dev->ib_dev.get_dma_mr = mlx5_ib_get_dma_mr; >> dev->ib_dev.reg_user_mr = mlx5_ib_reg_user_mr; >> dev->ib_dev.dereg_mr = mlx5_ib_dereg_mr; >> + dev->ib_dev.destroy_mr = mlx5_ib_destroy_mr; >> dev->ib_dev.attach_mcast = mlx5_ib_mcg_attach; >> dev->ib_dev.detach_mcast = mlx5_ib_mcg_detach; 
>> dev->ib_dev.process_mad = mlx5_ib_process_mad; >> + dev->ib_dev.create_mr = mlx5_ib_create_mr; >> dev->ib_dev.alloc_fast_reg_mr = mlx5_ib_alloc_fast_reg_mr; >> dev->ib_dev.alloc_fast_reg_page_list = mlx5_ib_alloc_fast_reg_page_list; >> dev->ib_dev.free_fast_reg_page_list = mlx5_ib_free_fast_reg_page_list; >> diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h >> index 836be91..45d7424 100644 >> --- a/drivers/infiniband/hw/mlx5/mlx5_ib.h >> +++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h >> @@ -262,6 +262,7 @@ struct mlx5_ib_mr { >> int npages; >> struct completion done; >> enum ib_wc_status status; >> + struct mlx5_core_sig_ctx *sig; >> }; >> >> struct mlx5_ib_fast_reg_page_list { >> @@ -489,6 +490,9 @@ struct ib_mr *mlx5_ib_reg_user_mr(struct ib_pd *pd, u64 start, u64 length, >> u64 virt_addr, int access_flags, >> struct ib_udata *udata); >> int mlx5_ib_dereg_mr(struct ib_mr *ibmr); >> +int mlx5_ib_destroy_mr(struct ib_mr *ibmr); >> +struct ib_mr *mlx5_ib_create_mr(struct ib_pd *pd, >> + struct ib_mr_init_attr *mr_init_attr); >> struct ib_mr *mlx5_ib_alloc_fast_reg_mr(struct ib_pd *pd, >> int max_page_list_len); >> struct ib_fast_reg_page_list *mlx5_ib_alloc_fast_reg_page_list(struct ib_device *ibdev, >> diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c >> index bd41df9..44f7e46 100644 >> --- a/drivers/infiniband/hw/mlx5/mr.c >> +++ b/drivers/infiniband/hw/mlx5/mr.c >> @@ -921,6 +921,115 @@ int mlx5_ib_dereg_mr(struct ib_mr *ibmr) >> return 0; >> } >> >> +struct ib_mr *mlx5_ib_create_mr(struct ib_pd *pd, >> + struct ib_mr_init_attr *mr_init_attr) >> +{ >> + struct mlx5_ib_dev *dev = to_mdev(pd->device); >> + struct mlx5_create_mkey_mbox_in *in; >> + struct mlx5_ib_mr *mr; >> + int access_mode, err; >> + int ndescs = roundup(mr_init_attr->max_reg_descriptors, 4); >> + >> + mr = kzalloc(sizeof(*mr), GFP_KERNEL); >> + if (!mr) >> + return ERR_PTR(-ENOMEM); >> + >> + in = kzalloc(sizeof(*in), GFP_KERNEL); 
>> + if (!in) { >> + err = -ENOMEM; >> + goto err_free; >> + } >> + >> + in->seg.status = 1 << 6; /* free */ >> + in->seg.xlt_oct_size = cpu_to_be32(ndescs); >> + in->seg.qpn_mkey7_0 = cpu_to_be32(0xffffff << 8); >> + in->seg.flags_pd = cpu_to_be32(to_mpd(pd)->pdn); >> + access_mode = MLX5_ACCESS_MODE_MTT; >> + >> + if (mr_init_attr->flags & IB_MR_SIGNATURE_EN) { >> + u32 psv_index[2]; >> + >> + in->seg.flags_pd = cpu_to_be32(be32_to_cpu(in->seg.flags_pd) | >> + MLX5_MKEY_BSF_EN); >> + in->seg.bsfs_octo_size = cpu_to_be32(MLX5_MKEY_BSF_OCTO_SIZE); >> + mr->sig = kzalloc(sizeof(*mr->sig), GFP_KERNEL); >> + if (!mr->sig) { >> + err = -ENOMEM; >> + goto err_free; >> + } >> + >> + /* create mem & wire PSVs */ >> + err = mlx5_core_create_psv(&dev->mdev, to_mpd(pd)->pdn, >> + 2, psv_index); >> + if (err) >> + goto err_free_sig; >> + >> + access_mode = MLX5_ACCESS_MODE_KLM; >> + mr->sig->psv_memory.psv_idx = psv_index[0]; >> + mr->sig->psv_wire.psv_idx = psv_index[1]; >> + } >> + >> + in->seg.flags = MLX5_PERM_UMR_EN | access_mode; >> + err = mlx5_core_create_mkey(&dev->mdev, &mr->mmr, in, sizeof(*in)); >> + kfree(in); >> + if (err) >> + goto err_destroy_psv; >> + >> + mr->ibmr.lkey = mr->mmr.key; >> + mr->ibmr.rkey = mr->mmr.key; >> + mr->umem = NULL; >> + >> + return &mr->ibmr; >> + >> +err_destroy_psv: >> + if (mr->sig) { >> + if (mlx5_core_destroy_psv(&dev->mdev, >> + mr->sig->psv_memory.psv_idx)) >> + mlx5_ib_warn(dev, "failed to destroy mem psv %d\n", >> + mr->sig->psv_memory.psv_idx); >> + if (mlx5_core_destroy_psv(&dev->mdev, >> + mr->sig->psv_wire.psv_idx)) >> + mlx5_ib_warn(dev, "failed to destroy wire psv %d\n", >> + mr->sig->psv_wire.psv_idx); >> + } >> +err_free_sig: >> + if (mr->sig) >> + kfree(mr->sig); >> +err_free: >> + kfree(mr); >> + return ERR_PTR(err); >> +} >> + > There are memory leak in this function, you forget to kfree in in error > case. > > Jack Nice Catch! Thanks! I'll make sure to fix and resend. Sagi. 
-- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 45+ messages in thread
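One way the leak Jack pointed out could be closed is an extra unwind label that frees the mailbox on every error path. The sketch below models that in userspace and is not the author's eventual fix: sk_zalloc(), sk_free(), struct sk_mr, the err_free_in label and the fail_step parameter are all hypothetical, with calloc()/free() standing in for kzalloc()/kfree() and the PSV and mkey commands reduced to forced failures.

```c
#include <errno.h>
#include <stdlib.h>

static int sk_live_allocs;	/* outstanding allocations in this sketch */

static void *sk_zalloc(size_t size)
{
	void *p = calloc(1, size);

	if (p)
		sk_live_allocs++;
	return p;
}

static void sk_free(void *p)
{
	if (p) {
		sk_live_allocs--;
		free(p);
	}
}

struct sk_mr {
	void *sig;	/* stands in for the signature context */
};

/* fail_step forces a failure: 0 = none, 1 = PSV creation, 2 = mkey creation. */
static struct sk_mr *sk_create_mr(int want_sig, int fail_step, int *errp)
{
	struct sk_mr *mr;
	void *in;
	int err;

	mr = sk_zalloc(sizeof(*mr));
	if (!mr) {
		*errp = -ENOMEM;
		return NULL;
	}

	in = sk_zalloc(64);	/* stands in for the create_mkey mailbox */
	if (!in) {
		err = -ENOMEM;
		goto err_free;
	}

	if (want_sig) {
		mr->sig = sk_zalloc(32);
		if (!mr->sig) {
			err = -ENOMEM;
			goto err_free_in;	/* the mailbox must not leak here */
		}
		if (fail_step == 1) {
			err = -EIO;
			goto err_free_sig;	/* nor here */
		}
	}

	if (fail_step == 2) {
		err = -EIO;
		goto err_destroy_psv;
	}

	sk_free(in);	/* success: the mailbox is no longer needed */
	*errp = 0;
	return mr;

err_destroy_psv:
	/* the real code would destroy the memory and wire PSVs here */
err_free_sig:
	sk_free(mr->sig);
err_free_in:
	sk_free(in);
err_free:
	sk_free(mr);
	*errp = err;
	return NULL;
}

static void sk_destroy_mr(struct sk_mr *mr)
{
	sk_free(mr->sig);
	sk_free(mr);
}
```

With a single exit path owning the free of the mailbox on failure, each error branch only has to pick the label matching how far setup got.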
* [PATCH RFC v2 04/10] IB/mlx5: Initialize mlx5_ib_qp signature related [not found] ` <1383222255-22699-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> ` (2 preceding siblings ...) 2013-10-31 12:24 ` [PATCH RFC v2 03/10] IB/mlx5, mlx5_core: Support for create_mr and destroy_mr Sagi Grimberg @ 2013-10-31 12:24 ` Sagi Grimberg 2013-10-31 12:24 ` [PATCH RFC v2 05/10] IB/mlx5: Break wqe handling to begin & finish routines Sagi Grimberg ` (8 subsequent siblings) 12 siblings, 0 replies; 45+ messages in thread From: Sagi Grimberg @ 2013-10-31 12:24 UTC (permalink / raw) To: linux-rdma-u79uwXL29TY76Z2rM5mHXA Cc: oren-VPRAkNaXOzVWk0Htik3J/w, tzahio-VPRAkNaXOzVWk0Htik3J/w If the user requested signature enable, we initialize the relevant mlx5_ib_qp members: we mark the qp as sig_enable, we initialize an empty sig_err_list, and we increase the qp size. Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> --- drivers/infiniband/hw/mlx5/mlx5_ib.h | 3 +++ drivers/infiniband/hw/mlx5/qp.c | 5 +++++ include/linux/mlx5/qp.h | 1 + 3 files changed, 9 insertions(+), 0 deletions(-) diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h index 45d7424..758f0e1 100644 --- a/drivers/infiniband/hw/mlx5/mlx5_ib.h +++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h @@ -189,6 +189,9 @@ struct mlx5_ib_qp { int create_type; u32 pa_lkey; + + /* Store signature errors */ + bool signature_en; }; struct mlx5_ib_cq_buf { diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c index 045f8cd..c80122e 100644 --- a/drivers/infiniband/hw/mlx5/qp.c +++ b/drivers/infiniband/hw/mlx5/qp.c @@ -734,6 +734,11 @@ static int create_qp_common(struct mlx5_ib_dev *dev, struct ib_pd *pd, spin_lock_init(&qp->sq.lock); spin_lock_init(&qp->rq.lock); + if (init_attr->create_flags == IB_QP_CREATE_SIGNATURE_EN) { + init_attr->cap.max_send_wr *= MLX5_SIGNATURE_SQ_MULT; + qp->signature_en = true; + } + if (init_attr->sq_sig_type == IB_SIGNAL_ALL_WR) 
qp->sq_signal_bits = MLX5_WQE_CTRL_CQ_UPDATE; diff --git a/include/linux/mlx5/qp.h b/include/linux/mlx5/qp.h index d9e3eac..174805c 100644 --- a/include/linux/mlx5/qp.h +++ b/include/linux/mlx5/qp.h @@ -37,6 +37,7 @@ #include <linux/mlx5/driver.h> #define MLX5_INVALID_LKEY 0x100 +#define MLX5_SIGNATURE_SQ_MULT 3 enum mlx5_qp_optpar { MLX5_QP_OPTPAR_ALT_ADDR_PATH = 1 << 0, -- 1.7.1 -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply related [flat|nested] 45+ messages in thread
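The send-queue sizing done in this patch can be sketched on its own. The factor of 3 follows from the series' statement that a signature work request needs multiple (3) WQEs. Note that the patch compares create_flags with ==, which matches only when no other create flag is set; the sketch below uses a bitwise test instead, purely as an assumption about one possible alternative and not as the author's choice. All SK_* names and flag bit values are hypothetical.

```c
#include <stdint.h>

#define SK_QP_CREATE_OTHER		(1u << 0)	/* hypothetical unrelated flag */
#define SK_QP_CREATE_SIGNATURE_EN	(1u << 1)	/* hypothetical bit value */
#define SK_SIGNATURE_SQ_MULT		3		/* mirrors MLX5_SIGNATURE_SQ_MULT */

/* Send-queue depth to allocate for a requested number of work requests. */
static uint32_t sk_sq_depth(uint32_t create_flags, uint32_t max_send_wr)
{
	if (create_flags & SK_QP_CREATE_SIGNATURE_EN)
		return max_send_wr * SK_SIGNATURE_SQ_MULT;
	return max_send_wr;
}
```

A caller asking for 16 send work requests on a signature-enabled QP would therefore get room for 48 WQEs under this model.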
* [PATCH RFC v2 05/10] IB/mlx5: Break wqe handling to begin & finish routines [not found] ` <1383222255-22699-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> ` (3 preceding siblings ...) 2013-10-31 12:24 ` [PATCH RFC v2 04/10] IB/mlx5: Initialize mlx5_ib_qp signature related Sagi Grimberg @ 2013-10-31 12:24 ` Sagi Grimberg 2013-10-31 12:24 ` [PATCH RFC v2 06/10] IB/mlx5: remove MTT access mode from umr flags helper function Sagi Grimberg ` (7 subsequent siblings) 12 siblings, 0 replies; 45+ messages in thread From: Sagi Grimberg @ 2013-10-31 12:24 UTC (permalink / raw) To: linux-rdma-u79uwXL29TY76Z2rM5mHXA Cc: oren-VPRAkNaXOzVWk0Htik3J/w, tzahio-VPRAkNaXOzVWk0Htik3J/w As a preliminary step for the signature feature, which will require posting multiple (3) WQEs for a single WR, we break the post_send routine's WQE indexing into begin and finish routines. This patch does not change any functionality. Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> --- drivers/infiniband/hw/mlx5/qp.c | 95 ++++++++++++++++++++++++--------------- 1 files changed, 59 insertions(+), 36 deletions(-) diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c index c80122e..dc8d9fc 100644 --- a/drivers/infiniband/hw/mlx5/qp.c +++ b/drivers/infiniband/hw/mlx5/qp.c @@ -1983,6 +1983,57 @@ static u8 get_fence(u8 fence, struct ib_send_wr *wr) } } +static int begin_wqe(struct mlx5_ib_qp *qp, void **seg, + struct mlx5_wqe_ctrl_seg **ctrl, + struct ib_send_wr *wr, int *idx, + int *size, int nreq) +{ + int err = 0; + if (unlikely(mlx5_wq_overflow(&qp->sq, nreq, qp->ibqp.send_cq))) { + err = -ENOMEM; + return err; + } + + *idx = qp->sq.cur_post & (qp->sq.wqe_cnt - 1); + *seg = mlx5_get_send_wqe(qp, *idx); + *ctrl = *seg; + *(uint32_t *)(*seg + 8) = 0; + (*ctrl)->imm = send_ieth(wr); + (*ctrl)->fm_ce_se = qp->sq_signal_bits | + (wr->send_flags & IB_SEND_SIGNALED ? + MLX5_WQE_CTRL_CQ_UPDATE : 0) | + (wr->send_flags & IB_SEND_SOLICITED ? 
+ MLX5_WQE_CTRL_SOLICITED : 0); + + *seg += sizeof(**ctrl); + *size = sizeof(**ctrl) / 16; + + return err; +} + +static void finish_wqe(struct mlx5_ib_qp *qp, + struct mlx5_wqe_ctrl_seg *ctrl, + u8 size, unsigned idx, u64 wr_id, + int *nreq, u8 fence, u8 next_fence, + u32 mlx5_opcode) +{ + u8 opmod = 0; + ctrl->opmod_idx_opcode = cpu_to_be32(((u32)(qp->sq.cur_post) << 8) | + mlx5_opcode | ((u32)opmod << 24)); + ctrl->qpn_ds = cpu_to_be32(size | (qp->mqp.qpn << 8)); + ctrl->fm_ce_se |= fence; + qp->fm_cache = next_fence; + if (unlikely(qp->wq_sig)) + ctrl->signature = wq_sig(ctrl); + + qp->sq.wrid[idx] = wr_id; + qp->sq.w_list[idx].opcode = mlx5_opcode; + qp->sq.wqe_head[idx] = qp->sq.head + (*nreq)++; + qp->sq.cur_post += DIV_ROUND_UP(size * 16, MLX5_SEND_WQE_BB); + qp->sq.w_list[idx].next = qp->sq.cur_post; +} + + int mlx5_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr, struct ib_send_wr **bad_wr) { @@ -1996,7 +2047,6 @@ int mlx5_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr, int uninitialized_var(size); void *qend = qp->sq.qend; unsigned long flags; - u32 mlx5_opcode; unsigned idx; int err = 0; int inl = 0; @@ -2005,7 +2055,6 @@ int mlx5_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr, int nreq; int i; u8 next_fence = 0; - u8 opmod = 0; u8 fence; spin_lock_irqsave(&qp->sq.lock, flags); @@ -2018,36 +2067,23 @@ int mlx5_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr, goto out; } - if (unlikely(mlx5_wq_overflow(&qp->sq, nreq, qp->ibqp.send_cq))) { + fence = qp->fm_cache; + num_sge = wr->num_sge; + if (unlikely(num_sge > qp->sq.max_gs)) { mlx5_ib_warn(dev, "\n"); err = -ENOMEM; *bad_wr = wr; goto out; } - fence = qp->fm_cache; - num_sge = wr->num_sge; - if (unlikely(num_sge > qp->sq.max_gs)) { + err = begin_wqe(qp, &seg, &ctrl, wr, &idx, &size, nreq); + if (err) { mlx5_ib_warn(dev, "\n"); err = -ENOMEM; *bad_wr = wr; goto out; } - idx = qp->sq.cur_post & (qp->sq.wqe_cnt - 1); - seg = mlx5_get_send_wqe(qp, idx); - ctrl = seg; - 
*(uint32_t *)(seg + 8) = 0; - ctrl->imm = send_ieth(wr); - ctrl->fm_ce_se = qp->sq_signal_bits | - (wr->send_flags & IB_SEND_SIGNALED ? - MLX5_WQE_CTRL_CQ_UPDATE : 0) | - (wr->send_flags & IB_SEND_SOLICITED ? - MLX5_WQE_CTRL_SOLICITED : 0); - - seg += sizeof(*ctrl); - size = sizeof(*ctrl) / 16; - switch (ibqp->qp_type) { case IB_QPT_XRC_INI: xrc = seg; @@ -2197,22 +2233,9 @@ int mlx5_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr, } } - mlx5_opcode = mlx5_ib_opcode[wr->opcode]; - ctrl->opmod_idx_opcode = cpu_to_be32(((u32)(qp->sq.cur_post) << 8) | - mlx5_opcode | - ((u32)opmod << 24)); - ctrl->qpn_ds = cpu_to_be32(size | (qp->mqp.qpn << 8)); - ctrl->fm_ce_se |= get_fence(fence, wr); - qp->fm_cache = next_fence; - if (unlikely(qp->wq_sig)) - ctrl->signature = wq_sig(ctrl); - - qp->sq.wrid[idx] = wr->wr_id; - qp->sq.w_list[idx].opcode = mlx5_opcode; - qp->sq.wqe_head[idx] = qp->sq.head + nreq; - qp->sq.cur_post += DIV_ROUND_UP(size * 16, MLX5_SEND_WQE_BB); - qp->sq.w_list[idx].next = qp->sq.cur_post; - + finish_wqe(qp, ctrl, size, idx, wr->wr_id,&nreq, + get_fence(fence, wr), next_fence, + mlx5_ib_opcode[wr->opcode]); if (0) dump_wqe(qp, idx, size); } -- 1.7.1 -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply related [flat|nested] 45+ messages in thread
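The index arithmetic that begin_wqe and finish_wqe split between them can be modelled in user space. The sketch below is only an illustration, not driver code: struct sq_model, sq_begin and sq_finish are hypothetical names, and the 64-byte basic block is an assumption standing in for MLX5_SEND_WQE_BB.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed basic-block size in bytes; stands in for MLX5_SEND_WQE_BB. */
#define SEND_WQE_BB 64u

/* Hypothetical user-space model of the send queue counters. */
struct sq_model {
	uint32_t wqe_cnt;	/* ring size in basic blocks, power of two */
	uint32_t cur_post;	/* free-running count of posted basic blocks */
};

/* "begin" step: ring slot where the next WQE starts. */
static uint32_t sq_begin(const struct sq_model *sq)
{
	/* The mask trick below only works for power-of-two ring sizes. */
	assert(sq->wqe_cnt && (sq->wqe_cnt & (sq->wqe_cnt - 1)) == 0);
	return sq->cur_post & (sq->wqe_cnt - 1);
}

/*
 * "finish" step: advance the counter by the number of basic blocks the
 * WQE consumed. size_oct is the WQE size in 16-byte units, rounded up
 * to whole basic blocks as DIV_ROUND_UP does in the patch.
 */
static void sq_finish(struct sq_model *sq, uint32_t size_oct)
{
	sq->cur_post += (size_oct * 16 + SEND_WQE_BB - 1) / SEND_WQE_BB;
}
```

Calling sq_begin/sq_finish three times in a row mirrors what a single signature WR is expected to do once it posts three WQEs.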
* [PATCH RFC v2 06/10] IB/mlx5: remove MTT access mode from umr flags helper function
[not found] ` <1383222255-22699-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
` (4 preceding siblings ...)
2013-10-31 12:24 ` [PATCH RFC v2 05/10] IB/mlx5: Break wqe handling to begin & finish routines Sagi Grimberg
@ 2013-10-31 12:24 ` Sagi Grimberg
2013-10-31 12:24 ` [PATCH RFC v2 07/10] IB/mlx5: Keep mlx5 MRs in a radix tree under device Sagi Grimberg
` (6 subsequent siblings)
12 siblings, 0 replies; 45+ messages in thread
From: Sagi Grimberg @ 2013-10-31 12:24 UTC (permalink / raw)
To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
Cc: oren-VPRAkNaXOzVWk0Htik3J/w, tzahio-VPRAkNaXOzVWk0Htik3J/w

The get_umr_flags helper function might be used for access modes other
than ACCESS_MODE_MTT, such as ACCESS_MODE_KLM, so remove the access mode
from the helper and let each caller add its own access mode flag.

This commit does not add/change functionality.

Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/hw/mlx5/qp.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index dc8d9fc..ca78078 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -1773,7 +1773,7 @@ static u8 get_umr_flags(int acc)
 	       (acc & IB_ACCESS_REMOTE_WRITE ? MLX5_PERM_REMOTE_WRITE : 0) |
 	       (acc & IB_ACCESS_REMOTE_READ ? MLX5_PERM_REMOTE_READ : 0) |
 	       (acc & IB_ACCESS_LOCAL_WRITE ? MLX5_PERM_LOCAL_WRITE : 0) |
-		MLX5_PERM_LOCAL_READ | MLX5_PERM_UMR_EN | MLX5_ACCESS_MODE_MTT;
+		MLX5_PERM_LOCAL_READ | MLX5_PERM_UMR_EN;
 }

 static void set_mkey_segment(struct mlx5_mkey_seg *seg, struct ib_send_wr *wr,
@@ -1785,7 +1785,8 @@ static void set_mkey_segment(struct mlx5_mkey_seg *seg, struct ib_send_wr *wr,
 		return;
 	}

-	seg->flags = get_umr_flags(wr->wr.fast_reg.access_flags);
+	seg->flags = get_umr_flags(wr->wr.fast_reg.access_flags) |
+		     MLX5_ACCESS_MODE_MTT;
 	*writ = seg->flags & (MLX5_PERM_LOCAL_WRITE | IB_ACCESS_REMOTE_WRITE);
 	seg->qpn_mkey7_0 = cpu_to_be32((wr->wr.fast_reg.rkey & 0xff) | 0xffffff00);
 	seg->flags_pd = cpu_to_be32(MLX5_MKEY_REMOTE_INVAL);
-- 
1.7.1
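The split this patch makes (permission bits in the helper, access mode chosen by the caller) can be sketched in user space as follows. Every name and bit value here is a placeholder picked for the illustration; none of them is taken from the mlx5 headers.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder access-request bits (stand-ins for the IB_ACCESS_* flags). */
enum {
	ACC_LOCAL_WRITE  = 1u << 0,
	ACC_REMOTE_WRITE = 1u << 1,
	ACC_REMOTE_READ  = 1u << 2,
};

/* Placeholder mkey flag bits (stand-ins for MLX5_PERM_* and access modes). */
enum {
	MODE_MASK         = 0x3,
	MODE_MTT          = 0x1,
	MODE_KLM          = 0x2,
	PERM_LOCAL_READ   = 1u << 2,
	PERM_LOCAL_WRITE  = 1u << 3,
	PERM_REMOTE_READ  = 1u << 4,
	PERM_REMOTE_WRITE = 1u << 5,
	PERM_UMR_EN       = 1u << 7,
};

/* Helper: permission bits only, no access mode implied. */
static uint8_t perm_flags(unsigned int acc)
{
	return (uint8_t)((acc & ACC_REMOTE_WRITE ? PERM_REMOTE_WRITE : 0) |
			 (acc & ACC_REMOTE_READ  ? PERM_REMOTE_READ  : 0) |
			 (acc & ACC_LOCAL_WRITE  ? PERM_LOCAL_WRITE  : 0) |
			 PERM_LOCAL_READ | PERM_UMR_EN);
}

/* Each caller ORs in the access mode it needs. */
static uint8_t mkey_flags(unsigned int acc, uint8_t mode)
{
	uint8_t perm = perm_flags(acc);

	/* The helper must not have set any access-mode bits itself. */
	assert((perm & MODE_MASK) == 0);
	return (uint8_t)(perm | mode);
}
```

A fast-registration caller would pass MODE_MTT and a signature caller MODE_KLM, which is the pattern patch 08 later relies on.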
* [PATCH RFC v2 07/10] IB/mlx5: Keep mlx5 MRs in a radix tree under device
[not found] ` <1383222255-22699-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
` (5 preceding siblings ...)
2013-10-31 12:24 ` [PATCH RFC v2 06/10] IB/mlx5: remove MTT access mode from umr flags helper function Sagi Grimberg
@ 2013-10-31 12:24 ` Sagi Grimberg
[not found] ` <1383222255-22699-8-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2013-10-31 12:24 ` [PATCH RFC v2 08/10] IB/mlx5: Support IB_WR_REG_SIG_MR Sagi Grimberg
` (5 subsequent siblings)
12 siblings, 1 reply; 45+ messages in thread
From: Sagi Grimberg @ 2013-10-31 12:24 UTC (permalink / raw)
To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
Cc: oren-VPRAkNaXOzVWk0Htik3J/w, tzahio-VPRAkNaXOzVWk0Htik3J/w

This will be useful when processing signature errors on a specific key.
The mlx5 driver will look up the matching mlx5 memory region structure
and mark it as dirty (contains signature errors).

Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/net/ethernet/mellanox/mlx5/core/main.c |    1 +
 drivers/net/ethernet/mellanox/mlx5/core/mr.c   |   20 ++++++++++++++++++++
 include/linux/mlx5/driver.h                    |   12 ++++++++++++
 3 files changed, 33 insertions(+), 0 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index b47739b..5b7b3c7 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -428,6 +428,7 @@ int mlx5_dev_init(struct mlx5_core_dev *dev, struct pci_dev *pdev)
 	mlx5_init_cq_table(dev);
 	mlx5_init_qp_table(dev);
 	mlx5_init_srq_table(dev);
+	mlx5_init_mr_table(dev);

 	return 0;

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/mr.c b/drivers/net/ethernet/mellanox/mlx5/core/mr.c
index 2ade604..f72e0b6 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/mr.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/mr.c
@@ -36,9 +36,18 @@
 #include <linux/mlx5/cmd.h>
 #include "mlx5_core.h"

+void mlx5_init_mr_table(struct mlx5_core_dev *dev)
+{
+	struct mlx5_mr_table *table = &dev->priv.mr_table;
+
+	rwlock_init(&table->lock);
+	INIT_RADIX_TREE(&table->tree, GFP_ATOMIC);
+}
+
 int mlx5_core_create_mkey(struct mlx5_core_dev *dev, struct mlx5_core_mr *mr,
 			  struct mlx5_create_mkey_mbox_in *in, int inlen)
 {
+	struct mlx5_mr_table *table = &dev->priv.mr_table;
 	struct mlx5_create_mkey_mbox_out out;
 	int err;
 	u8 key;
@@ -63,14 +72,21 @@ int mlx5_core_create_mkey(struct mlx5_core_dev *dev, struct mlx5_core_mr *mr,
 	mr->key = mlx5_idx_to_mkey(be32_to_cpu(out.mkey) & 0xffffff) | key;
 	mlx5_core_dbg(dev, "out 0x%x, key 0x%x, mkey 0x%x\n", be32_to_cpu(out.mkey), key, mr->key);

+	/* connect to MR tree */
+	write_lock_irq(&table->lock);
+	err = radix_tree_insert(&table->tree, mr->key & 0xffffff00, mr);
+	write_unlock_irq(&table->lock);
+
 	return err;
 }
 EXPORT_SYMBOL(mlx5_core_create_mkey);

 int mlx5_core_destroy_mkey(struct mlx5_core_dev *dev, struct mlx5_core_mr *mr)
 {
+	struct mlx5_mr_table *table = &dev->priv.mr_table;
 	struct mlx5_destroy_mkey_mbox_in in;
 	struct mlx5_destroy_mkey_mbox_out out;
+	unsigned long flags;
 	int err;

 	memset(&in, 0, sizeof(in));
@@ -85,6 +101,10 @@ int mlx5_core_destroy_mkey(struct mlx5_core_dev *dev, struct mlx5_core_mr *mr)
 	if (out.hdr.status)
 		return mlx5_cmd_status_to_err(&out.hdr);

+	write_lock_irqsave(&table->lock, flags);
+	radix_tree_delete(&table->tree, mr->key & 0xffffff00);
+	write_unlock_irqrestore(&table->lock, flags);
+
 	return err;
 }
 EXPORT_SYMBOL(mlx5_core_destroy_mkey);
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index 7c33487..5fe0690 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -488,6 +488,13 @@ struct mlx5_srq_table {
 	struct radix_tree_root	tree;
 };

+struct mlx5_mr_table {
+	/* protect radix tree
+	 */
+	rwlock_t		lock;
+	struct radix_tree_root	tree;
+};
+
 struct mlx5_priv {
 	char			name[MLX5_MAX_NAME_LEN];
 	struct mlx5_eq_table	eq_table;
@@ -516,6 +523,10 @@ struct mlx5_priv {
 	struct mlx5_cq_table	cq_table;
 	/* end: cq staff */

+	/* start: mr staff */
+	struct mlx5_mr_table	mr_table;
+	/* end: mr staff */
+
 	/* start: alloc staff */
 	struct mutex		pgdir_mutex;
 	struct list_head	pgdir_list;
@@ -691,6 +702,7 @@ int mlx5_core_query_srq(struct mlx5_core_dev *dev, struct mlx5_core_srq *srq,
 			struct mlx5_query_srq_mbox_out *out);
 int mlx5_core_arm_srq(struct mlx5_core_dev *dev, struct mlx5_core_srq *srq,
 		      u16 lwm, int is_srq);
+void mlx5_init_mr_table(struct mlx5_core_dev *dev);
 int mlx5_core_create_mkey(struct mlx5_core_dev *dev, struct mlx5_core_mr *mr,
 			  struct mlx5_create_mkey_mbox_in *in, int inlen);
 int mlx5_core_destroy_mkey(struct mlx5_core_dev *dev, struct mlx5_core_mr *mr);
-- 
1.7.1
[parent not found: <1383222255-22699-8-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>]
* Re: [PATCH RFC v2 07/10] IB/mlx5: Keep mlx5 MRs in a radix tree under device
[not found] ` <1383222255-22699-8-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2013-11-01 20:46 ` Bart Van Assche
[not found] ` <5274131C.90601-HInyCGIudOg@public.gmane.org>
0 siblings, 1 reply; 45+ messages in thread
From: Bart Van Assche @ 2013-11-01 20:46 UTC (permalink / raw)
To: Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA
Cc: oren-VPRAkNaXOzVWk0Htik3J/w, tzahio-VPRAkNaXOzVWk0Htik3J/w

On 31/10/2013 5:24, Sagi Grimberg wrote:
> +	/* connect to MR tree */
> +	write_lock_irq(&table->lock);
> +	err = radix_tree_insert(&table->tree, mr->key & 0xffffff00, mr);
> +	write_unlock_irq(&table->lock);

The conversion from MR key into radix tree index occurs three times so
maybe it's worth to introduce an (inline) helper function for this
conversion.

Another comment about this conversion: is the mlx5 driver supported on
32-bit platforms ? I think that 0xffffff00 should be changed into
0xffffff00u to avoid that a compiler warning is triggered on 32-bit
platforms.

Thanks,

Bart.
[parent not found: <5274131C.90601-HInyCGIudOg@public.gmane.org>]
* Re: [PATCH RFC v2 07/10] IB/mlx5: Keep mlx5 MRs in a radix tree under device
[not found] ` <5274131C.90601-HInyCGIudOg@public.gmane.org>
@ 2013-11-03 12:16 ` Sagi Grimberg
[not found] ` <52763E88.4050300-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
0 siblings, 1 reply; 45+ messages in thread
From: Sagi Grimberg @ 2013-11-03 12:16 UTC (permalink / raw)
To: Bart Van Assche, linux-rdma-u79uwXL29TY76Z2rM5mHXA
Cc: oren-VPRAkNaXOzVWk0Htik3J/w, tzahio-VPRAkNaXOzVWk0Htik3J/w

On 11/1/2013 10:46 PM, Bart Van Assche wrote:
> On 31/10/2013 5:24, Sagi Grimberg wrote:
>> +	/* connect to MR tree */
>> +	write_lock_irq(&table->lock);
>> +	err = radix_tree_insert(&table->tree, mr->key & 0xffffff00, mr);
>> +	write_unlock_irq(&table->lock);
>
> The conversion from MR key into radix tree index occurs three times so
> maybe it's worth to introduce an (inline) helper function for this
> conversion.
>
> Another comment about this conversion: is the mlx5 driver supported on
> 32-bit platforms ? I think that 0xffffff00 should be changed into
> 0xffffff00u to avoid that a compiler warning is triggered on 32-bit
> platforms.
>
> Thanks,
>
> Bart.
>

Thanks for the hint. Although mlx5 currently doesn't support 32-bit
platforms, I'll take your suggestion.

Sagi.
[parent not found: <52763E88.4050300-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>]
* Re: [PATCH RFC v2 07/10] IB/mlx5: Keep mlx5 MRs in a radix tree under device
[not found] ` <52763E88.4050300-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2013-11-03 13:40 ` Or Gerlitz
0 siblings, 0 replies; 45+ messages in thread
From: Or Gerlitz @ 2013-11-03 13:40 UTC (permalink / raw)
To: Bart Van Assche, linux-rdma-u79uwXL29TY76Z2rM5mHXA
Cc: Sagi Grimberg, oren-VPRAkNaXOzVWk0Htik3J/w, tzahio-VPRAkNaXOzVWk0Htik3J/w

On 03/11/2013 14:16, Sagi Grimberg wrote:
> mlx5 currently doesn't support 32-bit platforms, I'll take your
> suggestion.

I think 32-bit x86 is supported, but anyway, as we're picking your
suggestion, should be fine.
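For reference, the inline helper suggested in this sub-thread could look roughly like the sketch below. The name mr_key_to_tree_index is hypothetical and the code is written against plain C types rather than the kernel's; it only shows the masking with the unsigned constant 0xffffff00u discussed above.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: drop the low 8 bits of an MR key so that every
 * key sharing the same upper 24 bits maps to one radix tree index.
 * The 'u' suffix keeps the mask an unsigned constant.
 */
static inline uint32_t mr_key_to_tree_index(uint32_t mkey)
{
	return mkey & 0xffffff00u;
}

/* Compile-time check of the masking on one sample key. */
static_assert((0x12345678u & 0xffffff00u) == 0x12345600u,
	      "low byte of the key must be masked off");
```

With such a helper, the insert and delete paths would both call it instead of repeating the mask expression.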
* [PATCH RFC v2 08/10] IB/mlx5: Support IB_WR_REG_SIG_MR
[not found] ` <1383222255-22699-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
` (6 preceding siblings ...)
2013-10-31 12:24 ` [PATCH RFC v2 07/10] IB/mlx5: Keep mlx5 MRs in a radix tree under device Sagi Grimberg
@ 2013-10-31 12:24 ` Sagi Grimberg
[not found] ` <1383222255-22699-9-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2013-10-31 12:24 ` [PATCH RFC v2 09/10] IB/mlx5: Collect signature error completion Sagi Grimberg
` (4 subsequent siblings)
12 siblings, 1 reply; 45+ messages in thread
From: Sagi Grimberg @ 2013-10-31 12:24 UTC (permalink / raw)
To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
Cc: oren-VPRAkNaXOzVWk0Htik3J/w, tzahio-VPRAkNaXOzVWk0Htik3J/w

This patch implements IB_WR_REG_SIG_MR posted by the user.

Basically this WR involves 3 WQEs in order to prepare and properly
register the signature layout:

1. post UMR WR to register the sig_mr in one of two possible ways:
   * In case the user registered a single MR for data, the UMR data
     segment consists of:
     - single klm (data MR) passed by the user
     - BSF with signature attributes requested by the user.
   * In case the user registered 2 MRs, one for data and one for
     protection, the UMR consists of:
     - strided block format which includes data and protection MRs
       and their repetitive block format.
     - BSF with signature attributes requested by the user.

2. post SET_PSV in order to set the initial signature parameters
   passed by the user for the memory domain.

3. post SET_PSV in order to set the initial signature parameters
   passed by the user for the wire domain.

This patch also introduces some helper functions to set the BSF
correctly and to determine the signature format selectors.

Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> --- drivers/infiniband/hw/mlx5/qp.c | 416 +++++++++++++++++++++++++++++++++++++++ include/linux/mlx5/qp.h | 56 ++++++ 2 files changed, 472 insertions(+), 0 deletions(-) diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c index ca78078..37e3715 100644 --- a/drivers/infiniband/hw/mlx5/qp.c +++ b/drivers/infiniband/hw/mlx5/qp.c @@ -1719,6 +1719,26 @@ static __be64 frwr_mkey_mask(void) return cpu_to_be64(result); } +static __be64 sig_mkey_mask(void) +{ + u64 result; + + result = MLX5_MKEY_MASK_LEN | + MLX5_MKEY_MASK_PAGE_SIZE | + MLX5_MKEY_MASK_START_ADDR | + MLX5_MKEY_MASK_EN_RINVAL | + MLX5_MKEY_MASK_KEY | + MLX5_MKEY_MASK_LR | + MLX5_MKEY_MASK_LW | + MLX5_MKEY_MASK_RR | + MLX5_MKEY_MASK_RW | + MLX5_MKEY_MASK_SMALL_FENCE | + MLX5_MKEY_MASK_FREE | + MLX5_MKEY_MASK_BSF_EN; + + return cpu_to_be64(result); +} + static void set_frwr_umr_segment(struct mlx5_wqe_umr_ctrl_seg *umr, struct ib_send_wr *wr, int li) { @@ -1901,6 +1921,339 @@ static int set_data_inl_seg(struct mlx5_ib_qp *qp, struct ib_send_wr *wr, return 0; } +static u16 prot_field_size(enum ib_signature_type type, u16 block_size) +{ + switch (type) { + case IB_SIG_TYPE_T10_DIF: + return MLX5_DIF_SIZE; + default: + return 0; + } +} + +static u8 bs_selector(int block_size) +{ + switch (block_size) { + case 512: return 0x1; + case 520: return 0x2; + case 4096: return 0x3; + case 4160: return 0x4; + case 1073741824: return 0x5; + default: return 0; + } +} + +static int format_selector(struct ib_sig_attrs *attr, + struct ib_sig_domain *domain, + int *selector) +{ + +#define FORMAT_DIF_NONE 0 +#define FORMAT_DIF_CRC_INC 4 +#define FORMAT_DIF_CSUM_INC 12 +#define FORMAT_DIF_CRC_NO_INC 13 +#define FORMAT_DIF_CSUM_NO_INC 14 + + switch (domain->sig.dif.type) { + case IB_T10DIF_NONE: + /* No DIF */ + *selector = FORMAT_DIF_NONE; + break; + case IB_T10DIF_TYPE1: /* Fall through */ + case IB_T10DIF_TYPE2: + switch 
(domain->sig.dif.bg_type) { + case IB_T10DIF_CRC: + *selector = FORMAT_DIF_CRC_INC; + break; + case IB_T10DIF_CSUM: + *selector = FORMAT_DIF_CSUM_INC; + break; + default: + return 1; + } + break; + case IB_T10DIF_TYPE3: + switch (domain->sig.dif.bg_type) { + case IB_T10DIF_CRC: + *selector = domain->sig.dif.type3_inc_reftag ? + FORMAT_DIF_CRC_INC : + FORMAT_DIF_CRC_NO_INC; + break; + case IB_T10DIF_CSUM: + *selector = domain->sig.dif.type3_inc_reftag ? + FORMAT_DIF_CSUM_INC : + FORMAT_DIF_CSUM_NO_INC; + break; + default: + return 1; + } + break; + default: + return 1; + } + + return 0; +} + +static int mlx5_set_bsf(struct ib_mr *sig_mr, + struct ib_sig_attrs *sig_attrs, + struct mlx5_bsf *bsf, u32 data_size) +{ + struct mlx5_core_sig_ctx *msig = to_mmr(sig_mr)->sig; + struct mlx5_bsf_basic *basic = &bsf->basic; + struct ib_sig_domain *mem = &sig_attrs->mem; + struct ib_sig_domain *wire = &sig_attrs->wire; + int ret, selector; + + switch (sig_attrs->mem.sig_type) { + case IB_SIG_TYPE_T10_DIF: + if (sig_attrs->wire.sig_type != IB_SIG_TYPE_T10_DIF) + return -EINVAL; + + /* Input domain check byte mask */ + basic->check_byte_mask = sig_attrs->check_mask; + if (mem->sig.dif.block_size == wire->sig.dif.block_size && + mem->sig.dif.type == wire->sig.dif.type) { + /* Same block structure */ + basic->bsf_size_sbs = 1 << 4; + if (mem->sig.dif.bg_type == wire->sig.dif.bg_type) + basic->wire.copy_byte_mask = 0xff; + else + basic->wire.copy_byte_mask = 0x3f; + } else + basic->wire.bs_selector = bs_selector(wire->sig.dif.block_size); + + basic->mem.bs_selector = bs_selector(mem->sig.dif.block_size); + basic->raw_data_size = cpu_to_be32(data_size); + + ret = format_selector(sig_attrs, mem, &selector); + if (ret) + return -EINVAL; + basic->m_bfs_psv = cpu_to_be32(selector << 24 | + msig->psv_memory.psv_idx); + + ret = format_selector(sig_attrs, wire, &selector); + if (ret) + return -EINVAL; + basic->w_bfs_psv = cpu_to_be32(selector << 24 | + msig->psv_wire.psv_idx); + break; + + 
default: + return -EINVAL; + } + + return 0; +} + +static int set_sig_data_segment(struct ib_send_wr *wr, void **seg, int *size) +{ + struct ib_sig_attrs *sig_attrs = wr->wr.sig_handover.sig_attrs; + struct ib_mr *sig_mr = wr->wr.sig_handover.sig_mr; + struct mlx5_bsf *bsf; + u32 data_len = wr->sg_list->length; + u32 data_key = wr->sg_list->lkey; + u64 data_va = wr->sg_list->addr; + int ret; + int wqe_size; + + if (!wr->wr.sig_handover.prot->lkey) { + /** + * Source domain doesn't contain signature information + * So need construct: + * ------------------ + * | data_klm | + * ------------------ + * | BSF | + * ------------------ + **/ + struct mlx5_klm *data_klm = *seg; + + data_klm->bcount = cpu_to_be32(data_len); + data_klm->key = cpu_to_be32(data_key); + data_klm->va = cpu_to_be64(data_va); + wqe_size = ALIGN(sizeof(*data_klm), 64); + } else { + /** + * Source domain contains signature information + * So need construct a strided block format: + * --------------------------- + * | stride_block_ctrl | + * --------------------------- + * | data_klm | + * --------------------------- + * | prot_klm | + * --------------------------- + * | BSF | + * --------------------------- + **/ + struct mlx5_stride_block_ctrl_seg *sblock_ctrl; + struct mlx5_stride_block_entry *data_sentry; + struct mlx5_stride_block_entry *prot_sentry; + u32 prot_key = wr->wr.sig_handover.prot->lkey; + u64 prot_va = wr->wr.sig_handover.prot->addr; + u16 block_size = sig_attrs->mem.sig.dif.block_size; + int prot_size; + + sblock_ctrl = *seg; + data_sentry = (void *)sblock_ctrl + sizeof(*sblock_ctrl); + prot_sentry = (void *)data_sentry + sizeof(*data_sentry); + + prot_size = prot_field_size(sig_attrs->mem.sig_type, + block_size); + if (!prot_size) { + pr_err("Bad block size given: %u\n", block_size); + return -EINVAL; + } + sblock_ctrl->bcount_per_cycle = cpu_to_be32(block_size + + prot_size); + sblock_ctrl->op = cpu_to_be32(MLX5_STRIDE_BLOCK_OP); + sblock_ctrl->repeat_count = cpu_to_be32(data_len 
/ block_size); + sblock_ctrl->num_entries = cpu_to_be16(2); + + data_sentry->bcount = cpu_to_be16(block_size); + data_sentry->key = cpu_to_be32(data_key); + data_sentry->va = cpu_to_be64(data_va); + prot_sentry->bcount = cpu_to_be16(prot_size); + prot_sentry->key = cpu_to_be32(prot_key); + + if (prot_key == data_key && prot_va == data_va) { + /** + * The data and protection are interleaved + * in a single memory region + **/ + prot_sentry->va = cpu_to_be64(data_va + block_size); + prot_sentry->stride = cpu_to_be16(block_size + prot_size); + data_sentry->stride = prot_sentry->stride; + } else { + /* The data and protection are two different buffers */ + prot_sentry->va = cpu_to_be64(prot_va); + data_sentry->stride = cpu_to_be16(block_size); + prot_sentry->stride = cpu_to_be16(prot_size); + } + wqe_size = ALIGN(sizeof(*sblock_ctrl) + sizeof(*data_sentry) + + sizeof(*prot_sentry), 64); + } + + bsf = *seg + wqe_size; + ret = mlx5_set_bsf(sig_mr, sig_attrs, bsf, data_len); + if (ret) + return -EINVAL; + + *seg = (void *)bsf + ALIGN(sizeof(*bsf), 64); + *size += (wqe_size + sizeof(*bsf)) / 16; + + return 0; +} + +static void set_sig_mkey_segment(struct mlx5_mkey_seg *seg, + struct ib_send_wr *wr, u32 nelements, + u32 length, u32 pdn) +{ + struct ib_mr *sig_mr = wr->wr.sig_handover.sig_mr; + u32 sig_key = sig_mr->rkey; + + memset(seg, 0, sizeof(*seg)); + + seg->status = 0x4; /*set free*/ + seg->flags = get_umr_flags(wr->wr.sig_handover.access_flags) | + MLX5_ACCESS_MODE_KLM; + seg->qpn_mkey7_0 = cpu_to_be32((sig_key & 0xff) | 0xffffff00); + seg->flags_pd = cpu_to_be32(MLX5_MKEY_REMOTE_INVAL | + MLX5_MKEY_BSF_EN | pdn); + seg->start_addr = 0; + seg->len = cpu_to_be64(length); + seg->xlt_oct_size = cpu_to_be32(be16_to_cpu(get_klm_octo(nelements))); + seg->bsfs_octo_size = cpu_to_be32(MLX5_MKEY_BSF_OCTO_SIZE); +} + +static void set_sig_umr_segment(struct mlx5_wqe_umr_ctrl_seg *umr, + struct ib_send_wr *wr, u32 nelements) +{ + memset(umr, 0, sizeof(*umr)); + + umr->flags = 1 
<< 7 | 1 << 5; /* inline | check free */ + umr->klm_octowords = get_klm_octo(nelements); + umr->bsf_octowords = cpu_to_be16(MLX5_MKEY_BSF_OCTO_SIZE); + umr->mkey_mask = sig_mkey_mask(); +} + + +static int set_sig_umr_wr(struct ib_send_wr *wr, struct mlx5_ib_qp *qp, + void **seg, int *size) +{ + struct mlx5_ib_mr *sig_mr = to_mmr(wr->wr.sig_handover.sig_mr); + u32 pdn = get_pd(qp)->pdn; + u32 klm_oct_size; + int region_len, ret; + + if (unlikely(wr->num_sge != 1) || + unlikely(wr->wr.sig_handover.access_flags & + IB_ACCESS_REMOTE_ATOMIC) || + unlikely(!sig_mr->sig) || unlikely(!qp->signature_en)) + return -EINVAL; + + /* length of the protected region, data + protection */ + if (wr->sg_list->lkey == wr->wr.sig_handover.prot->lkey && + wr->sg_list->addr == wr->wr.sig_handover.prot->addr) + region_len = wr->sg_list->length; + else + region_len = wr->sg_list->length + + wr->wr.sig_handover.prot->length; + + /** + * KLM octoword size - if protection was provided + * then we use strided block format (3 octowords), + * else we use single KLM (1 octoword) + **/ + klm_oct_size = wr->wr.sig_handover.prot->lkey ? 
3 : 1; + + set_sig_umr_segment(*seg, wr, klm_oct_size); + *seg += sizeof(struct mlx5_wqe_umr_ctrl_seg); + *size += sizeof(struct mlx5_wqe_umr_ctrl_seg) / 16; + if (unlikely((*seg == qp->sq.qend))) + *seg = mlx5_get_send_wqe(qp, 0); + + set_sig_mkey_segment(*seg, wr, klm_oct_size, region_len, pdn); + *seg += sizeof(struct mlx5_mkey_seg); + *size += sizeof(struct mlx5_mkey_seg) / 16; + if (unlikely((*seg == qp->sq.qend))) + *seg = mlx5_get_send_wqe(qp, 0); + + ret = set_sig_data_segment(wr, seg, size); + if (ret) + return ret; + + if (unlikely((*seg == qp->sq.qend))) + *seg = mlx5_get_send_wqe(qp, 0); + + return 0; +} + +static int set_psv_wr(struct ib_sig_domain *domain, + u32 psv_idx, void **seg, int *size) +{ + struct mlx5_seg_set_psv *psv_seg = *seg; + + psv_seg->psv_num = cpu_to_be32(psv_idx); + switch (domain->sig_type) { + case IB_SIG_TYPE_T10_DIF: + psv_seg->transient_sig = cpu_to_be32(domain->sig.dif.bg << 16 | + domain->sig.dif.app_tag); + psv_seg->ref_tag = cpu_to_be32(domain->sig.dif.ref_tag); + + *seg += sizeof(*psv_seg); + *size += sizeof(*psv_seg) / 16; + break; + + default: + pr_err("Bad signature type given.\n"); + return 1; + } + + return 0; +} + static int set_frwr_li_wr(void **seg, struct ib_send_wr *wr, int *size, struct mlx5_core_dev *mdev, struct mlx5_ib_pd *pd, struct mlx5_ib_qp *qp) { @@ -2042,6 +2395,7 @@ int mlx5_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr, struct mlx5_ib_dev *dev = to_mdev(ibqp->device); struct mlx5_core_dev *mdev = &dev->mdev; struct mlx5_ib_qp *qp = to_mqp(ibqp); + struct mlx5_ib_mr *mr; struct mlx5_wqe_data_seg *dpseg; struct mlx5_wqe_xrc_seg *xrc; struct mlx5_bf *bf = qp->bf; @@ -2154,6 +2508,67 @@ int mlx5_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr, num_sge = 0; break; + case IB_WR_REG_SIG_MR: + next_fence = MLX5_FENCE_MODE_INITIATOR_SMALL; + qp->sq.wr_data[idx] = IB_WR_REG_SIG_MR; + mr = to_mmr(wr->wr.sig_handover.sig_mr); + + ctrl->imm = cpu_to_be32(mr->ibmr.rkey); + err = set_sig_umr_wr(wr, 
qp, &seg, &size); + if (err) { + mlx5_ib_warn(dev, "\n"); + *bad_wr = wr; + goto out; + } + + finish_wqe(qp, ctrl, size, idx, wr->wr_id, + &nreq, get_fence(fence, wr), + next_fence, MLX5_OPCODE_UMR); + err = begin_wqe(qp, &seg, &ctrl, wr, + &idx, &size, nreq); + if (err) { + mlx5_ib_warn(dev, "\n"); + err = -ENOMEM; + *bad_wr = wr; + goto out; + } + + err = set_psv_wr(&wr->wr.sig_handover.sig_attrs->mem, + mr->sig->psv_memory.psv_idx, &seg, + &size); + if (err) { + mlx5_ib_warn(dev, "\n"); + *bad_wr = wr; + goto out; + } + + finish_wqe(qp, ctrl, size, idx, wr->wr_id, + &nreq, get_fence(fence, wr), + next_fence, MLX5_OPCODE_SET_PSV); + err = begin_wqe(qp, &seg, &ctrl, wr, + &idx, &size, nreq); + if (err) { + mlx5_ib_warn(dev, "\n"); + err = -ENOMEM; + *bad_wr = wr; + goto out; + } + + err = set_psv_wr(&wr->wr.sig_handover.sig_attrs->wire, + mr->sig->psv_wire.psv_idx, &seg, + &size); + if (err) { + mlx5_ib_warn(dev, "\n"); + *bad_wr = wr; + goto out; + } + + finish_wqe(qp, ctrl, size, idx, wr->wr_id, + &nreq, get_fence(fence, wr), + next_fence, MLX5_OPCODE_SET_PSV); + num_sge = 0; + goto skip_psv; + default: break; } @@ -2237,6 +2652,7 @@ int mlx5_ib_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr, finish_wqe(qp, ctrl, size, idx, wr->wr_id,&nreq, get_fence(fence, wr), next_fence, mlx5_ib_opcode[wr->opcode]); +skip_psv: if (0) dump_wqe(qp, idx, size); } diff --git a/include/linux/mlx5/qp.h b/include/linux/mlx5/qp.h index 174805c..9ea6cf6 100644 --- a/include/linux/mlx5/qp.h +++ b/include/linux/mlx5/qp.h @@ -38,6 +38,8 @@ #define MLX5_INVALID_LKEY 0x100 #define MLX5_SIGNATURE_SQ_MULT 3 +#define MLX5_DIF_SIZE 8 +#define MLX5_STRIDE_BLOCK_OP 0x400 enum mlx5_qp_optpar { MLX5_QP_OPTPAR_ALT_ADDR_PATH = 1 << 0, @@ -279,6 +281,60 @@ struct mlx5_wqe_inline_seg { __be32 byte_count; }; +struct mlx5_bsf { + struct mlx5_bsf_basic { + u8 bsf_size_sbs; + u8 check_byte_mask; + union { + u8 copy_byte_mask; + u8 bs_selector; + u8 rsvd_wflags; + } wire; + union { + u8 bs_selector; + 
u8 rsvd_mflags; + } mem; + __be32 raw_data_size; + __be32 w_bfs_psv; + __be32 m_bfs_psv; + } basic; + struct mlx5_bsf_ext { + __be32 t_init_gen_pro_size; + __be32 rsvd_epi_size; + __be32 w_tfs_psv; + __be32 m_tfs_psv; + } ext; + struct mlx5_bsf_inl { + __be32 w_inl_vld; + __be32 w_rsvd; + __be64 w_block_format; + __be32 m_inl_vld; + __be32 m_rsvd; + __be64 m_block_format; + } inl; +}; + +struct mlx5_klm { + __be32 bcount; + __be32 key; + __be64 va; +}; + +struct mlx5_stride_block_entry { + __be16 stride; + __be16 bcount; + __be32 key; + __be64 va; +}; + +struct mlx5_stride_block_ctrl_seg { + __be32 bcount_per_cycle; + __be32 op; + __be32 repeat_count; + u16 rsvd; + __be16 num_entries; +}; + struct mlx5_core_qp { void (*event) (struct mlx5_core_qp *, int); int qpn; -- 1.7.1 -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply related [flat|nested] 45+ messages in thread
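The strided block arithmetic in set_sig_data_segment is easier to follow in isolation. The following is a simplified user-space sketch under stated assumptions: stride_entry, stride_layout and stride_layout_fill are hypothetical names, memory keys are left out (so interleaving is detected from the addresses alone, whereas the patch also compares the keys), and all values stay in host byte order instead of being converted to big-endian.

```c
#include <assert.h>
#include <stdint.h>

/* T10-DIF protection bytes per block (MLX5_DIF_SIZE in the patch). */
#define DIF_SIZE 8u

/* Hypothetical host-order mirror of one stride block entry. */
struct stride_entry {
	uint64_t va;		/* start address of the first element */
	uint16_t stride;	/* distance between consecutive elements */
	uint16_t bcount;	/* bytes taken per repetition */
};

/* Hypothetical host-order mirror of the strided block layout. */
struct stride_layout {
	uint32_t bcount_per_cycle;	/* data + protection bytes per block */
	uint32_t repeat_count;		/* number of blocks */
	struct stride_entry data;
	struct stride_entry prot;
};

static void stride_layout_fill(struct stride_layout *l, uint64_t data_va,
			       uint32_t data_len, uint64_t prot_va,
			       uint16_t block_size)
{
	assert(block_size != 0);

	l->bcount_per_cycle = block_size + DIF_SIZE;
	l->repeat_count = data_len / block_size;
	l->data.va = data_va;
	l->data.bcount = block_size;
	l->prot.bcount = DIF_SIZE;

	if (prot_va == data_va) {
		/* Interleaved: each guard sits right after its data block. */
		l->prot.va = data_va + block_size;
		l->prot.stride = (uint16_t)(block_size + DIF_SIZE);
		l->data.stride = l->prot.stride;
	} else {
		/* Separate buffers: each one is walked contiguously. */
		l->prot.va = prot_va;
		l->data.stride = block_size;
		l->prot.stride = DIF_SIZE;
	}
}
```

For eight 512-byte blocks with a separate protection buffer, this gives a 520-byte cycle repeated 8 times, with strides of 512 and 8 bytes respectively.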
[parent not found: <1383222255-22699-9-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>]
* Re: [PATCH RFC v2 08/10] IB/mlx5: Support IB_WR_REG_SIG_MR
[not found] ` <1383222255-22699-9-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2013-10-31 13:01 ` Jack Wang
2013-11-01 15:05 ` Bart Van Assche
2013-11-01 20:37 ` Bart Van Assche
2 siblings, 0 replies; 45+ messages in thread
From: Jack Wang @ 2013-10-31 13:01 UTC (permalink / raw)
To: Sagi Grimberg
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, oren-VPRAkNaXOzVWk0Htik3J/w, tzahio-VPRAkNaXOzVWk0Htik3J/w

On 10/31/2013 01:24 PM, Sagi Grimberg wrote:
> +{
> +	struct ib_mr *sig_mr = wr->wr.sig_handover.sig_mr;
> +	u32 sig_key = sig_mr->rkey;
> +
> +	memset(seg, 0, sizeof(*seg));
> +
> +	seg->status = 0x4; /*set free*/
> +	seg->flags = get_umr_flags(wr->wr.sig_handover.access_flags) |
> +				   MLX5_ACCESS_MODE_KLM;
> +	seg->qpn_mkey7_0 = cpu_to_be32((sig_key & 0xff) | 0xffffff00);
> +	seg->flags_pd = cpu_to_be32(MLX5_MKEY_REMOTE_INVAL |
> +				    MLX5_MKEY_BSF_EN | pdn);
> +	seg->start_addr = 0;
Already memset, no need to set start_addr here.

Jack
> +	seg->len = cpu_to_be64(length);
> +	seg->xlt_oct_size = cpu_to_be32(be16_to_cpu(get_klm_octo(nelements)));
> +	seg->bsfs_octo_size = cpu_to_be32(MLX5_MKEY_BSF_OCTO_SIZE);
> +}
> +
* Re: [PATCH RFC v2 08/10] IB/mlx5: Support IB_WR_REG_SIG_MR
[not found] ` <1383222255-22699-9-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2013-10-31 13:01 ` Jack Wang
@ 2013-11-01 15:05 ` Bart Van Assche
[not found] ` <5273C32E.2020405-HInyCGIudOg@public.gmane.org>
2013-11-01 20:37 ` Bart Van Assche
2 siblings, 1 reply; 45+ messages in thread
From: Bart Van Assche @ 2013-11-01 15:05 UTC (permalink / raw)
To: Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA
Cc: oren-VPRAkNaXOzVWk0Htik3J/w, tzahio-VPRAkNaXOzVWk0Htik3J/w
On 31/10/2013 5:24, Sagi Grimberg wrote:
> +static u8 bs_selector(int block_size)
> +{
> +	switch (block_size) {
> +	case 512: return 0x1;
> +	case 520: return 0x2;
> +	case 4096: return 0x3;
> +	case 4160: return 0x4;
> +	case 1073741824: return 0x5;
> +	default: return 0;
> +	}
> +}

Would it be possible to provide some more information about how the
five supported block sizes have been chosen ?

Thanks,

Bart.
^ permalink raw reply [flat|nested] 45+ messages in thread
[parent not found: <5273C32E.2020405-HInyCGIudOg@public.gmane.org>]
* Re: [PATCH RFC v2 08/10] IB/mlx5: Support IB_WR_REG_SIG_MR
[not found] ` <5273C32E.2020405-HInyCGIudOg@public.gmane.org>
@ 2013-11-03 12:16 ` Sagi Grimberg
0 siblings, 0 replies; 45+ messages in thread
From: Sagi Grimberg @ 2013-11-03 12:16 UTC (permalink / raw)
To: Bart Van Assche, linux-rdma-u79uwXL29TY76Z2rM5mHXA
Cc: oren-VPRAkNaXOzVWk0Htik3J/w, tzahio-VPRAkNaXOzVWk0Htik3J/w
On 11/1/2013 5:05 PM, Bart Van Assche wrote:
> On 31/10/2013 5:24, Sagi Grimberg wrote:
>> +static u8 bs_selector(int block_size)
>> +{
>> +	switch (block_size) {
>> +	case 512: return 0x1;
>> +	case 520: return 0x2;
>> +	case 4096: return 0x3;
>> +	case 4160: return 0x4;
>> +	case 1073741824: return 0x5;
>> +	default: return 0;
>> +	}
>> +}
>
> Would it be possible to provide some more information about how the
> five supported block sizes have been chosen ?
>
> Thanks,
>
> Bart.
>

These block sizes were chosen based on input from our customers who were
interested in signature. This is the current HCA support for the time
being.

Sagi.
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH RFC v2 08/10] IB/mlx5: Support IB_WR_REG_SIG_MR [not found] ` <1383222255-22699-9-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> 2013-10-31 13:01 ` Jack Wang 2013-11-01 15:05 ` Bart Van Assche @ 2013-11-01 20:37 ` Bart Van Assche [not found] ` <527410F3.6040704-HInyCGIudOg@public.gmane.org> 2 siblings, 1 reply; 45+ messages in thread From: Bart Van Assche @ 2013-11-01 20:37 UTC (permalink / raw) To: Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA Cc: oren-VPRAkNaXOzVWk0Htik3J/w, tzahio-VPRAkNaXOzVWk0Htik3J/w On 31/10/2013 5:24, Sagi Grimberg wrote: > This patch implements IB_WR_REG_SIG_MR posted by the user. > > Baisically this WR involvs 3 WQEs in order to prepare and properly > register the signature layout: > > 1. post UMR WR to register the sig_mr in one of two possible ways: > * In case the user registered a single MR for data so the UMR data segment > consists of: > - single klm (data MR) passed by the user > - BSF with signature attributes requested by the user. > * In case the user registered 2 MRs, one for data and one for protection, > the UMR consists of: > - strided block format which includes data and protection MRs and > their repetitive block format. > - BSF with signature attributes requested by the user. > > 2. post SET_PSV in order to set the for the memory domain initial > signature parameters passed by the user. > > 3. post SET_PSV in order to set the for the wire domain initial > signature parameters passed by the user. > > This patch also introduces some helper functions to set the BSF correctly > and determining the signature format selectors. Has it already been explained somewhere what the abbreviations KLM, BSF and PSV stand for ? Thanks, Bart. 
^ permalink raw reply [flat|nested] 45+ messages in thread
[parent not found: <527410F3.6040704-HInyCGIudOg@public.gmane.org>]
* Re: [PATCH RFC v2 08/10] IB/mlx5: Support IB_WR_REG_SIG_MR [not found] ` <527410F3.6040704-HInyCGIudOg@public.gmane.org> @ 2013-11-02 19:21 ` Or Gerlitz [not found] ` <CAJZOPZLnyqzzx91ohmW+exy0k8g-FX6reSBCGmh_F2tTGWWOog-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 0 siblings, 1 reply; 45+ messages in thread From: Or Gerlitz @ 2013-11-02 19:21 UTC (permalink / raw) To: Bart Van Assche Cc: Sagi Grimberg, linux-rdma, oren-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org, Tzahi Oved On Fri, Nov 1, 2013 at 10:37 PM, Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org> wrote: > On 31/10/2013 5:24, Sagi Grimberg wrote: >> >> This patch implements IB_WR_REG_SIG_MR posted by the user. >> >> Baisically this WR involvs 3 WQEs in order to prepare and properly >> register the signature layout: >> >> 1. post UMR WR to register the sig_mr in one of two possible ways: >> * In case the user registered a single MR for data so the UMR data >> segment >> consists of: >> - single klm (data MR) passed by the user >> - BSF with signature attributes requested by the user. >> * In case the user registered 2 MRs, one for data and one for >> protection, >> the UMR consists of: >> - strided block format which includes data and protection MRs and >> their repetitive block format. >> - BSF with signature attributes requested by the user. >> >> 2. post SET_PSV in order to set the for the memory domain initial >> signature parameters passed by the user. >> >> 3. post SET_PSV in order to set the for the wire domain initial >> signature parameters passed by the user. >> >> This patch also introduces some helper functions to set the BSF correctly >> and determining the signature format selectors. > > > Has it already been explained somewhere what the abbreviations KLM, BSF and > PSV stand for ? Bart, these are all HW T10 related objects/concepts, we made an effort to keep them contained within the mlx5 driver such that they don't show up on the IB core layer. 
If this helps for the review, Sagi can spare few words on each, sure. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 45+ messages in thread
[parent not found: <CAJZOPZLnyqzzx91ohmW+exy0k8g-FX6reSBCGmh_F2tTGWWOog-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: [PATCH RFC v2 08/10] IB/mlx5: Support IB_WR_REG_SIG_MR [not found] ` <CAJZOPZLnyqzzx91ohmW+exy0k8g-FX6reSBCGmh_F2tTGWWOog-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> @ 2013-11-02 21:59 ` Bart Van Assche [not found] ` <527575D0.9050802-HInyCGIudOg@public.gmane.org> 0 siblings, 1 reply; 45+ messages in thread From: Bart Van Assche @ 2013-11-02 21:59 UTC (permalink / raw) To: Or Gerlitz Cc: Sagi Grimberg, linux-rdma, oren-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org, Tzahi Oved On 2/11/2013 12:21, Or Gerlitz wrote: > On Fri, Nov 1, 2013 at 10:37 PM, Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org> wrote: >> On 31/10/2013 5:24, Sagi Grimberg wrote: >>> >>> This patch implements IB_WR_REG_SIG_MR posted by the user. >>> >>> Baisically this WR involvs 3 WQEs in order to prepare and properly >>> register the signature layout: >>> >>> 1. post UMR WR to register the sig_mr in one of two possible ways: >>> * In case the user registered a single MR for data so the UMR data >>> segment >>> consists of: >>> - single klm (data MR) passed by the user >>> - BSF with signature attributes requested by the user. >>> * In case the user registered 2 MRs, one for data and one for >>> protection, >>> the UMR consists of: >>> - strided block format which includes data and protection MRs and >>> their repetitive block format. >>> - BSF with signature attributes requested by the user. >>> >>> 2. post SET_PSV in order to set the for the memory domain initial >>> signature parameters passed by the user. >>> >>> 3. post SET_PSV in order to set the for the wire domain initial >>> signature parameters passed by the user. >>> >>> This patch also introduces some helper functions to set the BSF correctly >>> and determining the signature format selectors. >> >> >> Has it already been explained somewhere what the abbreviations KLM, BSF and >> PSV stand for ? 
> > Bart, these are all HW T10 related objects/concepts, we made an effort > to keep them contained within the mlx5 driver such that they don't > show up on the IB core layer. If this helps for the review, Sagi can > spare few words on each, sure. Hello Or, I would certainly appreciate it if these abbreviations could be clarified further. That would allow me to understand what has been explained in the above patch description :-) Bart. -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 45+ messages in thread
[parent not found: <527575D0.9050802-HInyCGIudOg@public.gmane.org>]
* Re: [PATCH RFC v2 08/10] IB/mlx5: Support IB_WR_REG_SIG_MR [not found] ` <527575D0.9050802-HInyCGIudOg@public.gmane.org> @ 2013-11-03 12:20 ` Sagi Grimberg 0 siblings, 0 replies; 45+ messages in thread From: Sagi Grimberg @ 2013-11-03 12:20 UTC (permalink / raw) To: Bart Van Assche, Or Gerlitz Cc: linux-rdma, oren-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org, Tzahi Oved On 11/2/2013 11:59 PM, Bart Van Assche wrote: > On 2/11/2013 12:21, Or Gerlitz wrote: >> On Fri, Nov 1, 2013 at 10:37 PM, Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org> >> wrote: >>> On 31/10/2013 5:24, Sagi Grimberg wrote: >>>> >>>> This patch implements IB_WR_REG_SIG_MR posted by the user. >>>> >>>> Baisically this WR involvs 3 WQEs in order to prepare and properly >>>> register the signature layout: >>>> >>>> 1. post UMR WR to register the sig_mr in one of two possible ways: >>>> * In case the user registered a single MR for data so the UMR >>>> data >>>> segment >>>> consists of: >>>> - single klm (data MR) passed by the user >>>> - BSF with signature attributes requested by the user. >>>> * In case the user registered 2 MRs, one for data and one for >>>> protection, >>>> the UMR consists of: >>>> - strided block format which includes data and protection >>>> MRs and >>>> their repetitive block format. >>>> - BSF with signature attributes requested by the user. >>>> >>>> 2. post SET_PSV in order to set the for the memory domain initial >>>> signature parameters passed by the user. >>>> >>>> 3. post SET_PSV in order to set the for the wire domain initial >>>> signature parameters passed by the user. >>>> >>>> This patch also introduces some helper functions to set the BSF >>>> correctly >>>> and determining the signature format selectors. >>> >>> >>> Has it already been explained somewhere what the abbreviations KLM, >>> BSF and >>> PSV stand for ? 
>>
>> Bart, these are all HW T10 related objects/concepts, we made an effort
>> to keep them contained within the mlx5 driver such that they don't
>> show up on the IB core layer. If this helps for the review, Sagi can
>> spare few words on each, sure.
>
> Hello Or,
>
> I would certainly appreciate it if these abbreviations could be
> clarified further. That would allow me to understand what has been
> explained in the above patch description :-)
>
> Bart.
>
>

Hey Bart,

As Or said, these concepts are vendor specific and not exposed to the IB
core layer, and their naming is also pure Mellanox. This also might change
in future generation HCAs.

In general the sig_mr (signature enabled) is a memory region that can
register other memory regions (hint: data MR and protection MR) and is
attached to (mlx5) signature objects.

KLM: A tuple {key, addr, len} that is used for indirect registration.
BSF: The object that describes the wire and memory layouts. We call it a
     byte-stream format.
PSV: The signature variable that computes the guards, used for generation
     and/or validation. One exists for each domain.

So we constructed the REG_SIG_MR operation as a 3-way operation:
- Online registration for sig_mr:
  Register in an indirect manner for data and protection (if it exists).
  If no protection exists in the memory domain, the sig_mr registers the
  data buffer (KLM).
  If protection exists in the memory domain (DIX), the sig_mr registers
  the data and protection buffers (KLMs).
  In the DIX case, in order to transfer DIF every pi_interval, the
  registration also defines the strided format of the execution
  (pi_interval of data followed by 8 bytes of protection in a repetitive
  manner).
- Define the signature format of the wire/memory domains (BSF object):
  Tell the HW how to treat the signature layout in each domain (signature
  type, pi_interval etc...)
- Set the signature variables for each domain (memory, wire):
  Here we place the seeds from which the HW starts the signature
  computation (in the DIF case: initial CRC, initial ref_tag, initial
  app_tag).

Sagi.
^ permalink raw reply [flat|nested] 45+ messages in thread
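To make the strided layout described above concrete, here is a minimal userspace sketch in C. It is not mlx5 driver code and does not touch any hardware object; the function name dif_interleave and the PI_TUPLE_SIZE macro are hypothetical, introduced only for illustration. It copies a contiguous data buffer and a separate protection buffer (the DIX memory layout) into the interleaved layout, i.e. pi_interval bytes of data followed by 8 bytes of protection, repeated per block. The HCA is described as doing this in an offloaded manner; the loop below only shows the resulting byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PI_TUPLE_SIZE 8 /* assumed: 8 bytes of protection per interval */

/*
 * Hypothetical helper: interleave separate data and protection buffers
 * into one stream of [pi_interval data bytes][8 protection bytes] blocks.
 * Returns the number of bytes written, or 0 if wire_len is too small.
 */
static size_t dif_interleave(const uint8_t *data, const uint8_t *prot,
			     size_t pi_interval, size_t nblocks,
			     uint8_t *wire, size_t wire_len)
{
	size_t need = nblocks * (pi_interval + PI_TUPLE_SIZE);
	size_t i;

	if (wire_len < need)
		return 0;

	for (i = 0; i < nblocks; i++) {
		/* data part of block i */
		memcpy(wire, data + i * pi_interval, pi_interval);
		wire += pi_interval;
		/* protection tuple of block i */
		memcpy(wire, prot + i * PI_TUPLE_SIZE, PI_TUPLE_SIZE);
		wire += PI_TUPLE_SIZE;
	}
	return need;
}
```

With pi_interval = 512 and two blocks, a 1024-byte data buffer plus a 16-byte protection buffer become a 1040-byte stream, which matches the 512xN / 8xN sizing given in the cover letter.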
* [PATCH RFC v2 09/10] IB/mlx5: Collect signature error completion
[not found] ` <1383222255-22699-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
` (7 preceding siblings ...)
2013-10-31 12:24 ` [PATCH RFC v2 08/10] IB/mlx5: Support IB_WR_REG_SIG_MR Sagi Grimberg
@ 2013-10-31 12:24 ` Sagi Grimberg
2013-10-31 12:24 ` [PATCH RFC v2 10/10] IB/mlx5: Publish support in signature feature Sagi Grimberg
` (3 subsequent siblings)
12 siblings, 0 replies; 45+ messages in thread
From: Sagi Grimberg @ 2013-10-31 12:24 UTC (permalink / raw)
To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
Cc: oren-VPRAkNaXOzVWk0Htik3J/w, tzahio-VPRAkNaXOzVWk0Htik3J/w
This commit handles the signature error CQE generated by the HW (if one
occurred). The underlying mlx5 driver will handle signature error
completions and will mark the relevant memory region as dirty.

Once the user gets the completion for the transaction, he must check for
signature errors on the signature memory region using a new lightweight
verb, ib_check_sig_status; if an error exists, he will get the signature
error information.

If the user does not check for signature errors, i.e. does not call
ib_check_sig_status, he will not be allowed to use the memory region for
another signature operation (the REG_SIG_MR work request will fail).

Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> --- drivers/infiniband/hw/mlx5/cq.c | 53 ++++++++++++++++++++++++++++++++++ drivers/infiniband/hw/mlx5/main.c | 1 + drivers/infiniband/hw/mlx5/mlx5_ib.h | 7 ++++ drivers/infiniband/hw/mlx5/mr.c | 29 ++++++++++++++++++ drivers/infiniband/hw/mlx5/qp.c | 8 ++++- include/linux/mlx5/cq.h | 1 + include/linux/mlx5/device.h | 18 +++++++++++ include/linux/mlx5/driver.h | 4 ++ include/linux/mlx5/qp.h | 5 +++ 9 files changed, 124 insertions(+), 2 deletions(-) diff --git a/drivers/infiniband/hw/mlx5/cq.c b/drivers/infiniband/hw/mlx5/cq.c index 344ab03..041a7a0 100644 --- a/drivers/infiniband/hw/mlx5/cq.c +++ b/drivers/infiniband/hw/mlx5/cq.c @@ -351,6 +351,33 @@ static void handle_atomics(struct mlx5_ib_qp *qp, struct mlx5_cqe64 *cqe64, qp->sq.last_poll = tail; } +static void get_sig_err_item(struct mlx5_sig_err_cqe *cqe, + struct ib_sig_err *item) +{ + u16 syndrome = be16_to_cpu(cqe->syndrome); + + switch (syndrome) { + case 13: + item->err_type = IB_SIG_BAD_CRC; + break; + case 12: + item->err_type = IB_SIG_BAD_APPTAG; + break; + case 11: + item->err_type = IB_SIG_BAD_REFTAG; + break; + default: + break; + } + + item->expected_guard = be32_to_cpu(cqe->expected_trans_sig) >> 16; + item->actual_guard = be32_to_cpu(cqe->actual_trans_sig) >> 16; + item->expected_logical_block = be32_to_cpu(cqe->expected_reftag); + item->actual_logical_block = be32_to_cpu(cqe->actual_reftag); + item->sig_err_offset = be64_to_cpu(cqe->err_offset); + item->key = be32_to_cpu(cqe->mkey); +} + static int mlx5_poll_one(struct mlx5_ib_cq *cq, struct mlx5_ib_qp **cur_qp, struct ib_wc *wc) @@ -360,12 +387,16 @@ static int mlx5_poll_one(struct mlx5_ib_cq *cq, struct mlx5_cqe64 *cqe64; struct mlx5_core_qp *mqp; struct mlx5_ib_wq *wq; + struct mlx5_sig_err_cqe *sig_err_cqe; + struct mlx5_core_mr *mmr; + struct mlx5_ib_mr *mr; uint8_t opcode; uint32_t qpn; u16 wqe_ctr; void *cqe; int idx; +repoll: cqe = next_cqe_sw(cq); if (!cqe) 
return -EAGAIN; @@ -449,6 +480,28 @@ static int mlx5_poll_one(struct mlx5_ib_cq *cq, } } break; + case MLX5_CQE_SIG_ERR: + sig_err_cqe = (struct mlx5_sig_err_cqe *)cqe64; + + read_lock(&dev->mdev.priv.mr_table.lock); + mmr = __mlx5_mr_lookup(&dev->mdev, + be32_to_cpu(sig_err_cqe->mkey) & 0xffffff00); + if (unlikely(!mmr)) { + read_unlock(&dev->mdev.priv.mr_table.lock); + mlx5_ib_warn(dev, "CQE@CQ %06x for unknown MR %6x\n", + cq->mcq.cqn, be32_to_cpu(sig_err_cqe->mkey)); + return -EINVAL; + } + + mr = to_mibmr(mmr); + get_sig_err_item(sig_err_cqe, &mr->sig->err_item); + mr->sig->sig_err_exists = true; + + mlx5_ib_dbg(dev, "Got SIGERR on key: 0x%x\n", + mr->sig->err_item.key); + + read_unlock(&dev->mdev.priv.mr_table.lock); + goto repoll; } return 0; diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c index 2e67a37..f3c7111 100644 --- a/drivers/infiniband/hw/mlx5/main.c +++ b/drivers/infiniband/hw/mlx5/main.c @@ -1409,6 +1409,7 @@ static int init_one(struct pci_dev *pdev, dev->ib_dev.alloc_fast_reg_mr = mlx5_ib_alloc_fast_reg_mr; dev->ib_dev.alloc_fast_reg_page_list = mlx5_ib_alloc_fast_reg_page_list; dev->ib_dev.free_fast_reg_page_list = mlx5_ib_free_fast_reg_page_list; + dev->ib_dev.check_sig_status = mlx5_ib_check_sig_status; if (mdev->caps.flags & MLX5_DEV_CAP_FLAG_XRC) { dev->ib_dev.alloc_xrcd = mlx5_ib_alloc_xrcd; diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h index 758f0e1..f175fa4 100644 --- a/drivers/infiniband/hw/mlx5/mlx5_ib.h +++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h @@ -394,6 +394,11 @@ static inline struct mlx5_ib_qp *to_mibqp(struct mlx5_core_qp *mqp) return container_of(mqp, struct mlx5_ib_qp, mqp); } +static inline struct mlx5_ib_mr *to_mibmr(struct mlx5_core_mr *mmr) +{ + return container_of(mmr, struct mlx5_ib_mr, mmr); +} + static inline struct mlx5_ib_pd *to_mpd(struct ib_pd *ibpd) { return container_of(ibpd, struct mlx5_ib_pd, ibpd); @@ -531,6 +536,8 @@ int 
mlx5_mr_cache_init(struct mlx5_ib_dev *dev); int mlx5_mr_cache_cleanup(struct mlx5_ib_dev *dev); int mlx5_mr_ib_cont_pages(struct ib_umem *umem, u64 addr, int *count, int *shift); void mlx5_umr_cq_handler(struct ib_cq *cq, void *cq_context); +int mlx5_ib_check_sig_status(struct ib_mr *sig_mr, + struct ib_sig_err *sig_err); static inline void init_query_mad(struct ib_smp *mad) { diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c index 44f7e46..d796d60 100644 --- a/drivers/infiniband/hw/mlx5/mr.c +++ b/drivers/infiniband/hw/mlx5/mr.c @@ -967,6 +967,11 @@ struct ib_mr *mlx5_ib_create_mr(struct ib_pd *pd, access_mode = MLX5_ACCESS_MODE_KLM; mr->sig->psv_memory.psv_idx = psv_index[0]; mr->sig->psv_wire.psv_idx = psv_index[1]; + + mr->sig->sig_status_checked = true; + mr->sig->sig_err_exists = false; + /* Next UMR, Arm SIGERR */ + ++mr->sig->sigerr_count; } in->seg.flags = MLX5_PERM_UMR_EN | access_mode; @@ -1114,3 +1119,27 @@ void mlx5_ib_free_fast_reg_page_list(struct ib_fast_reg_page_list *page_list) kfree(mfrpl->ibfrpl.page_list); kfree(mfrpl); } + +int mlx5_ib_check_sig_status(struct ib_mr *sig_mr, + struct ib_sig_err *sig_err) +{ + struct mlx5_ib_mr *mmr = to_mmr(sig_mr); + int ret = 0; + + if (!mmr->sig->sig_err_exists) + goto out; + + if (sig_mr->lkey == mmr->sig->err_item.key) + memcpy(sig_err, &mmr->sig->err_item, sizeof(*sig_err)); + else { + sig_err->err_type = IB_SIG_BAD_CRC; + sig_err->sig_err_offset = 0; + sig_err->key = mmr->sig->err_item.key; + } + ret = 1; + mmr->sig->sig_err_exists = false; + mmr->sig->sigerr_count++; +out: + mmr->sig->sig_status_checked = true; + return ret; +} diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c index 37e3715..439e07a 100644 --- a/drivers/infiniband/hw/mlx5/qp.c +++ b/drivers/infiniband/hw/mlx5/qp.c @@ -1726,6 +1726,7 @@ static __be64 sig_mkey_mask(void) result = MLX5_MKEY_MASK_LEN | MLX5_MKEY_MASK_PAGE_SIZE | MLX5_MKEY_MASK_START_ADDR | + MLX5_MKEY_MASK_EN_SIGERR 
| MLX5_MKEY_MASK_EN_RINVAL | MLX5_MKEY_MASK_KEY | MLX5_MKEY_MASK_LR | @@ -2152,6 +2153,7 @@ static void set_sig_mkey_segment(struct mlx5_mkey_seg *seg, { struct ib_mr *sig_mr = wr->wr.sig_handover.sig_mr; u32 sig_key = sig_mr->rkey; + u8 sigerr = to_mmr(sig_mr)->sig->sigerr_count & 1; memset(seg, 0, sizeof(*seg)); @@ -2159,7 +2161,7 @@ static void set_sig_mkey_segment(struct mlx5_mkey_seg *seg, seg->flags = get_umr_flags(wr->wr.sig_handover.access_flags) | MLX5_ACCESS_MODE_KLM; seg->qpn_mkey7_0 = cpu_to_be32((sig_key & 0xff) | 0xffffff00); - seg->flags_pd = cpu_to_be32(MLX5_MKEY_REMOTE_INVAL | + seg->flags_pd = cpu_to_be32(MLX5_MKEY_REMOTE_INVAL | sigerr << 26 | MLX5_MKEY_BSF_EN | pdn); seg->start_addr = 0; seg->len = cpu_to_be64(length); @@ -2190,7 +2192,8 @@ static int set_sig_umr_wr(struct ib_send_wr *wr, struct mlx5_ib_qp *qp, if (unlikely(wr->num_sge != 1) || unlikely(wr->wr.sig_handover.access_flags & IB_ACCESS_REMOTE_ATOMIC) || - unlikely(!sig_mr->sig) || unlikely(!qp->signature_en)) + unlikely(!sig_mr->sig) || unlikely(!qp->signature_en) || + unlikely(!sig_mr->sig->sig_status_checked)) return -EINVAL; /* length of the protected region, data + protection */ @@ -2227,6 +2230,7 @@ static int set_sig_umr_wr(struct ib_send_wr *wr, struct mlx5_ib_qp *qp, if (unlikely((*seg == qp->sq.qend))) *seg = mlx5_get_send_wqe(qp, 0); + sig_mr->sig->sig_status_checked = false; return 0; } diff --git a/include/linux/mlx5/cq.h b/include/linux/mlx5/cq.h index 3db67f7..e1974b0 100644 --- a/include/linux/mlx5/cq.h +++ b/include/linux/mlx5/cq.h @@ -80,6 +80,7 @@ enum { MLX5_CQE_RESP_SEND_IMM = 3, MLX5_CQE_RESP_SEND_INV = 4, MLX5_CQE_RESIZE_CQ = 0xff, /* TBD */ + MLX5_CQE_SIG_ERR = 12, MLX5_CQE_REQ_ERR = 13, MLX5_CQE_RESP_ERR = 14, }; diff --git a/include/linux/mlx5/device.h b/include/linux/mlx5/device.h index aef7eed..96b50e8 100644 --- a/include/linux/mlx5/device.h +++ b/include/linux/mlx5/device.h @@ -117,6 +117,7 @@ enum { MLX5_MKEY_MASK_START_ADDR = 1ull << 6, 
MLX5_MKEY_MASK_PD = 1ull << 7, MLX5_MKEY_MASK_EN_RINVAL = 1ull << 8, + MLX5_MKEY_MASK_EN_SIGERR = 1ull << 9, MLX5_MKEY_MASK_BSF_EN = 1ull << 12, MLX5_MKEY_MASK_KEY = 1ull << 13, MLX5_MKEY_MASK_QPN = 1ull << 14, @@ -544,6 +545,23 @@ struct mlx5_cqe64 { u8 op_own; }; +struct mlx5_sig_err_cqe { + u8 rsvd0[16]; + __be32 expected_trans_sig; + __be32 actual_trans_sig; + __be32 expected_reftag; + __be32 actual_reftag; + __be16 syndrome; + u8 rsvd22[2]; + __be32 mkey; + __be64 err_offset; + u8 rsvd30[8]; + __be32 qpn; + u8 rsvd38[2]; + u8 signature; + u8 opcode; +}; + struct mlx5_wqe_srq_next_seg { u8 rsvd0[2]; __be16 next_wqe_index; diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h index 5fe0690..0c462bb 100644 --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -413,6 +413,10 @@ struct mlx5_core_psv { struct mlx5_core_sig_ctx { struct mlx5_core_psv psv_memory; struct mlx5_core_psv psv_wire; + struct ib_sig_err err_item; + bool sig_status_checked; + bool sig_err_exists; + u32 sigerr_count; }; struct mlx5_core_mr { diff --git a/include/linux/mlx5/qp.h b/include/linux/mlx5/qp.h index 9ea6cf6..557e0b3 100644 --- a/include/linux/mlx5/qp.h +++ b/include/linux/mlx5/qp.h @@ -501,6 +501,11 @@ static inline struct mlx5_core_qp *__mlx5_qp_lookup(struct mlx5_core_dev *dev, u return radix_tree_lookup(&dev->priv.qp_table.tree, qpn); } +static inline struct mlx5_core_mr *__mlx5_mr_lookup(struct mlx5_core_dev *dev, u32 key) +{ + return radix_tree_lookup(&dev->priv.mr_table.tree, key); +} + int mlx5_core_create_qp(struct mlx5_core_dev *dev, struct mlx5_core_qp *qp, struct mlx5_create_qp_mbox_in *in, -- 1.7.1 -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply related [flat|nested] 45+ messages in thread
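The check-before-reuse rule from the commit message of patch 09 can be sketched as a small state model. This is a simplified userspace illustration in C, not the driver code: the struct and the model_* function names are hypothetical, the sigerr_count arming counter from the patch is left out, and the error details (struct ib_sig_err) are not modeled. It only mirrors how the two flags sig_status_checked and sig_err_exists are read and written in set_sig_umr_wr, the MLX5_CQE_SIG_ERR poll path, and mlx5_ib_check_sig_status.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two status flags kept per signature MR. */
struct sig_ctx_model {
	bool sig_status_checked; /* status was checked since the last REG_SIG_MR */
	bool sig_err_exists;     /* a signature error CQE was seen for this MR */
};

/* Mirrors the initial state set when the signature MR is created. */
static void model_init(struct sig_ctx_model *ctx)
{
	ctx->sig_status_checked = true;
	ctx->sig_err_exists = false;
}

/* Models posting REG_SIG_MR: refused until the previous status was checked. */
static int model_reg_sig_mr(struct sig_ctx_model *ctx)
{
	if (!ctx->sig_status_checked)
		return -EINVAL;
	ctx->sig_status_checked = false;
	return 0;
}

/* Models the poll path seeing a signature error CQE for this MR. */
static void model_sig_err_cqe(struct sig_ctx_model *ctx)
{
	ctx->sig_err_exists = true;
}

/* Models the status check: returns 1 if an error was pending, else 0. */
static int model_check_sig_status(struct sig_ctx_model *ctx)
{
	int ret = ctx->sig_err_exists ? 1 : 0;

	ctx->sig_err_exists = false;
	ctx->sig_status_checked = true;
	return ret;
}
```

In this model a second model_reg_sig_mr() call without an intervening model_check_sig_status() fails with -EINVAL, which is the behavior the commit message describes for an unchecked memory region.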
* [PATCH RFC v2 10/10] IB/mlx5: Publish support in signature feature
[not found] ` <1383222255-22699-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
` (8 preceding siblings ...)
2013-10-31 12:24 ` [PATCH RFC v2 09/10] IB/mlx5: Collect signature error completion Sagi Grimberg
@ 2013-10-31 12:24 ` Sagi Grimberg
2013-10-31 12:55 ` [PATCH RFC v2 00/10] Introduce Signature feature Jack Wang
` (2 subsequent siblings)
12 siblings, 0 replies; 45+ messages in thread
From: Sagi Grimberg @ 2013-10-31 12:24 UTC (permalink / raw)
To: linux-rdma-u79uwXL29TY76Z2rM5mHXA
Cc: oren-VPRAkNaXOzVWk0Htik3J/w, tzahio-VPRAkNaXOzVWk0Htik3J/w
Currently only T10-DIF types of signature handover operations are
supported (types 1|2|3).

Signed-off-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/hw/mlx5/main.c | 9 +++++++++
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index f3c7111..3dd8219 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -274,6 +274,15 @@ static int mlx5_ib_query_device(struct ib_device *ibdev,
 	if (flags & MLX5_DEV_CAP_FLAG_XRC)
 		props->device_cap_flags |= IB_DEVICE_XRC;
 	props->device_cap_flags |= IB_DEVICE_MEM_MGT_EXTENSIONS;
+	if (flags & MLX5_DEV_CAP_FLAG_SIG_HAND_OVER) {
+		props->device_cap_flags |= IB_DEVICE_SIGNATURE_HANDOVER;
+		/* At this stage no support for signature handover */
+		props->sig_prot_cap = IB_PROT_T10DIF_TYPE_1 |
+			IB_PROT_T10DIF_TYPE_2 |
+			IB_PROT_T10DIF_TYPE_3;
+		props->sig_guard_cap = IB_GUARD_T10DIF_CRC |
+			IB_GUARD_T10DIF_CSUM;
+	}
 
 	props->vendor_id = be32_to_cpup((__be32 *)(out_mad->data + 36)) &
 		0xffffff;
-- 
1.7.1
^ permalink raw reply related [flat|nested] 45+ messages in thread
* Re: [PATCH RFC v2 00/10] Introduce Signature feature [not found] ` <1383222255-22699-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org> ` (9 preceding siblings ...) 2013-10-31 12:24 ` [PATCH RFC v2 10/10] IB/mlx5: Publish support in signature feature Sagi Grimberg @ 2013-10-31 12:55 ` Jack Wang [not found] ` <5272535D.4090805-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> 2013-11-01 15:03 ` Bart Van Assche 2013-11-01 22:06 ` Bart Van Assche 12 siblings, 1 reply; 45+ messages in thread From: Jack Wang @ 2013-10-31 12:55 UTC (permalink / raw) To: Sagi Grimberg Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, oren-VPRAkNaXOzVWk0Htik3J/w, tzahio-VPRAkNaXOzVWk0Htik3J/w Hi Sagi, I wander what's the performance overhead with this DIF support? And is there a roadmap for support SRP/ISER and target side for DIF? Regards, Jack On 10/31/2013 01:24 PM, Sagi Grimberg wrote: > This patchset Introduces Verbs level support for signature handover > feature. Siganture is intended to implement end-to-end data integrity > on a transactional basis in a completely offloaded manner. > > There are several end-to-end data integrity methods used today in various > applications and/or upper layer protocols such as T10-DIF defined by SCSI > specifications (SBC), CRC32, XOR8 and more. This patchset adds verbs > support only for T10-DIF. The proposed framework allows adding more > signature methods in the future. > > In T10-DIF, when a series of 512-byte data blocks are transferred, each > block is followed by an 8-byte guard. The guard consists of CRC that > protects the integrity of the data in the block, and some other tags > that protects against mis-directed IOs. > > Data can be protected when transferred over the wire, but can also be > protected in the memory of the sender/receiver. This allows true end- > to-end protection against bits flipping either over the wire, through > gateways, in memory, over PCI, etc. 
> > While T10-DIF clearly defines that over the wire protection guards are > interleaved into the data stream (each 512-Byte block followed by 8-byte > guard), when in memory, the protection guards may reside in a buffer > separated from the data. Depending on the application, it is usually > easier to handle the data when it is contiguous. In this case the data > buffer will be of size 512xN and the protection buffer will be of size > 8xN (where N is the number of blocks in the transaction). > > There are 3 kinds of signature handover operation: > 1. Take unprotected data (from wire or memory) and ADD protection > guards. > 2. Take protetected data (from wire or memory), validate the data > integrity against the protection guards and STRIP the protection > guards. > 3. Take protected data (from wire or memory), validate the data > integrity against the protection guards and PASS the data with > the guards as-is. > > This translates to defining to the HCA how/if data protection exists > in memory domain, and how/if data protection exists is wire domain. > > The way that data integrity is performed is by using a new kind of > memory region: signature-enabled MR, and a new kind of work request: > REG_SIG_MR. The REG_SIG_MR WR operates on the signature-enabled MR, > and defines all the needed information for the signature handover > (data buffer, protection buffer if needed and signature attributes). > The result is an MR that can be used for data transfer as usual, > that will also add/validate/strip/pass protection guards. > > When the data transfer is successfully completed, it does not mean > that there are no integrity errors. The user must afterwards check > the signature status of the handover operation using a new light-weight > verb. > > This feature shall be used in storage upper layer protocols iSER/SRP > implementing end-to-end data integrity T10-DIF. 
Following this patchset, > we will soon submit krping patches which will demonstrate the usage of > these signature verbs. > -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 45+ messages in thread
[parent not found: <5272535D.4090805-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>]
* Re: [PATCH RFC v2 00/10] Introduce Signature feature
[not found] ` <5272535D.4090805-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2013-10-31 13:20 ` Sagi Grimberg
[not found] ` <52725930.7030702-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
0 siblings, 1 reply; 45+ messages in thread
From: Sagi Grimberg @ 2013-10-31 13:20 UTC (permalink / raw)
To: Jack Wang
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, oren-VPRAkNaXOzVWk0Htik3J/w, tzahio-VPRAkNaXOzVWk0Htik3J/w, Nicholas Bellinger
On 10/31/2013 2:55 PM, Jack Wang wrote:
> Hi Sagi,
>
> I wander what's the performance overhead with this DIF support?
> And is there a roadmap for support SRP/ISER and target side for DIF?
>
> Regards,
> Jack

Well, all DIF operations are fully offloaded by the HCA, so we don't
expect any performance degradation other than the obvious 8-byte
integrity overhead. We have yet to take benchmarks on this and we
definitely plan to do so.

Regarding our roadmap, we plan to support iSER target (LIO) and initiator
first. Some prior support for DIF needs to be added at the target core
level; after that, the transport implementation is pretty straightforward
(iSER/SRP).

So I aim for iSER DIF support (target+initiator) to make it into v3.14.

Hope this helps,

Sagi.
^ permalink raw reply [flat|nested] 45+ messages in thread
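As a rough illustration of the 8-byte integrity overhead mentioned above, the extra bytes carried for a transfer can be computed as below. This is only arithmetic under a stated assumption (one 8-byte protection tuple per protection interval, with the transfer length a multiple of the interval); the function name dif_wire_overhead is hypothetical, and the result says nothing about measured performance, which the thread notes has not been benchmarked yet.

```c
#include <assert.h>
#include <stddef.h>

#define PI_TUPLE_SIZE 8 /* assumed: 8 bytes of protection per interval */

/*
 * Hypothetical helper: extra protection bytes for data_len bytes of
 * payload, assuming data_len is a multiple of pi_interval.
 */
static size_t dif_wire_overhead(size_t data_len, size_t pi_interval)
{
	return (data_len / pi_interval) * PI_TUPLE_SIZE;
}
```

For a 1 MiB transfer this gives 16384 extra bytes with a 512-byte interval (8/512, about 1.56%) and 2048 extra bytes with a 4096-byte interval (8/4096, about 0.2%).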
[parent not found: <52725930.7030702-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>]
* Re: [PATCH RFC v2 00/10] Introduce Signature feature
[not found] ` <52725930.7030702-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2013-10-31 13:29 ` Jack Wang
0 siblings, 0 replies; 45+ messages in thread
From: Jack Wang @ 2013-10-31 13:29 UTC (permalink / raw)
To: Sagi Grimberg
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, oren-VPRAkNaXOzVWk0Htik3J/w, tzahio-VPRAkNaXOzVWk0Htik3J/w, Nicholas Bellinger

On 10/31/2013 02:20 PM, Sagi Grimberg wrote:
> On 10/31/2013 2:55 PM, Jack Wang wrote:
>> Hi Sagi,
>>
>> I wonder what's the performance overhead with this DIF support?
>> And is there a roadmap for support SRP/ISER and target side for DIF?
>>
>> Regards,
>> Jack
>
> Well, all DIF operations are fully offloaded by the HCA so we don't expect
> any performance degradation other than the obvious 8-bytes integrity
> overhead.
> We have yet to take benchmarks on this and we definitely plan to do so.
>
> Regarding our roadmap, we plan to support iSER target (LIO) and
> initiator first.
> Some prior support for DIF needs to be added in target core level,
> then transport implementation is pretty straight-forward (iSER/SRP).
>
> So I aim for iSER DIF support (target+initiator) to make it into v3.14.
>
> Hope this helps,
>
> Sagi.

Good to know, thanks.

Jack
* Re: [PATCH RFC v2 00/10] Introduce Signature feature
[not found] ` <1383222255-22699-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
` (10 preceding siblings ...)
2013-10-31 12:55 ` [PATCH RFC v2 00/10] Introduce Signature feature Jack Wang
@ 2013-11-01 15:03 ` Bart Van Assche
[not found] ` <5273C2B6.7010901-HInyCGIudOg@public.gmane.org>
2013-11-01 22:06 ` Bart Van Assche
12 siblings, 1 reply; 45+ messages in thread
From: Bart Van Assche @ 2013-11-01 15:03 UTC (permalink / raw)
To: Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA
Cc: oren-VPRAkNaXOzVWk0Htik3J/w, tzahio-VPRAkNaXOzVWk0Htik3J/w

On 31/10/2013 5:24, Sagi Grimberg wrote:
> In T10-DIF, when a series of 512-byte data blocks are transferred, each
> block is followed by an 8-byte guard. The guard consists of CRC that
> protects the integrity of the data in the block, and some other tags
> that protect against mis-directed IOs.

Shouldn't that read "logical block length divided by 2**(protection
interval exponent)" instead of "512"? From the SPC-4 FORMAT UNIT
section:

<quote>For a type 2 protection or a type 3 protection format request,
the protection interval exponent determines the length of user data to
be transferred before protection information is transferred (i.e., the
protection information interval).

The protection information interval is calculated as follows:

  protection information interval =
      logical block length / 2**(protection interval exponent)

where:

logical block length is the number of bytes of user data in a logical
block (see 4.5)

and where

protection interval exponent is zero if the short parameter list header
(see table 36) is used or the contents of the PROTECTION INTERVAL
EXPONENT field if the long parameter list header (see table 37) is used.

If the protection information interval calculates to a value that is not
an even number (e.g., 520 / 2**3 = 65) or not a whole number (e.g.,
520 / 2**4 = 32.5 and 520 / 2**10 = 0.508), then the device server shall
terminate the command with CHECK CONDITION status with the sense key set
to ILLEGAL REQUEST and the additional sense code set to INVALID FIELD IN
PARAMETER LIST.</quote>

Bart.
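The interval rule in the SPC-4 excerpt above can be checked with a few lines of arithmetic. This is a minimal Python sketch of that rule only; the function name is hypothetical, and raising ValueError merely stands in for the CHECK CONDITION / ILLEGAL REQUEST response that a device server would return.

```python
def protection_information_interval(logical_block_length: int, exponent: int = 0) -> int:
    """Return logical_block_length / 2**exponent if it is a whole, even number.

    Raises ValueError for the cases the quoted text says must be rejected.
    """
    if logical_block_length <= 0 or exponent < 0:
        raise ValueError("block length must be positive and exponent non-negative")
    interval, remainder = divmod(logical_block_length, 2 ** exponent)
    if remainder:
        raise ValueError("interval is not a whole number")
    if interval % 2:
        raise ValueError("interval is not an even number")
    return interval


# Exponent 0 (short parameter list header): interval equals the block length.
assert protection_information_interval(512) == 512
# A 4096-byte logical block with exponent 3 gives eight 512-byte intervals.
assert protection_information_interval(4096, 3) == 512
```

With the values from the excerpt, a 520-byte block is rejected for exponents 3 (65 is odd), 4 (32.5) and 10 (0.508), while exponents 0, 1 and 2 yield 520, 260 and 130.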
[parent not found: <5273C2B6.7010901-HInyCGIudOg@public.gmane.org>]
* Re: [PATCH RFC v2 00/10] Introduce Signature feature
[not found] ` <5273C2B6.7010901-HInyCGIudOg@public.gmane.org>
@ 2013-11-02 1:36 ` Nicholas A. Bellinger
[not found] ` <1383356167.4216.16.camel-XoQW25Eq2zviZyQQd+hFbcojREIfoBdhmpATvIKMPHk@public.gmane.org>
0 siblings, 1 reply; 45+ messages in thread
From: Nicholas A. Bellinger @ 2013-11-02 1:36 UTC (permalink / raw)
To: Bart Van Assche
Cc: Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA, oren-VPRAkNaXOzVWk0Htik3J/w, tzahio-VPRAkNaXOzVWk0Htik3J/w

On Fri, 2013-11-01 at 08:03 -0700, Bart Van Assche wrote:
> On 31/10/2013 5:24, Sagi Grimberg wrote:
> > In T10-DIF, when a series of 512-byte data blocks are transferred, each
> > block is followed by an 8-byte guard. The guard consists of CRC that
> > protects the integrity of the data in the block, and some other tags
> > that protect against mis-directed IOs.
>
> Shouldn't that read "logical block length divided by 2**(protection
> interval exponent)" instead of "512" ? From the SPC-4 FORMAT UNIT
> section:

Why should the protection interval in FORMAT_UNIT be mentioned when it's
not supported by the hardware, nor by drivers/scsi/sd_dif.c itself..?

--nab
[parent not found: <1383356167.4216.16.camel-XoQW25Eq2zviZyQQd+hFbcojREIfoBdhmpATvIKMPHk@public.gmane.org>]
* Re: [PATCH RFC v2 00/10] Introduce Signature feature
[not found] ` <1383356167.4216.16.camel-XoQW25Eq2zviZyQQd+hFbcojREIfoBdhmpATvIKMPHk@public.gmane.org>
@ 2013-11-02 21:57 ` Bart Van Assche
[not found] ` <52757555.6090907-HInyCGIudOg@public.gmane.org>
0 siblings, 1 reply; 45+ messages in thread
From: Bart Van Assche @ 2013-11-02 21:57 UTC (permalink / raw)
To: Nicholas A. Bellinger
Cc: Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA, oren-VPRAkNaXOzVWk0Htik3J/w, tzahio-VPRAkNaXOzVWk0Htik3J/w

On 1/11/2013 18:36, Nicholas A. Bellinger wrote:
> On Fri, 2013-11-01 at 08:03 -0700, Bart Van Assche wrote:
>> On 31/10/2013 5:24, Sagi Grimberg wrote:
>>> In T10-DIF, when a series of 512-byte data blocks are transferred, each
>>> block is followed by an 8-byte guard. The guard consists of CRC that
>>> protects the integrity of the data in the block, and some other tags
>>> that protect against mis-directed IOs.
>>
>> Shouldn't that read "logical block length divided by 2**(protection
>> interval exponent)" instead of "512" ? From the SPC-4 FORMAT UNIT
>> section:
>
> Why should the protection interval in FORMAT_UNIT be mentioned when it's
> not supported by the hardware, nor by drivers/scsi/sd_dif.c itself..?

Hello Nick,

My understanding is that this patch series is not only intended for
initiator drivers but also for target drivers like ib_srpt and ib_isert.
As you know target drivers do not restrict the initiator operating
system to Linux. Although I do not know whether there are already
operating systems that support the "protection interval exponent",
I think it is a good idea to stay as close as possible to the
terminology of the SPC-4 standard.

Bart.
[parent not found: <52757555.6090907-HInyCGIudOg@public.gmane.org>]
* Re: [PATCH RFC v2 00/10] Introduce Signature feature
[not found] ` <52757555.6090907-HInyCGIudOg@public.gmane.org>
@ 2013-11-04 18:41 ` Nicholas A. Bellinger
[not found] ` <1383590471.4216.22.camel-XoQW25Eq2zviZyQQd+hFbcojREIfoBdhmpATvIKMPHk@public.gmane.org>
0 siblings, 1 reply; 45+ messages in thread
From: Nicholas A. Bellinger @ 2013-11-04 18:41 UTC (permalink / raw)
To: Bart Van Assche
Cc: Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA, oren-VPRAkNaXOzVWk0Htik3J/w, tzahio-VPRAkNaXOzVWk0Htik3J/w

On Sat, 2013-11-02 at 14:57 -0700, Bart Van Assche wrote:
> On 1/11/2013 18:36, Nicholas A. Bellinger wrote:
> > On Fri, 2013-11-01 at 08:03 -0700, Bart Van Assche wrote:
> >> On 31/10/2013 5:24, Sagi Grimberg wrote:
> >>> In T10-DIF, when a series of 512-byte data blocks are transferred, each
> >>> block is followed by an 8-byte guard. The guard consists of CRC that
> >>> protects the integrity of the data in the block, and some other tags
> >>> that protect against mis-directed IOs.
> >>
> >> Shouldn't that read "logical block length divided by 2**(protection
> >> interval exponent)" instead of "512" ? From the SPC-4 FORMAT UNIT
> >> section:
> >
> > Why should the protection interval in FORMAT_UNIT be mentioned when it's
> > not supported by the hardware, nor by drivers/scsi/sd_dif.c itself..?
>
> Hello Nick,
>
> My understanding is that this patch series is not only intended for
> initiator drivers but also for target drivers like ib_srpt and ib_isert.
> As you know target drivers do not restrict the initiator operating
> system to Linux. Although I do not know whether there are already
> operating systems that support the "protection interval exponent",

It's my understanding that Linux is still the only stack that supports
DIF, so AFAICT no one is actually supporting this.

> I think it is a good idea to stay as close as possible to the terminology
> of the SPC-4 standard.
>

No, in this context it only adds pointless misdirection because 1) The
hardware in question doesn't support it, and 2) Linux itself doesn't
support it.

--nab
[parent not found: <1383590471.4216.22.camel-XoQW25Eq2zviZyQQd+hFbcojREIfoBdhmpATvIKMPHk@public.gmane.org>]
* Re: [PATCH RFC v2 00/10] Introduce Signature feature
[not found] ` <1383590471.4216.22.camel-XoQW25Eq2zviZyQQd+hFbcojREIfoBdhmpATvIKMPHk@public.gmane.org>
@ 2013-11-05 9:13 ` Sagi Grimberg
[not found] ` <5278B6B2.1010006-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
0 siblings, 1 reply; 45+ messages in thread
From: Sagi Grimberg @ 2013-11-05 9:13 UTC (permalink / raw)
To: Nicholas A. Bellinger, Bart Van Assche
Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA, oren-VPRAkNaXOzVWk0Htik3J/w, tzahio-VPRAkNaXOzVWk0Htik3J/w

On 11/4/2013 8:41 PM, Nicholas A. Bellinger wrote:
> On Sat, 2013-11-02 at 14:57 -0700, Bart Van Assche wrote:
>> On 1/11/2013 18:36, Nicholas A. Bellinger wrote:
>>> On Fri, 2013-11-01 at 08:03 -0700, Bart Van Assche wrote:
>>>> On 31/10/2013 5:24, Sagi Grimberg wrote:
>>>>> In T10-DIF, when a series of 512-byte data blocks are transferred, each
>>>>> block is followed by an 8-byte guard. The guard consists of CRC that
>>>>> protects the integrity of the data in the block, and some other tags
>>>>> that protect against mis-directed IOs.
>>>> Shouldn't that read "logical block length divided by 2**(protection
>>>> interval exponent)" instead of "512" ? From the SPC-4 FORMAT UNIT
>>>> section:
>>> Why should the protection interval in FORMAT_UNIT be mentioned when it's
>>> not supported by the hardware, nor by drivers/scsi/sd_dif.c itself..?
>> Hello Nick,
>>
>> My understanding is that this patch series is not only intended for
>> initiator drivers but also for target drivers like ib_srpt and ib_isert.
>> As you know target drivers do not restrict the initiator operating
>> system to Linux. Although I do not know whether there are already
>> operating systems that support the "protection interval exponent",
> It's my understanding that Linux is still the only stack that supports
> DIF, so AFAICT no one is actually supporting this.
>
>> I think it is a good idea to stay as close as possible to the terminology
>> of the SPC-4 standard.
>>
> No, in this context it only adds pointless misdirection because 1) The
> hardware in question doesn't support it, and 2) Linux itself doesn't
> support it.

I think that Bart is suggesting renaming block_size to pi_interval in
ib_sig_domain.
I tend to agree: even if support for that does not exist yet, it
might in the future.
I don't think it is a misdirection, because the field does represent the
protection information interval.

> --nab
>
[parent not found: <5278B6B2.1010006-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>]
* Re: [PATCH RFC v2 00/10] Introduce Signature feature
[not found] ` <5278B6B2.1010006-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2013-11-05 18:30 ` Nicholas A. Bellinger
0 siblings, 0 replies; 45+ messages in thread
From: Nicholas A. Bellinger @ 2013-11-05 18:30 UTC (permalink / raw)
To: Sagi Grimberg
Cc: Bart Van Assche, linux-rdma-u79uwXL29TY76Z2rM5mHXA, oren-VPRAkNaXOzVWk0Htik3J/w, tzahio-VPRAkNaXOzVWk0Htik3J/w

On Tue, 2013-11-05 at 11:13 +0200, Sagi Grimberg wrote:
> On 11/4/2013 8:41 PM, Nicholas A. Bellinger wrote:
> > On Sat, 2013-11-02 at 14:57 -0700, Bart Van Assche wrote:
> >> On 1/11/2013 18:36, Nicholas A. Bellinger wrote:
> >>> On Fri, 2013-11-01 at 08:03 -0700, Bart Van Assche wrote:
> >>>> On 31/10/2013 5:24, Sagi Grimberg wrote:
> >>>>> In T10-DIF, when a series of 512-byte data blocks are transferred, each
> >>>>> block is followed by an 8-byte guard. The guard consists of CRC that
> >>>>> protects the integrity of the data in the block, and some other tags
> >>>>> that protect against mis-directed IOs.
> >>>> Shouldn't that read "logical block length divided by 2**(protection
> >>>> interval exponent)" instead of "512" ? From the SPC-4 FORMAT UNIT
> >>>> section:
> >>> Why should the protection interval in FORMAT_UNIT be mentioned when it's
> >>> not supported by the hardware, nor by drivers/scsi/sd_dif.c itself..?
> >> Hello Nick,
> >>
> >> My understanding is that this patch series is not only intended for
> >> initiator drivers but also for target drivers like ib_srpt and ib_isert.
> >> As you know target drivers do not restrict the initiator operating
> >> system to Linux. Although I do not know whether there are already
> >> operating systems that support the "protection interval exponent",
> > It's my understanding that Linux is still the only stack that supports
> > DIF, so AFAICT no one is actually supporting this.
> >
> >> I think it is a good idea to stay as close as possible to the terminology
> >> of the SPC-4 standard.
> >>
> > No, in this context it only adds pointless misdirection because 1) The
> > hardware in question doesn't support it, and 2) Linux itself doesn't
> > support it.
>
> I think that Bart is suggesting renaming block_size as pi_interval in
> ib_sig_domain.
> I tend to agree since even if support for that does not exist yet, it
> might be in the future.
> I think it is not a misdirection because it does represent the
> protection information interval.
>

The point is that changing the description from what the patch actually
does to something it does not do, in order to 'stay as close as possible
to the terminology of the SPC-4 standard', is pointlessly confusing.

--nab
* Re: [PATCH RFC v2 00/10] Introduce Signature feature
[not found] ` <1383222255-22699-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
` (11 preceding siblings ...)
2013-11-01 15:03 ` Bart Van Assche
@ 2013-11-01 22:06 ` Bart Van Assche
[not found] ` <527425DA.7040609-HInyCGIudOg@public.gmane.org>
12 siblings, 1 reply; 45+ messages in thread
From: Bart Van Assche @ 2013-11-01 22:06 UTC (permalink / raw)
To: Sagi Grimberg, linux-rdma-u79uwXL29TY76Z2rM5mHXA
Cc: oren-VPRAkNaXOzVWk0Htik3J/w, tzahio-VPRAkNaXOzVWk0Htik3J/w

On 31/10/2013 5:24, Sagi Grimberg wrote:
> While T10-DIF clearly defines that over the wire protection guards are
> interleaved into the data stream (each 512-Byte block followed by 8-byte
> guard), when in memory, the protection guards may reside in a buffer
> separated from the data. Depending on the application, it is usually
> easier to handle the data when it is contiguous. In this case the data
> buffer will be of size 512xN and the protection buffer will be of size
> 8xN (where N is the number of blocks in the transaction).

It might be worth mentioning here that the Linux block layer has chosen
the approach where actual data and protection information are kept in
separate buffers. See also the bi_integrity field in struct bio.

Bart.
[parent not found: <527425DA.7040609-HInyCGIudOg@public.gmane.org>]
* Re: [PATCH RFC v2 00/10] Introduce Signature feature
[not found] ` <527425DA.7040609-HInyCGIudOg@public.gmane.org>
@ 2013-11-03 12:13 ` Sagi Grimberg
2013-11-03 12:14 ` Sagi Grimberg
1 sibling, 0 replies; 45+ messages in thread
From: Sagi Grimberg @ 2013-11-03 12:13 UTC (permalink / raw)
To: Bart Van Assche, linux-rdma-u79uwXL29TY76Z2rM5mHXA
Cc: oren-VPRAkNaXOzVWk0Htik3J/w, tzahio-VPRAkNaXOzVWk0Htik3J/w

On 11/2/2013 12:06 AM, Bart Van Assche wrote:
> On 31/10/2013 5:24, Sagi Grimberg wrote:
>> While T10-DIF clearly defines that over the wire protection guards are
>> interleaved into the data stream (each 512-Byte block followed by 8-byte
>> guard), when in memory, the protection guards may reside in a buffer
>> separated from the data. Depending on the application, it is usually
>> easier to handle the data when it is contiguous. In this case the data
>> buffer will be of size 512xN and the protection buffer will be of size
>> 8xN (where N is the number of blocks in the transaction).
>
> It might be worth mentioning here that in the Linux block layer the
> approach has been chosen where actual data and protection information
> are in separate buffers. See also the bi_integrity field in struct bio.
>
> Bart.
>

Hey Bart,

I was expecting your input on this. Thanks for the insightful comments!

The explanation here is an attempt to introduce T10-DIF to the
mailing list as simply as possible, so I tried not to dive into
SBC-3/SPC-4.
You are correct, the 8-byte protection guards will follow the protection
interval, which won't necessarily be 512 (only for DIF types 2,3).

Sagi.
* Re: [PATCH RFC v2 00/10] Introduce Signature feature
[not found] ` <527425DA.7040609-HInyCGIudOg@public.gmane.org>
2013-11-03 12:13 ` Sagi Grimberg
@ 2013-11-03 12:14 ` Sagi Grimberg
1 sibling, 0 replies; 45+ messages in thread
From: Sagi Grimberg @ 2013-11-03 12:14 UTC (permalink / raw)
To: Bart Van Assche, linux-rdma-u79uwXL29TY76Z2rM5mHXA
Cc: oren-VPRAkNaXOzVWk0Htik3J/w, tzahio-VPRAkNaXOzVWk0Htik3J/w

On 11/2/2013 12:06 AM, Bart Van Assche wrote:
> On 31/10/2013 5:24, Sagi Grimberg wrote:
>> While T10-DIF clearly defines that over the wire protection guards are
>> interleaved into the data stream (each 512-Byte block followed by 8-byte
>> guard), when in memory, the protection guards may reside in a buffer
>> separated from the data. Depending on the application, it is usually
>> easier to handle the data when it is contiguous. In this case the data
>> buffer will be of size 512xN and the protection buffer will be of size
>> 8xN (where N is the number of blocks in the transaction).
>
> It might be worth mentioning here that in the Linux block layer the
> approach has been chosen where actual data and protection information
> are in separate buffers. See also the bi_integrity field in struct bio.
>
> Bart.
>

This is true, but the signature verbs interface also supports data and
protection interleaved in memory space.
A user who wishes to do so will pass the same ib_sge for both data and
protection.
In fact, this was a requirement we got from customers.

Sagi.
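The two memory layouts discussed in this sub-thread (a 512xN data buffer plus a separate 8xN protection buffer, versus a single interleaved buffer) can be illustrated with a small conversion routine. This is a sketch in Python for illustration only: the function names and the `interval` / `tuple_size` parameters are hypothetical, and it models byte layout, not ib_sge handling or anything the HCA does.

```python
def interleave(data: bytes, prot: bytes, interval: int = 512, tuple_size: int = 8) -> bytes:
    """Merge a data buffer and a protection buffer into one interleaved buffer."""
    n, rem = divmod(len(data), interval)
    if rem or len(prot) != n * tuple_size:
        raise ValueError("buffers do not describe the same number of blocks")
    out = bytearray()
    for i in range(n):
        out += data[i * interval:(i + 1) * interval]      # one block of user data
        out += prot[i * tuple_size:(i + 1) * tuple_size]  # followed by its guard
    return bytes(out)


def deinterleave(buf: bytes, interval: int = 512, tuple_size: int = 8):
    """Split an interleaved buffer back into (data, protection) buffers."""
    stride = interval + tuple_size
    n, rem = divmod(len(buf), stride)
    if rem:
        raise ValueError("buffer length is not a multiple of interval + tuple_size")
    data, prot = bytearray(), bytearray()
    for i in range(n):
        chunk = buf[i * stride:(i + 1) * stride]
        data += chunk[:interval]
        prot += chunk[interval:]
    return bytes(data), bytes(prot)


n_blocks = 3
data = bytes([0xAA]) * (512 * n_blocks)   # 512xN data buffer
prot = bytes([0x55]) * (8 * n_blocks)     # 8xN protection buffer
mixed = interleave(data, prot)
assert len(mixed) == n_blocks * (512 + 8)
assert deinterleave(mixed) == (data, prot)
```

The separate-buffer form corresponds to what Bart describes for the Linux block layer, and the interleaved form to the case Sagi mentions where the same ib_sge covers both data and protection.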
end of thread, other threads:[~2013-11-05 18:30 UTC | newest]
Thread overview: 45+ messages
2013-10-31 12:24 [PATCH RFC v2 00/10] Introduce Signature feature Sagi Grimberg
[not found] ` <1383222255-22699-1-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2013-10-31 12:24 ` [PATCH RFC v2 01/10] IB/core: Introduce protected memory regions Sagi Grimberg
[not found] ` <1383222255-22699-2-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2013-11-01 17:09 ` Bart Van Assche
[not found] ` <5273E03C.3010501-HInyCGIudOg@public.gmane.org>
2013-11-03 12:14 ` Sagi Grimberg
2013-10-31 12:24 ` [PATCH RFC v2 02/10] IB/core: Introduce Signature Verbs API Sagi Grimberg
[not found] ` <1383222255-22699-3-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2013-11-01 15:13 ` Bart Van Assche
[not found] ` <5273C4FC.4070708-HInyCGIudOg@public.gmane.org>
2013-11-03 12:15 ` Sagi Grimberg
2013-11-01 18:46 ` Bart Van Assche
[not found] ` <5273F6F4.3000300-HInyCGIudOg@public.gmane.org>
2013-11-03 12:15 ` Sagi Grimberg
[not found] ` <52763E68.2040605-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2013-11-03 14:41 ` Bart Van Assche
[not found] ` <5276608D.2020605-HInyCGIudOg@public.gmane.org>
2013-11-03 16:30 ` Sagi Grimberg
2013-11-01 22:23 ` Bart Van Assche
[not found] ` <527429E7.7010705-HInyCGIudOg@public.gmane.org>
2013-11-03 12:16 ` Sagi Grimberg
2013-10-31 12:24 ` [PATCH RFC v2 03/10] IB/mlx5, mlx5_core: Support for create_mr and destroy_mr Sagi Grimberg
[not found] ` <1383222255-22699-4-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2013-10-31 12:52 ` Jack Wang
[not found] ` <52725299.7020105-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2013-10-31 12:59 ` Sagi Grimberg
2013-10-31 12:24 ` [PATCH RFC v2 04/10] IB/mlx5: Initialize mlx5_ib_qp signature related Sagi Grimberg
2013-10-31 12:24 ` [PATCH RFC v2 05/10] IB/mlx5: Break wqe handling to begin & finish routines Sagi Grimberg
2013-10-31 12:24 ` [PATCH RFC v2 06/10] IB/mlx5: remove MTT access mode from umr flags helper function Sagi Grimberg
2013-10-31 12:24 ` [PATCH RFC v2 07/10] IB/mlx5: Keep mlx5 MRs in a radix tree under device Sagi Grimberg
[not found] ` <1383222255-22699-8-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2013-11-01 20:46 ` Bart Van Assche
[not found] ` <5274131C.90601-HInyCGIudOg@public.gmane.org>
2013-11-03 12:16 ` Sagi Grimberg
[not found] ` <52763E88.4050300-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2013-11-03 13:40 ` Or Gerlitz
2013-10-31 12:24 ` [PATCH RFC v2 08/10] IB/mlx5: Support IB_WR_REG_SIG_MR Sagi Grimberg
[not found] ` <1383222255-22699-9-git-send-email-sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2013-10-31 13:01 ` Jack Wang
2013-11-01 15:05 ` Bart Van Assche
[not found] ` <5273C32E.2020405-HInyCGIudOg@public.gmane.org>
2013-11-03 12:16 ` Sagi Grimberg
2013-11-01 20:37 ` Bart Van Assche
[not found] ` <527410F3.6040704-HInyCGIudOg@public.gmane.org>
2013-11-02 19:21 ` Or Gerlitz
[not found] ` <CAJZOPZLnyqzzx91ohmW+exy0k8g-FX6reSBCGmh_F2tTGWWOog-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-11-02 21:59 ` Bart Van Assche
[not found] ` <527575D0.9050802-HInyCGIudOg@public.gmane.org>
2013-11-03 12:20 ` Sagi Grimberg
2013-10-31 12:24 ` [PATCH RFC v2 09/10] IB/mlx5: Collect signature error completion Sagi Grimberg
2013-10-31 12:24 ` [PATCH RFC v2 10/10] IB/mlx5: Publish support in signature feature Sagi Grimberg
2013-10-31 12:55 ` [PATCH RFC v2 00/10] Introduce Signature feature Jack Wang
[not found] ` <5272535D.4090805-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2013-10-31 13:20 ` Sagi Grimberg
[not found] ` <52725930.7030702-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2013-10-31 13:29 ` Jack Wang
2013-11-01 15:03 ` Bart Van Assche
[not found] ` <5273C2B6.7010901-HInyCGIudOg@public.gmane.org>
2013-11-02 1:36 ` Nicholas A. Bellinger
[not found] ` <1383356167.4216.16.camel-XoQW25Eq2zviZyQQd+hFbcojREIfoBdhmpATvIKMPHk@public.gmane.org>
2013-11-02 21:57 ` Bart Van Assche
[not found] ` <52757555.6090907-HInyCGIudOg@public.gmane.org>
2013-11-04 18:41 ` Nicholas A. Bellinger
[not found] ` <1383590471.4216.22.camel-XoQW25Eq2zviZyQQd+hFbcojREIfoBdhmpATvIKMPHk@public.gmane.org>
2013-11-05 9:13 ` Sagi Grimberg
[not found] ` <5278B6B2.1010006-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2013-11-05 18:30 ` Nicholas A. Bellinger
2013-11-01 22:06 ` Bart Van Assche
[not found] ` <527425DA.7040609-HInyCGIudOg@public.gmane.org>
2013-11-03 12:13 ` Sagi Grimberg
2013-11-03 12:14 ` Sagi Grimberg