* Re: [PATCH v4 00/25] InfiniBand Transport (IBTRS) and Network Block Device (IBNBD)
From: Leon Romanovsky @ 2019-07-09 13:32 UTC (permalink / raw)
To: Greg KH
Cc: Danil Kipnis, Jack Wang, linux-block, linux-rdma, axboe,
Christoph Hellwig, Sagi Grimberg, bvanassche, jgg, dledford,
Roman Pen
In-Reply-To: <20190709111737.GB6719@kroah.com>
On Tue, Jul 09, 2019 at 01:17:37PM +0200, Greg KH wrote:
> On Tue, Jul 09, 2019 at 02:00:36PM +0300, Leon Romanovsky wrote:
> > On Tue, Jul 09, 2019 at 11:55:03AM +0200, Danil Kipnis wrote:
> > > Hallo Doug, Hallo Jason, Hallo Jens, Hallo Greg,
> > >
> > > Could you please provide some feedback to the IBNBD driver and the
> > > IBTRS library?
> > > So far we addressed all the requests provided by the community and
> > > continue to maintain our code up-to-date with the upstream kernel
> > > while having an extra compatibility layer for older kernels in our
> > > out-of-tree repository.
> > > I understand that SRP and NVMEoF which are in the kernel already do
> > > provide equivalent functionality for the majority of the use cases.
> > > IBNBD on the other hand is showing higher performance and more
> > > importantly includes the IBTRS - a general purpose library to
> > > establish connections and transport BIO-like read/write sg-lists over
> > > RDMA, while SRP is targeting SCSI and NVMEoF is addressing NVME. While
> > > I believe IBNBD does meet the kernel coding standards, it doesn't have
> > > a lot of users, while SRP and NVMEoF are widely accepted. Do you think
> > > it would make sense for us to rework our patchset and try pushing it
> > > for staging tree first, so that we can proof IBNBD is well maintained,
> > > beneficial for the eco-system, find a proper location for it within
> > > block/rdma subsystems? This would make it easier for people to try it
> > > out and would also be a huge step for us in terms of maintenance
> > > effort.
> > > The names IBNBD and IBTRS are in fact misleading. IBTRS sits on top of
> > > RDMA and is not bound to IB (We will evaluate IBTRS with ROCE in the
> > > near future). Do you think it would make sense to rename the driver to
> > > RNBD/RTRS?
> >
> > It is better to avoid "staging" tree, because it will lack attention of
> > relevant people and your efforts will be lost once you will try to move
> > out of staging. We are all remembering Lustre and don't want to see it
> > again.
>
> That's up to the developers, that had nothing to do with the fact that
> the code was in the staging tree. If the Lustre developers had actually
> done the requested work, it would have moved out of the staging tree.
>
> So if these developers are willing to do the work to get something out
> of staging, and into the "real" part of the kernel, I will gladly take
> it.
Greg,
It is not matter of how much *real* work developers will do, but
it is a matter of guidance to do *right* thing, which is hard to achieve
if people mentioned in the beginning of this thread wouldn't look on
staging code.
>
> But I will note that it is almost always easier to just do the work
> ahead of time, and merge it in "correctly" than to go from staging into
> the real part of the kernel. But it's up to the developers what they
> want to do.
>
> thanks,
>
> greg k-h
^ permalink raw reply
* Re: [PATCH v4 00/25] InfiniBand Transport (IBTRS) and Network Block Device (IBNBD)
From: Jinpu Wang @ 2019-07-09 14:17 UTC (permalink / raw)
To: Jason Gunthorpe, Christoph Hellwig, Sagi Grimberg,
bvanassche@acm.org, chuck.lever
Cc: Jinpu Wang, Leon Romanovsky, Danil Kipnis,
linux-block@vger.kernel.org, linux-rdma@vger.kernel.org,
Jens Axboe, dledford@redhat.com, Roman Pen, Greg Kroah-Hartman
In-Reply-To: <20190709131932.GI3436@mellanox.com>
On Tue, Jul 9, 2019 at 3:19 PM Jason Gunthorpe <jgg@mellanox.com> wrote:
>
> On Tue, Jul 09, 2019 at 03:15:46PM +0200, Jinpu Wang wrote:
> > On Tue, Jul 9, 2019 at 2:06 PM Jason Gunthorpe <jgg@mellanox.com> wrote:
> > >
> > > On Tue, Jul 09, 2019 at 01:37:39PM +0200, Jinpu Wang wrote:
> > > > Leon Romanovsky <leon@kernel.org> 于2019年7月9日周二 下午1:00写道:
> > > > >
> > > > > On Tue, Jul 09, 2019 at 11:55:03AM +0200, Danil Kipnis wrote:
> > > > > > Hallo Doug, Hallo Jason, Hallo Jens, Hallo Greg,
> > > > > >
> > > > > > Could you please provide some feedback to the IBNBD driver and the
> > > > > > IBTRS library?
> > > > > > So far we addressed all the requests provided by the community and
> > > > > > continue to maintain our code up-to-date with the upstream kernel
> > > > > > while having an extra compatibility layer for older kernels in our
> > > > > > out-of-tree repository.
> > > > > > I understand that SRP and NVMEoF which are in the kernel already do
> > > > > > provide equivalent functionality for the majority of the use cases.
> > > > > > IBNBD on the other hand is showing higher performance and more
> > > > > > importantly includes the IBTRS - a general purpose library to
> > > > > > establish connections and transport BIO-like read/write sg-lists over
> > > > > > RDMA, while SRP is targeting SCSI and NVMEoF is addressing NVME. While
> > > > > > I believe IBNBD does meet the kernel coding standards, it doesn't have
> > > > > > a lot of users, while SRP and NVMEoF are widely accepted. Do you think
> > > > > > it would make sense for us to rework our patchset and try pushing it
> > > > > > for staging tree first, so that we can proof IBNBD is well maintained,
> > > > > > beneficial for the eco-system, find a proper location for it within
> > > > > > block/rdma subsystems? This would make it easier for people to try it
> > > > > > out and would also be a huge step for us in terms of maintenance
> > > > > > effort.
> > > > > > The names IBNBD and IBTRS are in fact misleading. IBTRS sits on top of
> > > > > > RDMA and is not bound to IB (We will evaluate IBTRS with ROCE in the
> > > > > > near future). Do you think it would make sense to rename the driver to
> > > > > > RNBD/RTRS?
> > > > >
> > > > > It is better to avoid "staging" tree, because it will lack attention of
> > > > > relevant people and your efforts will be lost once you will try to move
> > > > > out of staging. We are all remembering Lustre and don't want to see it
> > > > > again.
> > > > >
> > > > > Back then, you was asked to provide support for performance superiority.
> > > > > Can you please share any numbers with us?
> > > > Hi Leon,
> > > >
> > > > Thanks for you feedback.
> > > >
> > > > For performance numbers, Danil did intensive benchmark, and create
> > > > some PDF with graphes here:
> > > > https://github.com/ionos-enterprise/ibnbd/tree/master/performance/v4-v5.2-rc3
> > > >
> > > > It includes both single path results also different multipath policy results.
> > > >
> > > > If you have any question regarding the results, please let us know.
> > >
> > > I kind of recall that last time the perf numbers were skewed toward
> > > IBNBD because the invalidation model for MR was wrong - did this get
> > > fixed?
> > >
> > > Jason
> >
> > Thanks Jason for feedback.
> > Can you be more specific about "the invalidation model for MR was wrong"
>
> MR's must be invalidated before data is handed over to the block
> layer. It can't leave MRs open for access and then touch the memory
> the MR covers.
>
> IMHO this is the most likely explanation for any performance difference
> from nvme..
>
> > I checked in the history of the email thread, only found
> > "I think from the RDMA side, before we accept something like this, I'd
> > like to hear from Christoph, Chuck or Sagi that the dataplane
> > implementation of this is correct, eg it uses the MRs properly and
> > invalidates at the right time, sequences with dma_ops as required,
> > etc.
> > "
> > And no reply from any of you since then.
>
> This task still needs to happen..
>
> Jason
We did extensive testing and cross-checked how iser and nvmeof does
invalidation of MR,
doesn't find a problem.
+ Chuck
It will be appreciated if Christoph, Chuck, Sagi or Bart could give a
check, thank you in advance.
Thanks
Jack
^ permalink raw reply
* [PATCH v6 rdma-next 1/6] RDMA/core: Create mmap database and cookie helper functions
From: Michal Kalderon @ 2019-07-09 14:17 UTC (permalink / raw)
To: michal.kalderon, ariel.elior, jgg, dledford, galpress
Cc: linux-rdma, davem, netdev
In-Reply-To: <20190709141735.19193-1-michal.kalderon@marvell.com>
Create some common API's for adding entries to a xa_mmap.
Searching for an entry and freeing one.
The code was copied from the efa driver almost as is, just renamed
function to be generic and not efa specific.
Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
---
drivers/infiniband/core/device.c | 1 +
drivers/infiniband/core/rdma_core.c | 1 +
drivers/infiniband/core/uverbs_cmd.c | 1 +
drivers/infiniband/core/uverbs_main.c | 135 ++++++++++++++++++++++++++++++++++
include/rdma/ib_verbs.h | 46 ++++++++++++
5 files changed, 184 insertions(+)
diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index 8a6ccb936dfe..a830c2c5d691 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -2521,6 +2521,7 @@ void ib_set_device_ops(struct ib_device *dev, const struct ib_device_ops *ops)
SET_DEVICE_OP(dev_ops, map_mr_sg_pi);
SET_DEVICE_OP(dev_ops, map_phys_fmr);
SET_DEVICE_OP(dev_ops, mmap);
+ SET_DEVICE_OP(dev_ops, mmap_free);
SET_DEVICE_OP(dev_ops, modify_ah);
SET_DEVICE_OP(dev_ops, modify_cq);
SET_DEVICE_OP(dev_ops, modify_device);
diff --git a/drivers/infiniband/core/rdma_core.c b/drivers/infiniband/core/rdma_core.c
index ccf4d069c25c..1ed01b02401f 100644
--- a/drivers/infiniband/core/rdma_core.c
+++ b/drivers/infiniband/core/rdma_core.c
@@ -816,6 +816,7 @@ static void ufile_destroy_ucontext(struct ib_uverbs_file *ufile,
rdma_restrack_del(&ucontext->res);
+ rdma_user_mmap_entries_remove_free(ucontext);
ib_dev->ops.dealloc_ucontext(ucontext);
kfree(ucontext);
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 7ddd0e5bc6b3..44c0600245e4 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -254,6 +254,7 @@ static int ib_uverbs_get_context(struct uverbs_attr_bundle *attrs)
mutex_init(&ucontext->per_mm_list_lock);
INIT_LIST_HEAD(&ucontext->per_mm_list);
+ xa_init(&ucontext->mmap_xa);
ret = get_unused_fd_flags(O_CLOEXEC);
if (ret < 0)
diff --git a/drivers/infiniband/core/uverbs_main.c b/drivers/infiniband/core/uverbs_main.c
index 11c13c1381cf..4b909d7b97de 100644
--- a/drivers/infiniband/core/uverbs_main.c
+++ b/drivers/infiniband/core/uverbs_main.c
@@ -965,6 +965,141 @@ int rdma_user_mmap_io(struct ib_ucontext *ucontext, struct vm_area_struct *vma,
}
EXPORT_SYMBOL(rdma_user_mmap_io);
+static inline u64
+rdma_user_mmap_get_key(const struct rdma_user_mmap_entry *entry)
+{
+ return (u64)entry->mmap_page << PAGE_SHIFT;
+}
+
+/**
+ * rdma_user_mmap_entry_get() - Get an entry from the mmap_xa.
+ *
+ * @ucontext: associated user context.
+ * @key: The key received from rdma_user_mmap_entry_insert which
+ * is provided by user as the address to map.
+ * @len: The length the user wants to map
+ *
+ * This function is called when a user tries to mmap a key it
+ * initially received from the driver. They key was created by
+ * the function rdma_user_mmap_entry_insert.
+ *
+ * Return an entry if exists or NULL if there is no match.
+ */
+struct rdma_user_mmap_entry *
+rdma_user_mmap_entry_get(struct ib_ucontext *ucontext, u64 key, u64 len)
+{
+ struct rdma_user_mmap_entry *entry;
+ u64 mmap_page;
+
+ mmap_page = key >> PAGE_SHIFT;
+ if (mmap_page > U32_MAX)
+ return NULL;
+
+ entry = xa_load(&ucontext->mmap_xa, mmap_page);
+ if (!entry || entry->length != len)
+ return NULL;
+
+ ibdev_dbg(ucontext->device,
+ "mmap: obj[0x%p] key[%#llx] addr[%#llx] len[%#llx] removed\n",
+ entry->obj, key, entry->address, entry->length);
+
+ return entry;
+}
+EXPORT_SYMBOL(rdma_user_mmap_entry_get);
+
+/**
+ * rdma_user_mmap_entry_insert() - Allocate and insert an entry to the mmap_xa.
+ *
+ * @ucontext: associated user context.
+ * @obj: opaque driver object that will be stored in the entry.
+ * @address: The address that will be mmapped to the user
+ * @length: Length of the address that will be mmapped
+ * @mmap_flag: opaque driver flags related to the address (For
+ * example could be used for cachability)
+ *
+ * This function should be called by drivers that use the rdma_user_mmap
+ * interface for handling user mmapped addresses. The database is handled in
+ * the core and helper functions are provided to insert entries into the
+ * database and extract entries when the user call mmap with the given key.
+ * The function returns a unique key that should be provided to user, the user
+ * will use the key to map the given address.
+ *
+ * Note this locking scheme cannot support removal of entries,
+ * except during ucontext destruction when the core code
+ * guarentees no concurrency.
+ *
+ * Return: unique key or RDMA_USER_MMAP_INVALID if entry was not added.
+ */
+u64 rdma_user_mmap_entry_insert(struct ib_ucontext *ucontext, void *obj,
+ u64 address, u64 length, u8 mmap_flag)
+{
+ struct rdma_user_mmap_entry *entry;
+ u32 next_mmap_page;
+ int err;
+
+ entry = kzalloc(sizeof(*entry), GFP_KERNEL);
+ if (!entry)
+ return RDMA_USER_MMAP_INVALID;
+
+ entry->obj = obj;
+ entry->address = address;
+ entry->length = length;
+ entry->mmap_flag = mmap_flag;
+
+ xa_lock(&ucontext->mmap_xa);
+ if (check_add_overflow(ucontext->mmap_xa_page,
+ (u32)(length >> PAGE_SHIFT),
+ &next_mmap_page))
+ goto err_unlock;
+
+ entry->mmap_page = ucontext->mmap_xa_page;
+ ucontext->mmap_xa_page = next_mmap_page;
+ err = __xa_insert(&ucontext->mmap_xa, entry->mmap_page, entry,
+ GFP_KERNEL);
+ if (err)
+ goto err_unlock;
+
+ xa_unlock(&ucontext->mmap_xa);
+
+ ibdev_dbg(ucontext->device,
+ "mmap: obj[0x%p] addr[%#llx], len[%#llx], key[%#llx] inserted\n",
+ entry->obj, entry->address, entry->length,
+ rdma_user_mmap_get_key(entry));
+
+ return rdma_user_mmap_get_key(entry);
+
+err_unlock:
+ xa_unlock(&ucontext->mmap_xa);
+ kfree(entry);
+ return RDMA_USER_MMAP_INVALID;
+}
+EXPORT_SYMBOL(rdma_user_mmap_entry_insert);
+
+/*
+ * This is only called when the ucontext is destroyed and there can be no
+ * concurrent query via mmap or allocate on the xarray, thus we can be sure no
+ * other thread is using the entry pointer. We also know that all the BAR
+ * pages have either been zap'd or munmaped at this point. Normal pages are
+ * refcounted and will be freed at the proper time.
+ */
+void rdma_user_mmap_entries_remove_free(struct ib_ucontext *ucontext)
+{
+ struct rdma_user_mmap_entry *entry;
+ unsigned long mmap_page;
+
+ xa_for_each(&ucontext->mmap_xa, mmap_page, entry) {
+ xa_erase(&ucontext->mmap_xa, mmap_page);
+
+ ibdev_dbg(ucontext->device,
+ "mmap: obj[0x%p] key[%#llx] addr[%#llx] len[%#llx] removed\n",
+ entry->obj, rdma_user_mmap_get_key(entry),
+ entry->address, entry->length);
+ if (ucontext->device->ops.mmap_free)
+ ucontext->device->ops.mmap_free(entry);
+ kfree(entry);
+ }
+}
+
void uverbs_user_mmap_disassociate(struct ib_uverbs_file *ufile)
{
struct rdma_umap_priv *priv, *next_priv;
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 26e9c2594913..1ba29a00f584 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1425,6 +1425,8 @@ struct ib_ucontext {
* Implementation details of the RDMA core, don't use in drivers:
*/
struct rdma_restrack_entry res;
+ struct xarray mmap_xa;
+ u32 mmap_xa_page;
};
struct ib_uobject {
@@ -2199,6 +2201,17 @@ struct iw_cm_conn_param;
#define DECLARE_RDMA_OBJ_SIZE(ib_struct) size_t size_##ib_struct
+#define RDMA_USER_MMAP_FLAG_SHIFT 56
+#define RDMA_USER_MMAP_PAGE_MASK GENMASK(EFA_MMAP_FLAG_SHIFT - 1, 0)
+#define RDMA_USER_MMAP_INVALID U64_MAX
+struct rdma_user_mmap_entry {
+ void *obj;
+ u64 address;
+ u64 length;
+ u32 mmap_page;
+ u8 mmap_flag;
+};
+
/**
* struct ib_device_ops - InfiniBand device operations
* This structure defines all the InfiniBand device operations, providers will
@@ -2311,6 +2324,19 @@ struct ib_device_ops {
struct ib_udata *udata);
void (*dealloc_ucontext)(struct ib_ucontext *context);
int (*mmap)(struct ib_ucontext *context, struct vm_area_struct *vma);
+ /**
+ * Memory that is mapped to the user can only be freed once the
+ * ucontext of the application is destroyed. This is for
+ * security reasons where we don't want an application to have a
+ * mapping to phyiscal memory that is freed and allocated to
+ * another application. For this reason, all the entries are
+ * stored in ucontext and once ucontext is freed mmap_free is
+ * called on each of the entries. They type of the memory that
+ * was mapped may differ between entries and is opaque to the
+ * rdma_user_mmap interface. Therefore needs to be implemented
+ * by the driver in mmap_free.
+ */
+ void (*mmap_free)(struct rdma_user_mmap_entry *entry);
void (*disassociate_ucontext)(struct ib_ucontext *ibcontext);
int (*alloc_pd)(struct ib_pd *pd, struct ib_udata *udata);
void (*dealloc_pd)(struct ib_pd *pd, struct ib_udata *udata);
@@ -2709,6 +2735,11 @@ void ib_set_device_ops(struct ib_device *device,
#if IS_ENABLED(CONFIG_INFINIBAND_USER_ACCESS)
int rdma_user_mmap_io(struct ib_ucontext *ucontext, struct vm_area_struct *vma,
unsigned long pfn, unsigned long size, pgprot_t prot);
+u64 rdma_user_mmap_entry_insert(struct ib_ucontext *ucontext, void *obj,
+ u64 address, u64 length, u8 mmap_flag);
+struct rdma_user_mmap_entry *
+rdma_user_mmap_entry_get(struct ib_ucontext *ucontext, u64 key, u64 len);
+void rdma_user_mmap_entries_remove_free(struct ib_ucontext *ucontext);
#else
static inline int rdma_user_mmap_io(struct ib_ucontext *ucontext,
struct vm_area_struct *vma,
@@ -2717,6 +2748,21 @@ static inline int rdma_user_mmap_io(struct ib_ucontext *ucontext,
{
return -EINVAL;
}
+
+static u64 rdma_user_mmap_entry_insert(struct ib_ucontext *ucontext, void *obj,
+ u64 address, u64 length, u8 mmap_flag)
+{
+ return RDMA_USER_MMAP_INVALID;
+}
+
+static struct rdma_user_mmap_entry *
+rdma_user_mmap_entry_get(struct ib_ucontext *ucontext, u64 key, u64 len)
+{
+ return NULL;
+}
+
+static void rdma_user_mmap_entries_remove_free(struct ib_ucontext *ucontext) {}
+
#endif
static inline int ib_copy_from_udata(void *dest, struct ib_udata *udata, size_t len)
--
2.14.5
^ permalink raw reply related
* [PATCH v6 rdma-next 6/6] RDMA/qedr: Add iWARP doorbell recovery support
From: Michal Kalderon @ 2019-07-09 14:17 UTC (permalink / raw)
To: michal.kalderon, ariel.elior, jgg, dledford, galpress
Cc: linux-rdma, davem, netdev
In-Reply-To: <20190709141735.19193-1-michal.kalderon@marvell.com>
This patch adds the iWARP specific doorbells to the doorbell
recovery mechanism
Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
---
drivers/infiniband/hw/qedr/qedr.h | 12 +++++++-----
drivers/infiniband/hw/qedr/verbs.c | 37 ++++++++++++++++++++++++++++++++++++-
2 files changed, 43 insertions(+), 6 deletions(-)
diff --git a/drivers/infiniband/hw/qedr/qedr.h b/drivers/infiniband/hw/qedr/qedr.h
index 8aed24b32de6..dc9ebbf625d2 100644
--- a/drivers/infiniband/hw/qedr/qedr.h
+++ b/drivers/infiniband/hw/qedr/qedr.h
@@ -234,6 +234,11 @@ struct qedr_ucontext {
bool db_rec;
};
+union db_prod32 {
+ struct rdma_pwm_val16_data data;
+ u32 raw;
+};
+
union db_prod64 {
struct rdma_pwm_val32_data data;
u64 raw;
@@ -265,6 +270,8 @@ struct qedr_userq {
struct qedr_user_db_rec *db_rec_data;
u64 db_rec_phys;
u64 db_rec_key;
+ void __iomem *db_rec_db2_addr;
+ union db_prod32 db_rec_db2_data;
};
struct qedr_cq {
@@ -300,11 +307,6 @@ struct qedr_pd {
struct qedr_ucontext *uctx;
};
-union db_prod32 {
- struct rdma_pwm_val16_data data;
- u32 raw;
-};
-
struct qedr_qp_hwq_info {
/* WQE Elements */
struct qed_chain pbl;
diff --git a/drivers/infiniband/hw/qedr/verbs.c b/drivers/infiniband/hw/qedr/verbs.c
index b0b9ec70f2fd..64190de4ce23 100644
--- a/drivers/infiniband/hw/qedr/verbs.c
+++ b/drivers/infiniband/hw/qedr/verbs.c
@@ -1684,6 +1684,10 @@ static void qedr_cleanup_user(struct qedr_dev *dev, struct qedr_qp *qp)
if (qp->urq.db_rec_data)
qedr_db_recovery_del(dev, qp->urq.db_addr,
&qp->urq.db_rec_data->db_data);
+
+ if (rdma_protocol_iwarp(&dev->ibdev, 1))
+ qedr_db_recovery_del(dev, qp->urq.db_rec_db2_addr,
+ &qp->urq.db_rec_db2_data);
}
static int qedr_create_user_qp(struct qedr_dev *dev,
@@ -1758,6 +1762,17 @@ static int qedr_create_user_qp(struct qedr_dev *dev,
qp->usq.db_addr = ctx->dpi_addr + uresp.sq_db_offset;
qp->urq.db_addr = ctx->dpi_addr + uresp.rq_db_offset;
+ if (rdma_protocol_iwarp(&dev->ibdev, 1)) {
+ qp->urq.db_rec_db2_addr = ctx->dpi_addr + uresp.rq_db2_offset;
+
+ /* calculate the db_rec_db2 data since it is constant so no
+ * need to reflect from user
+ */
+ qp->urq.db_rec_db2_data.data.icid = cpu_to_le16(qp->icid);
+ qp->urq.db_rec_db2_data.data.value =
+ cpu_to_le16(DQ_TCM_IWARP_POST_RQ_CF_CMD);
+ }
+
rc = qedr_db_recovery_add(dev, qp->usq.db_addr,
&qp->usq.db_rec_data->db_data,
DB_REC_WIDTH_32B,
@@ -1771,6 +1786,15 @@ static int qedr_create_user_qp(struct qedr_dev *dev,
DB_REC_USER);
if (rc)
goto err;
+
+ if (rdma_protocol_iwarp(&dev->ibdev, 1)) {
+ rc = qedr_db_recovery_add(dev, qp->urq.db_rec_db2_addr,
+ &qp->urq.db_rec_db2_data,
+ DB_REC_WIDTH_32B,
+ DB_REC_USER);
+ if (rc)
+ goto err;
+ }
qedr_qp_user_print(dev, qp);
return rc;
@@ -1811,7 +1835,13 @@ static int qedr_set_iwarp_db_info(struct qedr_dev *dev, struct qedr_qp *qp)
&qp->rq.db_data,
DB_REC_WIDTH_32B,
DB_REC_KERNEL);
+ if (rc)
+ return rc;
+ rc = qedr_db_recovery_add(dev, qp->rq.iwarp_db2,
+ &qp->rq.iwarp_db2_data,
+ DB_REC_WIDTH_32B,
+ DB_REC_KERNEL);
return rc;
}
@@ -1940,8 +1970,13 @@ static void qedr_cleanup_kernel(struct qedr_dev *dev, struct qedr_qp *qp)
qedr_db_recovery_del(dev, qp->sq.db, &qp->sq.db_data);
- if (!qp->srq)
+ if (!qp->srq) {
qedr_db_recovery_del(dev, qp->rq.db, &qp->rq.db_data);
+
+ if (rdma_protocol_iwarp(&dev->ibdev, 1))
+ qedr_db_recovery_del(dev, qp->rq.iwarp_db2,
+ &qp->rq.iwarp_db2_data);
+ }
}
static int qedr_create_kernel_qp(struct qedr_dev *dev,
--
2.14.5
^ permalink raw reply related
* [PATCH v6 rdma-next 4/6] qed*: Change dpi_addr to be denoted with __iomem
From: Michal Kalderon @ 2019-07-09 14:17 UTC (permalink / raw)
To: michal.kalderon, ariel.elior, jgg, dledford, galpress
Cc: linux-rdma, davem, netdev
In-Reply-To: <20190709141735.19193-1-michal.kalderon@marvell.com>
Several casts were required around dpi_addr parameter in qed_rdma_if.h
This is an address on the doorbell bar and should therefore be marked
with __iomem.
Reported-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
---
drivers/infiniband/hw/qedr/main.c | 2 +-
drivers/infiniband/hw/qedr/qedr.h | 2 +-
drivers/net/ethernet/qlogic/qed/qed_rdma.c | 5 ++---
include/linux/qed/qed_rdma_if.h | 2 +-
4 files changed, 5 insertions(+), 6 deletions(-)
diff --git a/drivers/infiniband/hw/qedr/main.c b/drivers/infiniband/hw/qedr/main.c
index a0a7ba0a5af4..3db4b6ba5ad6 100644
--- a/drivers/infiniband/hw/qedr/main.c
+++ b/drivers/infiniband/hw/qedr/main.c
@@ -815,7 +815,7 @@ static int qedr_init_hw(struct qedr_dev *dev)
if (rc)
goto out;
- dev->db_addr = (void __iomem *)(uintptr_t)out_params.dpi_addr;
+ dev->db_addr = out_params.dpi_addr;
dev->db_phys_addr = out_params.dpi_phys_addr;
dev->db_size = out_params.dpi_size;
dev->dpi = out_params.dpi;
diff --git a/drivers/infiniband/hw/qedr/qedr.h b/drivers/infiniband/hw/qedr/qedr.h
index 97c90d1e525d..7e80ce521d8d 100644
--- a/drivers/infiniband/hw/qedr/qedr.h
+++ b/drivers/infiniband/hw/qedr/qedr.h
@@ -227,7 +227,7 @@ struct qedr_ucontext {
struct ib_ucontext ibucontext;
struct qedr_dev *dev;
struct qedr_pd *pd;
- u64 dpi_addr;
+ void __iomem *dpi_addr;
u64 dpi_phys_addr;
u32 dpi_size;
u16 dpi;
diff --git a/drivers/net/ethernet/qlogic/qed/qed_rdma.c b/drivers/net/ethernet/qlogic/qed/qed_rdma.c
index 7873d6dfd91f..fb3fe60a1a68 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_rdma.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_rdma.c
@@ -799,9 +799,8 @@ static int qed_rdma_add_user(void *rdma_cxt,
/* Calculate the corresponding DPI address */
dpi_start_offset = p_hwfn->dpi_start_offset;
- out_params->dpi_addr = (u64)((u8 __iomem *)p_hwfn->doorbells +
- dpi_start_offset +
- ((out_params->dpi) * p_hwfn->dpi_size));
+ out_params->dpi_addr = p_hwfn->doorbells + dpi_start_offset +
+ out_params->dpi * p_hwfn->dpi_size;
out_params->dpi_phys_addr = p_hwfn->cdev->db_phys_addr +
dpi_start_offset +
diff --git a/include/linux/qed/qed_rdma_if.h b/include/linux/qed/qed_rdma_if.h
index d15f8e4815e3..834166809a6c 100644
--- a/include/linux/qed/qed_rdma_if.h
+++ b/include/linux/qed/qed_rdma_if.h
@@ -225,7 +225,7 @@ struct qed_rdma_start_in_params {
struct qed_rdma_add_user_out_params {
u16 dpi;
- u64 dpi_addr;
+ void __iomem *dpi_addr;
u64 dpi_phys_addr;
u32 dpi_size;
u16 wid_count;
--
2.14.5
^ permalink raw reply related
* [PATCH v6 rdma-next 3/6] RDMA/qedr: Use the common mmap API
From: Michal Kalderon @ 2019-07-09 14:17 UTC (permalink / raw)
To: michal.kalderon, ariel.elior, jgg, dledford, galpress
Cc: linux-rdma, davem, netdev
In-Reply-To: <20190709141735.19193-1-michal.kalderon@marvell.com>
Remove all function related to mmap from qedr and use the common
API
Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
---
drivers/infiniband/hw/qedr/qedr.h | 13 ----
drivers/infiniband/hw/qedr/verbs.c | 153 +++++++++++++------------------------
drivers/infiniband/hw/qedr/verbs.h | 2 +-
3 files changed, 52 insertions(+), 116 deletions(-)
diff --git a/drivers/infiniband/hw/qedr/qedr.h b/drivers/infiniband/hw/qedr/qedr.h
index 6175d1e98717..97c90d1e525d 100644
--- a/drivers/infiniband/hw/qedr/qedr.h
+++ b/drivers/infiniband/hw/qedr/qedr.h
@@ -231,11 +231,6 @@ struct qedr_ucontext {
u64 dpi_phys_addr;
u32 dpi_size;
u16 dpi;
-
- struct list_head mm_head;
-
- /* Lock to protect mm list */
- struct mutex mm_list_lock;
};
union db_prod64 {
@@ -298,14 +293,6 @@ struct qedr_pd {
struct qedr_ucontext *uctx;
};
-struct qedr_mm {
- struct {
- u64 phy_addr;
- unsigned long len;
- } key;
- struct list_head entry;
-};
-
union db_prod32 {
struct rdma_pwm_val16_data data;
u32 raw;
diff --git a/drivers/infiniband/hw/qedr/verbs.c b/drivers/infiniband/hw/qedr/verbs.c
index 27d90a84ea01..f33f0f1e7d76 100644
--- a/drivers/infiniband/hw/qedr/verbs.c
+++ b/drivers/infiniband/hw/qedr/verbs.c
@@ -58,6 +58,10 @@
#define DB_ADDR_SHIFT(addr) ((addr) << DB_PWM_ADDR_OFFSET_SHIFT)
+enum {
+ QEDR_USER_MMAP_IO_WC = 0,
+};
+
static inline int qedr_ib_copy_to_udata(struct ib_udata *udata, void *src,
size_t len)
{
@@ -256,60 +260,6 @@ int qedr_modify_port(struct ib_device *ibdev, u8 port, int mask,
return 0;
}
-static int qedr_add_mmap(struct qedr_ucontext *uctx, u64 phy_addr,
- unsigned long len)
-{
- struct qedr_mm *mm;
-
- mm = kzalloc(sizeof(*mm), GFP_KERNEL);
- if (!mm)
- return -ENOMEM;
-
- mm->key.phy_addr = phy_addr;
- /* This function might be called with a length which is not a multiple
- * of PAGE_SIZE, while the mapping is PAGE_SIZE grained and the kernel
- * forces this granularity by increasing the requested size if needed.
- * When qedr_mmap is called, it will search the list with the updated
- * length as a key. To prevent search failures, the length is rounded up
- * in advance to PAGE_SIZE.
- */
- mm->key.len = roundup(len, PAGE_SIZE);
- INIT_LIST_HEAD(&mm->entry);
-
- mutex_lock(&uctx->mm_list_lock);
- list_add(&mm->entry, &uctx->mm_head);
- mutex_unlock(&uctx->mm_list_lock);
-
- DP_DEBUG(uctx->dev, QEDR_MSG_MISC,
- "added (addr=0x%llx,len=0x%lx) for ctx=%p\n",
- (unsigned long long)mm->key.phy_addr,
- (unsigned long)mm->key.len, uctx);
-
- return 0;
-}
-
-static bool qedr_search_mmap(struct qedr_ucontext *uctx, u64 phy_addr,
- unsigned long len)
-{
- bool found = false;
- struct qedr_mm *mm;
-
- mutex_lock(&uctx->mm_list_lock);
- list_for_each_entry(mm, &uctx->mm_head, entry) {
- if (len != mm->key.len || phy_addr != mm->key.phy_addr)
- continue;
-
- found = true;
- break;
- }
- mutex_unlock(&uctx->mm_list_lock);
- DP_DEBUG(uctx->dev, QEDR_MSG_MISC,
- "searched for (addr=0x%llx,len=0x%lx) for ctx=%p, result=%d\n",
- mm->key.phy_addr, mm->key.len, uctx, found);
-
- return found;
-}
-
int qedr_alloc_ucontext(struct ib_ucontext *uctx, struct ib_udata *udata)
{
struct ib_device *ibdev = uctx->device;
@@ -318,6 +268,7 @@ int qedr_alloc_ucontext(struct ib_ucontext *uctx, struct ib_udata *udata)
struct qedr_alloc_ucontext_resp uresp = {};
struct qedr_dev *dev = get_qedr_dev(ibdev);
struct qed_rdma_add_user_out_params oparams;
+ u64 key;
if (!udata)
return -EFAULT;
@@ -334,13 +285,17 @@ int qedr_alloc_ucontext(struct ib_ucontext *uctx, struct ib_udata *udata)
ctx->dpi_addr = oparams.dpi_addr;
ctx->dpi_phys_addr = oparams.dpi_phys_addr;
ctx->dpi_size = oparams.dpi_size;
- INIT_LIST_HEAD(&ctx->mm_head);
- mutex_init(&ctx->mm_list_lock);
+
+ key = rdma_user_mmap_entry_insert(uctx, ctx,
+ ctx->dpi_phys_addr, ctx->dpi_size,
+ QEDR_USER_MMAP_IO_WC);
+ if (key == RDMA_USER_MMAP_INVALID)
+ return -ENOMEM;
uresp.dpm_enabled = dev->user_dpm_enabled;
uresp.wids_enabled = 1;
uresp.wid_count = oparams.wid_count;
- uresp.db_pa = ctx->dpi_phys_addr;
+ uresp.db_pa = key;
uresp.db_size = ctx->dpi_size;
uresp.max_send_wr = dev->attr.max_sqe;
uresp.max_recv_wr = dev->attr.max_rqe;
@@ -356,10 +311,6 @@ int qedr_alloc_ucontext(struct ib_ucontext *uctx, struct ib_udata *udata)
ctx->dev = dev;
- rc = qedr_add_mmap(ctx, ctx->dpi_phys_addr, ctx->dpi_size);
- if (rc)
- return rc;
-
DP_DEBUG(dev, QEDR_MSG_INIT, "Allocating user context %p\n",
&ctx->ibucontext);
return 0;
@@ -368,66 +319,64 @@ int qedr_alloc_ucontext(struct ib_ucontext *uctx, struct ib_udata *udata)
void qedr_dealloc_ucontext(struct ib_ucontext *ibctx)
{
struct qedr_ucontext *uctx = get_qedr_ucontext(ibctx);
- struct qedr_mm *mm, *tmp;
DP_DEBUG(uctx->dev, QEDR_MSG_INIT, "Deallocating user context %p\n",
uctx);
uctx->dev->ops->rdma_remove_user(uctx->dev->rdma_ctx, uctx->dpi);
-
- list_for_each_entry_safe(mm, tmp, &uctx->mm_head, entry) {
- DP_DEBUG(uctx->dev, QEDR_MSG_MISC,
- "deleted (addr=0x%llx,len=0x%lx) for ctx=%p\n",
- mm->key.phy_addr, mm->key.len, uctx);
- list_del(&mm->entry);
- kfree(mm);
- }
}
-int qedr_mmap(struct ib_ucontext *context, struct vm_area_struct *vma)
+int qedr_mmap(struct ib_ucontext *ucontext, struct vm_area_struct *vma)
{
- struct qedr_ucontext *ucontext = get_qedr_ucontext(context);
- struct qedr_dev *dev = get_qedr_dev(context->device);
- unsigned long phys_addr = vma->vm_pgoff << PAGE_SHIFT;
- unsigned long len = (vma->vm_end - vma->vm_start);
- unsigned long dpi_start;
+ struct ib_device *dev = ucontext->device;
+ u64 length = vma->vm_end - vma->vm_start;
+ u64 key = vma->vm_pgoff << PAGE_SHIFT;
+ struct rdma_user_mmap_entry *entry;
+ u64 pfn;
+ int err;
- dpi_start = dev->db_phys_addr + (ucontext->dpi * ucontext->dpi_size);
+ ibdev_dbg(dev,
+ "start %#lx, end %#lx, length = %#llx, key = %#llx\n",
+ vma->vm_start, vma->vm_end, length, key);
- DP_DEBUG(dev, QEDR_MSG_INIT,
- "mmap invoked with vm_start=0x%pK, vm_end=0x%pK,vm_pgoff=0x%pK; dpi_start=0x%pK dpi_size=0x%x\n",
- (void *)vma->vm_start, (void *)vma->vm_end,
- (void *)vma->vm_pgoff, (void *)dpi_start, ucontext->dpi_size);
-
- if ((vma->vm_start & (PAGE_SIZE - 1)) || (len & (PAGE_SIZE - 1))) {
- DP_ERR(dev,
- "failed mmap, addresses must be page aligned: start=0x%pK, end=0x%pK\n",
- (void *)vma->vm_start, (void *)vma->vm_end);
+ if (length % PAGE_SIZE != 0 || !(vma->vm_flags & VM_SHARED)) {
+ ibdev_dbg(dev,
+ "length[%#llx] is not page size aligned[%#lx] or VM_SHARED is not set [%#lx]\n",
+ length, PAGE_SIZE, vma->vm_flags);
return -EINVAL;
}
- if (!qedr_search_mmap(ucontext, phys_addr, len)) {
- DP_ERR(dev, "failed mmap, vm_pgoff=0x%lx is not authorized\n",
- vma->vm_pgoff);
- return -EINVAL;
+ if (vma->vm_flags & VM_EXEC) {
+ ibdev_dbg(dev, "Mapping executable pages is not permitted\n");
+ return -EPERM;
}
+ vma->vm_flags &= ~VM_MAYEXEC;
- if (phys_addr < dpi_start ||
- ((phys_addr + len) > (dpi_start + ucontext->dpi_size))) {
- DP_ERR(dev,
- "failed mmap, pages are outside of dpi; page address=0x%pK, dpi_start=0x%pK, dpi_size=0x%x\n",
- (void *)phys_addr, (void *)dpi_start,
- ucontext->dpi_size);
+ entry = rdma_user_mmap_entry_get(ucontext, key, length);
+ if (!entry) {
+ ibdev_dbg(dev, "key[%#llx] does not have valid entry\n",
+ key);
return -EINVAL;
}
- if (vma->vm_flags & VM_READ) {
- DP_ERR(dev, "failed mmap, cannot map doorbell bar for read\n");
- return -EINVAL;
+ ibdev_dbg(dev,
+ "Mapping address[%#llx], length[%#llx], mmap_flag[%d]\n",
+ entry->address, length, entry->mmap_flag);
+
+ pfn = entry->address >> PAGE_SHIFT;
+ switch (entry->mmap_flag) {
+ case QEDR_USER_MMAP_IO_WC:
+ err = rdma_user_mmap_io(ucontext, vma, pfn, length,
+ pgprot_writecombine(vma->vm_page_prot));
+ break;
+ default:
+ err = -EINVAL;
}
- vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);
- return io_remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff, len,
- vma->vm_page_prot);
+ ibdev_dbg(dev,
+ "Couldn't mmap address[%#llx] length[%#llx] mmap_flag[%d] err[%d]\n",
+ entry->address, length, entry->mmap_flag, err);
+
+ return err;
}
int qedr_alloc_pd(struct ib_pd *ibpd, struct ib_udata *udata)
diff --git a/drivers/infiniband/hw/qedr/verbs.h b/drivers/infiniband/hw/qedr/verbs.h
index 9aaa90283d6e..724d0983e972 100644
--- a/drivers/infiniband/hw/qedr/verbs.h
+++ b/drivers/infiniband/hw/qedr/verbs.h
@@ -46,7 +46,7 @@ int qedr_query_pkey(struct ib_device *, u8 port, u16 index, u16 *pkey);
int qedr_alloc_ucontext(struct ib_ucontext *uctx, struct ib_udata *udata);
void qedr_dealloc_ucontext(struct ib_ucontext *uctx);
-int qedr_mmap(struct ib_ucontext *, struct vm_area_struct *vma);
+int qedr_mmap(struct ib_ucontext *ucontext, struct vm_area_struct *vma);
int qedr_alloc_pd(struct ib_pd *pd, struct ib_udata *udata);
void qedr_dealloc_pd(struct ib_pd *pd, struct ib_udata *udata);
--
2.14.5
^ permalink raw reply related
* [PATCH v6 rdma-next 2/6] RDMA/efa: Use the common mmap_xa helpers
From: Michal Kalderon @ 2019-07-09 14:17 UTC (permalink / raw)
To: michal.kalderon, ariel.elior, jgg, dledford, galpress
Cc: linux-rdma, davem, netdev
In-Reply-To: <20190709141735.19193-1-michal.kalderon@marvell.com>
Remove the functions related to managing the mmap_xa database.
This code was copied to the ib_core. Use the common API's instead.
Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
---
drivers/infiniband/hw/efa/efa.h | 3 +-
drivers/infiniband/hw/efa/efa_main.c | 1 +
drivers/infiniband/hw/efa/efa_verbs.c | 186 ++++++++--------------------------
3 files changed, 44 insertions(+), 146 deletions(-)
diff --git a/drivers/infiniband/hw/efa/efa.h b/drivers/infiniband/hw/efa/efa.h
index 119f8efec564..350754ac240e 100644
--- a/drivers/infiniband/hw/efa/efa.h
+++ b/drivers/infiniband/hw/efa/efa.h
@@ -71,8 +71,6 @@ struct efa_dev {
struct efa_ucontext {
struct ib_ucontext ibucontext;
- struct xarray mmap_xa;
- u32 mmap_xa_page;
u16 uarn;
};
@@ -147,6 +145,7 @@ int efa_alloc_ucontext(struct ib_ucontext *ibucontext, struct ib_udata *udata);
void efa_dealloc_ucontext(struct ib_ucontext *ibucontext);
int efa_mmap(struct ib_ucontext *ibucontext,
struct vm_area_struct *vma);
+void efa_mmap_free(struct rdma_user_mmap_entry *entry);
int efa_create_ah(struct ib_ah *ibah,
struct rdma_ah_attr *ah_attr,
u32 flags,
diff --git a/drivers/infiniband/hw/efa/efa_main.c b/drivers/infiniband/hw/efa/efa_main.c
index dd1c6d49466f..65508c73accd 100644
--- a/drivers/infiniband/hw/efa/efa_main.c
+++ b/drivers/infiniband/hw/efa/efa_main.c
@@ -215,6 +215,7 @@ static const struct ib_device_ops efa_dev_ops = {
.get_link_layer = efa_port_link_layer,
.get_port_immutable = efa_get_port_immutable,
.mmap = efa_mmap,
+ .mmap_free = efa_mmap_free,
.modify_qp = efa_modify_qp,
.query_device = efa_query_device,
.query_gid = efa_query_gid,
diff --git a/drivers/infiniband/hw/efa/efa_verbs.c b/drivers/infiniband/hw/efa/efa_verbs.c
index df77bc312a25..419170952760 100644
--- a/drivers/infiniband/hw/efa/efa_verbs.c
+++ b/drivers/infiniband/hw/efa/efa_verbs.c
@@ -13,10 +13,6 @@
#include "efa.h"
-#define EFA_MMAP_FLAG_SHIFT 56
-#define EFA_MMAP_PAGE_MASK GENMASK(EFA_MMAP_FLAG_SHIFT - 1, 0)
-#define EFA_MMAP_INVALID U64_MAX
-
enum {
EFA_MMAP_DMA_PAGE = 0,
EFA_MMAP_IO_WC,
@@ -27,20 +23,6 @@ enum {
(BIT(EFA_ADMIN_FATAL_ERROR) | BIT(EFA_ADMIN_WARNING) | \
BIT(EFA_ADMIN_NOTIFICATION) | BIT(EFA_ADMIN_KEEP_ALIVE))
-struct efa_mmap_entry {
- void *obj;
- u64 address;
- u64 length;
- u32 mmap_page;
- u8 mmap_flag;
-};
-
-static inline u64 get_mmap_key(const struct efa_mmap_entry *efa)
-{
- return ((u64)efa->mmap_flag << EFA_MMAP_FLAG_SHIFT) |
- ((u64)efa->mmap_page << PAGE_SHIFT);
-}
-
#define EFA_CHUNK_PAYLOAD_SHIFT 12
#define EFA_CHUNK_PAYLOAD_SIZE BIT(EFA_CHUNK_PAYLOAD_SHIFT)
#define EFA_CHUNK_PAYLOAD_PTR_SIZE 8
@@ -145,106 +127,6 @@ static void *efa_zalloc_mapped(struct efa_dev *dev, dma_addr_t *dma_addr,
return addr;
}
-/*
- * This is only called when the ucontext is destroyed and there can be no
- * concurrent query via mmap or allocate on the xarray, thus we can be sure no
- * other thread is using the entry pointer. We also know that all the BAR
- * pages have either been zap'd or munmaped at this point. Normal pages are
- * refcounted and will be freed at the proper time.
- */
-static void mmap_entries_remove_free(struct efa_dev *dev,
- struct efa_ucontext *ucontext)
-{
- struct efa_mmap_entry *entry;
- unsigned long mmap_page;
-
- xa_for_each(&ucontext->mmap_xa, mmap_page, entry) {
- xa_erase(&ucontext->mmap_xa, mmap_page);
-
- ibdev_dbg(
- &dev->ibdev,
- "mmap: obj[0x%p] key[%#llx] addr[%#llx] len[%#llx] removed\n",
- entry->obj, get_mmap_key(entry), entry->address,
- entry->length);
- if (entry->mmap_flag == EFA_MMAP_DMA_PAGE)
- /* DMA mapping is already gone, now free the pages */
- free_pages_exact(phys_to_virt(entry->address),
- entry->length);
- kfree(entry);
- }
-}
-
-static struct efa_mmap_entry *mmap_entry_get(struct efa_dev *dev,
- struct efa_ucontext *ucontext,
- u64 key, u64 len)
-{
- struct efa_mmap_entry *entry;
- u64 mmap_page;
-
- mmap_page = (key & EFA_MMAP_PAGE_MASK) >> PAGE_SHIFT;
- if (mmap_page > U32_MAX)
- return NULL;
-
- entry = xa_load(&ucontext->mmap_xa, mmap_page);
- if (!entry || get_mmap_key(entry) != key || entry->length != len)
- return NULL;
-
- ibdev_dbg(&dev->ibdev,
- "mmap: obj[0x%p] key[%#llx] addr[%#llx] len[%#llx] removed\n",
- entry->obj, key, entry->address, entry->length);
-
- return entry;
-}
-
-/*
- * Note this locking scheme cannot support removal of entries, except during
- * ucontext destruction when the core code guarentees no concurrency.
- */
-static u64 mmap_entry_insert(struct efa_dev *dev, struct efa_ucontext *ucontext,
- void *obj, u64 address, u64 length, u8 mmap_flag)
-{
- struct efa_mmap_entry *entry;
- u32 next_mmap_page;
- int err;
-
- entry = kmalloc(sizeof(*entry), GFP_KERNEL);
- if (!entry)
- return EFA_MMAP_INVALID;
-
- entry->obj = obj;
- entry->address = address;
- entry->length = length;
- entry->mmap_flag = mmap_flag;
-
- xa_lock(&ucontext->mmap_xa);
- if (check_add_overflow(ucontext->mmap_xa_page,
- (u32)(length >> PAGE_SHIFT),
- &next_mmap_page))
- goto err_unlock;
-
- entry->mmap_page = ucontext->mmap_xa_page;
- ucontext->mmap_xa_page = next_mmap_page;
- err = __xa_insert(&ucontext->mmap_xa, entry->mmap_page, entry,
- GFP_KERNEL);
- if (err)
- goto err_unlock;
-
- xa_unlock(&ucontext->mmap_xa);
-
- ibdev_dbg(
- &dev->ibdev,
- "mmap: obj[0x%p] addr[%#llx], len[%#llx], key[%#llx] inserted\n",
- entry->obj, entry->address, entry->length, get_mmap_key(entry));
-
- return get_mmap_key(entry);
-
-err_unlock:
- xa_unlock(&ucontext->mmap_xa);
- kfree(entry);
- return EFA_MMAP_INVALID;
-
-}
-
int efa_query_device(struct ib_device *ibdev,
struct ib_device_attr *props,
struct ib_udata *udata)
@@ -488,45 +370,53 @@ static int qp_mmap_entries_setup(struct efa_qp *qp,
struct efa_com_create_qp_params *params,
struct efa_ibv_create_qp_resp *resp)
{
+ u64 address;
+ u64 length;
+
/*
* Once an entry is inserted it might be mmapped, hence cannot be
* cleaned up until dealloc_ucontext.
*/
resp->sq_db_mmap_key =
- mmap_entry_insert(dev, ucontext, qp,
- dev->db_bar_addr + resp->sq_db_offset,
- PAGE_SIZE, EFA_MMAP_IO_NC);
- if (resp->sq_db_mmap_key == EFA_MMAP_INVALID)
+ rdma_user_mmap_entry_insert(&ucontext->ibucontext, qp,
+ dev->db_bar_addr +
+ resp->sq_db_offset,
+ PAGE_SIZE, EFA_MMAP_IO_NC);
+ if (resp->sq_db_mmap_key == RDMA_USER_MMAP_INVALID)
return -ENOMEM;
resp->sq_db_offset &= ~PAGE_MASK;
+ address = dev->mem_bar_addr + resp->llq_desc_offset;
+ length = PAGE_ALIGN(params->sq_ring_size_in_bytes +
+ (resp->llq_desc_offset & ~PAGE_MASK));
resp->llq_desc_mmap_key =
- mmap_entry_insert(dev, ucontext, qp,
- dev->mem_bar_addr + resp->llq_desc_offset,
- PAGE_ALIGN(params->sq_ring_size_in_bytes +
- (resp->llq_desc_offset & ~PAGE_MASK)),
- EFA_MMAP_IO_WC);
- if (resp->llq_desc_mmap_key == EFA_MMAP_INVALID)
+ rdma_user_mmap_entry_insert(&ucontext->ibucontext, qp,
+ address,
+ length,
+ EFA_MMAP_IO_WC);
+ if (resp->llq_desc_mmap_key == RDMA_USER_MMAP_INVALID)
return -ENOMEM;
resp->llq_desc_offset &= ~PAGE_MASK;
if (qp->rq_size) {
+ address = dev->db_bar_addr + resp->rq_db_offset;
resp->rq_db_mmap_key =
- mmap_entry_insert(dev, ucontext, qp,
- dev->db_bar_addr + resp->rq_db_offset,
- PAGE_SIZE, EFA_MMAP_IO_NC);
- if (resp->rq_db_mmap_key == EFA_MMAP_INVALID)
+ rdma_user_mmap_entry_insert(&ucontext->ibucontext, qp,
+ address, PAGE_SIZE,
+ EFA_MMAP_IO_NC);
+ if (resp->rq_db_mmap_key == RDMA_USER_MMAP_INVALID)
return -ENOMEM;
resp->rq_db_offset &= ~PAGE_MASK;
+ address = virt_to_phys(qp->rq_cpu_addr);
resp->rq_mmap_key =
- mmap_entry_insert(dev, ucontext, qp,
- virt_to_phys(qp->rq_cpu_addr),
- qp->rq_size, EFA_MMAP_DMA_PAGE);
- if (resp->rq_mmap_key == EFA_MMAP_INVALID)
+ rdma_user_mmap_entry_insert(&ucontext->ibucontext, qp,
+ address, qp->rq_size,
+ EFA_MMAP_DMA_PAGE);
+ if (resp->rq_mmap_key == RDMA_USER_MMAP_INVALID)
return -ENOMEM;
resp->rq_mmap_size = qp->rq_size;
@@ -875,11 +765,14 @@ void efa_destroy_cq(struct ib_cq *ibcq, struct ib_udata *udata)
static int cq_mmap_entries_setup(struct efa_dev *dev, struct efa_cq *cq,
struct efa_ibv_create_cq_resp *resp)
{
+ struct efa_ucontext *ucontext = cq->ucontext;
+
resp->q_mmap_size = cq->size;
- resp->q_mmap_key = mmap_entry_insert(dev, cq->ucontext, cq,
- virt_to_phys(cq->cpu_addr),
- cq->size, EFA_MMAP_DMA_PAGE);
- if (resp->q_mmap_key == EFA_MMAP_INVALID)
+ resp->q_mmap_key =
+ rdma_user_mmap_entry_insert(&ucontext->ibucontext, cq,
+ virt_to_phys(cq->cpu_addr),
+ cq->size, EFA_MMAP_DMA_PAGE);
+ if (resp->q_mmap_key == RDMA_USER_MMAP_INVALID)
return -ENOMEM;
return 0;
@@ -1531,7 +1424,6 @@ int efa_alloc_ucontext(struct ib_ucontext *ibucontext, struct ib_udata *udata)
goto err_out;
ucontext->uarn = result.uarn;
- xa_init(&ucontext->mmap_xa);
resp.cmds_supp_udata_mask |= EFA_USER_CMDS_SUPP_UDATA_QUERY_DEVICE;
resp.cmds_supp_udata_mask |= EFA_USER_CMDS_SUPP_UDATA_CREATE_AH;
@@ -1560,19 +1452,25 @@ void efa_dealloc_ucontext(struct ib_ucontext *ibucontext)
struct efa_ucontext *ucontext = to_eucontext(ibucontext);
struct efa_dev *dev = to_edev(ibucontext->device);
- mmap_entries_remove_free(dev, ucontext);
efa_dealloc_uar(dev, ucontext->uarn);
}
+void efa_mmap_free(struct rdma_user_mmap_entry *entry)
+{
+ /* DMA mapping is already gone, now free the pages */
+ if (entry->mmap_flag == EFA_MMAP_DMA_PAGE)
+ free_pages_exact(phys_to_virt(entry->address), entry->length);
+}
+
static int __efa_mmap(struct efa_dev *dev, struct efa_ucontext *ucontext,
struct vm_area_struct *vma, u64 key, u64 length)
{
- struct efa_mmap_entry *entry;
+ struct rdma_user_mmap_entry *entry;
unsigned long va;
u64 pfn;
int err;
- entry = mmap_entry_get(dev, ucontext, key, length);
+ entry = rdma_user_mmap_entry_get(&ucontext->ibucontext, key, length);
if (!entry) {
ibdev_dbg(&dev->ibdev, "key[%#llx] does not have valid entry\n",
key);
--
2.14.5
^ permalink raw reply related
* [PATCH v6 rdma-next 0/6] RDMA/qedr: Use the doorbell overflow recovery mechanism for RDMA
From: Michal Kalderon @ 2019-07-09 14:17 UTC (permalink / raw)
To: michal.kalderon, ariel.elior, jgg, dledford, galpress
Cc: linux-rdma, davem, netdev
This patch series uses the doorbell overflow recovery mechanism
introduced in
commit 36907cd5cd72 ("qed: Add doorbell overflow recovery mechanism")
for rdma ( RoCE and iWARP )
The first three patches modify the core code to contain helper
functions for managing mmap_xa inserting, getting and freeing
entries. The code was taken almost as is from the efa driver.
There is still an open discussion on whether we should take
this even further and make the entire mmap generic. Until a
decision is made, I only created the database API and modified
the efa and qedr driver to use it. The doorbell recovery code will be based
on the common code.
Efa driver was compile tested only.
rdma-core pull request #493
Changes from V5:
- Switch between driver dealloc_ucontext and mmap_entries_remove.
- No need to verify the key after using the key to load an entry from
the mmap_xa.
- Change mmap_free api to pass an 'entry' object.
- Add documentation for mmap_free and for newly exported functions.
- Fix some extra/missing line breaks.
Changes from V4:
- Add common mmap database and cookie helper functions.
Changes from V3:
- Remove casts from void to u8. Pointer arithmetic can be done on void
- rebase to tip of rdma-next
Changes from V2:
- Don't use long-lived kmap. Instead use user-trigger mmap for the
doorbell recovery entries.
- Modify dpi_addr to be denoted with __iomem and avoid redundant
casts
Changes from V1:
- call kmap to map virtual address into kernel space
- modify db_rec_delete to be void
- remove some cpu_to_le16 that were added to previous patch which are
correct but not related to the overflow recovery mechanism. Will be
submitted as part of a different patch
Michal Kalderon (6):
RDMA/core: Create mmap database and cookie helper functions
RDMA/efa: Use the common mmap_xa helpers
RDMA/qedr: Use the common mmap API
qed*: Change dpi_addr to be denoted with __iomem
RDMA/qedr: Add doorbell overflow recovery support
RDMA/qedr: Add iWARP doorbell recovery support
drivers/infiniband/core/device.c | 1 +
drivers/infiniband/core/rdma_core.c | 1 +
drivers/infiniband/core/uverbs_cmd.c | 1 +
drivers/infiniband/core/uverbs_main.c | 135 +++++++++
drivers/infiniband/hw/efa/efa.h | 3 +-
drivers/infiniband/hw/efa/efa_main.c | 1 +
drivers/infiniband/hw/efa/efa_verbs.c | 186 +++---------
drivers/infiniband/hw/qedr/main.c | 3 +-
drivers/infiniband/hw/qedr/qedr.h | 32 +-
drivers/infiniband/hw/qedr/verbs.c | 463 ++++++++++++++++++++---------
drivers/infiniband/hw/qedr/verbs.h | 4 +-
drivers/net/ethernet/qlogic/qed/qed_rdma.c | 5 +-
include/linux/qed/qed_rdma_if.h | 2 +-
include/rdma/ib_verbs.h | 46 +++
include/uapi/rdma/qedr-abi.h | 25 ++
15 files changed, 600 insertions(+), 308 deletions(-)
--
2.14.5
^ permalink raw reply
* [PATCH v6 rdma-next 5/6] RDMA/qedr: Add doorbell overflow recovery support
From: Michal Kalderon @ 2019-07-09 14:17 UTC (permalink / raw)
To: michal.kalderon, ariel.elior, jgg, dledford, galpress
Cc: linux-rdma, davem, netdev
In-Reply-To: <20190709141735.19193-1-michal.kalderon@marvell.com>
Use the doorbell recovery mechanism to register rdma related doorbells
that will be restored in case there is a doorbell overflow attention.
Signed-off-by: Ariel Elior <ariel.elior@marvell.com>
Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com>
---
drivers/infiniband/hw/qedr/main.c | 1 +
drivers/infiniband/hw/qedr/qedr.h | 7 +
drivers/infiniband/hw/qedr/verbs.c | 273 ++++++++++++++++++++++++++++++++-----
drivers/infiniband/hw/qedr/verbs.h | 2 +
include/uapi/rdma/qedr-abi.h | 25 ++++
5 files changed, 273 insertions(+), 35 deletions(-)
diff --git a/drivers/infiniband/hw/qedr/main.c b/drivers/infiniband/hw/qedr/main.c
index 3db4b6ba5ad6..34225c88f03d 100644
--- a/drivers/infiniband/hw/qedr/main.c
+++ b/drivers/infiniband/hw/qedr/main.c
@@ -206,6 +206,7 @@ static const struct ib_device_ops qedr_dev_ops = {
.get_link_layer = qedr_link_layer,
.map_mr_sg = qedr_map_mr_sg,
.mmap = qedr_mmap,
+ .mmap_free = qedr_mmap_free,
.modify_port = qedr_modify_port,
.modify_qp = qedr_modify_qp,
.modify_srq = qedr_modify_srq,
diff --git a/drivers/infiniband/hw/qedr/qedr.h b/drivers/infiniband/hw/qedr/qedr.h
index 7e80ce521d8d..8aed24b32de6 100644
--- a/drivers/infiniband/hw/qedr/qedr.h
+++ b/drivers/infiniband/hw/qedr/qedr.h
@@ -231,6 +231,7 @@ struct qedr_ucontext {
u64 dpi_phys_addr;
u32 dpi_size;
u16 dpi;
+ bool db_rec;
};
union db_prod64 {
@@ -258,6 +259,12 @@ struct qedr_userq {
struct qedr_pbl *pbl_tbl;
u64 buf_addr;
size_t buf_len;
+
+ /* doorbell recovery */
+ void __iomem *db_addr;
+ struct qedr_user_db_rec *db_rec_data;
+ u64 db_rec_phys;
+ u64 db_rec_key;
};
struct qedr_cq {
diff --git a/drivers/infiniband/hw/qedr/verbs.c b/drivers/infiniband/hw/qedr/verbs.c
index f33f0f1e7d76..b0b9ec70f2fd 100644
--- a/drivers/infiniband/hw/qedr/verbs.c
+++ b/drivers/infiniband/hw/qedr/verbs.c
@@ -60,6 +60,7 @@
enum {
QEDR_USER_MMAP_IO_WC = 0,
+ QEDR_USER_MMAP_PHYS_PAGE,
};
static inline int qedr_ib_copy_to_udata(struct ib_udata *udata, void *src,
@@ -266,6 +267,7 @@ int qedr_alloc_ucontext(struct ib_ucontext *uctx, struct ib_udata *udata)
int rc;
struct qedr_ucontext *ctx = get_qedr_ucontext(uctx);
struct qedr_alloc_ucontext_resp uresp = {};
+ struct qedr_alloc_ucontext_req ureq = {};
struct qedr_dev *dev = get_qedr_dev(ibdev);
struct qed_rdma_add_user_out_params oparams;
u64 key;
@@ -273,6 +275,17 @@ int qedr_alloc_ucontext(struct ib_ucontext *uctx, struct ib_udata *udata)
if (!udata)
return -EFAULT;
+ if (udata->inlen) {
+ rc = ib_copy_from_udata(&ureq, udata,
+ min(sizeof(ureq), udata->inlen));
+ if (rc) {
+ DP_ERR(dev, "Problem copying data from user space\n");
+ return -EFAULT;
+ }
+
+ ctx->db_rec = !!(ureq.context_flags & QEDR_ALLOC_UCTX_DB_REC);
+ }
+
rc = dev->ops->rdma_add_user(dev->rdma_ctx, &oparams);
if (rc) {
DP_ERR(dev,
@@ -325,6 +338,13 @@ void qedr_dealloc_ucontext(struct ib_ucontext *ibctx)
uctx->dev->ops->rdma_remove_user(uctx->dev->rdma_ctx, uctx->dpi);
}
+void qedr_mmap_free(struct rdma_user_mmap_entry *entry)
+{
+ /* DMA mapping is already gone, now free the pages */
+ if (entry->mmap_flag == QEDR_USER_MMAP_PHYS_PAGE)
+ free_page((unsigned long)phys_to_virt(entry->address));
+}
+
int qedr_mmap(struct ib_ucontext *ucontext, struct vm_area_struct *vma)
{
struct ib_device *dev = ucontext->device;
@@ -368,6 +388,11 @@ int qedr_mmap(struct ib_ucontext *ucontext, struct vm_area_struct *vma)
err = rdma_user_mmap_io(ucontext, vma, pfn, length,
pgprot_writecombine(vma->vm_page_prot));
break;
+ case QEDR_USER_MMAP_PHYS_PAGE:
+ err = vm_insert_page(vma, vma->vm_start, pfn_to_page(pfn));
+ if (err)
+ break;
+ break;
default:
err = -EINVAL;
}
@@ -606,16 +631,48 @@ static void qedr_populate_pbls(struct qedr_dev *dev, struct ib_umem *umem,
}
}
+static int qedr_db_recovery_add(struct qedr_dev *dev,
+ void __iomem *db_addr,
+ void *db_data,
+ enum qed_db_rec_width db_width,
+ enum qed_db_rec_space db_space)
+{
+ if (!db_data) {
+ DP_DEBUG(dev, QEDR_MSG_INIT, "avoiding db rec since old lib\n");
+ return 0;
+ }
+
+ return dev->ops->common->db_recovery_add(dev->cdev, db_addr, db_data,
+ db_width, db_space);
+}
+
+static void qedr_db_recovery_del(struct qedr_dev *dev,
+ void __iomem *db_addr,
+ void *db_data)
+{
+ if (!db_data) {
+ DP_DEBUG(dev, QEDR_MSG_INIT, "avoiding db rec since old lib\n");
+ return;
+ }
+
+ /* Ignore return code as there is not much we can do about it. Error
+ * log will be printed inside.
+ */
+ dev->ops->common->db_recovery_del(dev->cdev, db_addr, db_data);
+}
+
static int qedr_copy_cq_uresp(struct qedr_dev *dev,
- struct qedr_cq *cq, struct ib_udata *udata)
+ struct qedr_cq *cq, struct ib_udata *udata,
+ u32 db_offset)
{
struct qedr_create_cq_uresp uresp;
int rc;
memset(&uresp, 0, sizeof(uresp));
- uresp.db_offset = DB_ADDR_SHIFT(DQ_PWM_OFFSET_UCM_RDMA_CQ_CONS_32BIT);
+ uresp.db_offset = db_offset;
uresp.icid = cq->icid;
+ uresp.db_rec_addr = cq->q.db_rec_key;
rc = qedr_ib_copy_to_udata(udata, &uresp, sizeof(uresp));
if (rc)
@@ -643,10 +700,42 @@ static inline int qedr_align_cq_entries(int entries)
return aligned_size / QEDR_CQE_SIZE;
}
+static int qedr_init_user_db_rec(struct ib_udata *udata,
+ struct qedr_dev *dev, struct qedr_userq *q,
+ bool requires_db_rec)
+{
+ struct qedr_ucontext *uctx =
+ rdma_udata_to_drv_context(udata, struct qedr_ucontext,
+ ibucontext);
+
+ /* Aborting for non doorbell userqueue (SRQ) or non-supporting lib */
+ if (requires_db_rec == 0 || !uctx->db_rec)
+ return 0;
+
+ /* Allocate a page for doorbell recovery, add to mmap ) */
+ q->db_rec_data = (void *)get_zeroed_page(GFP_KERNEL);
+ if (!q->db_rec_data) {
+ DP_ERR(dev,
+ "get_free_page failed\n");
+ return -ENOMEM;
+ }
+
+ q->db_rec_phys = virt_to_phys(q->db_rec_data);
+ q->db_rec_key = rdma_user_mmap_entry_insert(&uctx->ibucontext, q,
+ q->db_rec_phys,
+ PAGE_SIZE,
+ QEDR_USER_MMAP_PHYS_PAGE);
+ if (q->db_rec_key == RDMA_USER_MMAP_INVALID)
+ return -ENOMEM;
+
+ return 0;
+}
+
static inline int qedr_init_user_queue(struct ib_udata *udata,
struct qedr_dev *dev,
struct qedr_userq *q, u64 buf_addr,
- size_t buf_len, int access, int dmasync,
+ size_t buf_len, bool requires_db_rec,
+ int access, int dmasync,
int alloc_and_init)
{
u32 fw_pages;
@@ -684,7 +773,8 @@ static inline int qedr_init_user_queue(struct ib_udata *udata,
}
}
- return 0;
+ /* mmap the user address used to store doorbell data for recovery */
+ return qedr_init_user_db_rec(udata, dev, q, requires_db_rec);
err0:
ib_umem_release(q->umem);
@@ -770,6 +860,7 @@ int qedr_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
int entries = attr->cqe;
struct qedr_cq *cq = get_qedr_cq(ibcq);
int chain_entries;
+ u32 db_offset;
int page_cnt;
u64 pbl_ptr;
u16 icid;
@@ -789,8 +880,12 @@ int qedr_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
chain_entries = qedr_align_cq_entries(entries);
chain_entries = min_t(int, chain_entries, QEDR_MAX_CQES);
+ /* calc db offset. user will add DPI base, kernel will add db addr */
+ db_offset = DB_ADDR_SHIFT(DQ_PWM_OFFSET_UCM_RDMA_CQ_CONS_32BIT);
+
if (udata) {
- if (ib_copy_from_udata(&ureq, udata, sizeof(ureq))) {
+ if (ib_copy_from_udata(&ureq, udata, min(sizeof(ureq),
+ udata->inlen))) {
DP_ERR(dev,
"create cq: problem copying data from user space\n");
goto err0;
@@ -805,8 +900,9 @@ int qedr_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
cq->cq_type = QEDR_CQ_TYPE_USER;
rc = qedr_init_user_queue(udata, dev, &cq->q, ureq.addr,
- ureq.len, IB_ACCESS_LOCAL_WRITE, 1,
- 1);
+ ureq.len, true,
+ IB_ACCESS_LOCAL_WRITE,
+ 1, 1);
if (rc)
goto err0;
@@ -814,6 +910,7 @@ int qedr_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
page_cnt = cq->q.pbl_info.num_pbes;
cq->ibcq.cqe = chain_entries;
+ cq->q.db_addr = ctx->dpi_addr + db_offset;
} else {
cq->cq_type = QEDR_CQ_TYPE_KERNEL;
@@ -844,14 +941,21 @@ int qedr_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
spin_lock_init(&cq->cq_lock);
if (udata) {
- rc = qedr_copy_cq_uresp(dev, cq, udata);
+ rc = qedr_copy_cq_uresp(dev, cq, udata, db_offset);
+ if (rc)
+ goto err3;
+
+ rc = qedr_db_recovery_add(dev, cq->q.db_addr,
+ &cq->q.db_rec_data->db_data,
+ DB_REC_WIDTH_64B,
+ DB_REC_USER);
if (rc)
goto err3;
+
} else {
/* Generate doorbell address. */
- cq->db_addr = dev->db_addr +
- DB_ADDR_SHIFT(DQ_PWM_OFFSET_UCM_RDMA_CQ_CONS_32BIT);
cq->db.data.icid = cq->icid;
+ cq->db_addr = dev->db_addr + db_offset;
cq->db.data.params = DB_AGG_CMD_SET <<
RDMA_PWM_VAL32_DATA_AGG_CMD_SHIFT;
@@ -861,6 +965,11 @@ int qedr_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
cq->latest_cqe = NULL;
consume_cqe(cq);
cq->cq_cons = qed_chain_get_cons_idx_u32(&cq->pbl);
+
+ rc = qedr_db_recovery_add(dev, cq->db_addr, &cq->db.data,
+ DB_REC_WIDTH_64B, DB_REC_KERNEL);
+ if (rc)
+ goto err3;
}
DP_DEBUG(dev, QEDR_MSG_CQ,
@@ -879,8 +988,18 @@ int qedr_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
else
dev->ops->common->chain_free(dev->cdev, &cq->pbl);
err1:
- if (udata)
+ if (udata) {
ib_umem_release(cq->q.umem);
+ if (cq->q.db_rec_data) {
+ qedr_db_recovery_del(dev, cq->q.db_addr,
+ &cq->q.db_rec_data->db_data);
+ if (cq->q.db_rec_key == RDMA_USER_MMAP_INVALID)
+ free_page((unsigned long)cq->q.db_rec_data);
+ /* o/w will be freed by ib_uverbs on context free */
+ }
+ } else {
+ qedr_db_recovery_del(dev, cq->db_addr, &cq->db.data);
+ }
err0:
return -EINVAL;
}
@@ -911,8 +1030,10 @@ void qedr_destroy_cq(struct ib_cq *ibcq, struct ib_udata *udata)
cq->destroyed = 1;
/* GSIs CQs are handled by driver, so they don't exist in the FW */
- if (cq->cq_type == QEDR_CQ_TYPE_GSI)
+ if (cq->cq_type == QEDR_CQ_TYPE_GSI) {
+ qedr_db_recovery_del(dev, cq->db_addr, &cq->db.data);
return;
+ }
iparams.icid = cq->icid;
dev->ops->rdma_destroy_cq(dev->rdma_ctx, &iparams, &oparams);
@@ -921,6 +1042,12 @@ void qedr_destroy_cq(struct ib_cq *ibcq, struct ib_udata *udata)
if (udata) {
qedr_free_pbl(dev, &cq->q.pbl_info, cq->q.pbl_tbl);
ib_umem_release(cq->q.umem);
+
+ if (cq->q.db_rec_data)
+ qedr_db_recovery_del(dev, cq->q.db_addr,
+ &cq->q.db_rec_data->db_data);
+ } else {
+ qedr_db_recovery_del(dev, cq->db_addr, &cq->db.data);
}
/* We don't want the IRQ handler to handle a non-existing CQ so we
@@ -1085,8 +1212,8 @@ static int qedr_copy_srq_uresp(struct qedr_dev *dev,
}
static void qedr_copy_rq_uresp(struct qedr_dev *dev,
- struct qedr_create_qp_uresp *uresp,
- struct qedr_qp *qp)
+ struct qedr_create_qp_uresp *uresp,
+ struct qedr_qp *qp)
{
/* iWARP requires two doorbells per RQ. */
if (rdma_protocol_iwarp(&dev->ibdev, 1)) {
@@ -1099,6 +1226,7 @@ static void qedr_copy_rq_uresp(struct qedr_dev *dev,
}
uresp->rq_icid = qp->icid;
+ uresp->rq_db_rec_addr = qp->urq.db_rec_key;
}
static void qedr_copy_sq_uresp(struct qedr_dev *dev,
@@ -1112,22 +1240,24 @@ static void qedr_copy_sq_uresp(struct qedr_dev *dev,
uresp->sq_icid = qp->icid;
else
uresp->sq_icid = qp->icid + 1;
+
+ uresp->sq_db_rec_addr = qp->usq.db_rec_key;
}
static int qedr_copy_qp_uresp(struct qedr_dev *dev,
- struct qedr_qp *qp, struct ib_udata *udata)
+ struct qedr_qp *qp, struct ib_udata *udata,
+ struct qedr_create_qp_uresp *uresp)
{
- struct qedr_create_qp_uresp uresp;
int rc;
- memset(&uresp, 0, sizeof(uresp));
- qedr_copy_sq_uresp(dev, &uresp, qp);
- qedr_copy_rq_uresp(dev, &uresp, qp);
+ memset(uresp, 0, sizeof(*uresp));
+ qedr_copy_sq_uresp(dev, uresp, qp);
+ qedr_copy_rq_uresp(dev, uresp, qp);
- uresp.atomic_supported = dev->atomic_cap != IB_ATOMIC_NONE;
- uresp.qp_id = qp->qp_id;
+ uresp->atomic_supported = dev->atomic_cap != IB_ATOMIC_NONE;
+ uresp->qp_id = qp->qp_id;
- rc = qedr_ib_copy_to_udata(udata, &uresp, sizeof(uresp));
+ rc = qedr_ib_copy_to_udata(udata, uresp, sizeof(*uresp));
if (rc)
DP_ERR(dev,
"create qp: failed a copy to user space with qp icid=0x%x.\n",
@@ -1171,16 +1301,35 @@ static void qedr_set_common_qp_params(struct qedr_dev *dev,
qp->sq.max_sges, qp->sq_cq->icid);
}
-static void qedr_set_roce_db_info(struct qedr_dev *dev, struct qedr_qp *qp)
+static int qedr_set_roce_db_info(struct qedr_dev *dev, struct qedr_qp *qp)
{
+ int rc;
+
qp->sq.db = dev->db_addr +
DB_ADDR_SHIFT(DQ_PWM_OFFSET_XCM_RDMA_SQ_PROD);
qp->sq.db_data.data.icid = qp->icid + 1;
+ rc = qedr_db_recovery_add(dev, qp->sq.db,
+ &qp->sq.db_data,
+ DB_REC_WIDTH_32B,
+ DB_REC_KERNEL);
+ if (rc)
+ return rc;
+
if (!qp->srq) {
qp->rq.db = dev->db_addr +
DB_ADDR_SHIFT(DQ_PWM_OFFSET_TCM_ROCE_RQ_PROD);
qp->rq.db_data.data.icid = qp->icid;
+
+ rc = qedr_db_recovery_add(dev, qp->rq.db,
+ &qp->rq.db_data,
+ DB_REC_WIDTH_32B,
+ DB_REC_KERNEL);
+ if (rc)
+ qedr_db_recovery_del(dev, qp->sq.db,
+ &qp->sq.db_data);
}
+
+ return rc;
}
static int qedr_check_srq_params(struct qedr_dev *dev,
@@ -1234,7 +1383,7 @@ static int qedr_init_srq_user_params(struct ib_udata *udata,
int rc;
rc = qedr_init_user_queue(udata, srq->dev, &srq->usrq, ureq->srq_addr,
- ureq->srq_len, access, dmasync, 1);
+ ureq->srq_len, false, access, dmasync, 1);
if (rc)
return rc;
@@ -1330,7 +1479,8 @@ int qedr_create_srq(struct ib_srq *ibsrq, struct ib_srq_init_attr *init_attr,
hw_srq->max_sges = init_attr->attr.max_sge;
if (udata) {
- if (ib_copy_from_udata(&ureq, udata, sizeof(ureq))) {
+ if (ib_copy_from_udata(&ureq, udata, min(sizeof(ureq),
+ udata->inlen))) {
DP_ERR(dev,
"create srq: problem copying data from user space\n");
goto err0;
@@ -1526,6 +1676,14 @@ static void qedr_cleanup_user(struct qedr_dev *dev, struct qedr_qp *qp)
ib_umem_release(qp->urq.umem);
qp->urq.umem = NULL;
+
+ if (qp->usq.db_rec_data)
+ qedr_db_recovery_del(dev, qp->usq.db_addr,
+ &qp->usq.db_rec_data->db_data);
+
+ if (qp->urq.db_rec_data)
+ qedr_db_recovery_del(dev, qp->urq.db_addr,
+ &qp->urq.db_rec_data->db_data);
}
static int qedr_create_user_qp(struct qedr_dev *dev,
@@ -1537,12 +1695,14 @@ static int qedr_create_user_qp(struct qedr_dev *dev,
struct qed_rdma_create_qp_in_params in_params;
struct qed_rdma_create_qp_out_params out_params;
struct qedr_pd *pd = get_qedr_pd(ibpd);
+ struct qedr_create_qp_uresp uresp;
+ struct qedr_ucontext *ctx = NULL;
struct qedr_create_qp_ureq ureq;
int alloc_and_init = rdma_protocol_roce(&dev->ibdev, 1);
int rc = -EINVAL;
memset(&ureq, 0, sizeof(ureq));
- rc = ib_copy_from_udata(&ureq, udata, sizeof(ureq));
+ rc = ib_copy_from_udata(&ureq, udata, min(sizeof(ureq), udata->inlen));
if (rc) {
DP_ERR(dev, "Problem copying data from user space\n");
return rc;
@@ -1550,14 +1710,16 @@ static int qedr_create_user_qp(struct qedr_dev *dev,
/* SQ - read access only (0), dma sync not required (0) */
rc = qedr_init_user_queue(udata, dev, &qp->usq, ureq.sq_addr,
- ureq.sq_len, 0, 0, alloc_and_init);
+ ureq.sq_len, true, 0, 0,
+ alloc_and_init);
if (rc)
return rc;
if (!qp->srq) {
/* RQ - read access only (0), dma sync not required (0) */
rc = qedr_init_user_queue(udata, dev, &qp->urq, ureq.rq_addr,
- ureq.rq_len, 0, 0, alloc_and_init);
+ ureq.rq_len, true,
+ 0, 0, alloc_and_init);
if (rc)
return rc;
}
@@ -1587,13 +1749,31 @@ static int qedr_create_user_qp(struct qedr_dev *dev,
qp->qp_id = out_params.qp_id;
qp->icid = out_params.icid;
- rc = qedr_copy_qp_uresp(dev, qp, udata);
+ rc = qedr_copy_qp_uresp(dev, qp, udata, &uresp);
if (rc)
goto err;
+ /* db offset was calculated in copy_qp_uresp, now set in the user q */
+ ctx = pd->uctx;
+ qp->usq.db_addr = ctx->dpi_addr + uresp.sq_db_offset;
+ qp->urq.db_addr = ctx->dpi_addr + uresp.rq_db_offset;
+
+ rc = qedr_db_recovery_add(dev, qp->usq.db_addr,
+ &qp->usq.db_rec_data->db_data,
+ DB_REC_WIDTH_32B,
+ DB_REC_USER);
+ if (rc)
+ goto err;
+
+ rc = qedr_db_recovery_add(dev, qp->urq.db_addr,
+ &qp->urq.db_rec_data->db_data,
+ DB_REC_WIDTH_32B,
+ DB_REC_USER);
+ if (rc)
+ goto err;
qedr_qp_user_print(dev, qp);
- return 0;
+ return rc;
err:
rc = dev->ops->rdma_destroy_qp(dev->rdma_ctx, qp->qed_qp);
if (rc)
@@ -1604,12 +1784,21 @@ static int qedr_create_user_qp(struct qedr_dev *dev,
return rc;
}
-static void qedr_set_iwarp_db_info(struct qedr_dev *dev, struct qedr_qp *qp)
+static int qedr_set_iwarp_db_info(struct qedr_dev *dev, struct qedr_qp *qp)
{
+ int rc;
+
qp->sq.db = dev->db_addr +
DB_ADDR_SHIFT(DQ_PWM_OFFSET_XCM_RDMA_SQ_PROD);
qp->sq.db_data.data.icid = qp->icid;
+ rc = qedr_db_recovery_add(dev, qp->sq.db,
+ &qp->sq.db_data,
+ DB_REC_WIDTH_32B,
+ DB_REC_KERNEL);
+ if (rc)
+ return rc;
+
qp->rq.db = dev->db_addr +
DB_ADDR_SHIFT(DQ_PWM_OFFSET_TCM_IWARP_RQ_PROD);
qp->rq.db_data.data.icid = qp->icid;
@@ -1617,6 +1806,13 @@ static void qedr_set_iwarp_db_info(struct qedr_dev *dev, struct qedr_qp *qp)
DB_ADDR_SHIFT(DQ_PWM_OFFSET_TCM_FLAGS);
qp->rq.iwarp_db2_data.data.icid = qp->icid;
qp->rq.iwarp_db2_data.data.value = DQ_TCM_IWARP_POST_RQ_CF_CMD;
+
+ rc = qedr_db_recovery_add(dev, qp->rq.db,
+ &qp->rq.db_data,
+ DB_REC_WIDTH_32B,
+ DB_REC_KERNEL);
+
+ return rc;
}
static int
@@ -1664,8 +1860,7 @@ qedr_roce_create_kernel_qp(struct qedr_dev *dev,
qp->qp_id = out_params.qp_id;
qp->icid = out_params.icid;
- qedr_set_roce_db_info(dev, qp);
- return rc;
+ return qedr_set_roce_db_info(dev, qp);
}
static int
@@ -1723,8 +1918,7 @@ qedr_iwarp_create_kernel_qp(struct qedr_dev *dev,
qp->qp_id = out_params.qp_id;
qp->icid = out_params.icid;
- qedr_set_iwarp_db_info(dev, qp);
- return rc;
+ return qedr_set_iwarp_db_info(dev, qp);
err:
dev->ops->rdma_destroy_qp(dev->rdma_ctx, qp->qed_qp);
@@ -1739,6 +1933,15 @@ static void qedr_cleanup_kernel(struct qedr_dev *dev, struct qedr_qp *qp)
dev->ops->common->chain_free(dev->cdev, &qp->rq.pbl);
kfree(qp->rqe_wr_id);
+
+ /* GSI qp is not registered to db mechanism so no need to delete */
+ if (qp->qp_type == IB_QPT_GSI)
+ return;
+
+ qedr_db_recovery_del(dev, qp->sq.db, &qp->sq.db_data);
+
+ if (!qp->srq)
+ qedr_db_recovery_del(dev, qp->rq.db, &qp->rq.db_data);
}
static int qedr_create_kernel_qp(struct qedr_dev *dev,
diff --git a/drivers/infiniband/hw/qedr/verbs.h b/drivers/infiniband/hw/qedr/verbs.h
index 724d0983e972..830c86561e23 100644
--- a/drivers/infiniband/hw/qedr/verbs.h
+++ b/drivers/infiniband/hw/qedr/verbs.h
@@ -47,6 +47,8 @@ int qedr_alloc_ucontext(struct ib_ucontext *uctx, struct ib_udata *udata);
void qedr_dealloc_ucontext(struct ib_ucontext *uctx);
int qedr_mmap(struct ib_ucontext *ucontext, struct vm_area_struct *vma);
+void qedr_mmap_free(struct rdma_user_mmap_entry *entry);
+
int qedr_alloc_pd(struct ib_pd *pd, struct ib_udata *udata);
void qedr_dealloc_pd(struct ib_pd *pd, struct ib_udata *udata);
diff --git a/include/uapi/rdma/qedr-abi.h b/include/uapi/rdma/qedr-abi.h
index 7a10b3a325fa..c022ee26089b 100644
--- a/include/uapi/rdma/qedr-abi.h
+++ b/include/uapi/rdma/qedr-abi.h
@@ -38,6 +38,15 @@
#define QEDR_ABI_VERSION (8)
/* user kernel communication data structures. */
+enum qedr_alloc_ucontext_flags {
+ QEDR_ALLOC_UCTX_RESERVED = 1 << 0,
+ QEDR_ALLOC_UCTX_DB_REC = 1 << 1
+};
+
+struct qedr_alloc_ucontext_req {
+ __u32 context_flags;
+ __u32 reserved;
+};
struct qedr_alloc_ucontext_resp {
__aligned_u64 db_pa;
@@ -74,6 +83,7 @@ struct qedr_create_cq_uresp {
__u32 db_offset;
__u16 icid;
__u16 reserved;
+ __aligned_u64 db_rec_addr;
};
struct qedr_create_qp_ureq {
@@ -109,6 +119,13 @@ struct qedr_create_qp_uresp {
__u32 rq_db2_offset;
__u32 reserved;
+
+ /* address of SQ doorbell recovery user entry */
+ __aligned_u64 sq_db_rec_addr;
+
+ /* address of RQ doorbell recovery user entry */
+ __aligned_u64 rq_db_rec_addr;
+
};
struct qedr_create_srq_ureq {
@@ -128,4 +145,12 @@ struct qedr_create_srq_uresp {
__u32 reserved1;
};
+/* doorbell recovery entry allocated and populated by userspace doorbelling
+ * entities and mapped to kernel. Kernel uses this to register doorbell
+ * information with doorbell drop recovery mechanism.
+ */
+struct qedr_user_db_rec {
+ __aligned_u64 db_data; /* doorbell data */
+};
+
#endif /* __QEDR_USER_H__ */
--
2.14.5
^ permalink raw reply related
* Re: [PATCH v4 25/25] MAINTAINERS: Add maintainer for IBNBD/IBTRS modules
From: Leon Romanovsky @ 2019-07-09 15:10 UTC (permalink / raw)
To: Jack Wang
Cc: linux-block, linux-rdma, axboe, hch, sagi, bvanassche, jgg,
dledford, danil.kipnis, rpenyaev, Roman Pen, Jack Wang
In-Reply-To: <20190620150337.7847-26-jinpuwang@gmail.com>
On Thu, Jun 20, 2019 at 05:03:37PM +0200, Jack Wang wrote:
> From: Roman Pen <roman.penyaev@profitbricks.com>
>
> Signed-off-by: Danil Kipnis <danil.kipnis@cloud.ionos.com>
> Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
> ---
> MAINTAINERS | 14 ++++++++++++++
> 1 file changed, 14 insertions(+)
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index a6954776a37e..0b7fd93f738d 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -7590,6 +7590,20 @@ IBM ServeRAID RAID DRIVER
> S: Orphan
> F: drivers/scsi/ips.*
>
> +IBNBD BLOCK DRIVERS
> +M: IBNBD/IBTRS Storage Team <ibnbd@cloud.ionos.com>
> +L: linux-block@vger.kernel.org
> +S: Maintained
> +T: git git://github.com/profitbricks/ibnbd.git
> +F: drivers/block/ibnbd/
> +
> +IBTRS TRANSPORT DRIVERS
> +M: IBNBD/IBTRS Storage Team <ibnbd@cloud.ionos.com>
I don't know if it rule or not, but can you please add real
person/persons to Maintainers list? Many times, those global
support lists are simply ignored.
> +L: linux-rdma@vger.kernel.org
> +S: Maintained
> +T: git git://github.com/profitbricks/ibnbd.git
How did you imagine patch flow for ULP, while your tree is
external to RDMA tree?
> +F: drivers/infiniband/ulp/ibtrs/
> +
> ICH LPC AND GPIO DRIVER
> M: Peter Tyser <ptyser@xes-inc.com>
> S: Maintained
> --
> 2.17.1
>
^ permalink raw reply
* Re: [PATCH v4 25/25] MAINTAINERS: Add maintainer for IBNBD/IBTRS modules
From: Jinpu Wang @ 2019-07-09 15:18 UTC (permalink / raw)
To: Leon Romanovsky
Cc: Jack Wang, linux-block, linux-rdma, Jens Axboe, Christoph Hellwig,
Sagi Grimberg, Bart Van Assche, Jason Gunthorpe, Doug Ledford,
Danil Kipnis, rpenyaev, Roman Pen
In-Reply-To: <20190709151013.GW7034@mtr-leonro.mtl.com>
On Tue, Jul 9, 2019 at 5:10 PM Leon Romanovsky <leon@kernel.org> wrote:
>
> On Thu, Jun 20, 2019 at 05:03:37PM +0200, Jack Wang wrote:
> > From: Roman Pen <roman.penyaev@profitbricks.com>
> >
> > Signed-off-by: Danil Kipnis <danil.kipnis@cloud.ionos.com>
> > Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
> > ---
> > MAINTAINERS | 14 ++++++++++++++
> > 1 file changed, 14 insertions(+)
> >
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index a6954776a37e..0b7fd93f738d 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -7590,6 +7590,20 @@ IBM ServeRAID RAID DRIVER
> > S: Orphan
> > F: drivers/scsi/ips.*
> >
> > +IBNBD BLOCK DRIVERS
> > +M: IBNBD/IBTRS Storage Team <ibnbd@cloud.ionos.com>
> > +L: linux-block@vger.kernel.org
> > +S: Maintained
> > +T: git git://github.com/profitbricks/ibnbd.git
> > +F: drivers/block/ibnbd/
> > +
> > +IBTRS TRANSPORT DRIVERS
> > +M: IBNBD/IBTRS Storage Team <ibnbd@cloud.ionos.com>
>
> I don't know if it rule or not, but can you please add real
> person/persons to Maintainers list? Many times, those global
> support lists are simply ignored.
Sure, we can use my and Danil 's name in next round.
>
> > +L: linux-rdma@vger.kernel.org
> > +S: Maintained
> > +T: git git://github.com/profitbricks/ibnbd.git
>
> How did you imagine patch flow for ULP, while your tree is
> external to RDMA tree?
Plan was we gather the patch in the git tree, and
send patches to the list via git send email, do we accept pull request
from github?
What the preferred way?
Thanks Leon.
Jack
>
> > +F: drivers/infiniband/ulp/ibtrs/
> > +
> > ICH LPC AND GPIO DRIVER
> > M: Peter Tyser <ptyser@xes-inc.com>
> > S: Maintained
> > --
> > 2.17.1
> >
^ permalink raw reply
* Re: [PATCH v4 00/25] InfiniBand Transport (IBTRS) and Network Block Device (IBNBD)
From: Bart Van Assche @ 2019-07-09 15:39 UTC (permalink / raw)
To: Greg KH, Leon Romanovsky
Cc: Danil Kipnis, Jack Wang, linux-block, linux-rdma, axboe,
Christoph Hellwig, Sagi Grimberg, jgg, dledford, Roman Pen
In-Reply-To: <20190709111737.GB6719@kroah.com>
On 7/9/19 4:17 AM, Greg KH wrote:
> So if these developers are willing to do the work to get something out
> of staging, and into the "real" part of the kernel, I will gladly take
> it.
Linus once famously said "given enough eyeballs, all bugs are shallow".
There are already two block-over-RDMA driver pairs upstream (NVMeOF and
SRP). Accepting the IBTRS and IBNBD drivers upstream would reduce the
number of users of the upstream block-over-RDMA drivers and hence would
fragment the block-over-RDMA driver user base further. Additionally, I'm
not yet convinced that the interesting parts of IBNBD cannot be
integrated into the existing upstream drivers. So it's not clear to me
whether taking the IBTRS and IBNBD drivers upstream would help the Linux
user community.
Bart.
^ permalink raw reply
* Re: [PATCH v4 25/25] MAINTAINERS: Add maintainer for IBNBD/IBTRS modules
From: Leon Romanovsky @ 2019-07-09 15:51 UTC (permalink / raw)
To: Jinpu Wang
Cc: Jack Wang, linux-block, linux-rdma, Jens Axboe, Christoph Hellwig,
Sagi Grimberg, Bart Van Assche, Jason Gunthorpe, Doug Ledford,
Danil Kipnis, rpenyaev, Roman Pen
In-Reply-To: <CAMGffEmeH7-oEENYLQ3tEnkKbO4pcb7omPeavNscVJteEnupyw@mail.gmail.com>
On Tue, Jul 09, 2019 at 05:18:37PM +0200, Jinpu Wang wrote:
> On Tue, Jul 9, 2019 at 5:10 PM Leon Romanovsky <leon@kernel.org> wrote:
> >
> > On Thu, Jun 20, 2019 at 05:03:37PM +0200, Jack Wang wrote:
> > > From: Roman Pen <roman.penyaev@profitbricks.com>
> > >
> > > Signed-off-by: Danil Kipnis <danil.kipnis@cloud.ionos.com>
> > > Signed-off-by: Jack Wang <jinpu.wang@cloud.ionos.com>
> > > ---
> > > MAINTAINERS | 14 ++++++++++++++
> > > 1 file changed, 14 insertions(+)
> > >
> > > diff --git a/MAINTAINERS b/MAINTAINERS
> > > index a6954776a37e..0b7fd93f738d 100644
> > > --- a/MAINTAINERS
> > > +++ b/MAINTAINERS
> > > @@ -7590,6 +7590,20 @@ IBM ServeRAID RAID DRIVER
> > > S: Orphan
> > > F: drivers/scsi/ips.*
> > >
> > > +IBNBD BLOCK DRIVERS
> > > +M: IBNBD/IBTRS Storage Team <ibnbd@cloud.ionos.com>
> > > +L: linux-block@vger.kernel.org
> > > +S: Maintained
> > > +T: git git://github.com/profitbricks/ibnbd.git
> > > +F: drivers/block/ibnbd/
> > > +
> > > +IBTRS TRANSPORT DRIVERS
> > > +M: IBNBD/IBTRS Storage Team <ibnbd@cloud.ionos.com>
> >
> > I don't know if it rule or not, but can you please add real
> > person/persons to Maintainers list? Many times, those global
> > support lists are simply ignored.
>
> Sure, we can use my and Danil 's name in next round.
>
> >
> > > +L: linux-rdma@vger.kernel.org
> > > +S: Maintained
> > > +T: git git://github.com/profitbricks/ibnbd.git
> >
> > How did you imagine patch flow for ULP, while your tree is
> > external to RDMA tree?
>
> Plan was we gather the patch in the git tree, and
> send patches to the list via git send email, do we accept pull request
> from github?
> What the preferred way?
The preferred way is to start with sending patches directly
to the mailing and allow RDMA maintainers to collect and
apply them by themselves. It gives an easy way to other people
to do cross-subsystem changes and we are doing a lot of them.
Till you will be asked to send PRs the "T:" link should point to RDMA subsystem.
Thanks
>
> Thanks Leon.
> Jack
> >
> > > +F: drivers/infiniband/ulp/ibtrs/
> > > +
> > > ICH LPC AND GPIO DRIVER
> > > M: Peter Tyser <ptyser@xes-inc.com>
> > > S: Maintained
> > > --
> > > 2.17.1
> > >
^ permalink raw reply
* [GIT PULL] Please pull NFSoRDMA updates for Linux 5.3 (v2)
From: Schumaker, Anna @ 2019-07-09 16:03 UTC (permalink / raw)
To: trondmy@hammerspace.com
Cc: linux-nfs@vger.kernel.org, linux-rdma@vger.kernel.org
In-Reply-To: <b2cabbe76eecc8db717cccd84067d78f8c3a7d0f.camel@netapp.com>
Hi Trond,
The following changes since commit
9e0babf2c06c73cda2c0cd37a1653d823adb40ec:
Linux 5.2-rc5 (2019-06-16 08:49:45 -1000)
are available in the Git repository at:
git://git.linux-nfs.org/projects/anna/linux-nfs.git tags/nfs-rdma-
for-5.3-1
for you to fetch changes up to
62a92ba97a31c544802bbf13d3a998e86796d548:
NFS: Record task, client ID, and XID in xdr_status trace points
(2019-07-09 10:30:25 -0400)
----------------------------------------------------------------
v2: Add missing signed-off-by lines
Thanks,
Anna
----------------------------------------------------------------
Chuck Lever (19):
xprtrdma: Fix a BUG when tracing is enabled with NFSv4.1 on RDMA
xprtrdma: Fix use-after-free in rpcrdma_post_recvs
xprtrdma: Replace use of xdr_stream_pos in rpcrdma_marshal_req
xprtrdma: Fix occasional transport deadlock
xprtrdma: Remove the RPCRDMA_REQ_F_PENDING flag
xprtrdma: Remove fr_state
xprtrdma: Add mechanism to place MRs back on the free list
xprtrdma: Reduce context switching due to Local Invalidation
xprtrdma: Wake RPCs directly in rpcrdma_wc_send path
xprtrdma: Simplify rpcrdma_rep_create
xprtrdma: Streamline rpcrdma_post_recvs
xprtrdma: Refactor chunk encoding
xprtrdma: Remove rpcrdma_req::rl_buffer
xprtrdma: Modernize ops->connect
NFS4: Add a trace event to record invalid CB sequence IDs
NFS: Fix show_nfs_errors macros again
NFS: Display symbolic status code names in trace log
NFS: Update symbolic flags displayed by trace events
NFS: Record task, client ID, and XID in xdr_status trace points
fs/nfs/callback_proc.c | 28 ++++++++---
fs/nfs/nfs2xdr.c | 2 +-
fs/nfs/nfs3xdr.c | 2 +-
fs/nfs/nfs4trace.h | 207
++++++++++++++++++++++++++++++++++++++++++++++-------------------------
----
fs/nfs/nfs4xdr.c | 2 +-
fs/nfs/nfstrace.h | 233
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++---------
-------------
include/linux/sunrpc/xprt.h | 3 ++
include/trace/events/rpcrdma.h | 90 +++++++++++++++++++++++++-------
-
net/sunrpc/sched.c | 1 +
net/sunrpc/xprt.c | 32 ++++++++++++
net/sunrpc/xprtrdma/frwr_ops.c | 327
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+++++++----------------------------------------
net/sunrpc/xprtrdma/rpc_rdma.c | 148 +++++++++++++++++++++++---------
---------------------
net/sunrpc/xprtrdma/transport.c | 83 +++++++++++++++++++++++-------
net/sunrpc/xprtrdma/verbs.c | 115 +++++++++++++++++++-------------
----------
net/sunrpc/xprtrdma/xprt_rdma.h | 44 +++++-----------
net/sunrpc/xprtsock.c | 23 +--------
16 files changed, 837 insertions(+), 503 deletions(-)
On Tue, 2019-07-02 at 16:35 -0400, Anna Schumaker wrote:
> Hi Trond,
>
> The following changes since commit
> 9e0babf2c06c73cda2c0cd37a1653d823adb40ec:
>
> Linux 5.2-rc5 (2019-06-16 08:49:45 -1000)
>
> are available in the Git repository at:
>
> git://git.linux-nfs.org/projects/anna/linux-nfs.git tags/nfs-rdma-
> for-5.3-1
>
> for you to fetch changes up to
> 1a8f1ed3eb1ac2fddc1d2c75294db08ace88c1cb:
>
> NFS: Record task, client ID, and XID in xdr_status trace points
> (2019-07-02 16:29:22 -0400)
>
> Thanks,
> Anna
>
> ----------------------------------------------------------------
> Chuck Lever (19):
> xprtrdma: Fix a BUG when tracing is enabled with NFSv4.1 on
> RDMA
> xprtrdma: Fix use-after-free in rpcrdma_post_recvs
> xprtrdma: Replace use of xdr_stream_pos in rpcrdma_marshal_req
> xprtrdma: Fix occasional transport deadlock
> xprtrdma: Remove the RPCRDMA_REQ_F_PENDING flag
> xprtrdma: Remove fr_state
> xprtrdma: Add mechanism to place MRs back on the free list
> xprtrdma: Reduce context switching due to Local Invalidation
> xprtrdma: Wake RPCs directly in rpcrdma_wc_send path
> xprtrdma: Simplify rpcrdma_rep_create
> xprtrdma: Streamline rpcrdma_post_recvs
> xprtrdma: Refactor chunk encoding
> xprtrdma: Remove rpcrdma_req::rl_buffer
> xprtrdma: Modernize ops->connect
> NFS4: Add a trace event to record invalid CB sequence IDs
> NFS: Fix show_nfs_errors macros again
> NFS: Display symbolic status code names in trace log
> NFS: Update symbolic flags displayed by trace events
> NFS: Record task, client ID, and XID in xdr_status trace points
>
> fs/nfs/callback_proc.c | 28 ++++++++---
> fs/nfs/nfs2xdr.c | 2 +-
> fs/nfs/nfs3xdr.c | 2 +-
> fs/nfs/nfs4trace.h | 207
> ++++++++++++++++++++++++++++++++++++++++++++++-----------------------
> --
> ----
> fs/nfs/nfs4xdr.c | 2 +-
> fs/nfs/nfstrace.h | 233
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-------
> --
> -------------
> include/linux/sunrpc/xprt.h | 3 ++
> include/trace/events/rpcrdma.h | 90 +++++++++++++++++++++++++-----
> --
> -
> net/sunrpc/sched.c | 1 +
> net/sunrpc/xprt.c | 32 ++++++++++++
> net/sunrpc/xprtrdma/frwr_ops.c | 327
> +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> ++
> +++++++----------------------------------------
> net/sunrpc/xprtrdma/rpc_rdma.c | 148 +++++++++++++++++++++++-------
> --
> ---------------------
> net/sunrpc/xprtrdma/transport.c | 83 +++++++++++++++++++++++-------
> net/sunrpc/xprtrdma/verbs.c | 115 +++++++++++++++++++-----------
> --
> ----------
> net/sunrpc/xprtrdma/xprt_rdma.h | 44 +++++-----------
> net/sunrpc/xprtsock.c | 23 +--------
> 16 files changed, 837 insertions(+), 503 deletions(-)
^ permalink raw reply
* [GIT PULL] Please pull hmm changes
From: Jason Gunthorpe @ 2019-07-09 19:24 UTC (permalink / raw)
To: Linus Torvalds, Andrew Morton
Cc: Dan Williams, Christoph Hellwig, dri-devel@lists.freedesktop.org,
linux-mm@kvack.org, David Airlie, Daniel Vetter,
amd-gfx@lists.freedesktop.org, Kuehling, Felix,
Deucher, Alexander, linux-rdma@vger.kernel.org,
linux-kernel@vger.kernel.org
[-- Attachment #1: Type: text/plain, Size: 9266 bytes --]
Hi Linus,
As was discussed some time ago here are the mostly -mm patches related
to hmm functions. In agreement with Andrew we split this out from
quilt into a git topic branch so it can be shared between the DRM and
RDMA git trees. However, this cycle did not see dependencies with work
in DRM or RDMA that required a topic merge. I expect that work will
start to get ready next cycle and we will see a need for a cross-tree
topic merge then.
I'm sending it early as it is now a dependency for several patches in
mm's quilt.
This has been an exciting topic branch for conflicts, you'll need the
below simple resolution in the merge commit to make it compile
(lockdep_assert_held_exclusive() was renamed to
lockdep_assert_held_write())
Otherwise, for reference to all parties, here is how the conflicts were
handled:
- Several small patches from -mm quilt were moved to this tree to simplify
conflict management, only Ira's 'fix release_pages()' patch was not hmm
related.
- DRM introduced a new users of the hmm_range_register() API. We worked
with AMDGPU to ensure that their new user could use the revised API via
the below trivial merge fixup with DRM:
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -783,7 +783,7 @@ int amdgpu_ttm_tt_get_user_pages(struct ttm_tt *ttm, struct page **pages)
0 : range->flags[HMM_PFN_WRITE];
range->pfn_flags_mask = 0;
range->pfns = pfns;
- hmm_range_register(range, mm, start,
+ hmm_range_register(range, mirror, start,
start + ttm->num_pages * PAGE_SIZE, PAGE_SHIFT);
retry:
- ARM64 has a patch series going through -mm with a trivial
conflict ("Devmap cleanups + arm64 support"), Andrew has re-applied this
in quilt onto linux-next and will send it
- The memreap sub-section changes in -mm has 5 hunk conflict with the
memremap changes here. Andrew reapplied Dan's series ontop of
Christoph's series in linux-next and will send it.
The tag for-linus-hmm-merged with my merge resolution to your tree is
also available to pull.
Thanks,
Jason
diff --cc mm/hmm.c
index d48b9283725a90,f702a3895d05d8..e1eedef129cf5c
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@@ -42,16 -54,11 +42,16 @@@ static const struct mmu_notifier_ops hm
*/
static struct hmm *hmm_get_or_create(struct mm_struct *mm)
{
- struct hmm *hmm = mm_get_hmm(mm);
- bool cleanup = false;
+ struct hmm *hmm;
- lockdep_assert_held_exclusive(&mm->mmap_sem);
- if (hmm)
- return hmm;
++ lockdep_assert_held_write(&mm->mmap_sem);
+
+ /* Abuse the page_table_lock to also protect mm->hmm. */
+ spin_lock(&mm->page_table_lock);
+ hmm = mm->hmm;
+ if (mm->hmm && kref_get_unless_zero(&mm->hmm->kref))
+ goto out_unlock;
+ spin_unlock(&mm->page_table_lock);
hmm = kmalloc(sizeof(*hmm), GFP_KERNEL);
if (!hmm)
@@@ -245,8 -277,8 +245,8 @@@ static const struct mmu_notifier_ops hm
*/
int hmm_mirror_register(struct hmm_mirror *mirror, struct mm_struct *mm)
{
- lockdep_assert_held_exclusive(&mm->mmap_sem);
++ lockdep_assert_held_write(&mm->mmap_sem);
+
/* Sanity check */
if (!mm || !mirror || !mirror->ops)
return -EINVAL;
The following changes since commit 6fbc7275c7a9ba97877050335f290341a1fd8dbf:
Linux 5.2-rc7 (2019-06-30 11:25:36 +0800)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git tags/for-linus-hmm
for you to fetch changes up to cc5dfd59e375f4d0f2b64643723d16b38b2f2d78:
Merge branch 'hmm-devmem-cleanup.4' into rdma.git hmm (2019-07-02 15:10:45 -0300)
----------------------------------------------------------------
HMM patches for 5.3
Improvements and bug fixes for the hmm interface in the kernel:
- Improve clarity, locking and APIs related to the 'hmm mirror' feature
merged last cycle. In linux-next we now see AMDGPU and nouveau to be
using this API.
- Remove old or transitional hmm APIs. These are hold overs from the past
with no users, or APIs that existed only to manage cross tree conflicts.
There are still a few more of these cleanups that didn't make the merge
window cut off.
- Improve some core mm APIs:
* export alloc_pages_vma() for driver use
* refactor into devm_request_free_mem_region() to manage
DEVICE_PRIVATE resource reservations
* refactor duplicative driver code into the core dev_pagemap
struct
- Remove hmm wrappers of improved core mm APIs, instead have drivers use
the simplified API directly
- Remove DEVICE_PUBLIC
- Simplify the kconfig flow for the hmm users and core code
----------------------------------------------------------------
Christoph Hellwig (24):
mm: remove the unused ARCH_HAS_HMM_DEVICE Kconfig option
mm: remove the struct hmm_device infrastructure
mm: remove MEMORY_DEVICE_PUBLIC support
mm: don't clear ->mapping in hmm_devmem_free
mm: export alloc_pages_vma
mm: factor out a devm_request_free_mem_region helper
memremap: validate the pagemap type passed to devm_memremap_pages
memremap: move dev_pagemap callbacks into a separate structure
memremap: pass a struct dev_pagemap to ->kill and ->cleanup
memremap: lift the devmap_enable manipulation into devm_memremap_pages
memremap: add a migrate_to_ram method to struct dev_pagemap_ops
memremap: remove the data field in struct dev_pagemap
memremap: replace the altmap_valid field with a PGMAP_ALTMAP_VALID flag
memremap: provide an optional internal refcount in struct dev_pagemap
device-dax: use the dev_pagemap internal refcount
PCI/P2PDMA: use the dev_pagemap internal refcount
nouveau: use alloc_page_vma directly
nouveau: use devm_memremap_pages directly
mm: remove hmm_vma_alloc_locked_page
mm: remove hmm_devmem_add
mm: simplify ZONE_DEVICE page private data
mm: sort out the DEVICE_PRIVATE Kconfig mess
mm: remove the HMM config option
mm: don't select MIGRATE_VMA_HELPER from HMM_MIRROR
Ira Weiny (1):
mm/swap: fix release_pages() when releasing devmap pages
Jason Gunthorpe (15):
mm/hmm.c: suppress compilation warnings when CONFIG_HUGETLB_PAGE is not set
mm/hmm: fix use after free with struct hmm in the mmu notifiers
mm/hmm: Use hmm_mirror not mm as an argument for hmm_range_register
mm/hmm: Hold a mmgrab from hmm to mm
mm/hmm: Simplify hmm_get_or_create and make it reliable
mm/hmm: Remove duplicate condition test before wait_event_timeout
mm/hmm: Do not use list*_rcu() for hmm->ranges
mm/hmm: Hold on to the mmget for the lifetime of the range
mm/hmm: Use lockdep instead of comments
mm/hmm: Remove racy protection against double-unregistration
mm/hmm: Poison hmm_range during unregister
mm/hmm: Remove confusing comment and logic from hmm_release
mm/hmm: Fix error flows in hmm_invalidate_range_start
Merge tag 'v5.2-rc7' into rdma.git hmm
Merge branch 'hmm-devmem-cleanup.4' into rdma.git hmm
Kuehling, Felix (1):
mm/hmm: Only set FAULT_FLAG_ALLOW_RETRY for non-blocking
Philip Yang (1):
mm/hmm: support automatic NUMA balancing
Ralph Campbell (2):
mm/hmm: update HMM documentation
mm/hmm: clean up some coding style and comments
Documentation/vm/hmm.rst | 166 ++++------
arch/powerpc/mm/mem.c | 10 +-
arch/x86/mm/init_64.c | 8 +-
drivers/dax/dax-private.h | 4 -
drivers/dax/device.c | 41 +--
drivers/dax/pmem/core.c | 2 +-
drivers/gpu/drm/nouveau/Kconfig | 6 +-
drivers/gpu/drm/nouveau/nouveau_dmem.c | 103 +++---
drivers/gpu/drm/nouveau/nouveau_svm.c | 2 +-
drivers/nvdimm/pfn_devs.c | 3 +-
drivers/nvdimm/pmem.c | 51 ++-
drivers/pci/p2pdma.c | 52 +--
fs/proc/task_mmu.c | 2 +-
include/linux/hmm.h | 302 ++---------------
include/linux/ioport.h | 3 +-
include/linux/memremap.h | 75 +++--
include/linux/mm.h | 28 +-
include/linux/mm_types.h | 4 +-
include/linux/swapops.h | 15 -
kernel/fork.c | 1 -
kernel/memremap.c | 194 ++++++-----
kernel/resource.c | 39 +++
mm/Kconfig | 50 +--
mm/Makefile | 2 +-
mm/gup.c | 7 -
mm/hmm.c | 587 ++++++++-------------------------
mm/madvise.c | 2 +-
mm/memcontrol.c | 13 +-
mm/memory-failure.c | 6 +-
mm/memory.c | 49 +--
mm/memory_hotplug.c | 6 +-
mm/mempolicy.c | 1 +
mm/migrate.c | 28 +-
mm/page_alloc.c | 13 +-
mm/swap.c | 13 +-
tools/testing/nvdimm/test/iomap.c | 57 +++-
36 files changed, 619 insertions(+), 1326 deletions(-)
(diffstat from tag for-linus-hmm-merged)
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply
* Re: [PATCH v4] RDMA/core: Fix race when resolving IP address
From: Jason Gunthorpe @ 2019-07-09 19:28 UTC (permalink / raw)
To: Dag Moxnes; +Cc: dledford, leon, parav, linux-rdma, linux-kernel
In-Reply-To: <1562673026-31996-1-git-send-email-dag.moxnes@oracle.com>
On Tue, Jul 09, 2019 at 01:50:26PM +0200, Dag Moxnes wrote:
> Use the neighbour lock when copying the MAC address from the neighbour
> data struct in dst_fetch_ha.
>
> When not using the lock, it is possible for the function to race with
> neigh_update(), causing it to copy an torn MAC address:
>
> rdma_resolve_addr()
> rdma_resolve_ip()
> addr_resolve()
> addr_resolve_neigh()
> fetch_ha()
> dst_fetch_ha()
> memcpy(dev_addr->dst_dev_addr, n->ha, MAX_ADDR_LEN)
>
> and
>
> net_ioctl()
> arp_ioctl()
> arp_rec_delete()
> arp_invalidate()
> neigh_update()
> __neigh_update()
> memcpy(&neigh->ha, lladdr, dev->addr_len)
>
> It is possible to provoke this error by calling rdma_resolve_addr() in a
> tight loop, while deleting the corresponding ARP entry in another tight
> loop.
>
> Fixes: 51d45974515c ("infiniband: addr: Consolidate code to fetch neighbour hardware address from dst.")
> Signed-off-by: Dag Moxnes <dag.moxnes@oracle.com>
> Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
> Reviewed-by: Parav Pandit <parav@mellanox.com>
> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
> ---
> drivers/infiniband/core/addr.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
Applied to for-next, thanks
Jason
^ permalink raw reply
* Re: [PATCH v4 00/25] InfiniBand Transport (IBTRS) and Network Block Device (IBNBD)
From: Sagi Grimberg @ 2019-07-09 19:45 UTC (permalink / raw)
To: Danil Kipnis, Jack Wang
Cc: linux-block, linux-rdma, axboe, Christoph Hellwig, bvanassche,
jgg, dledford, Roman Pen, gregkh
In-Reply-To: <CAHg0HuzUaKs-ACHah-VdNHbot0_usx4ErMesVAw8+DFR63FFqw@mail.gmail.com>
Hi Danil and Jack,
> Hallo Doug, Hallo Jason, Hallo Jens, Hallo Greg,
>
> Could you please provide some feedback to the IBNBD driver and the
> IBTRS library?
> So far we addressed all the requests provided by the community
That is not exactly correct AFAIR,
My main issues which were raised before are:
- IMO there isn't any justification to this ibtrs layering separation
given that the only user of this is your ibnbd. Unless you are
trying to submit another consumer, you should avoid adding another
subsystem that is not really general purpose.
- ibtrs in general is using almost no infrastructure from the existing
kernel subsystems. Examples are:
- tag allocation mechanism (which I'm not clear why its needed)
- rdma rw abstraction similar to what we have in the core
- list_next_or_null_rr_rcu ??
- few other examples sprinkled around..
Another question, from what I understand from the code, the client
always rdma_writes data on writes (with imm) from a remote pool of
server buffers dedicated to it. Essentially all writes are immediate (no
rdma reads ever). How is that different than using send wrs to a set of
pre-posted recv buffers (like all others are doing)? Is it faster?
Also, given that the server pre-allocate a substantial amount of memory
for each connection, is it documented the requirements from the server
side? Usually kernel implementations (especially upstream ones) will
avoid imposing such large longstanding memory requirements on the system
by default. I don't have a firm stand on this, but wanted to highlight
this as you are sending this for upstream inclusion.
and
> continue to maintain our code up-to-date with the upstream kernel
> while having an extra compatibility layer for older kernels in our
> out-of-tree repository.
Overall, while I absolutely support your cause to lower your maintenance
overhead by having this sit upstream, I don't see why this can be
helpful to anyone else in the rdma community. If instead you can
crystallize why/how ibnbd is faster than anything else, and perhaps
contribute a common infrastructure piece (or enhance an existing one)
such that other existing ulps can leverage, it will be a lot more
compelling to include it upstream.
> I understand that SRP and NVMEoF which are in the kernel already do
> provide equivalent functionality for the majority of the use cases.
> IBNBD on the other hand is showing higher performance and more
> importantly includes the IBTRS - a general purpose library to
> establish connections and transport BIO-like read/write sg-lists over
> RDMA,
But who needs it? Can other ulps use it or pieces of it? I keep failing
to understand why is this a benefit if its specific to your ibnbd?
^ permalink raw reply
* Re: [PATCH mlx5-next 4/5] net/mlx5: Introduce TLS TX offload hardware bits and structures
From: Saeed Mahameed @ 2019-07-09 20:54 UTC (permalink / raw)
To: saeedm@dev.mellanox.co.il, leon@kernel.org
Cc: Eran Ben Elisha, netdev@vger.kernel.org,
linux-rdma@vger.kernel.org, Tariq Toukan
In-Reply-To: <20190704182113.GG7212@mtr-leonro.mtl.com>
On Thu, 2019-07-04 at 21:21 +0300, Leon Romanovsky wrote:
> On Thu, Jul 04, 2019 at 01:21:04PM -0400, Saeed Mahameed wrote:
> > On Thu, Jul 4, 2019 at 1:15 PM Leon Romanovsky <leon@kernel.org>
> > wrote:
> > > On Thu, Jul 04, 2019 at 01:06:58PM -0400, Saeed Mahameed wrote:
> > > > On Wed, Jul 3, 2019 at 5:27 AM <leon@kernel.org> wrote:
> > > > > On Wed, Jul 03, 2019 at 07:39:32AM +0000, Saeed Mahameed
> > > > > wrote:
> > > > > > From: Eran Ben Elisha <eranbe@mellanox.com>
> > > > > >
> > > > > > Add TLS offload related IFC structs, layouts and
> > > > > > enumerations.
> > > > > >
> > > > > > Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
> > > > > > Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
> > > > > > Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
> > > > > > ---
> > > > > > include/linux/mlx5/device.h | 14 +++++
> > > > > > include/linux/mlx5/mlx5_ifc.h | 104
> > > > > > ++++++++++++++++++++++++++++++++--
> > > > > > 2 files changed, 114 insertions(+), 4 deletions(-)
> > > > >
> > > > > <...>
> > > > >
> > > > > > @@ -2725,7 +2739,8 @@ struct mlx5_ifc_traffic_counter_bits
> > > > > > {
> > > > > >
> > > > > > struct mlx5_ifc_tisc_bits {
> > > > > > u8 strict_lag_tx_port_affinity[0x1];
> > > > > > - u8 reserved_at_1[0x3];
> > > > > > + u8 tls_en[0x1];
> > > > > > + u8 reserved_at_1[0x2];
> > > > >
> > > > > It should be reserved_at_2.
> > > > >
> > > >
> > > > it should be at_1.
> > >
> > > Why? See mlx5_ifc_flow_table_prop_layout_bits,
> > > mlx5_ifc_roce_cap_bits, e.t.c.
> > >
> >
> > they are all at_1 .. so i don't really understand what you want
> > from me,
> > Leon the code is good, please double check you comments..
>
> Saeed,
>
> reserved_at_1 should be renamed to be reserved_at_2.
>
> strict_lag_tx_port_affinity[0x1] + tls_en[0x1] = 0x2
>
Ok now it is clear, i trusted the developer on this one :)
anyway you have to admit that you mislead me with your examples:
mx5_ifc_flow_table_prop_layout_bits and mlx5_ifc_roce_cap_bits, they
both are fine so i though this was fine too.
I will fix it up.
Thanks,
Saeed.
> > > Thanks
> > >
> > > > > Thanks
^ permalink raw reply
* Re: [rdma 14/16] RDMA/irdma: Add ABI definitions
From: Henry Orosco @ 2019-07-09 20:56 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Saleem, Shiraz, Leon Romanovsky, Kirsher, Jeffrey T,
dledford@redhat.com, davem@davemloft.net, Ismail, Mustafa,
linux-rdma@vger.kernel.org, netdev@vger.kernel.org,
nhorman@redhat.com, sassmann@redhat.com, poswald@suse.com,
Ertman, David M
In-Reply-To: <20190708141336.GF23966@mellanox.com>
On Mon, Jul 08, 2019 at 02:13:39PM +0000, Jason Gunthorpe wrote:
> On Sat, Jul 06, 2019 at 04:15:20PM +0000, Saleem, Shiraz wrote:
> > > Subject: Re: [rdma 14/16] RDMA/irdma: Add ABI definitions
> > >
> > > On Fri, Jul 05, 2019 at 04:42:19PM +0000, Saleem, Shiraz wrote:
> > > > > Subject: Re: [rdma 14/16] RDMA/irdma: Add ABI definitions
> > > > >
> > > > > On Thu, Jul 04, 2019 at 10:40:21AM +0300, Leon Romanovsky wrote:
> > > > > > On Wed, Jul 03, 2019 at 07:12:57PM -0700, Jeff Kirsher wrote:
> > > > > > > From: Mustafa Ismail <mustafa.ismail@intel.com>
> > > > > > >
> > > > > > > Add ABI definitions for irdma.
> > > > > > >
> > > > > > > Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
> > > > > > > Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
> > > > > > > include/uapi/rdma/irdma-abi.h | 130
> > > > > > > ++++++++++++++++++++++++++++++++++
> > > > > > > 1 file changed, 130 insertions(+) create mode 100644
> > > > > > > include/uapi/rdma/irdma-abi.h
> > > > > > >
> > > > > > > diff --git a/include/uapi/rdma/irdma-abi.h
> > > > > > > b/include/uapi/rdma/irdma-abi.h new file mode 100644 index
> > > > > > > 000000000000..bdfbda4c829e
> > > > > > > +++ b/include/uapi/rdma/irdma-abi.h
> > > > > > > @@ -0,0 +1,130 @@
> > > > > > > +/* SPDX-License-Identifier: GPL-2.0 OR BSD-2-Clause */
> > > > > > > +/* Copyright (c) 2006 - 2019 Intel Corporation. All rights reserved.
> > > > > > > + * Copyright (c) 2005 Topspin Communications. All rights reserved.
> > > > > > > + * Copyright (c) 2005 Cisco Systems. All rights reserved.
> > > > > > > + * Copyright (c) 2005 Open Grid Computing, Inc. All rights reserved.
> > > > > > > + */
> > > > > > > +
> > > > > > > +#ifndef IRDMA_ABI_H
> > > > > > > +#define IRDMA_ABI_H
> > > > > > > +
> > > > > > > +#include <linux/types.h>
> > > > > > > +
> > > > > > > +/* irdma must support legacy GEN_1 i40iw kernel
> > > > > > > + * and user-space whose last ABI ver is 5 */ #define
> > > > > > > +IRDMA_ABI_VER
> > > > > > > +6
> > > > > >
> > > > > > Can you please elaborate about it more?
> > > > > > There is no irdma code in RDMA yet, so it makes me wonder why new
> > > > > > define shouldn't start from 1.
> > > > >
> > > > > It is because they are ABI compatible with the current user space,
> > > > > which raises the question why we even have this confusing header file..
> > > >
> > > > It is because we need to support current providers/i40iw user-space.
> > > > Our user-space patch series will introduce a new provider (irdma)
> > > > whose ABI ver. is also 6 (capable of supporting X722 and which will
> > > > work with i40iw driver on older kernels) and removes providers/i40iw from rdma-
> > > core.
> > >
> > > Why on earth would we do that?
> > >
> > A unified library providers/irdma to go in hand with the driver irdma and uses the ABI header.
> > It can support the new network device e810 and existing x722 iWARP device. It obsoletes
> > providers/i40iw and extends its ABI. So why keep providers/i40iw around in rdma-core?
>
> Why rewrite a perfectly good userspace that is compatible with the
> future and past kernels?
>
> Is there something so wrong with the userspace provider to need this?
>
Yes, the issue is that providers/i40iw was never designed to work with a unified driver
which supports multiple hardware generations.
Henry
^ permalink raw reply
* Re: [PATCH v4 00/25] InfiniBand Transport (IBTRS) and Network Block Device (IBNBD)
From: Sagi Grimberg @ 2019-07-09 21:27 UTC (permalink / raw)
To: Jason Gunthorpe, Jinpu Wang
Cc: Jinpu Wang, Leon Romanovsky, Danil Kipnis,
linux-block@vger.kernel.org, linux-rdma@vger.kernel.org,
Jens Axboe, Christoph Hellwig, bvanassche@acm.org,
dledford@redhat.com, Roman Pen, Greg Kroah-Hartman
In-Reply-To: <20190709131932.GI3436@mellanox.com>
>> Thanks Jason for feedback.
>> Can you be more specific about "the invalidation model for MR was wrong"
>
> MR's must be invalidated before data is handed over to the block
> layer. It can't leave MRs open for access and then touch the memory
> the MR covers.
Jason is referring to these fixes:
2f122e4f5107 ("nvme-rdma: wait for local invalidation before completing
a request")
4af7f7ff92a4 ("nvme-rdma: don't complete requests before a send work
request has completed")
b4b591c87f2b ("nvme-rdma: don't suppress send completions")
^ permalink raw reply
* [PATCH] IB/rdmavt: Remove err declaration in if statement in rvt_create_cq
From: Nathan Chancellor @ 2019-07-09 22:13 UTC (permalink / raw)
To: Dennis Dalessandro, Mike Marciniszyn, Doug Ledford,
Jason Gunthorpe
Cc: Kamenee Arumugam, linux-rdma, linux-kernel, clang-built-linux,
Nathan Chancellor
clang warns:
drivers/infiniband/sw/rdmavt/cq.c:260:7: warning: variable 'err' is used
uninitialized whenever 'if' condition is true
[-Wsometimes-uninitialized]
if (err)
^~~
drivers/infiniband/sw/rdmavt/cq.c:310:9: note: uninitialized use occurs
here
return err;
^~~
drivers/infiniband/sw/rdmavt/cq.c:260:3: note: remove the 'if' if its
condition is always false
if (err)
^~~~~~~~
drivers/infiniband/sw/rdmavt/cq.c:253:7: warning: variable 'err' is used
uninitialized whenever 'if' condition is true
[-Wsometimes-uninitialized]
if (!cq->ip) {
^~~~~~~
drivers/infiniband/sw/rdmavt/cq.c:310:9: note: uninitialized use occurs
here
return err;
^~~
drivers/infiniband/sw/rdmavt/cq.c:253:3: note: remove the 'if' if its
condition is always false
if (!cq->ip) {
^~~~~~~~~~~~~~
drivers/infiniband/sw/rdmavt/cq.c:211:9: note: initialize the variable
'err' to silence this warning
int err;
^
= 0
2 warnings generated.
There are two err declarations in this function: at the top and within
an if statement; clang warns because the assignments to err in the if
statement are local to the if statement so err will be used
uninitialized if this error handling is used. Remove the if statement's
err declaration so that everything works properly.
Fixes: 239b0e52d8aa ("IB/hfi1: Move rvt_cq_wc struct into uapi directory")
Link: https://github.com/ClangBuiltLinux/linux/issues/594
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
---
drivers/infiniband/sw/rdmavt/cq.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/drivers/infiniband/sw/rdmavt/cq.c b/drivers/infiniband/sw/rdmavt/cq.c
index fac87b13329d..a85571a4cf57 100644
--- a/drivers/infiniband/sw/rdmavt/cq.c
+++ b/drivers/infiniband/sw/rdmavt/cq.c
@@ -247,8 +247,6 @@ int rvt_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
* See rvt_mmap() for details.
*/
if (udata && udata->outlen >= sizeof(__u64)) {
- int err;
-
cq->ip = rvt_create_mmap_info(rdi, sz, udata, u_wc);
if (!cq->ip) {
err = -ENOMEM;
--
2.22.0
^ permalink raw reply related
* Re: [PATCH] IB/rdmavt: Remove err declaration in if statement in rvt_create_cq
From: Nick Desaulniers @ 2019-07-09 22:38 UTC (permalink / raw)
To: Nathan Chancellor
Cc: Dennis Dalessandro, Mike Marciniszyn, Doug Ledford,
Jason Gunthorpe, Kamenee Arumugam, linux-rdma, LKML,
clang-built-linux
In-Reply-To: <20190709221312.7089-1-natechancellor@gmail.com>
On Tue, Jul 9, 2019 at 3:13 PM Nathan Chancellor
<natechancellor@gmail.com> wrote:
>
> clang warns:
>
> drivers/infiniband/sw/rdmavt/cq.c:260:7: warning: variable 'err' is used
Oh, !$*@, this is a tricky one. While the if scoped `err` declared on
L250 is initialized when used at L260, the function scoped `err`
declared on L211 is not initialized when it is used on L310 when
control flow enters the if on L249 then the goto on L255 or L261. So
this is a bug due to the if scoped `err` "shadowing" the function
scoped `err`.
Maybe not important enough to send a v2, but I feel like the commit
message should say something along the lines of `fix a potential use
of uninitialized memory due to shadowing`. Either way, this fixes a
real bug, so thanks for the patch.
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
> uninitialized whenever 'if' condition is true
> [-Wsometimes-uninitialized]
> if (err)
> ^~~
> drivers/infiniband/sw/rdmavt/cq.c:310:9: note: uninitialized use occurs
> here
> return err;
> ^~~
> drivers/infiniband/sw/rdmavt/cq.c:260:3: note: remove the 'if' if its
> condition is always false
> if (err)
> ^~~~~~~~
> drivers/infiniband/sw/rdmavt/cq.c:253:7: warning: variable 'err' is used
> uninitialized whenever 'if' condition is true
> [-Wsometimes-uninitialized]
> if (!cq->ip) {
> ^~~~~~~
> drivers/infiniband/sw/rdmavt/cq.c:310:9: note: uninitialized use occurs
> here
> return err;
> ^~~
> drivers/infiniband/sw/rdmavt/cq.c:253:3: note: remove the 'if' if its
> condition is always false
> if (!cq->ip) {
> ^~~~~~~~~~~~~~
> drivers/infiniband/sw/rdmavt/cq.c:211:9: note: initialize the variable
> 'err' to silence this warning
> int err;
> ^
> = 0
> 2 warnings generated.
>
> There are two err declarations in this function: at the top and within
> an if statement; clang warns because the assignments to err in the if
> statement are local to the if statement so err will be used
> uninitialized if this error handling is used. Remove the if statement's
> err declaration so that everything works properly.
>
> Fixes: 239b0e52d8aa ("IB/hfi1: Move rvt_cq_wc struct into uapi directory")
> Link: https://github.com/ClangBuiltLinux/linux/issues/594
> Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
> ---
> drivers/infiniband/sw/rdmavt/cq.c | 2 --
> 1 file changed, 2 deletions(-)
>
> diff --git a/drivers/infiniband/sw/rdmavt/cq.c b/drivers/infiniband/sw/rdmavt/cq.c
> index fac87b13329d..a85571a4cf57 100644
> --- a/drivers/infiniband/sw/rdmavt/cq.c
> +++ b/drivers/infiniband/sw/rdmavt/cq.c
> @@ -247,8 +247,6 @@ int rvt_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
> * See rvt_mmap() for details.
> */
> if (udata && udata->outlen >= sizeof(__u64)) {
> - int err;
> -
--
Thanks,
~Nick Desaulniers
^ permalink raw reply
* Re: [PATCH] IB/rdmavt: Remove err declaration in if statement in rvt_create_cq
From: Nathan Chancellor @ 2019-07-09 22:43 UTC (permalink / raw)
To: Nick Desaulniers
Cc: Dennis Dalessandro, Mike Marciniszyn, Doug Ledford,
Jason Gunthorpe, Kamenee Arumugam, linux-rdma, LKML,
clang-built-linux
In-Reply-To: <CAKwvOdkXwD6Wvyt5tYWJP7f3YePqUe1_TvST2RMNb_tSEc3cEQ@mail.gmail.com>
On Tue, Jul 09, 2019 at 03:38:57PM -0700, Nick Desaulniers wrote:
> On Tue, Jul 9, 2019 at 3:13 PM Nathan Chancellor
> <natechancellor@gmail.com> wrote:
> >
> > clang warns:
> >
> > drivers/infiniband/sw/rdmavt/cq.c:260:7: warning: variable 'err' is used
>
> Oh, !$*@, this is a tricky one. While the if scoped `err` declared on
> L250 is initialized when used at L260, the function scoped `err`
> declared on L211 is not initialized when it is used on L310 when
> control flow enters the if on L249 then the goto on L255 or L261. So
> this is a bug due to the if scoped `err` "shadowing" the function
> scoped `err`.
>
> Maybe not important enough to send a v2, but I feel like the commit
> message should say something along the lines of `fix a potential use
> of uninitialized memory due to shadowing`. Either way, this fixes a
> real bug, so thanks for the patch.
> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Gah, shadowing was the word I was looking for. Knew the behavior, didn't
remember what it was called. I like the clarity of your explanation
better so I will clean up the message and send a v2 shortly with your
reviewed by.
Thanks for the review!
Nathan
^ permalink raw reply
* Re: [PATCH] net/mlx5e: Return in default case statement in tx_post_resync_params
From: Nick Desaulniers @ 2019-07-09 22:44 UTC (permalink / raw)
To: Nathan Chancellor
Cc: Saeed Mahameed, Leon Romanovsky, David S. Miller, Boris Pismenny,
netdev, linux-rdma, LKML, clang-built-linux
In-Reply-To: <20190708231154.89969-1-natechancellor@gmail.com>
On Mon, Jul 8, 2019 at 4:13 PM Nathan Chancellor
<natechancellor@gmail.com> wrote:
>
> clang warns:
>
> drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c:251:2:
> warning: variable 'rec_seq_sz' is used uninitialized whenever switch
> default is taken [-Wsometimes-uninitialized]
> default:
> ^~~~~~~
> drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c:255:46: note:
> uninitialized use occurs here
> skip_static_post = !memcmp(rec_seq, &rn_be, rec_seq_sz);
> ^~~~~~~~~~
> drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c:239:16: note:
> initialize the variable 'rec_seq_sz' to silence this warning
> u16 rec_seq_sz;
> ^
> = 0
> 1 warning generated.
>
> This case statement was clearly designed to be one that should not be
> hit during runtime because of the WARN_ON statement so just return early
> to prevent copying uninitialized memory up into rn_be.
>
> Fixes: d2ead1f360e8 ("net/mlx5e: Add kTLS TX HW offload support")
> Link: https://github.com/ClangBuiltLinux/linux/issues/590
> Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
> ---
> drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c
> index 3f5f4317a22b..5c08891806f0 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ktls_tx.c
> @@ -250,6 +250,7 @@ tx_post_resync_params(struct mlx5e_txqsq *sq,
> }
> default:
> WARN_ON(1);
> + return;
> }
hmm...a switch statement with a single case is a code smell. How
about a single conditional with early return? Then the "meat" of the
happy path doesn't need an additional level of indentation.
--
Thanks,
~Nick Desaulniers
^ permalink raw reply
* [PATCH v2] IB/rdmavt: Fix variable shadowing issue in rvt_create_cq
From: Nathan Chancellor @ 2019-07-09 23:05 UTC (permalink / raw)
To: Dennis Dalessandro, Mike Marciniszyn, Doug Ledford,
Jason Gunthorpe
Cc: Kamenee Arumugam, linux-rdma, linux-kernel, clang-built-linux,
Nathan Chancellor, Nick Desaulniers
In-Reply-To: <20190709221312.7089-1-natechancellor@gmail.com>
clang warns:
drivers/infiniband/sw/rdmavt/cq.c:260:7: warning: variable 'err' is used
uninitialized whenever 'if' condition is true
[-Wsometimes-uninitialized]
if (err)
^~~
drivers/infiniband/sw/rdmavt/cq.c:310:9: note: uninitialized use occurs
here
return err;
^~~
drivers/infiniband/sw/rdmavt/cq.c:260:3: note: remove the 'if' if its
condition is always false
if (err)
^~~~~~~~
drivers/infiniband/sw/rdmavt/cq.c:253:7: warning: variable 'err' is used
uninitialized whenever 'if' condition is true
[-Wsometimes-uninitialized]
if (!cq->ip) {
^~~~~~~
drivers/infiniband/sw/rdmavt/cq.c:310:9: note: uninitialized use occurs
here
return err;
^~~
drivers/infiniband/sw/rdmavt/cq.c:253:3: note: remove the 'if' if its
condition is always false
if (!cq->ip) {
^~~~~~~~~~~~~~
drivers/infiniband/sw/rdmavt/cq.c:211:9: note: initialize the variable
'err' to silence this warning
int err;
^
= 0
2 warnings generated.
The function scoped err variable is uninitialized when the flow jumps
into the if statement. The if scoped err variable shadows the function
scoped err variable, preventing the err assignments within the if
statement to be reflected at the function level, which will cause
uninitialized use when the goto statements are taken.
Just remove the if scoped err declaration so that there is only one
copy of the err variable for this function.
Fixes: 239b0e52d8aa ("IB/hfi1: Move rvt_cq_wc struct into uapi directory")
Link: https://github.com/ClangBuiltLinux/linux/issues/594
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
---
v1 -> v2:
* Updated the wording of the commit message to use proper terms like
scoping and shadowing, thanks to review from Nick (let me know if the
wording isn't up to snuff).
drivers/infiniband/sw/rdmavt/cq.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/drivers/infiniband/sw/rdmavt/cq.c b/drivers/infiniband/sw/rdmavt/cq.c
index fac87b13329d..a85571a4cf57 100644
--- a/drivers/infiniband/sw/rdmavt/cq.c
+++ b/drivers/infiniband/sw/rdmavt/cq.c
@@ -247,8 +247,6 @@ int rvt_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
* See rvt_mmap() for details.
*/
if (udata && udata->outlen >= sizeof(__u64)) {
- int err;
-
cq->ip = rvt_create_mmap_info(rdi, sz, udata, u_wc);
if (!cq->ip) {
err = -ENOMEM;
--
2.22.0
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox