Linux RDMA and InfiniBand development
 help / color / mirror / Atom feed
* Re:  Re: Re: Re: Re: [PATCH] rdma/siw: avoid smp_store_mb() on a u64
From: Bernard Metzler @ 2019-07-12 17:40 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Arnd Bergmann, Doug Ledford, Peter Zijlstra, linux-rdma,
	linux-kernel
In-Reply-To: <20190712153243.GI27512@ziepe.ca>

-----"Jason Gunthorpe" <jgg@ziepe.ca> wrote: -----

>To: "Bernard Metzler" <BMT@zurich.ibm.com>
>From: "Jason Gunthorpe" <jgg@ziepe.ca>
>Date: 07/12/2019 05:33PM
>Cc: "Arnd Bergmann" <arnd@arndb.de>, "Doug Ledford"
><dledford@redhat.com>, "Peter Zijlstra" <peterz@infradead.org>,
>linux-rdma@vger.kernel.org, linux-kernel@vger.kernel.org
>Subject: [EXTERNAL] Re: Re: Re: Re: [PATCH] rdma/siw: avoid
>smp_store_mb() on a u64
>
>On Fri, Jul 12, 2019 at 03:24:09PM +0000, Bernard Metzler wrote:
>> 
>> >To: "Bernard Metzler" <BMT@zurich.ibm.com>
>> >From: "Jason Gunthorpe" <jgg@ziepe.ca>
>> >Date: 07/12/2019 04:43PM
>> >Cc: "Arnd Bergmann" <arnd@arndb.de>, "Doug Ledford"
>> ><dledford@redhat.com>, "Peter Zijlstra" <peterz@infradead.org>,
>> >linux-rdma@vger.kernel.org, linux-kernel@vger.kernel.org
>> >Subject: [EXTERNAL] Re: Re: Re: [PATCH] rdma/siw: avoid
>> >smp_store_mb() on a u64
>> >
>> >On Fri, Jul 12, 2019 at 02:35:50PM +0000, Bernard Metzler wrote:
>> >
>> >> >This looks wrong to me.. a userspace notification re-arm cannot
>be
>> >> >lost, so have a split READ/TEST/WRITE sequence can't possibly
>> >work?
>> >> >
>> >> >I'd expect an atomic test and clear here?
>> >> 
>> >> We cannot avoid the case that the application re-arms the
>> >> CQ only after a CQE got placed. That is why folks are polling
>the
>> >> CQ once after re-arming it - to make sure they do not miss the
>> >> very last and single CQE which would have produced a CQ event.
>> >
>> >That is different, that is re-arm happing after a CQE placement
>and
>> >this can't be fixed.
>> >
>> >What I said is that a re-arm from userspace cannot be lost. So you
>> >can't blindly clear the arm flag with the WRITE_ONCE. It might be
>OK
>> >beacuse of the if, but...
>> >
>> >It is just goofy to write it without a 'test and clear' atomic. If
>> >the
>> >writer side consumes the notify it should always be done
>atomically.
>> >
>> Hmmm, I don't yet get why we should test and clear atomically, if
>we
>> clear anyway - is it because we want to avoid clearing a re-arm
>which
>> happens just after testing and before clearing?
>
>It is just clearer as to the intent.. 
>
>Are you trying to optimize away an atomic or something? That might
>work better as a dual counter scheme.
>
>> Another complication -- test_and_set_bit() operates on a single
>> bit, but we have to test two bits, and reset both, if one is
>> set.
>
>Why are two bits needed to represent armed and !armed?
>
>Jason

It is because there are two levels a CQ can be armed:

       #include <infiniband/verbs.h>

       int ibv_req_notify_cq(struct ibv_cq *cq, int solicited_only);

If we kick the CQ handler, we have to clear the whole
thing. The user later again decides how he wants to get it
re-armed...SOLICITED completions only, or ALL signaled.

Best,
Bernard.

>
>


^ permalink raw reply

* Re: regression: nvme rdma with bnxt_re0 broken
From: Selvin Xavier @ 2019-07-12 16:29 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Parav Pandit, Yi Zhang, Daniel Jurgens,
	linux-rdma@vger.kernel.org, Devesh Sharma,
	linux-nvme@lists.infradead.org
In-Reply-To: <20190712154044.GJ27512@ziepe.ca>

On Fri, Jul 12, 2019 at 9:10 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Fri, Jul 12, 2019 at 12:52:24PM +0000, Parav Pandit wrote:
> >
> >
> > > From: Yi Zhang <yi.zhang@redhat.com>
> > > Sent: Friday, July 12, 2019 5:11 PM
> > > To: Parav Pandit <parav@mellanox.com>; Selvin Xavier
> > > <selvin.xavier@broadcom.com>
> > > Cc: Daniel Jurgens <danielj@mellanox.com>; linux-rdma@vger.kernel.org;
> > > Devesh Sharma <devesh.sharma@broadcom.com>; linux-
> > > nvme@lists.infradead.org
> > > Subject: Re: regression: nvme rdma with bnxt_re0 broken
> > >
> > > Hi Parav
> > > The nvme connect still failed[1], I've paste all the dmesg log to[2], pls check it.
> > >
> > >
> > > [1]
> > > [root@rdma-perf-07 ~]$ nvme connect -t rdma -a 172.31.40.125 -s 4420 -n
> > > testnqn
> > > Failed to write to /dev/nvme-fabrics: Connection reset by peer
> > > [2]
> > > https://pastebin.com/ipvak0Sp
> > >
> >
> > I think vlan_id is not initialized to 0xffff due to which ipv4 entry addition also failed with my patch.
> > This is probably solves it. Not sure. I am not much familiar with it.
> >
> > Selvin,
> > Can you please take a look?
> >
> > From 7b55e1d4500259cf7c02fb4d9fbbeb5ad1cfc623 Mon Sep 17 00:00:00 2001
> > From: Parav Pandit <parav@mellanox.com>
> > Date: Fri, 12 Jul 2019 04:34:52 -0500
> > Subject: [PATCH v1] IB/bnxt_re: Honor vlan_id in GID entry comparison
> >
> > GID entry consist of GID, vlan, netdev and smac.
> > Extend GID duplicate check companions to consider vlan_id as well
> > to support IPv6 VLAN based link local addresses.
> >
> > GID deletion based on only GID comparision is not correct.
> > It needs further fixes.
> >
> > Fixes: 823b23da7113 ("IB/core: Allow vlan link local address based RoCE GIDs")
> > Change-Id: I2e026ec8065c8425ba24fad8525323d112a2f4e4
> > Signed-off-by: Parav Pandit <parav@mellanox.com>
> >  drivers/infiniband/hw/bnxt_re/qplib_res.c | 5 +++++
> >  drivers/infiniband/hw/bnxt_re/qplib_sp.c  | 7 ++++++-
> >  drivers/infiniband/hw/bnxt_re/qplib_sp.h  | 1 +
> >  3 files changed, 12 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/infiniband/hw/bnxt_re/qplib_res.c b/drivers/infiniband/hw/bnxt_re/qplib_res.c
> > index 37928b1111df..216648b640ce 100644
> > +++ b/drivers/infiniband/hw/bnxt_re/qplib_res.c
> > @@ -488,6 +488,8 @@ static int bnxt_qplib_alloc_sgid_tbl(struct bnxt_qplib_res *res,
> >                                    struct bnxt_qplib_sgid_tbl *sgid_tbl,
> >                                    u16 max)
> >  {
> > +     u16 i;
> > +
> >       sgid_tbl->tbl = kcalloc(max, sizeof(struct bnxt_qplib_gid), GFP_KERNEL);
> >       if (!sgid_tbl->tbl)
> >               return -ENOMEM;
> > @@ -500,6 +502,9 @@ static int bnxt_qplib_alloc_sgid_tbl(struct bnxt_qplib_res *res,
> >       if (!sgid_tbl->ctx)
> >               goto out_free2;
> >
> > +     for (i = 0; i < max; i++)
> > +             sgid_tbl->tbl[i].vlan_id = 0xffff;
> > +
> >       sgid_tbl->vlan = kcalloc(max, sizeof(u8), GFP_KERNEL);
> >       if (!sgid_tbl->vlan)
> >               goto out_free3;
> > diff --git a/drivers/infiniband/hw/bnxt_re/qplib_sp.c b/drivers/infiniband/hw/bnxt_re/qplib_sp.c
> > index 48793d3512ac..0d90be88685f 100644
> > +++ b/drivers/infiniband/hw/bnxt_re/qplib_sp.c
> > @@ -236,6 +236,9 @@ int bnxt_qplib_del_sgid(struct bnxt_qplib_sgid_tbl *sgid_tbl,
> >               return -ENOMEM;
> >       }
> >       for (index = 0; index < sgid_tbl->max; index++) {
> > +             /* FIXME: GID delete should happen based on index
> > +              * and refcount
> > +              */
> >               if (!memcmp(&sgid_tbl->tbl[index], gid, sizeof(*gid)))
> >                       break;
> >       }
> > @@ -296,7 +299,8 @@ int bnxt_qplib_add_sgid(struct bnxt_qplib_sgid_tbl *sgid_tbl,
> >       }
> >       free_idx = sgid_tbl->max;
> >       for (i = 0; i < sgid_tbl->max; i++) {
> > -             if (!memcmp(&sgid_tbl->tbl[i], gid, sizeof(*gid))) {
> > +             if (!memcmp(&sgid_tbl->tbl[i], gid, sizeof(*gid)) &&
> > +                 sgid_tbl->tbl[i].vlan_id == vlan_id) {
> >                       dev_dbg(&res->pdev->dev,
> >                               "SGID entry already exist in entry %d!\n", i);
>
> bnxt guys: please just delete this duplicate detection code from the
> driver. Every GID provided from the core must be programmed into the
> given gid table index.

Jason,
 This check is required in bnxt_re because the HW need only one entry
in its table for RoCE V1 and RoCE v2 Gids.
Sending the second add_gid for RoCE V2 (with the same gid as RoCE v1)
to the HW returns failure. So
driver handles this using a ref count. During delete_gid, the entry in
the HW is deleted only if the refcount is zero.

Thanks,
Selvin

>
> Jason

^ permalink raw reply

* Re: regression: nvme rdma with bnxt_re0 broken
From: Selvin Xavier @ 2019-07-12 16:18 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Yi Zhang, Daniel Jurgens, linux-rdma@vger.kernel.org,
	Devesh Sharma, linux-nvme@lists.infradead.org
In-Reply-To: <AM0PR05MB48666463325E1D0E25D63F57D1F20@AM0PR05MB4866.eurprd05.prod.outlook.com>

On Fri, Jul 12, 2019 at 6:22 PM Parav Pandit <parav@mellanox.com> wrote:
>
>
>
> > -----Original Message-----
> > From: Yi Zhang <yi.zhang@redhat.com>
> > Sent: Friday, July 12, 2019 5:11 PM
> > To: Parav Pandit <parav@mellanox.com>; Selvin Xavier
> > <selvin.xavier@broadcom.com>
> > Cc: Daniel Jurgens <danielj@mellanox.com>; linux-rdma@vger.kernel.org;
> > Devesh Sharma <devesh.sharma@broadcom.com>; linux-
> > nvme@lists.infradead.org
> > Subject: Re: regression: nvme rdma with bnxt_re0 broken
> >
> > Hi Parav
> > The nvme connect still failed[1], I've paste all the dmesg log to[2], pls check it.
> >
> >
> > [1]
> > [root@rdma-perf-07 ~]$ nvme connect -t rdma -a 172.31.40.125 -s 4420 -n
> > testnqn
> > Failed to write to /dev/nvme-fabrics: Connection reset by peer
> > [2]
> > https://pastebin.com/ipvak0Sp
> >
>
> I think vlan_id is not initialized to 0xffff due to which ipv4 entry addition also failed with my patch.
> This is probably solves it. Not sure. I am not much familiar with it.
>
> Selvin,
> Can you please take a look?
>
Parav,
 Thanks for the patch. I removed the change you made in struct bnxt_qplib_gid
and added a new structure struct bnxt_qplib_gid_info to include both
gid and vlan_id.
struct bnxt_qplib_gid is used in multiple places to memcpy or compare
the 16 bytes.
Also, added vlan checking in the destroy path also. Some more cleanup
possible in
delete_gid path. That can be done once the basic issue is fixed.

Yi, Can you please test this patch and see if it is solving the issue?

Thanks,
Selvin

From 3d83613cfc5993bf9dae529ab0d4d25ddefff29b Mon Sep 17 00:00:00 2001
From: Parav Pandit <parav@mellanox.com>
Date: Fri, 12 Jul 2019 10:32:51 -0400
Subject: [PATCH] IB/bnxt_re: Honor vlan_id in GID entry comparison

GID entry consist of GID, vlan, netdev and smac.
Extend GID duplicate check companions to consider vlan_id as well
to support IPv6 VLAN based link local addresses.

Fixes: 823b23da7113 ("IB/core: Allow vlan link local address based RoCE GIDs")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
---
 drivers/infiniband/hw/bnxt_re/ib_verbs.c  |  7 +++++--
 drivers/infiniband/hw/bnxt_re/qplib_res.c | 17 +++++++++++++----
 drivers/infiniband/hw/bnxt_re/qplib_res.h |  2 +-
 drivers/infiniband/hw/bnxt_re/qplib_sp.c  | 14 +++++++++-----
 drivers/infiniband/hw/bnxt_re/qplib_sp.h  |  7 ++++++-
 5 files changed, 34 insertions(+), 13 deletions(-)

diff --git a/drivers/infiniband/hw/bnxt_re/ib_verbs.c
b/drivers/infiniband/hw/bnxt_re/ib_verbs.c
index a91653aabf38..098ab883733e 100644
--- a/drivers/infiniband/hw/bnxt_re/ib_verbs.c
+++ b/drivers/infiniband/hw/bnxt_re/ib_verbs.c
@@ -308,6 +308,7 @@ int bnxt_re_del_gid(const struct ib_gid_attr
*attr, void **context)
  struct bnxt_re_dev *rdev = to_bnxt_re_dev(attr->device, ibdev);
  struct bnxt_qplib_sgid_tbl *sgid_tbl = &rdev->qplib_res.sgid_tbl;
  struct bnxt_qplib_gid *gid_to_del;
+ u16 vlan_id = 0xFFFF;

  /* Delete the entry from the hardware */
  ctx = *context;
@@ -317,7 +318,8 @@ int bnxt_re_del_gid(const struct ib_gid_attr
*attr, void **context)
  if (sgid_tbl && sgid_tbl->active) {
  if (ctx->idx >= sgid_tbl->max)
  return -EINVAL;
- gid_to_del = &sgid_tbl->tbl[ctx->idx];
+ gid_to_del = &sgid_tbl->tbl[ctx->idx].gid;
+ vlan_id = sgid_tbl->tbl[ctx->idx].vlan_id;
  /* DEL_GID is called in WQ context(netdevice_event_work_handler)
  * or via the ib_unregister_device path. In the former case QP1
  * may not be destroyed yet, in which case just return as FW
@@ -335,7 +337,8 @@ int bnxt_re_del_gid(const struct ib_gid_attr
*attr, void **context)
  }
  ctx->refcnt--;
  if (!ctx->refcnt) {
- rc = bnxt_qplib_del_sgid(sgid_tbl, gid_to_del, true);
+ rc = bnxt_qplib_del_sgid(sgid_tbl, gid_to_del,
+ vlan_id,  true);
  if (rc) {
  dev_err(rdev_to_dev(rdev),
  "Failed to remove GID: %#x", rc);
diff --git a/drivers/infiniband/hw/bnxt_re/qplib_res.c
b/drivers/infiniband/hw/bnxt_re/qplib_res.c
index 37928b1111df..7f2571f7a13f 100644
--- a/drivers/infiniband/hw/bnxt_re/qplib_res.c
+++ b/drivers/infiniband/hw/bnxt_re/qplib_res.c
@@ -488,7 +488,10 @@ static int bnxt_qplib_alloc_sgid_tbl(struct
bnxt_qplib_res *res,
       struct bnxt_qplib_sgid_tbl *sgid_tbl,
       u16 max)
 {
- sgid_tbl->tbl = kcalloc(max, sizeof(struct bnxt_qplib_gid), GFP_KERNEL);
+ u16 i;
+
+ sgid_tbl->tbl = kcalloc(max, sizeof(struct bnxt_qplib_gid_info),
+ GFP_KERNEL);
  if (!sgid_tbl->tbl)
  return -ENOMEM;

@@ -500,6 +503,9 @@ static int bnxt_qplib_alloc_sgid_tbl(struct
bnxt_qplib_res *res,
  if (!sgid_tbl->ctx)
  goto out_free2;

+ for (i = 0; i < max; i++)
+ sgid_tbl->tbl[i].vlan_id = 0xffff;
+
  sgid_tbl->vlan = kcalloc(max, sizeof(u8), GFP_KERNEL);
  if (!sgid_tbl->vlan)
  goto out_free3;
@@ -526,9 +532,11 @@ static void bnxt_qplib_cleanup_sgid_tbl(struct
bnxt_qplib_res *res,
  for (i = 0; i < sgid_tbl->max; i++) {
  if (memcmp(&sgid_tbl->tbl[i], &bnxt_qplib_gid_zero,
     sizeof(bnxt_qplib_gid_zero)))
- bnxt_qplib_del_sgid(sgid_tbl, &sgid_tbl->tbl[i], true);
+ bnxt_qplib_del_sgid(sgid_tbl, &sgid_tbl->tbl[i].gid,
+     sgid_tbl->tbl[i].vlan_id, true);
  }
- memset(sgid_tbl->tbl, 0, sizeof(struct bnxt_qplib_gid) * sgid_tbl->max);
+ memset(sgid_tbl->tbl, 0, sizeof(struct bnxt_qplib_gid_info) *
+ sgid_tbl->max);
  memset(sgid_tbl->hw_id, -1, sizeof(u16) * sgid_tbl->max);
  memset(sgid_tbl->vlan, 0, sizeof(u8) * sgid_tbl->max);
  sgid_tbl->active = 0;
@@ -537,7 +545,8 @@ static void bnxt_qplib_cleanup_sgid_tbl(struct
bnxt_qplib_res *res,
 static void bnxt_qplib_init_sgid_tbl(struct bnxt_qplib_sgid_tbl *sgid_tbl,
       struct net_device *netdev)
 {
- memset(sgid_tbl->tbl, 0, sizeof(struct bnxt_qplib_gid) * sgid_tbl->max);
+ memset(sgid_tbl->tbl, 0, sizeof(struct bnxt_qplib_gid_info) *
+ sgid_tbl->max);
  memset(sgid_tbl->hw_id, -1, sizeof(u16) * sgid_tbl->max);
 }

diff --git a/drivers/infiniband/hw/bnxt_re/qplib_res.h
b/drivers/infiniband/hw/bnxt_re/qplib_res.h
index 30c42c92fac7..fbda11a7ab1a 100644
--- a/drivers/infiniband/hw/bnxt_re/qplib_res.h
+++ b/drivers/infiniband/hw/bnxt_re/qplib_res.h
@@ -111,7 +111,7 @@ struct bnxt_qplib_pd_tbl {
 };

 struct bnxt_qplib_sgid_tbl {
- struct bnxt_qplib_gid *tbl;
+ struct bnxt_qplib_gid_info *tbl;
  u16 *hw_id;
  u16 max;
  u16 active;
diff --git a/drivers/infiniband/hw/bnxt_re/qplib_sp.c
b/drivers/infiniband/hw/bnxt_re/qplib_sp.c
index 48793d3512ac..40296b97d21e 100644
--- a/drivers/infiniband/hw/bnxt_re/qplib_sp.c
+++ b/drivers/infiniband/hw/bnxt_re/qplib_sp.c
@@ -213,12 +213,12 @@ int bnxt_qplib_get_sgid(struct bnxt_qplib_res *res,
  index, sgid_tbl->max);
  return -EINVAL;
  }
- memcpy(gid, &sgid_tbl->tbl[index], sizeof(*gid));
+ memcpy(gid, &sgid_tbl->tbl[index].gid, sizeof(*gid));
  return 0;
 }

 int bnxt_qplib_del_sgid(struct bnxt_qplib_sgid_tbl *sgid_tbl,
- struct bnxt_qplib_gid *gid, bool update)
+ struct bnxt_qplib_gid *gid, u16 vlan_id, bool update)
 {
  struct bnxt_qplib_res *res = to_bnxt_qplib(sgid_tbl,
     struct bnxt_qplib_res,
@@ -236,7 +236,8 @@ int bnxt_qplib_del_sgid(struct bnxt_qplib_sgid_tbl
*sgid_tbl,
  return -ENOMEM;
  }
  for (index = 0; index < sgid_tbl->max; index++) {
- if (!memcmp(&sgid_tbl->tbl[index], gid, sizeof(*gid)))
+ if (!memcmp(&sgid_tbl->tbl[index].gid, gid, sizeof(*gid)) &&
+     vlan_id == sgid_tbl->tbl[index].vlan_id)
  break;
  }
  if (index == sgid_tbl->max) {
@@ -262,8 +263,9 @@ int bnxt_qplib_del_sgid(struct bnxt_qplib_sgid_tbl
*sgid_tbl,
  if (rc)
  return rc;
  }
- memcpy(&sgid_tbl->tbl[index], &bnxt_qplib_gid_zero,
+ memcpy(&sgid_tbl->tbl[index].gid, &bnxt_qplib_gid_zero,
         sizeof(bnxt_qplib_gid_zero));
+ sgid_tbl->tbl[index].vlan_id = 0xFFFF;
  sgid_tbl->vlan[index] = 0;
  sgid_tbl->active--;
  dev_dbg(&res->pdev->dev,
@@ -296,7 +298,8 @@ int bnxt_qplib_add_sgid(struct bnxt_qplib_sgid_tbl
*sgid_tbl,
  }
  free_idx = sgid_tbl->max;
  for (i = 0; i < sgid_tbl->max; i++) {
- if (!memcmp(&sgid_tbl->tbl[i], gid, sizeof(*gid))) {
+ if (!memcmp(&sgid_tbl->tbl[i], gid, sizeof(*gid)) &&
+     sgid_tbl->tbl[i].vlan_id == vlan_id) {
  dev_dbg(&res->pdev->dev,
  "SGID entry already exist in entry %d!\n", i);
  *index = i;
@@ -351,6 +354,7 @@ int bnxt_qplib_add_sgid(struct bnxt_qplib_sgid_tbl
*sgid_tbl,
  }
  /* Add GID to the sgid_tbl */
  memcpy(&sgid_tbl->tbl[free_idx], gid, sizeof(*gid));
+ sgid_tbl->tbl[free_idx].vlan_id = vlan_id;
  sgid_tbl->active++;
  if (vlan_id != 0xFFFF)
  sgid_tbl->vlan[free_idx] = 1;
diff --git a/drivers/infiniband/hw/bnxt_re/qplib_sp.h
b/drivers/infiniband/hw/bnxt_re/qplib_sp.h
index 0ec3b12b0bcd..b5c4ce302c61 100644
--- a/drivers/infiniband/hw/bnxt_re/qplib_sp.h
+++ b/drivers/infiniband/hw/bnxt_re/qplib_sp.h
@@ -84,6 +84,11 @@ struct bnxt_qplib_gid {
  u8 data[16];
 };

+struct bnxt_qplib_gid_info {
+ struct bnxt_qplib_gid gid;
+ u16 vlan_id;
+};
+
 struct bnxt_qplib_ah {
  struct bnxt_qplib_gid dgid;
  struct bnxt_qplib_pd *pd;
@@ -221,7 +226,7 @@ int bnxt_qplib_get_sgid(struct bnxt_qplib_res *res,
  struct bnxt_qplib_sgid_tbl *sgid_tbl, int index,
  struct bnxt_qplib_gid *gid);
 int bnxt_qplib_del_sgid(struct bnxt_qplib_sgid_tbl *sgid_tbl,
- struct bnxt_qplib_gid *gid, bool update);
+ struct bnxt_qplib_gid *gid, u16 vlan_id, bool update);
 int bnxt_qplib_add_sgid(struct bnxt_qplib_sgid_tbl *sgid_tbl,
  struct bnxt_qplib_gid *gid, u8 *mac, u16 vlan_id,
  bool update, u32 *index);
-- 
2.18.1

^ permalink raw reply related

* Re: Re: Re: Re: [PATCH] rdma/siw: avoid smp_store_mb() on a u64
From: Peter Zijlstra @ 2019-07-12 16:12 UTC (permalink / raw)
  To: Bernard Metzler
  Cc: Jason Gunthorpe, Arnd Bergmann, Doug Ledford, linux-rdma,
	linux-kernel, Paul McKenney, Will Deacon
In-Reply-To: <OF9F46C3F6.DC3E03FF-ON00258435.00521546-00258435.00549C01@notes.na.collabserv.com>

On Fri, Jul 12, 2019 at 03:24:09PM +0000, Bernard Metzler wrote:
> -----"Jason Gunthorpe" <jgg@ziepe.ca> wrote: -----

> Hmmm, I don't yet get why we should test and clear atomically, if we
> clear anyway - is it because we want to avoid clearing a re-arm which
> happens just after testing and before clearing?
> (1) If the test was positive, we will call the CQ event handler,
> and per RDMA verbs spec, the application MUST re-arm the CQ after it
> got a CQ event, to get another one. So clearing it sometimes before
> calling the handler is right.
> (2) If the test was negative, a test and reset would not change
> anything.
> 
> Another complication -- test_and_set_bit() operates on a single
> bit, but we have to test two bits, and reset both, if one is
> set. Can we do that atomically, if we test the bits conditionally?
> I didn't find anything appropriate.

There's cmpxchg() loops that can do that.

	unsigned int new, val = atomic_read(&var);
	do {
		if (!(val & MY_TWO_BITS))
			break;

		new = val | MY_TWO_BITS;
	} while (!atomic_try_cmpxchg(&var, &val, new));

only problem is you probably shouldn't share atomic_t with userspace,
unless you put conditions on what archs you support.

> >And then I think all the weird barriers go away
> >
> >> >> @@ -1141,11 +1145,17 @@ int siw_req_notify_cq(struct ib_cq
> >> >*base_cq, enum ib_cq_notify_flags flags)
> >> >>  	siw_dbg_cq(cq, "flags: 0x%02x\n", flags);
> >> >>  
> >> >>  	if ((flags & IB_CQ_SOLICITED_MASK) == IB_CQ_SOLICITED)
> >> >> -		/* CQ event for next solicited completion */
> >> >> -		smp_store_mb(*cq->notify, SIW_NOTIFY_SOLICITED);
> >> >> +		/*
> >> >> +		 * Enable CQ event for next solicited completion.
> >> >> +		 * and make it visible to all associated producers.
> >> >> +		 */
> >> >> +		smp_store_mb(cq->notify->flags, SIW_NOTIFY_SOLICITED);
> >> >
> >> >But what is the 2nd piece of data to motivate the smp_store_mb?
> >> 
> >> Another core (such as a concurrent RX operation) shall see this
> >> CQ being re-armed asap.
> >
> >'ASAP' is not a '2nd piece of data'. 
> >
> >AFAICT this requirement is just a normal atomic set_bit which does
> >also expedite making the change visible?
> 
> Absolutely!!
> good point....this is just a single flag we are operating on.
> And the weird barrier goes away ;)

atomic ops don't expedite anything, and memory barriers don't make
things happen asap.

That is; the stores from atomic ops can stay in store buffers just like
any other store, and memory barriers don't flush store buffers, they
only impose order between memops.

^ permalink raw reply

* Re: User SIW fails matching device
From: Potnuri Bharat Teja @ 2019-07-12 15:52 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-rdma@vger.kernel.org, BMT@zurich.ibm.com,
	monis@mellanox.com, Nirranjan Kirubaharan
In-Reply-To: <20190712153032.GH27512@ziepe.ca>

On Friday, July 07/12/19, 2019 at 21:00:32 +0530, Jason Gunthorpe wrote:
> On Fri, Jul 12, 2019 at 08:54:20PM +0530, Potnuri Bharat Teja wrote:
> > On Friday, July 07/12/19, 2019 at 20:05:46 +0530, Jason Gunthorpe wrote:
> > > On Fri, Jul 12, 2019 at 07:57:19PM +0530, Potnuri Bharat Teja wrote:
> > > > Hi all,
> > > > I observe the following behavior on one of my machines configured for siw.
> > > > 
> > > > Issue:
> > > > SIW device gets wrong device ops (HW/real rdma driver device ops) instead of
> > > > siw device ops due to improper device matching.
> > > > 
> > > > Root-cause:
> > > > In libibverbs, during user cma initialisation, for each entry from the driver 
> > > > list, sysfs device is checked for matching name or device.
> > > > If the siw/rxe driver is at the head of the list, then sysfs device matches 
> > > > properly with the corresponding siw driver and gets the corresponding siw/rxe 
> > > > device ops. Now, If the siw/rxe driver is after the real HW driver cxgb4/mlx5 
> > > > respectively in the driver list, then siw sysfs device matches pci device and 
> > > > wrongly gets the device ops of HW driver (cxgb4/mlx5).
> > > > 
> > > > Below debug prints from verbs_register_driver() and driver_list entries, where 
> > > > siw is after cxgb4. I see verbs alloc context landing in cxgb4_alloc_context 
> > > > instead of siw_alloc_context, thus breaking user siw.
> > > > 
> > > > <debug> verbs_register_driver_22: 184: driver 0x176e370
> > > > <debug> verbs_register_driver_22: 185: name ipathverbs
> > > > <debug> verbs_register_driver_22: 184: driver 0x176f6a0
> > > > <debug> verbs_register_driver_22: 185: name cxgb4
> > > > <debug> verbs_register_driver_22: 184: driver 0x176fd50
> > > > <debug> verbs_register_driver_22: 185: name cxgb3
> > > > <debug> verbs_register_driver_22: 184: driver 0x1777020
> > > > <debug> verbs_register_driver_22: 185: name rxe
> > > > <debug> verbs_register_driver_22: 184: driver 0x1770a30
> > > > <debug> verbs_register_driver_22: 185: name siw
> > > > <debug> verbs_register_driver_22: 184: driver 0x1771120
> > > > <debug> verbs_register_driver_22: 185: name mlx4
> > > > <debug> verbs_register_driver_22: 184: driver 0x1771990
> > > > <debug> verbs_register_driver_22: 185: name mlx5
> > > > <debug> verbs_register_driver_22: 184: driver 0x1771ff0
> > > > <debug> verbs_register_driver_22: 185: name efa
> > > > 
> > > > <debug> try_drivers: 372: driver 0x176e370, sysfs_dev 0x1776b20, name: ipathverbs
> > > > <debug> try_drivers: 372: driver 0x176f6a0, sysfs_dev 0x1776b20, name: cxgb4
> > > > <debug> try_drivers: 372: driver 0x176fd50, sysfs_dev 0x1776b20, name: cxgb3
> > > > <debug> try_drivers: 372: driver 0x1777020, sysfs_dev 0x1776b20, name: rxe
> > > > <debug> try_drivers: 372: driver 0x1770a30, sysfs_dev 0x1776b20, name: siw
> > > > <debug> try_drivers: 372: driver 0x1771120, sysfs_dev 0x1776b20, name: mlx4
> > > > <debug> try_drivers: 372: driver 0x1771990, sysfs_dev 0x1776b20, name: mlx5
> > > > <debug> try_drivers: 372: driver 0x1771ff0, sysfs_dev 0x1776b20, name: efa
> > > > 
> > > > Proposed fix:
> > > > I have the below fix that works. It adds siw/rxe driver to the HEAD of the 
> > > > driver list and the rest to the tail. I am not sure if this fix is the ideal 
> > > > one, so I am attaching it to this mail.
> > > 
> > > Update your rdma-core to latest and this will be fixed fully by using
> > > netlink to match the siw device..
> > > 
> > I pulled the latest rdma-core, still see the issue.
> > 
> > commit 7ef6077ec3201f661458297fea776746ba752843 (HEAD, upstream/master)
> > Merge: 837954ff677c 95934b61a74e
> > Author: Jason Gunthorpe <jgg@mellanox.com>
> > Date:   Thu Jul 11 16:18:06 2019 -0300
> > 
> >     Merge pull request #539 from jgunthorpe/netlink
> > 
> >         Use netlink to learn about ibdevs and their related chardevs
> > 
> > 
> > Is there any corresponding kernel change or package dependency? I am currently 
> > on Doug's wip/dl-for-next branch.
> 
> That should be good enough for the kernel.. Hmm.. The siw stuff didn't
> get updated, you need this rdma-core patch too. Please confirm
> 
> diff --git a/providers/siw/siw.c b/providers/siw/siw.c
> index 23e4dd976caf84..41f33fa16123e9 100644
> --- a/providers/siw/siw.c
> +++ b/providers/siw/siw.c
> @@ -907,7 +907,7 @@ static void siw_device_free(struct verbs_device *vdev)
>  }
>  
>  static const struct verbs_match_ent rnic_table[] = {
> -	VERBS_NAME_MATCH("siw", NULL),
> +	VERBS_DRIVER_ID(RDMA_DRIVER_SIW),
>  	{},
>  };
>
It works with above change. Thanks!

^ permalink raw reply

* Re: [rdma-core patch v2] srp_daemon: improve the debug message for is_enabled_by_rules_file
From: Jason Gunthorpe @ 2019-07-12 15:41 UTC (permalink / raw)
  To: Honggang Li; +Cc: bvanassche, linux-rdma
In-Reply-To: <20190712143147.10414-1-honli@redhat.com>

On Fri, Jul 12, 2019 at 10:31:47AM -0400, Honggang Li wrote:
> Signed-off-by: Honggang Li <honli@redhat.com>

This really needs some commit message explaining why this should be
done

Jason

^ permalink raw reply

* Re: regression: nvme rdma with bnxt_re0 broken
From: Jason Gunthorpe @ 2019-07-12 15:40 UTC (permalink / raw)
  To: Parav Pandit
  Cc: Yi Zhang, Selvin Xavier, Daniel Jurgens,
	linux-rdma@vger.kernel.org, Devesh Sharma,
	linux-nvme@lists.infradead.org
In-Reply-To: <AM0PR05MB48666463325E1D0E25D63F57D1F20@AM0PR05MB4866.eurprd05.prod.outlook.com>

On Fri, Jul 12, 2019 at 12:52:24PM +0000, Parav Pandit wrote:
> 
> 
> > From: Yi Zhang <yi.zhang@redhat.com>
> > Sent: Friday, July 12, 2019 5:11 PM
> > To: Parav Pandit <parav@mellanox.com>; Selvin Xavier
> > <selvin.xavier@broadcom.com>
> > Cc: Daniel Jurgens <danielj@mellanox.com>; linux-rdma@vger.kernel.org;
> > Devesh Sharma <devesh.sharma@broadcom.com>; linux-
> > nvme@lists.infradead.org
> > Subject: Re: regression: nvme rdma with bnxt_re0 broken
> > 
> > Hi Parav
> > The nvme connect still failed[1], I've paste all the dmesg log to[2], pls check it.
> > 
> > 
> > [1]
> > [root@rdma-perf-07 ~]$ nvme connect -t rdma -a 172.31.40.125 -s 4420 -n
> > testnqn
> > Failed to write to /dev/nvme-fabrics: Connection reset by peer
> > [2]
> > https://pastebin.com/ipvak0Sp
> > 
> 
> I think vlan_id is not initialized to 0xffff due to which ipv4 entry addition also failed with my patch.
> This is probably solves it. Not sure. I am not much familiar with it.
> 
> Selvin,
> Can you please take a look?
> 
> From 7b55e1d4500259cf7c02fb4d9fbbeb5ad1cfc623 Mon Sep 17 00:00:00 2001
> From: Parav Pandit <parav@mellanox.com>
> Date: Fri, 12 Jul 2019 04:34:52 -0500
> Subject: [PATCH v1] IB/bnxt_re: Honor vlan_id in GID entry comparison
> 
> GID entry consist of GID, vlan, netdev and smac.
> Extend GID duplicate check companions to consider vlan_id as well
> to support IPv6 VLAN based link local addresses.
> 
> GID deletion based on only GID comparision is not correct.
> It needs further fixes.
> 
> Fixes: 823b23da7113 ("IB/core: Allow vlan link local address based RoCE GIDs")
> Change-Id: I2e026ec8065c8425ba24fad8525323d112a2f4e4
> Signed-off-by: Parav Pandit <parav@mellanox.com>
>  drivers/infiniband/hw/bnxt_re/qplib_res.c | 5 +++++
>  drivers/infiniband/hw/bnxt_re/qplib_sp.c  | 7 ++++++-
>  drivers/infiniband/hw/bnxt_re/qplib_sp.h  | 1 +
>  3 files changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/infiniband/hw/bnxt_re/qplib_res.c b/drivers/infiniband/hw/bnxt_re/qplib_res.c
> index 37928b1111df..216648b640ce 100644
> +++ b/drivers/infiniband/hw/bnxt_re/qplib_res.c
> @@ -488,6 +488,8 @@ static int bnxt_qplib_alloc_sgid_tbl(struct bnxt_qplib_res *res,
>  				     struct bnxt_qplib_sgid_tbl *sgid_tbl,
>  				     u16 max)
>  {
> +	u16 i;
> +
>  	sgid_tbl->tbl = kcalloc(max, sizeof(struct bnxt_qplib_gid), GFP_KERNEL);
>  	if (!sgid_tbl->tbl)
>  		return -ENOMEM;
> @@ -500,6 +502,9 @@ static int bnxt_qplib_alloc_sgid_tbl(struct bnxt_qplib_res *res,
>  	if (!sgid_tbl->ctx)
>  		goto out_free2;
>  
> +	for (i = 0; i < max; i++)
> +		sgid_tbl->tbl[i].vlan_id = 0xffff;
> +
>  	sgid_tbl->vlan = kcalloc(max, sizeof(u8), GFP_KERNEL);
>  	if (!sgid_tbl->vlan)
>  		goto out_free3;
> diff --git a/drivers/infiniband/hw/bnxt_re/qplib_sp.c b/drivers/infiniband/hw/bnxt_re/qplib_sp.c
> index 48793d3512ac..0d90be88685f 100644
> +++ b/drivers/infiniband/hw/bnxt_re/qplib_sp.c
> @@ -236,6 +236,9 @@ int bnxt_qplib_del_sgid(struct bnxt_qplib_sgid_tbl *sgid_tbl,
>  		return -ENOMEM;
>  	}
>  	for (index = 0; index < sgid_tbl->max; index++) {
> +		/* FIXME: GID delete should happen based on index
> +		 * and refcount
> +		 */
>  		if (!memcmp(&sgid_tbl->tbl[index], gid, sizeof(*gid)))
>  			break;
>  	}
> @@ -296,7 +299,8 @@ int bnxt_qplib_add_sgid(struct bnxt_qplib_sgid_tbl *sgid_tbl,
>  	}
>  	free_idx = sgid_tbl->max;
>  	for (i = 0; i < sgid_tbl->max; i++) {
> -		if (!memcmp(&sgid_tbl->tbl[i], gid, sizeof(*gid))) {
> +		if (!memcmp(&sgid_tbl->tbl[i], gid, sizeof(*gid)) &&
> +		    sgid_tbl->tbl[i].vlan_id == vlan_id) {
>  			dev_dbg(&res->pdev->dev,
>  				"SGID entry already exist in entry %d!\n", i);

bnxt guys: please just delete this duplicate detection code from the
driver. Every GID provided from the core must be programmed into the
given gid table index.

Jason

^ permalink raw reply

* Re: Re: Re: Re: [PATCH] rdma/siw: avoid smp_store_mb() on a u64
From: Jason Gunthorpe @ 2019-07-12 15:32 UTC (permalink / raw)
  To: Bernard Metzler
  Cc: Arnd Bergmann, Doug Ledford, Peter Zijlstra, linux-rdma,
	linux-kernel
In-Reply-To: <OF9F46C3F6.DC3E03FF-ON00258435.00521546-00258435.00549C01@notes.na.collabserv.com>

On Fri, Jul 12, 2019 at 03:24:09PM +0000, Bernard Metzler wrote:
> 
> >To: "Bernard Metzler" <BMT@zurich.ibm.com>
> >From: "Jason Gunthorpe" <jgg@ziepe.ca>
> >Date: 07/12/2019 04:43PM
> >Cc: "Arnd Bergmann" <arnd@arndb.de>, "Doug Ledford"
> ><dledford@redhat.com>, "Peter Zijlstra" <peterz@infradead.org>,
> >linux-rdma@vger.kernel.org, linux-kernel@vger.kernel.org
> >Subject: [EXTERNAL] Re: Re: Re: [PATCH] rdma/siw: avoid
> >smp_store_mb() on a u64
> >
> >On Fri, Jul 12, 2019 at 02:35:50PM +0000, Bernard Metzler wrote:
> >
> >> >This looks wrong to me.. a userspace notification re-arm cannot be
> >> >lost, so have a split READ/TEST/WRITE sequence can't possibly
> >work?
> >> >
> >> >I'd expect an atomic test and clear here?
> >> 
> >> We cannot avoid the case that the application re-arms the
> >> CQ only after a CQE got placed. That is why folks are polling the
> >> CQ once after re-arming it - to make sure they do not miss the
> >> very last and single CQE which would have produced a CQ event.
> >
> >That is different, that is re-arm happing after a CQE placement and
> >this can't be fixed.
> >
> >What I said is that a re-arm from userspace cannot be lost. So you
> >can't blindly clear the arm flag with the WRITE_ONCE. It might be OK
> >beacuse of the if, but...
> >
> >It is just goofy to write it without a 'test and clear' atomic. If
> >the
> >writer side consumes the notify it should always be done atomically.
> >
> Hmmm, I don't yet get why we should test and clear atomically, if we
> clear anyway - is it because we want to avoid clearing a re-arm which
> happens just after testing and before clearing?

It is just clearer as to the intent.. 

Are you trying to optimize away an atomic or something? That might
work better as a dual counter scheme.

> Another complication -- test_and_set_bit() operates on a single
> bit, but we have to test two bits, and reset both, if one is
> set.

Why are two bits needed to represent armed and !armed?

Jason

^ permalink raw reply

* Re: User SIW fails matching device
From: Jason Gunthorpe @ 2019-07-12 15:30 UTC (permalink / raw)
  To: Potnuri Bharat Teja
  Cc: linux-rdma@vger.kernel.org, BMT@zurich.ibm.com,
	monis@mellanox.com, Nirranjan Kirubaharan
In-Reply-To: <20190712152418.GA16331@chelsio.com>

On Fri, Jul 12, 2019 at 08:54:20PM +0530, Potnuri Bharat Teja wrote:
> On Friday, July 07/12/19, 2019 at 20:05:46 +0530, Jason Gunthorpe wrote:
> > On Fri, Jul 12, 2019 at 07:57:19PM +0530, Potnuri Bharat Teja wrote:
> > > Hi all,
> > > I observe the following behavior on one of my machines configured for siw.
> > > 
> > > Issue:
> > > SIW device gets wrong device ops (HW/real rdma driver device ops) instead of
> > > siw device ops due to improper device matching.
> > > 
> > > Root-cause:
> > > In libibverbs, during user cma initialisation, for each entry from the driver 
> > > list, sysfs device is checked for matching name or device.
> > > If the siw/rxe driver is at the head of the list, then sysfs device matches 
> > > properly with the corresponding siw driver and gets the corresponding siw/rxe 
> > > device ops. Now, If the siw/rxe driver is after the real HW driver cxgb4/mlx5 
> > > respectively in the driver list, then siw sysfs device matches pci device and 
> > > wrongly gets the device ops of HW driver (cxgb4/mlx5).
> > > 
> > > Below debug prints from verbs_register_driver() and driver_list entries, where 
> > > siw is after cxgb4. I see verbs alloc context landing in cxgb4_alloc_context 
> > > instead of siw_alloc_context, thus breaking user siw.
> > > 
> > > <debug> verbs_register_driver_22: 184: driver 0x176e370
> > > <debug> verbs_register_driver_22: 185: name ipathverbs
> > > <debug> verbs_register_driver_22: 184: driver 0x176f6a0
> > > <debug> verbs_register_driver_22: 185: name cxgb4
> > > <debug> verbs_register_driver_22: 184: driver 0x176fd50
> > > <debug> verbs_register_driver_22: 185: name cxgb3
> > > <debug> verbs_register_driver_22: 184: driver 0x1777020
> > > <debug> verbs_register_driver_22: 185: name rxe
> > > <debug> verbs_register_driver_22: 184: driver 0x1770a30
> > > <debug> verbs_register_driver_22: 185: name siw
> > > <debug> verbs_register_driver_22: 184: driver 0x1771120
> > > <debug> verbs_register_driver_22: 185: name mlx4
> > > <debug> verbs_register_driver_22: 184: driver 0x1771990
> > > <debug> verbs_register_driver_22: 185: name mlx5
> > > <debug> verbs_register_driver_22: 184: driver 0x1771ff0
> > > <debug> verbs_register_driver_22: 185: name efa
> > > 
> > > <debug> try_drivers: 372: driver 0x176e370, sysfs_dev 0x1776b20, name: ipathverbs
> > > <debug> try_drivers: 372: driver 0x176f6a0, sysfs_dev 0x1776b20, name: cxgb4
> > > <debug> try_drivers: 372: driver 0x176fd50, sysfs_dev 0x1776b20, name: cxgb3
> > > <debug> try_drivers: 372: driver 0x1777020, sysfs_dev 0x1776b20, name: rxe
> > > <debug> try_drivers: 372: driver 0x1770a30, sysfs_dev 0x1776b20, name: siw
> > > <debug> try_drivers: 372: driver 0x1771120, sysfs_dev 0x1776b20, name: mlx4
> > > <debug> try_drivers: 372: driver 0x1771990, sysfs_dev 0x1776b20, name: mlx5
> > > <debug> try_drivers: 372: driver 0x1771ff0, sysfs_dev 0x1776b20, name: efa
> > > 
> > > Proposed fix:
> > > I have the below fix that works. It adds siw/rxe driver to the HEAD of the 
> > > driver list and the rest to the tail. I am not sure if this fix is the ideal 
> > > one, so I am attaching it to this mail.
> > 
> > Update your rdma-core to latest and this will be fixed fully by using
> > netlink to match the siw device..
> > 
> I pulled the latest rdma-core, still see the issue.
> 
> commit 7ef6077ec3201f661458297fea776746ba752843 (HEAD, upstream/master)
> Merge: 837954ff677c 95934b61a74e
> Author: Jason Gunthorpe <jgg@mellanox.com>
> Date:   Thu Jul 11 16:18:06 2019 -0300
> 
>     Merge pull request #539 from jgunthorpe/netlink
> 
>         Use netlink to learn about ibdevs and their related chardevs
> 
> 
> Is there any corresponding kernel change or package dependency? I am currently 
> on Doug's wip/dl-for-next branch.

That should be good enough for the kernel.. Hmm.. The siw stuff didn't
get updated, you need this rdma-core patch too. Please confirm

diff --git a/providers/siw/siw.c b/providers/siw/siw.c
index 23e4dd976caf84..41f33fa16123e9 100644
--- a/providers/siw/siw.c
+++ b/providers/siw/siw.c
@@ -907,7 +907,7 @@ static void siw_device_free(struct verbs_device *vdev)
 }
 
 static const struct verbs_match_ent rnic_table[] = {
-	VERBS_NAME_MATCH("siw", NULL),
+	VERBS_DRIVER_ID(RDMA_DRIVER_SIW),
 	{},
 };
 

^ permalink raw reply related

* Re: User SIW fails matching device
From: Potnuri Bharat Teja @ 2019-07-12 15:24 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-rdma@vger.kernel.org, BMT@zurich.ibm.com,
	monis@mellanox.com, Nirranjan Kirubaharan
In-Reply-To: <20190712143546.GD27512@ziepe.ca>

On Friday, July 07/12/19, 2019 at 20:05:46 +0530, Jason Gunthorpe wrote:
> On Fri, Jul 12, 2019 at 07:57:19PM +0530, Potnuri Bharat Teja wrote:
> > Hi all,
> > I observe the following behavior on one of my machines configured for siw.
> > 
> > Issue:
> > SIW device gets wrong device ops (HW/real rdma driver device ops) instead of
> > siw device ops due to improper device matching.
> > 
> > Root-cause:
> > In libibverbs, during user cma initialisation, for each entry from the driver 
> > list, sysfs device is checked for matching name or device.
> > If the siw/rxe driver is at the head of the list, then sysfs device matches 
> > properly with the corresponding siw driver and gets the corresponding siw/rxe 
> > device ops. Now, If the siw/rxe driver is after the real HW driver cxgb4/mlx5 
> > respectively in the driver list, then siw sysfs device matches pci device and 
> > wrongly gets the device ops of HW driver (cxgb4/mlx5).
> > 
> > Below debug prints from verbs_register_driver() and driver_list entries, where 
> > siw is after cxgb4. I see verbs alloc context landing in cxgb4_alloc_context 
> > instead of siw_alloc_context, thus breaking user siw.
> > 
> > <debug> verbs_register_driver_22: 184: driver 0x176e370
> > <debug> verbs_register_driver_22: 185: name ipathverbs
> > <debug> verbs_register_driver_22: 184: driver 0x176f6a0
> > <debug> verbs_register_driver_22: 185: name cxgb4
> > <debug> verbs_register_driver_22: 184: driver 0x176fd50
> > <debug> verbs_register_driver_22: 185: name cxgb3
> > <debug> verbs_register_driver_22: 184: driver 0x1777020
> > <debug> verbs_register_driver_22: 185: name rxe
> > <debug> verbs_register_driver_22: 184: driver 0x1770a30
> > <debug> verbs_register_driver_22: 185: name siw
> > <debug> verbs_register_driver_22: 184: driver 0x1771120
> > <debug> verbs_register_driver_22: 185: name mlx4
> > <debug> verbs_register_driver_22: 184: driver 0x1771990
> > <debug> verbs_register_driver_22: 185: name mlx5
> > <debug> verbs_register_driver_22: 184: driver 0x1771ff0
> > <debug> verbs_register_driver_22: 185: name efa
> > 
> > <debug> try_drivers: 372: driver 0x176e370, sysfs_dev 0x1776b20, name: ipathverbs
> > <debug> try_drivers: 372: driver 0x176f6a0, sysfs_dev 0x1776b20, name: cxgb4
> > <debug> try_drivers: 372: driver 0x176fd50, sysfs_dev 0x1776b20, name: cxgb3
> > <debug> try_drivers: 372: driver 0x1777020, sysfs_dev 0x1776b20, name: rxe
> > <debug> try_drivers: 372: driver 0x1770a30, sysfs_dev 0x1776b20, name: siw
> > <debug> try_drivers: 372: driver 0x1771120, sysfs_dev 0x1776b20, name: mlx4
> > <debug> try_drivers: 372: driver 0x1771990, sysfs_dev 0x1776b20, name: mlx5
> > <debug> try_drivers: 372: driver 0x1771ff0, sysfs_dev 0x1776b20, name: efa
> > 
> > Proposed fix:
> > I have the below fix that works. It adds siw/rxe driver to the HEAD of the 
> > driver list and the rest to the tail. I am not sure if this fix is the ideal 
> > one, so I am attaching it to this mail.
> 
> Update your rdma-core to latest and this will be fixed fully by using
> netlink to match the siw device..
> 
I pulled the latest rdma-core, still see the issue.

commit 7ef6077ec3201f661458297fea776746ba752843 (HEAD, upstream/master)
Merge: 837954ff677c 95934b61a74e
Author: Jason Gunthorpe <jgg@mellanox.com>
Date:   Thu Jul 11 16:18:06 2019 -0300

    Merge pull request #539 from jgunthorpe/netlink

        Use netlink to learn about ibdevs and their related chardevs

-----------

Is there any corresponding kernel change or package dependency? I am currently 
on Doug's wip/dl-for-next branch.

^ permalink raw reply

* Re:  Re: Re: Re: [PATCH] rdma/siw: avoid smp_store_mb() on a u64
From: Bernard Metzler @ 2019-07-12 15:24 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Arnd Bergmann, Doug Ledford, Peter Zijlstra, linux-rdma,
	linux-kernel
In-Reply-To: <20190712144257.GE27512@ziepe.ca>

-----"Jason Gunthorpe" <jgg@ziepe.ca> wrote: -----

>To: "Bernard Metzler" <BMT@zurich.ibm.com>
>From: "Jason Gunthorpe" <jgg@ziepe.ca>
>Date: 07/12/2019 04:43PM
>Cc: "Arnd Bergmann" <arnd@arndb.de>, "Doug Ledford"
><dledford@redhat.com>, "Peter Zijlstra" <peterz@infradead.org>,
>linux-rdma@vger.kernel.org, linux-kernel@vger.kernel.org
>Subject: [EXTERNAL] Re: Re: Re: [PATCH] rdma/siw: avoid
>smp_store_mb() on a u64
>
>On Fri, Jul 12, 2019 at 02:35:50PM +0000, Bernard Metzler wrote:
>
>> >This looks wrong to me.. a userspace notification re-arm cannot be
>> >lost, so have a split READ/TEST/WRITE sequence can't possibly
>work?
>> >
>> >I'd expect an atomic test and clear here?
>> 
>> We cannot avoid the case that the application re-arms the
>> CQ only after a CQE got placed. That is why folks are polling the
>> CQ once after re-arming it - to make sure they do not miss the
>> very last and single CQE which would have produced a CQ event.
>
>That is different, that is re-arm happing after a CQE placement and
>this can't be fixed.
>
>What I said is that a re-arm from userspace cannot be lost. So you
>can't blindly clear the arm flag with the WRITE_ONCE. It might be OK
>beacuse of the if, but...
>
>It is just goofy to write it without a 'test and clear' atomic. If
>the
>writer side consumes the notify it should always be done atomically.
>
Hmmm, I don't yet get why we should test and clear atomically, if we
clear anyway - is it because we want to avoid clearing a re-arm which
happens just after testing and before clearing?
(1) If the test was positive, we will call the CQ event handler,
and per RDMA verbs spec, the application MUST re-arm the CQ after it
got a CQ event, to get another one. So clearing it sometimes before
calling the handler is right.
(2) If the test was negative, a test and reset would not change
anything.

Another complication -- test_and_set_bit() operates on a single
bit, but we have to test two bits, and reset both, if one is
set. Can we do that atomically, if we test the bits conditionally?
I didn't find anything appropriate.

>And then I think all the weird barriers go away
>
>> >> @@ -1141,11 +1145,17 @@ int siw_req_notify_cq(struct ib_cq
>> >*base_cq, enum ib_cq_notify_flags flags)
>> >>  	siw_dbg_cq(cq, "flags: 0x%02x\n", flags);
>> >>  
>> >>  	if ((flags & IB_CQ_SOLICITED_MASK) == IB_CQ_SOLICITED)
>> >> -		/* CQ event for next solicited completion */
>> >> -		smp_store_mb(*cq->notify, SIW_NOTIFY_SOLICITED);
>> >> +		/*
>> >> +		 * Enable CQ event for next solicited completion.
>> >> +		 * and make it visible to all associated producers.
>> >> +		 */
>> >> +		smp_store_mb(cq->notify->flags, SIW_NOTIFY_SOLICITED);
>> >
>> >But what is the 2nd piece of data to motivate the smp_store_mb?
>> 
>> Another core (such as a concurrent RX operation) shall see this
>> CQ being re-armed asap.
>
>'ASAP' is not a '2nd piece of data'. 
>
>AFAICT this requirement is just a normal atomic set_bit which does
>also expedite making the change visible?

Absolutely!!
good point....this is just a single flag we are operating on.
And the weird barrier goes away ;)

Many thanks!
Bernard.


^ permalink raw reply

* Re: [PATCH rdma-next] lib/dim: Prevent overflow in calculation of ratio statistics
From: Jason Gunthorpe @ 2019-07-12 15:23 UTC (permalink / raw)
  To: Leon Romanovsky; +Cc: Doug Ledford, RDMA mailing list, Yamin Friedman
In-Reply-To: <20190712060309.GM23598@mtr-leonro.mtl.com>

On Fri, Jul 12, 2019 at 09:03:09AM +0300, Leon Romanovsky wrote:
> On Thu, Jul 11, 2019 at 05:31:14PM +0000, Jason Gunthorpe wrote:
> > On Thu, Jul 11, 2019 at 08:19:22PM +0300, Leon Romanovsky wrote:
> > > On Thu, Jul 11, 2019 at 04:11:07PM +0000, Jason Gunthorpe wrote:
> > > > On Thu, Jul 11, 2019 at 06:47:34PM +0300, Leon Romanovsky wrote:
> > > > > > > diff --git a/lib/dim/dim.c b/lib/dim/dim.c
> > > > > > > index 439d641ec796..38045d6d0538 100644
> > > > > > > +++ b/lib/dim/dim.c
> > > > > > > @@ -74,8 +74,8 @@ void dim_calc_stats(struct dim_sample *start, struct dim_sample *end,
> > > > > > >  					delta_us);
> > > > > > >  	curr_stats->cpms = DIV_ROUND_UP(ncomps * USEC_PER_MSEC, delta_us);
> > > > > > >  	if (curr_stats->epms != 0)
> > > > > > > -		curr_stats->cpe_ratio =
> > > > > > > -				(curr_stats->cpms * 100) / curr_stats->epms;
> > > > > > > +		curr_stats->cpe_ratio = DIV_ROUND_DOWN_ULL(
> > > > > > > +			curr_stats->cpms * 100, curr_stats->epms);
> > > > > >
> > > > > > This will still potentially overfow the 'int' for cpe_ratio if epms <
> > > > > > 100 ?
> > > > >
> > > > > I assumed that assignment to "unsigned long long" will do the trick.
> > > > > https://elixir.bootlin.com/linux/latest/source/include/linux/kernel.h#L94
> > > >
> > > > That only protects the multiply, the result of DIV_ROUND_DOWN_ULL is
> > > > casted to int.
> > >
> > > It is ok, the result is "int" and it will be small, 100 in multiply
> > > represents percentage.
> >
> > Percentage would be divide by 100..
> >
> > Like I said it will overflow if epms < 100 ...
> 
> It is unlikely to happen because cpe_ratio is between 0 to 100 and cpms
> * 100 is not large at all.
> 
> UBSAN error is "theoretical" overflow.

? UBSAN is not theoretical, it only triggers if something actually
happens. So in this case cpms*100 was very large and overflowed. 

Maybe it shouldn't be and that is the actual bug, but if we overflowed
with cpms*100, then epms must be > 100 or we still overflow the
divide.

Jason

^ permalink raw reply

* Re: Re: [PATCH] rdma/siw: avoid smp_store_mb() on a u64
From: Jason Gunthorpe @ 2019-07-12 15:14 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Bernard Metzler, Doug Ledford, Peter Zijlstra, linux-rdma,
	Linux Kernel Mailing List
In-Reply-To: <CAK8P3a3ZqY_qLSN1gw12EvzLS49RAnmG4nT9=N+Qj9XngQd0CA@mail.gmail.com>

On Fri, Jul 12, 2019 at 03:22:35PM +0200, Arnd Bergmann wrote:
> On Fri, Jul 12, 2019 at 3:05 PM Bernard Metzler <BMT@zurich.ibm.com> wrote:
> 
> >
> > We share CQ (completion queue) notification flags between application
> > (which may be user land) and producer (kernel QP's (queue pairs)).
> > Those flags can be written by both application and QP's. The application
> > writes those flags to let the driver know if it shall inform about new
> > work completions. It can write those flags at any time.
> > Only a kernel producer reads those flags to decide if
> > the CQ notification handler shall be kicked, if a new CQ element gets
> > added to the CQ. When kicking the completion handler, the driver resets the
> > notification flag, which must get re-armed by the application.
> >
> > We use READ_ONCE() and WRITE_ONCE(), since the flags are potentially
> > shared (mmap'd) between user and kernel land.
> >
> > siw_req_notify_cq() is being called only by kernel consumers to change
> > (write) the CQ notification state. We use smp_store_mb() to make sure
> > the new value becomes visible to all kernel producers (QP's) asap.
> >
> >
> > From cfb861a09dcfb24a98ba0f1e26bdaa1529d1b006 Mon Sep 17 00:00:00 2001
> > From: Bernard Metzler <bmt@zurich.ibm.com>
> > Date: Fri, 12 Jul 2019 13:19:27 +0200
> > Subject: [PATCH] Make shared CQ notification flags 32bit to respect 32bit
> >  architectures
> >
> > Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com>
> 
> This fixes the build for me, thanks!
> 
> Tested-by: Arnd Bergmann <arnd@arndb.de>

Since this is coming up so late in the merge window, I'm inclined to
take the simple path while Bernard makes a complete solution
here. What do you think Arnd?

From 0b043644c0ca601cb19943a81aa1f1455dbe9461 Mon Sep 17 00:00:00 2001
From: Jason Gunthorpe <jgg@mellanox.com>
Date: Fri, 12 Jul 2019 12:12:06 -0300
Subject: [PATCH] RMDA/siw: Require a 64 bit arch

The new siw driver fails to build on i386 with

drivers/infiniband/sw/siw/siw_qp.c:1025:3: error: invalid output size for constraint '+q'
                smp_store_mb(*cq->notify, SIW_NOTIFY_NOT);

As it is using 64 bit values with the smp_store_mb.

Since the entire scheme here seems questionable, and we are in the merge
window, fix the compile failures by disabling 32 bit support on this
driver.

A proper fix will be reviewed post merge window.

Fixes: c0cf5bdde46c ("rdma/siw: addition to kernel build environment")
Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
---
 drivers/infiniband/sw/siw/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/infiniband/sw/siw/Kconfig b/drivers/infiniband/sw/siw/Kconfig
index b622fc62f2cd6d..dace276aea1413 100644
--- a/drivers/infiniband/sw/siw/Kconfig
+++ b/drivers/infiniband/sw/siw/Kconfig
@@ -1,6 +1,6 @@
 config RDMA_SIW
 	tristate "Software RDMA over TCP/IP (iWARP) driver"
-	depends on INET && INFINIBAND && LIBCRC32C
+	depends on INET && INFINIBAND && LIBCRC32C && 64BIT
 	select DMA_VIRT_OPS
 	help
 	This driver implements the iWARP RDMA transport over
-- 
2.21.0



^ permalink raw reply related

* Re: Re: Re: [PATCH] rdma/siw: avoid smp_store_mb() on a u64
From: Jason Gunthorpe @ 2019-07-12 14:42 UTC (permalink / raw)
  To: Bernard Metzler
  Cc: Arnd Bergmann, Doug Ledford, Peter Zijlstra, linux-rdma,
	linux-kernel
In-Reply-To: <OF3D069E00.E0996A14-ON00258435.004DD8C8-00258435.00502F8C@notes.na.collabserv.com>

On Fri, Jul 12, 2019 at 02:35:50PM +0000, Bernard Metzler wrote:

> >This looks wrong to me.. a userspace notification re-arm cannot be
> >lost, so have a split READ/TEST/WRITE sequence can't possibly work?
> >
> >I'd expect an atomic test and clear here?
> 
> We cannot avoid the case that the application re-arms the
> CQ only after a CQE got placed. That is why folks are polling the
> CQ once after re-arming it - to make sure they do not miss the
> very last and single CQE which would have produced a CQ event.

That is different, that is re-arm happing after a CQE placement and
this can't be fixed.

What I said is that a re-arm from userspace cannot be lost. So you
can't blindly clear the arm flag with the WRITE_ONCE. It might be OK
beacuse of the if, but...

It is just goofy to write it without a 'test and clear' atomic. If the
writer side consumes the notify it should always be done atomically.

And then I think all the weird barriers go away

> >> @@ -1141,11 +1145,17 @@ int siw_req_notify_cq(struct ib_cq
> >*base_cq, enum ib_cq_notify_flags flags)
> >>  	siw_dbg_cq(cq, "flags: 0x%02x\n", flags);
> >>  
> >>  	if ((flags & IB_CQ_SOLICITED_MASK) == IB_CQ_SOLICITED)
> >> -		/* CQ event for next solicited completion */
> >> -		smp_store_mb(*cq->notify, SIW_NOTIFY_SOLICITED);
> >> +		/*
> >> +		 * Enable CQ event for next solicited completion.
> >> +		 * and make it visible to all associated producers.
> >> +		 */
> >> +		smp_store_mb(cq->notify->flags, SIW_NOTIFY_SOLICITED);
> >
> >But what is the 2nd piece of data to motivate the smp_store_mb?
> 
> Another core (such as a concurrent RX operation) shall see this
> CQ being re-armed asap.

'ASAP' is not a '2nd piece of data'. 

AFAICT this requirement is just a normal atomic set_bit which does
also expedite making the change visible?
 
Jason

^ permalink raw reply

* Re:  Re: Re: [PATCH] rdma/siw: avoid smp_store_mb() on a u64
From: Bernard Metzler @ 2019-07-12 14:35 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Arnd Bergmann, Doug Ledford, Peter Zijlstra, linux-rdma,
	linux-kernel
In-Reply-To: <20190712135339.GC27512@ziepe.ca>

-----"Jason Gunthorpe" <jgg@ziepe.ca> wrote: -----

>To: "Bernard Metzler" <BMT@zurich.ibm.com>
>From: "Jason Gunthorpe" <jgg@ziepe.ca>
>Date: 07/12/2019 03:53PM
>Cc: "Arnd Bergmann" <arnd@arndb.de>, "Doug Ledford"
><dledford@redhat.com>, "Peter Zijlstra" <peterz@infradead.org>,
>linux-rdma@vger.kernel.org, linux-kernel@vger.kernel.org
>Subject: [EXTERNAL] Re: Re: [PATCH] rdma/siw: avoid smp_store_mb() on
>a u64
>
>On Fri, Jul 12, 2019 at 01:05:14PM +0000, Bernard Metzler wrote:
>> 
>> >To: "Bernard Metzler" <BMT@zurich.ibm.com>
>> >From: "Jason Gunthorpe" <jgg@ziepe.ca>
>> >Date: 07/12/2019 02:03PM
>> >Cc: "Arnd Bergmann" <arnd@arndb.de>, "Doug Ledford"
>> ><dledford@redhat.com>, "Peter Zijlstra" <peterz@infradead.org>,
>> >linux-rdma@vger.kernel.org, linux-kernel@vger.kernel.org
>> >Subject: [EXTERNAL] Re: [PATCH] rdma/siw: avoid smp_store_mb() on
>a
>> >u64
>> >
>> >On Fri, Jul 12, 2019 at 11:33:46AM +0000, Bernard Metzler wrote:
>> >> >diff --git a/drivers/infiniband/sw/siw/siw_verbs.c
>> >> >b/drivers/infiniband/sw/siw/siw_verbs.c
>> >> >index 32dc79d0e898..41c5ab293fe1 100644
>> >> >+++ b/drivers/infiniband/sw/siw/siw_verbs.c
>> >> >@@ -1142,10 +1142,11 @@ int siw_req_notify_cq(struct ib_cq
>> >*base_cq,
>> >> >enum ib_cq_notify_flags flags)
>> >> > 
>> >> > 	if ((flags & IB_CQ_SOLICITED_MASK) == IB_CQ_SOLICITED)
>> >> > 		/* CQ event for next solicited completion */
>> >> >-		smp_store_mb(*cq->notify, SIW_NOTIFY_SOLICITED);
>> >> >+		WRITE_ONCE(*cq->notify, SIW_NOTIFY_SOLICITED);
>> >> > 	else
>> >> > 		/* CQ event for any signalled completion */
>> >> >-		smp_store_mb(*cq->notify, SIW_NOTIFY_ALL);
>> >> >+		WRITE_ONCE(*cq->notify, SIW_NOTIFY_ALL);
>> >> >+	smp_wmb();
>> >> > 
>> >> > 	if (flags & IB_CQ_REPORT_MISSED_EVENTS)
>> >> > 		return cq->cq_put - cq->cq_get;
>> >> 
>> >> 
>> >> Hi Arnd,
>> >> Many thanks for pointing that out! Indeed, this CQ notification
>> >> mechanism does not take 32 bit architectures into account.
>> >> Since we have only three flags to hold here, it's probably
>better
>> >> to make it a 32bit value. That would remove the issue w/o
>> >> introducing extra smp_wmb(). 
>> >
>> >I also prefer not to see smp_wmb() in drivers..
>> >
>> >> I'd prefer smp_store_mb(), since on some architectures it shall
>be
>> >> more efficient.  That would also make it sufficient to use
>> >> READ_ONCE.
>> >
>> >The READ_ONCE is confusing to me too, if you need store_release
>> >semantics then the reader also needs to pair with load_acquite -
>> >otherwise it doesn't work.
>> >
>> >Still, we need to do something rapidly to fix the i386 build,
>please
>> >revise right away..
>> >
>> >Jason
>> >
>> >
>> 
>> We share CQ (completion queue) notification flags between
>application
>> (which may be user land) and producer (kernel QP's (queue pairs)).
>> Those flags can be written by both application and QP's. The
>application
>> writes those flags to let the driver know if it shall inform about
>new
>> work completions. It can write those flags at any time.
>> Only a kernel producer reads those flags to decide if
>> the CQ notification handler shall be kicked, if a new CQ element
>gets
>> added to the CQ. When kicking the completion handler, the driver
>resets the
>> notification flag, which must get re-armed by the application.
>
>This looks wrong to me.. a userspace notification re-arm cannot be
>lost, so have a split READ/TEST/WRITE sequence can't possibly work?
>
>I'd expect an atomic test and clear here?

We cannot avoid the case that the application re-arms the
CQ only after a CQE got placed. That is why folks are polling the
CQ once after re-arming it - to make sure they do not miss the
very last and single CQE which would have produced a CQ event.

I do not think an atomic test and clear would change that picture.

Also, per RDMA verbs semantics, if the user would re-arm the CQ
more then once before it gets a CQ event, it will still get only one
CQ event if a new CQ element becomes ready. 
>
>
>> @@ -1141,11 +1145,17 @@ int siw_req_notify_cq(struct ib_cq
>*base_cq, enum ib_cq_notify_flags flags)
>>  	siw_dbg_cq(cq, "flags: 0x%02x\n", flags);
>>  
>>  	if ((flags & IB_CQ_SOLICITED_MASK) == IB_CQ_SOLICITED)
>> -		/* CQ event for next solicited completion */
>> -		smp_store_mb(*cq->notify, SIW_NOTIFY_SOLICITED);
>> +		/*
>> +		 * Enable CQ event for next solicited completion.
>> +		 * and make it visible to all associated producers.
>> +		 */
>> +		smp_store_mb(cq->notify->flags, SIW_NOTIFY_SOLICITED);
>
>But what is the 2nd piece of data to motivate the smp_store_mb?

Another core (such as a concurrent RX operation) shall see this
CQ being re-armed asap.

Best,
Bernard.
>
>Jason
>
>


^ permalink raw reply

* Re: User SIW fails matching device
From: Jason Gunthorpe @ 2019-07-12 14:35 UTC (permalink / raw)
  To: Potnuri Bharat Teja; +Cc: linux-rdma, BMT, monis, nirranjan
In-Reply-To: <20190712142718.GA26697@chelsio.com>

On Fri, Jul 12, 2019 at 07:57:19PM +0530, Potnuri Bharat Teja wrote:
> Hi all,
> I observe the following behavior on one of my machines configured for siw.
> 
> Issue:
> SIW device gets wrong device ops (HW/real rdma driver device ops) instead of
> siw device ops due to improper device matching.
> 
> Root-cause:
> In libibverbs, during user cma initialisation, for each entry from the driver 
> list, sysfs device is checked for matching name or device.
> If the siw/rxe driver is at the head of the list, then sysfs device matches 
> properly with the corresponding siw driver and gets the corresponding siw/rxe 
> device ops. Now, If the siw/rxe driver is after the real HW driver cxgb4/mlx5 
> respectively in the driver list, then siw sysfs device matches pci device and 
> wrongly gets the device ops of HW driver (cxgb4/mlx5).
> 
> Below debug prints from verbs_register_driver() and driver_list entries, where 
> siw is after cxgb4. I see verbs alloc context landing in cxgb4_alloc_context 
> instead of siw_alloc_context, thus breaking user siw.
> 
> <debug> verbs_register_driver_22: 184: driver 0x176e370
> <debug> verbs_register_driver_22: 185: name ipathverbs
> <debug> verbs_register_driver_22: 184: driver 0x176f6a0
> <debug> verbs_register_driver_22: 185: name cxgb4
> <debug> verbs_register_driver_22: 184: driver 0x176fd50
> <debug> verbs_register_driver_22: 185: name cxgb3
> <debug> verbs_register_driver_22: 184: driver 0x1777020
> <debug> verbs_register_driver_22: 185: name rxe
> <debug> verbs_register_driver_22: 184: driver 0x1770a30
> <debug> verbs_register_driver_22: 185: name siw
> <debug> verbs_register_driver_22: 184: driver 0x1771120
> <debug> verbs_register_driver_22: 185: name mlx4
> <debug> verbs_register_driver_22: 184: driver 0x1771990
> <debug> verbs_register_driver_22: 185: name mlx5
> <debug> verbs_register_driver_22: 184: driver 0x1771ff0
> <debug> verbs_register_driver_22: 185: name efa
> 
> <debug> try_drivers: 372: driver 0x176e370, sysfs_dev 0x1776b20, name: ipathverbs
> <debug> try_drivers: 372: driver 0x176f6a0, sysfs_dev 0x1776b20, name: cxgb4
> <debug> try_drivers: 372: driver 0x176fd50, sysfs_dev 0x1776b20, name: cxgb3
> <debug> try_drivers: 372: driver 0x1777020, sysfs_dev 0x1776b20, name: rxe
> <debug> try_drivers: 372: driver 0x1770a30, sysfs_dev 0x1776b20, name: siw
> <debug> try_drivers: 372: driver 0x1771120, sysfs_dev 0x1776b20, name: mlx4
> <debug> try_drivers: 372: driver 0x1771990, sysfs_dev 0x1776b20, name: mlx5
> <debug> try_drivers: 372: driver 0x1771ff0, sysfs_dev 0x1776b20, name: efa
> 
> Proposed fix:
> I have the below fix that works. It adds siw/rxe driver to the HEAD of the 
> driver list and the rest to the tail. I am not sure if this fix is the ideal 
> one, so I am attaching it to this mail.

Update your rdma-core to latest and this will be fixed fully by using
netlink to match the siw device..

Jason

^ permalink raw reply

* [rdma-core patch v2] srp_daemon: improve the debug message for is_enabled_by_rules_file
From: Honggang Li @ 2019-07-12 14:31 UTC (permalink / raw)
  To: bvanassche; +Cc: linux-rdma, Honggang Li

Signed-off-by: Honggang Li <honli@redhat.com>
---
 srp_daemon/srp_daemon.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/srp_daemon/srp_daemon.c b/srp_daemon/srp_daemon.c
index a004f6a4..e85b9668 100644
--- a/srp_daemon/srp_daemon.c
+++ b/srp_daemon/srp_daemon.c
@@ -349,10 +349,11 @@ static int is_enabled_by_rules_file(struct target_details *target)
 	int rule;
 	struct config_t *conf = config;
 
-	if (NULL == conf->rules)
+	if (NULL == conf->rules) {
+		pr_debug("Allowing SRP target with id_ext %s because not using a rules file\n", target->id_ext);
 		return 1;
+	}
 
-	pr_debug("Found an SRP target with id_ext %s - check if it allowed by rules file\n", target->id_ext);
 	rule = -1;
 	do {
 		rule++;
@@ -392,6 +393,9 @@ static int is_enabled_by_rules_file(struct target_details *target)
 
 		target->options = conf->rules[rule].options;
 
+		pr_debug("SRP target with id_ext %s %s by rules file\n",
+				target->id_ext,
+				conf->rules[rule].allow ? "allowed" : "disallowed");
 		return conf->rules[rule].allow;
 
 	} while (1);
-- 
2.20.1


^ permalink raw reply related

* User SIW fails matching device
From: Potnuri Bharat Teja @ 2019-07-12 14:27 UTC (permalink / raw)
  To: jgg, linux-rdma; +Cc: BMT, monis, nirranjan, bharat

Hi all,
I observe the following behavior on one of my machines configured for siw.

Issue:
SIW device gets wrong device ops (HW/real rdma driver device ops) instead of
siw device ops due to improper device matching.

Root-cause:
In libibverbs, during user cma initialisation, for each entry from the driver 
list, sysfs device is checked for matching name or device.
If the siw/rxe driver is at the head of the list, then sysfs device matches 
properly with the corresponding siw driver and gets the corresponding siw/rxe 
device ops. Now, If the siw/rxe driver is after the real HW driver cxgb4/mlx5 
respectively in the driver list, then siw sysfs device matches pci device and 
wrongly gets the device ops of HW driver (cxgb4/mlx5).

Below debug prints from verbs_register_driver() and driver_list entries, where 
siw is after cxgb4. I see verbs alloc context landing in cxgb4_alloc_context 
instead of siw_alloc_context, thus breaking user siw.

<debug> verbs_register_driver_22: 184: driver 0x176e370
<debug> verbs_register_driver_22: 185: name ipathverbs
<debug> verbs_register_driver_22: 184: driver 0x176f6a0
<debug> verbs_register_driver_22: 185: name cxgb4
<debug> verbs_register_driver_22: 184: driver 0x176fd50
<debug> verbs_register_driver_22: 185: name cxgb3
<debug> verbs_register_driver_22: 184: driver 0x1777020
<debug> verbs_register_driver_22: 185: name rxe
<debug> verbs_register_driver_22: 184: driver 0x1770a30
<debug> verbs_register_driver_22: 185: name siw
<debug> verbs_register_driver_22: 184: driver 0x1771120
<debug> verbs_register_driver_22: 185: name mlx4
<debug> verbs_register_driver_22: 184: driver 0x1771990
<debug> verbs_register_driver_22: 185: name mlx5
<debug> verbs_register_driver_22: 184: driver 0x1771ff0
<debug> verbs_register_driver_22: 185: name efa

<debug> try_drivers: 372: driver 0x176e370, sysfs_dev 0x1776b20, name: ipathverbs
<debug> try_drivers: 372: driver 0x176f6a0, sysfs_dev 0x1776b20, name: cxgb4
<debug> try_drivers: 372: driver 0x176fd50, sysfs_dev 0x1776b20, name: cxgb3
<debug> try_drivers: 372: driver 0x1777020, sysfs_dev 0x1776b20, name: rxe
<debug> try_drivers: 372: driver 0x1770a30, sysfs_dev 0x1776b20, name: siw
<debug> try_drivers: 372: driver 0x1771120, sysfs_dev 0x1776b20, name: mlx4
<debug> try_drivers: 372: driver 0x1771990, sysfs_dev 0x1776b20, name: mlx5
<debug> try_drivers: 372: driver 0x1771ff0, sysfs_dev 0x1776b20, name: efa

Proposed fix:
I have the below fix that works. It adds siw/rxe driver to the HEAD of the 
driver list and the rest to the tail. I am not sure if this fix is the ideal 
one, so I am attaching it to this mail.

diff --git a/libibverbs/init.c b/libibverbs/init.c
index 930d91811ca9..e44f0d743063 100644
--- a/libibverbs/init.c
+++ b/libibverbs/init.c
@@ -182,7 +182,10 @@ void verbs_register_driver(const struct verbs_device_ops *ops)

	driver->ops = ops;

-       list_add_tail(&driver_list, &driver->entry);
+       if (!strcmp(ops->name, "siw") || !strcmp(ops->name, "rxe"))
+               list_add(&driver_list, &driver->entry);
+       else
+               list_add_tail(&driver_list, &driver->entry);
 }
---

Thanks,
Bharat

^ permalink raw reply related

* Re: Re: [PATCH] rdma/siw: avoid smp_store_mb() on a u64
From: Jason Gunthorpe @ 2019-07-12 13:53 UTC (permalink / raw)
  To: Bernard Metzler
  Cc: Arnd Bergmann, Doug Ledford, Peter Zijlstra, linux-rdma,
	linux-kernel
In-Reply-To: <OF36428621.B839DE8B-ON00258435.00461748-00258435.0047E413@notes.na.collabserv.com>

On Fri, Jul 12, 2019 at 01:05:14PM +0000, Bernard Metzler wrote:
> 
> >To: "Bernard Metzler" <BMT@zurich.ibm.com>
> >From: "Jason Gunthorpe" <jgg@ziepe.ca>
> >Date: 07/12/2019 02:03PM
> >Cc: "Arnd Bergmann" <arnd@arndb.de>, "Doug Ledford"
> ><dledford@redhat.com>, "Peter Zijlstra" <peterz@infradead.org>,
> >linux-rdma@vger.kernel.org, linux-kernel@vger.kernel.org
> >Subject: [EXTERNAL] Re: [PATCH] rdma/siw: avoid smp_store_mb() on a
> >u64
> >
> >On Fri, Jul 12, 2019 at 11:33:46AM +0000, Bernard Metzler wrote:
> >> >diff --git a/drivers/infiniband/sw/siw/siw_verbs.c
> >> >b/drivers/infiniband/sw/siw/siw_verbs.c
> >> >index 32dc79d0e898..41c5ab293fe1 100644
> >> >+++ b/drivers/infiniband/sw/siw/siw_verbs.c
> >> >@@ -1142,10 +1142,11 @@ int siw_req_notify_cq(struct ib_cq
> >*base_cq,
> >> >enum ib_cq_notify_flags flags)
> >> > 
> >> > 	if ((flags & IB_CQ_SOLICITED_MASK) == IB_CQ_SOLICITED)
> >> > 		/* CQ event for next solicited completion */
> >> >-		smp_store_mb(*cq->notify, SIW_NOTIFY_SOLICITED);
> >> >+		WRITE_ONCE(*cq->notify, SIW_NOTIFY_SOLICITED);
> >> > 	else
> >> > 		/* CQ event for any signalled completion */
> >> >-		smp_store_mb(*cq->notify, SIW_NOTIFY_ALL);
> >> >+		WRITE_ONCE(*cq->notify, SIW_NOTIFY_ALL);
> >> >+	smp_wmb();
> >> > 
> >> > 	if (flags & IB_CQ_REPORT_MISSED_EVENTS)
> >> > 		return cq->cq_put - cq->cq_get;
> >> 
> >> 
> >> Hi Arnd,
> >> Many thanks for pointing that out! Indeed, this CQ notification
> >> mechanism does not take 32 bit architectures into account.
> >> Since we have only three flags to hold here, it's probably better
> >> to make it a 32bit value. That would remove the issue w/o
> >> introducing extra smp_wmb(). 
> >
> >I also prefer not to see smp_wmb() in drivers..
> >
> >> I'd prefer smp_store_mb(), since on some architectures it shall be
> >> more efficient.  That would also make it sufficient to use
> >> READ_ONCE.
> >
> >The READ_ONCE is confusing to me too, if you need store_release
> >semantics then the reader also needs to pair with load_acquite -
> >otherwise it doesn't work.
> >
> >Still, we need to do something rapidly to fix the i386 build, please
> >revise right away..
> >
> >Jason
> >
> >
> 
> We share CQ (completion queue) notification flags between application
> (which may be user land) and producer (kernel QP's (queue pairs)).
> Those flags can be written by both application and QP's. The application
> writes those flags to let the driver know if it shall inform about new
> work completions. It can write those flags at any time.
> Only a kernel producer reads those flags to decide if
> the CQ notification handler shall be kicked, if a new CQ element gets
> added to the CQ. When kicking the completion handler, the driver resets the
> notification flag, which must get re-armed by the application.

This looks wrong to me.. a userspace notification re-arm cannot be
lost, so have a split READ/TEST/WRITE sequence can't possibly work?

I'd expect an atomic test and clear here?


> @@ -1141,11 +1145,17 @@ int siw_req_notify_cq(struct ib_cq *base_cq, enum ib_cq_notify_flags flags)
>  	siw_dbg_cq(cq, "flags: 0x%02x\n", flags);
>  
>  	if ((flags & IB_CQ_SOLICITED_MASK) == IB_CQ_SOLICITED)
> -		/* CQ event for next solicited completion */
> -		smp_store_mb(*cq->notify, SIW_NOTIFY_SOLICITED);
> +		/*
> +		 * Enable CQ event for next solicited completion.
> +		 * and make it visible to all associated producers.
> +		 */
> +		smp_store_mb(cq->notify->flags, SIW_NOTIFY_SOLICITED);

But what is the 2nd piece of data to motivate the smp_store_mb?

Jason

^ permalink raw reply

* Re:  Re: Re: [PATCH] rdma/siw: avoid smp_store_mb() on a u64
From: Bernard Metzler @ 2019-07-12 13:35 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Jason Gunthorpe, Arnd Bergmann, Doug Ledford, linux-rdma,
	linux-kernel
In-Reply-To: <20190712131930.GT3419@hirez.programming.kicks-ass.net>

-----"Peter Zijlstra" <peterz@infradead.org> wrote: -----

>To: "Bernard Metzler" <BMT@zurich.ibm.com>
>From: "Peter Zijlstra" <peterz@infradead.org>
>Date: 07/12/2019 03:19PM
>Cc: "Jason Gunthorpe" <jgg@ziepe.ca>, "Arnd Bergmann"
><arnd@arndb.de>, "Doug Ledford" <dledford@redhat.com>,
>linux-rdma@vger.kernel.org, linux-kernel@vger.kernel.org
>Subject: [EXTERNAL] Re: Re: [PATCH] rdma/siw: avoid smp_store_mb() on
>a u64
>
>On Fri, Jul 12, 2019 at 01:05:14PM +0000, Bernard Metzler wrote:
>> @@ -1131,6 +1131,10 @@ int siw_poll_cq(struct ib_cq *base_cq, int
>num_cqe, struct ib_wc *wc)
>>   *   number of not reaped CQE's regardless of its notification
>>   *   type and current or new CQ notification settings.
>>   *
>> + * This function gets called only by kernel consumers.
>> + * Notification state must immediately become visible to all
>> + * associated kernel producers (QP's).
>
>No amount of memory barriers can achieve that goal.
>
>
Oh right, that is correct. And it is not needed. This statement
is to bold. We simply want it asap. Actually, per API semantics, there is
no ordering guarantee between creating a completion queue event and
concurrently arming/disarming the completion queue. Since it is
not possible.
I use the barrier to minimize the likelihood a CQ event is missed
since the CQ is not yet visible to be re-armed to all producers.
But it can happen. This is what this optional
IB_CQ_REPORT_MISSED_EVENTS right below the re-arming is for.

Would be OK if we just adapt the comment from 'must immediately become visible'
to 'shall become visible asap'. That would reflect the intention.

Thanks very much!
Bernard.

int siw_req_notify_cq(struct ib_cq *base_cq, enum ib_cq_notify_flags flags)
{
        struct siw_cq *cq = to_siw_cq(base_cq);

        siw_dbg_cq(cq, "flags: 0x%02x\n", flags);

        if ((flags & IB_CQ_SOLICITED_MASK) == IB_CQ_SOLICITED)
                /*
                 * Enable CQ event for next solicited completion.
                 * and make it visible to all associated producers.
                 */
                smp_store_mb(cq->notify->flags, SIW_NOTIFY_SOLICITED);
        else
                /*
                 * Enable CQ event for any signalled completion.
                 * and make it visible to all associated producers.
                 */
                smp_store_mb(cq->notify->flags, SIW_NOTIFY_ALL);

        if (flags & IB_CQ_REPORT_MISSED_EVENTS)
                return cq->cq_put - cq->cq_get;

        return 0;
}


^ permalink raw reply

* Re: Re: [PATCH] rdma/siw: avoid smp_store_mb() on a u64
From: Arnd Bergmann @ 2019-07-12 13:22 UTC (permalink / raw)
  To: Bernard Metzler
  Cc: Jason Gunthorpe, Doug Ledford, Peter Zijlstra, linux-rdma,
	Linux Kernel Mailing List
In-Reply-To: <OF36428621.B839DE8B-ON00258435.00461748-00258435.0047E413@notes.na.collabserv.com>

On Fri, Jul 12, 2019 at 3:05 PM Bernard Metzler <BMT@zurich.ibm.com> wrote:

>
> We share CQ (completion queue) notification flags between application
> (which may be user land) and producer (kernel QP's (queue pairs)).
> Those flags can be written by both application and QP's. The application
> writes those flags to let the driver know if it shall inform about new
> work completions. It can write those flags at any time.
> Only a kernel producer reads those flags to decide if
> the CQ notification handler shall be kicked, if a new CQ element gets
> added to the CQ. When kicking the completion handler, the driver resets the
> notification flag, which must get re-armed by the application.
>
> We use READ_ONCE() and WRITE_ONCE(), since the flags are potentially
> shared (mmap'd) between user and kernel land.
>
> siw_req_notify_cq() is being called only by kernel consumers to change
> (write) the CQ notification state. We use smp_store_mb() to make sure
> the new value becomes visible to all kernel producers (QP's) asap.
>
>
> From cfb861a09dcfb24a98ba0f1e26bdaa1529d1b006 Mon Sep 17 00:00:00 2001
> From: Bernard Metzler <bmt@zurich.ibm.com>
> Date: Fri, 12 Jul 2019 13:19:27 +0200
> Subject: [PATCH] Make shared CQ notification flags 32bit to respect 32bit
>  architectures
>
> Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com>

This fixes the build for me, thanks!

Tested-by: Arnd Bergmann <arnd@arndb.de>

^ permalink raw reply

* Re: Re: [PATCH] rdma/siw: avoid smp_store_mb() on a u64
From: Peter Zijlstra @ 2019-07-12 13:19 UTC (permalink / raw)
  To: Bernard Metzler
  Cc: Jason Gunthorpe, Arnd Bergmann, Doug Ledford, linux-rdma,
	linux-kernel
In-Reply-To: <OF36428621.B839DE8B-ON00258435.00461748-00258435.0047E413@notes.na.collabserv.com>

On Fri, Jul 12, 2019 at 01:05:14PM +0000, Bernard Metzler wrote:
> @@ -1131,6 +1131,10 @@ int siw_poll_cq(struct ib_cq *base_cq, int num_cqe, struct ib_wc *wc)
>   *   number of not reaped CQE's regardless of its notification
>   *   type and current or new CQ notification settings.
>   *
> + * This function gets called only by kernel consumers.
> + * Notification state must immediately become visible to all
> + * associated kernel producers (QP's).

No amount of memory barriers can achieve that goal.

^ permalink raw reply

* Re: [PATCH v5 0/3] [v4.9.y] coredump: fix race condition between mmget_not_zero()/get_task_mm() and core dumping
From: Ajay Kaher @ 2019-07-12 13:17 UTC (permalink / raw)
  To: aarcange@redhat.com, jannh@google.com, oleg@redhat.com,
	peterx@redhat.com, rppt@linux.ibm.com, jgg@mellanox.com,
	mhocko@suse.com
  Cc: jglisse@redhat.com, akpm@linux-foundation.org,
	mike.kravetz@oracle.com, viro@zeniv.linux.org.uk,
	riandrews@android.com, arve@android.com, yishaih@mellanox.com,
	dledford@redhat.com, sean.hefty@intel.com,
	hal.rosenstock@gmail.com, matanb@mellanox.com,
	leonro@mellanox.com, gregkh@linuxfoundation.org,
	torvalds@linux-foundation.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org, devel@driverdev.osuosl.org,
	linux-rdma@vger.kernel.org, linux-kernel@vger.kernel.org,
	stable@vger.kernel.org, Srivatsa Bhat, Alexey Makhalov,
	Vasavi Sirnapalli, srivatsa@csail.mit.edu
In-Reply-To: <1562005928-1929-4-git-send-email-akaher@vmware.com>

Greg, I hope you would like to include these patches in next release.
In case any review comment please let me know.

- Ajay


On 01/07/19, 4:03 PM, "Ajay Kaher" <akaher@vmware.com> wrote:

> coredump: fix race condition between mmget_not_zero()/get_task_mm()
> and core dumping
    
> [PATCH v5 1/3]:
> Backporting of commit 04f5866e41fb70690e28397487d8bd8eea7d712a upstream.
    
> [PATCH v5 2/3]:
> Extension of commit 04f5866e41fb to fix the race condition between
> get_task_mm() and core dumping for IB->mlx4 and IB->mlx5 drivers.
    
> [PATCH v5 3/3]
> Backporting of commit 59ea6d06cfa9247b586a695c21f94afa7183af74 upstream.
    
> [diff from v4]:
> - Corrected Subject line for [PATCH v5 2/3], [PATCH v5 3/3]
    


^ permalink raw reply

* Re:  Re: [PATCH] rdma/siw: avoid smp_store_mb() on a u64
From: Bernard Metzler @ 2019-07-12 13:05 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Arnd Bergmann, Doug Ledford, Peter Zijlstra, linux-rdma,
	linux-kernel
In-Reply-To: <20190712120328.GB27512@ziepe.ca>

-----"Jason Gunthorpe" <jgg@ziepe.ca> wrote: -----

>To: "Bernard Metzler" <BMT@zurich.ibm.com>
>From: "Jason Gunthorpe" <jgg@ziepe.ca>
>Date: 07/12/2019 02:03PM
>Cc: "Arnd Bergmann" <arnd@arndb.de>, "Doug Ledford"
><dledford@redhat.com>, "Peter Zijlstra" <peterz@infradead.org>,
>linux-rdma@vger.kernel.org, linux-kernel@vger.kernel.org
>Subject: [EXTERNAL] Re: [PATCH] rdma/siw: avoid smp_store_mb() on a
>u64
>
>On Fri, Jul 12, 2019 at 11:33:46AM +0000, Bernard Metzler wrote:
>> >diff --git a/drivers/infiniband/sw/siw/siw_verbs.c
>> >b/drivers/infiniband/sw/siw/siw_verbs.c
>> >index 32dc79d0e898..41c5ab293fe1 100644
>> >+++ b/drivers/infiniband/sw/siw/siw_verbs.c
>> >@@ -1142,10 +1142,11 @@ int siw_req_notify_cq(struct ib_cq
>*base_cq,
>> >enum ib_cq_notify_flags flags)
>> > 
>> > 	if ((flags & IB_CQ_SOLICITED_MASK) == IB_CQ_SOLICITED)
>> > 		/* CQ event for next solicited completion */
>> >-		smp_store_mb(*cq->notify, SIW_NOTIFY_SOLICITED);
>> >+		WRITE_ONCE(*cq->notify, SIW_NOTIFY_SOLICITED);
>> > 	else
>> > 		/* CQ event for any signalled completion */
>> >-		smp_store_mb(*cq->notify, SIW_NOTIFY_ALL);
>> >+		WRITE_ONCE(*cq->notify, SIW_NOTIFY_ALL);
>> >+	smp_wmb();
>> > 
>> > 	if (flags & IB_CQ_REPORT_MISSED_EVENTS)
>> > 		return cq->cq_put - cq->cq_get;
>> 
>> 
>> Hi Arnd,
>> Many thanks for pointing that out! Indeed, this CQ notification
>> mechanism does not take 32 bit architectures into account.
>> Since we have only three flags to hold here, it's probably better
>> to make it a 32bit value. That would remove the issue w/o
>> introducing extra smp_wmb(). 
>
>I also prefer not to see smp_wmb() in drivers..
>
>> I'd prefer smp_store_mb(), since on some architectures it shall be
>> more efficient.  That would also make it sufficient to use
>> READ_ONCE.
>
>The READ_ONCE is confusing to me too, if you need store_release
>semantics then the reader also needs to pair with load_acquite -
>otherwise it doesn't work.
>
>Still, we need to do something rapidly to fix the i386 build, please
>revise right away..
>
>Jason
>
>

We share CQ (completion queue) notification flags between application
(which may be user land) and producer (kernel QP's (queue pairs)).
Those flags can be written by both application and QP's. The application
writes those flags to let the driver know if it shall inform about new
work completions. It can write those flags at any time.
Only a kernel producer reads those flags to decide if
the CQ notification handler shall be kicked, if a new CQ element gets
added to the CQ. When kicking the completion handler, the driver resets the
notification flag, which must get re-armed by the application.

We use READ_ONCE() and WRITE_ONCE(), since the flags are potentially
shared (mmap'd) between user and kernel land.

siw_req_notify_cq() is being called only by kernel consumers to change
(write) the CQ notification state. We use smp_store_mb() to make sure
the new value becomes visible to all kernel producers (QP's) asap.


From cfb861a09dcfb24a98ba0f1e26bdaa1529d1b006 Mon Sep 17 00:00:00 2001
From: Bernard Metzler <bmt@zurich.ibm.com>
Date: Fri, 12 Jul 2019 13:19:27 +0200
Subject: [PATCH] Make shared CQ notification flags 32bit to respect 32bit
 architectures

Signed-off-by: Bernard Metzler <bmt@zurich.ibm.com>
---
 drivers/infiniband/sw/siw/siw.h       |  2 +-
 drivers/infiniband/sw/siw/siw_qp.c    | 12 ++++++++----
 drivers/infiniband/sw/siw/siw_verbs.c | 20 +++++++++++++++-----
 include/uapi/rdma/siw-abi.h           |  3 ++-
 4 files changed, 26 insertions(+), 11 deletions(-)

diff --git a/drivers/infiniband/sw/siw/siw.h b/drivers/infiniband/sw/siw/siw.h
index 409e2987cd45..d59d81f4d86b 100644
--- a/drivers/infiniband/sw/siw/siw.h
+++ b/drivers/infiniband/sw/siw/siw.h
@@ -216,7 +216,7 @@ struct siw_wqe {
 struct siw_cq {
 struct ib_cq base_cq;
 	spinlock_t lock;
-	u64 *notify;
+	struct siw_cq_ctrl *notify;
 	struct siw_cqe *queue;
 	u32 cq_put;
 	u32 cq_get;
diff --git a/drivers/infiniband/sw/siw/siw_qp.c b/drivers/infiniband/sw/siw/siw_qp.c
index 83e50fe8e48b..f4b8d55839a7 100644
--- a/drivers/infiniband/sw/siw/siw_qp.c
+++ b/drivers/infiniband/sw/siw/siw_qp.c
@@ -1011,18 +1011,22 @@ int siw_activate_tx(struct siw_qp *qp)
  */
 static bool siw_cq_notify_now(struct siw_cq *cq, u32 flags)
 {
-	u64 cq_notify;
+	u32 cq_notify;
 
 	if (!cq->base_cq.comp_handler)
 		return false;
 
-	cq_notify = READ_ONCE(*cq->notify);
+	/* Read application shared notification state */
+	cq_notify = READ_ONCE(cq->notify->flags);
 
 	if ((cq_notify & SIW_NOTIFY_NEXT_COMPLETION) ||
 	    ((cq_notify & SIW_NOTIFY_SOLICITED) &&
 	     (flags & SIW_WQE_SOLICITED))) {
-		/* dis-arm CQ */
-		smp_store_mb(*cq->notify, SIW_NOTIFY_NOT);
+		/*
+		 * dis-arm CQ: write (potentially user land)
+		 * application shared notification state.
+		 */
+		WRITE_ONCE(cq->notify->flags, SIW_NOTIFY_NOT);
 
 		return true;
 	}
diff --git a/drivers/infiniband/sw/siw/siw_verbs.c b/drivers/infiniband/sw/siw/siw_verbs.c
index d4fb78780765..cda92e4c7cc9 100644
--- a/drivers/infiniband/sw/siw/siw_verbs.c
+++ b/drivers/infiniband/sw/siw/siw_verbs.c
@@ -1049,7 +1049,7 @@ int siw_create_cq(struct ib_cq *base_cq, const struct ib_cq_init_attr *attr,

 	spin_lock_init(&cq->lock);
 
-	cq->notify = &((struct siw_cq_ctrl *)&cq->queue[size])->notify;
+	cq->notify = (struct siw_cq_ctrl *)&cq->queue[size];

 	if (udata) {
 		struct siw_uresp_create_cq uresp = {};
@@ -1131,6 +1131,10 @@ int siw_poll_cq(struct ib_cq *base_cq, int num_cqe, struct ib_wc *wc)
  *   number of not reaped CQE's regardless of its notification
  *   type and current or new CQ notification settings.
  *
+ * This function gets called only by kernel consumers.
+ * Notification state must immediately become visible to all
+ * associated kernel producers (QP's).
+ *
  * @base_cq:   Base CQ contained in siw CQ.
  * @flags:     Requested notification flags.
  */
@@ -1141,11 +1145,17 @@ int siw_req_notify_cq(struct ib_cq *base_cq, enum ib_cq_notify_flags flags)
 	siw_dbg_cq(cq, "flags: 0x%02x\n", flags);
 
 	if ((flags & IB_CQ_SOLICITED_MASK) == IB_CQ_SOLICITED)
-		/* CQ event for next solicited completion */
-		smp_store_mb(*cq->notify, SIW_NOTIFY_SOLICITED);
+		/*
+		 * Enable CQ event for next solicited completion.
+		 * and make it visible to all associated producers.
+		 */
+		smp_store_mb(cq->notify->flags, SIW_NOTIFY_SOLICITED);
 	else
-		/* CQ event for any signalled completion */
-		smp_store_mb(*cq->notify, SIW_NOTIFY_ALL);
+		/*
+		 * Enable CQ event for any signalled completion.
+		 * and make it visible to all associated producers.
+		 */
+		smp_store_mb(cq->notify->flags, SIW_NOTIFY_ALL);
 
 	if (flags & IB_CQ_REPORT_MISSED_EVENTS)
 		return cq->cq_put - cq->cq_get;
diff --git a/include/uapi/rdma/siw-abi.h b/include/uapi/rdma/siw-abi.h
index ba4d5315cb76..93298980d3a7 100644
--- a/include/uapi/rdma/siw-abi.h
+++ b/include/uapi/rdma/siw-abi.h
@@ -178,6 +178,7 @@ struct siw_cqe {
  * to control CQ arming.
  */
 struct siw_cq_ctrl {
-	__aligned_u64 notify;
+	__u32 flags;
+	__u32 pad;
 };
 #endif
-- 
2.17.2




^ permalink raw reply related

* RE: regression: nvme rdma with bnxt_re0 broken
From: Parav Pandit @ 2019-07-12 12:52 UTC (permalink / raw)
  To: Yi Zhang, Selvin Xavier
  Cc: Daniel Jurgens, linux-rdma@vger.kernel.org, Devesh Sharma,
	linux-nvme@lists.infradead.org
In-Reply-To: <ef6a01a1-9163-ef4e-29ac-4f4130c682f1@redhat.com>



> -----Original Message-----
> From: Yi Zhang <yi.zhang@redhat.com>
> Sent: Friday, July 12, 2019 5:11 PM
> To: Parav Pandit <parav@mellanox.com>; Selvin Xavier
> <selvin.xavier@broadcom.com>
> Cc: Daniel Jurgens <danielj@mellanox.com>; linux-rdma@vger.kernel.org;
> Devesh Sharma <devesh.sharma@broadcom.com>; linux-
> nvme@lists.infradead.org
> Subject: Re: regression: nvme rdma with bnxt_re0 broken
> 
> Hi Parav
> The nvme connect still failed[1], I've paste all the dmesg log to[2], pls check it.
> 
> 
> [1]
> [root@rdma-perf-07 ~]$ nvme connect -t rdma -a 172.31.40.125 -s 4420 -n
> testnqn
> Failed to write to /dev/nvme-fabrics: Connection reset by peer
> [2]
> https://pastebin.com/ipvak0Sp
> 

I think vlan_id is not initialized to 0xffff due to which ipv4 entry addition also failed with my patch.
This is probably solves it. Not sure. I am not much familiar with it.

Selvin,
Can you please take a look?

From 7b55e1d4500259cf7c02fb4d9fbbeb5ad1cfc623 Mon Sep 17 00:00:00 2001
From: Parav Pandit <parav@mellanox.com>
Date: Fri, 12 Jul 2019 04:34:52 -0500
Subject: [PATCH v1] IB/bnxt_re: Honor vlan_id in GID entry comparison

GID entry consist of GID, vlan, netdev and smac.
Extend GID duplicate check companions to consider vlan_id as well
to support IPv6 VLAN based link local addresses.

GID deletion based on only GID comparision is not correct.
It needs further fixes.

Fixes: 823b23da7113 ("IB/core: Allow vlan link local address based RoCE GIDs")
Change-Id: I2e026ec8065c8425ba24fad8525323d112a2f4e4
Signed-off-by: Parav Pandit <parav@mellanox.com>
---
 drivers/infiniband/hw/bnxt_re/qplib_res.c | 5 +++++
 drivers/infiniband/hw/bnxt_re/qplib_sp.c  | 7 ++++++-
 drivers/infiniband/hw/bnxt_re/qplib_sp.h  | 1 +
 3 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/hw/bnxt_re/qplib_res.c b/drivers/infiniband/hw/bnxt_re/qplib_res.c
index 37928b1111df..216648b640ce 100644
--- a/drivers/infiniband/hw/bnxt_re/qplib_res.c
+++ b/drivers/infiniband/hw/bnxt_re/qplib_res.c
@@ -488,6 +488,8 @@ static int bnxt_qplib_alloc_sgid_tbl(struct bnxt_qplib_res *res,
 				     struct bnxt_qplib_sgid_tbl *sgid_tbl,
 				     u16 max)
 {
+	u16 i;
+
 	sgid_tbl->tbl = kcalloc(max, sizeof(struct bnxt_qplib_gid), GFP_KERNEL);
 	if (!sgid_tbl->tbl)
 		return -ENOMEM;
@@ -500,6 +502,9 @@ static int bnxt_qplib_alloc_sgid_tbl(struct bnxt_qplib_res *res,
 	if (!sgid_tbl->ctx)
 		goto out_free2;
 
+	for (i = 0; i < max; i++)
+		sgid_tbl->tbl[i].vlan_id = 0xffff;
+
 	sgid_tbl->vlan = kcalloc(max, sizeof(u8), GFP_KERNEL);
 	if (!sgid_tbl->vlan)
 		goto out_free3;
diff --git a/drivers/infiniband/hw/bnxt_re/qplib_sp.c b/drivers/infiniband/hw/bnxt_re/qplib_sp.c
index 48793d3512ac..0d90be88685f 100644
--- a/drivers/infiniband/hw/bnxt_re/qplib_sp.c
+++ b/drivers/infiniband/hw/bnxt_re/qplib_sp.c
@@ -236,6 +236,9 @@ int bnxt_qplib_del_sgid(struct bnxt_qplib_sgid_tbl *sgid_tbl,
 		return -ENOMEM;
 	}
 	for (index = 0; index < sgid_tbl->max; index++) {
+		/* FIXME: GID delete should happen based on index
+		 * and refcount
+		 */
 		if (!memcmp(&sgid_tbl->tbl[index], gid, sizeof(*gid)))
 			break;
 	}
@@ -296,7 +299,8 @@ int bnxt_qplib_add_sgid(struct bnxt_qplib_sgid_tbl *sgid_tbl,
 	}
 	free_idx = sgid_tbl->max;
 	for (i = 0; i < sgid_tbl->max; i++) {
-		if (!memcmp(&sgid_tbl->tbl[i], gid, sizeof(*gid))) {
+		if (!memcmp(&sgid_tbl->tbl[i], gid, sizeof(*gid)) &&
+		    sgid_tbl->tbl[i].vlan_id == vlan_id) {
 			dev_dbg(&res->pdev->dev,
 				"SGID entry already exist in entry %d!\n", i);
 			*index = i;
@@ -351,6 +355,7 @@ int bnxt_qplib_add_sgid(struct bnxt_qplib_sgid_tbl *sgid_tbl,
 	}
 	/* Add GID to the sgid_tbl */
 	memcpy(&sgid_tbl->tbl[free_idx], gid, sizeof(*gid));
+	sgid_tbl->tbl[free_idx].vlan_id = vlan_id;
 	sgid_tbl->active++;
 	if (vlan_id != 0xFFFF)
 		sgid_tbl->vlan[free_idx] = 1;
diff --git a/drivers/infiniband/hw/bnxt_re/qplib_sp.h b/drivers/infiniband/hw/bnxt_re/qplib_sp.h
index 0ec3b12b0bcd..7a1957c9dc6f 100644
--- a/drivers/infiniband/hw/bnxt_re/qplib_sp.h
+++ b/drivers/infiniband/hw/bnxt_re/qplib_sp.h
@@ -82,6 +82,7 @@ struct bnxt_qplib_pd {
 
 struct bnxt_qplib_gid {
 	u8				data[16];
+	u16 vlan_id;
 };
 
 struct bnxt_qplib_ah {
-- 
2.19.2

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox