public inbox for linux-rdma@vger.kernel.org
* [PATCH for-next 0/3] RDMA/erdma: Support flushing all WRs after QP state changed to ERROR
@ 2022-11-16  2:31 Cheng Xu
  2022-11-16  2:31 ` [PATCH for-next 1/3] RDMA/erdma: Add a workqueue for WRs reflushing Cheng Xu
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: Cheng Xu @ 2022-11-16  2:31 UTC (permalink / raw)
  To: jgg, leon; +Cc: linux-rdma, KaiShen

Hi,

This series adds support for flushing all WRs posted to hardware after
the QP state has changed to ERROR.

Old firmware may not flush WRs that are posted after the QP state has
changed to ERROR, because it is difficult for firmware to get the real-time
PI (producer index) of QPs, especially for the RQs.

Previously we tried to avoid this issue by implementing custom
drain_{sq/rq} [1], but that approach has a flaw, as Tom and Jason pointed
out, and we have also hit it in some scenarios, for example NoF fatal
recovery.

So we introduce a new mechanism to fix this. When registering the ibdev,
we create a workqueue for reflushing (we call it "reflush" because
hardware has already started flushing the QPs by that time, and the
workqueue is used to make hardware flush newly posted WRs). Once a QP needs
to flush WRs, or new WRs are posted after flushing has started, we queue a
delayed work on the workqueue, or modify its delay if it is already queued.
In the work, the driver notifies firmware of the latest PIs through the
CMDQ, so that firmware can flush all the newly posted WRs. This applies to
kernel QPs first.
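
The delayed-work behaviour described above can be modelled in userspace as
follows. This is only an illustrative sketch, not the driver code: the
struct, the function names, the tick-based clock and the delay value are all
hypothetical, and the "firmware" is just a pair of fields that record the
last PIs it was told about. In the kernel, re-arming an already queued
delayed work is commonly done with mod_delayed_work(); whether the patches
use exactly that call is an assumption here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model; none of these names come from the erdma driver. */
#define REFLUSH_DELAY_TICKS 5   /* assumed delay, chosen for illustration */

struct model_qp {
    uint32_t sq_pi;          /* latest send-queue producer index */
    uint32_t rq_pi;          /* latest receive-queue producer index */
    bool in_error;           /* QP has entered the ERROR state */
    bool work_pending;       /* a delayed reflush work is queued */
    uint64_t work_deadline;  /* tick at which the queued work should run */
    uint32_t fw_sq_pi;       /* SQ PI last reported to "firmware" */
    uint32_t fw_rq_pi;       /* RQ PI last reported to "firmware" */
    unsigned int notify_count; /* how many times "firmware" was notified */
};

/* Queue the delayed work, or push its deadline back if already queued. */
static void schedule_reflush(struct model_qp *qp, uint64_t now)
{
    qp->work_pending = true;
    qp->work_deadline = now + REFLUSH_DELAY_TICKS;
}

/* Moving the QP to ERROR is the first event that needs a flush. */
static void modify_to_error(struct model_qp *qp, uint64_t now)
{
    qp->in_error = true;
    schedule_reflush(qp, now);
}

/* WRs posted after the QP is in ERROR re-arm the delayed work. */
static void post_send(struct model_qp *qp, uint64_t now)
{
    qp->sq_pi++;
    if (qp->in_error)
        schedule_reflush(qp, now);
}

static void post_recv(struct model_qp *qp, uint64_t now)
{
    qp->rq_pi++;
    if (qp->in_error)
        schedule_reflush(qp, now);
}

/* Run the work if its deadline has passed: report the latest PIs once. */
static void run_timers(struct model_qp *qp, uint64_t now)
{
    if (!qp->work_pending || now < qp->work_deadline)
        return;

    qp->work_pending = false;
    qp->fw_sq_pi = qp->sq_pi;
    qp->fw_rq_pi = qp->rq_pi;
    qp->notify_count++;
}
```

In this model, several WRs posted in quick succession after the ERROR
transition result in a single notification that carries the most recent
PIs, which is the point of modifying the delay rather than queuing a
separate work item per post.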

- #1 adds a workqueue for WRs reflushing.
- #2 adds a reflushing work for each QP.
- #3 notifies the latest PIs to firmware for reflushing.

[1] https://lore.kernel.org/all/20220824094251.23190-3-chengyou@linux.alibaba.com/t/

Thanks,
Cheng Xu

Cheng Xu (3):
  RDMA/erdma: Add a workqueue for WRs reflushing
  RDMA/erdma: Implement the lifecycle of reflushing work for each QP
  RDMA/erdma: Notify the latest PI to FW for reflushing when necessary

 drivers/infiniband/hw/erdma/erdma.h       |  1 +
 drivers/infiniband/hw/erdma/erdma_hw.h    |  8 ++++++
 drivers/infiniband/hw/erdma/erdma_main.c  | 14 +++++++++--
 drivers/infiniband/hw/erdma/erdma_qp.c    | 30 ++++++++++++++++-------
 drivers/infiniband/hw/erdma/erdma_verbs.c | 18 ++++++++++++++
 drivers/infiniband/hw/erdma/erdma_verbs.h |  7 ++++++
 6 files changed, 67 insertions(+), 11 deletions(-)

-- 
2.27.0



end of thread, other threads:[~2022-11-24 19:00 UTC | newest]

Thread overview: 5+ messages
2022-11-16  2:31 [PATCH for-next 0/3] RDMA/erdma: Support flushing all WRs after QP state changed to ERROR Cheng Xu
2022-11-16  2:31 ` [PATCH for-next 1/3] RDMA/erdma: Add a workqueue for WRs reflushing Cheng Xu
2022-11-16  2:31 ` [PATCH for-next 2/3] RDMA/erdma: Implement the lifecycle of reflushing work for each QP Cheng Xu
2022-11-16  2:31 ` [PATCH for-next 3/3] RDMA/erdma: Notify the latest PI to FW for reflushing when necessary Cheng Xu
2022-11-24 19:00 ` [PATCH for-next 0/3] RDMA/erdma: Support flushing all WRs after QP state changed to ERROR Jason Gunthorpe
