netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Leon Romanovsky <leon@kernel.org>
To: Omer Shpigelman <oshpigelman@habana.ai>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	"dri-devel@lists.freedesktop.org"
	<dri-devel@lists.freedesktop.org>,
	"ogabbay@kernel.org" <ogabbay@kernel.org>,
	Zvika Yehudai <zyehudai@habana.ai>
Subject: Re: [PATCH 04/15] net: hbl_cn: QP state machine
Date: Tue, 18 Jun 2024 10:08:43 +0300	[thread overview]
Message-ID: <20240618070843.GD4025@unreal> (raw)
In-Reply-To: <a43d2eaf-e295-4ed4-b66a-3f2e96ea088c@habana.ai>

On Tue, Jun 18, 2024 at 05:50:15AM +0000, Omer Shpigelman wrote:
> On 6/17/24 16:18, Leon Romanovsky wrote:
> > [Some people who received this message don't often get email from leon@kernel.org. Learn why this is important at https://aka.ms/LearnAboutSenderIdentification ]
> > 
> > On Thu, Jun 13, 2024 at 11:21:57AM +0300, Omer Shpigelman wrote:
> >> Add a common QP state machine which handles the moving for a QP from one
> >> state to another including performing necessary checks, draining
> >> in-flight transactions, invalidating caches and error reporting.
> >>
> >> Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
> >> Co-developed-by: Abhilash K V <kvabhilash@habana.ai>
> >> Signed-off-by: Abhilash K V <kvabhilash@habana.ai>
> >> Co-developed-by: Andrey Agranovich <aagranovich@habana.ai>
> >> Signed-off-by: Andrey Agranovich <aagranovich@habana.ai>
> >> Co-developed-by: Bharat Jauhari <bjauhari@habana.ai>
> >> Signed-off-by: Bharat Jauhari <bjauhari@habana.ai>
> >> Co-developed-by: David Meriin <dmeriin@habana.ai>
> >> Signed-off-by: David Meriin <dmeriin@habana.ai>
> >> Co-developed-by: Sagiv Ozeri <sozeri@habana.ai>
> >> Signed-off-by: Sagiv Ozeri <sozeri@habana.ai>
> >> Co-developed-by: Zvika Yehudai <zyehudai@habana.ai>
> >> Signed-off-by: Zvika Yehudai <zyehudai@habana.ai>
> >> ---
> >>  .../ethernet/intel/hbl_cn/common/hbl_cn_qp.c  | 480 +++++++++++++++++-
> >>  1 file changed, 479 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn_qp.c b/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn_qp.c
> >> index 9ddc23bf8194..26ebdf448193 100644
> >> --- a/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn_qp.c
> >> +++ b/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn_qp.c
> >> @@ -6,8 +6,486 @@
> > 
> > <...>
> > 
> >> +/* The following table represents the (valid) operations that can be performed on
> >> + * a QP in order to move it from one state to another
> >> + * For example: a QP in RTR state can be moved to RTS state using the CN_QP_OP_RTR_2RTS
> >> + * operation.
> >> + */
> >> +static const enum hbl_cn_qp_state_op qp_valid_state_op[CN_QP_NUM_STATE][CN_QP_NUM_STATE] = {
> >> +     [CN_QP_STATE_RESET] = {
> >> +             [CN_QP_STATE_RESET]     = CN_QP_OP_2RESET,
> >> +             [CN_QP_STATE_INIT]      = CN_QP_OP_RST_2INIT,
> >> +             [CN_QP_STATE_SQD]       = CN_QP_OP_NOP,
> >> +             [CN_QP_STATE_QPD]       = CN_QP_OP_NOP,
> >> +     },
> >> +     [CN_QP_STATE_INIT] = {
> >> +             [CN_QP_STATE_RESET]     = CN_QP_OP_2RESET,
> >> +             [CN_QP_STATE_ERR]       = CN_QP_OP_2ERR,
> >> +             [CN_QP_STATE_INIT]      = CN_QP_OP_NOP,
> >> +             [CN_QP_STATE_RTR]       = CN_QP_OP_INIT_2RTR,
> >> +             [CN_QP_STATE_SQD]       = CN_QP_OP_NOP,
> >> +             [CN_QP_STATE_QPD]       = CN_QP_OP_NOP,
> >> +     },
> >> +     [CN_QP_STATE_RTR] = {
> >> +             [CN_QP_STATE_RESET]     = CN_QP_OP_2RESET,
> >> +             [CN_QP_STATE_ERR]       = CN_QP_OP_2ERR,
> >> +             [CN_QP_STATE_RTR]       = CN_QP_OP_RTR_2RTR,
> >> +             [CN_QP_STATE_RTS]       = CN_QP_OP_RTR_2RTS,
> >> +             [CN_QP_STATE_SQD]       = CN_QP_OP_NOP,
> >> +             [CN_QP_STATE_QPD]       = CN_QP_OP_RTR_2QPD,
> >> +     },
> >> +     [CN_QP_STATE_RTS] = {
> >> +             [CN_QP_STATE_RESET]     = CN_QP_OP_2RESET,
> >> +             [CN_QP_STATE_ERR]       = CN_QP_OP_2ERR,
> >> +             [CN_QP_STATE_RTS]       = CN_QP_OP_RTS_2RTS,
> >> +             [CN_QP_STATE_SQD]       = CN_QP_OP_RTS_2SQD,
> >> +             [CN_QP_STATE_QPD]       = CN_QP_OP_RTS_2QPD,
> >> +             [CN_QP_STATE_SQERR]     = CN_QP_OP_RTS_2SQERR,
> >> +     },
> >> +     [CN_QP_STATE_SQD] = {
> >> +             [CN_QP_STATE_RESET]     = CN_QP_OP_2RESET,
> >> +             [CN_QP_STATE_ERR]       = CN_QP_OP_2ERR,
> >> +             [CN_QP_STATE_SQD]       = CN_QP_OP_SQD_2SQD,
> >> +             [CN_QP_STATE_RTS]       = CN_QP_OP_SQD_2RTS,
> >> +             [CN_QP_STATE_QPD]       = CN_QP_OP_SQD_2QPD,
> >> +             [CN_QP_STATE_SQERR]     = CN_QP_OP_SQD_2SQ_ERR,
> >> +     },
> >> +     [CN_QP_STATE_QPD] = {
> >> +             [CN_QP_STATE_RESET]     = CN_QP_OP_2RESET,
> >> +             [CN_QP_STATE_ERR]       = CN_QP_OP_2ERR,
> >> +             [CN_QP_STATE_SQD]       = CN_QP_OP_NOP,
> >> +             [CN_QP_STATE_QPD]       = CN_QP_OP_NOP,
> >> +             [CN_QP_STATE_RTR]       = CN_QP_OP_QPD_2RTR,
> >> +     },
> >> +     [CN_QP_STATE_SQERR] = {
> >> +             [CN_QP_STATE_RESET]     = CN_QP_OP_2RESET,
> >> +             [CN_QP_STATE_ERR]       = CN_QP_OP_2ERR,
> >> +             [CN_QP_STATE_SQD]       = CN_QP_OP_SQ_ERR_2SQD,
> >> +             [CN_QP_STATE_SQERR]     = CN_QP_OP_NOP,
> >> +     },
> >> +     [CN_QP_STATE_ERR] = {
> >> +             [CN_QP_STATE_RESET]     = CN_QP_OP_2RESET,
> >> +             [CN_QP_STATE_ERR]       = CN_QP_OP_2ERR,
> >> +     }
> >> +};
> > 
> > I don't understand why IBTA QP state machine is declared in ETH driver
> > and not in IB driver.
> > 
> 
> Implementing the actual transitions between the states requires full
> knowledge of the HW e.g. when to flush, cache invalidation, timeouts.
> Our IB driver is agnostic to the ASIC type by design. Note that more ASIC
> generations are planned to be added and the IB driver should not be aware
> of these additional HWs.
> Hence we implemeted the QP state machine in the CN driver which is aware
> of the actual HW.

Somehow ALL other IB drivers are able to implement this logic in the IB,
while supporting multiple ASICs. I don't see a reason why you can't do
the same.

> 
> >> +
> > 
> > <...>
> > 
> >> +             /* Release lock while we wait before retry.
> >> +              * Note, we can assert that we are already locked.
> >> +              */
> >> +             port_funcs->cfg_unlock(cn_port);
> >> +
> >> +             msleep(20);
> >> +
> >> +             port_funcs->cfg_lock(cn_port);
> > 
> > lock/unlock through ops pointer doesn't look like a good idea.
> > 
> 
> More ASIC generations will be added once we merge the current Gaudi2 code.
> On other ASICs the locking granularity is different because some of the HW
> resources are shared between different logical ports.
> Hence it is was logical for us to implement it with a function pointer so
> each ASIC specific code can implemnet the locking properly.

We are reviewing this code which is for the current ASIC, not for the
unknown future ASICs. Please don't over engineer the first submission.
You will always be able to improve/change the code once you decide to
upstream your future ASICs.

Thanks

  reply	other threads:[~2024-06-18  7:08 UTC|newest]

Thread overview: 108+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-06-13  8:21 [PATCH 00/15] Introduce HabanaLabs network drivers Omer Shpigelman
2024-06-13  8:21 ` [PATCH 01/15] net: hbl_cn: add habanalabs Core Network driver Omer Shpigelman
2024-06-13 13:01   ` Przemek Kitszel
2024-06-13 14:16     ` Przemek Kitszel
2024-06-17  8:08     ` Omer Shpigelman
2024-06-17 11:48       ` Leon Romanovsky
2024-06-18  7:28         ` Omer Shpigelman
2024-06-15  0:05   ` Stephen Hemminger
2024-06-17  8:14     ` Omer Shpigelman
2024-06-17 14:05   ` Markus Elfring
2024-06-17 15:02     ` Andrew Lunn
2024-06-18  7:51       ` Omer Shpigelman
2024-06-13  8:21 ` [PATCH 02/15] net: hbl_cn: memory manager component Omer Shpigelman
2024-06-13  8:21 ` [PATCH 03/15] net: hbl_cn: physical layer support Omer Shpigelman
2024-06-13  8:21 ` [PATCH 04/15] net: hbl_cn: QP state machine Omer Shpigelman
2024-06-17 13:18   ` Leon Romanovsky
2024-06-18  5:50     ` Omer Shpigelman
2024-06-18  7:08       ` Leon Romanovsky [this message]
2024-06-18  7:58         ` Omer Shpigelman
2024-06-18  9:00           ` Leon Romanovsky
2024-06-24  7:24             ` Omer Shpigelman
2024-06-13  8:21 ` [PATCH 05/15] net: hbl_cn: memory trace events Omer Shpigelman
2024-06-13  8:21 ` [PATCH 06/15] net: hbl_cn: debugfs support Omer Shpigelman
2024-06-19 18:35   ` Sunil Kovvuri Goutham
2024-06-21 10:17     ` Omer Shpigelman
2024-06-21 10:30       ` Sunil Kovvuri Goutham
2024-06-23  7:25         ` Omer Shpigelman
2024-06-21 15:33       ` Andrew Lunn
2024-06-23  6:57         ` Omer Shpigelman
2024-06-23 15:02           ` Andrew Lunn
2024-06-24  7:21             ` Omer Shpigelman
2024-06-24  9:22             ` Leon Romanovsky
2024-12-17 10:00             ` Avri Kehat
2024-12-17 10:52               ` Andrew Lunn
2024-06-13  8:22 ` [PATCH 07/15] net: hbl_cn: gaudi2: ASIC register header files Omer Shpigelman
2024-06-13  8:22 ` [PATCH 08/15] net: hbl_cn: gaudi2: ASIC specific support Omer Shpigelman
2024-06-13  8:22 ` [PATCH 09/15] net: hbl_en: add habanalabs Ethernet driver Omer Shpigelman
2024-06-13 21:49   ` Andrew Lunn
2024-06-18  6:58     ` Omer Shpigelman
2024-06-18 14:19       ` Andrew Lunn
2024-06-19  7:16         ` Omer Shpigelman
2024-06-19  8:01           ` Przemek Kitszel
2024-06-19 12:15             ` Omer Shpigelman
2024-06-19 15:21           ` Jakub Kicinski
2024-06-20  8:43             ` Omer Shpigelman
2024-06-20 13:51               ` Jakub Kicinski
2024-06-20 19:14                 ` Andrew Lunn
2024-06-23 14:48                   ` Omer Shpigelman
2024-06-19 16:13           ` Andrew Lunn
2024-06-23  6:22             ` Omer Shpigelman
2024-06-23 14:46               ` Andrew Lunn
2024-06-26 10:13                 ` Omer Shpigelman
2024-06-26 14:13                   ` Andrew Lunn
2024-06-30  7:11                     ` Omer Shpigelman
2024-06-14 22:48   ` Joe Damato
2024-06-16  1:04     ` Andrew Lunn
2024-06-18 19:37     ` Omer Shpigelman
2024-06-18 21:19       ` Stephen Hemminger
2024-06-19 12:13         ` Omer Shpigelman
2024-06-15  0:10   ` Stephen Hemminger
2024-06-19 12:07     ` Omer Shpigelman
2024-06-15  0:16   ` Stephen Hemminger
2024-06-18 19:39     ` Omer Shpigelman
2024-06-19 15:40       ` Andrew Lunn
2024-06-20  8:36         ` Omer Shpigelman
2024-06-15 10:55   ` Zhu Yanjun
2024-06-18 11:16     ` Omer Shpigelman
2024-06-15 17:13   ` Zhu Yanjun
2024-06-16  1:08     ` Andrew Lunn
2024-06-13  8:22 ` [PATCH 10/15] net: hbl_en: gaudi2: ASIC specific support Omer Shpigelman
2024-06-13  8:22 ` [PATCH 11/15] RDMA/hbl: add habanalabs RDMA driver Omer Shpigelman
2024-06-13 19:18   ` Leon Romanovsky
2024-06-17 17:43     ` Omer Shpigelman
2024-06-17 19:04       ` Leon Romanovsky
2024-06-18 11:08         ` Omer Shpigelman
2024-06-18 12:58           ` Leon Romanovsky
2024-06-19  9:27             ` Omer Shpigelman
2024-06-19 10:52               ` Leon Romanovsky
2024-06-24  8:47                 ` Omer Shpigelman
2024-06-24  9:10                   ` Leon Romanovsky
2024-06-28 10:24                 ` Omer Shpigelman
2024-06-30 13:29                   ` Leon Romanovsky
2024-07-01 10:46                     ` Omer Shpigelman
2024-07-01 12:46                       ` Leon Romanovsky
2024-07-12 13:08                   ` Jason Gunthorpe
2024-07-14 10:18                     ` Omer Shpigelman
2024-07-16 13:40                       ` Jason Gunthorpe
2024-07-17  7:08                         ` Omer Shpigelman
2024-07-17  7:36                           ` Leon Romanovsky
2024-07-17 10:51                             ` Omer Shpigelman
2024-07-17 11:56                               ` Jason Gunthorpe
2024-07-17 12:33                               ` Leon Romanovsky
2024-07-18  6:54                                 ` Omer Shpigelman
2024-07-18  8:31                                   ` Leon Romanovsky
2024-06-18 16:01           ` Przemek Kitszel
2024-06-19  9:34             ` Omer Shpigelman
2024-06-17 14:17   ` Jason Gunthorpe
2024-06-19  9:39     ` Omer Shpigelman
2024-06-13  8:22 ` [PATCH 12/15] RDMA/hbl: direct verbs support Omer Shpigelman
2024-06-13  8:22 ` [PATCH 13/15] accel/habanalabs: network scaling support Omer Shpigelman
2024-06-19 18:41   ` Sunil Kovvuri Goutham
2024-06-21 10:21     ` Omer Shpigelman
2024-06-13  8:22 ` [PATCH 14/15] accel/habanalabs/gaudi2: CN registers header files Omer Shpigelman
2024-06-13  8:22 ` [PATCH 15/15] accel/habanalabs/gaudi2: network scaling support Omer Shpigelman
2024-06-17 12:34 ` [PATCH 00/15] Introduce HabanaLabs network drivers Alexander Lobakin
2024-06-19 11:40   ` Omer Shpigelman
2024-06-19 16:33 ` Jiri Pirko
2024-06-20  5:37   ` Omer Shpigelman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240618070843.GD4025@unreal \
    --to=leon@kernel.org \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-rdma@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=ogabbay@kernel.org \
    --cc=oshpigelman@habana.ai \
    --cc=zyehudai@habana.ai \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).