linux-rdma.vger.kernel.org archive mirror
From: Jason Gunthorpe <jgg@nvidia.com>
To: Jakub Kicinski <kuba@kernel.org>
Cc: Saeed Mahameed <saeedm@nvidia.com>, Eli Cohen <elic@nvidia.com>,
	Leon Romanovsky <leonro@mellanox.com>, <netdev@vger.kernel.org>,
	<linux-rdma@vger.kernel.org>, Eli Cohen <eli@mellanox.com>,
	Mark Bloch <mbloch@nvidia.com>, Maor Gottlieb <maorg@nvidia.com>
Subject: Re: [PATCH mlx5-next 11/16] net/mlx5: Add VDPA priority to NIC RX namespace
Date: Tue, 24 Nov 2020 15:44:13 -0400
Message-ID: <20201124194413.GF4800@nvidia.com>
In-Reply-To: <20201124104106.0b1201b2@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>

On Tue, Nov 24, 2020 at 10:41:06AM -0800, Jakub Kicinski wrote:
> On Tue, 24 Nov 2020 14:02:10 -0400 Jason Gunthorpe wrote:
> > On Tue, Nov 24, 2020 at 09:12:19AM -0800, Jakub Kicinski wrote:
> > > On Sun, 22 Nov 2020 08:41:58 +0200 Eli Cohen wrote:  
> > > > On Sat, Nov 21, 2020 at 04:01:55PM -0800, Jakub Kicinski wrote:  
> > > > > On Fri, 20 Nov 2020 15:03:34 -0800 Saeed Mahameed wrote:    
> > > > > > From: Eli Cohen <eli@mellanox.com>
> > > > > > 
> > > > > > Add a new namespace type to the NIC RX root namespace to allow
> > > > > > inserting VDPA rules before regular NIC processing but after bypass,
> > > > > > thus allowing DPDK to have precedence in packet processing.    
> > > > > 
> > > > > How does DPDK and VDPA relate in this context?    
> > > > 
> > > > mlx5 steering is hierarchical and defines precedence among namespaces.
> > > > Until now, the VDPA implementation would insert a rule into the
> > > > MLX5_FLOW_NAMESPACE_BYPASS hierarchy, which is used by DPDK, thus
> > > > taking all the incoming traffic.
> > > > 
> > > > The MLX5_FLOW_NAMESPACE_VDPA hierarchy comes after
> > > > MLX5_FLOW_NAMESPACE_BYPASS.  
> > > 
> > > Our policy was no DPDK driver bifurcation. There's no asterisk saying
> > > "unless you pretend you need flow filters for RDMA, get them upstream
> > > and then drop the act".  
> > 
> > Huh?
> > 
> > mlx5 DPDK is an *RDMA* userspace application. 
> 
> Forgive me for my naiveté. 
> 
> Here I thought the RDMA subsystem is for doing RDMA.

RDMA covers a wide range of accelerated networking these days. Where
else are you going to put this stuff in the kernel?

> I'm sure if you start doing crypto over ibverbs, crypto people will
> want to have a look.

Well, RDMA has had crypto transforms for a few years now too. Why
would the crypto subsystem people be involved? It isn't using or
duplicating their APIs.

> > libibverbs. It runs on the RDMA stack. It uses RDMA flow filtering and
> > RDMA raw ethernet QPs. 
> 
> I'm not saying that's not the case. I'm saying I don't think this was
> something that netdev developers signed off on.

Part of the point of the subsystem split was to end the fighting that
started all of it. It was very clear during the whole iWarp and TCP
Offload Engine business in the mid-2000s that netdev wanted nothing
to do with the accelerator world.

So why would netdev need to sign off on any accelerator stuff? Do you
want to start cooperating now? I'm willing to talk about how to do
that.

> And our policy on DPDK is pretty widely known.

I honestly have no idea about the netdev DPDK policy; I'm maintaining
the RDMA subsystem, not DPDK :)

> Would you mind pointing us to the introduction of raw Ethernet QPs?
> 
> Is there any production use for that without DPDK?

Hmm, it is very old. RAW (InfiniBand) QPs were part of the original
IBA specification circa 2000. When RoCE was defined (around 2010) they
were naturally carried forward to Ethernet. The "flow steering"
concept to make raw ethernet QPs useful was added to verbs around
2012-2013. It officially made it upstream in commit 436f2ad05a0b
("IB/core: Export ib_create/destroy_flow through uverbs").

If I recall correctly, the first real application was ultra-low-latency
ethernet processing for financial trading.

DPDK later gained its first mlx4 PMD, built on this libibverbs API,
around 2015. Interestingly, the mlx4 PMD was developed through an open
source process with minimal involvement from Mellanox, based on the
pre-existing RDMA work.

Currently there are many projects, many of them open source, built on
top of the RDMA raw ethernet QP and RDMA flow steering model. It is
now long-established kernel ABI.

> > It has been like this for years, it is not some "act".
> > 
> > It is long-standing uABI that accelerators like RDMA/etc get to take
> > the traffic before netdev. This cannot be reverted. I don't really
> > understand what you are expecting here?
> 
> Same. I don't really know what you expect me to do either. I don't
> think I can sign off on kernel changes needed for DPDK.

This patch is fine-tuning the shared logic that splits the traffic to
accelerator subsystems; I don't think netdev should have a veto
here. This needs to be a consensus among the various communities and
subsystems that rely on this.

Eli did not explain this well in his commit message. When he said DPDK
he meant RDMA, which is the owner of the FLOW_NAMESPACE. Each
accelerator subsystem gets hooked into this, so here VDPA is getting
its own hook, because re-using the same hook between two kernel
subsystems is buggy.
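
Concretely, the precedence model is just an ordered walk of the NIC RX
root. Something like this sketch - the NIC_RX_PRIO_* names are
illustrative, not the literal fs_core.c prio tree; only the
MLX5_FLOW_NAMESPACE_* identifiers in the comments match the real enum:

  /* Sketch of NIC RX precedence after this patch; an incoming packet
   * is matched against each level in turn. */
  enum nic_rx_prio {
          NIC_RX_PRIO_BYPASS,     /* MLX5_FLOW_NAMESPACE_BYPASS: RDMA
                                   * flow steering, where the DPDK PMD
                                   * inserts its rules */
          NIC_RX_PRIO_VDPA,       /* MLX5_FLOW_NAMESPACE_VDPA: the new
                                   * VDPA hook, consulted only if
                                   * nothing in bypass matched */
          NIC_RX_PRIO_KERNEL,     /* MLX5_FLOW_NAMESPACE_KERNEL: the
                                   * regular netdev RX path, ethtool
                                   * and TC rules */
  };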

Jason

Thread overview: 32+ messages
2020-11-20 23:03 [PATCH mlx5-next 00/16] mlx5 next updates 2020-11-20 Saeed Mahameed
2020-11-20 23:03 ` [PATCH mlx5-next 01/16] net/mlx5: Add sample offload hardware bits and structures Saeed Mahameed
2020-11-20 23:03 ` [PATCH mlx5-next 02/16] net/mlx5: Add sampler destination type Saeed Mahameed
2020-11-20 23:03 ` [PATCH mlx5-next 03/16] net/mlx5: Check dr mask size against mlx5_match_param size Saeed Mahameed
2020-11-20 23:03 ` [PATCH mlx5-next 04/16] net/mlx5: Add misc4 to mlx5_ifc_fte_match_param_bits Saeed Mahameed
2020-11-20 23:03 ` [PATCH mlx5-next 05/16] net/mlx5: Add ts_cqe_to_dest_cqn related bits Saeed Mahameed
2020-11-20 23:03 ` [PATCH mlx5-next 06/16] net/mlx5: Avoid exposing driver internal command helpers Saeed Mahameed
2020-11-20 23:03 ` [PATCH mlx5-next 07/16] net/mlx5: Update the list of the PCI supported devices Saeed Mahameed
2020-11-20 23:03 ` [PATCH mlx5-next 08/16] net/mlx5: Update the hardware interface definition for vhca state Saeed Mahameed
2020-11-20 23:03 ` [PATCH mlx5-next 09/16] net/mlx5: Expose IP-in-IP TX and RX capability bits Saeed Mahameed
2020-11-21 23:58   ` Jakub Kicinski
2020-11-22 15:17     ` Aya Levin
2020-11-23 21:15       ` Saeed Mahameed
2020-11-20 23:03 ` [PATCH mlx5-next 10/16] net/mlx5: Expose other function ifc bits Saeed Mahameed
2020-11-20 23:03 ` [PATCH mlx5-next 11/16] net/mlx5: Add VDPA priority to NIC RX namespace Saeed Mahameed
2020-11-22  0:01   ` Jakub Kicinski
2020-11-22  6:41     ` Eli Cohen
2020-11-24 17:12       ` Jakub Kicinski
2020-11-24 18:02         ` Jason Gunthorpe
2020-11-24 18:41           ` Jakub Kicinski
2020-11-24 19:44             ` Jason Gunthorpe [this message]
2020-11-25  6:19               ` Eli Cohen
2020-11-25 19:04                 ` Saeed Mahameed
2020-11-25 18:54               ` Jakub Kicinski
2020-11-25 19:28                 ` Saeed Mahameed
2020-11-25 21:22                 ` Jason Gunthorpe
2020-11-20 23:03 ` [PATCH mlx5-next 12/16] net/mlx5: Export steering related functions Saeed Mahameed
2020-11-20 23:03 ` [PATCH mlx5-next 13/16] net/mlx5: Make API mlx5_core_is_ecpf accept const pointer Saeed Mahameed
2020-11-20 23:03 ` [PATCH mlx5-next 14/16] net/mlx5: Rename peer_pf to host_pf Saeed Mahameed
2020-11-20 23:03 ` [PATCH mlx5-next 15/16] net/mlx5: Enable host PF HCA after eswitch is initialized Saeed Mahameed
2020-11-20 23:03 ` [PATCH mlx5-next 16/16] net/mlx5: Treat host PF vport as other (non eswitch manager) vport Saeed Mahameed
2020-11-30 18:42 ` [PATCH mlx5-next 00/16] mlx5 next updates 2020-11-20 Saeed Mahameed
