From: Jason Gunthorpe <jgg@nvidia.com>
To: Alexander Duyck <alexander.duyck@gmail.com>
Cc: Saeed Mahameed <saeed@kernel.org>,
"David S. Miller" <davem@davemloft.net>,
Jakub Kicinski <kuba@kernel.org>,
Leon Romanovsky <leonro@nvidia.com>,
Netdev <netdev@vger.kernel.org>, <linux-rdma@vger.kernel.org>,
David Ahern <dsahern@kernel.org>,
Jacob Keller <jacob.e.keller@intel.com>,
Sridhar Samudrala <sridhar.samudrala@intel.com>,
"Ertman, David M" <david.m.ertman@intel.com>,
Dan Williams <dan.j.williams@intel.com>,
Kiran Patil <kiran.patil@intel.com>,
Greg KH <gregkh@linuxfoundation.org>
Subject: Re: [net-next v4 00/15] Add mlx5 subfunction support
Date: Thu, 17 Dec 2020 20:08:02 -0400
Message-ID: <20201218000802.GV552508@nvidia.com>
In-Reply-To: <CAKgT0Ue9+cd-Mp4qgusorDX1mnjfzMXrQvB2FqLaH+ouzVTMRQ@mail.gmail.com>
On Thu, Dec 17, 2020 at 01:05:03PM -0800, Alexander Duyck wrote:
> > I view the SW bypass path you are talking about similarly to
> > GSO/etc. It should be accessed by the HW driver as an optional service
> > provided by the core netdev, not implemented as some wrapper netdev
> > around a HW implementation.
>
> I view it as being something that would be a part of the switchdev API
> itself. Basically the switchdev and endpoint would need to be able to
> control something like this because if XDP were enabled on one end or
> the other you would need to be able to switch it off so that all of
> the packets followed the same flow and could be scanned by the XDP
> program.
To me that still all comes down to being something like an optional
offload that the HW driver can trigger if the conditions are met.
> > It is simple enough, the HW driver's tx path would somehow detect
> > east/west and queue it differently, and the rx path would somehow be
> > able to mux in skbs from a SW queue. Not seeing any blockers here.
>
> In my mind the simple proof of concept for this would be to check for
> the multicast bit being set in the destination MAC address for packets
> coming from the subfunction. If it is then shunt to this bypass route,
> and if not then you transmit to the hardware queues.
Sure, not sure multicast optimization like this isn't incredibly niche
too, but it would be an interesting path to explore.
But again, there is nothing fundamental about the model here that
precludes this optional optimization.
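
For illustration only, the proof-of-concept check described in the quoted text could be sketched roughly as below. This is a standalone userspace sketch, not mlx5 or netdev code: `enum tx_route`, `dst_is_multicast()` and `pick_tx_route()` are hypothetical names invented here, and the real driver tx path would look different. In the kernel the equivalent test is `is_multicast_ether_addr()` from `<linux/etherdevice.h>`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_ALEN 6 /* length of an Ethernet MAC address in bytes */

/* Hypothetical result of the tx-path decision; not a kernel type. */
enum tx_route {
	TX_ROUTE_HW_QUEUE,  /* normal path: hand the frame to the hardware queues */
	TX_ROUTE_SW_BYPASS, /* assumed software east/west bypass path */
};

/*
 * The group (multicast) bit is the least significant bit of the first
 * octet of the destination MAC. Broadcast addresses have it set too.
 */
static bool dst_is_multicast(const uint8_t dst[ETH_ALEN])
{
	return (dst[0] & 0x01) != 0;
}

/*
 * Decide where a frame coming from the subfunction should go. The
 * destination MAC occupies the first ETH_ALEN bytes of the frame.
 * Frames too short to carry one are left on the hardware path.
 */
static enum tx_route pick_tx_route(const uint8_t *frame, size_t len)
{
	if (len < ETH_ALEN)
		return TX_ROUTE_HW_QUEUE;

	return dst_is_multicast(frame) ? TX_ROUTE_SW_BYPASS
				       : TX_ROUTE_HW_QUEUE;
}
```

With this sketch, a frame addressed to 01:00:5e:00:00:01 would take the assumed bypass route, while one addressed to a unicast MAC such as 02:00:00:00:00:01 would go to the hardware queues.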
> > Even if that is true, I don't believe for a second that adding a
> > different HW abstraction layer is going to somehow undo the mistakes
> > of the last 20 years.
>
> It depends on how it is done. The general idea is to address the
> biggest limitation that has occurred, which is the fact that in many
> cases we don't have software offloads to take care of things when the
> hardware offloads provided by a certain piece of hardware are not
> present.
This is really disappointing to hear. Admittedly I don't follow all
the twists and turns on the mailing list, but I thought having a SW
version of everything was one of the fundamental tenets of netdev
that truly distinguished it from something like RDMA.
> It would basically allow us to reset the feature set. If something
> cannot be offloaded in software in a reasonable way, it is not
> allowed to be present in the interface provided to a container.
> That way instead of having to do all the custom configuration in the
> container recipe it can be centralized to one container handling all
> of the switching and hardware configuration.
Well, you could start by blocking stuff without a SW fallback..
> There I disagree. Now I can agree that most of the series is about
> presenting the aux device and that part I am fine with. However when
> the aux device is a netdev and that netdev is being loaded into the
> same kernel as the switchdev port is where the red flags start flying,
> especially when we start talking about how it is the same as a VF.
Well, it happens for the same reason a VF can create a netdev,
stopping it would actually be more patches. As I said before, people
are already doing this model with VFs.
I can agree with some of your points, but this is not the series to
argue them. What you want is to start some new thread on optimizing
switchdev for the container use case.
> In my mind we are talking about how the switchdev will behave and it
> makes sense to see about defining if an east-west bypass makes sense
> and how it could be implemented, rather than saying we won't bother
> for now and potentially locking in the subfunction to virtual function
> equality.
At least for mlx5 SF == VF, that is a consequence of the HW. Any SW
bypass would need to be specially built in the mlx5 netdev running on
a VF/SF attached to a switchdev port.
I don't see anything about this part of the model that precludes ever
doing that, and I also don't see this optimization as being valuable
enough to block things "just to be sure".
> In my mind we need more than just the increased count to justify
> going to subfunctions, and I think being able to solve the east-west
> problem at least in terms of containers would be such a thing.
Increased count is pretty important for users with SRIOV.
Jason