From: Mark Bloch <mbloch@nvidia.com>
To: Jakub Kicinski <kuba@kernel.org>
Cc: Moshe Shemesh <moshe@nvidia.com>,
netdev@vger.kernel.org, "David S. Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Paolo Abeni <pabeni@redhat.com>, Simon Horman <horms@kernel.org>,
Donald Hunter <donald.hunter@gmail.com>,
Jiri Pirko <jiri@resnulli.us>, Jonathan Corbet <corbet@lwn.net>,
Andrew Lunn <andrew+netdev@lunn.ch>,
Tariq Toukan <tariqt@nvidia.com>
Subject: Re: [RFC net-next 0/5] devlink: Add unique identifier to devlink port function
Date: Wed, 14 May 2025 15:01:40 +0300
Message-ID: <da372ddc-bc00-4e14-bcd8-4e9c607cc1d8@nvidia.com>
In-Reply-To: <bee1e240-cc6a-4c30-a2ae-6f7974627053@nvidia.com>
On 08/05/2025 12:04, Mark Bloch wrote:
>
>
> On 08/05/2025 3:43, Jakub Kicinski wrote:
>> On Tue, 6 May 2025 18:34:22 +0300 Mark Bloch wrote:
>>>>> Flow:
>>>>> 1. A user requests a container with networking connectivity.
>>>>> 2. Kubernetes allocates a VF on host X. An agent on the host handles VF
>>>>> configuration and sends the PF number and VF index to the central
>>>>> management software.
>>>>
>>>> What is "central management software" here? Deployment specific or
>>>> some part of k8s?
>>>
>>> It's the k8s API server.
>>>
>>>>
>>>>> 3. An agent on the DPU side detects the changes made on host X. Using
>>>>> the PF number and VF index, it identifies the corresponding
>>>>> representor, attaches it to an OVS bridge, and allows OVN to program
>>>>> the relevant steering rules.
>>>>
>>>> What does it mean that DPU "detects it", what's the source and
>>>> mechanism of the notification?
>>>> Is it communicating with the central SW during the process?
>>>
>>> The agent (running in the ARM/DPU) listens for events from the k8s API server.
>>
>> Interesting. So a deployment with no security boundaries. The internals
>> of the IPU and the k8s on the host are in the same domain of control.
>
> The VF is created on host X, but the corresponding representor appears
> on a different host, the IPU. Naturally, they need to be able to
> synchronize and exchange information for everything to work correctly.
>
>>
>> So how does the user remotely power cycle the hosts?
>
> Why should a user be able to power cycle the hosts?
> Are you asking about the administrator?
>
>>
>> What I'm getting at is that your mental model seems to be missing any
>> sort of HW inventory database, which lists all the hosts and how they
>> plug into the DC. The administrator of the system must already know
>> where each machine is exactly in the chassis for basic DC ops. And
>> that HW DB is normally queried in what you describe. If there is any
>> security domain crossing in the picture it will require cross checking
>> against that HW DB.
>
> You're assuming that external host numbering and PCI enumeration are
> stable. Also, users can determine the mapping only after creating
> VFs, and even then the mapping is indirect, e.g.: “I created a VF on
> this PF, and I see a single representor appear on the IPU, so they
> must be linked.” That approach is fragile and error-prone.
>
> Also, keep in mind: the external hosts and their kernels shouldn’t
> be aware they’re part of a multi-host system. With our current
> approach, you just need to provide a host-to-IPU mapping
> upfront, no guesswork involved.
>
> Just thinking out loud: once this feature is in place, we might
> not even need a static mapping between external hosts and IPU hosts.
>
> If VUID and FUID are globally unique, the following workflow
> becomes possible:
>
> - A user requests a container with network connectivity.
> - k8s allocates and configures a VF on one of the hosts.
> It then sends the VUID, PF number, and VF index for the new VF
> to the k8s API server.
> - Somewhere in the network, a representor appears. An agent detects
> this and notifies the k8s API server, including its FUID,
> PF number, and VF index.
> - The API server matches the VF and representor data based on the
> globally unique identifiers and sends the relevant information
> back to the agent that reported the representor creation.
> - The agent attaches the representor to the OVS bridge and, together
> with OVN, configures the appropriate steering rules.
>
> This would remove the need for predefined host-to-IPU mappings
> and allow for a more dynamic and flexible setup.
>
>>
>> I don't think this is sufficiently well established to warrant new uAPI.
>> You can use a UUID and pass it via ndo_get_phys_port_id.
>
> phys_port_id only applies to netdev interfaces, whereas this use case is
> broader and more aligned with devlink. We believe devlink is a more
> appropriate place for this functionality.
>
> Mark
>
Hi Jakub,
Just checking in: have you had a chance to review my earlier email?
Would appreciate your thoughts or guidance on the right path forward.
Mark
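The identifier-based matching step in the proposed workflow (the API
server pairing a VF report with a representor report) could look
roughly like the sketch below. It is a minimal illustration only, not
part of the RFC or of any k8s API. The names VfReport, RepReport,
UidMatcher and the "agent" field are hypothetical, and the sketch
assumes that the unique identifier reported for a representor's devlink
port function equals the VUID of the VF it represents. Either report may
arrive first, so each side is held pending until its counterpart shows
up.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class VfReport:
    """Hypothetical report from the host-side agent after it configures a VF."""
    vuid: str
    pf_num: int
    vf_index: int


@dataclass(frozen=True)
class RepReport:
    """Hypothetical report from a DPU-side agent when a representor appears."""
    fuid: str
    pf_num: int
    vf_index: int
    agent: str  # reporting agent, so the match can be sent back to it


Match = Tuple[VfReport, RepReport]


class UidMatcher:
    """Pairs VF and representor reports by unique ID, in either arrival order.

    Assumption: a representor's FUID equals the VUID of the VF it represents.
    PF number and VF index are carried along but are not used as the key.
    """

    def __init__(self) -> None:
        self._pending_vfs: Dict[str, VfReport] = {}
        self._pending_reps: Dict[str, RepReport] = {}

    def add_vf(self, vf: VfReport) -> Optional[Match]:
        rep = self._pending_reps.pop(vf.vuid, None)
        if rep is None:
            # No representor reported yet for this ID; wait for it.
            self._pending_vfs[vf.vuid] = vf
            return None
        return vf, rep

    def add_rep(self, rep: RepReport) -> Optional[Match]:
        vf = self._pending_vfs.pop(rep.fuid, None)
        if vf is None:
            # Representor appeared before the VF report; wait for it.
            self._pending_reps[rep.fuid] = rep
            return None
        return vf, rep


if __name__ == "__main__":
    m = UidMatcher()
    m.add_vf(VfReport(vuid="uid-1", pf_num=0, vf_index=3))
    match = m.add_rep(RepReport(fuid="uid-1", pf_num=0, vf_index=3, agent="dpu-a"))
    if match is not None:
        vf, rep = match
        # The matched VF info would be sent back to rep.agent, which then
        # attaches the representor to the OVS bridge.
        print(f"notify {rep.agent}: pf {vf.pf_num} vf {vf.vf_index}")
```

Keying purely on the unique identifier is what would remove the need for
a static host-to-IPU mapping: the PF number and VF index are only
carried along, since on their own they may collide across hosts.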
Thread overview: 25+ messages
2025-04-23 13:50 [RFC net-next 0/5] devlink: Add unique identifier to devlink port function Moshe Shemesh
2025-04-23 13:50 ` [RFC net-next 1/5] " Moshe Shemesh
2025-04-28 12:33 ` Simon Horman
2025-04-29 9:33 ` Avihai Horon
2025-04-23 13:50 ` [RFC net-next 2/5] net/mlx5: Move mlx5_cmd_query_vuid() from IB to core Moshe Shemesh
2025-04-23 13:50 ` [RFC net-next 3/5] net/mlx5: Add vhca_id argument to mlx5_core_query_vuid() Moshe Shemesh
2025-04-23 13:50 ` [RFC net-next 4/5] net/mlx5: Add define for max VUID string size Moshe Shemesh
2025-04-23 13:50 ` [RFC net-next 5/5] net/mlx5: Expose unique identifier in devlink port function Moshe Shemesh
2025-04-24 23:24 ` [RFC net-next 0/5] devlink: Add unique identifier to " Jakub Kicinski
2025-04-25 11:26 ` Jiri Pirko
2025-04-25 17:51 ` Jakub Kicinski
2025-04-28 16:30 ` Jiri Pirko
2025-04-28 12:11 ` Moshe Shemesh
2025-04-28 18:19 ` Jakub Kicinski
2025-04-29 8:37 ` Moshe Shemesh
2025-05-02 0:39 ` Jakub Kicinski
2025-05-04 17:46 ` Mark Bloch
2025-05-05 18:55 ` Jakub Kicinski
2025-05-06 11:25 ` Mark Bloch
2025-05-06 15:20 ` Jakub Kicinski
2025-05-06 15:34 ` Mark Bloch
2025-05-08 0:43 ` Jakub Kicinski
2025-05-08 9:04 ` Mark Bloch
2025-05-14 12:01 ` Mark Bloch [this message]
2025-05-14 14:52 ` Jakub Kicinski