From: Mark Bloch <mbloch@nvidia.com>
To: Jakub Kicinski <kuba@kernel.org>
Cc: Moshe Shemesh <moshe@nvidia.com>,
netdev@vger.kernel.org, "David S. Miller" <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Paolo Abeni <pabeni@redhat.com>, Simon Horman <horms@kernel.org>,
Donald Hunter <donald.hunter@gmail.com>,
Jiri Pirko <jiri@resnulli.us>, Jonathan Corbet <corbet@lwn.net>,
Andrew Lunn <andrew+netdev@lunn.ch>,
Tariq Toukan <tariqt@nvidia.com>
Subject: Re: [RFC net-next 0/5] devlink: Add unique identifier to devlink port function
Date: Wed, 14 May 2025 15:01:40 +0300
Message-ID: <da372ddc-bc00-4e14-bcd8-4e9c607cc1d8@nvidia.com>
In-Reply-To: <bee1e240-cc6a-4c30-a2ae-6f7974627053@nvidia.com>
On 08/05/2025 12:04, Mark Bloch wrote:
>
>
> On 08/05/2025 3:43, Jakub Kicinski wrote:
>> On Tue, 6 May 2025 18:34:22 +0300 Mark Bloch wrote:
>>>>> Flow:
>>>>> 1. A user requests a container with networking connectivity.
>>>>> 2. Kubernetes allocates a VF on host X. An agent on the host handles VF
>>>>> configuration and sends the PF number and VF index to the central
>>>>> management software.
>>>>
>>>> What is "central management software" here? Deployment specific or
>>>> some part of k8s?
>>>
>>> It's the k8s API server.
>>>
>>>>
>>>>> 3. An agent on the DPU side detects the changes made on host X. Using
>>>>> the PF number and VF index, it identifies the corresponding
>>>>> representor, attaches it to an OVS bridge, and allows OVN to program
>>>>> the relevant steering rules.
>>>>
>>>> What does it mean that DPU "detects it", what's the source and
>>>> mechanism of the notification?
>>>> Is it communicating with the central SW during the process?
>>>
>>> The agent (running in the ARM/DPU) listens for events from the k8s API server.
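As a rough illustration of step 3, the sketch below shows how a DPU-side agent might turn a PF number and VF index into a representor netdev by scanning devlink's JSON port listing. It assumes the output of `devlink -j port show` carries per-port flavour, controller, pfnum, vfnum and netdev attributes as iproute2 prints them (check against your version); the sample data and netdev names are hypothetical. It also shows that the (PF number, VF index) pair alone can be ambiguous once several hosts, i.e. controllers, sit behind one DPU.

```python
import json
from typing import List, Optional

# Hypothetical `devlink -j port show` output on a DPU serving two external hosts
# (controllers 1 and 2). Field names are assumed to mirror iproute2's JSON output.
SAMPLE_PORTS = """
{"port": {
  "pci/0000:03:00.0/1": {"type": "eth", "netdev": "c1pf0vf3", "flavour": "pcivf",
                         "controller": 1, "pfnum": 0, "vfnum": 3, "external": true},
  "pci/0000:03:00.0/2": {"type": "eth", "netdev": "c2pf0vf3", "flavour": "pcivf",
                         "controller": 2, "pfnum": 0, "vfnum": 3, "external": true},
  "pci/0000:03:00.0/3": {"type": "eth", "netdev": "c1pf0hpf", "flavour": "pcipf",
                         "controller": 1, "pfnum": 0, "external": true}
}}
"""


def find_vf_representors(devlink_json: str, pfnum: int, vfnum: int,
                         controller: Optional[int] = None) -> List[str]:
    """Return netdev names of VF representors matching (pfnum, vfnum).

    Without a controller number, more than one representor can match
    when several hosts use the same PF number and VF index.
    """
    matches = []
    for attrs in json.loads(devlink_json).get("port", {}).values():
        if attrs.get("flavour") != "pcivf":
            continue  # skip physical, PF and SF ports
        if (attrs.get("pfnum"), attrs.get("vfnum")) != (pfnum, vfnum):
            continue
        if controller is not None and attrs.get("controller") != controller:
            continue
        matches.append(attrs.get("netdev"))
    return matches
```

With this sample, find_vf_representors(SAMPLE_PORTS, 0, 3) returns two candidates, ["c1pf0vf3", "c2pf0vf3"]; only passing controller=2 narrows it to ["c2pf0vf3"].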
>>
>> Interesting. So a deployment with no security boundaries. The internals
>> of the IPU and the k8s on the host are in the same domain of control.
>
> The VF is created on host X, but the corresponding representor appears
> on a different host, the IPU. Naturally, they need to be able to
> synchronize and exchange information for everything to work correctly.
>
>>
>> So how does the user remotely power cycle the hosts?
>
> Why should a user be able to power cycle the hosts?
> Are you asking about the administrator?
>
>>
>> What I'm getting at is that your mental model seems to be missing any
>> sort of HW inventory database, which lists all the hosts and how they
>> plug into the DC. The administrator of the system must already know
>> where each machine is exactly in the chassis for basic DC ops. And
>> that HW DB is normally queried in what you describe. If there is any
>> security domain crossing in the picture it will require cross checking
>> against that HW DB.
>
> You're assuming that external host numbering and PCI enumeration are
> stable. Also, users can determine the mapping only after creating
> VFs, and even then the mapping is indirect, e.g.: “I created a VF on
> this PF, and I see a single representor appear on the IPU, so they
> must be linked.” That approach is fragile and error-prone.
>
> Also, keep in mind: the external hosts and their kernels shouldn’t
> be aware they’re part of a multi-host system. With our current
> approach, you just need to provide a host-to-IPU mapping
> upfront, with no guesswork involved.
>
> Just thinking out loud: once this feature is in place, we might
> not even need a static mapping between external hosts and IPU hosts.
>
> If VUID and FUID are globally unique, the following workflow
> becomes possible:
>
> - A user requests a container with network connectivity.
> - k8s allocates and configures a VF on one of the hosts.
> It then sends the VUID, PF number, and VF index for the new VF
> to the k8s API server.
> - Somewhere in the network, a representor appears. An agent detects
> this and notifies the k8s API server, including its FUID,
> PF number, and VF index.
> - The API server matches the VF and representor data based on the
> globally unique identifiers and sends the relevant information
> back to the agent that reported the representor creation.
> - The agent attaches the representor to the OVS bridge, and OVN
> programs the appropriate steering rules.
>
> This would remove the need for predefined host-to-IPU mappings
> and allow for a more dynamic and flexible setup.
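The matching step in the workflow above could be sketched roughly as follows. This is not part of the RFC; it assumes the FUID reported for a representor's devlink port function is the same value as the VUID the host reports for that VF, and the report types and field names are hypothetical. Reports are paired by identifier regardless of which side arrives first.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class VfReport:
    """Sent by the host-side agent after it configures a VF (hypothetical shape)."""
    uid: str    # VUID of the new VF
    pfnum: int
    vfnum: int


@dataclass(frozen=True)
class RepReport:
    """Sent by a DPU-side agent when a representor appears (hypothetical shape)."""
    uid: str     # FUID read from the representor's devlink port function
    agent: str   # agent that will attach the representor to OVS
    netdev: str  # representor netdev name


@dataclass(frozen=True)
class Match:
    agent: str   # where the API server sends the result
    netdev: str
    vf: VfReport


class UidMatcher:
    """Pairs VF and representor reports by unique ID, in either arrival order."""

    def __init__(self) -> None:
        self._vfs: Dict[str, VfReport] = {}
        self._reps: Dict[str, RepReport] = {}

    def on_vf(self, vf: VfReport) -> Optional[Match]:
        rep = self._reps.pop(vf.uid, None)
        if rep is None:
            self._vfs[vf.uid] = vf  # representor not seen yet; park the report
            return None
        return Match(rep.agent, rep.netdev, vf)

    def on_representor(self, rep: RepReport) -> Optional[Match]:
        vf = self._vfs.pop(rep.uid, None)
        if vf is None:
            self._reps[rep.uid] = rep  # host report not seen yet; park the report
            return None
        return Match(rep.agent, rep.netdev, vf)
```

For example, on_representor(RepReport("uid-1", "dpu-agent-7", "c1pf0vf3")) returns None until on_vf(VfReport("uid-1", 0, 3)) arrives, which returns a Match addressed to dpu-agent-7. No host-to-IPU table appears anywhere; the identifier alone links the two sides.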
>
>>
>> I don't think this is sufficiently well established to warrant new uAPI.
>> You can use a UUID and pass it via ndo_get_phys_port_id.
>
> phys_port_id only applies to netdev interfaces, whereas this use case is
> broader and more aligned with devlink. We believe devlink is a more
> appropriate place for this functionality.
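For comparison, a minimal sketch of the phys_port_id route suggested above, assuming the driver's ndo_get_phys_port_id callback copies a 16-byte UUID into the struct netdev_phys_item_id it fills in. The kernel caps that ID at MAX_PHYS_ITEM_ID_LEN (32) bytes, and sysfs prints it as lowercase hex without separators. The helper names are hypothetical. The ID is only reachable through /sys/class/net/<ifname>/phys_port_id, i.e. only for functions that have a netdev.

```python
import uuid
from pathlib import Path

MAX_PHYS_ITEM_ID_LEN = 32  # kernel limit on struct netdev_phys_item_id.id


def uuid_to_phys_port_id(u: uuid.UUID) -> str:
    """Render a UUID the way sysfs shows phys_port_id: lowercase hex, no separators."""
    raw = u.bytes  # 16 bytes
    if len(raw) > MAX_PHYS_ITEM_ID_LEN:
        raise ValueError("identifier exceeds MAX_PHYS_ITEM_ID_LEN")
    return raw.hex()


def phys_port_id_to_uuid(text: str) -> uuid.UUID:
    """Parse a phys_port_id string back into a UUID (expects exactly 16 bytes)."""
    return uuid.UUID(bytes=bytes.fromhex(text.strip()))


def read_phys_port_id(ifname: str) -> str:
    """Read the ID from sysfs; raises OSError if the interface is missing
    or its driver does not implement the callback."""
    return Path(f"/sys/class/net/{ifname}/phys_port_id").read_text().strip()
```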
>
> Mark
>
Hi Jakub,
Just checking in: have you had a chance to review my earlier email?
Would appreciate your thoughts or guidance on the right path forward.
Mark