netdev.vger.kernel.org archive mirror
From: Mark Bloch <mbloch@nvidia.com>
To: Jakub Kicinski <kuba@kernel.org>
Cc: Moshe Shemesh <moshe@nvidia.com>,
	netdev@vger.kernel.org, "David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Paolo Abeni <pabeni@redhat.com>, Simon Horman <horms@kernel.org>,
	Donald Hunter <donald.hunter@gmail.com>,
	Jiri Pirko <jiri@resnulli.us>, Jonathan Corbet <corbet@lwn.net>,
	Andrew Lunn <andrew+netdev@lunn.ch>,
	Tariq Toukan <tariqt@nvidia.com>
Subject: Re: [RFC net-next 0/5] devlink: Add unique identifier to devlink port function
Date: Wed, 14 May 2025 15:01:40 +0300	[thread overview]
Message-ID: <da372ddc-bc00-4e14-bcd8-4e9c607cc1d8@nvidia.com> (raw)
In-Reply-To: <bee1e240-cc6a-4c30-a2ae-6f7974627053@nvidia.com>



On 08/05/2025 12:04, Mark Bloch wrote:
> 
> 
> On 08/05/2025 3:43, Jakub Kicinski wrote:
>> On Tue, 6 May 2025 18:34:22 +0300 Mark Bloch wrote:
>>>>> Flow:
>>>>> 1. A user requests a container with networking connectivity.
>>>>> 2. Kubernetes allocates a VF on host X. An agent on the host handles VF
>>>>>    configuration and sends the PF number and VF index to the central
>>>>>    management software.  
>>>>
>>>> What is "central management software" here? Deployment specific or
>>>> some part of k8s?  
>>>
>>> It's the k8s API server.
>>>
>>>>   
>>>>> 3. An agent on the DPU side detects the changes made on host X. Using
>>>>>    the PF number and VF index, it identifies the corresponding
>>>>>    representor, attaches it to an OVS bridge, and allows OVN to program
>>>>>    the relevant steering rules.  
>>>>
>>>> What does it mean that DPU "detects it", what's the source and 
>>>> mechanism of the notification?
>>>> Is it communicating with the central SW during the process?
>>>
>>> The agent (running in the ARM/DPU) listens for events from the k8s API server.
>>
>> Interesting. So a deployment with no security boundaries. The internals
>> of the IPU and the k8s on the host are in the same domain of control.
> 
> The VF is created on host X, but the corresponding representor appears
> on a different host, the IPU. Naturally, they need to be able to
> synchronize and exchange information for everything to work correctly.
> 
>>
>> So how does the user remotely power cycle the hosts?
> 
> Why should a user be able to power cycle the hosts?
> Are you asking about the administrator?
> 
>>
>> What I'm getting at is that your mental model seems to be missing any
>> sort of HW inventory database, which lists all the hosts and how they
>> plug into the DC. The administrator of the system must already know
>> where each machine is exactly in the chassis for basic DC ops. And
>> that HW DB is normally queried in what you describe. If there is any
>> security domain crossing in the picture it will require cross checking
>> against that HW DB.
> 
> You're assuming that external host numbering and PCI enumeration are
> stable. Also, users can determine the mapping only after creating
> VFs, and even then the mapping is indirect, e.g.: “I created a VF on
> this PF, and I see a single representor appear on the IPU, so they
> must be linked.” That approach is fragile and error-prone.
> 
> Also, keep in mind: the external hosts and their kernels shouldn’t
> be aware they’re part of a multi-host system. With our current
> approach, you just need to provide a host-to-IPU mapping
> upfront; no guesswork is involved.
> 
> Just thinking out loud: once this feature is in place, we might
> not even need a static mapping between external hosts and IPU hosts.
> 
> If VUID and FUID are globally unique, the following workflow
> becomes possible:
> 
> - A user requests a container with network connectivity.
> - k8s allocates and configures a VF on one of the hosts.
>   It then sends the VUID, PF number, and VF index for the new VF
>   to the k8s API server.
> - Somewhere in the network, a representor appears. An agent detects
>   this and notifies the k8s API server, including its FUID,
>   PF number, and VF index.
> - The API server matches the VF and representor data based on the
>   globally unique identifiers and sends the relevant information
>   back to the agent that reported the representor creation.
> - The agent attaches the representor to the OVS bridge, and OVN
>   programs the appropriate steering rules.
> 
> This would remove the need for predefined host-to-IPU mappings
> and allow for a more dynamic and flexible setup.
> 
>>
>> I don't think this is sufficiently well established to warrant new uAPI.
>> You can use a UUID and pass it via ndo_get_phys_port_id.
> 
> phys_port_id only applies to netdev interfaces, whereas this use case is
> broader and more aligned with devlink. We believe devlink is a more
> appropriate place for this functionality.
> 
> Mark
> 
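For illustration only, the matching step of the proposed workflow quoted above (the API server pairing a VF report with a representor report) might look like the Python sketch below. It assumes that the unique identifier a DPU agent reads from the representor's devlink port function (called FUID above) is the same value as the VUID the host-side agent reports for the VF. The report types, their field names, and the Matcher class are hypothetical and are not part of k8s or devlink. Reports may arrive in either order, so each side is buffered until its counterpart shows up.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class VfReport:
    """Hypothetical message from a host-side agent after it configures a VF."""
    host: str
    vuid: str  # assumed globally unique VF identifier
    pf: int    # PF number on that host
    vf: int    # VF index on that PF


@dataclass(frozen=True)
class RepReport:
    """Hypothetical message from a DPU/IPU agent when a representor appears."""
    agent: str
    fuid: str  # assumed to equal the VUID of the VF this representor stands for
    pf: int
    vf: int


class Matcher:
    """Pairs VF and representor reports by unique ID, in either arrival order."""

    def __init__(self) -> None:
        self._vfs: Dict[str, VfReport] = {}    # VF reports waiting for a representor
        self._reps: Dict[str, RepReport] = {}  # representor reports waiting for a VF

    def on_vf(self, report: VfReport) -> Optional[Tuple[str, VfReport]]:
        self._vfs[report.vuid] = report
        return self._try_match(report.vuid)

    def on_rep(self, report: RepReport) -> Optional[Tuple[str, VfReport]]:
        self._reps[report.fuid] = report
        return self._try_match(report.fuid)

    def _try_match(self, uid: str) -> Optional[Tuple[str, VfReport]]:
        vf = self._vfs.get(uid)
        rep = self._reps.get(uid)
        if vf is None or rep is None:
            return None  # counterpart not reported yet
        # Both halves are known: forget them and return the agent that owns the
        # representor together with the VF it serves, so that agent can attach
        # the representor to the OVS bridge.
        del self._vfs[uid]
        del self._reps[uid]
        return rep.agent, vf


if __name__ == "__main__":
    m = Matcher()
    m.on_vf(VfReport(host="host-a", vuid="uid-1", pf=0, vf=3))
    m.on_vf(VfReport(host="host-b", vuid="uid-2", pf=0, vf=3))
    print(m.on_rep(RepReport(agent="dpu-2", fuid="uid-2", pf=0, vf=3)))
```

Both VFs in the usage example are PF 0 / VF 3, on different hosts. A key built only from the PF number and VF index could not tell them apart, which is why the workflow relies on the identifiers being globally unique.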

Hi Jakub,

Just checking in: have you had a chance to review my earlier email?
I'd appreciate your thoughts or guidance on the right path forward.

Mark

Thread overview: 25+ messages
2025-04-23 13:50 [RFC net-next 0/5] devlink: Add unique identifier to devlink port function Moshe Shemesh
2025-04-23 13:50 ` [RFC net-next 1/5] " Moshe Shemesh
2025-04-28 12:33   ` Simon Horman
2025-04-29  9:33     ` Avihai Horon
2025-04-23 13:50 ` [RFC net-next 2/5] net/mlx5: Move mlx5_cmd_query_vuid() from IB to core Moshe Shemesh
2025-04-23 13:50 ` [RFC net-next 3/5] net/mlx5: Add vhca_id argument to mlx5_core_query_vuid() Moshe Shemesh
2025-04-23 13:50 ` [RFC net-next 4/5] net/mlx5: Add define for max VUID string size Moshe Shemesh
2025-04-23 13:50 ` [RFC net-next 5/5] net/mlx5: Expose unique identifier in devlink port function Moshe Shemesh
2025-04-24 23:24 ` [RFC net-next 0/5] devlink: Add unique identifier to " Jakub Kicinski
2025-04-25 11:26   ` Jiri Pirko
2025-04-25 17:51     ` Jakub Kicinski
2025-04-28 16:30       ` Jiri Pirko
2025-04-28 12:11   ` Moshe Shemesh
2025-04-28 18:19     ` Jakub Kicinski
2025-04-29  8:37       ` Moshe Shemesh
2025-05-02  0:39         ` Jakub Kicinski
2025-05-04 17:46           ` Mark Bloch
2025-05-05 18:55             ` Jakub Kicinski
2025-05-06 11:25               ` Mark Bloch
2025-05-06 15:20                 ` Jakub Kicinski
2025-05-06 15:34                   ` Mark Bloch
2025-05-08  0:43                     ` Jakub Kicinski
2025-05-08  9:04                       ` Mark Bloch
2025-05-14 12:01                         ` Mark Bloch [this message]
2025-05-14 14:52                           ` Jakub Kicinski
