* RE: SR-IOV + switchdev + vlan + Mellanox: Cannot ping
From: Shane Miller @ 2024-04-26 20:35 UTC
To: netdev
Problem:
-----------------------------------------------------------------
root@machA $ ping 10.xx.xx.194
PING 10.xx.xx.194 (10.xx.xx.194) 56(84) bytes of data
From 10.xx.xx.191 icmp_seq=10 Destination Host Unreachable
Proximate Cause:
-----------------------------------------------------------------
This seems to be a side effect of "switchdev" mode. When the identical
configuration is set up EXCEPT that the SR-IOV virtualized NIC is left
"legacy", ping (and ncat) works just fine.
As far as I can tell I need a bridge or bridge commands, but I have no
idea where to start. This environment will not allow me to add modify
commands when enabling switchdev mode. devlink seems to accept
"switchdev" alone without modifiers.
Note: putting a NIC into switchdev mode makes the virtual functions
show as "link-state disable", which is confusing. (See below.) Contrary
to what that suggests, the virtual NICs are up and running.
Running "arp -e" on machine A shows machine B's ieth3v0 MAC address as
incomplete, suggesting switchdev+ARP is broken.
Problem Environment:
-----------------------------------------------------------------
OS: RHEL 8.6 4.18.0-372.46.1.el8 x64
NICs: Mellanox ConnectX-6
Machine A Links:
70 tst@ieth3: <...LOWER_UP...> mtu 1500
    link/ether xx.xx.xx.xx.xx.xx
    vlan protocol 802.1Q id 133 <REORDER_HDR>
    inet 10.xx.xx.191

Machine B Links With ieth3 in SR-IOV mode in switchdev mode:
# Physical Function and its virtual functions:
2: ieth3: <...PROMISC,UP,LOWER_UP> mtu 1500
    link/ether xx.xx.xx.xx.xx.f6 portname p0 switchid xxxxe988
    vf 0 link/ether xx.xx.xx.xx.xx.00 vlan 133 spoof off, link-state disable, trust off
    . . .
# Port representers
893: ieth3r0: <...UP,LOWER_UP> mtu 1500
    link/ether xx.xx.xx.xx.xx.e1 portname pf0vf0 switchid xxxxe988
    . . .
# Virtual Links
897: ieth3v0: <...UP,LOWER_UP> mtu 1500
    link/ether xx.xx.xx.xx.xx.00 promiscuity 0
    inet 10.xx.xx.194/24 scope global ieth3v0
    . . .

SR-IOV Setup Summary
-----------------------------------------------------------------
This is done right since, in legacy mode, ping/ncat works fine:
1. Enable IOMMU, VT-x in BIOS
2. Boot Linux with iommu=on on command line
3. Install Mellanox OFED
4. Enable SR-IOV for max 8 devices in Mellanox firmware
(reboot)
5. Create 4 virtual NICs w/ SR-IOV
6. Configure 4 virtual NICs mac, trust off, spoofchk off, state auto
7. Unbind virtual NICs
8. Put ieth3 into switchdev mode
9. Rebind virtual NICs
10. Bring all links up
11. Assign IPv4 addresses to virtual links
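
For reference, steps 5 through 11 above might look roughly like the following for a single VF. This is only a sketch, not the poster's actual script: the interface names (ieth3, ieth3r0, ieth3v0) come from the post, but the PCI addresses, the MAC address, and the 192.0.2.194/24 address are hypothetical placeholders, and the helper names are made up. Commands are printed rather than executed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch of steps 5-11 for a single VF. Interface names come from the
# original post; every address below is a hypothetical placeholder.
PF=ieth3                   # physical function
REP=ieth3r0                # representor of VF 0
VF=ieth3v0                 # VF 0 netdev
PF_PCI=0000:3b:00.0        # hypothetical PCI address of the PF
VF_PCI=0000:3b:00.2        # hypothetical PCI address of VF 0
VF_MAC=02:00:00:00:00:01   # hypothetical MAC address
VF_ADDR=192.0.2.194/24     # hypothetical, stands in for 10.xx.xx.194/24

# Print each command by default; execute it only when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; else echo "$*"; fi
}

# Write VALUE to a sysfs PATH (or print the equivalent echo in dry-run mode).
write_file() {
    if [ "${DRY_RUN:-1}" = 0 ]; then
        printf '%s\n' "$1" > "$2"
    else
        echo "echo $1 > $2"
    fi
}

setup_switchdev_vf() {
    write_file 4 "/sys/class/net/$PF/device/sriov_numvfs"            # step 5
    run ip link set "$PF" vf 0 mac "$VF_MAC"                         # step 6
    run ip link set "$PF" vf 0 trust off spoofchk off state auto
    write_file "$VF_PCI" /sys/bus/pci/drivers/mlx5_core/unbind       # step 7
    run devlink dev eswitch set "pci/$PF_PCI" mode switchdev         # step 8
    write_file "$VF_PCI" /sys/bus/pci/drivers/mlx5_core/bind         # step 9
    for dev in "$PF" "$REP" "$VF"; do                                # step 10
        run ip link set "$dev" up
    done
    run ip addr add "$VF_ADDR" dev "$VF"                             # step 11
}
```

Calling setup_switchdev_vf prints the ten commands in order; the remaining three VFs would repeat steps 6, 7, 9, 10 and 11 with their own addresses.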
* Re: SR-IOV + switchdev + vlan + Mellanox: Cannot ping
From: Jiri Pirko @ 2024-04-27 10:26 UTC
To: Shane Miller; +Cc: netdev

Fri, Apr 26, 2024 at 10:35:28PM CEST, gshanemiller6@gmail.com wrote:
>This seems to be a side effect of "switchdev" mode. When the identical
>configuration is set up EXCEPT that the SR-IOV virtualized NIC is left
>"legacy", ping (and ncat) works just fine.
>
>As far as I can tell I need a bridge or bridge commands, but I have no
>idea where to start. This environment will not allow me to add modify
>commands when enabling switchdev mode. devlink seems to accept
>"switchdev" alone without modifiers.

You have to configure forwarding between appropriate representors. Use
ovs (probably easiest) or tc.

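
One way to read "configure forwarding between appropriate representors" with tc is a pair of flower rules per VF: frames arriving on the uplink are redirected to the VF's representor, and frames arriving on the representor are redirected to the uplink. The sketch below uses the names from the original post (uplink ieth3, representor ieth3r0, VLAN 133). It assumes that the VLAN tag has to be popped and pushed explicitly by the rules once the NIC is in switchdev mode; the thread does not confirm that, and it does not confirm which rules the NIC will offload. The function name is made up, and commands are printed rather than executed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
UPLINK=ieth3   # uplink / PF netdev, from the original post
REP=ieth3r0    # representor of VF 0, from the original post
VID=133        # VLAN id from the original post

# Print each command by default; execute it only when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; else echo "$*"; fi
}

forward_vf_with_tc() {
    # An ingress qdisc must exist before ingress filters can be attached.
    run tc qdisc add dev "$UPLINK" ingress
    run tc qdisc add dev "$REP" ingress
    # Uplink -> VF: match VLAN 133, strip the tag, hand the frame to the
    # representor (assumption: the VF itself sends and expects untagged frames).
    run tc filter add dev "$UPLINK" ingress protocol 802.1Q flower vlan_id "$VID" \
        action vlan pop action mirred egress redirect dev "$REP"
    # VF -> uplink: tag everything coming from the VF and send it out.
    run tc filter add dev "$REP" ingress protocol all flower \
        action vlan push id "$VID" action mirred egress redirect dev "$UPLINK"
}
```

The uplink rule matches only on the VLAN id, so with several VFs in the same VLAN it would need an additional dst_mac match per VF, plus some handling for broadcast ARP requests.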
* Re: SR-IOV + switchdev + vlan + Mellanox: Cannot ping
From: Shane Miller @ 2024-04-28 20:24 UTC
To: Jiri Pirko; +Cc: netdev

J Pirko wrote,

"You have to configure forwarding between appropriate representors. Use
ovs (probably easiest) or tc."

Thank you for taking the time to reply. But I need additional
information/guidance on how to bridge and what to bridge.

TC can be used to mirror packets, for example, and in fact I have set that
up, which is why I need the NIC in switchdev mode. However, this is
orthogonal. As I say in the original post, leaving the NIC in "legacy" mode
has no ping issues. As far as I understand it, TC is not part of the
solution space here.

My vague understanding is that putting a NIC into switchdev mode means
packets flow into HW only, not passing through the kernel, and this is what
screws ARP up, since the kernel is needed a bit. A bridge is supposed to
fix that. I tried,

brctl addbr sriovbr
brctl addif sriovbr <DEV>
ip link set dev sriovbr up
ip addr ... sriov ...

where <DEV> was the link name of the physical device, or the virtual link,
or the port representor, or a combination, to no effect.

So, restating the issue: A NIC is SR-IOV virtualized into 4 virtual NICs,
each with a VLAN and an IP address. The NIC is placed into switchdev mode.
The virtual NICs are not pingable from other boxes. The other boxes see the
NIC's MAC addresses as incomplete (arp -n or arp -e).

What and how do I bridge/link to fix this problem?

* Re: SR-IOV + switchdev + vlan + Mellanox: Cannot ping
From: Jiri Pirko @ 2024-04-29 11:29 UTC
To: Shane Miller; +Cc: netdev

Sun, Apr 28, 2024 at 10:24:14PM CEST, gshanemiller6@gmail.com wrote:
>My vague understanding is putting a NIC into switchdev mode means packets
>flow into HW only not passing through the kernel, and this is what screws ARP

Nope. Think of it as another switch inside the NIC that connects VFs and
uplink port. You have representors that represent the switch port. Each
representor has counterpart VF. You have to configure the forwarding
between the representor, similar to switch ports. In switch, there is
also no default forwarding.

>up since the kernel is needed a bit. A bridge is supposed to fix that. I tried,
>
>brctl addbr sriovbr
>brctl addif sriovbr <DEV>
>ip link set dev sriovbr up
>ip addr ... sriov ...

I don't think that bridge offload is supported, I may be wrong.

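
To make the representor/VF pairing concrete: in the original post, ieth3r0 reports `portname pf0vf0` and the uplink ieth3 reports `portname p0`, both with the same switchid. The helper below is a small sketch that extracts the VF index from such a port name; the function names are made up for illustration. The second function assumes the same name can be read from /sys/class/net/<dev>/phys_port_name.

```shell
#!/bin/sh
# vf_index_from_portname NAME: print the VF index encoded in a representor
# port name such as "pf0vf3". Returns non-zero for names that do not belong
# to a VF representor (for example the uplink's "p0").
vf_index_from_portname() {
    case "$1" in
        *pf*vf*) idx=${1##*vf} ;;   # keep what follows the last "vf"
        *) return 1 ;;
    esac
    case "$idx" in
        ''|*[!0-9]*) return 1 ;;    # the suffix must be all digits
    esac
    echo "$idx"
}

# rep_vf_index DEV: same, but take the port name of netdev DEV from sysfs.
rep_vf_index() {
    vf_index_from_portname "$(cat "/sys/class/net/$1/phys_port_name" 2>/dev/null)"
}
```

With the names from the post, `rep_vf_index ieth3r0` would be expected to report VF 0, the VF whose netdev is ieth3v0, while `rep_vf_index ieth3` fails because p0 is the uplink.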
* Re: SR-IOV + switchdev + vlan + Mellanox: Cannot ping
From: Shane Miller @ 2024-04-30 21:29 UTC
To: Jiri Pirko; +Cc: netdev

On Mon, Apr 29, 2024 at 7:29 AM Jiri Pirko <jiri@resnulli.us> wrote:
> Nope. Think of it as another switch inside the NIC that connects VFs and
> uplink port. You have representors that represent the switch port. Each
> representor has counterpart VF. You have to configure the forwarding
> between the representor, similar to switch ports. In switch, there is
> also no default forwarding.

The salient phrase is "forward between the representor". You seem to
be saying to forward ARP packets from the uplink port (ieth3, i.e.
the NIC that was virtualized) to a port representer (ieth3r0)? Are those
the correct endpoints?

Second, what UNIX tool do I use to forward? As far as I can tell, the
correct methodology is to first create a bridge:

ip link add name br0 type bridge
ip link set br0 up

Then do something (but what?) with bridge fdb add as described here:

https://www.kernel.org/doc/html/v5.8/networking/switchdev.html#static-fdb-entries

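
For what a `bridge fdb add` call could look like once ports are attached to br0, here is a minimal sketch. It assumes the representor (ieth3r0 in the original post) is already a port of br0, and the MAC address in the usage note is a hypothetical placeholder. The thread does not establish that a static entry is needed at all, since a learning bridge normally fills its FDB by itself. The function names are made up, and commands are printed rather than executed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
BR=br0   # bridge name used in the message above

# Print each command by default; execute it only when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; else echo "$*"; fi
}

# add_static_fdb MAC DEV: pin MAC behind bridge port DEV with a static entry.
add_static_fdb() {
    # Reject anything that is not six colon-separated hex octets.
    printf '%s\n' "$1" | grep -Eq '^([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$' || {
        echo "invalid MAC address: $1" >&2
        return 1
    }
    run bridge fdb add "$1" dev "$2" master static
}

# show_fdb: list the forwarding entries known to the bridge.
show_fdb() {
    run bridge fdb show br "$BR"
}
```

For example, `add_static_fdb 02:00:00:00:00:01 ieth3r0` followed by `show_fdb` would add the entry for the VF's MAC and then list what the bridge knows.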
* Re: SR-IOV + switchdev + vlan + Mellanox: Cannot ping
From: Benjamin Poirier @ 2024-05-01 18:16 UTC
To: Shane Miller; +Cc: Jiri Pirko, netdev

On 2024-04-30 17:29 -0400, Shane Miller wrote:
> The salient phrase is "forward between the representor". You seem to
> be saying to forward ARP packets from the uplink port (ieth3, i.e.
> the NIC that was virtualized) to a port representer (ieth3r0)? Are those
> the correct endpoints?
>
> Second, what UNIX tool do I use to forward? As far as I can tell, the
> correct methodology is to first create a bridge:
>
> ip link add name br0 type bridge
> ip link set br0 up

I recently learned about this too and here is what I noted down:

In switchdev mode, two netdevs are created for each VF:
1) port representor (PR)
    `ethtool -i` shows "driver: mlx5e_rep"
    sysfs device/ is the PF
    `devlink port` shows "flavour pcivf"
2) actual VF
    driver: mlx5_core
    sysfs device/ is unique
    `devlink port` shows "flavour virtual"

In order to be able to pass traffic, the PR must be added into a bridge
with the PF:

ip link add br0 up type bridge
ip link set dev eth2 up master br0 # PF
ip link set dev eth4 up master br0 # PR
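
Applied to the names in the original post, the same bridge could be set up as sketched below, with one addition that is not in the thread: VLAN filtering, so that the uplink carries VLAN 133 tagged while each representor is an untagged access port in that VLAN. Whether that addition is needed, and whether the mlx5 driver on the RHEL 8.6 kernel offloads such a bridge, is not settled here. The function name and the second representor name (ieth3r1) are hypothetical, and commands are printed rather than executed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Print each command by default; execute it only when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; else echo "$*"; fi
}

# bridge_vfs UPLINK VID REP...: put the uplink and the given representors
# into one VLAN-aware bridge. The uplink carries VID tagged; every
# representor becomes an untagged access port in VID.
bridge_vfs() {
    uplink=$1
    vid=$2
    shift 2
    run ip link add br0 up type bridge vlan_filtering 1
    run ip link set dev "$uplink" up master br0                    # PF
    run bridge vlan add dev "$uplink" vid "$vid"
    for rep in "$@"; do
        run ip link set dev "$rep" up master br0                   # PR
        run bridge vlan add dev "$rep" vid "$vid" pvid untagged
    done
}
```

For example, `bridge_vfs ieth3 133 ieth3r0 ieth3r1` prints seven commands. If the VFs use different VLANs, the function would have to be called with a per-representor VLAN id instead of a single one.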