* [PATCH net 0/2] net/mlx4: two virtualization bug fixes @ 2013-01-17 15:30 Or Gerlitz 2013-01-17 15:30 ` [PATCH net 1/2] net/mlx4_en: Fix bridged vSwitch configuration for non SRIOV mode Or Gerlitz ` (2 more replies) 0 siblings, 3 replies; 7+ messages in thread From: Or Gerlitz @ 2013-01-17 15:30 UTC (permalink / raw) To: davem; +Cc: netdev, amirv, jackm, yevgenyp, Or Gerlitz Hi Dave, These are two fixes to driver bugs which were introduced during the integration of the SRIOV patches, would love see them going to -stable too. Or. Or Gerlitz (1): net/mlx4_core: Set number of msix vectors under SRIOV mode to firmware defaults Yan Burman (1): net/mlx4_en: Fix bridged vSwitch configuration for non SRIOV mode drivers/net/ethernet/mellanox/mlx4/en_tx.c | 13 +++++++++---- drivers/net/ethernet/mellanox/mlx4/main.c | 11 ++--------- 2 files changed, 11 insertions(+), 13 deletions(-) ^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH net 1/2] net/mlx4_en: Fix bridged vSwitch configuration for non SRIOV mode 2013-01-17 15:30 [PATCH net 0/2] net/mlx4: two virtualization bug fixes Or Gerlitz @ 2013-01-17 15:30 ` Or Gerlitz 2013-01-17 16:21 ` Eric Dumazet 2013-01-17 15:30 ` [PATCH net 2/2] net/mlx4_core: Set number of msix vectors under SRIOV mode to firmware defaults Or Gerlitz 2013-01-18 19:25 ` [PATCH net 0/2] net/mlx4: two virtualization bug fixes David Miller 2 siblings, 1 reply; 7+ messages in thread From: Or Gerlitz @ 2013-01-17 15:30 UTC (permalink / raw) To: davem; +Cc: netdev, amirv, jackm, yevgenyp, Yan Burman From: Yan Burman <yanb@mellanox.com> Commit 5b4c4d36860e "mlx4_en: Allow communication between functions on same host" introduced a regression under which a bridge acting as vSwitch whose uplink is an mlx4 Ethernet device become non-operative in native (non sriov) mode. This happens since broadcast ARP requests sent by VMs were loopback-ed by the HW and hence the bridge learned VM source MACs on both the VM and the uplink ports. The fix is to place the DMAC in the send WQE only under SRIOV/eSwitch configuration or when the device is in selftest. Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Yan Burman <yanb@mellanox.com> Signed-off-by: Amir Vadai <amirv@mellanox.com> --- drivers/net/ethernet/mellanox/mlx4/en_tx.c | 13 +++++++++---- 1 files changed, 9 insertions(+), 4 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/drivers/net/ethernet/mellanox/mlx4/en_tx.c index 2b799f4..6771b69 100644 --- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c +++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c @@ -630,10 +630,15 @@ netdev_tx_t mlx4_en_xmit(struct sk_buff *skb, struct net_device *dev) ring->tx_csum++; } - /* Copy dst mac address to wqe */ - ethh = (struct ethhdr *)skb->data; - tx_desc->ctrl.srcrb_flags16[0] = get_unaligned((__be16 *)ethh->h_dest); - tx_desc->ctrl.imm = get_unaligned((__be32 *)(ethh->h_dest + 2)); + if (mlx4_is_mfunc(mdev->dev) || priv->validate_loopback) { + /* Copy dst mac address to wqe. This allows loopback in eSwitch, + * so that VFs and PF can communicate with each other + */ + ethh = (struct ethhdr *)skb->data; + tx_desc->ctrl.srcrb_flags16[0] = get_unaligned((__be16 *)ethh->h_dest); + tx_desc->ctrl.imm = get_unaligned((__be32 *)(ethh->h_dest + 2)); + } + /* Handle LSO (TSO) packets */ if (lso_header_size) { /* Mark opcode as LSO */ -- 1.7.1 ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH net 1/2] net/mlx4_en: Fix bridged vSwitch configuration for non SRIOV mode 2013-01-17 15:30 ` [PATCH net 1/2] net/mlx4_en: Fix bridged vSwitch configuration for non SRIOV mode Or Gerlitz @ 2013-01-17 16:21 ` Eric Dumazet 2013-01-17 16:32 ` Or Gerlitz 0 siblings, 1 reply; 7+ messages in thread From: Eric Dumazet @ 2013-01-17 16:21 UTC (permalink / raw) To: Or Gerlitz; +Cc: davem, netdev, amirv, jackm, yevgenyp, Yan Burman On Thu, 2013-01-17 at 17:30 +0200, Or Gerlitz wrote: > From: Yan Burman <yanb@mellanox.com> > > Commit 5b4c4d36860e "mlx4_en: Allow communication between functions on > same host" introduced a regression under which a bridge acting as vSwitch > whose uplink is an mlx4 Ethernet device become non-operative in native > (non sriov) mode. This happens since broadcast ARP requests sent by VMs > were loopback-ed by the HW and hence the bridge learned VM source MACs > on both the VM and the uplink ports. > > The fix is to place the DMAC in the send WQE only under SRIOV/eSwitch > configuration or when the device is in selftest. > > Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> > Signed-off-by: Yan Burman <yanb@mellanox.com> > Signed-off-by: Amir Vadai <amirv@mellanox.com> > --- > drivers/net/ethernet/mellanox/mlx4/en_tx.c | 13 +++++++++---- > 1 files changed, 9 insertions(+), 4 deletions(-) > > diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/drivers/net/ethernet/mellanox/mlx4/en_tx.c > index 2b799f4..6771b69 100644 > --- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c > +++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c > @@ -630,10 +630,15 @@ netdev_tx_t mlx4_en_xmit(struct sk_buff *skb, struct net_device *dev) > ring->tx_csum++; > } > > - /* Copy dst mac address to wqe */ > - ethh = (struct ethhdr *)skb->data; > - tx_desc->ctrl.srcrb_flags16[0] = get_unaligned((__be16 *)ethh->h_dest); > - tx_desc->ctrl.imm = get_unaligned((__be32 *)(ethh->h_dest + 2)); > + if (mlx4_is_mfunc(mdev->dev) || priv->validate_loopback) { > + /* Copy dst mac address to wqe. This allows loopback in eSwitch, > + * so that VFs and PF can communicate with each other > + */ > + ethh = (struct ethhdr *)skb->data; > + tx_desc->ctrl.srcrb_flags16[0] = get_unaligned((__be16 *)ethh->h_dest); > + tx_desc->ctrl.imm = get_unaligned((__be32 *)(ethh->h_dest + 2)); > + } > + > /* Handle LSO (TSO) packets */ > if (lso_header_size) { > /* Mark opcode as LSO */ Isn't this bug could explain why mlx4 had to filter incoming frames ? It seems you also could remove this chunk ? (I don't really understand why mlx4_en_mac_to_u64() is done... its horribly expensive...). matching 6 bytes can be done in 3 x86_64 instructions using the right helper (ether_addr_equal_64bits()) vi +613 drivers/net/ethernet/mellanox/mlx4/en_rx.c ethh = (struct ethhdr *)(page_address(frags[0].page) + frags[0].offset); s_mac = mlx4_en_mac_to_u64(ethh->h_source); /* If source MAC is equal to our own MAC and not performing * the selftest or flb disabled - drop the packet */ if (s_mac == priv->mac && !((dev->features & NETIF_F_LOOPBACK) || priv->validate_loopback)) goto next; ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH net 1/2] net/mlx4_en: Fix bridged vSwitch configuration for non SRIOV mode 2013-01-17 16:21 ` Eric Dumazet @ 2013-01-17 16:32 ` Or Gerlitz 2013-01-17 16:34 ` Or Gerlitz 0 siblings, 1 reply; 7+ messages in thread From: Or Gerlitz @ 2013-01-17 16:32 UTC (permalink / raw) To: Eric Dumazet; +Cc: davem, netdev, amirv, jackm, yevgenyp, Yan Burman On 17/01/2013 18:21, Eric Dumazet wrote: > On Thu, 2013-01-17 at 17:30 +0200, Or Gerlitz wrote: >> From: Yan Burman <yanb@mellanox.com> >> >> Commit 5b4c4d36860e "mlx4_en: Allow communication between functions on >> same host" introduced a regression under which a bridge acting as vSwitch >> whose uplink is an mlx4 Ethernet device become non-operative in native >> (non sriov) mode. This happens since broadcast ARP requests sent by VMs >> were loopback-ed by the HW and hence the bridge learned VM source MACs >> on both the VM and the uplink ports. >> >> The fix is to place the DMAC in the send WQE only under SRIOV/eSwitch >> configuration or when the device is in selftest. >> >> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> >> Signed-off-by: Yan Burman <yanb@mellanox.com> >> Signed-off-by: Amir Vadai <amirv@mellanox.com> >> --- >> drivers/net/ethernet/mellanox/mlx4/en_tx.c | 13 +++++++++---- >> 1 files changed, 9 insertions(+), 4 deletions(-) >> >> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/drivers/net/ethernet/mellanox/mlx4/en_tx.c >> index 2b799f4..6771b69 100644 >> --- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c >> +++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c >> @@ -630,10 +630,15 @@ netdev_tx_t mlx4_en_xmit(struct sk_buff *skb, struct net_device *dev) >> ring->tx_csum++; >> } >> >> - /* Copy dst mac address to wqe */ >> - ethh = (struct ethhdr *)skb->data; >> - tx_desc->ctrl.srcrb_flags16[0] = get_unaligned((__be16 *)ethh->h_dest); >> - tx_desc->ctrl.imm = get_unaligned((__be32 *)(ethh->h_dest + 2)); >> + if (mlx4_is_mfunc(mdev->dev) || priv->validate_loopback) { >> + /* Copy dst mac address to wqe. This allows loopback in eSwitch, >> + * so that VFs and PF can communicate with each other >> + */ >> + ethh = (struct ethhdr *)skb->data; >> + tx_desc->ctrl.srcrb_flags16[0] = get_unaligned((__be16 *)ethh->h_dest); >> + tx_desc->ctrl.imm = get_unaligned((__be32 *)(ethh->h_dest + 2)); >> + } >> + >> /* Handle LSO (TSO) packets */ >> if (lso_header_size) { >> /* Mark opcode as LSO */ > Isn't this bug could explain why mlx4 had to filter incoming frames ? yes and no... when in multi-function -- eSwitch mode we DO need to always put the DMAC in the TX WQE since this is a hint for the HW to optionally conduct loopback (e.g if this DMAC is registered with the HW for another VF). For net / 3.8 we wanted a pure bug fix 1st and most, we have more patches that reduce checks related to loopback but they can have a better fit to net-next. Also, there's another patch set coming from Yan that enables bridge and macvlan over VFs, this is new functionality, adding support to ndo_fdb entries and IFF_UNICAST_FLT, will be submitted to net-next soon (next week I believe) too. All in all, the check on the RX mode was intentionally not removed by this patch and will be optimized and enhanced in more patches which are coming for net-next. Hope this makes things clearer. Or. > > It seems you also could remove this chunk ? > > (I don't really understand why mlx4_en_mac_to_u64() is done... its > horribly expensive...). > matching 6 bytes can be done in 3 x86_64 instructions using the right > helper (ether_addr_equal_64bits()) > > vi +613 drivers/net/ethernet/mellanox/mlx4/en_rx.c > > ethh = (struct ethhdr *)(page_address(frags[0].page) + > frags[0].offset); > s_mac = mlx4_en_mac_to_u64(ethh->h_source); > > /* If source MAC is equal to our own MAC and not performing > * the selftest or flb disabled - drop the packet */ > if (s_mac == priv->mac && > !((dev->features & NETIF_F_LOOPBACK) || > priv->validate_loopback)) > goto next; > > ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH net 1/2] net/mlx4_en: Fix bridged vSwitch configuration for non SRIOV mode 2013-01-17 16:32 ` Or Gerlitz @ 2013-01-17 16:34 ` Or Gerlitz 0 siblings, 0 replies; 7+ messages in thread From: Or Gerlitz @ 2013-01-17 16:34 UTC (permalink / raw) To: Eric Dumazet; +Cc: davem, netdev, amirv, jackm, yevgenyp, Yan Burman On 17/01/2013 18:32, Or Gerlitz wrote: > (I don't really understand why mlx4_en_mac_to_u64() is done... its > horribly expensive...). matching 6 bytes can be done in 3 x86_64 > instructions using the right helper (ether_addr_equal_64bits()) thanks for the heads up, we will fix. Or. ^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH net 2/2] net/mlx4_core: Set number of msix vectors under SRIOV mode to firmware defaults 2013-01-17 15:30 [PATCH net 0/2] net/mlx4: two virtualization bug fixes Or Gerlitz 2013-01-17 15:30 ` [PATCH net 1/2] net/mlx4_en: Fix bridged vSwitch configuration for non SRIOV mode Or Gerlitz @ 2013-01-17 15:30 ` Or Gerlitz 2013-01-18 19:25 ` [PATCH net 0/2] net/mlx4: two virtualization bug fixes David Miller 2 siblings, 0 replies; 7+ messages in thread From: Or Gerlitz @ 2013-01-17 15:30 UTC (permalink / raw) To: davem; +Cc: netdev, amirv, jackm, yevgenyp, Or Gerlitz The lines if (mlx4_is_mfunc(dev)) { nreq = 2; } else { which hard code the number of requested msi-x vectors under multi-function mode to two can be removed completely, since the firmware sets num_eqs and reserved_eqs appropriately Thus, the code line: nreq = min_t(int, dev->caps.num_eqs - dev->caps.reserved_eqs, nreq); is by itself sufficient and correct for all cases. Currently, for mfunc mode num_eqs = 32 and reserved_eqs = 28, hence four vectors will be enabled. This triples (one vector is used for the async events and commands EQ) the horse power provided for processing of incoming packets on netdev RSS scheme, IO initiators/targets commands processing flows, etc. Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Amir Vadai <amirv@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> --- drivers/net/ethernet/mellanox/mlx4/main.c | 11 ++--------- 1 files changed, 2 insertions(+), 9 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c index e1bafff..a6542d7 100644 --- a/drivers/net/ethernet/mellanox/mlx4/main.c +++ b/drivers/net/ethernet/mellanox/mlx4/main.c @@ -1790,15 +1790,8 @@ static void mlx4_enable_msi_x(struct mlx4_dev *dev) int i; if (msi_x) { - /* In multifunction mode each function gets 2 msi-X vectors - * one for data path completions anf the other for asynch events - * or command completions */ - if (mlx4_is_mfunc(dev)) { - nreq = 2; - } else { - nreq = min_t(int, dev->caps.num_eqs - - dev->caps.reserved_eqs, nreq); - } + nreq = min_t(int, dev->caps.num_eqs - dev->caps.reserved_eqs, + nreq); entries = kcalloc(nreq, sizeof *entries, GFP_KERNEL); if (!entries) -- 1.7.1 ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH net 0/2] net/mlx4: two virtualization bug fixes 2013-01-17 15:30 [PATCH net 0/2] net/mlx4: two virtualization bug fixes Or Gerlitz 2013-01-17 15:30 ` [PATCH net 1/2] net/mlx4_en: Fix bridged vSwitch configuration for non SRIOV mode Or Gerlitz 2013-01-17 15:30 ` [PATCH net 2/2] net/mlx4_core: Set number of msix vectors under SRIOV mode to firmware defaults Or Gerlitz @ 2013-01-18 19:25 ` David Miller 2 siblings, 0 replies; 7+ messages in thread From: David Miller @ 2013-01-18 19:25 UTC (permalink / raw) To: ogerlitz; +Cc: netdev, amirv, jackm, yevgenyp From: Or Gerlitz <ogerlitz@mellanox.com> Date: Thu, 17 Jan 2013 17:30:41 +0200 > These are two fixes to driver bugs which were introduced during the integration > of the SRIOV patches, would love see them going to -stable too. Applied and queued up for -stable. ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2013-01-18 19:25 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2013-01-17 15:30 [PATCH net 0/2] net/mlx4: two virtualization bug fixes Or Gerlitz 2013-01-17 15:30 ` [PATCH net 1/2] net/mlx4_en: Fix bridged vSwitch configuration for non SRIOV mode Or Gerlitz 2013-01-17 16:21 ` Eric Dumazet 2013-01-17 16:32 ` Or Gerlitz 2013-01-17 16:34 ` Or Gerlitz 2013-01-17 15:30 ` [PATCH net 2/2] net/mlx4_core: Set number of msix vectors under SRIOV mode to firmware defaults Or Gerlitz 2013-01-18 19:25 ` [PATCH net 0/2] net/mlx4: two virtualization bug fixes David Miller
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).