[DISCUSS] Bond arp monitor not works with veth due to flag NETIF_F

Netdev List
 help / color / mirror / Atom feed

* [DISCUSS] Bond arp monitor not works with veth due to flag NETIF_F_LLTX.
@ 2021-11-18  3:11 Hangbin Liu
  2021-11-18  3:34 ` Eric Dumazet
  0 siblings, 1 reply; 3+ messages in thread
From: Hangbin Liu @ 2021-11-18  3:11 UTC (permalink / raw)
  To: netdev
  Cc: Jay Vosburgh, Veaceslav Falico, Andy Gospodarek, Jarod Wilson,
	Jakub Kicinski, Jiri Pirko, davem, Denis Kirjanov, David Ahern,
	Eric Dumazet

Hi,

When I test bond arp monitor with veth interface, the bond link flaps rapidly.
After checking, in bond_ab_arp_inspect():

                trans_start = dev_trans_start(slave->dev);
                if (bond_is_active_slave(slave) &&
                    (!bond_time_in_interval(bond, trans_start, bond->params.missed_max) ||
                     !bond_time_in_interval(bond, last_rx, bond->params.missed_max))) {
                        bond_propose_link_state(slave, BOND_LINK_DOWN);
                        commit++;
                }

it checks both slave's trans_start and last_rx. While veth interface's
trans_start never get updated due to flag "NETIF_F_LLTX". As when NETIF_F_LLTX
set, in netdev_start_xmit() -> txq_trans_update() the txq->trans_start
never get updated because txq->xmit_lock_owner is always -1.

If we remove the flag NETIF_F_LLTX, the HARD_TX_LOCK() will acquire the
spin_lock and update txq->xmit_lock_owner. I expected there may have some
performance drop. But I tested with xdp_redirect_map and pktgen by forwarding
a 10G NIC's traffic to veth interface and didn't see much performance drop. e.g.
With xdpgeneric mode, with the flag, it's 2.18M pps, after removing the flag,
it's 2.11M pps. Not sure if I missed anything.

So what do you think? Should we remove this flag on veth to fix the issue?
Some user may want to use bonding active-backup arp monitor mode on netns.

Thanks
Hangbin

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [DISCUSS] Bond arp monitor not works with veth due to flag NETIF_F_LLTX.
  2021-11-18  3:11 [DISCUSS] Bond arp monitor not works with veth due to flag NETIF_F_LLTX Hangbin Liu
@ 2021-11-18  3:34 ` Eric Dumazet
  2021-11-18 10:17   ` Hangbin Liu
  0 siblings, 1 reply; 3+ messages in thread
From: Eric Dumazet @ 2021-11-18  3:34 UTC (permalink / raw)
  To: Hangbin Liu
  Cc: netdev, Jay Vosburgh, Veaceslav Falico, Andy Gospodarek,
	Jarod Wilson, Jakub Kicinski, Jiri Pirko, davem, Denis Kirjanov,
	David Ahern

On Wed, Nov 17, 2021 at 7:11 PM Hangbin Liu <liuhangbin@gmail.com> wrote:
>
> Hi,
>
> When I test bond arp monitor with veth interface, the bond link flaps rapidly.
> After checking, in bond_ab_arp_inspect():
>
>                 trans_start = dev_trans_start(slave->dev);
>                 if (bond_is_active_slave(slave) &&
>                     (!bond_time_in_interval(bond, trans_start, bond->params.missed_max) ||
>                      !bond_time_in_interval(bond, last_rx, bond->params.missed_max))) {
>                         bond_propose_link_state(slave, BOND_LINK_DOWN);
>                         commit++;
>                 }
>
> it checks both slave's trans_start and last_rx. While veth interface's
> trans_start never get updated due to flag "NETIF_F_LLTX". As when NETIF_F_LLTX
> set, in netdev_start_xmit() -> txq_trans_update() the txq->trans_start
> never get updated because txq->xmit_lock_owner is always -1.
>
> If we remove the flag NETIF_F_LLTX, the HARD_TX_LOCK() will acquire the
> spin_lock and update txq->xmit_lock_owner. I expected there may have some
> performance drop. But I tested with xdp_redirect_map and pktgen by forwarding
> a 10G NIC's traffic to veth interface and didn't see much performance drop. e.g.
> With xdpgeneric mode, with the flag, it's 2.18M pps, after removing the flag,
> it's 2.11M pps. Not sure if I missed anything.

A non contended lock is not that expensive.

Have you tried using many threads, instead of mono-thread pktgen ?
This will likely be bad.

>
> So what do you think? Should we remove this flag on veth to fix the issue?
> Some user may want to use bonding active-backup arp monitor mode on netns.

Removing LLTX will have performance impact.

Updating ->trans_start at most once per jiffy should be ok.

Here is a patch against net-next

diff --git a/drivers/net/ethernet/ti/am65-cpsw-nuss.c
b/drivers/net/ethernet/ti/am65-cpsw-nuss.c
index 750cea23e9cd02bba139a58553c4b1753956ad10..f706efecf0555974e8df562084d5770cc62126e1
100644
--- a/drivers/net/ethernet/ti/am65-cpsw-nuss.c
+++ b/drivers/net/ethernet/ti/am65-cpsw-nuss.c
@@ -356,7 +356,7 @@ static void
am65_cpsw_nuss_ndo_host_tx_timeout(struct net_device *ndev,

        if (netif_tx_queue_stopped(netif_txq)) {
                /* try recover if stopped by us */
-               txq_trans_update(netif_txq);
+               txq_trans_cond_update(netif_txq);
                netif_tx_wake_queue(netif_txq);
        }
 }
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 4f4a299e92de7ba9f61507ad4df7e334775c07a6..dfd2f38023c75fb6a4181af4b4a511a9955e6a0b
100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -4098,12 +4098,6 @@ static inline void __netif_tx_unlock_bh(struct
netdev_queue *txq)
 /*
  * txq->trans_start can be read locklessly from dev_watchdog()
  */
-static inline void txq_trans_update(struct netdev_queue *txq)
-{
-       if (txq->xmit_lock_owner != -1)
-               WRITE_ONCE(txq->trans_start, jiffies);
-}
-
 static inline void txq_trans_cond_update(struct netdev_queue *txq)
 {
        unsigned long now = jiffies;
@@ -4626,7 +4620,7 @@ static inline netdev_tx_t
netdev_start_xmit(struct sk_buff *skb, struct net_devi

        rc = __netdev_start_xmit(ops, skb, dev, more);
        if (rc == NETDEV_TX_OK)
-               txq_trans_update(txq);
+               txq_trans_cond_update(txq);

        return rc;
 }

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [DISCUSS] Bond arp monitor not works with veth due to flag NETIF_F_LLTX.
  2021-11-18  3:34 ` Eric Dumazet
@ 2021-11-18 10:17   ` Hangbin Liu
  0 siblings, 0 replies; 3+ messages in thread
From: Hangbin Liu @ 2021-11-18 10:17 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: netdev, Jay Vosburgh, Veaceslav Falico, Andy Gospodarek,
	Jarod Wilson, Jakub Kicinski, Jiri Pirko, davem, Denis Kirjanov,
	David Ahern

On Wed, Nov 17, 2021 at 07:34:40PM -0800, Eric Dumazet wrote:
> Removing LLTX will have performance impact.

Yes, I also think so.
> 
> Updating ->trans_start at most once per jiffy should be ok.
> 
> Here is a patch against net-next
> @@ -4626,7 +4620,7 @@ static inline netdev_tx_t
> netdev_start_xmit(struct sk_buff *skb, struct net_devi
> 
>         rc = __netdev_start_xmit(ops, skb, dev, more);
>         if (rc == NETDEV_TX_OK)
> -               txq_trans_update(txq);
> +               txq_trans_cond_update(txq);
> 
>         return rc;
>  }

Awesome. This should resolve the veth's trans_start issue. Where is the patch?
I only find out the following one but could not find the fix in netdev_start_xmit()

https://patchwork.kernel.org/project/netdevbpf/patch/20211117032924.1740327-3-eric.dumazet@gmail.com/

Thanks
Hangbin

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2021-11-18 10:18 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2021-11-18  3:11 [DISCUSS] Bond arp monitor not works with veth due to flag NETIF_F_LLTX Hangbin Liu
2021-11-18  3:34 ` Eric Dumazet
2021-11-18 10:17   ` Hangbin Liu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox