netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jay Vosburgh <jay.vosburgh@canonical.com>
To: Jonathan Toppins <jtoppins@redhat.com>
Cc: "netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	Veaceslav Falico <vfalico@gmail.com>,
	Andy Gospodarek <andy@greyhouse.net>,
	Hangbin Liu <liuhangbin@gmail.com>
Subject: Re: Any reason why arp monitor keeps emitting netlink failover events?
Date: Tue, 14 Jun 2022 17:26:03 -0700	[thread overview]
Message-ID: <8133.1655252763@famine> (raw)
In-Reply-To: <0ea8519c-4289-c409-2e31-44574cdefde3@redhat.com>

Jonathan Toppins <jtoppins@redhat.com> wrote:

>On 6/14/22 11:29, Jay Vosburgh wrote:
>> Jonathan Toppins <jtoppins@redhat.com> wrote:
>> 
>>> On net-next/master from today, I see netlink failover events being emitted
>>>from an active-backup bond. In the ip monitor dump you can see the bond is
>>> up (according to the link status) but keeps emitting failover events and I
>>> am not sure why. This appears to be an issue also on Fedora 35 and CentOS
>>> 8 kernels. The configuration appears to be correct, though I could be
>>> missing something. Thoughts?
>> 	Anything showing up in the dmesg?  There's only one place that
>> generates the FAILOVER notifier, and it ought to have a corresponding
>> message in the kernel log.
>> 	Also, I note that the link1_1 veth has a lot of failures:
>
>Yes all those failures are created by the setup, I waited about 5 minutes
>before dumping the link info. The failover occurs about every second. Note
>this is just a representation of a physical network so that others can run
>the setup. The script `bond-bz2094911.sh`, easily reproduces the issue
>which I dumped with cat below in the original email.
>
>Here is the kernel log, I have dynamic debug enabled for the entire
>bonding module:

	I set up the test, and added some additional instrumentation to
bond_ab_arp_inspect, and what seems to be going on is that the
dev_trans_start for link1_1 is never updating.  The "down to up"
transition for the ARP monitor only checks last_rx, but the "up to down"
check for the active interface requires both TX and RX recently
("recently" being within the past missed_max * arp_interval).

	This looks to be due to HARD_TX_LOCK not actually locking for
NETIF_F_LLTX devices:

#define HARD_TX_LOCK(dev, txq, cpu) {                           if ((dev->features & NETIF_F_LLTX) == 0) {                      __netif_tx_lock(txq, cpu);                      } else {                                                        __netif_tx_acquire(txq);                        }                                               }

	in combination with

static inline void txq_trans_update(struct netdev_queue *txq)
{
        if (txq->xmit_lock_owner != -1)
                WRITE_ONCE(txq->trans_start, jiffies);
}

	causes the trans_start update to be skipped on veth devices.

	And, sure enough, if I apply the following, the test case
appears to work:

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 466da01ba2e3..2cb833b3006a 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -312,6 +312,7 @@ static bool veth_skb_is_eligible_for_gro(const struct net_device *dev,
 static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev)
 {
 	struct veth_priv *rcv_priv, *priv = netdev_priv(dev);
+	struct netdev_queue *queue = NULL;
 	struct veth_rq *rq = NULL;
 	struct net_device *rcv;
 	int length = skb->len;
@@ -329,6 +330,7 @@ static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev)
 	rxq = skb_get_queue_mapping(skb);
 	if (rxq < rcv->real_num_rx_queues) {
 		rq = &rcv_priv->rq[rxq];
+		queue = netdev_get_tx_queue(dev, rxq);
 
 		/* The napi pointer is available when an XDP program is
 		 * attached or when GRO is enabled
@@ -340,6 +342,8 @@ static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev)
 
 	skb_tx_timestamp(skb);
 	if (likely(veth_forward_skb(rcv, skb, rq, use_napi) == NET_RX_SUCCESS)) {
+		if (queue)
+			txq_trans_cond_update(queue);
 		if (!use_napi)
 			dev_lstats_add(dev, length);
 	} else {


	I'm not entirely sure this is the best way to get the
trans_start updated in veth, but LLTX devices need to handle it
internally (and others do, e.g., tun).

	Could you test the above and see if it resolves the problem in
your environment as well?

	-J

---
	-Jay Vosburgh, jay.vosburgh@canonical.com

  reply	other threads:[~2022-06-15  0:26 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-06-14  2:59 Any reason why arp monitor keeps emitting netlink failover events? Jonathan Toppins
2022-06-14 15:29 ` Jay Vosburgh
2022-06-14 17:07   ` Jonathan Toppins
2022-06-15  0:26     ` Jay Vosburgh [this message]
2022-06-15 15:51       ` Jonathan Toppins
2022-06-16 18:52         ` Jay Vosburgh

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8133.1655252763@famine \
    --to=jay.vosburgh@canonical.com \
    --cc=andy@greyhouse.net \
    --cc=jtoppins@redhat.com \
    --cc=liuhangbin@gmail.com \
    --cc=netdev@vger.kernel.org \
    --cc=vfalico@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).