From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Thu, 2 Jul 2026 18:44:28 -0400
From: "Michael S. Tsirkin"
To: Brett Sheffield
Cc: Simon Schippers, regressions@lists.linux.dev, netdev@vger.kernel.org,
 Jakub Kicinski, Tim Gebauer, Willem de Bruijn, Jason Wang, Andrew Lunn,
 "David S.
 Miller", Eric Dumazet, Paolo Abeni, linux-kernel@vger.kernel.org
Subject: Re: [REGRESSION][BISECTED] tun/tap & vhost-net: multi-threaded network performance
Message-ID: <20260702183435-mutt-send-email-mst@kernel.org>
References: <20260701165359-mutt-send-email-mst@kernel.org>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On Thu, Jul 02, 2026 at 01:07:47PM +0200, Brett Sheffield wrote:
> On 2026-07-02 09:24, Simon Schippers wrote:
> > On 7/1/26 22:56, Michael S. Tsirkin wrote:
> > > On Wed, Jul 01, 2026 at 09:16:48PM +0200, Brett Sheffield wrote:
> > >> TL;DR - Commit 1d6e569b7d0c0b2736636749e4be0a27f3cefcb3 causes
> > >> significant performance regressions with TAP interfaces and multithreaded
> > >> network code. Please revert.
> > >>
> > >>
> > >> Librecast is an IPv6 multicast library. One of the tests (0055) fails under
> > >> Linux 7.2-rc1. The test performs data synchronization over IPv6 multicast using a TAP
> > >> interface. This test has run successfully on every stable, LTS and mainline RC
> > >> released in the past year. Every kernel with my Tested-by has run this test.
> > >>
> > >> There have been a bunch of changes to MLDv2 so I started bisecting there, but
> > >> the culprit is actually 1d6e569b7d0c0b2736636749e4be0a27f3cefcb3 "tun/tap &
> > >> vhost-net: avoid ptr_ring tail-drop when a qdisc is present"
> > >>
> > >> Reverting this commit fixes the test.
> > >>
> > >> To eliminate my code and any multicast weirdness, I ran tests with iperf3
> > >> comparing the same host running 7.2-rc1 both with and without 1d6e569b7d0
> > >> reverted.
> >
> > Thank you very much for your bisect!
> >
> > As the author, I am sorry for that regression!
>
> No worries. That's why we test :-)
>
> > > - does it help to increase the tun queue size?
> >
> > I agree, this would be great to know.
> >
> > However, even then we must act. I am considering IFF_BACKPRESSURE
> > as a feature flag, defaulting to off. It would just enable/disable
> > the stopping logic in tun_net_xmit() and the waking logic
> > in __tun_wake_queue(). If disabled, it would result in the same logic
> > as before.
> >
> > I could provide such a patch as [net] material.
>
> I'm going to make myself a strong cup of tea and dig into it a bit more here and
> will let you know if I find anything worth reporting.
>
> If you need me to try re-testing with specific settings or test a patch I'm
> happy to do so.
>
> Cheers,
>
>
> Brett
> -- 
> Brett Sheffield (he/him)
> Librecast - Decentralising the Internet with Multicast
> https://librecast.net/
> https://blog.brettsheffield.com/

Well, the issue was with host to guest, right?
Then testing what BQL does might be interesting. It might help.

Something like this? Lightly tested.

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index bfa49fa9e3a1..abc46354c107 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1076,6 +1076,7 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
 	queue = netdev_get_tx_queue(dev, txq);
 
 	spin_lock(&tfile->tx_ring.producer_lock);
+	netdev_tx_sent_queue(queue, len);
 	ret = __ptr_ring_produce(&tfile->tx_ring, skb);
 	if (!qdisc_txq_has_no_queue(queue) &&
 	    __ptr_ring_check_produce(&tfile->tx_ring) == -ENOSPC) {
@@ -1088,6 +1089,7 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
 	spin_unlock(&tfile->tx_ring.producer_lock);
 
 	if (ret) {
+		netdev_tx_completed_queue(queue, 1, len);
 		/* This should be a rare case if a qdisc is present, but
 		 * can happen due to lltx.
 		 * Since skb_tx_timestamp(), skb_orphan(),
@@ -2148,15 +2150,19 @@ static ssize_t tun_put_user(struct tun_struct *tun,
 
 /* Callers must hold ring.consumer_lock */
 static void __tun_wake_queue(struct tun_struct *tun,
-			     struct tun_file *tfile, int consumed)
+			     struct tun_file *tfile,
+			     unsigned int pkts, unsigned int bytes)
 {
 	struct netdev_queue *txq = netdev_get_tx_queue(tun->dev,
 						       tfile->queue_index);
 
+	if (bytes)
+		netdev_tx_completed_queue(txq, pkts, bytes);
+
 	/* Paired with smp_mb__after_atomic() in tun_net_xmit() */
 	smp_mb();
 	if (netif_tx_queue_stopped(txq)) {
-		tfile->cons_cnt += consumed;
+		tfile->cons_cnt += pkts;
 		if (tfile->cons_cnt >= tfile->tx_ring.size / 2 ||
 		    __ptr_ring_empty(&tfile->tx_ring)) {
 			netif_tx_wake_queue(txq);
@@ -2167,12 +2173,16 @@ static void __tun_wake_queue(struct tun_struct *tun,
 
 static void *tun_ring_consume(struct tun_struct *tun, struct tun_file *tfile)
 {
+	unsigned int bytes = 0;
 	void *ptr;
 
 	spin_lock(&tfile->tx_ring.consumer_lock);
 	ptr = __ptr_ring_consume(&tfile->tx_ring);
-	if (ptr)
-		__tun_wake_queue(tun, tfile, 1);
+	if (ptr) {
+		if (!tun_is_xdp_frame(ptr))
+			bytes = ((struct sk_buff *)ptr)->len;
+		__tun_wake_queue(tun, tfile, 1, bytes);
+	}
 	spin_unlock(&tfile->tx_ring.consumer_lock);
 
 	return ptr;
@@ -3805,7 +3815,7 @@ struct ptr_ring *tun_get_tx_ring(struct file *file)
 EXPORT_SYMBOL_GPL(tun_get_tx_ring);
 
 /* Callers must hold ring.consumer_lock */
-void tun_wake_queue(struct file *file, int consumed)
+void tun_wake_queue(struct file *file, unsigned int pkts, unsigned int bytes)
 {
 	struct tun_file *tfile;
 	struct tun_struct *tun;
@@ -3821,7 +3831,7 @@ void tun_wake_queue(struct file *file, int consumed)
 
 	tun = rcu_dereference(tfile->tun);
 	if (tun)
-		__tun_wake_queue(tun, tfile, consumed);
+		__tun_wake_queue(tun, tfile, pkts, bytes);
 
 	rcu_read_unlock();
 }
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index db341c922673..5267b323bd59 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -181,14 +181,23 @@ static int vhost_net_buf_produce(struct sock *sk,
 {
 	struct file *file = sk->sk_socket->file;
 	struct vhost_net_buf *rxq = &nvq->rxq;
+	unsigned int bytes = 0;
+	int i;
 
 	rxq->head = 0;
 
 	spin_lock(&nvq->rx_ring->consumer_lock);
 	rxq->tail = __ptr_ring_consume_batched(nvq->rx_ring, rxq->queue,
 					       VHOST_NET_BATCH);
-	if (rxq->tail)
-		tun_wake_queue(file, rxq->tail);
+	if (rxq->tail) {
+		for (i = 0; i < rxq->tail; i++) {
+			void *ptr = rxq->queue[i];
+
+			if (!tun_is_xdp_frame(ptr))
+				bytes += ((struct sk_buff *)ptr)->len;
+		}
+		tun_wake_queue(file, rxq->tail, bytes);
+	}
 	spin_unlock(&nvq->rx_ring->consumer_lock);
 
 	return rxq->tail;
diff --git a/include/linux/if_tun.h b/include/linux/if_tun.h
index 5f3e206c7a73..49b85bf4f828 100644
--- a/include/linux/if_tun.h
+++ b/include/linux/if_tun.h
@@ -22,7 +22,7 @@ struct tun_msg_ctl {
 #if defined(CONFIG_TUN) || defined(CONFIG_TUN_MODULE)
 struct socket *tun_get_socket(struct file *);
 struct ptr_ring *tun_get_tx_ring(struct file *file);
-void tun_wake_queue(struct file *file, int consumed);
+void tun_wake_queue(struct file *file, unsigned int pkts, unsigned int bytes);
 
 static inline bool tun_is_xdp_frame(void *ptr)
 {
@@ -56,7 +56,8 @@ static inline struct ptr_ring *tun_get_tx_ring(struct file *f)
 	return ERR_PTR(-EINVAL);
 }
 
-static inline void tun_wake_queue(struct file *f, int consumed) {}
+static inline void tun_wake_queue(struct file *f,
+				  unsigned int pkts, unsigned int bytes) {}
 
 static inline bool tun_is_xdp_frame(void *ptr)
 {
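
To make the accounting in the patch above concrete, here is a toy userspace model of what the two BQL hooks do. It is only a sketch under stated assumptions: struct toy_bql, toy_sent() and toy_completed() are made-up names, the limit is fixed (the kernel's dynamic queue limits code adapts it at run time), and there is no locking.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one tx queue; this is not the kernel's struct dql. */
struct toy_bql {
	unsigned int inflight;	/* bytes reported as sent but not yet completed */
	unsigned int limit;	/* fixed in this sketch; adaptive in the kernel */
	bool stopped;		/* models the stack having stopped the queue */
};

/* Plays the role of netdev_tx_sent_queue(): account bytes handed to the ring. */
static void toy_sent(struct toy_bql *q, unsigned int bytes)
{
	q->inflight += bytes;
	if (q->inflight > q->limit)
		q->stopped = true;
}

/* Plays the role of netdev_tx_completed_queue(): account bytes the consumer took. */
static void toy_completed(struct toy_bql *q, unsigned int bytes)
{
	assert(bytes <= q->inflight);	/* completing more than was sent is a bug */
	q->inflight -= bytes;
	if (q->stopped && q->inflight <= q->limit)
		q->stopped = false;
}
```

With a limit of 3000 bytes, the third 1500-byte packet stops the queue, and completing one packet wakes it again. In the patch, the byte count reported on the consume side has to match what was reported in tun_net_xmit(), which is why the consumers sum skb->len before calling the wake helper.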