From: Simon Schippers
Date: Fri, 1 May 2026 22:35:54 +0200
Subject: Re: [PATCH net-next 5/5] selftests: net: add veth BQL stress test
To: Jonas Köppeler, Jesper Dangaard Brouer
Cc: andrew+netdev@lunn.ch, davem@davemloft.net, edumazet@google.com,
 horms@kernel.org, jhs@mojatatu.com, jiri@resnulli.us,
 kernel-team@cloudflare.com, kuba@kernel.org, netdev@vger.kernel.org,
 pabeni@redhat.com

On 5/1/26 10:43, Jonas Köppeler wrote:
> On 4/30/26 2:31 PM, Jesper Dangaard Brouer wrote:
>>
>> On 30/04/2026 11.45, Simon Schippers wrote:
>>> On 4/30/26 11:17, Jonas Köppeler wrote:
>>>> On 3/28/26 4:19 PM, Simon Schippers wrote:
>>>>> Hi, thanks for your work! I am really interested in this patchset.
>>>>>
>>>>> I am planning to submit a similar patch set (see [1]) for the
>>>>> tun/tap driver, where I am currently implementing qdisc
>>>>> backpressure similar to that used in veth.
>>>>>
>>>>> Can you run pktgen [2] to see if there is a regression?
>>>>> I think there might be a slowdown due to BQL not choosing a big
>>>>> enough queue size.
>>>>
>>>> I ran some tests using pktgen by replacing the trafficgen from the
>>>> selftest with samples/pktgen/pktgen_sample01_simple.sh (patch v3)
>>>> and used --nrules 0. In general the throughput is quite similar:
>>>>
>>>> BQL disabled (using --bql-disable):
>>>> 2378694pps 1141Mb/sec (1141773120bps) errors: 0
>>>> 2400898pps 1152Mb/sec (1152431040bps) errors: 0
>>>> 2358125pps 1131Mb/sec (1131900000bps) errors: 0
>>>> 2402034pps 1152Mb/sec (1152976320bps) errors: 0
>>>> 2362061pps 1133Mb/sec (1133789280bps) errors: 0
>>>> 2416301pps 1159Mb/sec (1159824480bps) errors: 0
>>>> 2398496pps 1151Mb/sec (1151278080bps) errors: 0
>>>> 2415200pps 1159Mb/sec (1159296000bps) errors: 0
>>>> 2375921pps 1140Mb/sec (1140442080bps) errors: 0
>>>> 2427419pps 1165Mb/sec (1165161120bps) errors: 0
>>>> 2382461pps 1143Mb/sec (1143581280bps) errors: 0
>>>>
>>>> mean: 2392510pps
>>>>
>>>> BQL enabled:
>>>> 2159545pps 1036Mb/sec (1036581600bps) errors: 0
>>>> 2321899pps 1114Mb/sec (1114511520bps) errors: 0
>>>> 2477853pps 1189Mb/sec (1189369440bps) errors: 0
>>>> 2447857pps 1174Mb/sec (1174971360bps) errors: 0
>>>> 2400284pps 1152Mb/sec (1152136320bps) errors: 0
>>>> 2442841pps 1172Mb/sec (1172563680bps) errors: 0
>>>> 2442540pps 1172Mb/sec (1172419200bps) errors: 0
>>>> 2410585pps 1157Mb/sec (1157080800bps) errors: 0
>>>> 2395902pps 1150Mb/sec (1150032960bps) errors: 0
>>>> 2393260pps 1148Mb/sec (1148764800bps) errors: 0
>>>> 2401959pps 1152Mb/sec (1152940320bps) errors: 0
>>>>
>>>> mean: 2390411pps
>>>>
>>>> BQL enabled is ~2099pps (~0.09%) lower than BQL disabled.
>>>
>>> Sounds great!
>>>
>>> One more thing:
>>> Could you check what BQL limit it settles on during the test run,
>>> using something like:
>>>
>>> watch -n 0.1 'cat /sys/class/net/XXXXX/queues/tx-0/byte_queue_limits/limit'
>>
>> FYI: The selftest already tracks BQL "limit" and "inflight".
>> - Jonas can just report those BQL inflight logs
>>
>> +print_periodic_stats() {
>> +    local elapsed="$1"
>> +
>> +    # BQL stats and watchdog counter
>> +    WD_CNT=$(cat /sys/class/net/${VETH_A}/queues/tx-0/tx_timeout \
>> +        2>/dev/null) || WD_CNT="?"
>> +    if [ -n "$BQL_DIR" ] && [ -d "$BQL_DIR" ]; then
>> +        INFLIGHT=$(cat "$BQL_DIR/inflight" 2>/dev/null || echo "?")
>> +        LIMIT=$(cat "$BQL_DIR/limit" 2>/dev/null || echo "?")
>> +        echo "  [${elapsed}s] BQL inflight=${INFLIGHT} limit=${LIMIT}" \
>> +            "watchdog=${WD_CNT}"
>> +    else
>> +        echo "  [${elapsed}s] watchdog=${WD_CNT} (no BQL sysfs)"
>> +    fi
>>
> Hi, so what I found is that pktgen does not respect
> __QUEUE_STATE_STACK_OFF. So the test data above is invalid, since
> pktgen kept sending packets even though BQL had "stopped" the queue.
> So I patched pktgen with the following:
>
> -    if (unlikely(netif_xmit_frozen_or_drv_stopped(txq))) {
> +    if (unlikely(netif_xmit_frozen_or_stopped(txq))) {
>
> Test run with --nrules 0
>
> BQL disabled (using --bql-disable):
> inflight is always around 200 packets, and the throughput is
> 2264138pps 1086Mb/sec (1086786240bps)
>
> BQL enabled:
> inflight is always 3 packets (with the exception that it is sometimes
> even 0), and the throughput is degraded:
> 1813455pps 870Mb/sec (870458400bps)
> limit is 2.
>
> BQL enabled is roughly 20% worse in throughput.

Good findings.

> Test run with --nrules 3500
>
> BQL disabled: inflight ~200, throughput: 27161pps 13Mb/sec
> BQL enabled: inflight 3 (limit 2), throughput: 26085pps 12Mb/sec
> BQL ~4% worse.
>
> Test run with --nrules 5000
>
> BQL disabled: inflight ~200, throughput: 19395pps 9Mb/sec
> BQL enabled: inflight 3 (limit 2), throughput: 20423pps 9Mb/sec
> BQL ~5.3% better.
>
> So it seems that BQL will always steer to a limit of 2. Could this be
> a result of calling netdev_tx_completed_queue() for every packet?
>
> Looking at the comment above netdev_tx_completed_queue() in
> include/linux/netdevice.h:
>
> "Must be called at most once per TX completion round (and not per
> individual packet), so that BQL can adjust its limits appropriately."
>
> This is consistent with what Tom Herbert stated in the original BQL
> cover letter [1]:
>
> "BQL accounting is in the transmit path for every packet, and the
> function to recompute the byte limit is run once per transmit
> completion."

Yes, exactly that is likely the problem here. BQL expects a periodic
transmit completion, but there is none. Adding something like a tasklet
that calls netdev_tx_completed_queue() periodically feels wrong,
veth_tx_timeout() is not suited for that either, and calling it on
ptr_ring_empty() is probably also wrong. (A sketch of what batched,
per-poll completion accounting could look like is at the bottom of this
mail.)

I guess there would have to be a new "BQL" algorithm just for software
interfaces, one that considers:

1. Context switching (a bigger ring size is better)
2. Cache locality (a smaller ring size is better)
3. Bufferbloat (a time limit on how long packets may sit in the ring?)

I think it is a hard problem. How about first adding an option to
modify VETH_RING_SIZE? (Also sketched at the bottom of this mail.)

> [1] https://lwn.net/Articles/469652/
>
> Jonas
>
>>> I guess it will just choose the ptr_ring size as the limit in this
>>> case, but it would be nice if you could briefly verify this :)
>>>
>>> Thanks!
>>>
>>>>> Thanks!
>>>>>
>>>>> [1] Link: https://lore.kernel.org/all/20260312130639.138988-1-simon.schippers@tu-dortmund.de/
>>>>> [2] Link: https://www.kernel.org/doc/html/latest/networking/pktgen.html
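---

As mentioned above, here is a minimal sketch (not the actual patchset
code) of what per-completion-round BQL accounting could look like in a
veth-style NAPI poll. The rq->peer_txq pointer is an assumed field,
error handling is left out, and real veth stores tagged pointers (skb
or xdp_frame) in the ring, which this ignores:

static int veth_poll_bql_sketch(struct veth_rq *rq, int budget)
{
	unsigned int done = 0, bytes = 0;
	void *ptr;

	while (done < budget && (ptr = ptr_ring_consume(&rq->xdp_ring))) {
		struct sk_buff *skb = ptr;	/* skb-only; XDP frames omitted */

		bytes += skb->len;
		done++;
		napi_gro_receive(&rq->xdp_napi, skb);
	}

	/*
	 * One BQL update per completion round, as the comment above
	 * netdev_tx_completed_queue() requires. The limit then tracks
	 * the bytes drained per poll instead of collapsing to ~2.
	 */
	if (done)
		netdev_tx_completed_queue(rq->peer_txq, done, bytes);

	return done;
}

The open question from above remains, though: when the ring drains
completely between polls, nothing triggers a further completion round.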
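And a hypothetical sketch of making the ring size configurable via
ethtool instead of the fixed VETH_RING_SIZE (priv->ring_size and
VETH_MAX_RING_SIZE are assumptions; upstream veth does not implement
get/set_ringparam today):

static void veth_get_ringparam(struct net_device *dev,
			       struct ethtool_ringparam *ring,
			       struct kernel_ethtool_ringparam *kernel_ring,
			       struct netlink_ext_ack *extack)
{
	struct veth_priv *priv = netdev_priv(dev);

	ring->rx_max_pending = VETH_MAX_RING_SIZE;	/* assumed cap */
	ring->rx_pending = priv->ring_size;		/* assumed field */
}

static int veth_set_ringparam(struct net_device *dev,
			      struct ethtool_ringparam *ring,
			      struct kernel_ethtool_ringparam *kernel_ring,
			      struct netlink_ext_ack *extack)
{
	struct veth_priv *priv = netdev_priv(dev);

	if (!ring->rx_pending || ring->rx_pending > VETH_MAX_RING_SIZE)
		return -EINVAL;

	/*
	 * Only takes effect when the NAPI rings are (re)allocated;
	 * resizing a live ptr_ring would need ptr_ring_resize() plus
	 * synchronization with the producer, which is omitted here.
	 */
	priv->ring_size = ring->rx_pending;
	return 0;
}

With something like this, the interaction between ring size and BQL
could be measured with e.g. "ethtool -G veth0 rx 1024".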