Date: Mon, 28 Apr 2025 12:52:11 -0400
From: Willem de Bruijn
To: Martin Karsten, Samiullah Khawaja, Jakub Kicinski, "David S . Miller", Eric Dumazet, Paolo Abeni, almasrymina@google.com, willemb@google.com, jdamato@fastly.com
Cc: netdev@vger.kernel.org
Message-ID: <680fb23bb1953_23f881294d9@willemb.c.googlers.com.notmuch>
In-Reply-To: <52e7cf72-6655-49ed-984c-44bd1ecb0d95@uwaterloo.ca>
References: <20250424200222.2602990-1-skhawaja@google.com> <52e7cf72-6655-49ed-984c-44bd1ecb0d95@uwaterloo.ca>
Subject: Re: [PATCH net-next v5 0/4] Add support to do threaded napi busy poll

Martin Karsten wrote:
> On 2025-04-24 16:02, Samiullah Khawaja wrote:
> > Extend the already existing support for threaded napi poll to do continuous
> > busy polling.
> >
> > This is used for continuous polling of napi to fetch descriptors
> > from backing RX/TX queues for low latency applications. Allow enabling
> > of threaded busypoll using netlink so this can be enabled on a set of
> > dedicated napis for low latency applications.
> >
> > Once enabled, the user can fetch the PID of the kthread doing NAPI polling
> > and set affinity, priority and scheduler for it depending on the
> > low-latency requirements.
> >
> > Currently threaded napi can only be enabled at the device level using sysfs.
> > Add support to enable/disable threaded mode for a napi individually. This
> > can be done using the netlink interface.
> > Extend the `napi-set` op in the netlink spec to allow setting the
> > `threaded` attribute of a napi.
> >
> > Extend the threaded attribute in the napi struct with an option to enable
> > continuous busy polling. Extend the netlink and sysfs interfaces to allow
> > enabling/disabling threaded busypolling at the device or individual napi
> > level.
> >
> > We use this for our AF_XDP based hard low-latency use case with a
> > microsecond-level latency requirement. For our use case we want low
> > jitter and stable latency at P99.
> >
> > Following is an analysis and comparison of available (and compatible)
> > busy poll interfaces for a low latency use case with stable P99. Please
> > note that throughput and CPU efficiency are non-goals.
> >
> > For the analysis we use an AF_XDP based benchmarking tool, `xdp_rr`. The
> > tool, and how it tries to simulate the real workload, is described as
> > follows:
> >
> > - It sends UDP packets between 2 machines.
> > - The client machine sends packets at a fixed frequency. To maintain the
> >   frequency of the packets being sent, we use open-loop sampling. That is,
> >   the packets are sent in a separate thread.
> > - The server replies to the packet inline by reading the pkt from the
> >   recv ring and replying using the tx ring.
> > - To simulate the application processing time, we use a configurable
> >   delay in usecs on the client side after a reply is received from the
> >   server.
> >
> > The xdp_rr tool is posted separately as an RFC for tools/testing/selftest.
> >
> > We use this tool with the following napi polling configurations:
> >
> > - Interrupts only
> > - SO_BUSY_POLL (inline in the same thread where the client receives the
> >   packet).
> > - SO_BUSY_POLL (separate thread and separate core)
> > - Threaded NAPI busypoll
> >
> > The system is configured using the following script in all 4 cases:
> >
> > ```
> > echo 0 | sudo tee /sys/class/net/eth0/threaded
> > echo 0 | sudo tee /proc/sys/kernel/timer_migration
> > echo off | sudo tee /sys/devices/system/cpu/smt/control
> >
> > sudo ethtool -L eth0 rx 1 tx 1
> > sudo ethtool -G eth0 rx 1024
> >
> > echo 0 | sudo tee /proc/sys/net/core/rps_sock_flow_entries
> > echo 0 | sudo tee /sys/class/net/eth0/queues/rx-0/rps_cpus
> >
> > # pin IRQs on CPU 2
> > IRQS="$(gawk '/eth0-(TxRx-)?1/ {match($1, /([0-9]+)/, arr); \
> >         print arr[0]}' < /proc/interrupts)"
> > # unquoted so that each IRQ number is visited separately
> > for irq in ${IRQS}; \
> >   do echo 2 | sudo tee /proc/irq/$irq/smp_affinity_list; done
> >
> > echo -1 | sudo tee /proc/sys/kernel/sched_rt_runtime_us
> >
> > # write through sudo tee, like every other sysfs write in this script
> > for i in /sys/devices/virtual/workqueue/*/cpumask; \
> >   do echo $i; echo 1,2,3,4,5,6 | sudo tee $i; done
> >
> > if [[ -z "$1" ]]; then
> >   echo 400 | sudo tee /proc/sys/net/core/busy_read
> >   echo 100 | sudo tee /sys/class/net/eth0/napi_defer_hard_irqs
> >   echo 15000 | sudo tee /sys/class/net/eth0/gro_flush_timeout
> > fi
> >
> > sudo ethtool -C eth0 adaptive-rx off adaptive-tx off rx-usecs 0 tx-usecs 0
> >
> > if [[ "$1" == "enable_threaded" ]]; then
> >   echo 0 | sudo tee /proc/sys/net/core/busy_poll
> >   echo 0 | sudo tee /proc/sys/net/core/busy_read
> >   echo 100 | sudo tee /sys/class/net/eth0/napi_defer_hard_irqs
> >   echo 15000 | sudo tee /sys/class/net/eth0/gro_flush_timeout
> >   echo 2 | sudo tee /sys/class/net/eth0/threaded
> >   NAPI_T=$(ps -ef | grep napi | grep -v grep | awk '{ print $2 }')
> >   sudo chrt -f -p 50 $NAPI_T
> >
> >   # pin threaded poll thread to CPU 2
> >   sudo taskset -pc 2 $NAPI_T
> > fi
> >
> > if [[ "$1" == "enable_interrupt" ]]; then
> >   echo 0 | sudo tee /proc/sys/net/core/busy_read
> >   echo 0 | sudo tee /sys/class/net/eth0/napi_defer_hard_irqs
> >   echo 15000 | sudo tee /sys/class/net/eth0/gro_flush_timeout
> > fi
> > ```
> >
> > To enable the various configurations, the script can be run as follows:
> >
> > - Interrupt Only
> > ```
> >