From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 25 Aug 2025 12:37:34 -0700
From: Stanislav Fomichev
To: Samiullah Khawaja
Cc: Jakub Kicinski, "David S. Miller", Eric Dumazet, Paolo Abeni,
	almasrymina@google.com, willemb@google.com, mkarsten@uwaterloo.ca,
	Joe Damato, netdev@vger.kernel.org
Subject: Re: [PATCH net-next v7 0/2] Add support to do threaded napi busy poll
References: <20250824215418.257588-1-skhawaja@google.com>
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
In-Reply-To: <20250824215418.257588-1-skhawaja@google.com>

On 08/24, Samiullah Khawaja wrote:
> Extend the already existing support of threaded napi poll to do continuous
> busy polling.
>
> This is used for doing continuous polling of napi to fetch descriptors
> from backing RX/TX queues for low latency applications. Allow enabling
> of threaded busypoll using netlink so this can be enabled on a set of
> dedicated napis for low latency applications.
>
> Once enabled user can fetch the PID of the kthread doing NAPI polling
> and set affinity, priority and scheduler for it depending on the
> low-latency requirements.
>
> Currently threaded napi is only enabled at device level using sysfs. Add
> support to enable/disable threaded mode for a napi individually.
> This can be done using the netlink interface. Extend the `napi-set` op in
> the netlink spec to allow setting the `threaded` attribute of a napi.
>
> Extend the threaded attribute in the napi struct to add an option to
> enable continuous busy polling. Extend the netlink and sysfs interfaces to
> allow enabling/disabling threaded busypolling at device or individual napi
> level.
>
> We use this for our AF_XDP based hard low-latency usecase with a
> usecs-level latency requirement. For our usecase we want low jitter and
> stable latency at P99.
>
> Following is an analysis and comparison of available (and compatible)
> busy poll interfaces for a low latency usecase with stable P99. Please
> note that throughput and CPU efficiency are non-goals.
>
> For analysis we use an AF_XDP based benchmarking tool `xdp_rr`. A
> description of the tool and how it tries to simulate the real workload
> follows:
>
> - It sends UDP packets between 2 machines.
> - The client machine sends packets at a fixed frequency. To maintain the
>   frequency of the packets being sent, we use open-loop sampling. That is,
>   the packets are sent in a separate thread.
> - The server replies to the packet inline by reading the pkt from the
>   recv ring and replying using the tx ring.
> - To simulate the application processing time, we use a configurable
>   delay in usecs on the client side after a reply is received from the
>   server.
>
> The xdp_rr tool is posted separately as an RFC for tools/testing/selftest.
>
> We use this tool with the following napi polling configurations:
>
> - Interrupts only
> - SO_BUSYPOLL (inline in the same thread where the client receives the
>   packet).
> - SO_BUSYPOLL (separate thread and separate core)
> - Threaded NAPI busypoll
>
> The system is configured using the following script in all 4 cases:
>
> ```
> echo 0 | sudo tee /sys/class/net/eth0/threaded
> echo 0 | sudo tee /proc/sys/kernel/timer_migration
> echo off | sudo tee /sys/devices/system/cpu/smt/control
>
> sudo ethtool -L eth0 rx 1 tx 1
> sudo ethtool -G eth0 rx 1024
>
> echo 0 | sudo tee /proc/sys/net/core/rps_sock_flow_entries
> echo 0 | sudo tee /sys/class/net/eth0/queues/rx-0/rps_cpus
>
> # pin IRQs on CPU 2
> IRQS="$(gawk '/eth0-(TxRx-)?1/ {match($1, /([0-9]+)/, arr); \
>         print arr[0]}' < /proc/interrupts)"
> # unquoted on purpose: split the list so each IRQ is handled separately
> for irq in ${IRQS}; \
>   do echo 2 | sudo tee /proc/irq/$irq/smp_affinity_list; done
>
> echo -1 | sudo tee /proc/sys/kernel/sched_rt_runtime_us
>
> for i in /sys/devices/virtual/workqueue/*/cpumask; \
>   do echo $i; echo 1,2,3,4,5,6 > $i; done
>
> if [[ -z "$1" ]]; then
>   echo 400 | sudo tee /proc/sys/net/core/busy_read
>   echo 100 | sudo tee /sys/class/net/eth0/napi_defer_hard_irqs
>   echo 15000 | sudo tee /sys/class/net/eth0/gro_flush_timeout
> fi
>
> sudo ethtool -C eth0 adaptive-rx off adaptive-tx off rx-usecs 0 tx-usecs 0
>
> if [[ "$1" == "enable_threaded" ]]; then
>   echo 0 | sudo tee /proc/sys/net/core/busy_poll
>   echo 0 | sudo tee /proc/sys/net/core/busy_read
>   echo 100 | sudo tee /sys/class/net/eth0/napi_defer_hard_irqs
>   echo 15000 | sudo tee /sys/class/net/eth0/gro_flush_timeout
>   echo 2 | sudo tee /sys/class/net/eth0/threaded
>   NAPI_T=$(ps -ef | grep napi | grep -v grep | awk '{ print $2 }')
>   sudo chrt -f -p 50 $NAPI_T
>
>   # pin threaded poll thread to CPU 2
>   sudo taskset -pc 2 $NAPI_T
> fi
>
> if [[ "$1" == "enable_interrupt" ]]; then
>   echo 0 | sudo tee /proc/sys/net/core/busy_read
>   echo 0 | sudo tee /sys/class/net/eth0/napi_defer_hard_irqs
>   echo 15000 | sudo tee /sys/class/net/eth0/gro_flush_timeout
> fi
> ```
>
> To enable the various configurations, the script can be run as follows:
>
> - Interrupt Only
> ```
>