Date: Mon, 25 Aug 2025 15:42:51 -0700
From: Stanislav Fomichev
To: Samiullah Khawaja
Cc: Jakub Kicinski, "David S . Miller", Eric Dumazet, Paolo Abeni,
    almasrymina@google.com, willemb@google.com, mkarsten@uwaterloo.ca,
    Joe Damato, netdev@vger.kernel.org
Subject: Re: [PATCH net-next v7 0/2] Add support to do threaded napi busy poll
References: <20250824215418.257588-1-skhawaja@google.com>
X-Mailing-List: netdev@vger.kernel.org

On 08/25, Samiullah Khawaja wrote:
> On Mon, Aug 25, 2025 at 12:37 PM Stanislav Fomichev wrote:
> >
> > On 08/24, Samiullah Khawaja wrote:
> > > Extend the already existing support of threaded napi poll to do continuous
> > > busy polling.
> > >
> > > This is used for doing continuous polling of napi to fetch descriptors
> > > from backing RX/TX queues for low latency applications. Allow enabling
> > > of threaded busypoll using netlink so this can be enabled on a set of
> > > dedicated napis for low latency applications.
> > >
> > > Once enabled user can fetch the PID of the kthread doing NAPI polling
> > > and set affinity, priority and scheduler for it depending on the
> > > low-latency requirements.
> > >
> > > Currently threaded napi is only enabled at device level using sysfs. Add
> > > support to enable/disable threaded mode for a napi individually. This
> > > can be done using the netlink interface. Extend `napi-set` op in netlink
> > > spec that allows setting the `threaded` attribute of a napi.
> > >
> > > Extend the threaded attribute in napi struct to add an option to enable
> > > continuous busy polling. Extend the netlink and sysfs interface to allow
> > > enabling/disabling threaded busypolling at device or individual napi
> > > level.
> > >
> > > We use this for our AF_XDP based hard low-latency usecase with usecs
> > > level latency requirement. For our usecase we want low jitter and stable
> > > latency at P99.
> > >
> > > Following is an analysis and comparison of available (and compatible)
> > > busy poll interfaces for a low latency usecase with stable P99. Please
> > > note that the throughput and cpu efficiency is a non-goal.
> > >
> > > For analysis we use an AF_XDP based benchmarking tool `xdp_rr`. The
> > > description of the tool and how it tries to simulate the real workload
> > > is following,
> > >
> > > - It sends UDP packets between 2 machines.
> > > - The client machine sends packets at a fixed frequency. To maintain the
> > >   frequency of the packet being sent, we use open-loop sampling. That is
> > >   the packets are sent in a separate thread.
> > > - The server replies to the packet inline by reading the pkt from the
> > >   recv ring and replies using the tx ring.
> > > - To simulate the application processing time, we use a configurable
> > >   delay in usecs on the client side after a reply is received from the
> > >   server.
> > >
> > > The xdp_rr tool is posted separately as an RFC for tools/testing/selftest.
> > >
> > > We use this tool with following napi polling configurations,
> > >
> > > - Interrupts only
> > > - SO_BUSYPOLL (inline in the same thread where the client receives the
> > >   packet).
> > > - SO_BUSYPOLL (separate thread and separate core)
> > > - Threaded NAPI busypoll
> > >
> > > System is configured using following script in all 4 cases,
> > >
> > > ```
> > > echo 0 | sudo tee /sys/class/net/eth0/threaded
> > > echo 0 | sudo tee /proc/sys/kernel/timer_migration
> > > echo off | sudo tee /sys/devices/system/cpu/smt/control
> > >
> > > sudo ethtool -L eth0 rx 1 tx 1
> > > sudo ethtool -G eth0 rx 1024
> > >
> > > echo 0 | sudo tee /proc/sys/net/core/rps_sock_flow_entries
> > > echo 0 | sudo tee /sys/class/net/eth0/queues/rx-0/rps_cpus
> > >
> > > # pin IRQs on CPU 2
> > > IRQS="$(gawk '/eth0-(TxRx-)?1/ {match($1, /([0-9]+)/, arr); \
> > >     print arr[0]}' < /proc/interrupts)"
> > > for irq in "${IRQS}"; \
> > >     do echo 2 | sudo tee /proc/irq/$irq/smp_affinity_list; done
> > >
> > > echo -1 | sudo tee /proc/sys/kernel/sched_rt_runtime_us
> > >
> > > for i in /sys/devices/virtual/workqueue/*/cpumask; \
> > >     do echo $i; echo 1,2,3,4,5,6 > $i; done
> > >
> > > if [[ -z "$1" ]]; then
> > >   echo 400 | sudo tee /proc/sys/net/core/busy_read
> > >   echo 100 | sudo tee /sys/class/net/eth0/napi_defer_hard_irqs
> > >   echo 15000 | sudo tee /sys/class/net/eth0/gro_flush_timeout
> > > fi
> > >
> > > sudo ethtool -C eth0 adaptive-rx off adaptive-tx off rx-usecs 0 tx-usecs 0
> > >
> > > if [[ "$1" == "enable_threaded" ]]; then
> > >   echo 0 | sudo tee /proc/sys/net/core/busy_poll
> > >   echo 0 | sudo tee /proc/sys/net/core/busy_read
> > >   echo 100 | sudo tee /sys/class/net/eth0/napi_defer_hard_irqs
> > >   echo 15000 | sudo tee /sys/class/net/eth0/gro_flush_timeout
> > >   echo 2 | sudo tee /sys/class/net/eth0/threaded
> > >   NAPI_T=$(ps -ef | grep napi | grep -v grep | awk '{ print $2 }')
> > >   sudo chrt -f -p 50 $NAPI_T
> > >
> > >   # pin threaded poll thread to CPU 2
> > >   sudo taskset -pc 2 $NAPI_T
> > > fi
> > >
> > > if [[ "$1" == "enable_interrupt" ]]; then
> > >   echo 0 | sudo tee /proc/sys/net/core/busy_read
> > >   echo 0 | sudo tee /sys/class/net/eth0/napi_defer_hard_irqs
> > >   echo 15000 | sudo tee /sys/class/net/eth0/gro_flush_timeout
> > > fi
> > > ```
> > >
> > > To enable various configurations, script can be run as following,
> > >
> > > - Interrupt Only
> > > ```
> > >