Message-ID: <447dca19-58c5-4c01-b60e-cfe5e601961a@daynix.com>
Date: Sun, 29 Sep 2024 16:10:47 +0900
X-Mailing-List: netdev@vger.kernel.org
Subject: Re: [PATCH RFC v4 0/9] tun: Introduce virtio-net hashing feature
To: Jason Wang
Cc: Jonathan Corbet, Willem de Bruijn, "David S. Miller", Eric Dumazet,
 Jakub Kicinski, Paolo Abeni, "Michael S. Tsirkin", Xuan Zhuo, Shuah Khan,
 linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
 netdev@vger.kernel.org, kvm@vger.kernel.org,
 virtualization@lists.linux-foundation.org,
 linux-kselftest@vger.kernel.org, Yuri Benditovich, Andrew Melnychenko,
 Stephen Hemminger, gur.stavi@huawei.com
References: <20240924-rss-v4-0-84e932ec0e6c@daynix.com>
 <6c101c08-4364-4211-a883-cb206d57303d@daynix.com>
From: Akihiko Odaki

On 2024/09/29 11:07, Jason Wang wrote:
> On Fri, Sep 27, 2024 at 3:51 PM Akihiko Odaki wrote:
>>
>> On 2024/09/27 13:31, Jason Wang wrote:
>>> On Fri, Sep 27, 2024 at 10:11 AM Akihiko Odaki wrote:
>>>>
>>>> On 2024/09/25 12:30, Jason Wang wrote:
>>>>> On Tue, Sep 24, 2024 at 5:01 PM Akihiko Odaki wrote:
>>>>>>
>>>>>> virtio-net have two usage of hashes: one is RSS and another is hash
>>>>>> reporting. Conventionally the hash calculation was done by the VMM.
>>>>>> However, computing the hash after the queue was chosen defeats the
>>>>>> purpose of RSS.
>>>>>>
>>>>>> Another approach is to use eBPF steering program. This approach has
>>>>>> another downside: it cannot report the calculated hash due to the
>>>>>> restrictive nature of eBPF.
>>>>>>
>>>>>> Introduce the code to compute hashes to the kernel in order to overcome
>>>>>> these challenges.
>>>>>>
>>>>>> An alternative solution is to extend the eBPF steering program so that it
>>>>>> will be able to report to the userspace, but it is based on context
>>>>>> rewrites, which is in feature freeze. We can adopt kfuncs, but they will
>>>>>> not be UAPIs. We opt to ioctl to align with other relevant UAPIs (KVM
>>>>>> and vhost_net).
>>>>>>
>>>>>
>>>>> I wonder if we could clone the skb and reuse some to store the hash,
>>>>> then the steering eBPF program can access these fields without
>>>>> introducing full RSS in the kernel?
>>>>
>>>> I don't get how cloning the skb can solve the issue.
>>>>
>>>> We can certainly implement Toeplitz function in the kernel or even with
>>>> tc-bpf to store a hash value that can be used for eBPF steering program
>>>> and virtio hash reporting. However we don't have a means of storing a
>>>> hash type, which is specific to virtio hash reporting and lacks a
>>>> corresponding skb field.
>>>
>>> I may miss something but looking at sk_filter_is_valid_access(). It
>>> looks to me we can make use of skb->cb[0..4]?
>>
>> I didn't opt to using cb. Below is the rationale:
>>
>> cb is for tail call so it means we reuse the field for a different
>> purpose. The context rewrite allows adding a field without increasing
>> the size of the underlying storage (the real sk_buff) so we should add a
>> new field instead of reusing an existing field to avoid confusion.
>>
>> We are however no longer allowed to add a new field. In my
>> understanding, this is because it is an UAPI, and eBPF maintainers found
>> it is difficult to maintain its stability.
>>
>> Reusing cb for hash reporting is a workaround to avoid having a new
>> field, but it does not solve the underlying problem (i.e., keeping eBPF
>> as stable as UAPI is unreasonably hard).
>> In my opinion, adding an ioctl
>> is a reasonable option to keep the API as stable as other virtualization
>> UAPIs while respecting the underlying intention of the context rewrite
>> feature freeze.
>
> Fair enough.
>
> Btw, I remember DPDK implements tuntap RSS via eBPF as well (probably
> via cls or other). It might worth to see if anything we miss here.

Thanks for the information. I wonder why they used cls instead of a
steering program. Perhaps it is for compatibility with macvtap and
ipvtap, which don't support steering programs.

Their RSS implementation looks cleaner, so I will improve my RSS
implementation accordingly.

Regards,
Akihiko Odaki
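
Note on the Toeplitz function mentioned in the thread: the sketch below
shows, in Python and outside any kernel or eBPF context, how an RSS-style
Toeplitz hash is commonly computed over a flow tuple. It is an illustration
under stated assumptions, not code from this patch series or from DPDK. The
names toeplitz_hash, SAMPLE_KEY and sample_tuple, as well as the key and
addresses themselves, are hypothetical choices made for this example. The
sketch only produces the 32-bit hash value; the hash type discussed above,
which has no corresponding skb field, is not modeled.

```python
# Illustrative Toeplitz hash sketch (hypothetical names, not from the patches).

# 40-byte sample key, chosen only for illustration. A real device or VMM
# would use the key configured by the guest driver.
SAMPLE_KEY = bytes.fromhex(
    "6d5a56da255b0ec24167253d43a38fb0"
    "d0ca2bcbae7b30b477cb2da38030f20c"
    "6a42b73bbeac01fa"
)


def toeplitz_hash(key: bytes, data: bytes) -> int:
    """Return the 32-bit Toeplitz hash of `data` under `key`.

    For every set bit in the input (most significant bit first), XOR in
    the 32-bit window of the key that starts at the same bit offset.
    """
    if len(key) < len(data) + 4:
        raise ValueError("key must be at least len(data) + 4 bytes long")

    key_bits = int.from_bytes(key, "big")
    key_len_bits = len(key) * 8
    result = 0
    for byte_index, byte in enumerate(data):
        for bit in range(8):
            if byte & (0x80 >> bit):
                offset = byte_index * 8 + bit
                # Shift so that key bits [offset, offset + 32) land in the
                # low 32 bits, then mask everything above them away.
                shift = key_len_bits - 32 - offset
                result ^= (key_bits >> shift) & 0xFFFFFFFF
    return result


# Hypothetical IPv4/TCP tuple in network byte order:
# source address, destination address, source port, destination port.
sample_tuple = (
    bytes([192, 0, 2, 1])
    + bytes([198, 51, 100, 7])
    + (12345).to_bytes(2, "big")
    + (80).to_bytes(2, "big")
)

hash_value = toeplitz_hash(SAMPLE_KEY, sample_tuple)
# With N queues, a simple (assumed) mapping would be hash_value % N; real
# RSS uses an indirection table indexed by the low bits of the hash.
print(f"hash = {hash_value:#010x}")
```

The function is linear over XOR, so the hash of a tuple is the XOR of the
key windows selected by its set bits. That is why the same value can serve
both queue selection and hash reporting once it has been computed.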