From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from out-189.mta1.migadu.com (out-189.mta1.migadu.com [95.215.58.189]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 97B382E64A for ; Thu, 6 Feb 2025 06:12:42 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=95.215.58.189 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738822367; cv=none; b=IgGm30sCtv0V9bBpLb3Tb8MGLmVdzp1UvlfxhU93+5PuBaDji4+g4C+cArAHrbgvNUQVhOT6o6N2tpnJHmiUT3L3W9aYOdh2x6UzwOcLb9VRo1thNYNfDbu3XwD4KdFfRSyAVA+Imn346X8ZMcAd7HNvEkHfj7r6FsRknnuwjA0= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738822367; c=relaxed/simple; bh=hogkxHDEfpKbYOQm+WoV7gcsiLjVcIZZhSZnar9SZ2M=; h=Message-ID:Date:MIME-Version:Subject:To:Cc:References:From: In-Reply-To:Content-Type; b=cqz0TsDi+GupLC9eE+W0+/RCBADlVZzLQi9KF8ffYZf6eRU5p+a05dw3xS5nYs91lkhXk4V7iik0FuUckSAP4ujUvfBNPJYDk5tUcWTqZsyb40Or4HiRDCUrz9sr/HrKjQzT8eI1V8QS2nr6c0kCW7SjWkE+0pBg24MYa+Plwvs= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev; spf=pass smtp.mailfrom=linux.dev; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b=HA1BjPN7; arc=none smtp.client-ip=95.215.58.189 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.dev Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linux.dev Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=linux.dev header.i=@linux.dev header.b="HA1BjPN7" Message-ID: DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1738822359; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=fP09qw0fTTx05drYQv3984Q2Z6Ps7338BY1vTUz5Gr0=; b=HA1BjPN7H6pdK3AQPHUWBR9CHXN16ayPkbkeHGvuP2GKpSsCEwT2H2iw6bGpEyB9tWasQe Rk/oV7PeEgmp+rux3MlammoofunRdZXnGozYtW9LE2xOhbOCmu5TQp39Yq2QnfC7p6LTgq zsMyB+R+joci9qA/38YWvfSVpCszJTI= Date: Wed, 5 Feb 2025 22:12:29 -0800 Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Subject: Re: [PATCH bpf-next v8 10/12] bpf: make TCP tx timestamp bpf extension work To: Jason Xing Cc: Willem de Bruijn , Jakub Kicinski , davem@davemloft.net, edumazet@google.com, pabeni@redhat.com, dsahern@kernel.org, willemb@google.com, ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org, eddyz87@gmail.com, song@kernel.org, yonghong.song@linux.dev, john.fastabend@gmail.com, kpsingh@kernel.org, sdf@fomichev.me, haoluo@google.com, jolsa@kernel.org, horms@kernel.org, bpf@vger.kernel.org, netdev@vger.kernel.org References: <20250204183024.87508-1-kerneljasonxing@gmail.com> <20250204183024.87508-11-kerneljasonxing@gmail.com> <20250204175744.3f92c33e@kernel.org> <0a8e7b84-bab6-4852-8616-577d9b561f4c@linux.dev> <67a424d2aa9ea_19943029427@willemb.c.googlers.com.notmuch> <67a42ba112990_19c315294b7@willemb.c.googlers.com.notmuch> Content-Language: en-US X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. From: Martin KaFai Lau In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Migadu-Flow: FLOW_OUT On 2/5/25 7:41 PM, Jason Xing wrote: > On Thu, Feb 6, 2025 at 11:25 AM Willem de Bruijn > wrote: >> >>>>> I think we can split the whole idea into two parts: for now, because >>>>> of the current series implementing the same function as SO_TIMETAMPING >>>>> does, I will implement the selective sample feature in the series. >>>>> After someday we finish tracing all the skb, then we will add the >>>>> corresponding selective sample feature. >>>> >>>> Are you saying that you will include selective sampling now or want to >>>> postpone it? >>> >>> A few months ago, I planned to do it after this series. Since you all >>> ask, it's not complex to have it included in this series :) >>> >>> Selective sampling has two kinds of meaning like I mentioned above, so >>> in the next re-spin I will implement the cmsg feature for bpf >>> extension in this series. >> >> Great thanks. > > I have to rephrase a bit in case Martin visits here soon: I will > compare two approaches 1) reply value, 2) bpf kfunc and then see which > way is better. I have already explained in details why the 1) reply value from the bpf prog won't work. Please go back to that reply which has the context. > >> >>> I'm doing the test right now. And leave >>> another selective sampling small feature until the feature of tracing >>> all the skbs is implemented if possible. >> >> Can you elaborate on this other feature? > > Do you recall oneday I asked your opinion privately about whether we > can trace _all the skbs_ (not the last skb from each sendmsg) to have > a better insight of kernel behaviour? I can also see a couple of > latency issues in the kernel. If it is approved, then corresponding > selective sampling should be supported. It's what I was trying to > describe. > > The advantage of relying on the timestamping feature is that we can > isolate normal flows and monitored flow so that normal flows wouldn't > be affected because of enabling the monitoring feature, compared to so > many open source monitoring applications I've dug into. They usually > directly hook the hot path like __tcp_transmit_skb() or > dev_queue_xmit, which will surely influence the normal flows and cause > performance degradation to some extent. I noticed that after > conducting some tests a few months ago. The principle behind the bpf > fentry is to replace some instructions at the very beginning of the > hooked function, so every time even normal flows entering the > monitored function will get affected. I sort of guess this while stalled in the traffic... :/ I was not asking to be able to "selective on all skb of a large msg". This will be a separate topic. If we really wanted to support this case (tbh, I am not convinced) in the future, there is more reason the default behavior should be "off" now for consistency reason. The comment was on the existing tcp_tx_timestamp(). First focus on allowing selective tracking of the skb that the current tcp_tx_timestamp() also tracks because it is the most understood use case. This will allow the bpf prog to select which tcp_sendmsg call it should track/sample. Perhaps the bpf prog will limit tracking X numbers of packets and then will stop there. Perhaps the bpf prog will only allocate X numbers of sample spaces in the bpf_sk_storage to track packet. There are many reasons that bpf prog may want to sample and stop tracking at some point even in the current tcp_tx_timestamp().