Message-ID: <58bfc722-5dc4-4119-9c5c-49fb6b3da6cd@linux.dev>
Date: Mon, 7 Apr 2025 14:56:54 -0700
Subject: Re: [RFC PATCH bpf-next 2/3] bpf: udp: Avoid socket skips and repeats during iteration
To: Jordan Rife
Cc: netdev@vger.kernel.org, bpf@vger.kernel.org, Aditi Ghag, Daniel Borkmann, Willem de Bruijn
References: <20250404220221.1665428-1-jordan@jrife.io>
 <20250404220221.1665428-3-jordan@jrife.io>
From: Martin KaFai Lau
In-Reply-To: <20250404220221.1665428-3-jordan@jrife.io>

On 4/4/25 3:02 PM, Jordan Rife wrote:
> +static struct sock *bpf_iter_udp_resume(struct sock *first_sk,
> +					 union bpf_udp_iter_batch_item *cookies,
> +					 int n_cookies)
> +{
> +	struct sock *sk = NULL;
> +	int i = 0;
> +
> +	for (; i < n_cookies; i++) {
> +		sk = first_sk;
> +		udp_portaddr_for_each_entry_from(sk)
> +			if (cookies[i].cookie == atomic64_read(&sk->sk_cookie))
> +				goto done;
> +	}
> +done:
> +	return sk;
> +}
> +
>  static struct sock *bpf_iter_udp_batch(struct seq_file *seq)
>  {
>  	struct bpf_udp_iter_state *iter = seq->private;
>  	struct udp_iter_state *state = &iter->state;
> +	unsigned int find_cookie, end_cookie = 0;
>  	struct net *net = seq_file_net(seq);
> -	int resume_bucket, resume_offset;
>  	struct udp_table *udptable;
>  	unsigned int batch_sks = 0;
>  	bool resized = false;
> +	int resume_bucket;
>  	struct sock *sk;
> 
>  	resume_bucket = state->bucket;
> -	resume_offset = iter->offset;
> 
>  	/* The current batch is done, so advance the bucket. */
>  	if (iter->st_bucket_done)
> @@ -3428,6 +3446,8 @@ static struct sock *bpf_iter_udp_batch(struct seq_file *seq)
>  	 * before releasing the bucket lock. This allows BPF programs that are
>  	 * called in seq_show to acquire the bucket lock if needed.
>  	 */
> +	find_cookie = iter->cur_sk;
> +	end_cookie = iter->end_sk;
>  	iter->cur_sk = 0;
>  	iter->end_sk = 0;
>  	iter->st_bucket_done = false;
> @@ -3439,18 +3459,26 @@ static struct sock *bpf_iter_udp_batch(struct seq_file *seq)
>  		if (hlist_empty(&hslot2->head))
>  			continue;
> 
> -		iter->offset = 0;
>  		spin_lock_bh(&hslot2->lock);
> -		udp_portaddr_for_each_entry(sk, &hslot2->head) {
> +		/* Initialize sk to the first socket in hslot2. */
> +		udp_portaddr_for_each_entry(sk, &hslot2->head)
> +			break;

nit: Is this just to get the first entry? Maybe directly do
hlist_entry_safe(hslot2->head.first, ...) instead. See the untested
sketch at the end of this mail.

> +		/* Resume from the first (in iteration order) unseen socket from
> +		 * the last batch that still exists in resume_bucket. Most of
> +		 * the time this will just be where the last iteration left off
> +		 * in resume_bucket unless that socket disappeared between
> +		 * reads.
> +		 *
> +		 * Skip this if end_cookie isn't set; this is the first
> +		 * batch, we're on bucket zero, and we want to start from the
> +		 * beginning.
> +		 */
> +		if (state->bucket == resume_bucket && end_cookie)
> +			sk = bpf_iter_udp_resume(sk,
> +						 &iter->batch[find_cookie],
> +						 end_cookie - find_cookie);
> +		udp_portaddr_for_each_entry_from(sk) {
>  			if (seq_sk_match(seq, sk)) {
> -				/* Resume from the last iterated socket at the
> -				 * offset in the bucket before iterator was stopped.
> -				 */
> -				if (state->bucket == resume_bucket &&
> -				    iter->offset < resume_offset) {
> -					++iter->offset;
> -					continue;
> -				}
>  				if (iter->end_sk < iter->max_sk) {
>  					sock_hold(sk);
>  					iter->batch[iter->end_sk++].sock = sk;

I looked at the details of these two functions. The approach looks good
to me. Thanks for trying it.

This should stop the potential duplicates across a stop() and then
re-start(). My understanding is that it may or may not batch something
newer than the last stop(), but that behavior is similar to the current
offset approach, so I think it is fine. The same situation is true for
the next bucket anyway.
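
To spell out the hlist_entry_safe() suggestion above, a minimal
untested sketch. It assumes the first-entry fetch can reuse the same
__sk_common.skc_portaddr_node member that udp_portaddr_for_each_entry()
already walks:

	/* Fetch the first socket in hslot2 directly instead of
	 * break-ing out of the loop after one entry. hlist_entry_safe()
	 * returns NULL for an empty list, although the hlist_empty()
	 * check above already guarantees the list is non-empty here.
	 */
	sk = hlist_entry_safe(hslot2->head.first, struct sock,
			      __sk_common.skc_portaddr_node);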