From: Eric Dumazet <eric.dumazet@gmail.com>
To: Willem de Bruijn <willemdebruijn.kernel@gmail.com>,
Xie He <xie.he.0141@gmail.com>
Cc: "eyal.birger@gmail.com" <eyal.birger@gmail.com>,
"yonatanlinik@gmail.com" <yonatanlinik@gmail.com>,
"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
"Gong, Sishuai" <sishuai@purdue.edu>,
"David S. Miller" <davem@davemloft.net>,
Jakub Kicinski <kuba@kernel.org>
Subject: Re: A data race between fanout_demux_rollover() and __fanout_unlink()
Date: Wed, 14 Apr 2021 19:52:13 +0200 [thread overview]
Message-ID: <abef5079-a604-8aa1-39ba-38ae115178e7@gmail.com> (raw)
In-Reply-To: <5958c722-7dcd-4342-291f-692a123ef931@gmail.com>
On 4/14/21 6:52 PM, Eric Dumazet wrote:
>
>
> On 4/14/21 1:27 AM, Willem de Bruijn wrote:
>> On Tue, Apr 13, 2021 at 6:55 PM Xie He <xie.he.0141@gmail.com> wrote:
>>>
>>> On Tue, Apr 13, 2021 at 1:51 PM Gong, Sishuai <sishuai@purdue.edu> wrote:
>>>>
>>>> Hi,
>>>>
>>>> We found a data race in linux-5.12-rc3 between af_packet.c functions fanout_demux_rollover() and __fanout_unlink() and we are able to reproduce it under x86.
>>>>
>>>> When the two functions are running together, __fanout_unlink() will grab a lock and modify some attribute of packet_fanout variable, but fanout_demux_rollover() may or may not see this update depending on different interleavings, as shown in below.
>>>>
>>>> Currently, we didn’t find any explicit errors due to this data race. But in fanout_demux_rollover(), we noticed that the data-racing variable is involved in the later operation, which might be a concern.
>>>>
>>>> ------------------------------------------
>>>> Execution interleaving
>>>>
>>>> Thread 1 Thread 2
>>>>
>>>> __fanout_unlink() fanout_demux_rollover()
>>>> spin_lock(&f->lock);
>>>> po = pkt_sk(f->arr[idx]);
>>>> // po is a out-of-date value
>>>> f->arr[i] = f->arr[f->num_members - 1];
>>>> spin_unlock(&f->lock);
>>>>
>>>>
>>>>
>>>> Thanks,
>>>> Sishuai
>>>
>>> CC'ing more people.
>>
>> __fanout_unlink removes a socket from the fanout group, but ensures
>> that the socket is not destroyed until after no datapath can refer to
>> it anymore, through a call to synchronize_net.
>>
>
> Right, but there is a data race.
>
> Compiler might implement
>
> f->arr[i] = f->arr[f->num_members - 1];
>
> (And po = pkt_sk(f->arr[idx]);
>
> Using one-byte-at-a-time load/stores, yes crazy, but oh well.
>
> We should use READ_ONCE()/WRITE_ONCE() at very minimum,
> and rcu_dereference()/rcu_assign_pointer() since we clearly rely on standard RCU rules.
>
>
>
I will test something like :
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 118d585337d72f10cd31ec5ca7c55b508fc18baf..ba96db1880eae89febf77ba6ff943b054cd268d7 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -1359,7 +1359,7 @@ static unsigned int fanout_demux_rollover(struct packet_fanout *f,
struct packet_sock *po, *po_next, *po_skip = NULL;
unsigned int i, j, room = ROOM_NONE;
- po = pkt_sk(f->arr[idx]);
+ po = pkt_sk(rcu_dereference(f->arr[idx]));
if (try_self) {
room = packet_rcv_has_room(po, skb);
@@ -1371,7 +1371,7 @@ static unsigned int fanout_demux_rollover(struct packet_fanout *f,
i = j = min_t(int, po->rollover->sock, num - 1);
do {
- po_next = pkt_sk(f->arr[i]);
+ po_next = pkt_sk(rcu_dereference(f->arr[i]));
if (po_next != po_skip && !READ_ONCE(po_next->pressure) &&
packet_rcv_has_room(po_next, skb) == ROOM_NORMAL) {
if (i != j)
@@ -1466,7 +1466,7 @@ static int packet_rcv_fanout(struct sk_buff *skb, struct net_device *dev,
if (fanout_has_flag(f, PACKET_FANOUT_FLAG_ROLLOVER))
idx = fanout_demux_rollover(f, skb, idx, true, num);
- po = pkt_sk(f->arr[idx]);
+ po = pkt_sk(rcu_dereference(f->arr[idx]));
return po->prot_hook.func(skb, dev, &po->prot_hook, orig_dev);
}
@@ -1480,7 +1480,7 @@ static void __fanout_link(struct sock *sk, struct packet_sock *po)
struct packet_fanout *f = po->fanout;
spin_lock(&f->lock);
- f->arr[f->num_members] = sk;
+ rcu_assign_pointer(f->arr[f->num_members], sk);
smp_wmb();
f->num_members++;
if (f->num_members == 1)
@@ -1495,11 +1495,14 @@ static void __fanout_unlink(struct sock *sk, struct packet_sock *po)
spin_lock(&f->lock);
for (i = 0; i < f->num_members; i++) {
- if (f->arr[i] == sk)
+ if (rcu_dereference_protected(f->arr[i],
+ lockdep_is_held(&f->lock)) == sk)
break;
}
BUG_ON(i >= f->num_members);
- f->arr[i] = f->arr[f->num_members - 1];
+ rcu_assign_pointer(f->arr[i],
+ rcu_dereference_protected(f->arr[f->num_members - 1],
+ lockdep_is_held(&f->lock)));
f->num_members--;
if (f->num_members == 0)
__dev_remove_pack(&f->prot_hook);
diff --git a/net/packet/internal.h b/net/packet/internal.h
index 5f61e59ebbffaa25a8fdfe31f79211fe6a755c51..48af35b1aed2565267c0288e013e23ff51f2fcac 100644
--- a/net/packet/internal.h
+++ b/net/packet/internal.h
@@ -94,7 +94,7 @@ struct packet_fanout {
spinlock_t lock;
refcount_t sk_ref;
struct packet_type prot_hook ____cacheline_aligned_in_smp;
- struct sock *arr[];
+ struct sock __rcu *arr[];
};
struct packet_rollover {
prev parent reply other threads:[~2021-04-14 17:52 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-04-13 20:51 A data race between fanout_demux_rollover() and __fanout_unlink() Gong, Sishuai
2021-04-13 22:55 ` Xie He
2021-04-13 23:27 ` Willem de Bruijn
2021-04-14 16:52 ` Eric Dumazet
2021-04-14 17:52 ` Eric Dumazet [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=abef5079-a604-8aa1-39ba-38ae115178e7@gmail.com \
--to=eric.dumazet@gmail.com \
--cc=davem@davemloft.net \
--cc=eyal.birger@gmail.com \
--cc=kuba@kernel.org \
--cc=netdev@vger.kernel.org \
--cc=sishuai@purdue.edu \
--cc=willemdebruijn.kernel@gmail.com \
--cc=xie.he.0141@gmail.com \
--cc=yonatanlinik@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).