* A data race between fanout_demux_rollover() and __fanout_unlink() @ 2021-04-13 20:51 Gong, Sishuai 2021-04-13 22:55 ` Xie He 0 siblings, 1 reply; 5+ messages in thread From: Gong, Sishuai @ 2021-04-13 20:51 UTC (permalink / raw) To: xie.he.0141@gmail.com, eyal.birger@gmail.com, yonatanlinik@gmail.com Cc: netdev@vger.kernel.org Hi, We found a data race in linux-5.12-rc3 between af_packet.c functions fanout_demux_rollover() and __fanout_unlink() and we are able to reproduce it under x86. When the two functions are running together, __fanout_unlink() will grab a lock and modify some attribute of packet_fanout variable, but fanout_demux_rollover() may or may not see this update depending on different interleavings, as shown in below. Currently, we didn’t find any explicit errors due to this data race. But in fanout_demux_rollover(), we noticed that the data-racing variable is involved in the later operation, which might be a concern. ------------------------------------------ Execution interleaving Thread 1 Thread 2 __fanout_unlink() fanout_demux_rollover() spin_lock(&f->lock); po = pkt_sk(f->arr[idx]); // po is a out-of-date value f->arr[i] = f->arr[f->num_members - 1]; spin_unlock(&f->lock); Thanks, Sishuai ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: A data race between fanout_demux_rollover() and __fanout_unlink() 2021-04-13 20:51 A data race between fanout_demux_rollover() and __fanout_unlink() Gong, Sishuai @ 2021-04-13 22:55 ` Xie He 2021-04-13 23:27 ` Willem de Bruijn 0 siblings, 1 reply; 5+ messages in thread From: Xie He @ 2021-04-13 22:55 UTC (permalink / raw) To: eyal.birger@gmail.com, yonatanlinik@gmail.com, netdev@vger.kernel.org, Gong, Sishuai, Willem de Bruijn, David S. Miller, Jakub Kicinski On Tue, Apr 13, 2021 at 1:51 PM Gong, Sishuai <sishuai@purdue.edu> wrote: > > Hi, > > We found a data race in linux-5.12-rc3 between af_packet.c functions fanout_demux_rollover() and __fanout_unlink() and we are able to reproduce it under x86. > > When the two functions are running together, __fanout_unlink() will grab a lock and modify some attribute of packet_fanout variable, but fanout_demux_rollover() may or may not see this update depending on different interleavings, as shown in below. > > Currently, we didn’t find any explicit errors due to this data race. But in fanout_demux_rollover(), we noticed that the data-racing variable is involved in the later operation, which might be a concern. > > ------------------------------------------ > Execution interleaving > > Thread 1 Thread 2 > > __fanout_unlink() fanout_demux_rollover() > spin_lock(&f->lock); > po = pkt_sk(f->arr[idx]); > // po is a out-of-date value > f->arr[i] = f->arr[f->num_members - 1]; > spin_unlock(&f->lock); > > > > Thanks, > Sishuai CC'ing more people. ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: A data race between fanout_demux_rollover() and __fanout_unlink() 2021-04-13 22:55 ` Xie He @ 2021-04-13 23:27 ` Willem de Bruijn 2021-04-14 16:52 ` Eric Dumazet 0 siblings, 1 reply; 5+ messages in thread From: Willem de Bruijn @ 2021-04-13 23:27 UTC (permalink / raw) To: Xie He Cc: eyal.birger@gmail.com, yonatanlinik@gmail.com, netdev@vger.kernel.org, Gong, Sishuai, David S. Miller, Jakub Kicinski On Tue, Apr 13, 2021 at 6:55 PM Xie He <xie.he.0141@gmail.com> wrote: > > On Tue, Apr 13, 2021 at 1:51 PM Gong, Sishuai <sishuai@purdue.edu> wrote: > > > > Hi, > > > > We found a data race in linux-5.12-rc3 between af_packet.c functions fanout_demux_rollover() and __fanout_unlink() and we are able to reproduce it under x86. > > > > When the two functions are running together, __fanout_unlink() will grab a lock and modify some attribute of packet_fanout variable, but fanout_demux_rollover() may or may not see this update depending on different interleavings, as shown in below. > > > > Currently, we didn’t find any explicit errors due to this data race. But in fanout_demux_rollover(), we noticed that the data-racing variable is involved in the later operation, which might be a concern. > > > > ------------------------------------------ > > Execution interleaving > > > > Thread 1 Thread 2 > > > > __fanout_unlink() fanout_demux_rollover() > > spin_lock(&f->lock); > > po = pkt_sk(f->arr[idx]); > > // po is a out-of-date value > > f->arr[i] = f->arr[f->num_members - 1]; > > spin_unlock(&f->lock); > > > > > > > > Thanks, > > Sishuai > > CC'ing more people. __fanout_unlink removes a socket from the fanout group, but ensures that the socket is not destroyed until after no datapath can refer to it anymore, through a call to synchronize_net. ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: A data race between fanout_demux_rollover() and __fanout_unlink() 2021-04-13 23:27 ` Willem de Bruijn @ 2021-04-14 16:52 ` Eric Dumazet 2021-04-14 17:52 ` Eric Dumazet 0 siblings, 1 reply; 5+ messages in thread From: Eric Dumazet @ 2021-04-14 16:52 UTC (permalink / raw) To: Willem de Bruijn, Xie He Cc: eyal.birger@gmail.com, yonatanlinik@gmail.com, netdev@vger.kernel.org, Gong, Sishuai, David S. Miller, Jakub Kicinski On 4/14/21 1:27 AM, Willem de Bruijn wrote: > On Tue, Apr 13, 2021 at 6:55 PM Xie He <xie.he.0141@gmail.com> wrote: >> >> On Tue, Apr 13, 2021 at 1:51 PM Gong, Sishuai <sishuai@purdue.edu> wrote: >>> >>> Hi, >>> >>> We found a data race in linux-5.12-rc3 between af_packet.c functions fanout_demux_rollover() and __fanout_unlink() and we are able to reproduce it under x86. >>> >>> When the two functions are running together, __fanout_unlink() will grab a lock and modify some attribute of packet_fanout variable, but fanout_demux_rollover() may or may not see this update depending on different interleavings, as shown in below. >>> >>> Currently, we didn’t find any explicit errors due to this data race. But in fanout_demux_rollover(), we noticed that the data-racing variable is involved in the later operation, which might be a concern. >>> >>> ------------------------------------------ >>> Execution interleaving >>> >>> Thread 1 Thread 2 >>> >>> __fanout_unlink() fanout_demux_rollover() >>> spin_lock(&f->lock); >>> po = pkt_sk(f->arr[idx]); >>> // po is a out-of-date value >>> f->arr[i] = f->arr[f->num_members - 1]; >>> spin_unlock(&f->lock); >>> >>> >>> >>> Thanks, >>> Sishuai >> >> CC'ing more people. > > __fanout_unlink removes a socket from the fanout group, but ensures > that the socket is not destroyed until after no datapath can refer to > it anymore, through a call to synchronize_net. > Right, but there is a data race. Compiler might implement f->arr[i] = f->arr[f->num_members - 1]; (And po = pkt_sk(f->arr[idx]); Using one-byte-at-a-time load/stores, yes crazy, but oh well. We should use READ_ONCE()/WRITE_ONCE() at very minimum, and rcu_dereference()/rcu_assign_pointer() since we clearly rely on standard RCU rules. ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: A data race between fanout_demux_rollover() and __fanout_unlink() 2021-04-14 16:52 ` Eric Dumazet @ 2021-04-14 17:52 ` Eric Dumazet 0 siblings, 0 replies; 5+ messages in thread From: Eric Dumazet @ 2021-04-14 17:52 UTC (permalink / raw) To: Willem de Bruijn, Xie He Cc: eyal.birger@gmail.com, yonatanlinik@gmail.com, netdev@vger.kernel.org, Gong, Sishuai, David S. Miller, Jakub Kicinski On 4/14/21 6:52 PM, Eric Dumazet wrote: > > > On 4/14/21 1:27 AM, Willem de Bruijn wrote: >> On Tue, Apr 13, 2021 at 6:55 PM Xie He <xie.he.0141@gmail.com> wrote: >>> >>> On Tue, Apr 13, 2021 at 1:51 PM Gong, Sishuai <sishuai@purdue.edu> wrote: >>>> >>>> Hi, >>>> >>>> We found a data race in linux-5.12-rc3 between af_packet.c functions fanout_demux_rollover() and __fanout_unlink() and we are able to reproduce it under x86. >>>> >>>> When the two functions are running together, __fanout_unlink() will grab a lock and modify some attribute of packet_fanout variable, but fanout_demux_rollover() may or may not see this update depending on different interleavings, as shown in below. >>>> >>>> Currently, we didn’t find any explicit errors due to this data race. But in fanout_demux_rollover(), we noticed that the data-racing variable is involved in the later operation, which might be a concern. >>>> >>>> ------------------------------------------ >>>> Execution interleaving >>>> >>>> Thread 1 Thread 2 >>>> >>>> __fanout_unlink() fanout_demux_rollover() >>>> spin_lock(&f->lock); >>>> po = pkt_sk(f->arr[idx]); >>>> // po is a out-of-date value >>>> f->arr[i] = f->arr[f->num_members - 1]; >>>> spin_unlock(&f->lock); >>>> >>>> >>>> >>>> Thanks, >>>> Sishuai >>> >>> CC'ing more people. >> >> __fanout_unlink removes a socket from the fanout group, but ensures >> that the socket is not destroyed until after no datapath can refer to >> it anymore, through a call to synchronize_net. >> > > Right, but there is a data race. > > Compiler might implement > > f->arr[i] = f->arr[f->num_members - 1]; > > (And po = pkt_sk(f->arr[idx]); > > Using one-byte-at-a-time load/stores, yes crazy, but oh well. > > We should use READ_ONCE()/WRITE_ONCE() at very minimum, > and rcu_dereference()/rcu_assign_pointer() since we clearly rely on standard RCU rules. > > > I will test something like : diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c index 118d585337d72f10cd31ec5ca7c55b508fc18baf..ba96db1880eae89febf77ba6ff943b054cd268d7 100644 --- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -1359,7 +1359,7 @@ static unsigned int fanout_demux_rollover(struct packet_fanout *f, struct packet_sock *po, *po_next, *po_skip = NULL; unsigned int i, j, room = ROOM_NONE; - po = pkt_sk(f->arr[idx]); + po = pkt_sk(rcu_dereference(f->arr[idx])); if (try_self) { room = packet_rcv_has_room(po, skb); @@ -1371,7 +1371,7 @@ static unsigned int fanout_demux_rollover(struct packet_fanout *f, i = j = min_t(int, po->rollover->sock, num - 1); do { - po_next = pkt_sk(f->arr[i]); + po_next = pkt_sk(rcu_dereference(f->arr[i])); if (po_next != po_skip && !READ_ONCE(po_next->pressure) && packet_rcv_has_room(po_next, skb) == ROOM_NORMAL) { if (i != j) @@ -1466,7 +1466,7 @@ static int packet_rcv_fanout(struct sk_buff *skb, struct net_device *dev, if (fanout_has_flag(f, PACKET_FANOUT_FLAG_ROLLOVER)) idx = fanout_demux_rollover(f, skb, idx, true, num); - po = pkt_sk(f->arr[idx]); + po = pkt_sk(rcu_dereference(f->arr[idx])); return po->prot_hook.func(skb, dev, &po->prot_hook, orig_dev); } @@ -1480,7 +1480,7 @@ static void __fanout_link(struct sock *sk, struct packet_sock *po) struct packet_fanout *f = po->fanout; spin_lock(&f->lock); - f->arr[f->num_members] = sk; + rcu_assign_pointer(f->arr[f->num_members], sk); smp_wmb(); f->num_members++; if (f->num_members == 1) @@ -1495,11 +1495,14 @@ static void __fanout_unlink(struct sock *sk, struct packet_sock *po) spin_lock(&f->lock); for (i = 0; i < f->num_members; i++) { - if (f->arr[i] == sk) + if (rcu_dereference_protected(f->arr[i], + lockdep_is_held(&f->lock)) == sk) break; } BUG_ON(i >= f->num_members); - f->arr[i] = f->arr[f->num_members - 1]; + rcu_assign_pointer(f->arr[i], + rcu_dereference_protected(f->arr[f->num_members - 1], + lockdep_is_held(&f->lock))); f->num_members--; if (f->num_members == 0) __dev_remove_pack(&f->prot_hook); diff --git a/net/packet/internal.h b/net/packet/internal.h index 5f61e59ebbffaa25a8fdfe31f79211fe6a755c51..48af35b1aed2565267c0288e013e23ff51f2fcac 100644 --- a/net/packet/internal.h +++ b/net/packet/internal.h @@ -94,7 +94,7 @@ struct packet_fanout { spinlock_t lock; refcount_t sk_ref; struct packet_type prot_hook ____cacheline_aligned_in_smp; - struct sock *arr[]; + struct sock __rcu *arr[]; }; struct packet_rollover { ^ permalink raw reply related [flat|nested] 5+ messages in thread
end of thread, other threads:[~2021-04-14 17:52 UTC | newest] Thread overview: 5+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2021-04-13 20:51 A data race between fanout_demux_rollover() and __fanout_unlink() Gong, Sishuai 2021-04-13 22:55 ` Xie He 2021-04-13 23:27 ` Willem de Bruijn 2021-04-14 16:52 ` Eric Dumazet 2021-04-14 17:52 ` Eric Dumazet
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).