* [PATCH net 1/2] tun: hold napi_mutex for all napi operations
@ 2019-01-07 20:02 Stanislav Fomichev
2019-01-07 20:02 ` [PATCH net 2/2] tun: always set skb->dev to tun->dev Stanislav Fomichev
2019-01-07 20:22 ` [PATCH net 1/2] tun: hold napi_mutex for all napi operations Eric Dumazet
0 siblings, 2 replies; 6+ messages in thread
From: Stanislav Fomichev @ 2019-01-07 20:02 UTC (permalink / raw)
To: netdev; +Cc: davem, jasowang, brouer, mst, edumazet, Stanislav Fomichev,
syzbot
BUG: unable to handle kernel NULL pointer dereference at 00000000000000d1
Call Trace:
? napi_gro_frags+0xa7/0x2c0
tun_get_user+0xb50/0xf20
tun_chr_write_iter+0x53/0x70
new_sync_write+0xff/0x160
vfs_write+0x191/0x1e0
__x64_sys_write+0x5e/0xd0
do_syscall_64+0x47/0xf0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
I think there is a subtle race between sending a packet via tap and
attaching it:
CPU0:                                  CPU1:
tun_chr_ioctl(TUNSETIFF)
  tun_set_iff
    tun_attach
      rcu_assign_pointer(tfile->tun, tun);
                                       tun_fops->write_iter()
                                         tun_chr_write_iter
                                           tun_napi_alloc_frags
                                             napi_get_frags
                                               napi->skb = napi_alloc_skb
      tun_napi_init
        netif_napi_add
          napi->skb = NULL
                                           napi->skb is NULL here
                                           napi_gro_frags
                                             napi_frags_skb
                                               skb = napi->skb
                                               skb_reset_mac_header(skb)
                                               panic()
To fix, do the following:
* Move rcu_assign_pointer(tfile->tun, tun) to be the last thing we do
in tun_attach(); this should guarantee that when we call tun_get()
we always get an initialized object
* As another safeguard, always grab napi_mutex whenever doing any
napi operation; this should prevent napi state change between
calls to napi_get_frags and napi_gro_frags
Reported-by: syzbot <syzkaller@googlegroups.com>
Fixes: 90e33d459407 ("tun: enable napi_gro_frags() for TUN/TAP driver")
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
drivers/net/tun.c | 18 +++++++++++++++---
1 file changed, 15 insertions(+), 3 deletions(-)
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index a4fdad475594..7875f06011f2 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -323,22 +323,30 @@ static void tun_napi_init(struct tun_struct *tun, struct tun_file *tfile,
tfile->napi_enabled = napi_en;
tfile->napi_frags_enabled = napi_en && napi_frags;
if (napi_en) {
+ mutex_lock(&tfile->napi_mutex);
netif_napi_add(tun->dev, &tfile->napi, tun_napi_poll,
NAPI_POLL_WEIGHT);
napi_enable(&tfile->napi);
+ mutex_unlock(&tfile->napi_mutex);
}
}
static void tun_napi_disable(struct tun_file *tfile)
{
- if (tfile->napi_enabled)
+ if (tfile->napi_enabled) {
+ mutex_lock(&tfile->napi_mutex);
napi_disable(&tfile->napi);
+ mutex_unlock(&tfile->napi_mutex);
+ }
}
static void tun_napi_del(struct tun_file *tfile)
{
- if (tfile->napi_enabled)
+ if (tfile->napi_enabled) {
+ mutex_lock(&tfile->napi_mutex);
netif_napi_del(&tfile->napi);
+ mutex_unlock(&tfile->napi_mutex);
+ }
}
static bool tun_napi_frags_enabled(const struct tun_file *tfile)
@@ -856,7 +864,6 @@ static int tun_attach(struct tun_struct *tun, struct file *file,
err = 0;
}
- rcu_assign_pointer(tfile->tun, tun);
rcu_assign_pointer(tun->tfiles[tun->numqueues], tfile);
tun->numqueues++;
@@ -876,6 +883,11 @@ static int tun_attach(struct tun_struct *tun, struct file *file,
* refcnt.
*/
+ /* All tun_fops depend on tun_get() returning non-null pointer.
+ * Thus, assigning tun to a tfile should be the last init operation,
+ * otherwise we risk using half-initialized object.
+ */
+ rcu_assign_pointer(tfile->tun, tun);
out:
return err;
}
--
2.20.1.97.g81188d93c3-goog
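The "publish last" idea from the commit message above can be sketched in userspace C. This is only an illustration under stated assumptions: the model_* types and functions are hypothetical stand-ins, and C11 release/acquire atomics are used as a rough analogue of rcu_assign_pointer() and the read in tun_get(), not as the kernel's actual primitives.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of these are kernel types. */
struct model_tun {
	int ifindex;
};

struct model_tfile {
	int napi_ready;                   /* set by the init step */
	_Atomic(struct model_tun *) tun;  /* the published pointer */
};

/* Attach: finish all initialisation first, then publish the pointer with
 * release ordering (assumed here as an analogue of rcu_assign_pointer()).
 */
static void model_attach(struct model_tfile *tfile, struct model_tun *tun)
{
	assert(tun != NULL);
	tfile->napi_ready = 1;            /* stands in for tun_napi_init() */
	atomic_store_explicit(&tfile->tun, tun, memory_order_release);
}

/* Writer path: acquire-load the pointer (assumed analogue of tun_get()).
 * Returns -1 if nothing is attached yet; otherwise returns napi_ready,
 * which is 1 whenever the pointer is visible, because the store to
 * napi_ready is ordered before the release store above.
 */
static int model_write(struct model_tfile *tfile)
{
	struct model_tun *tun =
		atomic_load_explicit(&tfile->tun, memory_order_acquire);

	if (!tun)
		return -1;
	return tfile->napi_ready;
}
```

With this ordering a concurrent writer either sees no device at all and bails out, or sees a fully initialised tfile; it cannot observe the pointer before the napi state is set up.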
* [PATCH net 2/2] tun: always set skb->dev to tun->dev
2019-01-07 20:02 [PATCH net 1/2] tun: hold napi_mutex for all napi operations Stanislav Fomichev
@ 2019-01-07 20:02 ` Stanislav Fomichev
2019-01-07 20:22 ` [PATCH net 1/2] tun: hold napi_mutex for all napi operations Eric Dumazet
1 sibling, 0 replies; 6+ messages in thread
From: Stanislav Fomichev @ 2019-01-07 20:02 UTC (permalink / raw)
To: netdev; +Cc: davem, jasowang, brouer, mst, edumazet, Stanislav Fomichev,
syzbot
While debugging the previous issue I noticed that commit 90e33d459407
("tun: enable napi_gro_frags() for TUN/TAP driver") started calling
eth_type_trans(skb, tun->dev) only conditionally (!frags) in the IFF_TAP
case. Since eth_type_trans sets skb->dev, some skbs can now have a NULL
skb->dev. Fix that by always setting skb->dev.
The syzbot run fails with the following trace:
WARNING: CPU: 0 PID: 11136 at net/core/flow_dissector.c:764
skb_flow_dissect_flow_keys_basic include/linux/skbuff.h:1240 [inline]
skb_probe_transport_header include/linux/skbuff.h:2403 [inline]
tun_get_user+0x2d4a/0x4250 drivers/net/tun.c:1906
tun_chr_write_iter+0xb9/0x160 drivers/net/tun.c:1993
call_write_iter include/linux/fs.h:1808 [inline]
new_sync_write fs/read_write.c:474 [inline]
But I don't think there is an actual issue, since we exercise the flow
dissector via eth_get_headlen, which doesn't use the skb (and hence the
BPF flow dissector). Let's still set skb->dev properly so we don't have
any problems going forward.
Reported-by: syzbot <syzkaller@googlegroups.com>
Fixes: 90e33d459407 ("tun: enable napi_gro_frags() for TUN/TAP driver")
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
drivers/net/tun.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 7875f06011f2..af34baf978f3 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1899,6 +1899,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
return -EINVAL;
}
+ skb->dev = tun->dev;
switch (tun->flags & TUN_TYPE_MASK) {
case IFF_TUN:
if (tun->flags & IFF_NO_PI) {
@@ -1920,7 +1921,6 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
skb_reset_mac_header(skb);
skb->protocol = pi.proto;
- skb->dev = tun->dev;
break;
case IFF_TAP:
if (!frags)
--
2.20.1.97.g81188d93c3-goog
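The effect of moving the skb->dev assignment ahead of the switch can be sketched with a small userspace model. Everything here is hypothetical (the model_* names, the two-case enum); it only mirrors the control flow described in the patch above, not the real tun_get_user().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures. */
struct model_dev {
	int ifindex;
};

struct model_skb {
	struct model_dev *dev;
};

enum model_type { MODEL_TUN, MODEL_TAP };

/* Stand-in for eth_type_trans(): as a side effect it sets skb->dev. */
static void model_eth_type_trans(struct model_skb *skb, struct model_dev *dev)
{
	skb->dev = dev;
}

/* Before the patch: the TAP path with frags never assigns skb->dev. */
static void model_rx_before(struct model_skb *skb, struct model_dev *dev,
			    enum model_type type, int frags)
{
	switch (type) {
	case MODEL_TUN:
		skb->dev = dev;
		break;
	case MODEL_TAP:
		if (!frags)
			model_eth_type_trans(skb, dev);
		break;
	}
}

/* After the patch: assign once, ahead of the switch, so every path
 * leaves with skb->dev set.
 */
static void model_rx_after(struct model_skb *skb, struct model_dev *dev,
			   enum model_type type, int frags)
{
	assert(dev != NULL);
	skb->dev = dev;
	switch (type) {
	case MODEL_TUN:
		break;
	case MODEL_TAP:
		if (!frags)
			model_eth_type_trans(skb, dev);
		break;
	}
}
```

In this model, the TAP case with frags enabled leaves skb->dev NULL in the "before" variant and set in the "after" variant.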
* Re: [PATCH net 1/2] tun: hold napi_mutex for all napi operations 2019-01-07 20:02 [PATCH net 1/2] tun: hold napi_mutex for all napi operations Stanislav Fomichev 2019-01-07 20:02 ` [PATCH net 2/2] tun: always set skb->dev to tun->dev Stanislav Fomichev @ 2019-01-07 20:22 ` Eric Dumazet 2019-01-07 21:02 ` Stanislav Fomichev 1 sibling, 1 reply; 6+ messages in thread From: Eric Dumazet @ 2019-01-07 20:22 UTC (permalink / raw) To: Stanislav Fomichev Cc: netdev, David Miller, Jason Wang, Jesper Dangaard Brouer, Michael S. Tsirkin, syzbot On Mon, Jan 7, 2019 at 12:02 PM Stanislav Fomichev <sdf@google.com> wrote: > > BUG: unable to handle kernel NULL pointer dereference at 00000000000000d1 > Call Trace: > ? napi_gro_frags+0xa7/0x2c0 > tun_get_user+0xb50/0xf20 > tun_chr_write_iter+0x53/0x70 > new_sync_write+0xff/0x160 > vfs_write+0x191/0x1e0 > __x64_sys_write+0x5e/0xd0 > do_syscall_64+0x47/0xf0 > entry_SYSCALL_64_after_hwframe+0x44/0xa9 > > I think there is a subtle race between sending a packet via tap and > attaching it: > > CPU0: CPU1: > tun_chr_ioctl(TUNSETIFF) > tun_set_iff > tun_attach > rcu_assign_pointer(tfile->tun, tun); > tun_fops->write_iter() > tun_chr_write_iter > tun_napi_alloc_frags > napi_get_frags > napi->skb = napi_alloc_skb > tun_napi_init > netif_napi_add > napi->skb = NULL > napi->skb is NULL here > napi_gro_frags > napi_frags_skb > skb = napi->skb > skb_reset_mac_header(skb) > panic() > > To fix, do the following: > * Move rcu_assign_pointer(tfile->tun, tun) to be the last thing we do > in tun_attach(); this should guarantee that when we call tun_get() > we always get an initialized object > * As another safeguard, always grab napi_mutex whenever doing any > napi operation; this should prevent napi state change between > calls to napi_get_frags and napi_gro_frags > > Reported-by: syzbot <syzkaller@googlegroups.com> > Fixes: 90e33d459407 ("tun: enable napi_gro_frags() for TUN/TAP driver") > > Signed-off-by: Stanislav Fomichev <sdf@google.com> > --- > 
drivers/net/tun.c | 18 +++++++++++++++--- > 1 file changed, 15 insertions(+), 3 deletions(-) > > diff --git a/drivers/net/tun.c b/drivers/net/tun.c > index a4fdad475594..7875f06011f2 100644 > --- a/drivers/net/tun.c > +++ b/drivers/net/tun.c > @@ -323,22 +323,30 @@ static void tun_napi_init(struct tun_struct *tun, struct tun_file *tfile, > tfile->napi_enabled = napi_en; > tfile->napi_frags_enabled = napi_en && napi_frags; > if (napi_en) { > + mutex_lock(&tfile->napi_mutex); > netif_napi_add(tun->dev, &tfile->napi, tun_napi_poll, > NAPI_POLL_WEIGHT); > napi_enable(&tfile->napi); > + mutex_unlock(&tfile->napi_mutex); > } > } > > static void tun_napi_disable(struct tun_file *tfile) > { > - if (tfile->napi_enabled) > + if (tfile->napi_enabled) { > + mutex_lock(&tfile->napi_mutex); > napi_disable(&tfile->napi); > + mutex_unlock(&tfile->napi_mutex); > + } > } > > static void tun_napi_del(struct tun_file *tfile) > { > - if (tfile->napi_enabled) > + if (tfile->napi_enabled) { > + mutex_lock(&tfile->napi_mutex); > netif_napi_del(&tfile->napi); > + mutex_unlock(&tfile->napi_mutex); > + } > } > > static bool tun_napi_frags_enabled(const struct tun_file *tfile) > @@ -856,7 +864,6 @@ static int tun_attach(struct tun_struct *tun, struct file *file, > err = 0; > } > > - rcu_assign_pointer(tfile->tun, tun); > rcu_assign_pointer(tun->tfiles[tun->numqueues], tfile); > tun->numqueues++; > > @@ -876,6 +883,11 @@ static int tun_attach(struct tun_struct *tun, struct file *file, > * refcnt. > */ > > + /* All tun_fops depend on tun_get() returning non-null pointer. > + * Thus, assigning tun to a tfile should be the last init operation, > + * otherwise we risk using half-initialized object. 
> + */
> + rcu_assign_pointer(tfile->tun, tun);
> out:
> return err;
> }
Hmmm, I believe the issue is different: we need to call
tun_napi_init() before doing the publish in the tun->tfiles[] array.
My patch was:
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index a4fdad47559462fbd049a89f880cc3fb33d1151d..dc751d1cbc21a2e2687c5739b44322cd64d0cb46 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -857,15 +857,15 @@ static int tun_attach(struct tun_struct *tun, struct file *file,
}
rcu_assign_pointer(tfile->tun, tun);
+ if (!tfile->detached) {
+ tun_napi_init(tun, tfile, napi, napi_frags);
+ sock_hold(&tfile->sk);
+ }
rcu_assign_pointer(tun->tfiles[tun->numqueues], tfile);
tun->numqueues++;
- if (tfile->detached) {
+ if (tfile->detached)
tun_enable_queue(tfile);
- } else {
- sock_hold(&tfile->sk);
- tun_napi_init(tun, tfile, napi, napi_frags);
- }
if (rtnl_dereference(tun->xdp_prog))
sock_set_flag(&tfile->sk, SOCK_XDP);
* Re: [PATCH net 1/2] tun: hold napi_mutex for all napi operations 2019-01-07 20:22 ` [PATCH net 1/2] tun: hold napi_mutex for all napi operations Eric Dumazet @ 2019-01-07 21:02 ` Stanislav Fomichev 2019-01-07 21:10 ` Eric Dumazet 0 siblings, 1 reply; 6+ messages in thread From: Stanislav Fomichev @ 2019-01-07 21:02 UTC (permalink / raw) To: Eric Dumazet Cc: Stanislav Fomichev, netdev, David Miller, Jason Wang, Jesper Dangaard Brouer, Michael S. Tsirkin, syzbot On 01/07, Eric Dumazet wrote: > On Mon, Jan 7, 2019 at 12:02 PM Stanislav Fomichev <sdf@google.com> wrote: > > > > BUG: unable to handle kernel NULL pointer dereference at 00000000000000d1 > > Call Trace: > > ? napi_gro_frags+0xa7/0x2c0 > > tun_get_user+0xb50/0xf20 > > tun_chr_write_iter+0x53/0x70 > > new_sync_write+0xff/0x160 > > vfs_write+0x191/0x1e0 > > __x64_sys_write+0x5e/0xd0 > > do_syscall_64+0x47/0xf0 > > entry_SYSCALL_64_after_hwframe+0x44/0xa9 > > > > I think there is a subtle race between sending a packet via tap and > > attaching it: > > > > CPU0: CPU1: > > tun_chr_ioctl(TUNSETIFF) > > tun_set_iff > > tun_attach > > rcu_assign_pointer(tfile->tun, tun); > > tun_fops->write_iter() > > tun_chr_write_iter > > tun_napi_alloc_frags > > napi_get_frags > > napi->skb = napi_alloc_skb > > tun_napi_init > > netif_napi_add > > napi->skb = NULL > > napi->skb is NULL here > > napi_gro_frags > > napi_frags_skb > > skb = napi->skb > > skb_reset_mac_header(skb) > > panic() > > > > To fix, do the following: > > * Move rcu_assign_pointer(tfile->tun, tun) to be the last thing we do > > in tun_attach(); this should guarantee that when we call tun_get() > > we always get an initialized object > > * As another safeguard, always grab napi_mutex whenever doing any > > napi operation; this should prevent napi state change between > > calls to napi_get_frags and napi_gro_frags > > > > Reported-by: syzbot <syzkaller@googlegroups.com> > > Fixes: 90e33d459407 ("tun: enable napi_gro_frags() for TUN/TAP driver") > > > > 
Signed-off-by: Stanislav Fomichev <sdf@google.com> > > --- > > drivers/net/tun.c | 18 +++++++++++++++--- > > 1 file changed, 15 insertions(+), 3 deletions(-) > > > > diff --git a/drivers/net/tun.c b/drivers/net/tun.c > > index a4fdad475594..7875f06011f2 100644 > > --- a/drivers/net/tun.c > > +++ b/drivers/net/tun.c > > @@ -323,22 +323,30 @@ static void tun_napi_init(struct tun_struct *tun, struct tun_file *tfile, > > tfile->napi_enabled = napi_en; > > tfile->napi_frags_enabled = napi_en && napi_frags; > > if (napi_en) { > > + mutex_lock(&tfile->napi_mutex); > > netif_napi_add(tun->dev, &tfile->napi, tun_napi_poll, > > NAPI_POLL_WEIGHT); > > napi_enable(&tfile->napi); > > + mutex_unlock(&tfile->napi_mutex); > > } > > } > > > > static void tun_napi_disable(struct tun_file *tfile) > > { > > - if (tfile->napi_enabled) > > + if (tfile->napi_enabled) { > > + mutex_lock(&tfile->napi_mutex); > > napi_disable(&tfile->napi); > > + mutex_unlock(&tfile->napi_mutex); > > + } > > } > > > > static void tun_napi_del(struct tun_file *tfile) > > { > > - if (tfile->napi_enabled) > > + if (tfile->napi_enabled) { > > + mutex_lock(&tfile->napi_mutex); > > netif_napi_del(&tfile->napi); > > + mutex_unlock(&tfile->napi_mutex); > > + } > > } > > > > static bool tun_napi_frags_enabled(const struct tun_file *tfile) > > @@ -856,7 +864,6 @@ static int tun_attach(struct tun_struct *tun, struct file *file, > > err = 0; > > } > > > > - rcu_assign_pointer(tfile->tun, tun); > > rcu_assign_pointer(tun->tfiles[tun->numqueues], tfile); > > tun->numqueues++; > > > > @@ -876,6 +883,11 @@ static int tun_attach(struct tun_struct *tun, struct file *file, > > * refcnt. > > */ > > > > + /* All tun_fops depend on tun_get() returning non-null pointer. > > + * Thus, assigning tun to a tfile should be the last init operation, > > + * otherwise we risk using half-initialized object. 
> > + */
> > + rcu_assign_pointer(tfile->tun, tun);
> > out:
> > return err;
> > }
>
> Hmmm I believe the issue is different : We need to call
> tun_napi_init() before doing the publish in the tun->tfiles[] array
>
> My patch was :
Still fails with your patch.
Maybe the best way is to move both of those publishes (tfile->tun and
tun->tfiles[]) to the end of tun_attach? It looks like tfile->tun
is the publish for the syscall side (most of tun_socket_ops and tun_fops
call tun_get, which looks at tun->dev and bails out if it's NULL) and
tun->tfiles is (mostly, but not entirely) for the napi side.
Here is a repro I'm using if you want to poke at it:
syz-execprog -threaded -collide -repeat=0 -procs=6 <rep.syz>
# See https://goo.gl/kgGztJ for information about syzkaller reproducers.
#{"threaded":true,"collide":true,"repeat":true,"procs":6,"sandbox":"none","fault_call":-1,"tun":true,"tmpdir":true,"cgroups":true,"netdev":true,"resetnet":true,"segv":true}
openat$tun(0xffffffffffffff9c, 0x0, 0x0, 0x0)
r0 = openat$tun(0xffffffffffffff9c, &(0x7f0000001a00)='/dev/net/tun\x00', 0x0, 0x0)
ioctl$TUNSETIFF(r0, 0x400454ca, &(0x7f0000000300)={"6e72300100", 0x1132})
r1 = socket$kcm(0x2, 0x3, 0x2)
ioctl$PERF_EVENT_IOC_SET_FILTER(r1, 0x8914,
&(0x7f0000000780)="6e7230010060a19ef9d2c673d9a1571cb9e1369bcd61ef7e49793ae18712eceb1daa769497800b7fbbd35b170c10751d39aeb660d863e49b8c4f3b3dad48902b5b2d6cfd0abd372c63bcf5d70df3fd4d2e8d443c88bc0e5637dd82fc3435bed4de5d693c9a781c863e05d8a6f8689a5be29216061f3ff53f8b6b396678e7ba155ef9152d7e43b1eccb2331eb8eb1ed5586dcf8b3b0b999361a44ff2c22c2abbef42dd24eabe6723346a6e46c0499a21442d8d00dcb57f013ff7595edd0ff076930de3675d34117a44eb0e4f832936da44e57e43a3e36bd48d2a85bf4fd4a804e83f2f3cf378a435af5e287d4e27337b4ada11b26219832ec6b2b38446b3b95fe3771e9f42ca30fb21e12f0a3d8bc2d85454af9fcc0232d8fd909448b01f46c593d31ea1c926465e35a4199079c3ca41128b17cb01fbf5b522be0fd02022ada37fecc14b6c8c8831883b85a1106f2f867020d529f17a350f20dd3bf51a98cfda70c2e3638a483fd3f87940bb478b07c4c110394c0093d17955089f2ca97bbe075124c9b1ff6500 d536a95d96f03d48596e008bf0a028b539cec796cec9bf585eb80fe3e0d26") perf_event_open$cgroup(&(0x7f0000000440)={0x7, 0x70, 0x1, 0x0, 0x6e36, 0x38000000, 0x0, 0x1, 0x21060, 0x0, 0x0, 0x0, 0x20, 0x3, 0xfc50, 0x9, 0x8000, 0x0, 0x0, 0x0, 0x46c8, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x8, 0x3f, 0x401, 0x8001, 0x8, 0x9, 0x0, 0x0, 0x1, 0x1, @perf_config_ext={0x100000000, 0x6}, 0x10, 0x5, 0x2, 0x7, 0xfffffffffffffff9, 0x0, 0x7}, 0xffffffffffffffff, 0x5, 0xffffffffffffffff, 0x0) # 0x2 = O_RDWR r2 = openat$tun(0xffffffffffffff9c, &(0x7f0000001a00)='/dev/net/tun\x00', 0x2, 0x0) # IFF_TAP 0x0002 # IFF_NAPI 0x0010 # IFF_NAPI_FRAGS 0x0020 # IFF_MULTI_QUEUE 0x0100 # IFF_MULTICAST = 0x1000 ioctl$TUNSETIFF(r2, 0x400454ca, &(0x7f0000000300)={"6e72300100", 0x1132}) write$cgroup_int(r2, &(0x7f0000000000), 0x17b) > > diff --git a/drivers/net/tun.c b/drivers/net/tun.c > index a4fdad47559462fbd049a89f880cc3fb33d1151d..dc751d1cbc21a2e2687c5739b44322cd64d0cb46 > 100644 > --- a/drivers/net/tun.c > +++ b/drivers/net/tun.c > @@ -857,15 +857,15 @@ static int tun_attach(struct tun_struct *tun, > struct file *file, > } > > rcu_assign_pointer(tfile->tun, tun); > + if (!tfile->detached) { > + 
tun_napi_init(tun, tfile, napi, napi_frags);
> + sock_hold(&tfile->sk);
> + }
> rcu_assign_pointer(tun->tfiles[tun->numqueues], tfile);
> tun->numqueues++;
>
> - if (tfile->detached) {
> + if (tfile->detached)
> tun_enable_queue(tfile);
> - } else {
> - sock_hold(&tfile->sk);
> - tun_napi_init(tun, tfile, napi, napi_frags);
> - }
>
> if (rtnl_dereference(tun->xdp_prog))
> sock_set_flag(&tfile->sk, SOCK_XDP);
* Re: [PATCH net 1/2] tun: hold napi_mutex for all napi operations 2019-01-07 21:02 ` Stanislav Fomichev @ 2019-01-07 21:10 ` Eric Dumazet 2019-01-07 21:29 ` Stanislav Fomichev 0 siblings, 1 reply; 6+ messages in thread From: Eric Dumazet @ 2019-01-07 21:10 UTC (permalink / raw) To: Stanislav Fomichev, Eric Dumazet Cc: Stanislav Fomichev, netdev, David Miller, Jason Wang, Jesper Dangaard Brouer, Michael S. Tsirkin, syzbot On 01/07/2019 01:02 PM, Stanislav Fomichev wrote: > On 01/07, Eric Dumazet wrote: >> On Mon, Jan 7, 2019 at 12:02 PM Stanislav Fomichev <sdf@google.com> wrote: >>> >>> BUG: unable to handle kernel NULL pointer dereference at 00000000000000d1 >>> Call Trace: >>> ? napi_gro_frags+0xa7/0x2c0 >>> tun_get_user+0xb50/0xf20 >>> tun_chr_write_iter+0x53/0x70 >>> new_sync_write+0xff/0x160 >>> vfs_write+0x191/0x1e0 >>> __x64_sys_write+0x5e/0xd0 >>> do_syscall_64+0x47/0xf0 >>> entry_SYSCALL_64_after_hwframe+0x44/0xa9 >>> >>> I think there is a subtle race between sending a packet via tap and >>> attaching it: >>> >>> CPU0: CPU1: >>> tun_chr_ioctl(TUNSETIFF) >>> tun_set_iff >>> tun_attach >>> rcu_assign_pointer(tfile->tun, tun); >>> tun_fops->write_iter() >>> tun_chr_write_iter >>> tun_napi_alloc_frags >>> napi_get_frags >>> napi->skb = napi_alloc_skb >>> tun_napi_init >>> netif_napi_add >>> napi->skb = NULL >>> napi->skb is NULL here >>> napi_gro_frags >>> napi_frags_skb >>> skb = napi->skb >>> skb_reset_mac_header(skb) >>> panic() >>> >>> To fix, do the following: >>> * Move rcu_assign_pointer(tfile->tun, tun) to be the last thing we do >>> in tun_attach(); this should guarantee that when we call tun_get() >>> we always get an initialized object >>> * As another safeguard, always grab napi_mutex whenever doing any >>> napi operation; this should prevent napi state change between >>> calls to napi_get_frags and napi_gro_frags >>> >>> Reported-by: syzbot <syzkaller@googlegroups.com> >>> Fixes: 90e33d459407 ("tun: enable napi_gro_frags() for TUN/TAP driver") 
>>> >>> Signed-off-by: Stanislav Fomichev <sdf@google.com> >>> --- >>> drivers/net/tun.c | 18 +++++++++++++++--- >>> 1 file changed, 15 insertions(+), 3 deletions(-) >>> >>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c >>> index a4fdad475594..7875f06011f2 100644 >>> --- a/drivers/net/tun.c >>> +++ b/drivers/net/tun.c >>> @@ -323,22 +323,30 @@ static void tun_napi_init(struct tun_struct *tun, struct tun_file *tfile, >>> tfile->napi_enabled = napi_en; >>> tfile->napi_frags_enabled = napi_en && napi_frags; >>> if (napi_en) { >>> + mutex_lock(&tfile->napi_mutex); >>> netif_napi_add(tun->dev, &tfile->napi, tun_napi_poll, >>> NAPI_POLL_WEIGHT); >>> napi_enable(&tfile->napi); >>> + mutex_unlock(&tfile->napi_mutex); >>> } >>> } >>> >>> static void tun_napi_disable(struct tun_file *tfile) >>> { >>> - if (tfile->napi_enabled) >>> + if (tfile->napi_enabled) { >>> + mutex_lock(&tfile->napi_mutex); >>> napi_disable(&tfile->napi); >>> + mutex_unlock(&tfile->napi_mutex); >>> + } >>> } >>> >>> static void tun_napi_del(struct tun_file *tfile) >>> { >>> - if (tfile->napi_enabled) >>> + if (tfile->napi_enabled) { >>> + mutex_lock(&tfile->napi_mutex); >>> netif_napi_del(&tfile->napi); >>> + mutex_unlock(&tfile->napi_mutex); >>> + } >>> } >>> >>> static bool tun_napi_frags_enabled(const struct tun_file *tfile) >>> @@ -856,7 +864,6 @@ static int tun_attach(struct tun_struct *tun, struct file *file, >>> err = 0; >>> } >>> >>> - rcu_assign_pointer(tfile->tun, tun); >>> rcu_assign_pointer(tun->tfiles[tun->numqueues], tfile); >>> tun->numqueues++; >>> >>> @@ -876,6 +883,11 @@ static int tun_attach(struct tun_struct *tun, struct file *file, >>> * refcnt. >>> */ >>> >>> + /* All tun_fops depend on tun_get() returning non-null pointer. >>> + * Thus, assigning tun to a tfile should be the last init operation, >>> + * otherwise we risk using half-initialized object. 
>>> + */ >>> + rcu_assign_pointer(tfile->tun, tun); >>> out: >>> return err; >>> } >> >> Hmmm I believe the issue is different : We need to call >> tun_napi_init() before doing the publish in the tun->tfiles[] array >> >> My patch was : > Still fails with your patch. > > Maybe the best way is to move both of those publishes (tfile->tun and > tun->tfiles[]) to the end of tun_attach? It looks like tfile->tun > is a publish for syscall side (most of tun_socket_ops and tun_fops call > tun_get which looks at tun->dev and bail out if it's NULL) and tun->tfiles > is (mostly, but not really) for napi side. > > Here is a repro I'm using if you want to poke it: > > syz-execprog -threaded -collide -repeat=0 -procs=6 <rep.syz> > > # See https://goo.gl/kgGztJ for information about syzkaller reproducers. > #{"threaded":true,"collide":true,"repeat":true,"procs":6,"sandbox":"none","fault_call":-1,"tun":true,"tmpdir":true,"cgroups":true,"netdev":true,"resetnet":true,"segv":true} > openat$tun(0xffffffffffffff9c, 0x0, 0x0, 0x0) > r0 = openat$tun(0xffffffffffffff9c, > &(0x7f0000001a00)='/dev/net/tun\x00', 0x0, 0x0) > ioctl$TUNSETIFF(r0, 0x400454ca, &(0x7f0000000300)={"6e72300100", > 0x1132}) > r1 = socket$kcm(0x2, 0x3, 0x2) > ioctl$PERF_EVENT_IOC_SET_FILTER(r1, 0x8914, > 
&(0x7f0000000780)="6e7230010060a19ef9d2c673d9a1571cb9e1369bcd61ef7e49793ae18712eceb1daa769497800b7fbbd35b170c10751d39aeb660d863e49b8c4f3b3dad48902b5b2d6cfd0abd372c63bcf5d70df3fd4d2e8d443c88bc0e5637dd82fc3435bed4de5d693c9a781c863e05d8a6f8689a5be29216061f3ff53f8b6b396678e7ba155ef9152d7e43b1eccb2331eb8eb1ed5586dcf8b3b0b999361a44ff2c22c2abbef42dd24eabe6723346a6e46c0499a21442d8d00dcb57f013ff7595edd0ff076930de3675d34117a44eb0e4f832936da44e57e43a3e36bd48d2a85bf4fd4a804e83f2f3cf378a435af5e287d4e27337b4ada11b26219832ec6b2b38446b3b95fe3771e9f42ca30fb21e12f0a3d8bc2d85454af9fcc0232d8fd909448b01f46c593d31ea1c926465e35a4199079c3ca41128b17cb01fbf5b522be0fd02022ada37fecc14b6c8c8831883b85a1106f2f867020d529f17a350f20dd3bf51a98cfda70c2e3638a483fd3f87940bb478b07c4c110394c0093d17955089f2ca97bbe075124c9b1ff65 00d536a95d96f03d48596e008bf0a028b539cec796cec9bf585eb80fe3e0d26") > perf_event_open$cgroup(&(0x7f0000000440)={0x7, 0x70, 0x1, 0x0, 0x6e36, > 0x38000000, 0x0, 0x1, 0x21060, 0x0, 0x0, 0x0, 0x20, 0x3, 0xfc50, 0x9, > 0x8000, 0x0, 0x0, 0x0, 0x46c8, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, > 0x0, 0x0, 0x8, 0x3f, 0x401, 0x8001, 0x8, 0x9, 0x0, 0x0, 0x1, 0x1, > @perf_config_ext={0x100000000, 0x6}, 0x10, 0x5, 0x2, 0x7, > 0xfffffffffffffff9, 0x0, 0x7}, 0xffffffffffffffff, 0x5, > 0xffffffffffffffff, 0x0) > # 0x2 = O_RDWR > r2 = openat$tun(0xffffffffffffff9c, > &(0x7f0000001a00)='/dev/net/tun\x00', 0x2, 0x0) > # IFF_TAP 0x0002 > # IFF_NAPI 0x0010 > # IFF_NAPI_FRAGS 0x0020 > # IFF_MULTI_QUEUE 0x0100 > # IFF_MULTICAST = 0x1000 > ioctl$TUNSETIFF(r2, 0x400454ca, &(0x7f0000000300)={"6e72300100", > 0x1132}) > write$cgroup_int(r2, &(0x7f0000000000), 0x17b) > > I dunno, I would prefer to not throw all these mutex_lock(&tfile->napi_mutex)/mutex_unlock(&tfile->napi_mutex) all over the places. Please publish a minimal patches, or a patch series explaining why each fix is needed. I do not really see why tun_napi_disable() and/or tun_napi_del() needs extra synchronization. 
* Re: [PATCH net 1/2] tun: hold napi_mutex for all napi operations 2019-01-07 21:10 ` Eric Dumazet @ 2019-01-07 21:29 ` Stanislav Fomichev 0 siblings, 0 replies; 6+ messages in thread From: Stanislav Fomichev @ 2019-01-07 21:29 UTC (permalink / raw) To: Eric Dumazet Cc: Eric Dumazet, Stanislav Fomichev, netdev, David Miller, Jason Wang, Jesper Dangaard Brouer, Michael S. Tsirkin, syzbot On 01/07, Eric Dumazet wrote: > > > On 01/07/2019 01:02 PM, Stanislav Fomichev wrote: > > On 01/07, Eric Dumazet wrote: > >> On Mon, Jan 7, 2019 at 12:02 PM Stanislav Fomichev <sdf@google.com> wrote: > >>> > >>> BUG: unable to handle kernel NULL pointer dereference at 00000000000000d1 > >>> Call Trace: > >>> ? napi_gro_frags+0xa7/0x2c0 > >>> tun_get_user+0xb50/0xf20 > >>> tun_chr_write_iter+0x53/0x70 > >>> new_sync_write+0xff/0x160 > >>> vfs_write+0x191/0x1e0 > >>> __x64_sys_write+0x5e/0xd0 > >>> do_syscall_64+0x47/0xf0 > >>> entry_SYSCALL_64_after_hwframe+0x44/0xa9 > >>> > >>> I think there is a subtle race between sending a packet via tap and > >>> attaching it: > >>> > >>> CPU0: CPU1: > >>> tun_chr_ioctl(TUNSETIFF) > >>> tun_set_iff > >>> tun_attach > >>> rcu_assign_pointer(tfile->tun, tun); > >>> tun_fops->write_iter() > >>> tun_chr_write_iter > >>> tun_napi_alloc_frags > >>> napi_get_frags > >>> napi->skb = napi_alloc_skb > >>> tun_napi_init > >>> netif_napi_add > >>> napi->skb = NULL > >>> napi->skb is NULL here > >>> napi_gro_frags > >>> napi_frags_skb > >>> skb = napi->skb > >>> skb_reset_mac_header(skb) > >>> panic() > >>> > >>> To fix, do the following: > >>> * Move rcu_assign_pointer(tfile->tun, tun) to be the last thing we do > >>> in tun_attach(); this should guarantee that when we call tun_get() > >>> we always get an initialized object > >>> * As another safeguard, always grab napi_mutex whenever doing any > >>> napi operation; this should prevent napi state change between > >>> calls to napi_get_frags and napi_gro_frags > >>> > >>> Reported-by: syzbot 
<syzkaller@googlegroups.com> > >>> Fixes: 90e33d459407 ("tun: enable napi_gro_frags() for TUN/TAP driver") > >>> > >>> Signed-off-by: Stanislav Fomichev <sdf@google.com> > >>> --- > >>> drivers/net/tun.c | 18 +++++++++++++++--- > >>> 1 file changed, 15 insertions(+), 3 deletions(-) > >>> > >>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c > >>> index a4fdad475594..7875f06011f2 100644 > >>> --- a/drivers/net/tun.c > >>> +++ b/drivers/net/tun.c > >>> @@ -323,22 +323,30 @@ static void tun_napi_init(struct tun_struct *tun, struct tun_file *tfile, > >>> tfile->napi_enabled = napi_en; > >>> tfile->napi_frags_enabled = napi_en && napi_frags; > >>> if (napi_en) { > >>> + mutex_lock(&tfile->napi_mutex); > >>> netif_napi_add(tun->dev, &tfile->napi, tun_napi_poll, > >>> NAPI_POLL_WEIGHT); > >>> napi_enable(&tfile->napi); > >>> + mutex_unlock(&tfile->napi_mutex); > >>> } > >>> } > >>> > >>> static void tun_napi_disable(struct tun_file *tfile) > >>> { > >>> - if (tfile->napi_enabled) > >>> + if (tfile->napi_enabled) { > >>> + mutex_lock(&tfile->napi_mutex); > >>> napi_disable(&tfile->napi); > >>> + mutex_unlock(&tfile->napi_mutex); > >>> + } > >>> } > >>> > >>> static void tun_napi_del(struct tun_file *tfile) > >>> { > >>> - if (tfile->napi_enabled) > >>> + if (tfile->napi_enabled) { > >>> + mutex_lock(&tfile->napi_mutex); > >>> netif_napi_del(&tfile->napi); > >>> + mutex_unlock(&tfile->napi_mutex); > >>> + } > >>> } > >>> > >>> static bool tun_napi_frags_enabled(const struct tun_file *tfile) > >>> @@ -856,7 +864,6 @@ static int tun_attach(struct tun_struct *tun, struct file *file, > >>> err = 0; > >>> } > >>> > >>> - rcu_assign_pointer(tfile->tun, tun); > >>> rcu_assign_pointer(tun->tfiles[tun->numqueues], tfile); > >>> tun->numqueues++; > >>> > >>> @@ -876,6 +883,11 @@ static int tun_attach(struct tun_struct *tun, struct file *file, > >>> * refcnt. > >>> */ > >>> > >>> + /* All tun_fops depend on tun_get() returning non-null pointer. 
> >>> + * Thus, assigning tun to a tfile should be the last init operation, > >>> + * otherwise we risk using half-initialized object. > >>> + */ > >>> + rcu_assign_pointer(tfile->tun, tun); > >>> out: > >>> return err; > >>> } > >> > >> Hmmm I believe the issue is different : We need to call > >> tun_napi_init() before doing the publish in the tun->tfiles[] array > >> > >> My patch was : > > Still fails with your patch. > > > > Maybe the best way is to move both of those publishes (tfile->tun and > > tun->tfiles[]) to the end of tun_attach? It looks like tfile->tun > > is a publish for syscall side (most of tun_socket_ops and tun_fops call > > tun_get which looks at tun->dev and bail out if it's NULL) and tun->tfiles > > is (mostly, but not really) for napi side. > > > > Here is a repro I'm using if you want to poke it: > > > > syz-execprog -threaded -collide -repeat=0 -procs=6 <rep.syz> > > > > # See https://goo.gl/kgGztJ for information about syzkaller reproducers. > > #{"threaded":true,"collide":true,"repeat":true,"procs":6,"sandbox":"none","fault_call":-1,"tun":true,"tmpdir":true,"cgroups":true,"netdev":true,"resetnet":true,"segv":true} > > openat$tun(0xffffffffffffff9c, 0x0, 0x0, 0x0) > > r0 = openat$tun(0xffffffffffffff9c, > > &(0x7f0000001a00)='/dev/net/tun\x00', 0x0, 0x0) > > ioctl$TUNSETIFF(r0, 0x400454ca, &(0x7f0000000300)={"6e72300100", > > 0x1132}) > > r1 = socket$kcm(0x2, 0x3, 0x2) > > ioctl$PERF_EVENT_IOC_SET_FILTER(r1, 0x8914, > > 
&(0x7f0000000780)="6e7230010060a19ef9d2c673d9a1571cb9e1369bcd61ef7e49793ae18712eceb1daa769497800b7fbbd35b170c10751d39aeb660d863e49b8c4f3b3dad48902b5b2d6cfd0abd372c63bcf5d70df3fd4d2e8d443c88bc0e5637dd82fc3435bed4de5d693c9a781c863e05d8a6f8689a5be29216061f3ff53f8b6b396678e7ba155ef9152d7e43b1eccb2331eb8eb1ed5586dcf8b3b0b999361a44ff2c22c2abbef42dd24eabe6723346a6e46c0499a21442d8d00dcb57f013ff7595edd0ff076930de3675d34117a44eb0e4f832936da44e57e43a3e36bd48d2a85bf4fd4a804e83f2f3cf378a435af5e287d4e27337b4ada11b26219832ec6b2b38446b3b95fe3771e9f42ca30fb21e12f0a3d8bc2d85454af9fcc0232d8fd909448b01f46c593d31ea1c926465e35a4199079c3ca41128b17cb01fbf5b522be0fd02022ada37fecc14b6c8c8831883b85a1106f2f867020d529f17a350f20dd3bf51a98cfda70c2e3638a483fd3f87940bb478b07c4c110394c0093d17955089f2ca97bbe075124c9b1ff 6500d536a95d96f03d48596e008bf0a028b539cec796cec9bf585eb80fe3e0d26") > > perf_event_open$cgroup(&(0x7f0000000440)={0x7, 0x70, 0x1, 0x0, 0x6e36, > > 0x38000000, 0x0, 0x1, 0x21060, 0x0, 0x0, 0x0, 0x20, 0x3, 0xfc50, 0x9, > > 0x8000, 0x0, 0x0, 0x0, 0x46c8, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, > > 0x0, 0x0, 0x8, 0x3f, 0x401, 0x8001, 0x8, 0x9, 0x0, 0x0, 0x1, 0x1, > > @perf_config_ext={0x100000000, 0x6}, 0x10, 0x5, 0x2, 0x7, > > 0xfffffffffffffff9, 0x0, 0x7}, 0xffffffffffffffff, 0x5, > > 0xffffffffffffffff, 0x0) > > # 0x2 = O_RDWR > > r2 = openat$tun(0xffffffffffffff9c, > > &(0x7f0000001a00)='/dev/net/tun\x00', 0x2, 0x0) > > # IFF_TAP 0x0002 > > # IFF_NAPI 0x0010 > > # IFF_NAPI_FRAGS 0x0020 > > # IFF_MULTI_QUEUE 0x0100 > > # IFF_MULTICAST = 0x1000 > > ioctl$TUNSETIFF(r2, 0x400454ca, &(0x7f0000000300)={"6e72300100", > > 0x1132}) > > write$cgroup_int(r2, &(0x7f0000000000), 0x17b) > > > > > > I dunno, I would prefer to not throw all these > mutex_lock(&tfile->napi_mutex)/mutex_unlock(&tfile->napi_mutex) all over the places. > > Please publish a minimal patches, or a patch series explaining why each fix is needed. 
>
> I do not really see why tun_napi_disable() and/or tun_napi_del() needs extra synchronization.
They don't; I added them mostly for consistency.
The main issue I was trying to fix (and the one I think I'm hitting) is
the one in tun_get_user, where we do the following:
1. mutex_lock(&tfile->napi_mutex);
2. skb = tun_napi_alloc_frags(tfile, copylen, from);
   * skb = napi_get_frags(&tfile->napi);
     * napi->skb = napi_alloc_skb()
3. <something happens here that sets napi->skb to NULL>
4. napi_gro_frags(&tfile->napi);
   * skb = napi_frags_skb(napi);
     * struct sk_buff *skb = napi->skb;
     * skb_reset_mac_header(skb);
       * null pointer deref
5. mutex_unlock(&tfile->napi_mutex);
We can either always grab napi_mutex for all napi-related operations, to
make sure nothing happens to napi->skb between napi_get_frags and
napi_frags_skb, or hold off publishing tfile->tun, so that tun_get_user
never gets called in the meantime.
I'll send a v2 without the napi_mutex changes; I added them mostly as an
additional safeguard. Publishing both tfile->tun and tun->tfiles at the
end of tun_attach should be enough.
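Steps 2 to 4 above can be replayed deterministically in a small userspace model. This is a sketch under assumptions, not kernel code: the model_* names are hypothetical, the interleaving is forced by a flag instead of a real second CPU, and the NULL case returns -1 where the kernel would dereference the pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct sk_buff and struct napi_struct. */
struct model_skb {
	int len;
};

struct model_napi {
	struct model_skb *skb;
};

static struct model_skb model_pool; /* single preallocated buffer */

/* Step 2: like napi_get_frags(), cache an skb in napi->skb. */
static struct model_skb *model_get_frags(struct model_napi *napi)
{
	assert(napi != NULL);
	if (!napi->skb)
		napi->skb = &model_pool;
	return napi->skb;
}

/* Step 3: like netif_napi_add() in the trace, reset napi->skb to NULL. */
static void model_napi_add(struct model_napi *napi)
{
	napi->skb = NULL;
}

/* Step 4: like napi_frags_skb(), pick up the cached skb.
 * Returns 0 on success, -1 where the kernel would hit the NULL deref.
 */
static int model_gro_frags(struct model_napi *napi)
{
	struct model_skb *skb = napi->skb;

	if (!skb)
		return -1;
	skb->len = 0; /* stands in for skb_reset_mac_header(skb) */
	return 0;
}

/* Replays steps 2-4; reinit_in_between selects whether step 3 happens. */
static int model_replay(int reinit_in_between)
{
	struct model_napi napi = { NULL };

	model_get_frags(&napi);
	if (reinit_in_between)
		model_napi_add(&napi);
	return model_gro_frags(&napi);
}
```

Calling model_replay(0) follows the normal path, while model_replay(1) reproduces the window in step 3 and reports the failure instead of crashing.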