* [PATCH v2 0/2] nbd: fix use-after-freed and double lock bugs @ 2020-11-03 3:07 xiubli 2020-11-03 3:07 ` [PATCH v2 1/2] nbd: fix use-after-freed crash for nbd->recv_workq xiubli 2020-11-03 3:07 ` [PATCH v2 2/2] nbd: add comments about double lock for config_lock confusion xiubli 0 siblings, 2 replies; 5+ messages in thread From: xiubli @ 2020-11-03 3:07 UTC (permalink / raw) To: josef, axboe, ming.lei; +Cc: nbd, linux-block, jdillama, mgolub, Xiubo Li From: Xiubo Li <xiubli@redhat.com> Changed in V2: - fixed possible double lock issue in recv_work(). - adding comments instead to explain why there won't have double lock issue. Xiubo Li (2): nbd: fix use-after-freed crash for nbd->recv_workq nbd: add comments about double lock for config_lock confusion drivers/block/nbd.c | 36 +++++++++++++++++++++++++++++++----- 1 file changed, 31 insertions(+), 5 deletions(-) -- 2.18.4 ^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH v2 1/2] nbd: fix use-after-freed crash for nbd->recv_workq 2020-11-03 3:07 [PATCH v2 0/2] nbd: fix use-after-freed and double lock bugs xiubli @ 2020-11-03 3:07 ` xiubli 2020-11-03 4:12 ` Ming Lei 2020-11-03 3:07 ` [PATCH v2 2/2] nbd: add comments about double lock for config_lock confusion xiubli 1 sibling, 1 reply; 5+ messages in thread From: xiubli @ 2020-11-03 3:07 UTC (permalink / raw) To: josef, axboe, ming.lei; +Cc: nbd, linux-block, jdillama, mgolub, Xiubo Li From: Xiubo Li <xiubli@redhat.com> The crash call trace: <6>[ 1012.319386] block nbd1: NBD_DISCONNECT <1>[ 1012.319437] BUG: kernel NULL pointer dereference, address: 0000000000000020 <1>[ 1012.319439] #PF: supervisor write access in kernel mode <1>[ 1012.319441] #PF: error_code(0x0002) - not-present page <6>[ 1012.319442] PGD 0 P4D 0 <4>[ 1012.319448] Oops: 0002 [#1] SMP NOPTI <4>[ 1012.319454] CPU: 9 PID: 25111 Comm: rbd-nbd Tainted: G E 5.9.0+ #6 [...] <4>[ 1012.319505] PKRU: 55555554 <4>[ 1012.319506] Call Trace: <4>[ 1012.319560] flush_workqueue+0x81/0x440 <4>[ 1012.319598] nbd_disconnect_and_put+0x50/0x70 [nbd] <4>[ 1012.319607] nbd_genl_disconnect+0xc7/0x170 [nbd] <4>[ 1012.319627] genl_rcv_msg+0x1dd/0x2f9 <4>[ 1012.319642] ? genl_start+0x140/0x140 <4>[ 1012.319644] netlink_rcv_skb+0x49/0x110 <4>[ 1012.319649] genl_rcv+0x24/0x40 <4>[ 1012.319651] netlink_unicast+0x1a5/0x280 <4>[ 1012.319653] netlink_sendmsg+0x23d/0x470 <4>[ 1012.319667] sock_sendmsg+0x5b/0x60 <4>[ 1012.319676] ____sys_sendmsg+0x1ef/0x260 <4>[ 1012.319679] ? copy_msghdr_from_user+0x5c/0x90 <4>[ 1012.319680] ? ____sys_recvmsg+0xa5/0x180 <4>[ 1012.319682] ___sys_sendmsg+0x7c/0xc0 <4>[ 1012.319683] ? copy_msghdr_from_user+0x5c/0x90 <4>[ 1012.319685] ? ___sys_recvmsg+0x89/0xc0 <4>[ 1012.319692] ? __wake_up_common_lock+0x87/0xc0 <4>[ 1012.319715] ? __check_object_size+0x46/0x173 <4>[ 1012.319727] ? _copy_to_user+0x22/0x30 <4>[ 1012.319729] ? move_addr_to_user+0xc3/0x100 <4>[ 1012.319731] __sys_sendmsg+0x57/0xa0 <4>[ 1012.319744] do_syscall_64+0x33/0x40 <4>[ 1012.319760] entry_SYSCALL_64_after_hwframe+0x44/0xa9 <4>[ 1012.319780] RIP: 0033:0x7f5baa8e3ad5 In case the reference of nbd->config has reached zero and the related resource has been released, if another process tries to send the DISCONENCT cmd by using the netlink, it will potentially crash like this. Signed-off-by: Xiubo Li <xiubli@redhat.com> --- drivers/block/nbd.c | 25 ++++++++++++++++++++----- 1 file changed, 20 insertions(+), 5 deletions(-) diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c index f46e26c9d9b3..3bb8281bb753 100644 --- a/drivers/block/nbd.c +++ b/drivers/block/nbd.c @@ -2003,16 +2003,31 @@ static int nbd_genl_connect(struct sk_buff *skb, struct genl_info *info) static void nbd_disconnect_and_put(struct nbd_device *nbd) { + bool flush = true; + mutex_lock(&nbd->config_lock); nbd_disconnect(nbd); nbd_clear_sock(nbd); - mutex_unlock(&nbd->config_lock); /* - * Make sure recv thread has finished, so it does not drop the last - * config ref and try to destroy the workqueue from inside the work - * queue. + * In very rare case that the nbd->recv_workq may already have been + * released by the recv_work(). */ - flush_workqueue(nbd->recv_workq); + if (likely(!nbd->recv_workq)) + refcount_inc(&nbd->config_refs); + else + flush = false; + mutex_unlock(&nbd->config_lock); + + if (flush) { + /* + * Make sure recv thread has finished, so it does not drop + * the last config ref and try to destroy the workqueue + * from inside the work queue. + */ + flush_workqueue(nbd->recv_workq); + nbd_config_put(nbd); + } + if (test_and_clear_bit(NBD_RT_HAS_CONFIG_REF, &nbd->config->runtime_flags)) nbd_config_put(nbd); -- 2.18.4 ^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH v2 1/2] nbd: fix use-after-freed crash for nbd->recv_workq 2020-11-03 3:07 ` [PATCH v2 1/2] nbd: fix use-after-freed crash for nbd->recv_workq xiubli @ 2020-11-03 4:12 ` Ming Lei 2020-11-03 6:40 ` Xiubo Li 0 siblings, 1 reply; 5+ messages in thread From: Ming Lei @ 2020-11-03 4:12 UTC (permalink / raw) To: xiubli; +Cc: josef, axboe, nbd, linux-block, jdillama, mgolub On Mon, Nov 02, 2020 at 10:07:57PM -0500, xiubli@redhat.com wrote: > From: Xiubo Li <xiubli@redhat.com> > > The crash call trace: > > <6>[ 1012.319386] block nbd1: NBD_DISCONNECT > <1>[ 1012.319437] BUG: kernel NULL pointer dereference, address: 0000000000000020 > <1>[ 1012.319439] #PF: supervisor write access in kernel mode > <1>[ 1012.319441] #PF: error_code(0x0002) - not-present page > <6>[ 1012.319442] PGD 0 P4D 0 > <4>[ 1012.319448] Oops: 0002 [#1] SMP NOPTI > <4>[ 1012.319454] CPU: 9 PID: 25111 Comm: rbd-nbd Tainted: G E 5.9.0+ #6 > [...] > <4>[ 1012.319505] PKRU: 55555554 > <4>[ 1012.319506] Call Trace: > <4>[ 1012.319560] flush_workqueue+0x81/0x440 > <4>[ 1012.319598] nbd_disconnect_and_put+0x50/0x70 [nbd] > <4>[ 1012.319607] nbd_genl_disconnect+0xc7/0x170 [nbd] > <4>[ 1012.319627] genl_rcv_msg+0x1dd/0x2f9 > <4>[ 1012.319642] ? genl_start+0x140/0x140 > <4>[ 1012.319644] netlink_rcv_skb+0x49/0x110 > <4>[ 1012.319649] genl_rcv+0x24/0x40 > <4>[ 1012.319651] netlink_unicast+0x1a5/0x280 > <4>[ 1012.319653] netlink_sendmsg+0x23d/0x470 > <4>[ 1012.319667] sock_sendmsg+0x5b/0x60 > <4>[ 1012.319676] ____sys_sendmsg+0x1ef/0x260 > <4>[ 1012.319679] ? copy_msghdr_from_user+0x5c/0x90 > <4>[ 1012.319680] ? ____sys_recvmsg+0xa5/0x180 > <4>[ 1012.319682] ___sys_sendmsg+0x7c/0xc0 > <4>[ 1012.319683] ? copy_msghdr_from_user+0x5c/0x90 > <4>[ 1012.319685] ? ___sys_recvmsg+0x89/0xc0 > <4>[ 1012.319692] ? __wake_up_common_lock+0x87/0xc0 > <4>[ 1012.319715] ? __check_object_size+0x46/0x173 > <4>[ 1012.319727] ? _copy_to_user+0x22/0x30 > <4>[ 1012.319729] ? move_addr_to_user+0xc3/0x100 > <4>[ 1012.319731] __sys_sendmsg+0x57/0xa0 > <4>[ 1012.319744] do_syscall_64+0x33/0x40 > <4>[ 1012.319760] entry_SYSCALL_64_after_hwframe+0x44/0xa9 > <4>[ 1012.319780] RIP: 0033:0x7f5baa8e3ad5 > > In case the reference of nbd->config has reached zero and the > related resource has been released, if another process tries to > send the DISCONENCT cmd by using the netlink, it will potentially > crash like this. > > Signed-off-by: Xiubo Li <xiubli@redhat.com> > --- > drivers/block/nbd.c | 25 ++++++++++++++++++++----- > 1 file changed, 20 insertions(+), 5 deletions(-) > > diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c > index f46e26c9d9b3..3bb8281bb753 100644 > --- a/drivers/block/nbd.c > +++ b/drivers/block/nbd.c > @@ -2003,16 +2003,31 @@ static int nbd_genl_connect(struct sk_buff *skb, struct genl_info *info) > > static void nbd_disconnect_and_put(struct nbd_device *nbd) > { > + bool flush = true; > + > mutex_lock(&nbd->config_lock); > nbd_disconnect(nbd); > nbd_clear_sock(nbd); > - mutex_unlock(&nbd->config_lock); > /* > - * Make sure recv thread has finished, so it does not drop the last > - * config ref and try to destroy the workqueue from inside the work > - * queue. > + * In very rare case that the nbd->recv_workq may already have been > + * released by the recv_work(). > */ > - flush_workqueue(nbd->recv_workq); > + if (likely(!nbd->recv_workq)) It is actually unlikely, but doesn't make any sense to annotate the check since it isn't fast path, so please remove it. > + refcount_inc(&nbd->config_refs); > + else > + flush = false; > + mutex_unlock(&nbd->config_lock); > + > + if (flush) { > + /* > + * Make sure recv thread has finished, so it does not drop > + * the last config ref and try to destroy the workqueue > + * from inside the work queue. > + */ > + flush_workqueue(nbd->recv_workq); > + nbd_config_put(nbd); > + } > + > if (test_and_clear_bit(NBD_RT_HAS_CONFIG_REF, > &nbd->config->runtime_flags)) > nbd_config_put(nbd); The approach is fine, and the reason is that nbd_disconnect_and_put() still can be called via netlink when the nbd device has been closed: Once the above likely() is removed, feel free to add: Reviewed-by: Ming Lei <ming.lei@redhat.com> Thanks, Ming ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH v2 1/2] nbd: fix use-after-freed crash for nbd->recv_workq 2020-11-03 4:12 ` Ming Lei @ 2020-11-03 6:40 ` Xiubo Li 0 siblings, 0 replies; 5+ messages in thread From: Xiubo Li @ 2020-11-03 6:40 UTC (permalink / raw) To: Ming Lei; +Cc: josef, axboe, nbd, linux-block, jdillama, mgolub On 2020/11/3 12:12, Ming Lei wrote: > On Mon, Nov 02, 2020 at 10:07:57PM -0500, xiubli@redhat.com wrote: >> From: Xiubo Li <xiubli@redhat.com> >> >> The crash call trace: >> >> <6>[ 1012.319386] block nbd1: NBD_DISCONNECT >> <1>[ 1012.319437] BUG: kernel NULL pointer dereference, address: 0000000000000020 >> <1>[ 1012.319439] #PF: supervisor write access in kernel mode >> <1>[ 1012.319441] #PF: error_code(0x0002) - not-present page >> <6>[ 1012.319442] PGD 0 P4D 0 >> <4>[ 1012.319448] Oops: 0002 [#1] SMP NOPTI >> <4>[ 1012.319454] CPU: 9 PID: 25111 Comm: rbd-nbd Tainted: G E 5.9.0+ #6 >> [...] >> <4>[ 1012.319505] PKRU: 55555554 >> <4>[ 1012.319506] Call Trace: >> <4>[ 1012.319560] flush_workqueue+0x81/0x440 >> <4>[ 1012.319598] nbd_disconnect_and_put+0x50/0x70 [nbd] >> <4>[ 1012.319607] nbd_genl_disconnect+0xc7/0x170 [nbd] >> <4>[ 1012.319627] genl_rcv_msg+0x1dd/0x2f9 >> <4>[ 1012.319642] ? genl_start+0x140/0x140 >> <4>[ 1012.319644] netlink_rcv_skb+0x49/0x110 >> <4>[ 1012.319649] genl_rcv+0x24/0x40 >> <4>[ 1012.319651] netlink_unicast+0x1a5/0x280 >> <4>[ 1012.319653] netlink_sendmsg+0x23d/0x470 >> <4>[ 1012.319667] sock_sendmsg+0x5b/0x60 >> <4>[ 1012.319676] ____sys_sendmsg+0x1ef/0x260 >> <4>[ 1012.319679] ? copy_msghdr_from_user+0x5c/0x90 >> <4>[ 1012.319680] ? ____sys_recvmsg+0xa5/0x180 >> <4>[ 1012.319682] ___sys_sendmsg+0x7c/0xc0 >> <4>[ 1012.319683] ? copy_msghdr_from_user+0x5c/0x90 >> <4>[ 1012.319685] ? ___sys_recvmsg+0x89/0xc0 >> <4>[ 1012.319692] ? __wake_up_common_lock+0x87/0xc0 >> <4>[ 1012.319715] ? __check_object_size+0x46/0x173 >> <4>[ 1012.319727] ? _copy_to_user+0x22/0x30 >> <4>[ 1012.319729] ? move_addr_to_user+0xc3/0x100 >> <4>[ 1012.319731] __sys_sendmsg+0x57/0xa0 >> <4>[ 1012.319744] do_syscall_64+0x33/0x40 >> <4>[ 1012.319760] entry_SYSCALL_64_after_hwframe+0x44/0xa9 >> <4>[ 1012.319780] RIP: 0033:0x7f5baa8e3ad5 >> >> In case the reference of nbd->config has reached zero and the >> related resource has been released, if another process tries to >> send the DISCONENCT cmd by using the netlink, it will potentially >> crash like this. >> >> Signed-off-by: Xiubo Li <xiubli@redhat.com> >> --- >> drivers/block/nbd.c | 25 ++++++++++++++++++++----- >> 1 file changed, 20 insertions(+), 5 deletions(-) >> >> diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c >> index f46e26c9d9b3..3bb8281bb753 100644 >> --- a/drivers/block/nbd.c >> +++ b/drivers/block/nbd.c >> @@ -2003,16 +2003,31 @@ static int nbd_genl_connect(struct sk_buff *skb, struct genl_info *info) >> >> static void nbd_disconnect_and_put(struct nbd_device *nbd) >> { >> + bool flush = true; >> + >> mutex_lock(&nbd->config_lock); >> nbd_disconnect(nbd); >> nbd_clear_sock(nbd); >> - mutex_unlock(&nbd->config_lock); >> /* >> - * Make sure recv thread has finished, so it does not drop the last >> - * config ref and try to destroy the workqueue from inside the work >> - * queue. >> + * In very rare case that the nbd->recv_workq may already have been >> + * released by the recv_work(). >> */ >> - flush_workqueue(nbd->recv_workq); >> + if (likely(!nbd->recv_workq)) > It is actually unlikely, but doesn't make any sense to annotate the check > since it isn't fast path, so please remove it. Sure, will do it. Thanks. > >> + refcount_inc(&nbd->config_refs); >> + else >> + flush = false; >> + mutex_unlock(&nbd->config_lock); >> + >> + if (flush) { >> + /* >> + * Make sure recv thread has finished, so it does not drop >> + * the last config ref and try to destroy the workqueue >> + * from inside the work queue. >> + */ >> + flush_workqueue(nbd->recv_workq); >> + nbd_config_put(nbd); >> + } >> + >> if (test_and_clear_bit(NBD_RT_HAS_CONFIG_REF, >> &nbd->config->runtime_flags)) >> nbd_config_put(nbd); > The approach is fine, and the reason is that nbd_disconnect_and_put() still > can be called via netlink when the nbd device has been closed: > > Once the above likely() is removed, feel free to add: > > Reviewed-by: Ming Lei <ming.lei@redhat.com> > > > Thanks, > Ming > ^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH v2 2/2] nbd: add comments about double lock for config_lock confusion 2020-11-03 3:07 [PATCH v2 0/2] nbd: fix use-after-freed and double lock bugs xiubli 2020-11-03 3:07 ` [PATCH v2 1/2] nbd: fix use-after-freed crash for nbd->recv_workq xiubli @ 2020-11-03 3:07 ` xiubli 1 sibling, 0 replies; 5+ messages in thread From: xiubli @ 2020-11-03 3:07 UTC (permalink / raw) To: josef, axboe, ming.lei; +Cc: nbd, linux-block, jdillama, mgolub, Xiubo Li From: Xiubo Li <xiubli@redhat.com> When calling the ioctl(), fget() will be called on this fd, and nbd_release() is only called when the fd's refcount drops to zero. With this we can make sure that the nbd_release() won't be called before the ioctl() finished. So there won't have the double lock issue for the "config_lock", which has already been held by nbd_ioctl(). Signed-off-by: Xiubo Li <xiubli@redhat.com> --- drivers/block/nbd.c | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c index 3bb8281bb753..48f36b003bf5 100644 --- a/drivers/block/nbd.c +++ b/drivers/block/nbd.c @@ -1345,6 +1345,17 @@ static void nbd_clear_sock_ioctl(struct nbd_device *nbd, sock_shutdown(nbd); __invalidate_device(bdev, true); nbd_bdev_reset(bdev); + + /* + * When calling the ioctl(), fget() will be called on this + * fd, and nbd_release() is only called when the fd's refcount + * drops to zero. With this we can make sure that the + * nbd_release() won't be called before the ioctl() finished. + * + * So there won't have the double lock issue if it will + * call the nbd_config_put() here for the "config_lock", which + * has already been held by nbd_ioctl(). + */ if (test_and_clear_bit(NBD_RT_HAS_CONFIG_REF, &nbd->config->runtime_flags)) nbd_config_put(nbd); -- 2.18.4 ^ permalink raw reply related [flat|nested] 5+ messages in thread
end of thread, other threads:[~2020-11-03 6:41 UTC | newest] Thread overview: 5+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2020-11-03 3:07 [PATCH v2 0/2] nbd: fix use-after-freed and double lock bugs xiubli 2020-11-03 3:07 ` [PATCH v2 1/2] nbd: fix use-after-freed crash for nbd->recv_workq xiubli 2020-11-03 4:12 ` Ming Lei 2020-11-03 6:40 ` Xiubo Li 2020-11-03 3:07 ` [PATCH v2 2/2] nbd: add comments about double lock for config_lock confusion xiubli
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox