From: Maurizio Lombardi <mlombard@redhat.com>
To: Sagi Grimberg <sagi@grimberg.me>
Cc: linux-nvme@lists.infradead.org, hch@lst.de, hare@suse.de,
chaitanya.kulkarni@wdc.com, jmeneghi@redhat.com
Subject: Re: [PATCH 2/2] nvmet: fix a race condition between release_queue and io_work
Date: Mon, 15 Nov 2021 08:47:20 +0100 [thread overview]
Message-ID: <20211115074720.GA21836@raketa> (raw)
In-Reply-To: <ee5a0405-a8a5-0d8a-d3be-11fe1011c073@grimberg.me>
Hello Sagi,
On Sun, Nov 14, 2021 at 12:28:45PM +0200, Sagi Grimberg wrote:
>
>
> I think you should shutdown the socket after you restore the socket
> callbacks, and after that cancel_work_sync to deny a self-requeue.
>
> > - flush_work(&queue->io_work);
> > + while (flush_work(&queue->io_work));
> > nvmet_tcp_uninit_data_in_cmds(queue);
> > nvmet_sq_destroy(&queue->nvme_sq);
> >
>
> How about something like this:
> --
> diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
> index 6eb0b3153477..6425375faf5b 100644
> --- a/drivers/nvme/target/tcp.c
> +++ b/drivers/nvme/target/tcp.c
> @@ -1436,7 +1436,9 @@ static void nvmet_tcp_release_queue_work(struct
> work_struct *w)
> mutex_unlock(&nvmet_tcp_queue_mutex);
>
> nvmet_tcp_restore_socket_callbacks(queue);
> - flush_work(&queue->io_work);
> + /* stop accepting incoming data */
> + kernel_sock_shutdown(queue->sock, SHUT_RD);
> + cancel_work_sync(&queue->io_work);
>
> nvmet_tcp_uninit_data_in_cmds(queue);
> nvmet_sq_destroy(&queue->nvme_sq);
> --
>
This one doesn't work, remember that nvmet_sq_destroy() will start io_work
again and, as I wrote in my previous email, recvmsg() might still return some
data after the socket has been shut down.
[ 1034.234574] nvmet: creating controller 2 for subsystem test-nqn for NQN nqn.2014-08.org.nvmexpress:uuid:a5f7dc87-f3f0-ea44-bfd1-0019ebbb4112.
[ 1034.764498] nvmet: creating controller 1 for subsystem test-nqn for NQN nqn.2014-08.org.nvmexpress:uuid:a5f7dc87-f3f0-ea44-bfd1-0019ebbb4112.
[ 1035.291398] nvmet: creating controller 2 for subsystem test-nqn for NQN nqn.2014-08.org.nvmexpress:uuid:a5f7dc87-f3f0-ea44-bfd1-0019ebbb4112.
[ 1038.471686] nvmet: creating controller 1 for subsystem test-nqn for NQN nqn.2014-08.org.nvmexpress:uuid:a5f7dc87-f3f0-ea44-bfd1-0019ebbb4112.
[ 1039.002853] general protection fault, probably for non-canonical address 0x51ede3866dd88e22: 0000 [#1] SMP PTI
[ 1039.006622] CPU: 1 PID: 4869 Comm: kworker/1:2H Kdump: loaded Tainted: G W --------- --- 5.14.0_nvmet1+ #1
[ 1039.008778] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 1039.009840] Workqueue: nvmet_tcp_wq nvmet_tcp_io_work [nvmet_tcp]
[ 1039.011040] RIP: 0010:memcpy_erms+0x6/0x10
[ 1039.011841] Code: cc cc cc cc eb 1e 0f 1f 00 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07 f3 48 a5 89 d1 f3 a4 c3 66 0f 1f 44 00 00 48 89 f8 48 89 d1 <f3> a4 c3 0f 1f 80 00 00 00 00 48 89 f8 48 83 fa 20 72 7e 40 38 fe
[ 1039.015216] RSP: 0018:ffffc02c4059fb30 EFLAGS: 00010206
[ 1039.016176] RAX: 51ede3866dd88e22 RBX: 000000000000248e RCX: 0000000000001000
[ 1039.017472] RDX: 0000000000001000 RSI: ffff99d7c397948e RDI: 51ede3866dd88e22
[ 1039.018767] RBP: ffff99d7cc02b000 R08: 0000000000000001 R09: 000000000000fe88
[ 1039.020074] R10: 000000000000012a R11: 0000000000003ed6 R12: 0000000000000000
[ 1039.021371] R13: 0000000000001000 R14: ffff99d7cc02b010 R15: 0000000000001a48
[ 1039.022668] FS: 0000000000000000(0000) GS:ffff99d8f7d00000(0000) knlGS:0000000000000000
[ 1039.024134] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1039.025181] CR2: 00007ffbf45c1658 CR3: 00000001051e4003 CR4: 00000000003706e0
[ 1039.026497] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1039.027836] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1039.029185] Call Trace:
[ 1039.029696] _copy_to_iter+0x44d/0x710
[ 1039.030483] ? __virt_addr_valid+0x45/0x70
[ 1039.031284] ? __check_object_size.part.0+0x45/0x140
[ 1039.032200] __skb_datagram_iter+0x19b/0x2d0
[ 1039.032990] ? zerocopy_sg_from_iter+0x50/0x50
[ 1039.033788] skb_copy_datagram_iter+0x33/0x80
[ 1039.034568] tcp_recvmsg_locked+0x299/0x960
[ 1039.035338] tcp_recvmsg+0x72/0x1d0
[ 1039.036023] ? bpf_lsm_socket_sendmsg+0x10/0x10
[ 1039.036929] inet_recvmsg+0x57/0xf0
[ 1039.037624] nvmet_tcp_try_recv.constprop.0+0xcb/0x310 [nvmet_tcp]
[ 1039.038913] ? pick_next_task_fair+0x39/0x3b0
[ 1039.039771] nvmet_tcp_io_work+0x44/0xabd [nvmet_tcp]
[ 1039.040684] ? dequeue_task_fair+0xb1/0x370
[ 1039.041430] ? finish_task_switch.isra.0+0xb1/0x290
[ 1039.042305] process_one_work+0x1e6/0x380
[ 1039.043061] worker_thread+0x53/0x3d0
[ 1039.043720] ? process_one_work+0x380/0x380
[ 1039.044467] kthread+0x10f/0x130
[ 1039.045059] ? set_kthread_struct+0x40/0x40
[ 1039.045806] ret_from_fork+0x22/0x30
Maurizio
next prev parent reply other threads:[~2021-11-15 7:49 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-10-21 8:41 [PATCH 0/2] Fix a race condition when performing a controller reset Maurizio Lombardi
2021-10-21 8:41 ` [PATCH 1/2] nvmet: add an helper to free the iovec Maurizio Lombardi
2021-10-21 14:56 ` John Meneghini
2021-10-21 14:58 ` John Meneghini
2021-10-27 0:15 ` Chaitanya Kulkarni
2021-10-21 8:41 ` [PATCH 2/2] nvmet: fix a race condition between release_queue and io_work Maurizio Lombardi
2021-10-21 14:57 ` John Meneghini
2021-10-26 15:42 ` Sagi Grimberg
2021-10-28 7:55 ` Maurizio Lombardi
2021-11-03 9:28 ` Sagi Grimberg
2021-11-03 11:31 ` Maurizio Lombardi
2021-11-04 12:59 ` Sagi Grimberg
2021-11-12 10:54 ` Maurizio Lombardi
2021-11-12 15:54 ` John Meneghini
2021-11-15 7:52 ` Maurizio Lombardi
2021-11-14 10:28 ` Sagi Grimberg
2021-11-15 7:47 ` Maurizio Lombardi [this message]
2021-11-15 9:48 ` Sagi Grimberg
2021-11-15 10:00 ` Maurizio Lombardi
2021-11-15 10:13 ` Sagi Grimberg
2021-11-15 10:57 ` Maurizio Lombardi
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20211115074720.GA21836@raketa \
--to=mlombard@redhat.com \
--cc=chaitanya.kulkarni@wdc.com \
--cc=hare@suse.de \
--cc=hch@lst.de \
--cc=jmeneghi@redhat.com \
--cc=linux-nvme@lists.infradead.org \
--cc=sagi@grimberg.me \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.