public inbox for linux-nvme@lists.infradead.org
From: Maurizio Lombardi <mlombard@redhat.com>
To: Sagi Grimberg <sagi@grimberg.me>
Cc: linux-nvme@lists.infradead.org, hch@lst.de, hare@suse.de,
	chaitanya.kulkarni@wdc.com, jmeneghi@redhat.com
Subject: Re: [PATCH 2/2] nvmet: fix a race condition between release_queue and io_work
Date: Mon, 15 Nov 2021 08:47:20 +0100	[thread overview]
Message-ID: <20211115074720.GA21836@raketa> (raw)
In-Reply-To: <ee5a0405-a8a5-0d8a-d3be-11fe1011c073@grimberg.me>

Hello Sagi,

On Sun, Nov 14, 2021 at 12:28:45PM +0200, Sagi Grimberg wrote:
> 
> 
> I think you should shutdown the socket after you restore the socket
> callbacks, and after that cancel_work_sync to deny a self-requeue.
> 
> > -	flush_work(&queue->io_work);
> > +	while (flush_work(&queue->io_work));
> >   	nvmet_tcp_uninit_data_in_cmds(queue);
> >   	nvmet_sq_destroy(&queue->nvme_sq);
> > 
> 
> How about something like this:
> --
> diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
> index 6eb0b3153477..6425375faf5b 100644
> --- a/drivers/nvme/target/tcp.c
> +++ b/drivers/nvme/target/tcp.c
> @@ -1436,7 +1436,9 @@ static void nvmet_tcp_release_queue_work(struct
> work_struct *w)
>         mutex_unlock(&nvmet_tcp_queue_mutex);
> 
>         nvmet_tcp_restore_socket_callbacks(queue);
> -       flush_work(&queue->io_work);
> +       /* stop accepting incoming data */
> +       kernel_sock_shutdown(queue->sock, SHUT_RD);
> +       cancel_work_sync(&queue->io_work);
> 
>         nvmet_tcp_uninit_data_in_cmds(queue);
>         nvmet_sq_destroy(&queue->nvme_sq);
> --
> 

This one doesn't work either: remember that nvmet_sq_destroy() will start io_work
again, and, as I wrote in my previous email, recvmsg() might still return some
data even after the socket has been shut down.

[ 1034.234574] nvmet: creating controller 2 for subsystem test-nqn for NQN nqn.2014-08.org.nvmexpress:uuid:a5f7dc87-f3f0-ea44-bfd1-0019ebbb4112. 
[ 1034.764498] nvmet: creating controller 1 for subsystem test-nqn for NQN nqn.2014-08.org.nvmexpress:uuid:a5f7dc87-f3f0-ea44-bfd1-0019ebbb4112. 
[ 1035.291398] nvmet: creating controller 2 for subsystem test-nqn for NQN nqn.2014-08.org.nvmexpress:uuid:a5f7dc87-f3f0-ea44-bfd1-0019ebbb4112. 
[ 1038.471686] nvmet: creating controller 1 for subsystem test-nqn for NQN nqn.2014-08.org.nvmexpress:uuid:a5f7dc87-f3f0-ea44-bfd1-0019ebbb4112. 
[ 1039.002853] general protection fault, probably for non-canonical address 0x51ede3866dd88e22: 0000 [#1] SMP PTI 
[ 1039.006622] CPU: 1 PID: 4869 Comm: kworker/1:2H Kdump: loaded Tainted: G        W        --------- ---  5.14.0_nvmet1+ #1 
[ 1039.008778] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 
[ 1039.009840] Workqueue: nvmet_tcp_wq nvmet_tcp_io_work [nvmet_tcp] 
[ 1039.011040] RIP: 0010:memcpy_erms+0x6/0x10 
[ 1039.011841] Code: cc cc cc cc eb 1e 0f 1f 00 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07 f3 48 a5 89 d1 f3 a4 c3 66 0f 1f 44 00 00 48 89 f8 48 89 d1 <f3> a4 c3 0f 1f 80 00 00 00 00 48 89 f8 48 83 fa 20 72 7e 40 38 fe 
[ 1039.015216] RSP: 0018:ffffc02c4059fb30 EFLAGS: 00010206 
[ 1039.016176] RAX: 51ede3866dd88e22 RBX: 000000000000248e RCX: 0000000000001000 
[ 1039.017472] RDX: 0000000000001000 RSI: ffff99d7c397948e RDI: 51ede3866dd88e22 
[ 1039.018767] RBP: ffff99d7cc02b000 R08: 0000000000000001 R09: 000000000000fe88 
[ 1039.020074] R10: 000000000000012a R11: 0000000000003ed6 R12: 0000000000000000 
[ 1039.021371] R13: 0000000000001000 R14: ffff99d7cc02b010 R15: 0000000000001a48 
[ 1039.022668] FS:  0000000000000000(0000) GS:ffff99d8f7d00000(0000) knlGS:0000000000000000 
[ 1039.024134] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033 
[ 1039.025181] CR2: 00007ffbf45c1658 CR3: 00000001051e4003 CR4: 00000000003706e0 
[ 1039.026497] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 
[ 1039.027836] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 
[ 1039.029185] Call Trace: 
[ 1039.029696]  _copy_to_iter+0x44d/0x710 
[ 1039.030483]  ? __virt_addr_valid+0x45/0x70 
[ 1039.031284]  ? __check_object_size.part.0+0x45/0x140 
[ 1039.032200]  __skb_datagram_iter+0x19b/0x2d0 
[ 1039.032990]  ? zerocopy_sg_from_iter+0x50/0x50 
[ 1039.033788]  skb_copy_datagram_iter+0x33/0x80 
[ 1039.034568]  tcp_recvmsg_locked+0x299/0x960 
[ 1039.035338]  tcp_recvmsg+0x72/0x1d0 
[ 1039.036023]  ? bpf_lsm_socket_sendmsg+0x10/0x10 
[ 1039.036929]  inet_recvmsg+0x57/0xf0 
[ 1039.037624]  nvmet_tcp_try_recv.constprop.0+0xcb/0x310 [nvmet_tcp] 
[ 1039.038913]  ? pick_next_task_fair+0x39/0x3b0 
[ 1039.039771]  nvmet_tcp_io_work+0x44/0xabd [nvmet_tcp] 
[ 1039.040684]  ? dequeue_task_fair+0xb1/0x370 
[ 1039.041430]  ? finish_task_switch.isra.0+0xb1/0x290 
[ 1039.042305]  process_one_work+0x1e6/0x380 
[ 1039.043061]  worker_thread+0x53/0x3d0 
[ 1039.043720]  ? process_one_work+0x380/0x380 
[ 1039.044467]  kthread+0x10f/0x130 
[ 1039.045059]  ? set_kthread_struct+0x40/0x40 
[ 1039.045806]  ret_from_fork+0x22/0x30 


Maurizio



Thread overview: 21+ messages
2021-10-21  8:41 [PATCH 0/2] Fix a race condition when performing a controller reset Maurizio Lombardi
2021-10-21  8:41 ` [PATCH 1/2] nvmet: add an helper to free the iovec Maurizio Lombardi
2021-10-21 14:56   ` John Meneghini
2021-10-21 14:58     ` John Meneghini
2021-10-27  0:15     ` Chaitanya Kulkarni
2021-10-21  8:41 ` [PATCH 2/2] nvmet: fix a race condition between release_queue and io_work Maurizio Lombardi
2021-10-21 14:57   ` John Meneghini
2021-10-26 15:42   ` Sagi Grimberg
2021-10-28  7:55     ` Maurizio Lombardi
2021-11-03  9:28       ` Sagi Grimberg
2021-11-03 11:31         ` Maurizio Lombardi
2021-11-04 12:59           ` Sagi Grimberg
2021-11-12 10:54             ` Maurizio Lombardi
2021-11-12 15:54               ` John Meneghini
2021-11-15  7:52                 ` Maurizio Lombardi
2021-11-14 10:28               ` Sagi Grimberg
2021-11-15  7:47                 ` Maurizio Lombardi [this message]
2021-11-15  9:48                   ` Sagi Grimberg
2021-11-15 10:00                     ` Maurizio Lombardi
2021-11-15 10:13                       ` Sagi Grimberg
2021-11-15 10:57                       ` Maurizio Lombardi
