From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 86919C433F5 for ; Thu, 28 Oct 2021 07:57:54 +0000 (UTC) Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 48A71610C7 for ; Thu, 28 Oct 2021 07:57:54 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.4.1 mail.kernel.org 48A71610C7 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=redhat.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=lists.infradead.org DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender:List-Subscribe:List-Help :List-Post:List-Archive:List-Unsubscribe:List-Id:Content-Type:In-Reply-To: MIME-Version:References:Message-ID:Subject:Cc:To:From:Date:Reply-To: Content-Transfer-Encoding:Content-ID:Content-Description:Resent-Date: Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Owner; bh=P9xZbrnBB4jaxttLfLGustZoUiHQeg8CcQl1iyvS7b0=; b=hsmSc09bekOgyYqD5oz/iw5y+N 9te3zHDjxa8+kMwxB2TpRIp+5iOgCjXaSZdp86M3xpjry2jZy3hNcFqPhcKfj2Eyl3qthEDGD3RvW Ul5/NPWqQnwefc+L2knsC7c8L//zzz5efcrR1txaVl7KHXCYplt5gduNM/QsGy8kFs3X07C5QntHR 5lnaAIhfSZhkuqTsMVJKiZDaEsFSYg5u/A4RKPkXQgVCs2mIFKI/Sy/o0WbX0L561BC0jc0cz6I6X 9VREdnexvSSaZa2iA7uvfsPOzdSwcw79I2gZTXbY/JG88BReDLhBH9IVDvk1+qykbznemJ4ttE9P2 inIdL4mg==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.94.2 #2 (Red Hat Linux)) id 1mg0IB-007Clg-PO; Thu, 28 Oct 2021 07:57:43 +0000 Received: from us-smtp-delivery-124.mimecast.com ([170.10.129.124]) by bombadil.infradead.org with esmtps (Exim 4.94.2 #2 (Red Hat Linux)) id 1mg0GJ-007BfJ-F0 for linux-nvme@lists.infradead.org; Thu, 28 Oct 2021 07:55:49 +0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1635407744; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=P9xZbrnBB4jaxttLfLGustZoUiHQeg8CcQl1iyvS7b0=; b=SQShD2sovdTji+mWIbPUoVW+qflLnprXLa1X+4j+VhNz85XSXO53Vch83/dwzhykv9uifU rslUmtWrijooUuiqRdPEdcKFl/ewD54sn6ZADBM83SAZqNXMhxeDdsRYp9dafa1Hs+pJaX 2cJI69JbBp+s+jPm2AOiJBfQVBVha5c= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-175-iGT7MZxvO7GdLGFD_zrJJQ-1; Thu, 28 Oct 2021 03:55:39 -0400 X-MC-Unique: iGT7MZxvO7GdLGFD_zrJJQ-1 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.phx2.redhat.com [10.5.11.13]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 14935362FA; Thu, 28 Oct 2021 07:55:38 +0000 (UTC) Received: from raketa (unknown [10.40.192.136]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 4C0E160D30; Thu, 28 Oct 2021 07:55:33 +0000 (UTC) Date: Thu, 28 Oct 2021 09:55:31 +0200 From: Maurizio Lombardi To: Sagi Grimberg Cc: linux-nvme@lists.infradead.org, hch@lst.de, hare@suse.de, chaitanya.kulkarni@wdc.com, jmeneghi@redhat.com Subject: Re: [PATCH 2/2] nvmet: fix a race condition between release_queue and io_work Message-ID: <20211028075531.GA4904@raketa> References: <20211021084155.16109-1-mlombard@redhat.com> <20211021084155.16109-3-mlombard@redhat.com> <54e0464e-0d05-4611-10d9-7b706900af28@grimberg.me> MIME-Version: 1.0 In-Reply-To: <54e0464e-0d05-4611-10d9-7b706900af28@grimberg.me> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.13 Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=mlombard@redhat.com X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=us-ascii Content-Disposition: inline X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20211028_005547_722789_BB7244FE X-CRM114-Status: GOOD ( 26.26 ) X-BeenThere: linux-nvme@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "Linux-nvme" Errors-To: linux-nvme-bounces+linux-nvme=archiver.kernel.org@lists.infradead.org Hi Sagi, On Tue, Oct 26, 2021 at 06:42:02PM +0300, Sagi Grimberg wrote: > On 10/21/21 11:41 AM, Maurizio Lombardi wrote: > > If the initiator executes a reset controller operation while > > performing I/O, the target kernel will crash because of a race condition > > between release_queue and io_work; > > Can you add the stack trace? Yes, this is an example: [ 526.368376] nvmet: creating controller 1 for subsystem test-nqn for NQN nqn.2014-08.org.nvmexpress:uuid:119cb961-04c6-5845-9ed0-66a692ae0ddd. [ 526.598367] nvmet: creating controller 2 for subsystem test-nqn for NQN nqn.2014-08.org.nvmexpress:uuid:119cb961-04c6-5845-9ed0-66a692ae0ddd. [ 526.827356] nvmet: creating controller 1 for subsystem test-nqn for NQN nqn.2014-08.org.nvmexpress:uuid:119cb961-04c6-5845-9ed0-66a692ae0ddd. [ 526.837836] general protection fault, probably for non-canonical address 0x893b50fb71133fa5: 0000 [#1] SMP PTI [ 526.840045] CPU: 1 PID: 225 Comm: kworker/1:1H Kdump: loaded Not tainted 5.14.0_nvme1+ #1 [ 526.841814] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 [ 526.843064] Workqueue: nvmet_tcp_wq nvmet_tcp_io_work [nvmet_tcp] [ 526.844398] RIP: 0010:__kmalloc+0xed/0x3f0 [ 526.845300] Code: 08 65 4c 03 05 94 ad 2e 45 49 83 78 10 00 4d 8b 20 0f 84 95 02 00 00 4d 85 e4 0f 84 8c 02 00 00 8b 45 28 48 8b 7d 00 4c 01 e0 <48> 8b 18 48 89 c1 48 33 9d b8 00 00 00 4c 89 e0 48 0f c9 48 31 cb [ 526.849296] RSP: 0018:ffffaf66c014fc38 EFLAGS: 00010286 [ 526.850427] RAX: 893b50fb71133fa5 RBX: ffff8e1c83f02450 RCX: 0000000000000000 [ 526.851959] RDX: 000000000009de38 RSI: 0000000000000cc0 RDI: 00000000000311c0 [ 526.853496] RBP: ffff8e1d80042e00 R08: ffff8e1eb7d311c0 R09: 0000000000000000 [ 526.855027] R10: 00000000000035ee R11: 0000000000000000 R12: 893b50fb71132fa5 [ 526.856561] R13: 0000000000000cc0 R14: 0000000000000000 R15: ffffffffc072a240 [ 526.858092] FS: 0000000000000000(0000) GS:ffff8e1eb7d00000(0000) knlGS:0000000000000000 [ 526.859804] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 526.860913] CR2: 000055aa6316b0e8 CR3: 0000000052010001 CR4: 00000000007706e0 [ 526.862292] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 526.863658] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 526.865023] PKRU: 55555554 [ 526.865564] Call Trace: [ 526.866059] nvmet_tcp_map_data+0xc0/0x110 [nvmet_tcp] [ 526.867052] nvmet_tcp_done_recv_pdu+0x13b/0x2b0 [nvmet_tcp] [ 526.868144] nvmet_tcp_try_recv_pdu+0x17d/0x210 [nvmet_tcp] [ 526.869222] nvmet_tcp_try_recv.constprop.0+0x67/0x300 [nvmet_tcp] [ 526.870414] ? inet_sendpage+0x4f/0x80 [ 526.871156] nvmet_tcp_io_work+0x44/0xa69 [nvmet_tcp] [ 526.872134] ? dequeue_task_fair+0xb1/0x370 [ 526.872952] ? finish_task_switch.isra.0+0xb1/0x290 [ 526.873904] process_one_work+0x1e6/0x380 [ 526.874694] worker_thread+0x53/0x3d0 [ 526.875420] ? process_one_work+0x380/0x380 [ 526.876233] kthread+0x10f/0x130 [ 526.876872] ? set_kthread_struct+0x40/0x40 [ 526.877686] ret_from_fork+0x22/0x30 if you want to reproduce it just run something like $watch -n 0.1 "echo 1 > reset_controller" while you do some I/O with dd > > > nvmet_tcp_uninit_data_in_cmds() may be executed while io_work > > is running, calling flush_work(io_work) was not sufficient to > > prevent this because io_work could requeue itself. > > OK, then this should be sufficient to fix it right? with "this" you mean calling cancel_work_sync() before nvmet_tcp_uninit_data_in_cmds() and after nvmet_sq_destroy(), without checking sk_user_data in nvmet_tcp_queue_response() ? If yes, I tried that; it's not sufficient, it still crashes: [ 374.050932] nvmet_tcp: failed cmd 00000000ba978688 id 50 opcode 1, data_len: 1310720 [ 374.052699] nvmet_tcp: failed cmd 000000004ba3f36e id 60 opcode 1, data_len: 1310720 [ 374.054465] nvmet_tcp: failed cmd 0000000071ea477d id 61 opcode 1, data_len: 1310720 [ 374.056253] BUG: kernel NULL pointer dereference, address: 000000000000000c [ 374.057836] #PF: supervisor read access in kernel mode [ 374.059009] #PF: error_code(0x0000) - not-present page [ 374.060178] PGD 0 P4D 0 [ 374.060777] Oops: 0000 [#1] SMP PTI [ 374.061578] CPU: 1 PID: 226 Comm: kworker/1:1H Kdump: loaded Not tainted 5.14.0_nvme1+ #1 [ 374.063427] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 [ 374.064749] Workqueue: nvmet_tcp_wq nvmet_tcp_io_work [nvmet_tcp] [ 374.066137] RIP: 0010:nvmet_tcp_map_pdu_iovec+0x66/0x110 [nvmet_tcp] [ 374.067573] Code: 00 00 48 c1 e8 0c 41 c1 ef 0c 44 89 bf e8 01 00 00 48 63 c8 49 c1 e7 05 4c 03 7f 30 89 87 ec 01 00 00 85 ed 0f 84 9d 00 00 00 <45> 8b 67 0c 49 8b 1f 45 29 ec 41 39 ec 44 0f 47 e5 48 83 e3 fc 49 [ 374.071780] RSP: 0018:ffffa98c8016fc48 EFLAGS: 00010206 [ 374.072965] RAX: 0000000000000140 RBX: ffff91c1da819208 RCX: 0000000000000140 [ 374.074579] RDX: 0000000000000000 RSI: 0000000000000002 RDI: ffff91c1da819208 [ 374.076188] RBP: 0000000000140000 R08: 0000000000010230 R09: 0000000000017e3a [ 374.077788] R10: 0000000000000125 R11: 0000000000000010 R12: 0000000000000000 [ 374.079396] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 [ 374.081004] FS: 0000000000000000(0000) GS:ffff91c2f7d00000(0000) knlGS:0000000000000000 [ 374.082835] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 374.084117] CR2: 000000000000000c CR3: 000000010c79a003 CR4: 00000000007706e0 [ 374.085669] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 374.087147] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 374.088586] PKRU: 55555554 [ 374.089214] Call Trace: [ 374.089807] nvmet_tcp_done_recv_pdu+0x24c/0x2b0 [nvmet_tcp] [ 374.090971] nvmet_tcp_try_recv_pdu+0x17d/0x210 [nvmet_tcp] [ 374.092101] ? update_load_avg+0x7a/0x5a0 [ 374.092969] nvmet_tcp_try_recv.constprop.0+0x67/0x310 [nvmet_tcp] [ 374.094323] ? set_next_entity+0xac/0x1e0 [ 374.095149] ? pick_next_task_fair+0x18e/0x3b0 [ 374.096057] nvmet_tcp_io_work+0x44/0xabd [nvmet_tcp] [ 374.097141] ? dequeue_task_fair+0xb1/0x370 [ 374.098087] ? finish_task_switch.isra.0+0xb1/0x290 [ 374.099088] process_one_work+0x1e6/0x380 [ 374.099922] worker_thread+0x53/0x3d0 [ 374.100670] ? process_one_work+0x380/0x380 [ 374.101631] kthread+0x10f/0x130 [ 374.102336] ? set_kthread_struct+0x40/0x40 [ 374.103194] ret_from_fork+0x22/0x30 [ 374.103974] Modules linked in: nvmet_tcp nvmet t10_pi rfkill intel_rapl_msr intel_rapl_common isst_if_common nfit libnvdimm i2c_piix4 pcspkr joydev virtio_balloon drm fuse xfs libcrc32c ata_generic crct10dif_pclmul crc32_pclmul crc32c_intel virtio_net ghash_clmulni_intel net_failover ata_piix virtio_blk libata serio_raw failover dm_mirror dm_region_hash dm_log dm_mod [ 374.110690] CR2: 000000000000000c > > > * Fix this bug by preventing io_work from being enqueued when > > sk_user_data is NULL (it means that the queue is going to be deleted) > > This is triggered from the completion path, where the commands > are not in a state where they are still fetching data from the > host. How does this prevent the crash? io_work is also triggered every time a nvmet_req_init() fails and when nvmet_sq_destroy() is called, I am not really sure about the state of the commands in those cases. > > > * Ensure that all the memory allocated for the commands' iovec is freed > > How is this needed to prevent a crash? It's not really needed to prevent the crash, but without this you will see massive memory leaks. It happened sometimes that it exhausted the entire system memory before hitting the crash. Maurizio