From: Yi Zhang <yi.zhang@redhat.com>
To: linux-rdma@vger.kernel.org
Cc: skalluru@marvell.com, michal kalderon <michal.kalderon@marvell.com>
Subject: Fwd: [bug report] BUG: KASAN: use-after-free in qed_chain_free+0x6d2/0x7f0 [qed]
Date: Sat, 7 Mar 2020 03:30:46 -0500 (EST)
Message-ID: <1498769700.15716611.1583569845997.JavaMail.zimbra@redhat.com>
In-Reply-To: <1955100504.15716600.1583569597057.JavaMail.zimbra@redhat.com>
Hello,
In my NVMe/RDMA qedr iWARP environment, writing to reset_controller triggers the BUG below. Could anyone help look into this issue?
Steps:
nvme connect -t rdma -a 172.31.50.102 -s 4420 -n testnqn
sleep 1
echo 1 >/sys/block/nvme0c0n1/device/reset_controller
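The steps above could be wrapped in a small script along these lines. This is only a sketch: the variable names, the run helper and the DRY_RUN switch are hypothetical and not part of the original report; only the nvme connect arguments and the sysfs path come from the steps above. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch of the reproduction steps. All variable names, run() and DRY_RUN
# are illustrative assumptions; the values are taken from the steps above.
TARGET_ADDR="${TARGET_ADDR:-172.31.50.102}"
TARGET_PORT="${TARGET_PORT:-4420}"
SUBSYS_NQN="${SUBSYS_NQN:-testnqn}"
RESET_PATH="${RESET_PATH:-/sys/block/nvme0c0n1/device/reset_controller}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

reproduce() {
    # Connect to the target over RDMA.
    run nvme connect -t rdma -a "$TARGET_ADDR" -s "$TARGET_PORT" -n "$SUBSYS_NQN"
    # Give the controller a moment to come up.
    run sleep 1
    # Trigger the controller reset that leads to the KASAN report.
    run sh -c "echo 1 > '$RESET_PATH'"
}

reproduce
```

Running it with DRY_RUN=0 as root on the client would execute the commands for real; the device name nvme0c0n1 may differ on other systems.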
HW:
# lspci | grep -i ql
08:00.0 Ethernet controller: QLogic Corp. FastLinQ QL45000 Series 25GbE Controller (rev 10)
08:00.1 Ethernet controller: QLogic Corp. FastLinQ QL45000 Series 25GbE Controller (rev 10)
dmesg:
client:
[ 153.609662] iwpm_register_pid: Unable to send a nlmsg (client = 2)
[ 153.679142] nvme nvme0: creating 12 I/O queues.
[ 154.447885] nvme nvme0: mapped 12/0/0 default/read/poll queues.
[ 154.527204] nvme nvme0: new ctrl: NQN "testnqn", addr 172.31.50.102:4420
[ 154.554852] nvme0n1: detected capacity change from 0 to 268435456000
[ 173.337907] ==================================================================
[ 173.346307] BUG: KASAN: use-after-free in qed_chain_free+0x6d2/0x7f0 [qed]
[ 173.354065] Read of size 8 at addr ffff888242305000 by task kworker/u96:5/574
[ 173.363795] CPU: 3 PID: 574 Comm: kworker/u96:5 Not tainted 5.6.0-rc4 #1
[ 173.371350] Hardware name: Dell Inc. PowerEdge R320/0KM5PX, BIOS 2.7.0 08/19/2019
[ 173.379787] Workqueue: nvme-reset-wq nvme_rdma_reset_ctrl_work [nvme_rdma]
[ 173.387537] Call Trace:
[ 173.390305] dump_stack+0x96/0xe0
[ 173.394062] ? qed_chain_free+0x6d2/0x7f0 [qed]
[ 173.399175] print_address_description.constprop.6+0x1b/0x220
[ 173.405667] ? qed_chain_free+0x6d2/0x7f0 [qed]
[ 173.410790] ? qed_chain_free+0x6d2/0x7f0 [qed]
[ 173.415915] __kasan_report.cold.9+0x37/0x77
[ 173.420737] ? quarantine_put+0x10/0x160
[ 173.425174] ? qed_chain_free+0x6d2/0x7f0 [qed]
[ 173.430290] kasan_report+0xe/0x20
[ 173.434141] qed_chain_free+0x6d2/0x7f0 [qed]
[ 173.439067] ? __kasan_slab_free+0x13a/0x170
[ 173.443915] ? qed_hw_remove+0x2b0/0x2b0 [qed]
[ 173.448947] qedr_cleanup_kernel+0xbf/0x420 [qedr]
[ 173.454367] ? qed_rdma_modify_srq+0x3c0/0x3c0 [qed]
[ 173.459991] ? qed_rdma_modify_srq+0x3c0/0x3c0 [qed]
[ 173.465595] qedr_destroy_qp+0x2b0/0x670 [qedr]
[ 173.470703] ? qedr_query_qp+0x1300/0x1300 [qedr]
[ 173.476015] ? _raw_spin_unlock_irq+0x24/0x30
[ 173.480944] ? wait_for_completion+0xc2/0x3d0
[ 173.485861] ? wait_for_completion_interruptible+0x440/0x440
[ 173.492258] ? mark_held_locks+0x78/0x130
[ 173.496784] ? _raw_spin_unlock_irqrestore+0x3e/0x50
[ 173.502385] ? lockdep_hardirqs_on+0x388/0x570
[ 173.507432] ib_destroy_qp_user+0x2ba/0x6c0 [ib_core]
[ 173.513148] nvme_rdma_destroy_queue_ib+0xc8/0x1a0 [nvme_rdma]
[ 173.519732] nvme_rdma_free_queue+0x2a/0x60 [nvme_rdma]
[ 173.525626] nvme_rdma_destroy_io_queues+0xb1/0x170 [nvme_rdma]
[ 173.532321] nvme_rdma_shutdown_ctrl+0x62/0xd0 [nvme_rdma]
[ 173.538510] nvme_rdma_reset_ctrl_work+0x2c/0xa0 [nvme_rdma]
[ 173.544908] process_one_work+0x920/0x1740
[ 173.549546] ? pwq_dec_nr_in_flight+0x2d0/0x2d0
[ 173.556213] worker_thread+0x87/0xb40
[ 173.561852] ? process_one_work+0x1740/0x1740
[ 173.568315] kthread+0x333/0x400
[ 173.573492] ? kthread_create_on_node+0xc0/0xc0
[ 173.580094] ret_from_fork+0x3a/0x50
[ 173.588720] The buggy address belongs to the page:
[ 173.595559] page:ffffea000908c140 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0
[ 173.606353] flags: 0x17ffffc0000000()
[ 173.611933] raw: 0017ffffc0000000 0000000000000000 ffffea000908c108 0000000000000000
[ 173.622111] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
[ 173.632309] page dumped because: kasan: bad access detected
[ 173.643129] Memory state around the buggy address:
[ 173.649935] ffff888242304f00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[ 173.659491] ffff888242304f80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[ 173.669005] >ffff888242305000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[ 173.678511] ^
[ 173.683535] ffff888242305080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[ 173.693079] ffff888242305100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[ 173.702578] ==================================================================
[ 173.712132] Disabling lock debugging due to kernel taint
[ 174.480153] nvme nvme0: creating 12 I/O queues.
target:
[ 105.823713] null_blk: module loaded
[ 106.188952] nvmet: adding nsid 1 to subsystem testnqn
[ 106.209393] iwpm_register_pid: Unable to send a nlmsg (client = 2)
[ 106.217135] nvmet_rdma: enabling port 2 (172.31.50.102:4420)
[ 148.399402] nvmet: creating controller 1 for subsystem testnqn for NQN nqn.2014-08.org.nvmexpress:uuid:5cf10916-b4ca-45cd-956c-ab680ec3e636.
[ 167.842446]
[ 167.844133] ======================================================
[ 167.851087] WARNING: possible circular locking dependency detected
[ 167.858037] 5.6.0-rc4 #2 Not tainted
[ 167.862070] ------------------------------------------------------
[ 167.869018] kworker/2:1/141 is trying to acquire lock:
[ 167.874794] ffff8882581573e0 (&id_priv->handler_mutex){+.+.}, at: rdma_destroy_id+0x108/0xc70 [rdma_cm]
[ 167.885361]
[ 167.885361] but task is already holding lock:
[ 167.891918] ffffc9000403fe00 ((work_completion)(&queue->release_work)){+.+.}, at: process_one_work+0x82d/0x1740
[ 167.903261]
[ 167.903261] which lock already depends on the new lock.
[ 167.903261]
[ 167.912449]
[ 167.912449] the existing dependency chain (in reverse order) is:
[ 167.920861]
[ 167.920861] -> #3 ((work_completion)(&queue->release_work)){+.+.}:
[ 167.929468] process_one_work+0x87f/0x1740
[ 167.934656] worker_thread+0x87/0xb40
[ 167.939356] kthread+0x333/0x400
[ 167.943569] ret_from_fork+0x3a/0x50
[ 167.948172]
[ 167.948172] -> #2 ((wq_completion)events){+.+.}:
[ 167.955038] flush_workqueue+0xf7/0x13c0
[ 167.960028] nvmet_rdma_queue_connect+0x15e0/0x1d60 [nvmet_rdma]
[ 167.967368] cma_cm_event_handler+0xb7/0x550 [rdma_cm]
[ 167.973743] iw_conn_req_handler+0x93c/0xdb0 [rdma_cm]
[ 167.980104] cm_work_handler+0x12d2/0x1920 [iw_cm]
[ 167.986093] process_one_work+0x920/0x1740
[ 167.991280] worker_thread+0x87/0xb40
[ 167.995984] kthread+0x333/0x400
[ 168.000199] ret_from_fork+0x3a/0x50
[ 168.004804]
[ 168.004804] -> #1 (&id_priv->handler_mutex/1){+.+.}:
[ 168.012051] __mutex_lock+0x13e/0x1420
[ 168.016857] iw_conn_req_handler+0x369/0xdb0 [rdma_cm]
[ 168.023217] cm_work_handler+0x12d2/0x1920 [iw_cm]
[ 168.030777] process_one_work+0x920/0x1740
[ 168.037564] worker_thread+0x87/0xb40
[ 168.043852] kthread+0x333/0x400
[ 168.049681] ret_from_fork+0x3a/0x50
[ 168.055836]
[ 168.055836] -> #0 (&id_priv->handler_mutex){+.+.}:
[ 168.065925] __lock_acquire+0x2313/0x3d90
[ 168.072505] lock_acquire+0x15a/0x3c0
[ 168.078664] __mutex_lock+0x13e/0x1420
[ 168.084907] rdma_destroy_id+0x108/0xc70 [rdma_cm]
[ 168.092257] nvmet_rdma_release_queue_work+0xcc/0x390 [nvmet_rdma]
[ 168.101213] process_one_work+0x920/0x1740
[ 168.107798] worker_thread+0x87/0xb40
[ 168.113875] kthread+0x333/0x400
[ 168.119461] ret_from_fork+0x3a/0x50
[ 168.125397]
[ 168.125397] other info that might help us debug this:
[ 168.125397]
[ 168.138272] Chain exists of:
[ 168.138272] &id_priv->handler_mutex --> (wq_completion)events --> (work_completion)(&queue->release_work)
[ 168.138272]
[ 168.157850] Possible unsafe locking scenario:
[ 168.157850]
[ 168.166962]        CPU0                    CPU1
[ 168.173311]        ----                    ----
[ 168.179615]   lock((work_completion)(&queue->release_work));
[ 168.187210]                                lock((wq_completion)events);
[ 168.195872]                                lock((work_completion)(&queue->release_work));
[ 168.206278]   lock(&id_priv->handler_mutex);
[ 168.212326]
[ 168.212326] *** DEADLOCK ***
[ 168.212326]
[ 168.222662] 2 locks held by kworker/2:1/141:
[ 168.228688] #0: ffff888107c23148 ((wq_completion)events){+.+.}, at: process_one_work+0x7f9/0x1740
[ 168.240058] #1: ffffc9000403fe00 ((work_completion)(&queue->release_work)){+.+.}, at: process_one_work+0x82d/0x1740
[ 168.253205]
[ 168.253205] stack backtrace:
[ 168.260685] CPU: 2 PID: 141 Comm: kworker/2:1 Not tainted 5.6.0-rc4 #2
[ 168.269346] Hardware name: Dell Inc. PowerEdge R320/0KM5PX, BIOS 2.7.0 08/19/2019
[ 168.279143] Workqueue: events nvmet_rdma_release_queue_work [nvmet_rdma]
[ 168.288059] Call Trace:
[ 168.292229] dump_stack+0x96/0xe0
[ 168.297373] check_noncircular+0x356/0x410
[ 168.303372] ? ftrace_caller_op_ptr+0xe/0xe
[ 168.309444] ? print_circular_bug.isra.40+0x1e0/0x1e0
[ 168.316482] ? unwind_next_frame+0x1c5/0x1bb0
[ 168.322765] ? ret_from_fork+0x3a/0x50
[ 168.328372] ? perf_trace_lock_acquire+0x630/0x630
[ 168.335165] ? mark_lock+0xbe/0x1130
[ 168.340545] ? nvmet_rdma_release_queue_work+0xcc/0x390 [nvmet_rdma]
[ 168.349120] __lock_acquire+0x2313/0x3d90
[ 168.355086] ? ret_from_fork+0x3a/0x50
[ 168.360701] ? lockdep_hardirqs_on+0x570/0x570
[ 168.367151] ? stack_trace_save+0x8a/0xb0
[ 168.373093] ? stack_trace_consume_entry+0x160/0x160
[ 168.380122] lock_acquire+0x15a/0x3c0
[ 168.385648] ? rdma_destroy_id+0x108/0xc70 [rdma_cm]
[ 168.392633] __mutex_lock+0x13e/0x1420
[ 168.398287] ? rdma_destroy_id+0x108/0xc70 [rdma_cm]
[ 168.405298] ? sched_clock+0x5/0x10
[ 168.410614] ? rdma_destroy_id+0x108/0xc70 [rdma_cm]
[ 168.417582] ? find_held_lock+0x3a/0x1c0
[ 168.423452] ? mutex_lock_io_nested+0x12a0/0x12a0
[ 168.430200] ? mark_lock+0xbe/0x1130
[ 168.435644] ? mark_held_locks+0x78/0x130
[ 168.441569] ? _raw_spin_unlock_irqrestore+0x3e/0x50
[ 168.448574] ? rdma_destroy_id+0x108/0xc70 [rdma_cm]
[ 168.455564] rdma_destroy_id+0x108/0xc70 [rdma_cm]
[ 168.462344] nvmet_rdma_release_queue_work+0xcc/0x390 [nvmet_rdma]
[ 168.470665] process_one_work+0x920/0x1740
[ 168.476651] ? pwq_dec_nr_in_flight+0x2d0/0x2d0
[ 168.483200] worker_thread+0x87/0xb40
[ 168.488751] ? __kthread_parkme+0xc3/0x190
[ 168.494769] ? process_one_work+0x1740/0x1740
[ 168.501104] kthread+0x333/0x400
[ 168.506157] ? kthread_create_on_node+0xc0/0xc0
[ 168.512619] ret_from_fork+0x3a/0x50
[ 168.523833] ==================================================================
[ 168.533446] BUG: KASAN: use-after-free in qed_chain_free+0x6d2/0x7f0 [qed]
[ 168.542658] Read of size 8 at addr ffff88823b8dd000 by task kworker/2:1/141
[ 168.551970]
[ 168.555135] CPU: 2 PID: 141 Comm: kworker/2:1 Not tainted 5.6.0-rc4 #2
[ 168.563922] Hardware name: Dell Inc. PowerEdge R320/0KM5PX, BIOS 2.7.0 08/19/2019
[ 168.573747] Workqueue: events nvmet_rdma_release_queue_work [nvmet_rdma]
[ 168.582718] Call Trace:
[ 168.586936] dump_stack+0x96/0xe0
[ 168.592134] ? qed_chain_free+0x6d2/0x7f0 [qed]
[ 168.598635] print_address_description.constprop.6+0x1b/0x220
[ 168.606500] ? qed_chain_free+0x6d2/0x7f0 [qed]
[ 168.613049] ? qed_chain_free+0x6d2/0x7f0 [qed]
[ 168.619527] __kasan_report.cold.9+0x37/0x77
[ 168.625793] ? qed_chain_free+0x6d2/0x7f0 [qed]
[ 168.632309] kasan_report+0xe/0x20
[ 168.637576] qed_chain_free+0x6d2/0x7f0 [qed]
[ 168.643913] ? __kasan_slab_free+0x13a/0x170
[ 168.650140] ? qed_hw_remove+0x2b0/0x2b0 [qed]
[ 168.656527] qedr_cleanup_kernel+0xbf/0x420 [qedr]
[ 168.663339] ? qed_rdma_modify_srq+0x3c0/0x3c0 [qed]
[ 168.670321] ? qed_rdma_modify_srq+0x3c0/0x3c0 [qed]
[ 168.677273] qedr_destroy_qp+0x2b0/0x670 [qedr]
[ 168.683714] ? qedr_query_qp+0x1300/0x1300 [qedr]
[ 168.690368] ? _raw_spin_unlock_irq+0x24/0x30
[ 168.696614] ? wait_for_completion+0xc2/0x3d0
[ 168.702877] ? wait_for_completion_interruptible+0x440/0x440
[ 168.710616] ib_destroy_qp_user+0x2ba/0x6c0 [ib_core]
[ 168.717630] ? nvmet_rdma_release_queue_work+0xcc/0x390 [nvmet_rdma]
[ 168.726186] nvmet_rdma_release_queue_work+0xd6/0x390 [nvmet_rdma]
[ 168.734511] process_one_work+0x920/0x1740
[ 168.740492] ? pwq_dec_nr_in_flight+0x2d0/0x2d0
[ 168.746992] worker_thread+0x87/0xb40
[ 168.752524] ? __kthread_parkme+0xc3/0x190
[ 168.758547] ? process_one_work+0x1740/0x1740
[ 168.764890] kthread+0x333/0x400
[ 168.769952] ? kthread_create_on_node+0xc0/0xc0
[ 168.776445] ret_from_fork+0x3a/0x50
[ 168.781870]
[ 168.784929] Allocated by task 802:
[ 168.790166] save_stack+0x19/0x80
[ 168.795248] __kasan_kmalloc.constprop.11+0xc1/0xd0
[ 168.802127] kmem_cache_alloc+0xf9/0x370
[ 168.807931] getname_flags+0xba/0x510
[ 168.813388] user_path_at_empty+0x1d/0x40
[ 168.819250] do_faccessat+0x21f/0x610
[ 168.824700] do_syscall_64+0x9f/0x4f0
[ 168.830184] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 168.837191]
[ 168.840197] Freed by task 802:
[ 168.844919] save_stack+0x19/0x80
[ 168.849914] __kasan_slab_free+0x125/0x170
[ 168.855758] kmem_cache_free+0xcd/0x3c0
[ 168.861291] filename_lookup.part.63+0x1e7/0x330
[ 168.867662] do_faccessat+0x21f/0x610
[ 168.872961] do_syscall_64+0x9f/0x4f0
[ 168.878246] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 168.885124]
[ 168.887977] The buggy address belongs to the object at ffff88823b8dc400
[ 168.887977] which belongs to the cache names_cache of size 4096
[ 168.904533] The buggy address is located 3072 bytes inside of
[ 168.904533] 4096-byte region [ffff88823b8dc400, ffff88823b8dd400)
[ 168.920388] The buggy address belongs to the page:
[ 168.927046] page:ffffea0008ee3600 refcount:1 mapcount:0 mapping:ffff888107fe5280 index:0x0 compound_mapcount: 0
[ 168.939650] flags: 0x17ffffc0010200(slab|head)
[ 168.945986] raw: 0017ffffc0010200 dead000000000100 dead000000000122 ffff888107fe5280
[ 168.956074] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[ 168.966185] page dumped because: kasan: bad access detected
[ 168.973835]
[ 168.976923] Memory state around the buggy address:
[ 168.983705] ffff88823b8dcf00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 168.993270] ffff88823b8dcf80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 169.002799] >ffff88823b8dd000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 169.012337] ^
[ 169.017384] ffff88823b8dd080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 169.026938] ffff88823b8dd100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 169.036457] ==================================================================
[ 169.183961] nvmet: creating controller 1 for subsystem testnqn for NQN nqn.2014-08.org.nvmexpress:uuid:5cf10916-b4ca-45cd-956c-ab680ec3e636.
Best Regards,
Yi Zhang