Linux-NVME Archive on lore.kernel.org
From: John Garry <john.garry@huawei.com>
To: Keith Busch <kbusch@kernel.org>
Cc: sagi@grimberg.me, Robin Murphy <robin.murphy@arm.com>,
	linux-nvme@lists.infradead.org, Christoph Hellwig <hch@lst.de>,
	axboe@fb.com, Will Deacon <will@kernel.org>,
	Alexey Dobriyan <adobriyan@gmail.com>
Subject: Re: [PATCH] nvme-pci: slimmer CQ head update
Date: Thu, 7 May 2020 16:11:23 +0100	[thread overview]
Message-ID: <8b297620-c72b-2184-36cb-032f5cfda05c@huawei.com> (raw)
In-Reply-To: <20200507142352.GA2621422@dhcp-10-100-145-180.wdl.wdc.com>

On 07/05/2020 15:23, Keith Busch wrote:
> On Thu, May 07, 2020 at 02:55:37PM +0100, John Garry wrote:
>> On 07/05/2020 12:04, Robin Murphy wrote:
>>>> [  177.132810] DMA-API: nvme 0000:85:00.0: device driver tries to free DMA memory it has not allocated [device address=0x00000000ef371000] [size=4096 bytes]
>>> [...]
>>>> [  177.276322]  debug_dma_unmap_page+0x6c/0x78
>>>> [  177.280487]  nvme_unmap_data+0x7c/0x23c
>>>> [  177.284305]  nvme_pci_complete_rq+0x28/0x58
>>>
>>> OK, so there's clearly something amiss there. I would have suggested
>>> next sticking the SMMU in passthrough to help focus on the DMA API
>>> debugging, but since that "DMA address" looks suspiciously like a
>>> physical address rather than an IOVA, I suspect that things might
>>> suddenly appear to be working fine if you do...
>>
>> OK, seems sensible. However it looks like this guy triggers the issue:
>>
>> 324b494c2862 nvme-pci: Remove two-pass completions
>>
>> Carrying the revert of $subject, it's a quick bisect to that patch.
> 
> That's weird.

Or maybe exacerbating some other fault?

> Do you see this with different nvme controllers?

I only have three, and they are all ES3000 V3 NVMe PCIe SSDs.

> Does your
> controller write the phase bit before writing the command id in the cqe?

I don't know. Is that sort of info available from nvme-cli?

> Asking because this looks like we're seeing an older command id in the
> cqe, and the only thing that patch you've bisected should do is remove a
> delay between observing the new phase and reading the command id.
> .
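For context on why that ordering matters: the driver treats a CQE as valid as soon as its phase tag matches the expected value, and after 324b494c2862 the command id is read in the same pass. Below is a rough, simplified sketch of that phase/head logic; the struct layout and function names are toy stand-ins for illustration, not the actual nvme-pci code.

```c
#include <assert.h>
#include <stdint.h>

/* Toy completion queue entry; real NVMe CQEs are 16 bytes with more
 * fields, and bit 0 of the status word is the phase tag. */
struct cqe {
    uint16_t command_id;
    uint16_t status;            /* bit 0: phase tag */
};

/* The host owns an entry only while its phase bit matches the phase
 * it currently expects; the expected phase flips on every wrap. */
static int cqe_pending(const struct cqe *cq, uint16_t head, uint8_t phase)
{
    return (cq[head].status & 1) == phase;
}

/* Single-pass processing: each CQE is consumed (command id read) as
 * soon as its phase matches, with no second pass over the batch. */
static int process_cq(struct cqe *cq, uint16_t qsize,
                      uint16_t *head, uint8_t *phase,
                      uint16_t *ids, int max_ids)
{
    int found = 0;

    while (cqe_pending(cq, *head, *phase)) {
        if (found < max_ids)
            ids[found] = cq[*head].command_id;
        found++;
        if (++(*head) == qsize) {   /* wrapped: flip expected phase */
            *head = 0;
            *phase ^= 1;
        }
    }
    return found;
}
```

If a controller were to flip the phase bit before the command id landed, this single-pass consumer could read a stale id, which would look exactly like freeing a mapping the driver never made.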

Another log, below, with SMMU off.

John


fio-2.1.10
Starting 60 processes
Jobs: 60 (f=60): [RRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRRR]
[  885.335343] ------------[ cut here ]------------
[  885.335999] ------------[ cut here ]------------
[  885.339967] DMA-API: nvme 0000:82:00.0: device driver tries to free DMA memory it has not allocated [device address=0x0000002fd5870000] [size=4096 bytes]
[  885.344575] WARNING: CPU: 41 PID: 4565 at block/blk-mq.c:665 blk_mq_start_request+0xc4/0xcc
[  885.344577] Modules linked in:
[  885.358287] WARNING: CPU: 39 PID: 1074 at kernel/dma/debug.c:1014 check_unmap+0x698/0x86c
[  885.366601] CPU: 41 PID: 4565 Comm: fio Not tainted 5.6.0-rc4-gd64d242-dirty #155
[  885.369645] Modules linked in:
[  885.377799] Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS 2280-V2 CS V3.B220.02 03/27/2020
[  885.385262] CPU: 39 PID: 1074 Comm: irq/230-nvme1q2 Not tainted 5.6.0-rc4-gd64d242-dirty #155
[  885.388308] pstate: 60400009 (nZCv daif +PAN -UAO)
[  885.397155] Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS 2280-V2 CS V3.B220.02 03/27/2020
[  885.397157] pstate: 60c00009 (nZCv daif +PAN +UAO)
[  885.405656] pc : blk_mq_start_request+0xc4/0xcc
[  885.405662] lr : nvme_queue_rq+0x134/0x7cc
[  885.410437] pc : check_unmap+0x698/0x86c
[  885.419281] sp : ffff800025ccb770
[  885.419283] x29: ffff800025ccb770 x28: ffff002fdc16d200
[  885.424061] lr : check_unmap+0x698/0x86c
[  885.428577] x27: ffff002fdc16d318 x26: fffffe00bf3621c0
[  885.432663] sp : ffff8000217dbb40
[  885.436574] x25: 0000000000001000 x24: 0000000000000000
[  885.439881] x29: ffff8000217dbb40 x28: ffff002fdc6c6cd4
[  885.445177] x23: 0000000000001000 x22: ffff2027c7540000
[  885.449088] x27: ffffa99cc3c7f000 x26: 0000000000001000
[  885.454387] x21: ffff800025ccb8b0 x20: ffff2027c6ae0000
[  885.457694] x25: 0000000000000000 x24: ffff2027c7540000
[  885.462990] x19: ffff002fdc16d200 x18: 0000000000000000
[  885.468288] x23: ffffa99cc55630d0 x22: ffffa99cc530a000
[  885.473584] x17: 0000000000000000 x16: 0000000000000000
[  885.478882] x21: 0000002fd5870000 x20: ffff8000217dbbc0
[  885.484178] x15: 0000000000000000 x14: 0000000000000000
[  885.489477] x19: 0000002fd5870000 x18: 0000000000000000
[  885.494773] x13: 0000000066641000 x12: ffff2027a15cf9a0
[  885.500071] x17: 0000000000000000 x16: 0000000000000000
[  885.505367] x11: 0000000026641fff x10: 0000000000000002
[  885.510665] x15: 0000000000000000 x14: 7a69735b205d3030
[  885.515964] x9 : 0000000000a80000 x8 : ffff2027d3cb9ac8
[  885.521264] x13: 3030373835646632 x12: 3030303030307830
[  885.526559] x7 : ffffa99cc4f34000 x6 : 0000002fd5887000
[  885.531856] x11: 3d73736572646461 x10: 206563697665645b
[  885.537152] x5 : 0000000000000000 x4 : 0000000000000000
[  885.542449] x9 : ffffa99cc5321bc8 x8 : 6c6120746f6e2073
[  885.547745] x3 : ffff2027d039c0b0 x2 : 0000000000000000
[  885.553042] x7 : 6168207469207972 x6 : ffff002fffdbe1b8
[  885.558338] x1 : 0000000100000000 x0 : 0000000000000002
[  885.563634] x5 : 0000000000000000 x4 : 0000000000000000
[  885.568932] Call trace:
[  885.574228] x3 : 0000000000000000 x2 : ffff002fffdc5088
[  885.579531]  blk_mq_start_request+0xc4/0xcc
[  885.584821] x1 : 0000000100000001 x0 : 000000000000008d
[  885.590120]  nvme_queue_rq+0x134/0x7cc
[  885.595414] Call trace:
[  885.597858]  __blk_mq_try_issue_directly+0x108/0x1bc
[  885.603158]  check_unmap+0x698/0x86c
[  885.607324]  blk_mq_request_issue_directly+0x40/0x64
[  885.612620]  debug_dma_unmap_page+0x6c/0x78
[  885.616359]  blk_mq_try_issue_list_directly+0x50/0xc8
[  885.618800]  nvme_unmap_data+0x7c/0x23c
[  885.623752]  blk_mq_sched_insert_requests+0x170/0x1d0
[  885.623753]  blk_mq_flush_plug_list+0x10c/0x158
[  885.627318]  nvme_pci_complete_rq+0x3c/0x10c
[  885.632271]  blk_flush_plug_list+0xc4/0xd4
[  885.632273]  blk_finish_plug+0x30/0x40
[  885.636444]  blk_mq_complete_request+0x114/0x150
[  885.641484]  blkdev_direct_IO+0x3d4/0x444
[  885.645306]  nvme_irq+0xbc/0x204
[  885.650346]  generic_file_read_iter+0x90/0xaec
[  885.654863]  irq_thread_fn+0x28/0x6c
[  885.659118]  blkdev_read_iter+0x3c/0x54
[  885.663203]  irq_thread+0x158/0x1e8
[  885.666943]  aio_read+0xdc/0x138
[  885.671548]  kthread+0xf4/0x120
[  885.675544]  io_submit_one+0x4ac/0xbf0
[  885.675546]  __arm64_sys_io_submit+0x16c/0x1f8
[  885.678766]  ret_from_fork+0x10/0x18
[  885.683199]  el0_svc_common.constprop.3+0xb8/0x170
[  885.686765] ---[ end trace fc66a57b25e362aa ]---
[  885.690593]  do_el0_svc+0x70/0x88
[  885.724844]  el0_sync_handler+0xf4/0x130
[  885.728758]  el0_sync+0x140/0x180
[  885.732065] ---[ end trace fc66a57b25e362ab ]---
[  885.736768] ------------[ cut here ]------------
[  885.741379] refcount_t: underflow; use-after-free.
[  885.746184] WARNING: CPU: 39 PID: 1074 at lib/refcount.c:28 refcount_warn_saturate+0x6c/0x13c
[  885.754687] Modules linked in:
[  885.757736] CPU: 39 PID: 1074 Comm: irq/230-nvme1q2 Tainted: G        W         5.6.0-rc4-gd64d242-dirty #155
[  885.767623] Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS 2280-V2 CS V3.B220.02 03/27/2020
[  885.776471] pstate: 60c00009 (nZCv daif +PAN +UAO)
[  885.781250] pc : refcount_warn_saturate+0x6c/0x13c
[  885.786028] lr : refcount_warn_saturate+0x6c/0x13c
[  885.790805] sp : ffff8000217dbc40
[  885.794112] x29: ffff8000217dbc40 x28: ffff002fdc6c6cd4
[  885.799411] x27: ffffa99cc3c7f000 x26: ffffa99cc3c7f948
[  885.804710] x25: 0000000000000001 x24: ffffa99cc4cd3710
[  885.810007] x23: fffffffffffffff8 x22: ffffde07ebde5680
[  885.815305] x21: 0000000000000000 x20: ffff2027c6ae0000
[  885.820603] x19: ffff002fdc16d200 x18: 0000000000000000
[  885.825901] x17: 0000000000000000 x16: 0000000000000000
[  885.831199] x15: 0000000000000000 x14: ffff002fdd922948
[  885.836498] x13: ffff002fdd922150 x12: 0000000000000000
[  885.841796] x11: 00000000000008a4 x10: 000000000000000f
[  885.847094] x9 : ffffa99cc5321bc8 x8 : 72657466612d6573
[  885.852391] x7 : 75203b776f6c6672 x6 : ffff002fffdbe1b8
[  885.857689] x5 : 0000000000000000 x4 : 0000000000000000
[  885.862986] x3 : 0000000000000000 x2 : ffff002fffdc5088
[  885.868284] x1 : 0000000100000001 x0 : 0000000000000026
[  885.873582] Call trace:
[  885.876028]  refcount_warn_saturate+0x6c/0x13c
[  885.880462]  blk_mq_free_request+0x12c/0x14c
[  885.884723]  blk_mq_end_request+0x114/0x134
[  885.888898]  nvme_complete_rq+0x50/0x128
[  885.892811]  nvme_pci_complete_rq+0x44/0x10c
[  885.897070]  blk_mq_complete_request+0x114/0x150
[  885.901674]  nvme_irq+0xbc/0x204
[  885.904898]  irq_thread_fn+0x28/0x6c
[  885.908464]  irq_thread+0x158/0x1e8
[  885.911945]  kthread+0xf4/0x120
[  885.915080]  ret_from_fork+0x10/0x18
[  885.918646] ---[ end trace fc66a57b25e362ac ]---
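The refcount underflow in this last trace is consistent with the same request being completed twice (blk_mq_free_request dropping a reference that was already gone). A toy model of that failure mode, assuming a simplified saturating counter in the spirit of the kernel's refcount_t (this is an illustration, not the kernel's implementation):

```c
#include <assert.h>

/* Toy saturating refcount: a put on a zero count is flagged as an
 * underflow (use-after-free) instead of silently wrapping negative,
 * which is roughly when refcount_warn_saturate() would fire. */
struct ref {
    int count;
    int underflow;
};

static void ref_get(struct ref *r)
{
    r->count++;
}

static void ref_put(struct ref *r)
{
    if (r->count == 0) {
        r->underflow = 1;   /* double completion / double free detected */
        return;
    }
    r->count--;
}
```

Completing a request once drops its count to zero as expected; a spurious second completion for the same command id is what trips the underflow warning.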


_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme


Thread overview: 31+ messages
2020-02-28 18:45 [PATCH] nvme-pci: slimmer CQ head update Alexey Dobriyan
2020-02-29  5:53 ` Keith Busch
2020-05-06 11:03   ` John Garry
2020-05-06 12:47     ` Keith Busch
2020-05-06 13:24       ` Alexey Dobriyan
2020-05-06 13:44         ` John Garry
2020-05-06 14:01           ` Alexey Dobriyan
2020-05-06 14:35           ` Christoph Hellwig
2020-05-06 16:26             ` John Garry
2020-05-06 16:31               ` Will Deacon
2020-05-06 16:52                 ` Robin Murphy
2020-05-06 17:02                   ` John Garry
2020-05-07  8:18                     ` John Garry
2020-05-07 11:04                       ` Robin Murphy
2020-05-07 13:55                         ` John Garry
2020-05-07 14:23                           ` Keith Busch
2020-05-07 15:11                             ` John Garry [this message]
2020-05-07 15:35                               ` Keith Busch
2020-05-07 15:41                                 ` John Garry
2020-05-08 16:16                                   ` Keith Busch
2020-05-08 17:04                                     ` John Garry
2020-05-07 16:26                                 ` Robin Murphy
2020-05-07 17:35                                   ` Keith Busch
2020-05-07 17:44                                     ` Will Deacon
2020-05-07 18:06                                       ` Keith Busch
2020-05-08 11:40                                         ` Will Deacon
2020-05-08 14:07                                           ` Keith Busch
2020-05-08 15:34                                             ` Keith Busch
2020-05-06 14:44         ` Keith Busch
2020-05-07 15:58           ` Keith Busch
2020-05-07 20:07             ` [PATCH] nvme-pci: fix "slimmer CQ head update" Alexey Dobriyan
