Linux-NVME Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: John Garry <john.garry@huawei.com>
To: Keith Busch <kbusch@kernel.org>, Alexey Dobriyan <adobriyan@gmail.com>
Cc: axboe@fb.com, hch@lst.de, linux-nvme@lists.infradead.org,
	sagi@grimberg.me
Subject: Re: [PATCH] nvme-pci: slimmer CQ head update
Date: Wed, 6 May 2020 12:03:35 +0100	[thread overview]
Message-ID: <defb25c5-5ae5-5ff9-66db-efb129bd7743@huawei.com> (raw)
In-Reply-To: <20200229055351.GA542920@dhcp-10-100-145-180.wdl.wdc.com>

On 29/02/2020 05:53, Keith Busch wrote:
> On Fri, Feb 28, 2020 at 09:45:19PM +0300, Alexey Dobriyan wrote:
>> @@ -982,11 +982,9 @@ static void nvme_complete_cqes(struct nvme_queue *nvmeq, u16 start, u16 end)
>>   

I think that this patch may be broken when use_threaded_handlers=1, as I 
see this crash on my arm64 system:

[   29.366700] Unable to handle kernel paging request at virtual address 
ffff800
01178500e
[   29.375316] Mem abort info:
[   29.378096]   ESR = 0x96000007
[   29.381135]   EC = 0x25: DABT (current EL), IL = 32 bits
[   29.386422]   SET = 0, FnV = 0
[   29.389461]   EA = 0, S1PTW = 0
[   29.392587] Data abort info:
[   29.395456]   ISV = 0, ISS = 0x00000007
[   29.399272]   CM = 0, WnR = 0
[   29.402226] swapper pgtable: 4k pages, 48-bit VAs, pgdp=000020252be08000
[   29.408896] [ffff80001178500e] pgd=00000027ff835003, 
pud=00000027ff836003, pm
d=00000027ffa53003, pte=0000000000000000
[   29.419457] Internal error: Oops: 96000007 [#1] PREEMPT SMP
[   29.425003] Modules linked in:
[   29.428046] CPU: 17 PID: 1023 Comm: irq/137-nvme2q2 Not tainted 
5.7.0-rc4-ga3
c4a5a-dirty #117
[   29.436528] Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS 
2280-V2 CS V
3.B220.02 03/27/2020
[   29.445356] pstate: 40400089 (nZcv daIf +PAN -UAO)
[   29.450133] pc : nvme_irq_check+0x10/0x28
[   29.454130] lr : __handle_irq_event_percpu+0x5c/0x144
[   29.459158] sp : ffff80001008be90
[   29.462458] x29: ffff80001008be90 x28: ffff002fd7b144c0
[   29.467746] x27: ffffb111ce83e440 x26: ffffb111ce83e468
[   29.473032] x25: ffffb111cf14426f x24: ffff0027d6b67800
[   29.478319] x23: 0000000000000089 x22: ffff80001008bf1c
[   29.483606] x21: 0000000000000000 x20: ffff0027d6b67800
[   29.488893] x19: ffff2027ca8e6880 x18: 0000000000000000
[   29.494180] x17: 000007ffffffffff x16: 00000000000001de
[   29.499466] x15: 000000000001ffff x14: 0000000000000001
[   29.504754] x13: 000000000000002b x12: 000000000000002b
[   29.510041] x11: 000000000000ffff x10: ffffb111cef71cb8
[   29.515328] x9 : 0000000000000040 x8 : ffff2027d10fbe08
[   29.520616] x7 : 0000000000000000 x6 : 0000000000000000
[   29.525903] x5 : 0000000000000000 x4 : 0000000000000001
[   29.531190] x3 : 0000000000000400 x2 : ffff800011781000
[   29.536477] x1 : ffff800011785000 x0 : 0000000000000001
[   29.541764] Call trace:
[   29.544201]  nvme_irq_check+0x10/0x28
[   29.547847]  handle_irq_event_percpu+0x1c/0x54
[   29.552269]  handle_irq_event+0x44/0x74
[   29.556088]  handle_fasteoi_irq+0xa8/0x160
[   29.560167]  generic_handle_irq+0x2c/0x40
[   29.564158]  __handle_domain_irq+0x64/0xb0
[   29.568238]  gic_handle_irq+0x94/0x194
[   29.571970]  el1_irq+0xb8/0x180
[   29.575097]  nvme_irq+0xe4/0x1f4
[   29.578310]  irq_thread_fn+0x28/0x6c
[   29.581868]  irq_thread+0x158/0x1e8
[   29.585343]  kthread+0x124/0x150
[   29.588556]  ret_from_fork+0x10/0x18
[   29.592116] Code: 7940e023 f9402422 3941d020 8b031041 (79401c21)
[   29.598249] ---[ end trace 33259050d5109e6f ]---
[   29.607114] Kernel panic - not syncing: Fatal exception in interrupt
[   29.613459] SMP: stopping secondary CPUs
[   29.617404] Kernel Offset: 0x3111bd6d0000 from 0xffff800010000000
[   29.623468] PHYS_OFFSET: 0xffffe13f80000000
[   29.627633] CPU features: 0x080002,22808a18
[   29.631797] Memory Limit: none
[   29.639056] ---[ end Kernel panic - not syncing: Fatal exception in 
interrupt
]---

Reverting this patch makes the crash go away.

>>   static inline void nvme_update_cq_head(struct nvme_queue *nvmeq)
>>   {
>> -	if (nvmeq->cq_head == nvmeq->q_depth - 1) {
>> +	if (++nvmeq->cq_head == nvmeq->q_depth) {

I figure momentarily nvmeq->cq_head may transition to equal 
nvmeq->q_depth and then to 0, which causes an out-of-bounds access here:

static inline bool nvme_cqe_pending(struct nvme_queue *nvmeq)
{
	return (le16_to_cpu(nvmeq->cqes[nvmeq->cq_head].status) & 1) ==
			nvmeq->cq_phase;
}

>>   		nvmeq->cq_head = 0;
>> -		nvmeq->cq_phase = !nvmeq->cq_phase;
>> -	} else {
>> -		nvmeq->cq_head++;
>> +		nvmeq->cq_phase ^= 1;
>>   	}
>>   }
>>   

Any disagreement to sending a revert patch?

Thanks,
John

_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

  reply	other threads:[~2020-05-06 11:04 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-02-28 18:45 [PATCH] nvme-pci: slimmer CQ head update Alexey Dobriyan
2020-02-29  5:53 ` Keith Busch
2020-05-06 11:03   ` John Garry [this message]
2020-05-06 12:47     ` Keith Busch
2020-05-06 13:24       ` Alexey Dobriyan
2020-05-06 13:44         ` John Garry
2020-05-06 14:01           ` Alexey Dobriyan
2020-05-06 14:35           ` Christoph Hellwig
2020-05-06 16:26             ` John Garry
2020-05-06 16:31               ` Will Deacon
2020-05-06 16:52                 ` Robin Murphy
2020-05-06 17:02                   ` John Garry
2020-05-07  8:18                     ` John Garry
2020-05-07 11:04                       ` Robin Murphy
2020-05-07 13:55                         ` John Garry
2020-05-07 14:23                           ` Keith Busch
2020-05-07 15:11                             ` John Garry
2020-05-07 15:35                               ` Keith Busch
2020-05-07 15:41                                 ` John Garry
2020-05-08 16:16                                   ` Keith Busch
2020-05-08 17:04                                     ` John Garry
2020-05-07 16:26                                 ` Robin Murphy
2020-05-07 17:35                                   ` Keith Busch
2020-05-07 17:44                                     ` Will Deacon
2020-05-07 18:06                                       ` Keith Busch
2020-05-08 11:40                                         ` Will Deacon
2020-05-08 14:07                                           ` Keith Busch
2020-05-08 15:34                                             ` Keith Busch
2020-05-06 14:44         ` Keith Busch
2020-05-07 15:58           ` Keith Busch
2020-05-07 20:07             ` [PATCH] nvme-pci: fix "slimmer CQ head update" Alexey Dobriyan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=defb25c5-5ae5-5ff9-66db-efb129bd7743@huawei.com \
    --to=john.garry@huawei.com \
    --cc=adobriyan@gmail.com \
    --cc=axboe@fb.com \
    --cc=hch@lst.de \
    --cc=kbusch@kernel.org \
    --cc=linux-nvme@lists.infradead.org \
    --cc=sagi@grimberg.me \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox