* NVME regression in all kernels after 4.4.x for NVME in M2 slot for laptop?
@ 2016-08-05 19:03 ` Marc MERLIN
0 siblings, 0 replies; 5+ messages in thread
From: Marc MERLIN @ 2016-08-05 19:03 UTC (permalink / raw)
I've been stuck on 4.4.x for a while (currently 4.4.5) because any
subsequent kernel would fail to suspend or resume (S3 sleep) on my
Thinkpad P70.
Due to lack of time, I only got around to doing a git bisect now
(sorry), and did it between 4.4.0 and 4.5.0
It's my first bisect, but I hope I did it right outside of the fact that
my kernel wasn't exactly the same each time due to having my .config
file change depending on which kernel I ended up on.
However, the patch found by bisect makes sense that it would be a good
culprit.
I use an NVME 512GB SSD in my laptop, and I guess very few people use those
which could be why I'm the first/only person to report this.
Sadly because NVME changed a lot between 4.4 and 4.5 and I'm not a
kernel hacker, I can't just reverse apply the patch to 4.5 and see if it
works because I'd have to unroll a bunch of other changes too, and
that's a bit beyond my expertise and time at hand right now.
Would this patch make sense as being the reason why I can't S3 sleep
anymore and would you have a test patch against 4.5, 4.6, or 4.7 I can
try to see if it fixes the problem?
Symptom is that my red LED (the dot for in in thinkpad on the back
cover) goes flashing in weird ways when I shut the lid, but not always
the same pattern, however none are the normal on/off gentle pulsing that
indicate proper S3 sleep.
The caps lock key LED also flashes rapidly when I open the lid and the
laptop is stone dead at this point.
Boot logs on 4.4.5 kernel where sleep works fine:
[ 1.245549] ahci 0000:00:17.0: version 3.0
[ 1.245733] ahci 0000:00:17.0: AHCI 0001.0301 32 slots 2 ports 6 Gbps 0xc impl SATA mode
[ 1.245771] ahci 0000:00:17.0: flags: 64bit ncq sntf pm led clo only pio slum part ems deso sadm sds apst
[ 1.251140] scsi host0: ahci
[ 1.251587] scsi host1: ahci
[ 1.251972] scsi host2: ahci
[ 1.252360] scsi host3: ahci
[ 1.252437] ata1: DUMMY
[ 1.252449] ata2: DUMMY
[ 1.252462] ata3: SATA max UDMA/133 abar m2048 at 0xd584c000 port 0xd584c200 irq 122
[ 1.252499] ata4: SATA max UDMA/133 abar m2048 at 0xd584c000 port 0xd584c280 irq 122
[ 1.253374] scsi host4: pata_legacy
[ 1.253439] ata5: PATA max PIO4 cmd 0x1f0 ctl 0x3f6 irq 14
[ 1.355385] nvme0n1: p1 p2 p3 p4 p5 p6 p7 p8
[ 1.570804] ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 1.570877] ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 1.573097] ata3.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
[ 1.573101] ata3.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
[ 1.573690] ata3.00: supports DRM functions and may not be fully accessible
[ 1.574399] ata3.00: disabling queued TRIM support
[ 1.574402] ata3.00: ATA-9: Samsung SSD 850 EVO 2TB, EMT01B6Q, max UDMA/133
[ 1.574435] ata3.00: 3907029168 sectors, multi 1: LBA48 NCQ (depth 31/32), AA
[ 1.575954] ata3.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
[ 1.575958] ata3.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
[ 1.576550] ata3.00: supports DRM functions and may not be fully accessible
[ 1.577209] ata3.00: disabling queued TRIM support
[ 1.578007] ata3.00: configured for UDMA/133
[ 1.578037] ata4.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
[ 1.578040] ata4.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
Patch found by bisect, attached
Thanks,
Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
.... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
-------------- next part --------------
A non-text attachment was scrubbed...
Name: freeze.diff
Type: text/x-diff
Size: 3888 bytes
Desc: not available
URL: <http://lists.infradead.org/pipermail/linux-nvme/attachments/20160805/934c1a84/attachment.bin>
^ permalink raw reply [flat|nested] 5+ messages in thread
* NVME regression in all kernels after 4.4.x for NVME in M2 slot for laptop?
@ 2016-08-05 19:03 ` Marc MERLIN
0 siblings, 0 replies; 5+ messages in thread
From: Marc MERLIN @ 2016-08-05 19:03 UTC (permalink / raw)
To: keith.busch, axboe; +Cc: linux-nvme, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 3713 bytes --]
I've been stuck on 4.4.x for a while (currently 4.4.5) because any
subsequent kernel would fail to suspend or resume (S3 sleep) on my
Thinkpad P70.
Due to lack of time, I only got around to doing a git bisect now
(sorry), and did it between 4.4.0 and 4.5.0
It's my first bisect, but I hope I did it right outside of the fact that
my kernel wasn't exactly the same each time due to having my .config
file change depending on which kernel I ended up on.
However, the patch found by bisect makes sense that it would be a good
culprit.
I use an NVME 512GB SSD in my laptop, and I guess very few people use those
which could be why I'm the first/only person to report this.
Sadly because NVME changed a lot between 4.4 and 4.5 and I'm not a
kernel hacker, I can't just reverse apply the patch to 4.5 and see if it
works because I'd have to unroll a bunch of other changes too, and
that's a bit beyond my expertise and time at hand right now.
Would this patch make sense as being the reason why I can't S3 sleep
anymore and would you have a test patch against 4.5, 4.6, or 4.7 I can
try to see if it fixes the problem?
Symptom is that my red LED (the dot for in in thinkpad on the back
cover) goes flashing in weird ways when I shut the lid, but not always
the same pattern, however none are the normal on/off gentle pulsing that
indicate proper S3 sleep.
The caps lock key LED also flashes rapidly when I open the lid and the
laptop is stone dead at this point.
Boot logs on 4.4.5 kernel where sleep works fine:
[ 1.245549] ahci 0000:00:17.0: version 3.0
[ 1.245733] ahci 0000:00:17.0: AHCI 0001.0301 32 slots 2 ports 6 Gbps 0xc impl SATA mode
[ 1.245771] ahci 0000:00:17.0: flags: 64bit ncq sntf pm led clo only pio slum part ems deso sadm sds apst
[ 1.251140] scsi host0: ahci
[ 1.251587] scsi host1: ahci
[ 1.251972] scsi host2: ahci
[ 1.252360] scsi host3: ahci
[ 1.252437] ata1: DUMMY
[ 1.252449] ata2: DUMMY
[ 1.252462] ata3: SATA max UDMA/133 abar m2048@0xd584c000 port 0xd584c200 irq 122
[ 1.252499] ata4: SATA max UDMA/133 abar m2048@0xd584c000 port 0xd584c280 irq 122
[ 1.253374] scsi host4: pata_legacy
[ 1.253439] ata5: PATA max PIO4 cmd 0x1f0 ctl 0x3f6 irq 14
[ 1.355385] nvme0n1: p1 p2 p3 p4 p5 p6 p7 p8
[ 1.570804] ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 1.570877] ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 1.573097] ata3.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
[ 1.573101] ata3.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
[ 1.573690] ata3.00: supports DRM functions and may not be fully accessible
[ 1.574399] ata3.00: disabling queued TRIM support
[ 1.574402] ata3.00: ATA-9: Samsung SSD 850 EVO 2TB, EMT01B6Q, max UDMA/133
[ 1.574435] ata3.00: 3907029168 sectors, multi 1: LBA48 NCQ (depth 31/32), AA
[ 1.575954] ata3.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
[ 1.575958] ata3.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
[ 1.576550] ata3.00: supports DRM functions and may not be fully accessible
[ 1.577209] ata3.00: disabling queued TRIM support
[ 1.578007] ata3.00: configured for UDMA/133
[ 1.578037] ata4.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
[ 1.578040] ata4.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
Patch found by bisect, attached
Thanks,
Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
.... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
[-- Attachment #2: freeze.diff --]
[-- Type: text/x-diff, Size: 3888 bytes --]
saruman:/usr/src/linux# git bisect good
25646264e15af96c5c630fc742708b1eb3339222 is the first bad commit
commit 25646264e15af96c5c630fc742708b1eb3339222
Author: Keith Busch <keith.busch@intel.com>
Date: Mon Jan 4 09:10:57 2016 -0700
NVMe: Remove queue freezing on resets
NVMe submits all commands through the block layer now. This means we
can let requests queue at the blk-mq hardware context since there is no
path that bypasses this anymore so we don't need to freeze the queues
anymore. The driver can simply stop the h/w queues from running during
a reset instead.
This also fixes a WARN in percpu_ref_reinit when the queue was unfrozen
with requeued requests.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index e31a256..8da4a8a 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1372,12 +1372,14 @@ out:
return ret;
}
-void nvme_stop_queues(struct nvme_ctrl *ctrl)
+void nvme_freeze_queues(struct nvme_ctrl *ctrl)
{
struct nvme_ns *ns;
mutex_lock(&ctrl->namespaces_mutex);
list_for_each_entry(ns, &ctrl->namespaces, list) {
+ blk_mq_freeze_queue_start(ns->queue);
+
spin_lock_irq(ns->queue->queue_lock);
queue_flag_set(QUEUE_FLAG_STOPPED, ns->queue);
spin_unlock_irq(ns->queue->queue_lock);
@@ -1388,13 +1390,14 @@ void nvme_stop_queues(struct nvme_ctrl *ctrl)
mutex_unlock(&ctrl->namespaces_mutex);
}
-void nvme_start_queues(struct nvme_ctrl *ctrl)
+void nvme_unfreeze_queues(struct nvme_ctrl *ctrl)
{
struct nvme_ns *ns;
mutex_lock(&ctrl->namespaces_mutex);
list_for_each_entry(ns, &ctrl->namespaces, list) {
queue_flag_clear_unlocked(QUEUE_FLAG_STOPPED, ns->queue);
+ blk_mq_unfreeze_queue(ns->queue);
blk_mq_start_stopped_hw_queues(ns->queue, true);
blk_mq_kick_requeue_list(ns->queue);
}
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 4722fad..4437592 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -238,8 +238,8 @@ int nvme_init_identify(struct nvme_ctrl *ctrl);
void nvme_scan_namespaces(struct nvme_ctrl *ctrl);
void nvme_remove_namespaces(struct nvme_ctrl *ctrl);
-void nvme_stop_queues(struct nvme_ctrl *ctrl);
-void nvme_start_queues(struct nvme_ctrl *ctrl);
+void nvme_freeze_queues(struct nvme_ctrl *ctrl);
+void nvme_unfreeze_queues(struct nvme_ctrl *ctrl);
struct request *nvme_alloc_request(struct request_queue *q,
struct nvme_command *cmd, unsigned int flags);
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 953fe48..ac6c7af 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1064,7 +1064,7 @@ static int nvme_suspend_queue(struct nvme_queue *nvmeq)
spin_unlock_irq(&nvmeq->q_lock);
if (!nvmeq->qid && nvmeq->dev->ctrl.admin_q)
- blk_mq_stop_hw_queues(nvmeq->dev->ctrl.admin_q);
+ blk_mq_freeze_queue_start(nvmeq->dev->ctrl.admin_q);
irq_set_affinity_hint(vector, NULL);
free_irq(vector, nvmeq);
@@ -1296,7 +1296,7 @@ static int nvme_alloc_admin_tags(struct nvme_dev *dev)
return -ENODEV;
}
} else
- blk_mq_start_stopped_hw_queues(dev->ctrl.admin_q, true);
+ blk_mq_unfreeze_queue(dev->ctrl.admin_q);
return 0;
}
@@ -1917,7 +1917,7 @@ static void nvme_dev_shutdown(struct nvme_dev *dev)
mutex_lock(&dev->shutdown_lock);
if (dev->bar) {
- nvme_stop_queues(&dev->ctrl);
+ nvme_freeze_queues(&dev->ctrl);
csts = readl(dev->bar + NVME_REG_CSTS);
}
if (csts & NVME_CSTS_CFS || !(csts & NVME_CSTS_RDY)) {
@@ -2026,7 +2026,7 @@ static void nvme_reset_work(struct work_struct *work)
dev_warn(dev->dev, "IO queues not created\n");
nvme_remove_namespaces(&dev->ctrl);
} else {
- nvme_start_queues(&dev->ctrl);
+ nvme_unfreeze_queues(&dev->ctrl);
nvme_dev_add(dev);
}
^ permalink raw reply related [flat|nested] 5+ messages in thread
* NVME regression in all kernels after 4.4.x for NVME in M2 slot for laptop?
2016-08-05 19:03 ` Marc MERLIN
@ 2016-08-05 20:50 ` Keith Busch
-1 siblings, 0 replies; 5+ messages in thread
From: Keith Busch @ 2016-08-05 20:50 UTC (permalink / raw)
On Fri, Aug 05, 2016@12:03:23PM -0700, Marc MERLIN wrote:
> Would this patch make sense as being the reason why I can't S3 sleep
> anymore and would you have a test patch against 4.5, 4.6, or 4.7 I can
> try to see if it fixes the problem?
Hi Marc,
It might be blk-mq's hot cpu notifier is invoked during suspend and
waiting for nvme's queues to freeze, which may not happen if a request
is waiting on a stopped queue. The patch you biseceted doesn't
necessarilly fix that, but the window for when a request could get queued
like that was much shorter.
Assuming that is the problem, S3 suspends PCI hardware before IO
tasks. I'll see if I can reproduce on one of my machines and look into
a fix.
Thanks,
Keith
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: NVME regression in all kernels after 4.4.x for NVME in M2 slot for laptop?
@ 2016-08-05 20:50 ` Keith Busch
0 siblings, 0 replies; 5+ messages in thread
From: Keith Busch @ 2016-08-05 20:50 UTC (permalink / raw)
To: Marc MERLIN; +Cc: axboe, linux-nvme, linux-kernel
On Fri, Aug 05, 2016 at 12:03:23PM -0700, Marc MERLIN wrote:
> Would this patch make sense as being the reason why I can't S3 sleep
> anymore and would you have a test patch against 4.5, 4.6, or 4.7 I can
> try to see if it fixes the problem?
Hi Marc,
It might be blk-mq's hot cpu notifier is invoked during suspend and
waiting for nvme's queues to freeze, which may not happen if a request
is waiting on a stopped queue. The patch you biseceted doesn't
necessarilly fix that, but the window for when a request could get queued
like that was much shorter.
Assuming that is the problem, S3 suspends PCI hardware before IO
tasks. I'll see if I can reproduce on one of my machines and look into
a fix.
Thanks,
Keith
^ permalink raw reply [flat|nested] 5+ messages in thread
* NVME regression in all kernels after 4.4.x for NVME in M2 slot for laptop?
[not found] ` <20160929060204.GA17637@merlins.org>
@ 2016-10-14 21:15 ` Marc MERLIN
0 siblings, 0 replies; 5+ messages in thread
From: Marc MERLIN @ 2016-10-14 21:15 UTC (permalink / raw)
Hi Keith,
I'm still stuck with the 4.4.x tree, which is starting to be a problem as
more fixes for btrfs and others are being rolled into newer kernels.
I realize that I'm in a bad place since this problem seems to be impacting
virtually no one as laptops that boot off nvme are likely rare, but is there
any chance this will get fixed somehow?
More importantly, do you have access to a thinkpad or other laptop with NVME
M2 for testing?
Thanks,
Marc
On Thu, Sep 29, 2016@08:02:04AM +0200, Marc MERLIN wrote:
> On Wed, Sep 14, 2016@10:46:16AM -0400, Keith Busch wrote:
> > On Tue, Sep 13, 2016@07:40:27PM -0700, Marc MERLIN wrote:
> > > Did you have any luck/more time recently?
> > >
> > > Thanks,
> > > Marc
> >
> > Hi Marc,
> >
> > I was working with Jens a couple weeks ago on something that sounded
> > possibly related. The blk-mq's CPU offlining could result in deadlock or
> > kernel crash, and we narrowed it down to incorrectly switching software
> > contexts. Jens fixed that in 4.8 with this commit:
> >
> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit?id=e57690fe009b2ab0cee8a57f53be634540e49c9d
> >
> > Is there any chance the latest 4.8 RC is working for you?
>
> My apologies for the late reply, been travelling since then.
> I booted it now (built 10 days ago), shut the lid, got a fast flashing
> of the thinkpad LED (2hz), laptop dead when I re-open the lid.
>
> Back to 4.4.5, things are ok again.
>
> Hope this helps.
>
> Marc
> --
> "A mouse is a device used to point at the xterm you want to type in" - A.S.R.
> Microsoft is to operating systems ....
> .... what McDonalds is to gourmet cooking
> Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
.... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2016-10-14 21:15 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-08-05 19:03 NVME regression in all kernels after 4.4.x for NVME in M2 slot for laptop? Marc MERLIN
2016-08-05 19:03 ` Marc MERLIN
2016-08-05 20:50 ` Keith Busch
2016-08-05 20:50 ` Keith Busch
[not found] ` <20160805205449.GC12853@merlins.org>
[not found] ` <20160815193733.GC18083@localhost.localdomain>
[not found] ` <20160914024027.GV24451@merlins.org>
[not found] ` <20160914144615.GA16556@localhost.localdomain>
[not found] ` <20160929060204.GA17637@merlins.org>
2016-10-14 21:15 ` Marc MERLIN
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.