public inbox for linux-kernel@vger.kernel.org
From: Martin Steigerwald <martin@lichtvoll.de>
To: Hans de Goede <hdegoede@redhat.com>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Thorsten Leemhuis <regressions@leemhuis.info>,
	Tejun Heo <tj@kernel.org>
Subject: Re: [Possible REGRESSION, 4.16-rc4] Error updating SMART data during runtime and could not connect to lvmetad at some boot attempts
Date: Sun, 18 Mar 2018 23:06:02 +0100
Message-ID: <2906688.e4ghZiFuBA@merkaba>
In-Reply-To: <d9c6e12b-a1ea-92e1-04ba-010e3db7c480@redhat.com>

Hi Hans.

Hans de Goede - 18.03.18, 22:34:
> On 14-03-18 13:48, Martin Steigerwald wrote:
> > Hans de Goede - 14.03.18, 12:05:
> >> Hi,
> >> 
> >> On 14-03-18 12:01, Martin Steigerwald wrote:
> >>> Hans de Goede - 11.03.18, 15:37:
> >>>> Hi Martin,
> >>>> 
> >>>> On 11-03-18 09:20, Martin Steigerwald wrote:
> >>>>> Hello.
> >>>>> 
> >>>>> Since 4.16-rc4 (upgraded from 4.15.2 which worked) I have an issue
> >>>>> with SMART checks occasionally failing like this:
> >>>>> 
> >>>>> smartd[28017]: Device: /dev/sdb [SAT], is in SLEEP mode, suspending
> >>>>> checks
> >>>>> udisksd[24408]: Error performing housekeeping for drive
> >>>>> /org/freedesktop/UDisks2/drives/INTEL_SSDSA2CW300G3_[…]: Error
> >>>>> updating
> >>>>> SMART data: Error sending ATA command CHECK POWER MODE: Unexpected
> >>>>> sense
> >>>>> data returned:#0120000: 0e 09 0c 00  00 00 ff 00  00 00 00 00  00 00
> >>>>> 50
> >>>>> 00    ..............P.#0120010: 00 00 00 00  00 00 00 00  00 00 00 00
> >>>>> 00
> >>>>> 00 00 00    ................#012 (g-io-error-quark, 0) merkaba
> >>>>> udisksd[24408]: Error performing housekeeping for drive
> >>>>> /org/freedesktop/UDisks2/drives/Crucial_CT480M500SSD3_[…]: Error
> >>>>> updating
> >>>>> SMART data: Error sending ATA command CHECK POWER MODE: Unexpected
> >>>>> sense
> >>>>> data returned:#0120000: 01 00 1d 00  00 00 0e 09  0c 00 00 00  ff 00
> >>>>> 00
> >>>>> 00    ................#0120010: 00 00 00 00  50 00 00 00  00 00 00 00
> >>>>> 00 00 00 00    ....P...........#012 (g-io-error-quark, 0)
> >>>>> 
> >>>>> (Intel SSD is connected via SATA, Crucial via mSATA in a ThinkPad
> >>>>> T520)
> >>>>> 
> >>>>> However when I then check manually with smartctl -a | -x | -H the
> >>>>> device reports SMART data just fine.
> >>>>> 
> >>>>> As smartd correctly detects that the device is in sleep mode, this
> >>>>> may be a userspace issue in udisksd.
> >>>>> 
> >>>>> Also at some boot attempts the boot hangs with a message like "could
> >>>>> not connect to lvmetad, scanning manually for devices". I use BTRFS
> >>>>> RAID 1 on two LVs (each on one of the SSDs), a configuration that
> >>>>> requires a manual adaptation of the initramfs in order to boot
> >>>>> (basically vgchange -ay before btrfs device scan).
> >>>>> 
> >>>>> I wonder whether that has to do with the new SATA LPM policy stuff,
> >>>>> but as I had issues with
> >>>>> 
> >>>>> 3 => Medium power with Device Initiated PM enabled
> >>>>> 
> >>>>> (machine did not boot, which could also have been caused by me
> >>>>> accidentally removing all TCP/IP network support in the kernel with
> >>>>> that setting)
> >>>>> 
> >>>>> I set it back to
> >>>>> 
> >>>>> CONFIG_SATA_MOBILE_LPM_POLICY=0
> >>>>> 
> >>>>> (firmware settings)
> >>>> 
> >>>> Right, so at that setting the LPM policy changes are effectively
> >>>> disabled and cannot explain your SMART issues.
> >>>> 
> >>>> Still I would like to zoom in on this part of your bug report, because
> >>>> for Fedora 28 we are planning to ship with
> >>>> CONFIG_SATA_MOBILE_LPM_POLICY=3
> >>>> and AFAIK Ubuntu has similar plans.
> >>>> 
> >>>> I suspect that the issues you were seeing with
> >>>> CONFIG_SATA_MOBILE_LPM_POLICY=3 were with the Crucial disk? I've
> >>>> attached a patch for you to test, which disables LPM for your model of
> >>>> Crucial SSD (but keeps it on for the Intel disk). If you can confirm
> >>>> that with that patch you can run with CONFIG_SATA_MOBILE_LPM_POLICY=3
> >>>> without issues, that would be great.
> >>> 
> >>> With 4.16-rc5 with CONFIG_SATA_MOBILE_LPM_POLICY=3 the system
> >>> successfully booted three times in a row. So feel free to add tested-by.
> >> 
> >> Thanks.
> >> 
> >> To be clear, you're talking about 4.16-rc5 with the patch I made to
> >> blacklist the Crucial disk I assume, not just plain 4.16-rc5, right ?
> > 
> > 4.16-rc5 with your
> > 
> > 0001-libata-Apply-NOLPM-quirk-to-Crucial-M500-480GB-SSDs.patch
> 
> I was about to submit this upstream and was planning on extending it to
> also cover the 960GB version, which led me to do a quick Google search.
> Judging from the results, it seems that there are multiple firmware
> versions of this SSD out there and I wonder if you are perhaps running
> an older version of the firmware. If you do:
> 
> dmesg | grep Crucial_CT480M500
> 
> You should see something like this:
> 
> ata2.00: ATA-9: Crucial_CT480M500SSD3, MU03, max UDMA/133
> 
> I'm interested in the "MU03" part, what is that in your case?

Although I never updated the firmware, I do have MU03:

% lsscsi | grep Crucial
[2:0:0:0]    disk    ATA      Crucial_CT480M50 MU03  /dev/sdb

% dmesg | grep Crucial_CT480M500
[    2.424537] ata3.00: ATA-9: Crucial_CT480M500SSD3, MU03, max UDMA/133
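
In case it helps anyone else who wants to report their firmware revision for this check, here is a minimal sketch that pulls just the revision field out of such a line. The function name fw_revision is hypothetical, and the sketch assumes the libata identify line always has the "<model>, <firmware>, max <mode>" layout seen in the two examples in this thread; other kernel versions may print it differently.

```shell
#!/bin/sh
# Hypothetical helper, not part of any existing tool: print the firmware
# revision from a libata identify line such as
#   ata3.00: ATA-9: Crucial_CT480M500SSD3, MU03, max UDMA/133
# Assumption: the layout is "...: <model>, <firmware>, max <mode>".
# Lines that do not match this layout produce no output.
fw_revision() {
    printf '%s\n' "$1" | sed -n 's/^.*: [^,]*, \([^,]*\), max .*$/\1/p'
}

# Example with the line from the dmesg output above; prints MU03.
fw_revision '[    2.424537] ata3.00: ATA-9: Crucial_CT480M500SSD3, MU03, max UDMA/133'
```

On a live system the argument would come from the dmesg | grep output shown above instead of a literal string.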

> Note I'm not saying we should not do the NOLPM quirk, but maybe we
> can limit it to older firmware.

Thanks,
-- 
Martin
