Re: No I/O errors reported after SATA link hard reset


From: Tejun Heo <tj@kernel.org>
To: Gionatan Danti <g.danti@assyoma.it>
Cc: Bernd Schubert <bernd.schubert@fastmail.fm>,
	linux-ide@vger.kernel.org, linux-scsi@vger.kernel.org
Subject: Re: No I/O errors reported after SATA link hard reset
Date: Thu, 17 Aug 2017 07:46:57 -0700
Message-ID: <20170817144657.GF3238792@devbig577.frc2.facebook.com>
In-Reply-To: <4debd4d8dea1d534ef555ceae4429435@assyoma.it>

Hello,

On Thu, Aug 17, 2017 at 04:15:35PM +0200, Gionatan Danti wrote:
> Ok, so *this* is the root cause of the problem: libata not
> distinguishing spurious link renegotiations from brief power
> loss/power up events. Out of curiosity: is this a SATA-specific
> problem (i.e. in the SATA specification), or are SAS disks affected
> as well?

No idea about SAS.  They're identical at the link layer tho.

> >Because we don't wanna be ditching disks on temporary link glitches,
> >which do happen once in a while.
> 
> Any chance to report I/O errors to the upper layers *without*
> offlining the device? This way, upper layers (e.g. MDRAID) can
> act in a more informed way. For example: a single disk device will
> simply retry the failed operation, while MDRAID can take the
> "badblocks" code path to deal with the error.

The upper layer can request that errors not be retried, but that won't
help much.  The problem isn't tied to specific commands.  A power
event can take place without any command in flight and still lose the
buffered data.  Unless the upper layer is tracking everything that's
being written, there isn't much it can do short of a full scan.  This
is a condition which should be handled on the driver side.

> >So, the right way to deal with the problem probably is making use of
> >the SMART counter which indicates power loss events and verify that
> >the counter hasn't increased over link issues.  If it changed, the
> >device should be detached and re-probed, which will make it come back
> >as a different block device.  Unfortunately, I haven't had the chance
> >to actually implement that.
> 
> This is a very good idea, maybe I can implement it in userspace with
> a simple, fast polling scheme (for example, every 60 seconds). Such
> polling would not prevent all corruption scenarios, but would at
> least inform the user promptly.

Yeah, looking into getting it implemented on the kernel side.
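
For reference, the userspace polling idea quoted above could be sketched
roughly as follows.  This is only an illustration under assumptions, not
the planned kernel implementation: it assumes smartctl (smartmontools) is
installed, that the drive reports SMART attribute 12 (commonly
Power_Cycle_Count) as its power event counter, and that the attribute
table has the usual ten columns with the raw value last.  The function
names are made up for this sketch.

```python
import re
import subprocess
import time

# Assumption: attribute 12 (commonly Power_Cycle_Count) is the power
# event counter.  Drives differ, so check what your device reports.
POWER_CYCLE_ATTR_ID = 12


def parse_raw_value(smartctl_output, attr_id=POWER_CYCLE_ATTR_ID):
    """Return the raw value of attr_id from `smartctl -A` text, or None."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows have ten columns; the first is the numeric ID
        # and the tenth is the raw value.
        if len(fields) >= 10 and fields[0] == str(attr_id):
            match = re.match(r"\d+", fields[9])
            if match:
                return int(match.group())
    return None


def read_counter(device):
    """Run `smartctl -A` on device and return the counter, or None."""
    # smartctl's exit status is a bitmask, so it is not treated as fatal.
    result = subprocess.run(
        ["smartctl", "-A", device], capture_output=True, text=True
    )
    return parse_raw_value(result.stdout)


def power_event_detected(previous, current):
    """True if both readings are valid and the counter changed."""
    return previous is not None and current is not None and current != previous


def poll(device, interval=60):
    """Check the counter every `interval` seconds and warn on a change."""
    previous = read_counter(device)
    while True:
        time.sleep(interval)
        current = read_counter(device)
        if power_event_detected(previous, current):
            print(f"{device}: power event counter changed "
                  f"({previous} -> {current}); buffered writes may be lost")
        if current is not None:
            previous = current
```

Calling poll() with a device path (for example /dev/sda, as root) would
run the check once a minute.  As noted above, this can only report the
event after the fact; it does not detach or re-probe the device.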

Thanks.

-- 
tejun
