From: Laurence Oberman <loberman@redhat.com>
To: Sitsofe Wheeler <sitsofe@gmail.com>, linux-ide@vger.kernel.org
Cc: linux-block@vger.kernel.org
Subject: Re: failed command: WRITE FPDMA QUEUED with Samsung 860 EVO
Date: Thu, 03 Jan 2019 15:47:38 -0500
Message-ID: <1546548458.24199.2.camel@redhat.com>
In-Reply-To: <1546540117.24199.0.camel@redhat.com>

On Thu, 2019-01-03 at 13:28 -0500, Laurence Oberman wrote:
> On Wed, 2019-01-02 at 11:10 -0500, Laurence Oberman wrote:
> > On Wed, 2019-01-02 at 15:29 +0000, Sitsofe Wheeler wrote:
> > > (Also trying linux-ide list)
> > > 
> > > On Wed, 2 Jan 2019 at 15:25, Sitsofe Wheeler <sitsofe@gmail.com>
> > > wrote:
> > > > 
> > > > Hi,
> > > > 
> > > > I recently purchased a SATA Samsung 860 EVO SSD and put it in an
> > > > old HP microserver (which has an AMD N36L). By default, when the
> > > > disk load becomes a little heavy, e.g. by running a job like
> > > > 
> > > > fio --name=test --readonly --rw=randread --filename /dev/sdb --bs=32k \
> > > >     --ioengine=libaio --iodepth=32 --direct=1 --runtime=10m --time_based=1
> > > > 
> > > > the kernel starts repeatedly producing error messages like:
> > > > 
> > > > [ 1177.729912] ata2.00: exception Emask 0x10 SAct 0x3c000 SErr 0x0 action 0x6 frozen
> > > > [ 1177.729931] ata2.00: irq_stat 0x08000000, interface fatal error
> > > > [ 1177.729943] ata2.00: failed command: WRITE FPDMA QUEUED
> > > > [ 1177.729962] ata2.00: cmd 61/80:70:80:50:e6/06:00:00:00:00/40 tag 14 ncq dma 851968 out
> > > > [ 1177.729962]          res 40/00:80:00:5a:e6/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
> > > > [ 1177.729978] ata2.00: status: { DRDY }
> > > > [ 1177.729986] ata2.00: failed command: WRITE FPDMA QUEUED
> > > > [ 1177.730002] ata2.00: cmd 61/00:78:00:57:e6/03:00:00:00:00/40 tag 15 ncq dma 393216 out
> > > > [ 1177.730002]          res 40/00:80:00:5a:e6/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
> > > > [ 1177.730017] ata2.00: status: { DRDY }
> > > > [ 1177.730024] ata2.00: failed command: WRITE FPDMA QUEUED
> > > > [ 1177.730039] ata2.00: cmd 61/00:80:00:5a:e6/05:00:00:00:00/40 tag 16 ncq dma 655360 out
> > > > [ 1177.730039]          res 40/00:80:00:5a:e6/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
> > > > [ 1177.730053] ata2.00: status: { DRDY }
> > > > [ 1177.730060] ata2.00: failed command: WRITE FPDMA QUEUED
> > > > [ 1177.730078] ata2.00: cmd 61/00:88:00:5f:e6/01:00:00:00:00/40 tag 17 ncq dma 131072 out
> > > > [ 1177.730078]          res 40/00:80:00:5a:e6/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
> > > > [ 1177.730096] ata2.00: status: { DRDY }
> > > > [ 1177.730108] ata2: hard resetting link
> > > > [ 1178.205831] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> > > > [ 1178.206165] ata2.00: supports DRM functions and may not be fully accessible
> > > > [ 1178.209743] ata2.00: supports DRM functions and may not be fully accessible
> > > > [ 1178.212786] ata2.00: configured for UDMA/133
> > > > [ 1178.212826] ata2: EH complete
> > > > [ 1178.212988] ata2.00: Enabling discard_zeroes_data
> > > > 
> > > > I tried moving the SSD to another caddy and bay but the issue
> > > > persists. None of the regular hard disks (a Western Digital and a
> > > > Seagate) nor the other SSD (a Crucial MX500) already in the system
> > > > trigger the issue the Samsung 860 EVO does. Adding
> > > > 
> > > > libata.force=2.00:noncq
> > > > 
> > > > seems to make the issue go away but seemingly at some speed cost
> > > > (at least compared to what the MX500 achieves). The OS in use is
> > > > Ubuntu 18.04 with a 4.15.0-43-generic kernel but even a
> > > > 4.18.0-13-generic had the same issue.
> > > > 
> > > > Is there anything software-wise that might need investigating that
> > > > would allow NCQ to work and a better speed to be reached?
> > > 
> > > 
> > 
> > Hello 
> > 
> > I have seen issues reported due to low power delivery to the drive.
> > However, investigating this, it starts with an exception Emask and
> > then the link error code runs.
> > Reviewing online, some folks report that cable issues or firmware
> > can cause this.
> > I don't have one to test myself, and you are using an enclosure.
> > Are you able to connect directly to the motherboard via another
> > cable and test again?
> > 
> > Regards
> > Laurence
> 
> I managed to find an 860, so I am going to test it, see whether I get
> the same behavior, and report back.
> 
> Thanks
> Laurence

Hello

I put the 860 in an enclosure (MSA50) driven by a SAS HBA
(megaraid_sas).

The backplane accepts SAS or SATA drives.

/dev/sg2  0 0 49 0  0  /dev/sdb  ATA       Samsung SSD 860   1B6Q

Running your fio test on the latest RHEL7 kernel and on 4.20.0+-1, I am
unable to reproduce the issue after multiple test runs.

All tests run to completion with no errors on both the RHEL7 and
upstream kernels.

I have no way at the moment to test with a direct motherboard
connection to a SATA port, so if this is a host-side SATA (ATA) issue I
would not see it.

This likely means that the drive itself is well behaved here, and that
the power or cable issue I alluded to earlier, or possibly the host ATA
interface, is worth looking into on your side.
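
As a cross-check of the error log quoted above, the SAct value and the
cmd lines can be decoded by hand. The sketch below assumes the usual
libata conventions for NCQ commands (bit n of SAct marks tag n as
outstanding; in the cmd line the tag is the count byte shifted right by
3, and the sector count is split across the two feature bytes) and a
512-byte logical sector. The function names are made up for
illustration and are not part of any kernel tool.

```python
# Sketch only: decode fields from the quoted libata error lines.
# Assumptions: NCQ field layout as described in the lead-in, and a
# 512-byte logical sector size.

SECTOR_SIZE = 512  # assumed logical sector size in bytes


def sact_to_tags(sact):
    """Return the NCQ tags whose bits are set in the SAct bitmask."""
    return [bit for bit in range(32) if sact & (1 << bit)]


def decode_ncq_cmd(cmd):
    """Decode a 'CC/FF:NN:L0:L1:L2/HF:HN:H0:H1:H2/DD' cmd string."""
    command, low, high, _device = cmd.split("/")
    feature, nsect, lbal, lbam, lbah = (int(x, 16) for x in low.split(":"))
    hob_feature, _hob_nsect, hob_lbal, hob_lbam, hob_lbah = (
        int(x, 16) for x in high.split(":")
    )
    # For NCQ the sector count lives in the feature bytes (high:low).
    sectors = (hob_feature << 8) | feature
    # 48-bit LBA assembled from the low and high-order LBA bytes.
    lba = (
        (hob_lbah << 40) | (hob_lbam << 32) | (hob_lbal << 24)
        | (lbah << 16) | (lbam << 8) | lbal
    )
    return {
        "opcode": int(command, 16),  # 0x61 is WRITE FPDMA QUEUED
        "tag": nsect >> 3,           # tag is held in bits 7:3
        "sectors": sectors,
        "bytes": sectors * SECTOR_SIZE,
        "lba": lba,
    }


if __name__ == "__main__":
    print(sact_to_tags(0x3C000))
    print(decode_ncq_cmd("61/80:70:80:50:e6/06:00:00:00:00/40"))
```

With the values from the log, SAct 0x3c000 decodes to tags 14 to 17,
which are exactly the four failed commands, and the first cmd line
decodes to tag 14 and 1664 sectors (851968 bytes), matching the
"tag 14 ncq dma 851968 out" the kernel printed.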

RHEL7 kernel
3.10.0-862.11.1.el7.x86_64

test: (g=0): rw=randread, bs=(R) 32.0KiB-32.0KiB, (W) 32.0KiB-32.0KiB, (T) 32.0KiB-32.0KiB, ioengine=libaio, iodepth=32
fio-3.3-38-gf5ec8
Starting 1 process
Jobs: 1 (f=1): [r(1)][100.0%][r=120MiB/s,w=0KiB/s][r=3839,w=0 IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=3974: Thu Jan  3 15:14:10 2019
   read: IOPS=3827, BW=120MiB/s (125MB/s)(70.1GiB/600009msec)
    slat (usec): min=7, max=374, avg=23.78, stdev= 6.09
    clat (usec): min=449, max=509311, avg=8330.29, stdev=2060.29
     lat (usec): min=514, max=509331, avg=8355.00, stdev=2060.29
    clat percentiles (usec):
     |  1.00th=[ 5342],  5.00th=[ 7767], 10.00th=[ 8225], 20.00th=[ 8291],
     | 30.00th=[ 8291], 40.00th=[ 8291], 50.00th=[ 8291], 60.00th=[ 8291],
     | 70.00th=[ 8356], 80.00th=[ 8356], 90.00th=[ 8455], 95.00th=[ 8848],
     | 99.00th=[11600], 99.50th=[13042], 99.90th=[16581], 99.95th=[17695],
     | 99.99th=[19006]
   bw (  KiB/s): min=50560, max=124472, per=99.94%, avg=122409.89, stdev=2592.08, samples=1200
   iops        : min= 1580, max= 3889, avg=3825.22, stdev=81.01, samples=1200
  lat (usec)   : 500=0.01%, 750=0.03%, 1000=0.02%
  lat (msec)   : 2=0.08%, 4=0.32%, 10=97.20%, 20=2.34%, 50=0.01%
  lat (msec)   : 750=0.01%
  cpu          : usr=4.76%, sys=12.81%, ctx=2113947, majf=0, minf=14437
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
     issued rwts: total=2296574,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
   READ: bw=120MiB/s (125MB/s), 120MiB/s-120MiB/s (125MB/s-125MB/s), io=70.1GiB (75.3GB), run=600009-600009msec

Disk stats (read/write):
  sdb: ios=2295763/0, merge=0/0, ticks=18786069/0, in_queue=18784356, util=100.00%

Upstream Kernel
4.20.0+-1.x86_64

[root@localhost ~]# ./test_ssd.sh 
test: (g=0): rw=randread, bs=(R) 32.0KiB-32.0KiB, (W) 32.0KiB-32.0KiB, (T) 32.0KiB-32.0KiB, ioengine=libaio, iodepth=32
fio-3.3-38-gf5ec8
Starting 1 process
Jobs: 1 (f=1): [r(1)][100.0%][r=120MiB/s,w=0KiB/s][r=3835,w=0 IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=2895: Thu Jan  3 15:47:21 2019
   read: IOPS=3826, BW=120MiB/s (125MB/s)(70.1GiB/600009msec)
    slat (usec): min=5, max=410, avg=26.92, stdev= 3.81
    clat (usec): min=760, max=1287.1k, avg=8327.27, stdev=4756.19
     lat (usec): min=787, max=1287.1k, avg=8355.50, stdev=4756.18
    clat percentiles (usec):
     |  1.00th=[ 8225],  5.00th=[ 8291], 10.00th=[ 8291], 20.00th=[ 8291],
     | 30.00th=[ 8291], 40.00th=[ 8291], 50.00th=[ 8291], 60.00th=[ 8291],
     | 70.00th=[ 8356], 80.00th=[ 8356], 90.00th=[ 8356], 95.00th=[ 8356],
     | 99.00th=[ 8455], 99.50th=[ 8455], 99.90th=[ 8455], 99.95th=[ 8455],
     | 99.99th=[ 9765]
   bw (  KiB/s): min=25152, max=124559, per=100.00%, avg=122589.35, stdev=3879.77, samples=1199
   iops        : min=  786, max= 3892, avg=3830.88, stdev=121.24, samples=1199
  lat (usec)   : 1000=0.01%
  lat (msec)   : 2=0.01%, 4=0.01%, 10=99.99%, 20=0.01%
  cpu          : usr=4.19%, sys=18.68%, ctx=2295902, majf=0, minf=278
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
     issued rwts: total=2296041,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
   READ: bw=120MiB/s (125MB/s), 120MiB/s-120MiB/s (125MB/s-125MB/s), io=70.1GiB (75.2GB), run=600009-600009msec

Disk stats (read/write):
  sdb: ios=2296022/0, merge=0/0, ticks=19111730/0, in_queue=18408961, util=99.87%
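
The two fio runs are internally consistent, which can be checked with a
little arithmetic: bandwidth should be IOPS times the block size, and
by Little's law the mean latency should be roughly iodepth divided by
IOPS. A minimal sketch, with hypothetical helper names, using the
numbers reported above:

```python
# Sketch only: sanity-check the fio summary figures against each other.
# Inputs are taken from the fio output above; helper names are
# illustrative and not part of fio.


def bandwidth_mib_s(iops, block_kib):
    """Bandwidth in MiB/s for a given IOPS rate and block size in KiB."""
    return iops * block_kib / 1024


def expected_latency_ms(iodepth, iops):
    """Mean latency in ms implied by Little's law: depth = IOPS * latency."""
    return iodepth / iops * 1000


if __name__ == "__main__":
    for label, iops in (("RHEL7", 3827), ("upstream", 3826)):
        bw = bandwidth_mib_s(iops, 32)
        lat = expected_latency_ms(32, iops)
        print(f"{label}: {bw:.1f} MiB/s, {lat:.2f} ms")
```

Both kernels come out at about 119.6 MiB/s and about 8.36 ms, in line
with the reported BW=120MiB/s and average lat of roughly 8355 usec, so
the queue stayed full at depth 32 for the whole run.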



