* NVM and swap device
@ 2016-01-13  3:40 Stephen Hemminger
  2016-01-13  8:26 ` Hannes Reinecke
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Stephen Hemminger @ 2016-01-13  3:40 UTC (permalink / raw)


I have a nice shiny new Intel NVM PCI card; decided to use it for a filesystem and swap.
The filesystem (btrfs) is doing fine, but the swap device was throwing occasional
random errors. Suspect a driver problem rather than hardware.

I am using 4.4 kernel without patches.

kern.log:Jan 12 08:11:57 xeon-e3 kernel: [159474.037390] Read-error on swap-device (259:0:17597808)
kern.log.1:Jan  7 08:32:10 xeon-e3 kernel: [87938.855526] Read-error on swap-device (259:0:11355648)
kern.log.1:Jan  7 08:32:10 xeon-e3 kernel: [87938.855530] Read-error on swap-device (259:0:11355656)
kern.log.1:Jan  7 08:32:10 xeon-e3 kernel: [87939.855467] Read-error on swap-device (259:0:16180824)
kern.log.1:Jan  8 08:24:07 xeon-e3 kernel: [63670.777981] Read-error on swap-device (259:0:32690768)
kern.log.1:Jan  9 09:25:02 xeon-e3 kernel: [153720.919325] Read-error on swap-device (259:0:220488)
kern.log.1:Jan  9 16:40:05 xeon-e3 kernel: [179820.957675] Read-error on swap-device (259:0:24476232)
kern.log.1:Jan  9 16:40:05 xeon-e3 kernel: [179820.962673] Read-error on swap-device (259:0:33292816)

The swap device was being added via /etc/fstab by UUID. 

I gave up and went back to spinning rust for swap device for stability.

Device partitions are:

Disk /dev/nvme0n1: 781422768 sectors, 372.6 GiB
Logical sector size: 512 bytes
Disk identifier (GUID): 304117A4-18EF-4B51-92F4-8015758B5CB0
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 781422734
Partitions will be aligned on 8-sector boundaries
Total free space is 0 sectors (0 bytes)

Number  Start (sector)    End (sector)  Size       Code  Name
   1              34            2047   1007.0 KiB  EF02  BIOS boot partition
   2            2048        33556479   16.0 GiB    8200  Linux swap
   3        33556480       781422734   356.6 GiB   8300  Linux filesystem
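
A rough cross-check of the log lines against the partition table, as a sketch only. It assumes the three numbers in each Read-error line are major:minor:sector, with the sector counted in 512-byte units, and that pages are 4 KiB. The names SWAP_START, SWAP_END and parse_sectors are made up for illustration.

```python
import re

# Failing locations copied from the kern.log lines above.
LOG = """\
Read-error on swap-device (259:0:17597808)
Read-error on swap-device (259:0:11355648)
Read-error on swap-device (259:0:11355656)
Read-error on swap-device (259:0:16180824)
Read-error on swap-device (259:0:32690768)
Read-error on swap-device (259:0:220488)
Read-error on swap-device (259:0:24476232)
Read-error on swap-device (259:0:33292816)
"""

# Partition 2 (Linux swap) bounds from the table above, in 512-byte sectors.
SWAP_START, SWAP_END = 2048, 33556479
SECTORS_PER_PAGE = 4096 // 512  # assumption: 4 KiB pages

PATTERN = re.compile(r"\((\d+):(\d+):(\d+)\)")


def parse_sectors(text):
    """Return the third field of every (major:minor:sector) tuple in text."""
    return [int(m.group(3)) for m in PATTERN.finditer(text)]


sectors = parse_sectors(LOG)
in_swap = [SWAP_START <= s <= SWAP_END for s in sectors]
page_aligned = [s % SECTORS_PER_PAGE == 0 for s in sectors]

for s, inside, aligned in zip(sectors, in_swap, page_aligned):
    print(f"{s:>10}  in_swap_range={inside}  4k_aligned={aligned}")
```

Under those assumptions, every reported sector lies inside the swap partition's range and on a 4 KiB boundary, so the log gives no sign of reads straddling a partition edge.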


* NVM and swap device
  2016-01-13  3:40 NVM and swap device Stephen Hemminger
@ 2016-01-13  8:26 ` Hannes Reinecke
  2016-01-13 15:13   ` Matthew Wilcox
  2016-01-13 17:47 ` Jens Axboe
  2016-01-15 17:42 ` Keith Busch
  2 siblings, 1 reply; 11+ messages in thread
From: Hannes Reinecke @ 2016-01-13  8:26 UTC (permalink / raw)


On 01/13/2016 04:40 AM, Stephen Hemminger wrote:
> I have a nice shiny new Intel NVM PCI card; decided to use it for a filesystem and swap.
> The filesystem (btrfs) is doing fine, but the swap device was throwing occasional
> random errors. Suspect a driver problem rather than hardware.
>
> I am using 4.4 kernel without patches.
>
> kern.log:Jan 12 08:11:57 xeon-e3 kernel: [159474.037390] Read-error on swap-device (259:0:17597808)
> kern.log.1:Jan  7 08:32:10 xeon-e3 kernel: [87938.855526] Read-error on swap-device (259:0:11355648)
> kern.log.1:Jan  7 08:32:10 xeon-e3 kernel: [87938.855530] Read-error on swap-device (259:0:11355656)
> kern.log.1:Jan  7 08:32:10 xeon-e3 kernel: [87939.855467] Read-error on swap-device (259:0:16180824)
> kern.log.1:Jan  8 08:24:07 xeon-e3 kernel: [63670.777981] Read-error on swap-device (259:0:32690768)
> kern.log.1:Jan  9 09:25:02 xeon-e3 kernel: [153720.919325] Read-error on swap-device (259:0:220488)
> kern.log.1:Jan  9 16:40:05 xeon-e3 kernel: [179820.957675] Read-error on swap-device (259:0:24476232)
> kern.log.1:Jan  9 16:40:05 xeon-e3 kernel: [179820.962673] Read-error on swap-device (259:0:33292816)
>
> The swap device was being added via /etc/fstab by UUID.
>
> I gave up and went back to spinning rust for swap device for stability.
>
> Device partitions are:
>
> Disk /dev/nvme0n1: 781422768 sectors, 372.6 GiB
> Logical sector size: 512 bytes
> Disk identifier (GUID): 304117A4-18EF-4B51-92F4-8015758B5CB0
> Partition table holds up to 128 entries
> First usable sector is 34, last usable sector is 781422734
> Partitions will be aligned on 8-sector boundaries
> Total free space is 0 sectors (0 bytes)
>
> Number  Start (sector)    End (sector)  Size       Code  Name
>     1              34            2047   1007.0 KiB  EF02  BIOS boot partition
>     2            2048        33556479   16.0 GiB    8200  Linux swap
>     3        33556480       781422734   356.6 GiB   8300  Linux filesystem
>
Ouch.

34 sectors is aligned to basically nothing, and is guaranteed to 
trip any alignment issues there are.

Please repartition the device and use some sane value like 2M alignment.

Cheers,

Hannes


-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare at suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


* NVM and swap device
  2016-01-13  8:26 ` Hannes Reinecke
@ 2016-01-13 15:13   ` Matthew Wilcox
  0 siblings, 0 replies; 11+ messages in thread
From: Matthew Wilcox @ 2016-01-13 15:13 UTC (permalink / raw)


On Wed, Jan 13, 2016 at 09:26:29AM +0100, Hannes Reinecke wrote:
> On 01/13/2016 04:40 AM, Stephen Hemminger wrote:
> >I have a nice shiny new Intel NVM PCI card; decided to use it for a filesystem and swap.
> >The filesystem (btrfs) is doing fine, but the swap device was throwing occasional
> >random errors. Suspect a driver problem rather than hardware.
> >
> >I am using 4.4 kernel without patches.
> >
> >kern.log:Jan 12 08:11:57 xeon-e3 kernel: [159474.037390] Read-error on swap-device (259:0:17597808)
> >kern.log.1:Jan  7 08:32:10 xeon-e3 kernel: [87938.855526] Read-error on swap-device (259:0:11355648)
> >kern.log.1:Jan  7 08:32:10 xeon-e3 kernel: [87938.855530] Read-error on swap-device (259:0:11355656)
> >kern.log.1:Jan  7 08:32:10 xeon-e3 kernel: [87939.855467] Read-error on swap-device (259:0:16180824)
> >kern.log.1:Jan  8 08:24:07 xeon-e3 kernel: [63670.777981] Read-error on swap-device (259:0:32690768)
> >kern.log.1:Jan  9 09:25:02 xeon-e3 kernel: [153720.919325] Read-error on swap-device (259:0:220488)
> >kern.log.1:Jan  9 16:40:05 xeon-e3 kernel: [179820.957675] Read-error on swap-device (259:0:24476232)
> >kern.log.1:Jan  9 16:40:05 xeon-e3 kernel: [179820.962673] Read-error on swap-device (259:0:33292816)
> >
> >The swap device was being added via /etc/fstab by UUID.
> >
> >I gave up and went back to spinning rust for swap device for stability.
> >
> >Device partitions are:
> >
> >Disk /dev/nvme0n1: 781422768 sectors, 372.6 GiB
> >Logical sector size: 512 bytes
> >Disk identifier (GUID): 304117A4-18EF-4B51-92F4-8015758B5CB0
> >Partition table holds up to 128 entries
> >First usable sector is 34, last usable sector is 781422734
> >Partitions will be aligned on 8-sector boundaries
> >Total free space is 0 sectors (0 bytes)
> >
> >Number  Start (sector)    End (sector)  Size       Code  Name
> >    1              34            2047   1007.0 KiB  EF02  BIOS boot partition
> >    2            2048        33556479   16.0 GiB    8200  Linux swap
> >    3        33556480       781422734   356.6 GiB   8300  Linux filesystem
> >
> Ouch.
> 
> 34 sectors is aligned to basically nothing, and is guaranteed to trip any
> alignment issues there are.

Maybe, but that's the BIOS boot partition; who cares?  The swap partition
is aligned to 2048 sectors which should be good enough for anyone (it's 1MB).
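
For reference, the arithmetic behind that can be sketched as below. The start sectors and the 512-byte sector size come from the partition listing quoted above; the helper name `alignment` and the `boundary` parameter are illustrative only.

```python
SECTOR_BYTES = 512   # "Logical sector size: 512 bytes"
MIB = 1024 * 1024

# (number, start sector, name) from the quoted partition table.
PARTITIONS = [
    (1, 34, "BIOS boot partition"),
    (2, 2048, "Linux swap"),
    (3, 33556480, "Linux filesystem"),
]


def alignment(start_sector, boundary=MIB):
    """Return the partition's starting byte offset and whether it is a multiple of boundary."""
    offset = start_sector * SECTOR_BYTES
    return offset, offset % boundary == 0


report = {num: alignment(start) for num, start, _ in PARTITIONS}
for num, _, name in PARTITIONS:
    offset, ok = report[num]
    print(f"p{num} {name}: starts at byte {offset}, 1 MiB aligned: {ok}")
```

Only partition 1 starts off a 1 MiB boundary. Passing a different `boundary` (for example `2 * MIB`) checks other alignment targets the same way.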

Can you run a straight 'dd' from that partition, to see if you get errors?
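
If dd is not handy, a sequential read in the same spirit can be sketched in Python. This is only an approximation of a plain dd read, not a replacement for it: `read_through` is a made-up helper, and the scratch file at the end exists only so the sketch runs without touching a real device.

```python
import os
import tempfile


def read_through(path, block_size=1024 * 1024):
    """Read path from start to end; return (bytes_read, error_or_None)."""
    total = 0
    # buffering=0 disables Python's own buffering; reads still go through
    # the kernel page cache (this is not O_DIRECT).
    with open(path, "rb", buffering=0) as dev:
        while True:
            try:
                chunk = dev.read(block_size)
            except OSError as exc:
                # Stop at the first failed read; total is the offset reached.
                return total, exc
            if not chunk:
                return total, None
            total += len(chunk)


# Self-check on a scratch file so the sketch can be run anywhere.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\0" * 3_000_000)
scratch = tmp.name
total, error = read_through(scratch)
os.remove(scratch)
print(f"read {total} bytes, error: {error}")
```

On the real machine the call would look like `read_through("/dev/nvme0n1p2")`, where the partition path is an assumption based on the table above and reading it needs root.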


* NVM and swap device
  2016-01-13  3:40 NVM and swap device Stephen Hemminger
  2016-01-13  8:26 ` Hannes Reinecke
@ 2016-01-13 17:47 ` Jens Axboe
  2016-01-13 18:51   ` Stephen Hemminger
  2016-01-15 17:42 ` Keith Busch
  2 siblings, 1 reply; 11+ messages in thread
From: Jens Axboe @ 2016-01-13 17:47 UTC (permalink / raw)


On 01/12/2016 08:40 PM, Stephen Hemminger wrote:
> I have a nice shiny new Intel NVM PCI card; decided to use it for a filesystem and swap.
> The filesystem (btrfs) is doing fine, but the swap device was throwing occasional
> random errors. Suspect a driver problem rather than hardware.
>
> I am using 4.4 kernel without patches.
>
> kern.log:Jan 12 08:11:57 xeon-e3 kernel: [159474.037390] Read-error on swap-device (259:0:17597808)
> kern.log.1:Jan  7 08:32:10 xeon-e3 kernel: [87938.855526] Read-error on swap-device (259:0:11355648)
> kern.log.1:Jan  7 08:32:10 xeon-e3 kernel: [87938.855530] Read-error on swap-device (259:0:11355656)
> kern.log.1:Jan  7 08:32:10 xeon-e3 kernel: [87939.855467] Read-error on swap-device (259:0:16180824)
> kern.log.1:Jan  8 08:24:07 xeon-e3 kernel: [63670.777981] Read-error on swap-device (259:0:32690768)
> kern.log.1:Jan  9 09:25:02 xeon-e3 kernel: [153720.919325] Read-error on swap-device (259:0:220488)
> kern.log.1:Jan  9 16:40:05 xeon-e3 kernel: [179820.957675] Read-error on swap-device (259:0:24476232)
> kern.log.1:Jan  9 16:40:05 xeon-e3 kernel: [179820.962673] Read-error on swap-device (259:0:33292816)
>
> The swap device was being added via /etc/fstab by UUID.
>
> I gave up and went back to spinning rust for swap device for stability.

That's very odd. Why are you suspecting a driver problem? Have you tried 
to thoroughly beat the device up with normal IO?

-- 
Jens Axboe


* NVM and swap device
  2016-01-13 17:47 ` Jens Axboe
@ 2016-01-13 18:51   ` Stephen Hemminger
  2016-01-13 18:55     ` Jens Axboe
  0 siblings, 1 reply; 11+ messages in thread
From: Stephen Hemminger @ 2016-01-13 18:51 UTC (permalink / raw)


On Wed, 13 Jan 2016 10:47:40 -0700
Jens Axboe <axboe@fb.com> wrote:

> On 01/12/2016 08:40 PM, Stephen Hemminger wrote:
> > I have a nice shiny new Intel NVM PCI card; decided to use it for a filesystem and swap.
> > The filesystem (btrfs) is doing fine, but the swap device was throwing occasional
> > random errors. Suspect a driver problem rather than hardware.
> >
> > I am using 4.4 kernel without patches.
> >
> > kern.log:Jan 12 08:11:57 xeon-e3 kernel: [159474.037390] Read-error on swap-device (259:0:17597808)
> > kern.log.1:Jan  7 08:32:10 xeon-e3 kernel: [87938.855526] Read-error on swap-device (259:0:11355648)
> > kern.log.1:Jan  7 08:32:10 xeon-e3 kernel: [87938.855530] Read-error on swap-device (259:0:11355656)
> > kern.log.1:Jan  7 08:32:10 xeon-e3 kernel: [87939.855467] Read-error on swap-device (259:0:16180824)
> > kern.log.1:Jan  8 08:24:07 xeon-e3 kernel: [63670.777981] Read-error on swap-device (259:0:32690768)
> > kern.log.1:Jan  9 09:25:02 xeon-e3 kernel: [153720.919325] Read-error on swap-device (259:0:220488)
> > kern.log.1:Jan  9 16:40:05 xeon-e3 kernel: [179820.957675] Read-error on swap-device (259:0:24476232)
> > kern.log.1:Jan  9 16:40:05 xeon-e3 kernel: [179820.962673] Read-error on swap-device (259:0:33292816)
> >
> > The swap device was being added via /etc/fstab by UUID.
> >
> > I gave up and went back to spinning rust for swap device for stability.
> 
> That's very odd. Why are you suspecting a driver problem? Have you tried 
> to thoroughly beat the device up with normal IO?
> 

I will try it tonight. Do you have a favorite test?


* NVM and swap device
  2016-01-13 18:51   ` Stephen Hemminger
@ 2016-01-13 18:55     ` Jens Axboe
  2016-01-20 21:30       ` Stephen Hemminger
  0 siblings, 1 reply; 11+ messages in thread
From: Jens Axboe @ 2016-01-13 18:55 UTC (permalink / raw)


On 01/13/2016 11:51 AM, Stephen Hemminger wrote:
> On Wed, 13 Jan 2016 10:47:40 -0700
> Jens Axboe <axboe@fb.com> wrote:
>
>> On 01/12/2016 08:40 PM, Stephen Hemminger wrote:
>>> I have a nice shiny new Intel NVM PCI card; decided to use it for a filesystem and swap.
>>> The filesystem (btrfs) is doing fine, but the swap device was throwing occasional
>>> random errors. Suspect a driver problem rather than hardware.
>>>
>>> I am using 4.4 kernel without patches.
>>>
>>> kern.log:Jan 12 08:11:57 xeon-e3 kernel: [159474.037390] Read-error on swap-device (259:0:17597808)
>>> kern.log.1:Jan  7 08:32:10 xeon-e3 kernel: [87938.855526] Read-error on swap-device (259:0:11355648)
>>> kern.log.1:Jan  7 08:32:10 xeon-e3 kernel: [87938.855530] Read-error on swap-device (259:0:11355656)
>>> kern.log.1:Jan  7 08:32:10 xeon-e3 kernel: [87939.855467] Read-error on swap-device (259:0:16180824)
>>> kern.log.1:Jan  8 08:24:07 xeon-e3 kernel: [63670.777981] Read-error on swap-device (259:0:32690768)
>>> kern.log.1:Jan  9 09:25:02 xeon-e3 kernel: [153720.919325] Read-error on swap-device (259:0:220488)
>>> kern.log.1:Jan  9 16:40:05 xeon-e3 kernel: [179820.957675] Read-error on swap-device (259:0:24476232)
>>> kern.log.1:Jan  9 16:40:05 xeon-e3 kernel: [179820.962673] Read-error on swap-device (259:0:33292816)
>>>
>>> The swap device was being added via /etc/fstab by UUID.
>>>
>>> I gave up and went back to spinning rust for swap device for stability.
>>
>> That's very odd. Why are you suspecting a driver problem? Have you tried
>> to thoroughly beat the device up with normal IO?
>>
>
> I will try it tonight. Do you have a favorite test?

I'd run something that just beats up on it, reads and writes. If you 
have fio installed, something ala:

fio --ioengine=libaio --iodepth=8 --direct=1 --bs=4k \
    --filename=/dev/nvme0n1 --numjobs=4 --norandommap --runtime=1h \
    --time_based=1 --name=reads --rw=randread --name=writes --rw=randwrite

This will run 4 processes that randomly read from the device, and 4 that 
randomly write. Replace /dev/nvme0n1 with your swap partition. The test 
will run for 1 hour.
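
To make the "replace the filename" step less error-prone, the same invocation can be assembled from a small script. This is a sketch: `fio_command` is a made-up helper, and /dev/nvme0n1p2 is only a guess at the swap partition's name. It builds and prints the command without running it.

```python
import shlex


def fio_command(target, runtime="1h"):
    """Assemble the fio invocation shown above for the given target device."""
    # Shared options come first; they are meant to apply to both jobs.
    shared = [
        "--ioengine=libaio", "--iodepth=8", "--direct=1", "--bs=4k",
        f"--filename={target}", "--numjobs=4", "--norandommap",
        f"--runtime={runtime}", "--time_based=1",
    ]
    # Each --name starts a job: one random-read job, one random-write job.
    jobs = ["--name=reads", "--rw=randread", "--name=writes", "--rw=randwrite"]
    return ["fio", *shared, *jobs]


argv = fio_command("/dev/nvme0n1p2")  # hypothetical swap partition path
print(shlex.join(argv))
```

Since the write job overwrites whatever is on the target, the partition should be taken out of use with swapoff first and re-created with mkswap afterwards.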

-- 
Jens Axboe


* NVM and swap device
  2016-01-13  3:40 NVM and swap device Stephen Hemminger
  2016-01-13  8:26 ` Hannes Reinecke
  2016-01-13 17:47 ` Jens Axboe
@ 2016-01-15 17:42 ` Keith Busch
  2016-01-15 18:18   ` Stephen Hemminger
  2016-01-15 18:20   ` Stephen Hemminger
  2 siblings, 2 replies; 11+ messages in thread
From: Keith Busch @ 2016-01-15 17:42 UTC (permalink / raw)


On Tue, Jan 12, 2016 at 07:40:30PM -0800, Stephen Hemminger wrote:
> I have a nice shiny new Intel NVM PCI card; decided to use it for a filesystem and swap.
> The filesystem (btrfs) is doing fine, but the swap device was throwing occasional
> random errors. Suspect a driver problem rather than hardware.
> 
> I am using 4.4 kernel without patches.
> 
> kern.log:Jan 12 08:11:57 xeon-e3 kernel: [159474.037390] Read-error on swap-device (259:0:17597808)
> kern.log.1:Jan  7 08:32:10 xeon-e3 kernel: [87938.855526] Read-error on swap-device (259:0:11355648)
> kern.log.1:Jan  7 08:32:10 xeon-e3 kernel: [87938.855530] Read-error on swap-device (259:0:11355656)
> kern.log.1:Jan  7 08:32:10 xeon-e3 kernel: [87939.855467] Read-error on swap-device (259:0:16180824)
> kern.log.1:Jan  8 08:24:07 xeon-e3 kernel: [63670.777981] Read-error on swap-device (259:0:32690768)
> kern.log.1:Jan  9 09:25:02 xeon-e3 kernel: [153720.919325] Read-error on swap-device (259:0:220488)
> kern.log.1:Jan  9 16:40:05 xeon-e3 kernel: [179820.957675] Read-error on swap-device (259:0:24476232)
> kern.log.1:Jan  9 16:40:05 xeon-e3 kernel: [179820.962673] Read-error on swap-device (259:0:33292816)
> 
> The swap device was being added via /etc/fstab by UUID. 

If you haven't any further insights into the issue, could you check the
device's health? I'd be surprised if there is a problem with that since
you mentioned it was a new card, but would like to rule that out if this
has hit a dead end on the other testing.

For that, we need smart logs. There are various tools available that
can read those logs. Here's an open source version:

  https://github.com/linux-nvme/nvme-cli

Here's example output from one of my drives with the above tool:

  # nvme smart-log /dev/nvme0
  Smart Log for NVME device:/dev/nvme0 namespace-id:ffffffff
  critical_warning                    : 0
  temperature                         : 29 C
  available_spare                     : 100%
  available_spare_threshold           : 10%
  percentage_used                     : 0%
  data_units_read                     : 577,600
  data_units_written                  : 3,182,404
  host_read_commands                  : 4,537,801
  host_write_commands                 : 18,713,235
  controller_busy_time                : 17
  power_cycles                        : 1
  power_on_hours                      : 163
  unsafe_shutdowns                    : 1
  media_errors                        : 0
  num_err_log_entries                 : 0
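
If it helps, output in that form can be screened with a few lines of Python. This is a sketch that assumes the "name : value" text layout shown above. `parse_smart_log` and the choice of WATCHED fields are illustrative only, not anything the tool itself defines.

```python
SAMPLE = """\
critical_warning                    : 0
temperature                         : 29 C
available_spare                     : 100%
available_spare_threshold           : 10%
percentage_used                     : 0%
data_units_read                     : 577,600
data_units_written                  : 3,182,404
host_read_commands                  : 4,537,801
host_write_commands                 : 18,713,235
controller_busy_time                : 17
power_cycles                        : 1
power_on_hours                      : 163
unsafe_shutdowns                    : 1
media_errors                        : 0
num_err_log_entries                 : 0
"""

# Counters this sketch treats as "expected to be zero" (an assumption).
WATCHED = ("critical_warning", "media_errors", "num_err_log_entries")


def parse_smart_log(text):
    """Map each 'name : value' line to an int, dropping commas and units."""
    fields = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if not sep:
            continue
        digits = value.replace(",", "").strip().rstrip("%C ")
        if digits.isdigit():
            fields[name.strip()] = int(digits)
    return fields


smart = parse_smart_log(SAMPLE)
flagged = {key: smart[key] for key in WATCHED if smart.get(key, 0) != 0}
print("flagged counters:", flagged or "none")
```

For the sample above nothing is flagged. Lines that do not end in a plain number, such as the "Smart Log for NVME device" header, are skipped.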


* NVM and swap device
  2016-01-15 17:42 ` Keith Busch
@ 2016-01-15 18:18   ` Stephen Hemminger
  2016-01-15 18:20   ` Stephen Hemminger
  1 sibling, 0 replies; 11+ messages in thread
From: Stephen Hemminger @ 2016-01-15 18:18 UTC (permalink / raw)


On Fri, 15 Jan 2016 17:42:36 +0000
Keith Busch <keith.busch@intel.com> wrote:

> On Tue, Jan 12, 2016 at 07:40:30PM -0800, Stephen Hemminger wrote:
> > I have a nice shiny new Intel NVM PCI card; decided to use it for a filesystem and swap.
> > The filesystem (btrfs) is doing fine, but the swap device was throwing occasional
> > random errors. Suspect a driver problem rather than hardware.
> > 
> > I am using 4.4 kernel without patches.
> > 
> > kern.log:Jan 12 08:11:57 xeon-e3 kernel: [159474.037390] Read-error on swap-device (259:0:17597808)
> > kern.log.1:Jan  7 08:32:10 xeon-e3 kernel: [87938.855526] Read-error on swap-device (259:0:11355648)
> > kern.log.1:Jan  7 08:32:10 xeon-e3 kernel: [87938.855530] Read-error on swap-device (259:0:11355656)
> > kern.log.1:Jan  7 08:32:10 xeon-e3 kernel: [87939.855467] Read-error on swap-device (259:0:16180824)
> > kern.log.1:Jan  8 08:24:07 xeon-e3 kernel: [63670.777981] Read-error on swap-device (259:0:32690768)
> > kern.log.1:Jan  9 09:25:02 xeon-e3 kernel: [153720.919325] Read-error on swap-device (259:0:220488)
> > kern.log.1:Jan  9 16:40:05 xeon-e3 kernel: [179820.957675] Read-error on swap-device (259:0:24476232)
> > kern.log.1:Jan  9 16:40:05 xeon-e3 kernel: [179820.962673] Read-error on swap-device (259:0:33292816)
> > 
> > The swap device was being added via /etc/fstab by UUID. 
> 
> If you haven't any further insights into the issue, could you check the
> device's health? I'd be surprised if there is a problem with that since
> you mentioned it was a new card, but would like to rule that out if this
> has hit a dead end on the other testing.
> 
> For that, we need smart logs. There are various tools available that
> can read those logs. Here's an open source version:
> 
>   https://github.com/linux-nvme/nvme-cli
> 
> Here's example output from one of my drives with the above tool:
> 
>   # nvme smart-log /dev/nvme0
>   Smart Log for NVME device:/dev/nvme0 namespace-id:ffffffff
>   critical_warning                    : 0
>   temperature                         : 29 C
>   available_spare                     : 100%
>   available_spare_threshold           : 10%
>   percentage_used                     : 0%
>   data_units_read                     : 577,600
>   data_units_written                  : 3,182,404
>   host_read_commands                  : 4,537,801
>   host_write_commands                 : 18,713,235
>   controller_busy_time                : 17
>   power_cycles                        : 1
>   power_on_hours                      : 163
>   unsafe_shutdowns                    : 1
>   media_errors                        : 0
>   num_err_log_entries                 : 0

I wanted to run more stress tests before reporting back.


* NVM and swap device
  2016-01-15 17:42 ` Keith Busch
  2016-01-15 18:18   ` Stephen Hemminger
@ 2016-01-15 18:20   ` Stephen Hemminger
  2016-01-15 21:19     ` Keith Busch
  1 sibling, 1 reply; 11+ messages in thread
From: Stephen Hemminger @ 2016-01-15 18:20 UTC (permalink / raw)


On Fri, 15 Jan 2016 17:42:36 +0000
Keith Busch <keith.busch@intel.com> wrote:

> On Tue, Jan 12, 2016 at 07:40:30PM -0800, Stephen Hemminger wrote:
> > I have a nice shiny new Intel NVM PCI card; decided to use it for a filesystem and swap.
> > The filesystem (btrfs) is doing fine, but the swap device was throwing occasional
> > random errors. Suspect a driver problem rather than hardware.
> > 
> > I am using 4.4 kernel without patches.
> > 
> > kern.log:Jan 12 08:11:57 xeon-e3 kernel: [159474.037390] Read-error on swap-device (259:0:17597808)
> > kern.log.1:Jan  7 08:32:10 xeon-e3 kernel: [87938.855526] Read-error on swap-device (259:0:11355648)
> > kern.log.1:Jan  7 08:32:10 xeon-e3 kernel: [87938.855530] Read-error on swap-device (259:0:11355656)
> > kern.log.1:Jan  7 08:32:10 xeon-e3 kernel: [87939.855467] Read-error on swap-device (259:0:16180824)
> > kern.log.1:Jan  8 08:24:07 xeon-e3 kernel: [63670.777981] Read-error on swap-device (259:0:32690768)
> > kern.log.1:Jan  9 09:25:02 xeon-e3 kernel: [153720.919325] Read-error on swap-device (259:0:220488)
> > kern.log.1:Jan  9 16:40:05 xeon-e3 kernel: [179820.957675] Read-error on swap-device (259:0:24476232)
> > kern.log.1:Jan  9 16:40:05 xeon-e3 kernel: [179820.962673] Read-error on swap-device (259:0:33292816)
> > 
> > The swap device was being added via /etc/fstab by UUID. 
> 
> If you haven't any further insights into the issue, could you check the
> device's health? I'd be surprised if there is a problem with that since
> you mentioned it was a new card, but would like to rule that out if this
> has hit a dead end on the other testing.
> 
> For that, we need smart logs. There are various tools available that
> can read those logs. Here's an open source version:
> 
>   https://github.com/linux-nvme/nvme-cli
> 
> Here's example output from one of my drives with the above tool:
> 
>   # nvme smart-log /dev/nvme0
>   Smart Log for NVME device:/dev/nvme0 namespace-id:ffffffff
>   critical_warning                    : 0
>   temperature                         : 29 C
>   available_spare                     : 100%
>   available_spare_threshold           : 10%
>   percentage_used                     : 0%
>   data_units_read                     : 577,600
>   data_units_written                  : 3,182,404
>   host_read_commands                  : 4,537,801
>   host_write_commands                 : 18,713,235
>   controller_busy_time                : 17
>   power_cycles                        : 1
>   power_on_hours                      : 163
>   unsafe_shutdowns                    : 1
>   media_errors                        : 0
>   num_err_log_entries                 : 0

Are these logs persistent?

# ./nvme smart-log /dev/nvme0
Smart Log for NVME device:/dev/nvme0 namespace-id:ffffffff
critical_warning                    : 0
temperature                         : 29 C
available_spare                     : 100%
available_spare_threshold           : 10%
percentage_used                     : 0%
data_units_read                     : 2,768,267
data_units_written                  : 4,085,497
host_read_commands                  : 74,458,451
host_write_commands                 : 68,420,945
controller_busy_time                : 12
power_cycles                        : 33
power_on_hours                      : 227
unsafe_shutdowns                    : 6
media_errors                        : 0
num_err_log_entries                 : 0
Critical Composite Temperature Time : 0
Temperature Sensor 1                : 0 C
Temperature Sensor 2                : 0 C
Temperature Sensor 3                : 0 C
Temperature Sensor 4                : 0 C
Temperature Sensor 5                : 0 C
Temperature Sensor 6                : 0 C
Temperature Sensor 7                : 0 C
Temperature Sensor 8                : 0 C


* NVM and swap device
  2016-01-15 18:20   ` Stephen Hemminger
@ 2016-01-15 21:19     ` Keith Busch
  0 siblings, 0 replies; 11+ messages in thread
From: Keith Busch @ 2016-01-15 21:19 UTC (permalink / raw)


On Fri, Jan 15, 2016 at 10:20:52AM -0800, Stephen Hemminger wrote:
> Are these logs persistent?

Yep, these are persistent logs.

Nothing unusual here. Sorry for the digression.

> critical_warning                    : 0
> temperature                         : 29 C
> available_spare                     : 100%
> available_spare_threshold           : 10%
> percentage_used                     : 0%
> data_units_read                     : 2,768,267
> data_units_written                  : 4,085,497
> host_read_commands                  : 74,458,451
> host_write_commands                 : 68,420,945
> controller_busy_time                : 12
> power_cycles                        : 33
> power_on_hours                      : 227
> unsafe_shutdowns                    : 6
> media_errors                        : 0
> num_err_log_entries                 : 0


* NVM and swap device
  2016-01-13 18:55     ` Jens Axboe
@ 2016-01-20 21:30       ` Stephen Hemminger
  0 siblings, 0 replies; 11+ messages in thread
From: Stephen Hemminger @ 2016-01-20 21:30 UTC (permalink / raw)


On Wed, 13 Jan 2016 11:55:41 -0700
Jens Axboe <axboe@fb.com> wrote:

> I'd run something that just beats up on it, reads and writes. If you 
> have fio installed, something ala:
> 
> fio --ioengine=libaio --iodepth=8 --direct=1 --bs=4k 
> --filename=/dev/nvme0n1 --numjobs=4 --norandommap --runtime=1h 
> --time_based=1 --name=reads --rw=randread --name=writes --rw=randwrite
> 
> This will run 4 processes that randomly read from the device, and 4 that 
> randomly write. Replace /dev/nvme0n1 with your swap partition. The test 
> will run for 1 hour.

The fio test ran fine without errors. I am beginning to think it is something
unique to how I/O in the swap path gets done.

Will turn it back on for swap and see what happens.
