linux-btrfs.vger.kernel.org archive mirror
* mkfs.btrfs cannot find rotational file for SSD detection for a pmem device
@ 2015-09-06 17:51 Elliott, Robert (Persistent Memory)
  2015-09-08 12:56 ` Austin S Hemmelgarn
  0 siblings, 1 reply; 7+ messages in thread
From: Elliott, Robert (Persistent Memory) @ 2015-09-06 17:51 UTC (permalink / raw)
  To: dan.j.williams@intel.com
  Cc: linux-nvdimm@lists.01.org, linux-btrfs@vger.kernel.org,
	util-linux@vger.kernel.org, Kani, Toshimitsu, Knippers, Linda

mkfs.btrfs does not detect pmem devices as being SSDs in kernel 4.2.

Label:              (null)
UUID:               46603efe-728c-43fe-8241-ffc125e1a7ed
Node size:          16384
Sector size:        4096
Filesystem size:    8.00GiB
Block group profiles:
  Data:             single            8.00MiB
  Metadata:         DUP             417.56MiB
  System:           DUP              12.00MiB
SSD detected:       no
Incompat features:  extref, skinny-metadata
Number of devices:  1
Devices:
   ID        SIZE  PATH
    1     8.00GiB  /dev/pmem0

mkfs.btrfs opens "/sys/block/%s/queue/rotational" and looks for
0 (non-rotational - an SSD) or non-zero (rotational - a HDD).

However, strace shows it is having trouble creating that path.
The blkid_devno_to_wholedisk function from libblkid leads it to
this path:
	/sys/block/LNXSY/queue/rotational
which doesn't exist.

That is based on:
$ realpath /sys/block/pmem0
/sys/devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0012:00/ndbus1/region0/namespace0.0/block/pmem0

$ realpath /sys/dev/block/259:0
/sys/devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0012:00/ndbus1/region0/namespace0.0/block/pmem0

The impact looks limited to the print and causing it to not
automatically disable "metadata duplication on a single device."

References:
git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs.git 
git://git.kernel.org/pub/scm/utils/util-linux/util-linux.git
http://comments.gmane.org/gmane.comp.file-systems.btrfs/18749

mkfs.c excerpt
==============
static int is_ssd(const char *file)
{
...
        /* Get whole disk name (not full path) for this devno */
        ret = blkid_devno_to_wholedisk(devno,
                        wholedisk, sizeof(wholedisk), NULL);
        if (ret) {
                blkid_free_probe(probe);
                return 0;
        }

        snprintf(sysfs_path, PATH_MAX, "/sys/block/%s/queue/rotational",
                 wholedisk);

        blkid_free_probe(probe);

        fd = open(sysfs_path, O_RDONLY);
        if (fd < 0) {
                return 0;
        }

        if (read(fd, &rotational, sizeof(char)) < sizeof(char)) {
                close(fd);
                return 0;
...
int main(int ac, char **av)
...
        if (!mixed) {
                if (!metadata_profile_opt) {
                        if (dev_cnt == 1 && ssd && verbose)
                                printf("Detected a SSD, turning off metadata "
                                "duplication.  Mkfs with -m dup if you want to "
                                "force metadata duplication.\n");

                        metadata_profile = (dev_cnt > 1) ?
                                        BTRFS_BLOCK_GROUP_RAID1 : (ssd) ?
                                        0: BTRFS_BLOCK_GROUP_DUP;
                }
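
The nested conditional at the end of the excerpt can be easier to follow when spelled out. The following is only a minimal sketch: the enum names are hypothetical stand-ins for the BTRFS_BLOCK_GROUP_* flags, and their numeric values are illustrative, not the real flag values.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the BTRFS_BLOCK_GROUP_* flags used in mkfs.c. */
enum profile {
    PROFILE_SINGLE = 0,   /* corresponds to the literal 0 in the excerpt */
    PROFILE_DUP    = 1,
    PROFILE_RAID1  = 2
};

/*
 * Default metadata profile when the user gave no -m option and the
 * filesystem is not in mixed mode:
 *   more than one device      -> RAID1
 *   one device, SSD detected  -> single
 *   one device, no SSD        -> DUP
 */
static enum profile default_metadata_profile(int dev_cnt, bool ssd)
{
    assert(dev_cnt >= 1);
    if (dev_cnt > 1)
        return PROFILE_RAID1;
    return ssd ? PROFILE_SINGLE : PROFILE_DUP;
}
```

With SSD detection failing, a single pmem device takes the `ssd == false` branch and gets DUP, which matches the "Metadata: DUP" line in the mkfs.btrfs output above.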


strace
======
open("/dev/pmem0", O_RDWR|O_EXCL)       = 3
fstat(3, {st_mode=S_IFBLK|0660, st_rdev=makedev(259, 0), ...}) = 0
close(3)                                = 0
open("/dev/pmem0", O_RDONLY|O_CLOEXEC)  = 3
fadvise64(3, 0, 0, POSIX_FADV_RANDOM)   = 0
fstat(3, {st_mode=S_IFBLK|0660, st_rdev=makedev(259, 0), ...}) = 0
uname({sysname="Linux", nodename="s18", ...}) = 0
ioctl(3, BLKGETSIZE64, 8589934592)      = 0
open("/sys/dev/block/259:0", O_RDONLY|O_CLOEXEC) = 4
openat(4, "dm/uuid", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
close(4)                                = 0
open("/sys/dev/block/259:0", O_RDONLY|O_CLOEXEC) = 4
newfstatat(4, "partition", 0x7fffb67faa50, 0) = -1 ENOENT (No such file or directory)
openat(4, "dm/uuid", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
close(4)                                = 0
ioctl(3, CDROM_GET_CAPABILITY, 0)       = -1 ENOTTY (Inappropriate ioctl for device)
open("/sys/dev/block/259:0", O_RDONLY|O_CLOEXEC) = 4
newfstatat(4, "partition", 0x7fffb67fab90, 0) = -1 ENOENT (No such file or directory)
openat(4, "dm/uuid", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
readlink("/sys/dev/block/259:0", "../../devices/LNXSYSTM:00/LNXSY", 31) = 31
close(4)                                = 0
close(3)                                = 0
open("/sys/block/LNXSY/queue/rotational", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/dev/pmem0", O_RDONLY)            = 3
fstat(3, {st_mode=S_IFBLK|0660, st_rdev=makedev(259, 0), ...}) = 0
ioctl(3, BLKGETSIZE64, 8589934592)      = 0
close(3)                                = 0
...
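
The readlink() call in the trace returns exactly 31 bytes, "../../devices/LNXSYSTM:00/LNXSY", and the text after its last '/' is "LNXSY", the name that ends up in the failing open(). The sketch below reproduces that effect in isolation. It is an illustration, not code from libblkid or btrfs-progs: the 32-byte buffer size is an assumption inferred from the 31 in the trace, the full link target is reconstructed from the realpath output above, and both function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Copy at most bufsize - 1 bytes of a symlink target into buf and
 * NUL-terminate it. This mimics how readlink(path, buf, bufsize - 1)
 * silently truncates a target that is longer than the buffer.
 * Returns the number of bytes copied.
 */
static size_t simulated_readlink(const char *target, char *buf, size_t bufsize)
{
    size_t n;

    assert(bufsize > 0);
    n = strlen(target);
    if (n > bufsize - 1)
        n = bufsize - 1;
    memcpy(buf, target, n);
    buf[n] = '\0';
    return n;
}

/* Return the text after the last '/' in path, or path itself if there is none. */
static const char *last_component(const char *path)
{
    const char *slash = strrchr(path, '/');

    return slash ? slash + 1 : path;
}
```

Feeding the reconstructed target through a 32-byte buffer and then taking the last component gives "LNXSY"; with a buffer large enough for the whole target, the last component is "pmem0".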



* Re: mkfs.btrfs cannot find rotational file for SSD detection for a pmem device
  2015-09-06 17:51 mkfs.btrfs cannot find rotational file for SSD detection for a pmem device Elliott, Robert (Persistent Memory)
@ 2015-09-08 12:56 ` Austin S Hemmelgarn
  2015-09-08 20:00   ` Elliott, Robert (Persistent Memory)
  0 siblings, 1 reply; 7+ messages in thread
From: Austin S Hemmelgarn @ 2015-09-08 12:56 UTC (permalink / raw)
  To: Elliott, Robert (Persistent Memory), dan.j.williams@intel.com
  Cc: linux-nvdimm@lists.01.org, linux-btrfs@vger.kernel.org,
	util-linux@vger.kernel.org, Kani, Toshimitsu, Knippers, Linda


On 2015-09-06 13:51, Elliott, Robert (Persistent Memory) wrote:
> mkfs.btrfs does not detect pmem devices as being SSDs in kernel 4.2.
>
> Label:              (null)
> UUID:               46603efe-728c-43fe-8241-ffc125e1a7ed
> Node size:          16384
> Sector size:        4096
> Filesystem size:    8.00GiB
> Block group profiles:
>    Data:             single            8.00MiB
>    Metadata:         DUP             417.56MiB
>    System:           DUP              12.00MiB
> SSD detected:       no
> Incompat features:  extref, skinny-metadata
> Number of devices:  1
> Devices:
>     ID        SIZE  PATH
>      1     8.00GiB  /dev/pmem0
>
> mkfs.btrfs opens "/sys/block/%s/queue/rotational" and looks for
> 0 (non-rotational - an SSD) or non-zero (rotational - a HDD).
While not directly related in this case, it's worth pointing out that
lots of things that are not SSDs get listed as non-rotational by default
(and you can in fact change the value of this file from userspace).
Examples include virtualized block storage (xvdb and virtio at least;
I'm not sure about VMware or Hyper-V) and some networked block devices:
NBD may or may not, depending on server-side configuration; ATAoE did,
at least at one point; RBD gets listed as non-rotational; and DRBD
tracks the underlying storage on the local node.
>
> However, strace shows it is having trouble creating that path.
> The blkid_devno_to_wholedisk function from libblkid leads it to
> this path:
> 	/sys/block/LNXSY/queue/rotational
> which doesn't exist.
>
> That is based on:
> $ realpath /sys/block/pmem0
> /sys/devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0012:00/ndbus1/region0/namespace0.0/block/pmem0
>
> $ realpath /sys/dev/block/259:0
> /sys/devices/LNXSYSTM:00/LNXSYBUS:00/ACPI0012:00/ndbus1/region0/namespace0.0/block/pmem0
>
> The impact looks limited to the print and causing it to not
> automatically disable "metadata duplication on a single device."
This is an issue inherent in the current pmem driver, however, and it
should be fixed there and not in mkfs.btrfs, as other filesystems also
make decisions based on this file, as do the I/O scheduler and some
block storage servers.  This gets tricky, though, because pmem isn't
technically a block device at the low level, and doesn't use some parts
of the block layer that most other block devices do.

On that note however, if the pmem device is backed by actual RAM and not 
flash storage (and most of them are from what I've seen), then the only 
advantage of using single metadata mode over dup is space savings, as 
RAM is not (usually) write limited.



* RE: mkfs.btrfs cannot find rotational file for SSD detection for a pmem device
  2015-09-08 12:56 ` Austin S Hemmelgarn
@ 2015-09-08 20:00   ` Elliott, Robert (Persistent Memory)
  2015-09-09 11:28     ` Austin S Hemmelgarn
  0 siblings, 1 reply; 7+ messages in thread
From: Elliott, Robert (Persistent Memory) @ 2015-09-08 20:00 UTC (permalink / raw)
  To: Austin S Hemmelgarn, dan.j.williams@intel.com
  Cc: linux-nvdimm@lists.01.org, linux-btrfs@vger.kernel.org,
	util-linux@vger.kernel.org, Kani, Toshimitsu, Knippers, Linda


> -----Original Message-----
> From: Austin S Hemmelgarn [mailto:ahferroin7@gmail.com]
> Sent: Tuesday, September 8, 2015 7:56 AM
> Subject: Re: mkfs.btrfs cannot find rotational file for SSD detection for
> a pmem device
> 
> On 2015-09-06 13:51, Elliott, Robert (Persistent Memory) wrote:
...
> > The impact looks limited to the print and causing it to not
> > automatically disable "metadata duplication on a single device."
> This is an issue inherent in the current pmem driver however, it should
> be fixed there and not in mkfs.btrfs, as other filesystems make
> decisions based on this file also, as does the I/O scheduler, and some
> block storage servers.  
> ...

The rotational file does exist, at:
/sys/devices/LNXSYSTM\:00/LNXSYBUS\:00/ACPI0012\:00/ndbus1/region0/namespace0.0/block/pmem0/queue/rotational

One or more functions are having trouble parsing that 108-byte string
... mkfs.btrfs's is_ssd, libblkid's blkid_devno_to_wholedisk, or
libblkid's sysfs_devno_to_wholedisk.  I'm not sure where the
breakdown occurs.

This is reminiscent of an issue that numactl has parsing the path to
get to .../device/numa_node (rather than .../queue/rotational).  It
was confused by not finding "/devices/pci" in a path for a storage
device.

> This gets tricky though because pmem isn't
> technically a block device at the low level, and doesn't use some parts
> of the block layer that most other block devices do.
> 
> On that note however, if the pmem device is backed by actual RAM and not
> flash storage (and most of them are from what I've seen), then the only
> advantage of using single metadata mode over dup is space savings, as
> RAM is not (usually) write limited.

pmem devices will be a mix ranging from flash-backed DRAM to new
technologies like 3D XPoint, usually offering high performance
and good wear-out characteristics.

The btrfs kernel driver does detect it as an SSD, even though mkfs.btrfs did not:
kernel: BTRFS info (device pmem0): disk space caching is enabled
kernel: BTRFS: has skinny extents
kernel: BTRFS: flagging fs with big metadata feature
kernel: BTRFS: detected SSD devices, enabling SSD mode



* Re: mkfs.btrfs cannot find rotational file for SSD detection for a pmem device
  2015-09-08 20:00   ` Elliott, Robert (Persistent Memory)
@ 2015-09-09 11:28     ` Austin S Hemmelgarn
  2015-09-09 12:12       ` Boaz Harrosh
  0 siblings, 1 reply; 7+ messages in thread
From: Austin S Hemmelgarn @ 2015-09-09 11:28 UTC (permalink / raw)
  To: Elliott, Robert (Persistent Memory), dan.j.williams@intel.com
  Cc: linux-nvdimm@lists.01.org, linux-btrfs@vger.kernel.org,
	util-linux@vger.kernel.org, Kani, Toshimitsu, Knippers, Linda


On 2015-09-08 16:00, Elliott, Robert (Persistent Memory) wrote:
>
>> -----Original Message-----
>> From: Austin S Hemmelgarn [mailto:ahferroin7@gmail.com]
>> Sent: Tuesday, September 8, 2015 7:56 AM
>> Subject: Re: mkfs.btrfs cannot find rotational file for SSD detection for
>> a pmem device
>>
>> On 2015-09-06 13:51, Elliott, Robert (Persistent Memory) wrote:
> ...
>>> The impact looks limited to the print and causing it to not
>>> automatically disable "metadata duplication on a single device."
>> This is an issue inherent in the current pmem driver however, it should
>> be fixed there and not in mkfs.btrfs, as other filesystems make
>> decisions based on this file also, as does the I/O scheduler, and some
>> block storage servers.
>> ...
>
> The rotational file does exist, at:
> /sys/devices/LNXSYSTM\:00/LNXSYBUS\:00/ACPI0012\:00/ndbus1/region0/namespace0.0/block/pmem0/queue/rotational
>
> One or more functions are having trouble parsing that 108-byte string
> ... mkfs.btrfs's is_ssd, libblkid's blkid_devno_to_wholedisk, or
> libblkid's sysfs_devno_to_wholedisk.  I'm not sure where the
> breakdown occurs.
Ah, sorry about the confusion, I didn't think to actually look before
commenting.  So, it looks like this amounts to some short-sighted
coding in userspace, although I can see why nobody accounted for the
possibility of having to parse some monstrous path like that.  It would
also explain why kernel-side code isn't choking on it.  Now, the
real question is why we have to go through the full absolute path in
sysfs, and can't just go through /sys/block/pmem0.
>
> This is reminiscent of an issue that numactl has parsing the path to
> get to .../device/numa_node (rather than .../queue/rotational).  It
> was confused by not finding "/devices/pci" in a path for a storage
> device.
>
>> This gets tricky though because pmem isn't
>> technically a block device at the low level, and doesn't use some parts
>> of the block layer that most other block devices do.
>>
>> On that note however, if the pmem device is backed by actual RAM and not
>> flash storage (and most of them are from what I've seen), then the only
>> advantage of using single metadata mode over dup is space savings, as
>> RAM is not (usually) write limited.
>
> pmem devices will be a mix ranging from flash-backed DRAM to new
> technologies like 3D Crosspoint, usually offering high performance
> and good wearout characteristics.
Hmm, I've never actually seen flash-backed DRAM-based NV-DIMMs,
although I've not necessarily been keeping up to date.  Most of what
I've seen have been small (512M or 1G) ferroelectric-RAM-based ones,
and an early design that was battery backed (which is just a crisis
waiting to happen).
>
> The btrfs driver does detect it as SSD after mkfs.btrfs did not:
> kernel: BTRFS info (device pmem0): disk space caching is enabled
> kernel: BTRFS: has skinny extents
> kernel: BTRFS: flagging fs with big metadata feature
> kernel: BTRFS: detected SSD devices, enabling SSD mode
>
That makes sense if it's a userspace issue with parsing the path.
Depending on the actual underlying storage for the pmem device, though,
this may actually make things slower (the particular effect of SSD mode
is that it tries to spread allocations out as much as possible, as this
helps with wear-leveling on many SSDs).



* Re: mkfs.btrfs cannot find rotational file for SSD detection for a pmem device
  2015-09-09 11:28     ` Austin S Hemmelgarn
@ 2015-09-09 12:12       ` Boaz Harrosh
  2015-09-09 12:40         ` Austin S Hemmelgarn
  0 siblings, 1 reply; 7+ messages in thread
From: Boaz Harrosh @ 2015-09-09 12:12 UTC (permalink / raw)
  To: Austin S Hemmelgarn, Elliott, Robert (Persistent Memory),
	dan.j.williams@intel.com
  Cc: Knippers, Linda, util-linux@vger.kernel.org, Kani, Toshimitsu,
	linux-btrfs@vger.kernel.org, linux-nvdimm@lists.01.org

On 09/09/2015 02:28 PM, Austin S Hemmelgarn wrote:
> On 2015-09-08 16:00, Elliott, Robert (Persistent Memory) wrote:
<>

> this may actually make things slower (the particular effect of SSD mode 
> is that it tries to spread allocations out as much as possible, as this 
> helps with wear-leveling on many SSD's).
> 

For DRAM-based NVDIMMs it does not matter at all. For flash-based ones or
the new 3D XPoint it is a plus, so there is no harm in leaving it in.

Just my 1.7 cents
Boaz



* Re: mkfs.btrfs cannot find rotational file for SSD detection for a pmem device
  2015-09-09 12:12       ` Boaz Harrosh
@ 2015-09-09 12:40         ` Austin S Hemmelgarn
  2015-09-09 13:07           ` Boaz Harrosh
  0 siblings, 1 reply; 7+ messages in thread
From: Austin S Hemmelgarn @ 2015-09-09 12:40 UTC (permalink / raw)
  To: Boaz Harrosh, Elliott, Robert (Persistent Memory),
	dan.j.williams@intel.com
  Cc: Knippers, Linda, util-linux@vger.kernel.org, Kani, Toshimitsu,
	linux-btrfs@vger.kernel.org, linux-nvdimm@lists.01.org


On 2015-09-09 08:12, Boaz Harrosh wrote:
> On 09/09/2015 02:28 PM, Austin S Hemmelgarn wrote:
>> On 2015-09-08 16:00, Elliott, Robert (Persistent Memory) wrote:
> <>
>
>> this may actually make things slower (the particular effect of SSD mode
>> is that it tries to spread allocations out as much as possible, as this
>> helps with wear-leveling on many SSD's).
>>
>
> For DRAM based NvDIMM it matters not at all. For Flash based or the new
> 3d Xpoint it is a plus, so no harm in leaving it in
>
Looking at it from another perspective however, a lot of modern RAM 
modules will stripe the bits across multiple chips to improve 
performance.  In such a situation, BTRFS making the effort to spread out 
the allocation as much as possible may have an impact because that 
allocation path is slower than the regular one (not by much, but even a 
few microseconds can make a difference when it is getting called a lot).




* Re: mkfs.btrfs cannot find rotational file for SSD detection for a pmem device
  2015-09-09 12:40         ` Austin S Hemmelgarn
@ 2015-09-09 13:07           ` Boaz Harrosh
  0 siblings, 0 replies; 7+ messages in thread
From: Boaz Harrosh @ 2015-09-09 13:07 UTC (permalink / raw)
  To: Austin S Hemmelgarn, Elliott, Robert (Persistent Memory),
	dan.j.williams@intel.com
  Cc: Knippers, Linda, util-linux@vger.kernel.org, Kani, Toshimitsu,
	linux-btrfs@vger.kernel.org, linux-nvdimm@lists.01.org

On 09/09/2015 03:40 PM, Austin S Hemmelgarn wrote:
> On 2015-09-09 08:12, Boaz Harrosh wrote:
>> On 09/09/2015 02:28 PM, Austin S Hemmelgarn wrote:
>>> On 2015-09-08 16:00, Elliott, Robert (Persistent Memory) wrote:
>> <>
>>
>>> this may actually make things slower (the particular effect of SSD mode
>>> is that it tries to spread allocations out as much as possible, as this
>>> helps with wear-leveling on many SSD's).
>>>
>>
>> For DRAM based NvDIMM it matters not at all. For Flash based or the new
>> 3d Xpoint it is a plus, so no harm in leaving it in
>>
> Looking at it from another perspective however, a lot of modern RAM 
> modules will stripe the bits across multiple chips to improve 
> performance.  In such a situation, BTRFS making the effort to spread out 
> the allocation as much as possible may have an impact because that 
> allocation path is slower than the regular one (not by much, but even a 
> few microseconds can make a difference when it is getting called a lot).
> 
> 

It is pointless to argue about this, but the allocations are 4k aligned
either way, which means you always start at the beginning of a stripe:
the RAM striping has cacheline (64-byte) granularity.
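
The alignment argument is simple arithmetic: 4096 is a multiple of 64, so every 4k-aligned address has offset 0 within its 64-byte stripe. A small sketch, taking the 64-byte granularity and 4k alignment quoted above as given (real interleave settings vary by platform, so both constants are assumptions here, and the function names are hypothetical):

```c
#include <assert.h>
#include <stdint.h>

enum {
    STRIPE_BYTES = 64,    /* assumed interleave granularity (one cacheline) */
    ALLOC_ALIGN  = 4096   /* assumed allocation alignment */
};

/* Offset of a byte address within its interleave stripe. */
static uint64_t offset_in_stripe(uint64_t addr)
{
    return addr % STRIPE_BYTES;
}

/* Number of whole stripes covered by one aligned allocation unit. */
static uint64_t stripes_per_block(void)
{
    assert(ALLOC_ALIGN % STRIPE_BYTES == 0);
    return ALLOC_ALIGN / STRIPE_BYTES;
}
```

Under these assumptions each 4k block spans 64 full stripes, wherever the allocator places it.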

I think you will never find a single micro benchmark that will ever produce
a difference.

Cheers
Boaz

