linux-raid.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Inconsistent use of sectors vs 1k-blocks in sysfs and other places
@ 2016-12-17  5:14 John Brooks
  2016-12-18 22:32 ` NeilBrown
  0 siblings, 1 reply; 4+ messages in thread
From: John Brooks @ 2016-12-17  5:14 UTC (permalink / raw)
  To: linux-raid; +Cc: NeilBrown

I noticed that the md.rst (formerly md.txt) file in Documentation/ says that
the component_size attribute in sysfs is measured in sectors. What the
attribute actually returns is sectors/2 (1K blocks, perhaps?). This is the
relevant code from md.c:

static ssize_t
size_show(struct mddev *mddev, char *page)
{
        return sprintf(page, "%llu\n",
                (unsigned long long)mddev->dev_sectors / 2);
}

So the documentation doesn't match the code. Obviously, that needs fixing. But
in this case, I'm not sure which one is "wrong". mdadm's get_component_size()
function in sysfs.c reads this file and multiplies the result by 2 to get
sectors. So clearly this is a known behaviour and not just a forgotten typo.

Looking further, I found that this seems to be a point of inconsistency in
multiple areas:

- The per-device "offset" attribute is in sectors (as documented).
  The per-device "size" attribute is in 1K blocks (which the documentation
  doesn't specify).

- The "sync_completed" attribute uses sectors.
  The "mdstat" file in procfs uses 1K blocks.

- mdadm --examine shows Avail Dev Size in sectors.
  mdadm --detail shows Used Dev Size in 1K blocks.
  And they both show Array Size in 1K blocks.

I suspect that the sysfs attributes have stayed the way they are in the
interest of not breaking programs that use them. The easiest solution would
probably be to leave the behaviour as-is but update the documentation so it's
clear what units are used where.

Somewhat related, suspend_{lo|hi}, resync_{min|max} attributes specify ranges
in sectors, but the documentation does not specify if they are ranges on the
array size or device size. And RAID10 may even handle resync_max differently
from the rest; I didn't look deeply into that but see commit c805cdecea.

The reason I found out about the component_size issue is that I was trying to
make use of the min/max attributes in a project I'm working on, found that they
wanted device-based sectors, and then found that none of the sysfs attributes
actually tell you the device size in sectors (or the array size for that
matter, by the way).

I'm interested to hear what the developers think. I didn't do a thorough audit
to find all the different places that the different units are used; I just
pointed out a few that I noticed while working with it. So I think it's a good
discussion to bring up with the people familiar with the code.

P.S. First mailing list post :)

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Inconsistent use of sectors vs 1k-blocks in sysfs and other places
  2016-12-17  5:14 Inconsistent use of sectors vs 1k-blocks in sysfs and other places John Brooks
@ 2016-12-18 22:32 ` NeilBrown
  2016-12-20 22:14   ` John Brooks
  0 siblings, 1 reply; 4+ messages in thread
From: NeilBrown @ 2016-12-18 22:32 UTC (permalink / raw)
  To: John Brooks, linux-raid

[-- Attachment #1: Type: text/plain, Size: 3594 bytes --]

On Sat, Dec 17 2016, John Brooks wrote:

> I noticed that the md.rst (formerly md.txt) file in Documentation/ says that
> the component_size attribute in sysfs is measured in sectors. What the
> attribute actually returns is sectors/2 (1K blocks, perhaps?). This is the
> relevant code from md.c:
>
> static ssize_t
> size_show(struct mddev *mddev, char *page)
> {
>         return sprintf(page, "%llu\n",
>                 (unsigned long long)mddev->dev_sectors / 2);
> }
>
> So the documentation doesn't match the code. Obviously, that needs fixing. But
> in this case, I'm not sure which one is "wrong". mdadm's get_component_size()
> function in sysfs.c reads this file and multiplies the result by 2 to get
> sectors. So clearly this is a known behaviour and not just a forgotten typo.

The code is right by definition.  As you say, mdadm uses this interface
so we cannot change it.

>
> Looking further, I found that this seems to be a point of inconsistency in
> multiple areas:

Yes.  Sad isn't it?
The original ioctl interface used 'K' so I copied that when I created
the sysfs interface.  After a while I realized that 'sectors' made a lot
more sense, so I change to that for subsequent additions.


>
> - The per-device "offset" attribute is in sectors (as documented).
>   The per-device "size" attribute is in 1K blocks (which the documentation
>   doesn't specify).
>
> - The "sync_completed" attribute uses sectors.
>   The "mdstat" file in procfs uses 1K blocks.
>
> - mdadm --examine shows Avail Dev Size in sectors.
>   mdadm --detail shows Used Dev Size in 1K blocks.
>   And they both show Array Size in 1K blocks.
>
> I suspect that the sysfs attributes have stayed the way they are in the
> interest of not breaking programs that use them. The easiest solution would
> probably be to leave the behaviour as-is but update the documentation so it's
> clear what units are used where.

Yes.  Not just "easiest", but "only acceptable".

>
> Somewhat related, suspend_{lo|hi}, resync_{min|max} attributes specify ranges
> in sectors, but the documentation does not specify if they are ranges on the
> array size or device size. And RAID10 may even handle resync_max differently
> from the rest; I didn't look deeply into that but see commit c805cdecea.

suspend_{lo|hi} are are array addresses
resync_{min|max} are array addresses for RAID1 and RAID10, and they are
device-addresses-offset-from-data_offset for RAID1 and RAID456.

>
> The reason I found out about the component_size issue is that I was trying to
> make use of the min/max attributes in a project I'm working on, found that they
> wanted device-based sectors, and then found that none of the sysfs attributes
> actually tell you the device size in sectors (or the array size for that
> matter, by the way).
>
> I'm interested to hear what the developers think. I didn't do a thorough audit
> to find all the different places that the different units are used; I just
> pointed out a few that I noticed while working with it. So I think it's a good
> discussion to bring up with the people familiar with the code.
>

I agree that it is sad that units aren't consistent, but there is not
much we can do about that.
It is bad that the documentation is incorrect and incomplete.  If you we
to post a patch which fixed some of it, I'm sure that would be
thankfully accepted.

In theory you could have a RAID1 with an odd number of sectors used in
each component, but I doubt that happens in practice.  So you can get
the component size in sectors by reading component_size and multiplying
by two.

Thanks,
NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Inconsistent use of sectors vs 1k-blocks in sysfs and other places
  2016-12-18 22:32 ` NeilBrown
@ 2016-12-20 22:14   ` John Brooks
  2016-12-21  0:06     ` NeilBrown
  0 siblings, 1 reply; 4+ messages in thread
From: John Brooks @ 2016-12-20 22:14 UTC (permalink / raw)
  To: NeilBrown, linux-raid

On Mon, Dec 19, 2016 at 09:32:00AM +1100, NeilBrown wrote:
> The code is right by definition.  As you say, mdadm uses this interface
> so we cannot change it.

Yep, I expected as much and fully agree.

> >
> > Somewhat related, suspend_{lo|hi}, resync_{min|max} attributes specify ranges
> > in sectors, but the documentation does not specify if they are ranges on the
> > array size or device size. And RAID10 may even handle resync_max differently
> > from the rest; I didn't look deeply into that but see commit c805cdecea.
>
> suspend_{lo|hi} are are array addresses
> resync_{min|max} are array addresses for RAID1 and RAID10, and they are
> device-addresses-offset-from-data_offset for RAID1 and RAID456.

Good to know. One question, is it possible for the offset or size to be
different across disks in the same array? The documentation says that the size
is "normally" the same as component_size, implying that perhaps it could be
different.

> It is bad that the documentation is incorrect and incomplete.  If you we
> to post a patch which fixed some of it, I'm sure that would be
> thankfully accepted.

I will see what I can do.

> In theory you could have a RAID1 with an odd number of sectors used in
> each component, but I doubt that happens in practice.  So you can get
> the component size in sectors by reading component_size and multiplying
> by two.

Is that within a realm of possibility worth mentioning in the docs?

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Inconsistent use of sectors vs 1k-blocks in sysfs and other places
  2016-12-20 22:14   ` John Brooks
@ 2016-12-21  0:06     ` NeilBrown
  0 siblings, 0 replies; 4+ messages in thread
From: NeilBrown @ 2016-12-21  0:06 UTC (permalink / raw)
  To: John Brooks, linux-raid

[-- Attachment #1: Type: text/plain, Size: 2241 bytes --]

On Wed, Dec 21 2016, John Brooks wrote:

> On Mon, Dec 19, 2016 at 09:32:00AM +1100, NeilBrown wrote:
>> The code is right by definition.  As you say, mdadm uses this interface
>> so we cannot change it.
>
> Yep, I expected as much and fully agree.
>
>> >
>> > Somewhat related, suspend_{lo|hi}, resync_{min|max} attributes specify ranges
>> > in sectors, but the documentation does not specify if they are ranges on the
>> > array size or device size. And RAID10 may even handle resync_max differently
>> > from the rest; I didn't look deeply into that but see commit c805cdecea.
>>
>> suspend_{lo|hi} are are array addresses
>> resync_{min|max} are array addresses for RAID1 and RAID10, and they are
>> device-addresses-offset-from-data_offset for RAID1 and RAID456.
>
> Good to know. One question, is it possible for the offset or size to be
> different across disks in the same array? The documentation says that the size
> is "normally" the same as component_size, implying that perhaps it could be
> different.

raid0 and linear are special.
For linear, it is obvious that differently sized devices can be
combined.
For raid0, it is not quite so obvious.  The array is divided into
region.  The first region is striped over all devices.  Then next region
is striped over all devices that are bigger than the smallest, etc.

The offset can be different on all devices.  The size can be only for
raid0 and linear.

>
>> It is bad that the documentation is incorrect and incomplete.  If you we
>> to post a patch which fixed some of it, I'm sure that would be
>> thankfully accepted.
>
> I will see what I can do.
>
>> In theory you could have a RAID1 with an odd number of sectors used in
>> each component, but I doubt that happens in practice.  So you can get
>> the component size in sectors by reading component_size and multiplying
>> by two.
>
> Is that within a realm of possibility worth mentioning in the docs?

Probably not.  I'd have to check the code to see what it actually
allows.
If the kernel allows odd-sized RAID1, we would have to leave it that
way.  But I'd support changing mdadm (if needed) to refuse to create
a RAID1 with an odd number of sectors (or even a size that is not a
multiple of 4K).

Thanks,
NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2016-12-21  0:06 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-12-17  5:14 Inconsistent use of sectors vs 1k-blocks in sysfs and other places John Brooks
2016-12-18 22:32 ` NeilBrown
2016-12-20 22:14   ` John Brooks
2016-12-21  0:06     ` NeilBrown

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).