linux-raid.vger.kernel.org archive mirror
* Unable to reduce raid size.
@ 2014-07-18  8:50 Killian De Volder
  2014-07-18  9:40 ` NeilBrown
  0 siblings, 1 reply; 5+ messages in thread
From: Killian De Volder @ 2014-07-18  8:50 UTC (permalink / raw)
  To: linux-raid

Hello,

I have a strange issue: I cannot reduce the size of a degraded RAID 5:

strace mdadm -vv --grow /dev/md125 --size=2778726400

Fails with:
Strace:
    open("/sys/block/md125/md/dev-sdb4/size", O_WRONLY) = 4
    write(4, "2778726400", 10)              = -1 EBUSY (Device or resource busy)
    close(4)
Stdout:
    component size of /dev/md125 unchanged at 2858285568K
Stderr:
    <nothing>


Any suggestions?
Note: I can work around this by moving partitions around a bit so the size reduction isn't required.
However, I suspect a bug or undocumented corner case that should be resolved.


Things I tried:
---------------
- Disabled the bcache udev rules (bcache appeared in each attempt; maybe it was triggered during the resize, but that seems not to be the case).
- Tried the same with loop files -> this works fine, even with bcache (and with the udev rules disabled).
- Removed the internal write-intent bitmap.
- Opened the device with os.open("md125", os.O_RDWR | os.O_EXCL) in Python to test whether it is in use somewhere (the open succeeded).
- Set the array-size to less than the desired new size (while still not destroying the FS on it).
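That exclusivity probe can be sketched a bit more robustly; note that os.open takes all flags OR'ed into a single argument (its third positional parameter is the file creation mode, not another flag). A hypothetical helper, taking the array's device node as the path:

```python
import errno
import os

def exclusively_openable(path):
    """Probe whether `path` can be opened without contention.

    On a block device, O_EXCL makes the open fail with EBUSY if another
    holder (a mounted filesystem, dm, md, ...) has claimed the device.
    """
    try:
        # Flags must be OR'ed into one argument; the third parameter of
        # os.open is the creation mode, not a second flags word.
        fd = os.open(path, os.O_RDWR | os.O_EXCL)
    except OSError as e:
        if e.errno == errno.EBUSY:
            return False  # some other holder has the device open
        raise
    os.close(fd)
    return True
```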

 
Information:
------------
Kernel version: 3.15.5
mdadm tools: 3.3-r2

mdadm --detail:
/dev/md125:
        Version : 1.2
  Creation Time : Wed Apr 16 20:58:09 2014
     Raid Level : raid5
     Array Size : 8283750400 (7900.00 GiB 8482.56 GB)
  Used Dev Size : 2858285568 (2725.87 GiB 2926.88 GB)
   Raid Devices : 4
  Total Devices : 3
    Persistence : Superblock is persistent

    Update Time : Fri Jul 18 09:53:08 2014
          State : clean, degraded
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 512K

           UUID : 885c588b:c3503d9d:c67b86db:2887f8f7
         Events : 6440

    Number   Major   Minor   RaidDevice State
       0       8       36        0      active sync   /dev/sdc4
       2       0        0        2      removed
       2       8       20        2      active sync   /dev/sdb4
       4       8       52        3      active sync   /dev/sdd4

-- 
Killian De Volder
Megasoft bvba
killian.de.volder@megasoft.be


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Unable to reduce raid size.
  2014-07-18  8:50 Unable to reduce raid size Killian De Volder
@ 2014-07-18  9:40 ` NeilBrown
  2014-07-18  9:58   ` Killian De Volder
  0 siblings, 1 reply; 5+ messages in thread
From: NeilBrown @ 2014-07-18  9:40 UTC (permalink / raw)
  To: Killian De Volder; +Cc: linux-raid


On Fri, 18 Jul 2014 10:50:54 +0200 Killian De Volder
<killian.de.volder@megasoft.be> wrote:

> Hello,
> 
> I have a strange issue, I cannot reduce the size of a degraded raid 5:
> 
> strace mdadm -vv --grow /dev/md125 --size=2778726400
> 
> Fails with:
> Stace:
>     open("/sys/block/md125/md/dev-sdb4/size", O_WRONLY) = 4
>     write(4, "2778726400", 10)              = -1 EBUSY (Device or resource busy)
>     close(4)

This condition isn't treated as an error by mdadm, so it isn't the cause.

Could you post the entire strace? (Really, bytes are cheap; always provide more
detail than you think is needed... though you did provide quite a bit.)

Any kernel messages (dmesg output)?

NeilBrown





* Re: Unable to reduce raid size.
  2014-07-18  9:40 ` NeilBrown
@ 2014-07-18  9:58   ` Killian De Volder
  2014-07-18 10:48     ` NeilBrown
  0 siblings, 1 reply; 5+ messages in thread
From: Killian De Volder @ 2014-07-18  9:58 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

Bytes are cheap, but screens are small (you'll have to scroll more).

"This condition isn't treated as an error by mdadm, so it isn't the cause."
This is not treated as an error, but if the size isn't changed, the end result is still
"component size of /dev/md125 unchanged at 2858285568K". (I skimmed the mdadm source code, though I might have gotten that wrong.)

Full strace below:

execve("/sbin/mdadm", ["mdadm", "-vv", "--grow", "/dev/md125", "--size=2778726400"], [/* 99 vars */]) = 0
uname({sys="Linux", node="*****", ...}) = 0
brk(0)                                  = 0xf43000
brk(0xf441c0)                           = 0xf441c0
arch_prctl(ARCH_SET_FS, 0xf43880)       = 0
brk(0xf651c0)                           = 0xf651c0
brk(0xf66000)                           = 0xf66000
time(NULL)                              = 1405669998
getpid()                                = 20073
open("/dev/md125", O_RDWR)              = 3
fstat(3, {st_mode=S_IFBLK|0660, st_rdev=makedev(9, 125), ...}) = 0
ioctl(3, RAID_VERSION, 0x7fffd69526c0)  = 0
open("/etc/mdadm.conf", O_RDONLY)       = 4
fstat(4, {st_mode=S_IFREG|0644, st_size=286, ...}) = 0
fstat(4, {st_mode=S_IFREG|0644, st_size=286, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f4711ffe000
read(4, "ARRAY /dev/md/main metadata=1.2 "..., 4096) = 286
read(4, "", 4096)                       = 0
read(4, "", 4096)                       = 0
fstat(4, {st_mode=S_IFREG|0644, st_size=286, ...}) = 0
close(4)                                = 0
munmap(0x7f4711ffe000, 4096)            = 0
open("/etc/mdadm.conf.d", O_RDONLY)     = -1 ENOENT (No such file or directory)
uname({sys="Linux", node="qantourisc", ...}) = 0
geteuid()                               = 0
ioctl(3, GET_ARRAY_INFO, 0x7fffd6952570) = 0
ioctl(3, RAID_VERSION, 0x7fffd69502f0)  = 0
fstat(3, {st_mode=S_IFBLK|0660, st_rdev=makedev(9, 125), ...}) = 0
readlink("/sys/dev/block/9:125", "../../devices/virtual/block/md12"..., 199) = 33
open("/sys/block/md125/md/metadata_version", O_RDONLY) = 4
read(4, "1.2\n", 1024)                  = 4
close(4)                                = 0
fstat(3, {st_mode=S_IFBLK|0660, st_rdev=makedev(9, 125), ...}) = 0
readlink("/sys/dev/block/9:125", "../../devices/virtual/block/md12"..., 199) = 33
ioctl(3, RAID_VERSION, 0x7fffd69503d0)  = 0
fstat(3, {st_mode=S_IFBLK|0660, st_rdev=makedev(9, 125), ...}) = 0
readlink("/sys/dev/block/9:125", "../../devices/virtual/block/md12"..., 199) = 33
open("/sys/block/md125/md/metadata_version", O_RDONLY) = 4
read(4, "1.2\n", 1024)                  = 4
close(4)                                = 0
open("/sys/block/md125/md/level", O_RDONLY) = 4
read(4, "raid5\n", 1024)                = 6
close(4)                                = 0
open("/sys/block/md125/md/raid_disks", O_RDONLY) = 4
read(4, "4\n", 1024)                    = 2
close(4)                                = 0
openat(AT_FDCWD, "/sys/block/md125/md/", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 4
fcntl(4, F_GETFD)                       = 0x1 (flags FD_CLOEXEC)
getdents(4, /* 41 entries */, 32768)    = 1408
open("/sys/block/md125/md/dev-sdb4/slot", O_RDONLY) = 5
read(5, "2\n", 1024)                    = 2
close(5)                                = 0
open("/sys/block/md125/md/dev-sdb4/block/dev", O_RDONLY) = 5
read(5, "8:20\n", 1024)                 = 5
close(5)                                = 0
open("/sys/block/md125/md/dev-sdb4/block/device/state", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/sys/block/md125/md/dev-sdb4/state", O_RDONLY) = 5
read(5, "in_sync\n", 1024)              = 8
close(5)                                = 0
open("/sys/block/md125/md/dev-sdc4/slot", O_RDONLY) = 5
read(5, "0\n", 1024)                    = 2
close(5)                                = 0
open("/sys/block/md125/md/dev-sdc4/block/dev", O_RDONLY) = 5
read(5, "8:36\n", 1024)                 = 5
close(5)                                = 0
open("/sys/block/md125/md/dev-sdc4/block/device/state", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/sys/block/md125/md/dev-sdc4/state", O_RDONLY) = 5
read(5, "in_sync\n", 1024)              = 8
close(5)                                = 0
open("/sys/block/md125/md/dev-sdd4/slot", O_RDONLY) = 5
read(5, "3\n", 1024)                    = 2
close(5)                                = 0
open("/sys/block/md125/md/dev-sdd4/block/dev", O_RDONLY) = 5
read(5, "8:52\n", 1024)                 = 5
close(5)                                = 0
open("/sys/block/md125/md/dev-sdd4/block/device/state", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/sys/block/md125/md/dev-sdd4/state", O_RDONLY) = 5
read(5, "in_sync\n", 1024)              = 8
close(5)                                = 0
getdents(4, /* 0 entries */, 32768)     = 0
close(4)                                = 0
open("/sys/block/md125/md/metadata_version", O_RDONLY) = 4
read(4, "1.2\n", 1024)                  = 4
close(4)                                = 0
open("/sys/block/md125/md//array_state", O_RDWR) = 4
lseek(4, 0, SEEK_SET)                   = 0
read(4, "clean\n", 20)                  = 6
close(4)                                = 0
stat("/sys/block/md125/md//sync_action", {st_mode=S_IFREG|0644, st_size=4096, ...}) = 0
open("/sys/block/md125/md//sync_action", O_RDWR) = 4
lseek(4, 0, SEEK_SET)                   = 0
read(4, "idle\n", 20)                   = 5
close(4)                                = 0
open("/sys/block/md125/md//sync_action", O_WRONLY) = 4
write(4, "frozen", 6)                   = 6
close(4)                                = 0
fstat(3, {st_mode=S_IFBLK|0660, st_rdev=makedev(9, 125), ...}) = 0
open("/proc/devices", O_RDONLY)         = 4
fstat(4, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f4711ffe000
read(4, "Character devices:\n  1 mem\n  2 p"..., 1024) = 740
read(4, "", 1024)                       = 0
close(4)                                = 0
munmap(0x7f4711ffe000, 4096)            = 0
open("/sys/block/md125/md/component_size", O_RDONLY) = 4
read(4, "2858285568\n", 50)             = 11
close(4)                                = 0
open("/sys/block/md125/md/dev-sdb4/size", O_WRONLY) = 4
write(4, "2778726400", 10)              = -1 EBUSY (Device or resource busy)
close(4)                                = 0
ioctl(3, SET_ARRAY_INFO, 0x7fffd6952570) = 0
ioctl(3, GET_ARRAY_INFO, 0x7fffd6952570) = 0
fstat(3, {st_mode=S_IFBLK|0660, st_rdev=makedev(9, 125), ...}) = 0
open("/sys/block/md125/md/component_size", O_RDONLY) = 4
read(4, "2858285568\n", 50)             = 11
close(4)                                = 0
write(2, "mdadm: component size of /dev/md"..., 61mdadm: component size of /dev/md125 unchanged at 2858285568K
) = 61
open("/sys/block/md125/md/metadata_version", O_RDONLY) = 4
read(4, "1.2\n", 1024)                  = 4
close(4)                                = 0
open("/sys/block/md125/md//sync_action", O_WRONLY) = 4
write(4, "idle", 4)                     = 4
close(4)                                = 0
exit_group(0)                           = ?
+++ exited with 0 +++

Killian De Volder




* Re: Unable to reduce raid size.
  2014-07-18  9:58   ` Killian De Volder
@ 2014-07-18 10:48     ` NeilBrown
  2014-07-18 11:19       ` Killian De Volder
  0 siblings, 1 reply; 5+ messages in thread
From: NeilBrown @ 2014-07-18 10:48 UTC (permalink / raw)
  To: Killian De Volder; +Cc: linux-raid


On Fri, 18 Jul 2014 11:58:25 +0200 Killian De Volder
<killian.de.volder@megasoft.be> wrote:

> Bytes are cheap, but screens are small (you'll have to scroll more).
> 
> "This condition isn't treated as an error by mdadm, so it isn't the cause."
> This is not an error, but if the size isn't changed, the end result will be
> component size of /dev/md125 unchanged at 2858285568K (skimmed the source code of mdadm, might have gotten it wrong though)
> 
> Full Strace below

Thanks. It doesn't actually contain any surprises, but having seen it I
easily found the bug... which is a bit hard to explain.

The "SET_ARRAY_INFO" ioctl can be used to set the 'size' of the array, but
only if the size fits in a signed int as a positive number.
However, mdadm tests whether it fits in an *unsigned* int.
So any size between 2^31 and 2^32 KiB cannot effectively be set by mdadm.

I think this patch to mdadm will fix it - can you test?

diff --git a/Grow.c b/Grow.c
index ea9cc60e1f18..af59347ca75e 100644
--- a/Grow.c
+++ b/Grow.c
@@ -1813,7 +1813,7 @@ int Grow_reshape(char *devname, int fd,
 		if (s->size == MAX_SIZE)
 			s->size = 0;
 		array.size = s->size;
-		if ((unsigned)array.size != s->size) {
+		if (array.size != (signed long long)s->size) {
 			/* got truncated to 32bit, write to
 			 * component_size instead
 			 */


The code that is reporting an error is setting the used size of each
individual device.
If you make the devices in an array bigger (typically if they are LVM volumes
and you resize them), then you cannot make the array bigger without first
telling md that the devices have changed size.
So mdadm first tells the kernel that the devices are big enough.  If they
were already that big, the kernel will return EBUSY, and mdadm will ignore it.
If they aren't really that big, the kernel will round down to the real size.

In your case the underlying devices hadn't changed size so mdadm was doing
something unnecessary and got an error which it ignored.
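For the curious, that nudge goes through the same sysfs file seen in the strace. An illustrative sketch (assumes the /dev/md125 array and member sdb4 from this thread; needs root; values are in KiB):

```shell
# Current usable size of one member, in KiB.
cat /sys/block/md125/md/dev-sdb4/size

# mdadm writes the requested size here before growing the array.
# If the member is already that size the kernel answers EBUSY (harmless);
# if the value exceeds the real device size, the kernel rounds it down.
echo 2778726400 > /sys/block/md125/md/dev-sdb4/size
```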

Thanks,
NeilBrown




* Re: Unable to reduce raid size.
  2014-07-18 10:48     ` NeilBrown
@ 2014-07-18 11:19       ` Killian De Volder
  0 siblings, 0 replies; 5+ messages in thread
From: Killian De Volder @ 2014-07-18 11:19 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

That seems to have worked for me. I think I understand what went wrong there.

Thank you, the smaller disk has now been added, and it's rebuilding.

Killian De Volder



end of thread, other threads:[~2014-07-18 11:19 UTC | newest]
