Linux RAID subsystem development

* Re: 95a05b3 broke mdadm --add on my superblock 1.0 array
From: Guoqing Jiang @ 2016-09-21  6:40 UTC
  To: Anthony DeRobertis, linux-raid, 837964
In-Reply-To: <20160920171223.n7t3wa673qopky4c@derobert.net>



On 09/20/2016 01:12 PM, Anthony DeRobertis wrote:
>
>> Which kernel version did you use to create the array, in case the kernel
>> was updated?
> I've had the array for a while (the superblocks with -E show a creation
> time of Wed Jun 16 14:25:08 2010). If I had to take a guess, I'd guess
> it was created with the Debian squeeze alpha1 installer... So probably
> 2.6.30 or 2.6.32.
>

Hmm, a lot has changed since 2.6.30, so it is possible that the latest
mdadm does not work well with an array that was created with such an old
kernel.

Thanks,
Guoqing


* Re: Enlarging device of linear array again (Thank you Stan!)
From: Adam Goryachev @ 2016-09-21  0:01 UTC
  To: Ramon Hofer, Wols Lists; +Cc: linux-raid
In-Reply-To: <20160921010626.31ced172@hoferr-X240.hofer.rummelring>

On 21/09/16 09:06, Ramon Hofer wrote:
> Thank you very much for your answer, Wol!
>
>
> On Tue, 20 Sep 2016 22:52:00 +0100
> Wols Lists <antlists@youngman.org.uk> wrote:
>
>> On 20/09/16 20:34, Ramon Hofer wrote:
>>> I am using 4 TB WD red to replace the 1.5 TB disks.
>>>
>>> Do you think this could work?
>>> Are there any pitfalls?
>>> Should I unmount the array to perform all these steps? It might be
>>> safer since there is no redundancy during the replacement?
>>>
>>> I just wanted to check first before making any mistakes. Instead of
>>> maybe afterwards asking for help recovering my data :-P
>> I don't quite understand what you are doing, but ...
> Sorry I forgot to mention some things.
>
> I have a linear md0 containing four raid5 (md[1234]) to which XFS is
> stripe aligned for performance reasons. [1], [2]
>
> The case for my home media server is a Norco [3] with 20 slots. Four are
> dedicated to MythTV.
> 4 x 4 are in md0.
>
> I want to replace the four 1.5 TB disks with 4 TB to have more
> space.
>
> On the four 4 TB disks I create two partitions (1.5 TB and 2.5 TB).
> Each old 1.5 TB disk gets replaced with the partition on the 4 TB
> disks. From the 2.5 TB of each of the four 4TB disks I create a new
> RAID5 (md5) and add this to the linear md0. At the end I expand the
> file system.
Assuming that you are replacing the four disks in the RAID5 array that
sits at the "end" of your linear md0, you don't actually need to create
two partitions at all. Just replace the four drives (dd from source to
destination while offline). When that is complete (if the drives are
partitioned, delete and re-create the single partition to fill the
drive), grow the RAID5 array to fill the new drives, then grow md0 to
take in the extra RAID5 space at its end, then grow your filesystem, etc.

>> The other way to do it is just shut the machine down, remove all
>> drives except one and put a new big drive in. Boot from a rescue CD,
>> partition the new drive and dd the old partitions to the new drive.
>> Again rinse and repeat for all four drives, and then put all four new
>> drives in the system and reboot. At which point you can expand all the
>> filesystems/raids to use all the space available. This *shouldn't* be
>> a risky operation. Indeed, if you're going to shut the array down,
>> this is probably the simplest and safest option.
> So you suggest that I dd the 1.5 TB disks to the 1.5 TB
> partitions on the 4 TB disks instead of just letting mdadm rebuild the
> array?
Agreed, possibly faster, definitely safer if you value your data.
> Can I just do:
>
> sudo dd if=/dev/sde of=/dev/sdf1
That depends: is the existing sde partitioned? Make extra sure you pair
up each source and destination drive correctly.
>
> Assuming sde is the old 1.5 TB disk and sdf1 is the 1.5 TB partition on
> the new 4 TB disk.
> Can I be sure that mdadm recognizes the partitions correctly as the
> replacement for the old disks?
mdadm will use the content of the drive/partition to work out what
belongs to which array, so it should be fine. However, it's a great
idea to keep the four old drives aside until you are sure everything is
working properly (another advantage of not replacing them one at a time).

Note that this advice might mess with your "performance tuning"; I don't
know enough about that side of things to comment further.

Regards,
Adam



-- 
Adam Goryachev Website Managers www.websitemanagers.com.au


* Re: Enlarging device of linear array again (Thank you Stan!)
From: Ramon Hofer @ 2016-09-20 23:06 UTC
  To: Wols Lists; +Cc: linux-raid
In-Reply-To: <57E1AF80.8000406@youngman.org.uk>

Thank you very much for your answer, Wol!

On Tue, 20 Sep 2016 22:52:00 +0100
Wols Lists <antlists@youngman.org.uk> wrote:

> On 20/09/16 20:34, Ramon Hofer wrote:
> > I am using 4 TB WD red to replace the 1.5 TB disks.
> > 
> > Do you think this could work?
> > Are there any pitfalls?
> > Should I unmount the array to perform all these steps? It might be
> > safer since there is no redundancy during the replacement?
> > 
> > I just wanted to check first before making any mistakes. Instead of
> > maybe afterwards asking for help recovering my data :-P  
> 
> I don't quite understand what you are doing, but ...

Sorry I forgot to mention some things.

I have a linear md0 containing four raid5 (md[1234]) to which XFS is
stripe aligned for performance reasons. [1], [2]

The case for my home media server is a Norco [3] with 20 slots. Four are
dedicated to MythTV.
4 x 4 are in md0.

I want to replace the four 1.5 TB disks with 4 TB to have more
space.

On the four 4 TB disks I create two partitions (1.5 TB and 2.5 TB).
Each old 1.5 TB disk gets replaced with the partition on the 4 TB
disks. From the 2.5 TB of each of the four 4TB disks I create a new
RAID5 (md5) and add this to the linear md0. At the end I expand the
file system. 

> The other way to do it is just shut the machine down, remove all
> drives except one and put a new big drive in. Boot from a rescue CD,
> partition the new drive and dd the old partitions to the new drive.
> Again rinse and repeat for all four drives, and then put all four new
> drives in the system and reboot. At which point you can expand all the
> filesystems/raids to use all the space available. This *shouldn't* be
> a risky operation. Indeed, if you're going to shut the array down,
> this is probably the simplest and safest option.

So you suggest that I dd the 1.5 TB disks to the 1.5 TB
partitions on the 4 TB disks instead of just letting mdadm rebuild the
array?
I was thinking about this as well, since the dd process might be
faster: the CPU load would be significantly lower than during a RAID5
rebuild, and only one disk is read during each copy, instead of the
other three for each of the four rebuilds.
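
A rough back-of-the-envelope sketch of the read volume behind this
argument, assuming each rebuild reads the full used size of the three
remaining members and each dd reads one old disk in full. The function
names are made up for illustration, and the sketch says nothing about
actual throughput or CPU load.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes read when every old disk is copied once with dd. */
uint64_t dd_read_bytes(unsigned disks, uint64_t disk_bytes)
{
	return (uint64_t)disks * disk_bytes;
}

/*
 * Bytes read when the disks of one RAID5 are replaced one at a time:
 * each of the `disks` rebuilds reads the other (disks - 1) members.
 */
uint64_t rebuild_read_bytes(unsigned disks, uint64_t member_bytes)
{
	assert(disks >= 2);
	return (uint64_t)disks * (disks - 1) * member_bytes;
}
```

With the sizes from the fdisk and mdadm --examine output in the first
mail of this thread (1500301910016 bytes per disk, 1500300705792 bytes
used per member), that is roughly 6.0 TB read with dd versus roughly
18.0 TB read across the four rebuilds.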

Can I just do:

sudo dd if=/dev/sde of=/dev/sdf1

Assuming sde is the old 1.5 TB disk and sdf1 is the 1.5 TB partition on
the new 4 TB disk.
Can I be sure that mdadm recognizes the partitions correctly as the
replacement for the old disks?

Best regards,
Ramon

[1] http://marc.info/?l=linux-raid&m=137077874619605&w=2
[2] http://marc.info/?l=linux-raid&m=134031891621599&w=2
[3] http://www.norcotek.com/?s=RPC-4020


* Re: Enlarging device of linear array again (Thank you Stan!)
From: Wols Lists @ 2016-09-20 21:52 UTC
  To: Ramon Hofer, linux-raid
In-Reply-To: <20160920213442.1559310f@hoferr-X240.hofer.rummelring>

On 20/09/16 20:34, Ramon Hofer wrote:
> I am using 4 TB WD red to replace the 1.5 TB disks.
> 
> Do you think this could work?
> Are there any pitfalls?
> Should I unmount the array to perform all these steps? It might be
> safer since there is no redundancy during the replacement?
> 
> I just wanted to check first before making any mistakes. Instead of
> maybe afterwards asking for help recovering my data :-P

I don't quite understand what you are doing, but ...

Do you have four existing drives (sde, sdf, sdg, sdh) and you are
planning on replacing them with four new, larger drives? (It comes
across as if you are planning to create four partitions on one drive to
replace the four drives, and that just isn't right.)

If you are planning to use four new drives, then the obvious thing is to
get either an add-in PCI card to give you more SATA connections, or a
USB adaptor (caveat, I've never done anything like this) so you can
temporarily add in an extra drive.

If you've got an add-in PCI card, stick a new 4TB drive in, use "mdadm
--replace" to move one of the old drives across, and remove the old
drive. Rinse and repeat until all four drives are done.

If you've got a USB case, put a new drive in the computer, and the old
drive in the case. Again, use "mdadm --replace" to move the old drive
across, and remove the old drive from the case. Rinse and repeat until
all four drives are done. I believe that it's a lot faster reading from
than writing to USB, hence me saying move the small drive to USB then
copy from it.

That way, at no point is your raid at risk while you're replacing a drive.

The other way to do it is just shut the machine down, remove all drives
except one and put a new big drive in. Boot from a rescue CD, partition
the new drive and dd the old partitions to the new drive. Again rinse
and repeat for all four drives, and then put all four new drives in the
system and reboot. At which point you can expand all the
filesystems/raids to use all the space available. This *shouldn't* be a
risky operation. Indeed, if you're going to shut the array down, this is
probably the simplest and safest option.

Cheers,
Wol


* Enlarging device of linear array again (Thank you Stan!)
From: Ramon Hofer @ 2016-09-20 19:34 UTC
  To: linux-raid

Hi all

In 2012 this list - especially Stan Hoeppner - helped me set up a
linear RAID containing four RAID5 arrays. Thank you so much again! It is
still working amazingly well :-D

Now I am at the point where I want to add more storage but have no more
slots. As discussed in 2013 I will replace four disks with partitions
on larger HDDs and add the remaining space to a new RAID5.

md2 should be replaced:

$ sudo mdadm -D /dev/md2
/dev/md2:
        Version : 1.2
  Creation Time : Sun Jun 17 20:08:48 2012
     Raid Level : raid5
     Array Size : 4395412224 (4191.79 GiB 4500.90 GB)
  Used Dev Size : 1465137408 (1397.26 GiB 1500.30 GB)
   Raid Devices : 4
  Total Devices : 4
    Persistence : Superblock is persistent

    Update Time : Tue Sep 20 09:12:16 2016
          State : clean
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 128K

           Name : media-server:2  (local to host media-server)
           UUID : 1c74447b:33070712:cfcfa5af:cbfea660
         Events : 2449

    Number   Major   Minor   RaidDevice State
       0       8       64        0      active sync   /dev/sde
       1       8       80        1      active sync   /dev/sdf
       2       8       96        2      active sync   /dev/sdg
       4       8      112        3      active sync   /dev/sdh


$ sudo fdisk -l /dev/sd[efgh]

Disk /dev/sde: 1.4 TiB, 1500301910016 bytes, 2930277168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x00097014


Disk /dev/sdf: 1.4 TiB, 1500301910016 bytes, 2930277168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x0007694c


Disk /dev/sdg: 1.4 TiB, 1500301910016 bytes, 2930277168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x000da169


Disk /dev/sdh: 1.4 TiB, 1500301910016 bytes, 2930277168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x000beebc


$ sudo mdadm --examine /dev/sd[efgh]
/dev/sde:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 1c74447b:33070712:cfcfa5af:cbfea660
           Name : media-server:2  (local to host media-server)
  Creation Time : Sun Jun 17 20:08:48 2012
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 2930275120 (1397.26 GiB 1500.30 GB)
     Array Size : 4395412224 (4191.79 GiB 4500.90 GB)
  Used Dev Size : 2930274816 (1397.26 GiB 1500.30 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
   Unused Space : before=1968 sectors, after=304 sectors
          State : clean
    Device UUID : 270f29c9:0a36cd7a:27324b70:7f4e929b

    Update Time : Tue Sep 20 09:12:16 2016
       Checksum : dc24915c - correct
         Events : 2449

         Layout : left-symmetric
     Chunk Size : 128K

   Device Role : Active device 0
   Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdf:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 1c74447b:33070712:cfcfa5af:cbfea660
           Name : media-server:2  (local to host media-server)
  Creation Time : Sun Jun 17 20:08:48 2012
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 2930275120 (1397.26 GiB 1500.30 GB)
     Array Size : 4395412224 (4191.79 GiB 4500.90 GB)
  Used Dev Size : 2930274816 (1397.26 GiB 1500.30 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
   Unused Space : before=1968 sectors, after=304 sectors
          State : clean
    Device UUID : fe2fffdc:6d072a9d:87757913:ae7365db

    Update Time : Tue Sep 20 09:12:16 2016
       Checksum : f558ec26 - correct
         Events : 2449

         Layout : left-symmetric
     Chunk Size : 128K

   Device Role : Active device 1
   Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdg:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 1c74447b:33070712:cfcfa5af:cbfea660
           Name : media-server:2  (local to host media-server)
  Creation Time : Sun Jun 17 20:08:48 2012
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 2930275120 (1397.26 GiB 1500.30 GB)
     Array Size : 4395412224 (4191.79 GiB 4500.90 GB)
  Used Dev Size : 2930274816 (1397.26 GiB 1500.30 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
   Unused Space : before=1968 sectors, after=304 sectors
          State : clean
    Device UUID : f27f55c2:f66f5fe9:02943932:2cf47cca

    Update Time : Tue Sep 20 09:12:16 2016
       Checksum : 34bc439e - correct
         Events : 2449

         Layout : left-symmetric
     Chunk Size : 128K

   Device Role : Active device 2
   Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdh:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 1c74447b:33070712:cfcfa5af:cbfea660
           Name : media-server:2  (local to host media-server)
  Creation Time : Sun Jun 17 20:08:48 2012
     Raid Level : raid5
   Raid Devices : 4

 Avail Dev Size : 2930275120 (1397.26 GiB 1500.30 GB)
     Array Size : 4395412224 (4191.79 GiB 4500.90 GB)
  Used Dev Size : 2930274816 (1397.26 GiB 1500.30 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
   Unused Space : before=1968 sectors, after=304 sectors
          State : clean
    Device UUID : 642511df:7f5d1022:a1b40e7b:6ebd37c6

    Update Time : Tue Sep 20 09:12:16 2016
       Checksum : ceb8c19b - correct
         Events : 2449

         Layout : left-symmetric
     Chunk Size : 128K

   Device Role : Active device 3
   Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)




I hope to simply do the following steps:

## Replace old disk with new partitions

# Step 1: Create partitions on disk
$ sudo fdisk /dev/sde
create 1465137408 (1397.26 GiB 1500.30 GB) partition and a second one
with the rest

# Step 2: Replace old disk with new partition
$ sudo mdadm --manage /dev/md4 --add /dev/sde1

# Step 3: Wait until rebuilt

# Step 4: Repeat steps 1-3 for /dev/sd[fgh]


## Add new partitions

# Step 5: Create new Raid5
sudo mdadm -C /dev/md5 -c 128 -n4 -l5 /dev/sd[efgh]2

# Step 6: Add new Raid5 to linear
sudo mdadm --grow /dev/md0 --add /dev/md5

# Step 7: Grow filesystem
sudo xfs_growfs /mnt/media-raid
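
One detail for step 1, as a sketch: mdadm -D prints Used Dev Size in KiB
(1465137408), whereas --examine prints it in 512-byte sectors
(2930274816), and a member also needs room in front of the data for the
Data Offset (2048 sectors here). The helpers below are hypothetical and
assume the existing on-disk layout is kept, as with a byte-for-byte
copy; when a device is added with --add, mdadm picks the data offset
itself, and it may differ.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_BYTES 512u

/* Convert a size in KiB (as printed by mdadm -D) to 512-byte sectors. */
uint64_t kib_to_sectors(uint64_t kib)
{
	return kib * 1024u / SECTOR_BYTES;
}

/*
 * Smallest partition, in sectors, that can hold an md member with the
 * given data offset and used device size (both in sectors).
 */
uint64_t min_member_sectors(uint64_t data_offset, uint64_t used_dev_sectors)
{
	return data_offset + used_dev_sectors;
}
```

With the values above, kib_to_sectors(1465137408) is 2930274816 and
min_member_sectors(2048, 2930274816) is 2930276864 sectors. That is 304
sectors less than the 2930277168-sector disks, in line with the
"after=304 sectors" reported by --examine.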



I am using 4 TB WD red to replace the 1.5 TB disks.

Do you think this could work?
Are there any pitfalls?
Should I unmount the array to perform all these steps? It might be
safer since there is no redundancy during the replacement?

I just wanted to check first before making any mistakes. Instead of
maybe afterwards asking for help recovering my data :-P


Best regards,
Ramon


* Re: 95a05b3 broke mdadm --add on my superblock 1.0 array
From: Anthony DeRobertis @ 2016-09-20 18:31 UTC
  To: Guoqing Jiang, linux-raid, 837964
In-Reply-To: <57E10311.7040601@suse.com>

[-- Attachment #1: Type: text/plain, Size: 1212 bytes --]

Sorry for the amount of emails I'm sending, but I noticed something 
that's probably important. I'm also appending some gdb log from tracing 
through the function (trying to answer why it's doing cluster mode stuff 
at all).

While tracing through, I noticed that *before* the write-bitmap loop,
mdadm -E considers the superblock valid. That agrees with what I saw
from strace, I suppose. At first glance, it figures out how much to
write by calling this function:

static unsigned int calc_bitmap_size(bitmap_super_t *bms, unsigned int boundary)
{
	unsigned long long bits, bytes;

	bits = __le64_to_cpu(bms->sync_size) / (__le32_to_cpu(bms->chunksize)>>9);
	bytes = (bits+7) >> 3;
	bytes += sizeof(bitmap_super_t);
	bytes = ROUND_UP(bytes, boundary);

	return bytes;
}

That code looked familiar, and I figured out where from: it's also in
95a05b37e8eb2bc0803b1a0298fce6adc60eff16, the commit that I originally
found broke it. That commit changed the boundary on the ROUND_UP line
from 512 to 4096 (and from the gdb trace, boundary==4096).

I tested changing that line to "bytes = ROUND_UP(bytes, 512);", and it
works. It adds the new disk to the array and produces no warnings or
errors.
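
For reference, the effect of the boundary can be reproduced with the
numbers from the attached gdb trace (sync_size = 3906011136, chunksize =
2097152). The following is a minimal standalone sketch, not mdadm's
code: round_up() and bitmap_bytes() are hypothetical helpers, the values
are passed in host byte order, and the 256-byte superblock size is an
assumption inferred from the trace (119458 bytes before rounding, minus
the 119202 bitmap bytes).

```c
#include <assert.h>
#include <stdint.h>

/* Assumed size of bitmap_super_t, inferred from the gdb trace. */
#define BITMAP_SUPER_BYTES 256u

/* Round val up to the next multiple of boundary (must be non-zero). */
uint64_t round_up(uint64_t val, uint64_t boundary)
{
	assert(boundary != 0);
	return (val + boundary - 1) / boundary * boundary;
}

/*
 * Same arithmetic as calc_bitmap_size() quoted above, but taking plain
 * host-order integers instead of a bitmap_super_t.
 * sync_size is in 512-byte sectors, chunksize is in bytes.
 */
uint64_t bitmap_bytes(uint64_t sync_size, uint32_t chunksize,
		      uint64_t boundary)
{
	assert(chunksize >= 512);
	uint64_t bits = sync_size / (chunksize >> 9); /* one bit per chunk */
	uint64_t bytes = (bits + 7) >> 3;             /* bits to whole bytes */

	bytes += BITMAP_SUPER_BYTES;
	return round_up(bytes, boundary);
}
```

With the traced values, bitmap_bytes(3906011136, 2097152, 4096) gives
122880, matching the trace, while a boundary of 512 gives 119808, which
is 3072 bytes (six 512-byte sectors) less.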

[-- Attachment #2: gdb.txt --]
[-- Type: text/plain, Size: 13228 bytes --]

Starting program: /var/tmp/mdadm/mdadm/mdadm -a /dev/md/pv0 /dev/sdc3

Breakpoint 1, write_bitmap1 (st=0x6b0780, fd=5, update=NodeNumUpdate) at super1.c:2351
2351		struct mdp_superblock_1 *sb = st->sb;
st = 0x6b0780
fd = 5
update = NodeNumUpdate
$1 = (struct supertype *) 0x6b0780
$2 = {ss = 0x69c060 <super1>, minor_version = 0, max_devs = 1920, container_devnm = '\000' <repeats 31 times>, sb = 0x6c7000, 
  info = 0x6c6450, other = 0x0, devsize = 0, data_offset = 0, ignore_hw_compat = 0, updates = 0x0, update_tail = 0x0, arrays = 0x0, 
  sock = 0, devnm = "md127", '\000' <repeats 26 times>, devcnt = 0, retry_soon = 0, nodes = 0, cluster_name = 0x0, devs = 0x0}
#0  write_bitmap1 (st=0x6b0780, fd=5, update=NodeNumUpdate) at super1.c:2351
        sb = 0x6c8000
        bms = 0x8e6492c800000400
        rv = 0
        buf = 0x15250a2b
        towrite = 1953005968
        n = 0
        len = 0
        afd = {fd = 243328694, blk_sz = 5}
        i = 7106560
        total_bm_space = 2199023255557
        bm_space_per_node = 7110656
#1  0x000000000044530c in write_init_super1 (st=0x6b0780) at super1.c:1851
        sb = 0x6c7000
        refst = 0x6c6490
        rv = 0
        bm_space = 264
        di = 0x6c6450
        dsize = 1953005985
        array_size = 1953005568
        sb_offset = 1953005968
        data_offset = 0
#2  0x00000000004169d0 in Manage_add (fd=3, tfd=4, dv=0x6b0040, tst=0x6b0780, array=0x7fffffffda40, force=0, verbose=0, 
    devname=0x7fffffffe4b7 "/dev/md/pv0", update=0x0, rdev=2083, array_size=1953005568, raid_slot=-1) at Manage.c:971
        dfd = 5
        ldsize = 999939064320
        dev_st = 0x6c6390
        j = 8
        disc = {number = 8, major = 8, minor = 35, raid_disk = -1, state = 0}
#3  0x00000000004183f5 in Manage_subdevs (devname=0x7fffffffe4b7 "/dev/md/pv0", fd=3, devlist=0x6b0040, verbose=0, test=0, update=0x0, 
    force=0) at Manage.c:1617
        rdev = 2083
        rv = 0
        mj = -142377600
        mn = 32767
        array = {major_version = 1, minor_version = 0, patch_version = 3, ctime = 1276712708, level = 10, size = 976502784, nr_disks = 4, 
          raid_disks = 4, md_minor = 127, not_persistent = 0, utime = 1474393877, state = 256, active_disks = 4, working_disks = 4, 
          failed_disks = 0, spare_disks = 0, layout = 513, chunk_size = 524288}
        array_size = 1953005568
        dv = 0x6b0040
        tfd = 4
        tst = 0x6b0780
        subarray = 0x0
        sysfd = -1
        count = 0
        info = {array = {major_version = -9784, minor_version = 32767, patch_version = -136434289, ctime = 32767, level = 2, size = 0, 
            nr_disks = -134254776, raid_disks = 32767, md_minor = 1, not_persistent = 0, utime = 0, state = 0, active_disks = 1, 
            working_disks = 0, failed_disks = -134225560, spare_disks = 32767, layout = -7824, chunk_size = 32767}, disk = {
            number = -10032, major = 32767, minor = -117177849, raid_disk = 0, state = 0}, events = 140737354130624, uuid = {-9968, 32767, 
            0, 1}, 
          name = "\000\331\377\377\377\177\000\000\354\222s\360\000\000\000\000\223\024@\000\000\000\000\000\377\377\377\377\000\000\000\000@", data_offset = 140737346016776, new_data_offset = 140737354099120, component_size = 140737488345760, custom_array_size = 140737351942788, 
          reshape_active = 1, reshape_progress = 140737354129344, recovery_blocked = 0, journal_device_required = 0, 
          journal_clean = -136478512, space_before = 140737351876824, space_after = 140737351876808, {resync_start = 140737349770912, 
            recovery_start = 140737349770912}, bitmap_offset = 140737488345760, safe_mode_delay = 0, new_level = 6905808, delta_disks = 0, 
          new_layout = 4206336, new_chunk = 0, errors = -7872, cache_size = 0, mismatch_cnt = 0, 
          text_version = "\000\000\000\000`\340\377\377\377\177\000\000\326w\336\367\377\177\000\000\001", '\000' <repeats 23 times>, "\b\026\204\367\377\177", container_member = -9504, container_enough = 32767, sys_name = "md127", '\000' <repeats 26 times>, 
          devs = 0xff000000000000, next = 0x0, recovery_fd = -16777216, state_fd = -65536, prev_state = 0, curr_state = 0, next_state = 0, 
          sysfs_array_state = "\000\000\377\377", '\000' <repeats 15 times>}
        devinfo = {array = {major_version = -142323768, minor_version = 32767, patch_version = 0, ctime = 0, level = -2147483646, size = 0, 
            nr_disks = 4706142, raid_disks = 0, md_minor = 4, not_persistent = 0, utime = 4272203, state = 0, active_disks = -10000, 
            working_disks = 32767, failed_disks = -136412540, spare_disks = 32767, layout = -142323768, chunk_size = 32767}, disk = {
            number = -134225984, major = 32767, minor = -9856, raid_disk = 32767, state = -10144}, events = 140737488345192, uuid = {
            -10145, 32767, -136395088, 32767}, 
          name = "p\330\377\377\377\177\000\000\310d\377\367\377\177\000\000\225W\275\367\002\000\000\000`\330\377\377\377\177\000\000t", 
          data_offset = 140737488345183, new_data_offset = 1627, component_size = 140737354099120, custom_array_size = 140737345977728, 
          reshape_active = -142323768, reshape_progress = 140737351919787, recovery_blocked = 1627, journal_device_required = 0, 
          journal_clean = -142323768, space_before = 140737354099120, space_after = 140737488345144, {resync_start = 140737488345140, 
            recovery_start = 140737488345140}, bitmap_offset = 140737351918145, safe_mode_delay = 7, new_level = 4199571, delta_disks = 0, 
          new_layout = 4196120, new_chunk = 0, errors = -10184, cache_size = 4034106092, mismatch_cnt = 63032907, 
          text_version = "\000\000\000\000,\000\000\000\000\000\000\000\020\331\377\377\377\177\000\000\310O\204\367\377\177\000\000\200}\203\367\377\177\000\000\064\330\377\377\377\177\000\000\000\331\377\377\377\177", container_member = -134254856, container_enough = 32767, 
          sys_name = "\004\000\000\000\000\000\000\000ibcm\000\000\000\000o.4\000\377\177\000\000\376\377\377\377\000\000\000", devs = 0x0, 
          next = 0x7fffffffd998, recovery_fd = -134224704, state_fd = 32767, prev_state = -9824, curr_state = 32767, 
          next_state = -134254776, sysfs_array_state = "\377\177\000\000\000\000\000\000\000\000\000\000h\341\377\367\377\177\000"}
        frozen = 1
        busy = 0
        raid_slot = -1
#4  0x0000000000406948 in main (argc=4, argv=0x7fffffffe148) at mdadm.c:1368
        mode = 4
        opt = -1
        option_index = -1
        rv = 0
        i = 0
        array_size = 0
        data_offset = 1
        ident = {devname = 0x7fffffffdff8 "\340C\204", <incomplete sequence \367>, uuid_set = 0, uuid = {32767, 2, 0, -134254776}, 
          name = "\000\177\000\000\001", '\000' <repeats 15 times>, "\001\000\000\000\000\000\000\000h\341\377\367\377", 
          super_minor = 65534, devices = 0x0, level = 65534, raid_disks = 65534, spare_disks = 0, st = 0x0, autof = 0, spare_group = 0x0, 
          bitmap_file = 0x0, bitmap_fd = -1, container = 0x0, member = 0x0, next = 0x7ffff7ffe168, {assembled = -142326816}}
        configfile = 0x0
        devmode = 97
        bitmap_fd = -1
        devlist = 0x6b0010
        devlistend = 0x6b0060
        dv = 0x6b0040
        devs_found = 2
        symlinks = 0x0
        grow_continue = 0
        c = {readonly = 0, runstop = 0, verbose = 0, brief = 0, force = 0, homehost = 0x7fffffffdcd0 "Zia", require_homehost = 1, 
          prefer = 0x0, export = 0, test = 0, subarray = 0x0, update = 0x0, scan = 0, SparcAdjust = 0, autof = 0, delay = 0, 
          freeze_reshape = 0, backup_file = 0x0, invalid_backup = 0, action = 0x0, nodes = 0, homecluster = 0x0}
        s = {raiddisks = 0, sparedisks = 0, journaldisks = 0, level = 65534, layout = 65534, layout_str = 0x0, chunk = 0, 
          bitmap_chunk = 65534, bitmap_file = 0x0, assume_clean = 0, write_behind = 0, size = 0}
        sys_hostname = "Zia\000\377\177\000\000\360\303\373\367\377\177\000\000\000\000\000\000\000\000\000\000\330\331\377\367\377\177\000\000\340\336\377\377\377\177\000\000\217-\336\367\377\177\000\000\002\000\000\000\000\000\000\000\360\303\373\367\377\177\000\000\001", '\000' <repeats 15 times>, "\001\000\000\000\000\000\000\000\330\331\377\367\377\177\000\000\000\000 \271\377\377\377\377\000\000\342\004\275\357\377\377`\\i", '\000' <repeats 13 times>, "\300\344\377\367\377\177\000\000\220\335\377\377\377\177\000\000\000\000\200\271\001\000\000\000\200\335\377\377\377\177\000\000\307\016\340=\000\000\000\000t \336\367\377\177\000\000\377\377\377\377\000\000\000\000D\b\000\000\000\000\000\000\260i\377\367\377\177\000\000"...
        mailaddr = 0x0
        program = 0x0
        increments = 20
        daemonise = 0
        pidfile = 0x0
        oneshot = 0
        spare_sharing = 1
        ss = 0x0
        writemostly = 0
        shortopt = 0x6965a0 <short_bitmap_options> "-ABCDEFGIQhVXYWZ:vqb:c:i:l:p:m:n:x:u:c:d:z:U:N:sarfRSow1tye:"
        dosyslog = 0
        rebuild_map = 0
        remove_path = 0x0
        udev_filename = 0x0
        dump_directory = 0x0
        print_help = 0
        outf = 0x0
        mdfd = 3
2352		bitmap_super_t *bms = (bitmap_super_t*)(((char*)sb)+MAX_SB_SIZE);
2353		int rv = 0;
$3 = {magic = 1836345698, version = 4, uuid = "\310@\320\336\006&׃?\033(\334\305\354d\232", events = 1124486, events_cleared = 1124486, 
  sync_size = 3906011136, state = 0, chunksize = 2097152, daemon_sleep = 5, write_behind = 0, sectors_reserved = 0, nodes = 0, 
  cluster_name = '\000' <repeats 63 times>, pad = '\000' <repeats 119 times>}
$4 = (void *) 0x6c7000
2357		unsigned int i = 0;
2360		switch (update) {
2373			if (st->minor_version != 2 && bms->version == BITMAP_MAJOR_CLUSTERED) {
2378			if (bms->version == BITMAP_MAJOR_CLUSTERED) {
2394				if (st->nodes)
No symbol "BITMAP_MAJOR_CLUSTERED" in current context.
$5 = 4
2396				break;
2419		init_afd(&afd, fd);
2421		locate_bitmap1(st, fd, 0);
$6 = {fd = 5, blk_sz = 512}
2423		if (posix_memalign(&buf, 4096, 4096))
$7 = (struct supertype *) 0x6b0780
$8 = {ss = 0x69c060 <super1>, minor_version = 0, max_devs = 1920, container_devnm = '\000' <repeats 31 times>, sb = 0x6c7000, 
  info = 0x6c6450, other = 0x0, devsize = 0, data_offset = 0, ignore_hw_compat = 0, updates = 0x0, update_tail = 0x0, arrays = 0x0, 
  sock = 0, devnm = "md127", '\000' <repeats 26 times>, devcnt = 0, retry_soon = 0, nodes = 0, cluster_name = 0x0, devs = 0x0}
2430			if (i)
2433				memset(buf, 0xff, 4096);
2434			memcpy(buf, (char *)bms, sizeof(bitmap_super_t));
2436			towrite = calc_bitmap_size(bms, 4096);
2437			while (towrite > 0) {
$9 = 122880
2438				n = towrite;
2439				if (n > 4096)
2440					n = 4096;
2441				n = awrite(&afd, buf, n);
2442				if (n > 0)
2443					towrite -= n;
2446				if (i)
2449					memset(buf, 0xff, 4096);
2437			while (towrite > 0) {
2438				n = towrite;
2439				if (n > 4096)
2440					n = 4096;
2441				n = awrite(&afd, buf, n);
2442				if (n > 0)
2443					towrite -= n;
2446				if (i)
2449					memset(buf, 0xff, 4096);
2437			while (towrite > 0) {
2438				n = towrite;
2439				if (n > 4096)
2440					n = 4096;
2441				n = awrite(&afd, buf, n);
2442				if (n > 0)
2443					towrite -= n;
2446				if (i)
2449					memset(buf, 0xff, 4096);
2437			while (towrite > 0) {
2438				n = towrite;
2439				if (n > 4096)
$10 = 110592
Continue program being debugged, after signal or breakpoint.
Usage: continue [N]
If proceeding from breakpoint, a number N may be used as an argument,
which means to set the ignore count of that breakpoint to N - 1 (so that
the breakpoint won't break until the Nth time it is reached).

If non-stop mode is enabled, continue only the current thread,
otherwise all the threads in the program are continued.  To 
continue all stopped threads in non-stop mode, use the -a option.
Specifying -a and an ignore count simultaneously is an error.
Execute until the program reaches a source line greater than the current
or a specified location (same args as break command) within the current frame.
write_bitmap1 (st=0x6b0780, fd=5, update=NodeNumUpdate) at super1.c:2451
2451			fsync(fd);
Continuing.
[Inferior 1 (process 23866) exited with code 01]
Breakpoint 2 at 0x440d25: file super1.c, line 165.
Starting program: /var/tmp/mdadm/mdadm/mdadm -a /dev/md/pv0 /dev/sdc3

Breakpoint 1, write_bitmap1 (st=0x6b0780, fd=5, update=NodeNumUpdate) at super1.c:2351
2351		struct mdp_superblock_1 *sb = st->sb;
Continuing.

Breakpoint 2, calc_bitmap_size (bms=0x6c8000, boundary=4096) at super1.c:165
165		bits = __le64_to_cpu(bms->sync_size) / (__le32_to_cpu(bms->chunksize)>>9);
bms = 0x6c8000
boundary = 4096
$11 = {magic = 1836345698, version = 4, uuid = "\310@\320\336\006&׃?\033(\334\305\354d\232", events = 1124486, events_cleared = 1124486, 
  sync_size = 3906011136, state = 0, chunksize = 2097152, daemon_sleep = 5, write_behind = 0, sectors_reserved = 0, nodes = 0, 
  cluster_name = '\000' <repeats 63 times>, pad = '\000' <repeats 119 times>}
166		bytes = (bits+7) >> 3;
167		bytes += sizeof(bitmap_super_t);
168		bytes = ROUND_UP(bytes, boundary);
$12 = 119458
170		return bytes;
$13 = 122880
Continuing.
[Inferior 1 (process 25040) exited with code 01]
quit


* Re: [PATCH] mdadm: replace hard coded string length
From: Thomas Fjellstrom @ 2016-09-20 18:03 UTC
  To: Jes Sorensen; +Cc: linux-raid
In-Reply-To: <wrfjpoo4qazf.fsf@redhat.com>

On Friday, September 16, 2016 8:34:44 AM MDT Jes Sorensen wrote:
> Thomas Fjellstrom <thomas@fjellstrom.ca> writes:
> > On Thursday, September 15, 2016 12:15:30 PM MDT Jes Sorensen wrote:
> >> Song Liu <songliubraving@fb.com> writes:
> >> > @@ -1124,7 +1124,7 @@ static int update_super1(struct supertype *st, struct mdinfo *info,
> >> >  		if (c)
> >> >  			strncpy(info->name, c+1, 31 - (c-sb->set_name));
> >> >  		else
> >> > -			strncpy(info->name, sb->set_name, 32);
> >> > +			strncpy(info->name, sb->set_name, sizeof(sb->set_name));
> >> >  		info->name[32] = 0;
> >> >  	}
> >> 
> >> I was about to apply this, but this is actually wrong. You need to use
> >> the size of the destination, not of the source as the limit.
> >> 
> >> Sorry for the hassle.
> > 
> > I'm not aware of the full details, but either they are the same size, or
> > they aren't, and you need to use the minimum size of both to avoid any
> > kind of overflow (source or dest, read and write). I presume the
> > destination is smaller?
> 
> When copying a null terminated string, you need to check against the
> size of the destination, not the source. It may happen to be they are
> the same size here, but if code is later moved around you could get into
> a situation where that is no longer the case. Checking against the size
> of the destination is the correct way.

Yes, I wasn't paying close enough attention. str*cpy already does the check on 
the source length, so yes, you want to check against just the destination 
length in this case. In essence, you're checking both and clamping to the 
minimum of either buffer.
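
To make the "clamp to the destination" rule concrete, here is a small Python model of a destination-bounded C string copy. It is only a sketch of the semantics, not mdadm code: bounded_copy is a hypothetical helper, and the 33-byte buffer is an assumption based on the `info->name[32] = 0` line in the quoted diff.

```python
def bounded_copy(dest: bytearray, src: bytes) -> None:
    """Model of a destination-bounded C string copy.

    Copies at most len(dest) - 1 bytes of the NUL-terminated string in src,
    then zero-fills the rest, so dest always ends in a terminator.
    """
    limit = len(dest) - 1              # the limit comes from the destination
    end = src.find(b"\0")              # a C string stops at the first NUL
    if end < 0:
        end = len(src)
    n = min(end, limit)                # clamp to the smaller of the two
    dest[:n] = src[:n]
    dest[n:] = bytes(len(dest) - n)    # pad the remainder with NULs


name = bytearray(33)                   # hypothetical stand-in for a char[33] field
bounded_copy(name, b"x" * 40)          # source longer than the destination
print(len(name), name[32])             # 33 0
```

Because the limit is derived from the destination alone, the copy stays in bounds even if the source buffer later grows or the code is moved around.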

> Second, when you reply to a mailing list posting, kindly refrain from
> removing the person you respond to from the CC list.

Sorry, I felt like it'd be spamming the two other people.

> Jes


-- 
Thomas Fjellstrom
thomas@fjellstrom.ca

^ permalink raw reply

* Re: 95a05b3 broke mdadm --add on my superblock 1.0 array
From: Anthony DeRobertis @ 2016-09-20 17:52 UTC (permalink / raw)
  To: Guoqing Jiang, linux-raid, 837964
In-Reply-To: <57E10311.7040601@suse.com>

BTW: the change from apparently working but spewing errors back to not 
working is:

bbc24bb35020793b9e6fa2111b15882f0dbfe36e is the first broken commit
commit bbc24bb35020793b9e6fa2111b15882f0dbfe36e
Author: Guoqing Jiang <gqjiang@suse.com>
Date:   Mon May 9 10:22:58 2016 +0800

     super1: make the check for NodeNumUpdate more accurate
     
     We missed to check the version is BITMAP_MAJOR_CLUSTERED
     or not, otherwise mdadm can't create array with other 1.x
     metadatas (1.0 and 1.1).
     
     Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
     Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>

:100644 100644 972b4700455426d47f52141416d873b6c745fa07 fa933676621f6431398192b1c0b26f3ce53deac3 M      super1.c



^ permalink raw reply

* RAID1 Questions.  Please help
From: WNSDEV @ 2016-09-20 17:40 UTC (permalink / raw)
  To: linux-raid

Hi,  I would greatly appreciate some insight here.

 
Ubuntu 16.04 LTS

$ uname -a 
Linux hostname 4.4.0-21-generic #37-Ubuntu SMP Mon Apr 18 18:33:37 UTC 2016
x86_64 x86_64 x86_64 GNU/Linux

$ mdadm -V
mdadm - v3.3 - 3rd September 2013

RAID1 configuration of 2 SSDs.

Fresh install of Ubuntu 16.04LTS using the installer to create the RAID1
with 3 partitions (/boot,swap,/)

Output of cat /proc/mdstat after installation:


Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4]
[raid10] 
md2 : active raid1 sda3[0] sdb3[1]
      634795008 blocks super 1.2 [2/2] [UU]
      bitmap: 0/5 pages [0KB], 65536KB chunk

md1 : active raid1 sda2[0] sdb2[1]
      126887936 blocks super 1.2 [2/2] [UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

md0 : active raid1 sda1[0] sdb1[1]
      19514368 blocks super 1.2 [2/2] [UU]
      
unused devices: <none>

Question 1: why is there no bitmap for md0 which is /boot?


Next: to simulate a failure, I removed sdb from the array, then physically
removed it and installed a new disk in the same physical slot.
I partitioned the new disk, synced it to the remaining disk (sda),
then installed GRUB on it.

the output of cat /proc/mdstat after is:

Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4]
[raid10] 
md2 : active raid1 sdb3[1] sda3[0]
      634795008 blocks super 1.2 [2/2] [UU]
      bitmap: 0/5 pages [0KB], 65536KB chunk

md1 : active raid1 sdb2[1] sda2[0]
      126887936 blocks super 1.2 [2/2] [UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

md0 : active raid1 sdb1[2] sda1[0]        
      19514368 blocks super 1.2 [2/2] [UU]
      
unused devices: <none>

Question 2:  Why is the number in [] for sdb1 in md0 now 2, when before it was 1?

Question 3:  Why didn't the numbers for sdb2 and sdb3 change?
Question 4:  Does it matter what the numbers are?  Is this an indication of a
problem?
Question 5:  Is this posting relevant here?
http://www.spinics.net/lists/raid/msg49766.html

Thank you for any information or direction,
Pete Sangas






^ permalink raw reply

* Re: 95a05b3 broke mdadm --add on my superblock 1.0 array
From: Anthony DeRobertis @ 2016-09-20 17:12 UTC (permalink / raw)
  To: Guoqing Jiang; +Cc: linux-raid, 837964
In-Reply-To: <57E10311.7040601@suse.com>

On Tue, Sep 20, 2016 at 05:36:17AM -0400, Guoqing Jiang wrote:

> The md-cluster code should only work with raid1 and 1.2 metadata, and your
> array (/dev/md/pv0) is raid10 with
> 1.0 metadata (if I read the bug correctly),

Correct, it's 1.0 metadata and raid10.

> I assume it only happens with the existing array and a newly created one
> doesn't have the problem, right? And I can't
> reproduce it on my side.

yeah, I made a 1.0 raid10 with 4 LVs, and couldn't trigger the bug. Of
course, that array was also much smaller.

> 
> Which kernel version did you use to create the array, in case the kernel
> was updated?

I've had the array for a while (the superblocks with -E show a creation
time of Wed Jun 16 14:25:08 2010). If I had to take a guess, I'd guess
it was created with the Debian squeeze alpha1 installer... So probably
2.6.30 or 2.6.32.

> Also please show the
> output of "mdadm -X $DISK". Your bitmap is a little weird (but I haven't
> tried raid10 before, so maybe
> it is correct).

I'd guess it was created by doing 'mdadm --grow --bitmap internal' at
some point after the array was created, but I'm not sure. Could have
been created with the array.

# ./mdadm -X /dev/sd[abde]3 
        Filename : /dev/sda3
           Magic : 6d746962
         Version : 4
            UUID : c840d0de:0626d783:3f1b28dc:c5ec649a
          Events : 1124478
  Events Cleared : 1124478
           State : OK
       Chunksize : 2 MB
          Daemon : 5s flush period
      Write Mode : Normal
       Sync Size : 1953005568 (1862.53 GiB 1999.88 GB)
          Bitmap : 953616 bits (chunks), 54 dirty (0.0%)
        Filename : /dev/sdb3
           Magic : 6d746962
         Version : 4
            UUID : c840d0de:0626d783:3f1b28dc:c5ec649a
          Events : 1124478
  Events Cleared : 1124478
           State : OK
       Chunksize : 2 MB
          Daemon : 5s flush period
      Write Mode : Normal
       Sync Size : 1953005568 (1862.53 GiB 1999.88 GB)
          Bitmap : 953616 bits (chunks), 54 dirty (0.0%)
        Filename : /dev/sdd3
           Magic : 6d746962
         Version : 4
            UUID : c840d0de:0626d783:3f1b28dc:c5ec649a
          Events : 1124478
  Events Cleared : 1124478
           State : OK
       Chunksize : 2 MB
          Daemon : 5s flush period
      Write Mode : Normal
       Sync Size : 1953005568 (1862.53 GiB 1999.88 GB)
          Bitmap : 953616 bits (chunks), 54 dirty (0.0%)
        Filename : /dev/sde3
           Magic : 6d746962
         Version : 4
            UUID : c840d0de:0626d783:3f1b28dc:c5ec649a
          Events : 1124478
  Events Cleared : 1124478
           State : OK
       Chunksize : 2 MB
          Daemon : 5s flush period
      Write Mode : Normal
       Sync Size : 1953005568 (1862.53 GiB 1999.88 GB)
          Bitmap : 953616 bits (chunks), 84 dirty (0.0%)
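
As a cross-check of the output above, the bit count and the amount of data covered by the dirty chunks can be recomputed from the printed fields. This is a hedged sketch: it assumes "Sync Size" is printed in KiB (which is what the 1862.53 GiB figure implies) and treats "2 MB" as 2048 KiB. bitmap_bits and dirty_mib are illustrative helper names, not mdadm functions.

```python
CHUNK_KIB = 2 * 1024        # "Chunksize : 2 MB", taken as 2048 KiB
SYNC_SIZE_KIB = 1953005568  # "Sync Size", assumed to be in KiB


def bitmap_bits(sync_kib: int, chunk_kib: int) -> int:
    """Number of bitmap bits: one per chunk, rounded up."""
    return -(-sync_kib // chunk_kib)


def dirty_mib(dirty_chunks: int, chunk_kib: int) -> int:
    """Upper bound on the data (in MiB) covered by the dirty chunks."""
    return dirty_chunks * chunk_kib // 1024


print(bitmap_bits(SYNC_SIZE_KIB, CHUNK_KIB))  # 953616
print(dirty_mib(54, CHUNK_KIB))               # 108 (sda3, sdb3, sdd3)
print(dirty_mib(84, CHUNK_KIB))               # 168 (sde3)
```

So the bitmap geometry itself is self-consistent across all four members; only the dirty count on sde3 differs.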


^ permalink raw reply

* Re: 95a05b3 broke mdadm --add on my superblock 1.0 array
From: Guoqing Jiang @ 2016-09-20  9:36 UTC (permalink / raw)
  To: Anthony DeRobertis, linux-raid, 837964
In-Reply-To: <63417807-ae42-ed60-8c8b-3b699994c34c@derobert.net>

On 09/20/2016 03:02 AM, Anthony DeRobertis wrote:
> On 09/20/2016 01:38 AM, Guoqing Jiang wrote:
>>
>> Thanks for the report, could you try the latest tree 
>> git://git.kernel.org/pub/scm/utils/mdadm/mdadm.git?
>> I guess 45a87c2f31335a759190dff663a881bc78ca5443 should resolve it, 
>> and I can add a spare disk
>> to native raid (internal bitmap) with different metadata versions (0.9, 1.0 
>> to 1.2).
>
> (please keep me cc'd, I'm not subscribed)
>
> $ git rev-parse --short HEAD
> 676e87a
> $ make -j4
> ...
>
> # ./mdadm -a /dev/md/pv0 /dev/sdc3
> mdadm: add new device failed for /dev/sdc3 as 8: Invalid argument
>
> [375036.613907] md: sdc3 does not have a valid v1.0 superblock, not importing!
> [375036.613926] md: md_import_device returned -22

The md-cluster code should only work with raid1 and 1.2 metadata, and your 
array (/dev/md/pv0) is raid10 with 1.0 metadata (if I read the bug 
correctly), so it is weird that your array can invoke the md-cluster code.

I assume it only happens with the existing array and a newly created one 
doesn't have the problem, right? And I can't reproduce it on my side.

Which kernel version did you use to create the array, in case the kernel 
was updated? Also please show the output of "mdadm -X $DISK". Your bitmap 
is a little weird (but I haven't tried raid10 before, so maybe it is 
correct).

Internal Bitmap : -234 sectors from superblock

Thanks,
Guoqing

^ permalink raw reply

* Re: 95a05b3 broke mdadm --add on my superblock 1.0 array
From: Anthony DeRobertis @ 2016-09-20  7:02 UTC (permalink / raw)
  To: Guoqing Jiang, linux-raid, 837964
In-Reply-To: <57E0CB6C.2040000@suse.com>

On 09/20/2016 01:38 AM, Guoqing Jiang wrote:
>
> Thanks for the report, could you try the latest tree 
> git://git.kernel.org/pub/scm/utils/mdadm/mdadm.git?
> I guess 45a87c2f31335a759190dff663a881bc78ca5443 should resolve it, 
> and I can add a spare disk
> to native raid (internal bitmap) with different metadata versions (0.9, 1.0 to 
> 1.2).

(please keep me cc'd, I'm not subscribed)

$ git rev-parse --short HEAD
676e87a
$ make -j4
...

# ./mdadm -a /dev/md/pv0 /dev/sdc3
mdadm: add new device failed for /dev/sdc3 as 8: Invalid argument

[375036.613907] md: sdc3 does not have a valid v1.0 superblock, not importing!
[375036.613926] md: md_import_device returned -22

So current master seems to be back to broken completely. It's 3AM here, 
will check more (and do another bisect, to find when it went from 
weird-errors-but-works to broken) tomorrow.

^ permalink raw reply

* Re: 95a05b3 broke mdadm --add on my superblock 1.0 array
From: Guoqing Jiang @ 2016-09-20  5:38 UTC (permalink / raw)
  To: Anthony DeRobertis, linux-raid, 837964
In-Reply-To: <20160919163229.uccdr6bxiwetqvwo@derobert.net>



On 09/19/2016 12:32 PM, Anthony DeRobertis wrote:
> (please cc me, I'm not subscribed.)
>
> mdadm 3.4 can not manage to add a spare to my array, it fails like:
>
>     # mdadm -a /dev/md/pv0 /dev/sdc3
>     mdadm: add new device failed for /dev/sdc3 as 8: Invalid argument
>
> and the kernel logs:
>
>     md: sdc3 does not have a valid v1.0 superblock, not importing!
>     md: md_import_device returned -22
>
> This worked in 3.3.4. I performed two git bisects and found that:
>
> a) it was broken by 95a05b37e8eb2bc0803b1a0298fce6adc60eff16
> b) it is sort-of fixed by 81306e021ebdcc4baef866da82d25c3f0a415d2d
>     (which AFAIK isn't yet released)
>
> I say sort of fixed in that it adds it to the array, but spits out some
> worrying errors (and I have no idea if it'd actually work, e.g., if it'd
> assemble again):
>
>     # ./mdadm -a /dev/md/pv0 /dev/sdc3
>     mdadm: Warning: cluster md only works with superblock 1.2
>     mdadm: Failed to write metadata to /dev/sdc3

Thanks for the report, could you try the latest tree 
git://git.kernel.org/pub/scm/utils/mdadm/mdadm.git?
I guess 45a87c2f31335a759190dff663a881bc78ca5443 should resolve it, and 
I can add a spare disk
to native raid (internal bitmap) with different metadata versions (0.9, 1.0 to 1.2).

Please let me know the result, I will look into it if the issue still exists.

Thanks,
Guoqing

^ permalink raw reply

* [PATCH] raid5: fix to detect failure of register_shrinker
From: Chao Yu @ 2016-09-20  2:33 UTC (permalink / raw)
  To: shli; +Cc: linux-raid, linux-kernel, chao, Chao Yu

register_shrinker() can fail since commit 1d3d4437eae1 ("vmscan: per-node
deferred work"), so we should detect that failure; otherwise the shrinker
may fail to register even though the raid5 configuration was set up successfully.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
---
 drivers/md/raid5.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 766c3b7..b819a9a 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -6632,7 +6632,12 @@ static struct r5conf *setup_conf(struct mddev *mddev)
 	conf->shrinker.count_objects = raid5_cache_count;
 	conf->shrinker.batch = 128;
 	conf->shrinker.flags = 0;
-	register_shrinker(&conf->shrinker);
+	if (register_shrinker(&conf->shrinker)) {
+		printk(KERN_ERR
+		       "md/raid:%s: couldn't register shrinker.\n",
+		       mdname(mddev));
+		goto abort;
+	}
 
 	sprintf(pers_name, "raid%d", mddev->new_level);
 	conf->thread = md_register_thread(raid5d, mddev, pers_name);
-- 
2.7.2

^ permalink raw reply related

* Bug#837964: 95a05b3 broke mdadm --add on my superblock 1.0 array
From: Anthony DeRobertis @ 2016-09-19 16:32 UTC (permalink / raw)
  To: linux-raid; +Cc: 837964

(please cc me, I'm not subscribed.)

mdadm 3.4 can not manage to add a spare to my array, it fails like:

   # mdadm -a /dev/md/pv0 /dev/sdc3
   mdadm: add new device failed for /dev/sdc3 as 8: Invalid argument

and the kernel logs:

   md: sdc3 does not have a valid v1.0 superblock, not importing!
   md: md_import_device returned -22

This worked in 3.3.4. I performed two git bisects and found that:

a) it was broken by 95a05b37e8eb2bc0803b1a0298fce6adc60eff16
b) it is sort-of fixed by 81306e021ebdcc4baef866da82d25c3f0a415d2d
   (which AFAIK isn't yet released)

I say sort of fixed in that it adds it to the array, but spits out some
worrying errors (and I have no idea if it'd actually work, e.g., if it'd
assemble again):

   # ./mdadm -a /dev/md/pv0 /dev/sdc3 
   mdadm: Warning: cluster md only works with superblock 1.2
   mdadm: Failed to write metadata to /dev/sdc3

I'm not using (or at least not on purpose!) cluster md, as these are
internal SATA drives accessible by only one machine.

I wasn't able to reproduce it on a new (small) test array, so it might
be something specific to this array.

I've reported this bug to Debian, and that bug report contains a lot of
system information that I won't repeat here (because it's quite long):

	https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=837964

that contains mdadm -E output, the mdadm.conf, etc.

^ permalink raw reply

* Re: Best tool to partition Drives with new sector geometry - (WAS: Need Help with crashed RAID5 (that was rebuilding and then had SATA error on another drive))
From: Benjammin2068 @ 2016-09-19 16:17 UTC (permalink / raw)
  To: Wols Lists; +Cc: Linux-RAID
In-Reply-To: <57DF84BC.2040703@youngman.org.uk>

On 09/19/2016 01:25 AM, Wols Lists wrote:
>
> Yeah. I've done that a couple of times. Create the new partition larger
> than the old one. dd the old partition across. Use whatever
> filesystem-specific tool there was to grow the file system into all
> available space on the partition.
>
> Oh yes - and be damn careful with FAT :-) I can't remember the details,
> but when there was a problem it used to prefer a faulty filesystem size
> to the partition size, and would gaily sail off the end of the
> partition, trashing the next partition. My "record to USB" TV seems
> rather prone to this :-(
>

These drives are wholly allocated to nothing but the RAID array... so I only have to make 1 partition and it's more or less the whole disk. :)

I've got the new WDs online and am growing that RAID5 to a RAID6 as we speak.

(two thumbs up)

I have (2) HD103SJ drives left in the array... one was installed when the array was built and has about 44500 hours on it... while the other only has about 38400 hours on it.

smartctl is keeping an eye on them for me. ;)

The rest of the drives are relatively new (especially after the episode of drive failures a couple weeks ago).

Thanks again for the help everyone!

 -Ben

^ permalink raw reply

* Re: Best tool to partition Drives with new sector geometry - (WAS: Need Help with crashed RAID5 (that was rebuilding and then had SATA error on another drive))
From: Wols Lists @ 2016-09-19  6:25 UTC (permalink / raw)
  To: Benjammin2068; +Cc: Linux-RAID
In-Reply-To: <03f24dac-c34b-31d0-a514-0d9c99e07fd2@gmail.com>

On 18/09/16 22:29, Benjammin2068 wrote:
> Sounds good. I was more worried about the specifics of the partition and how mdadm sees a larger sized partition -- NOT just a larger sized drive. (on which a same size partition could be built)

Yeah. I've done that a couple of times. Create the new partition larger
than the old one. dd the old partition across. Use whatever
filesystem-specific tool there was to grow the file system into all
available space on the partition.

Oh yes - and be damn careful with FAT :-) I can't remember the details,
but when there was a problem it used to prefer a faulty filesystem size
to the partition size, and would gaily sail off the end of the
partition, trashing the next partition. My "record to USB" TV seems
rather prone to this :-(

Cheers,
Wol

^ permalink raw reply

* Re: Best tool to partition Drives with new sector geometry - (WAS: Need Help with crashed RAID5 (that was rebuilding and then had SATA error on another drive))
From: Benjammin2068 @ 2016-09-18 21:29 UTC (permalink / raw)
  To: Wols Lists; +Cc: Linux-RAID
In-Reply-To: <57DF055E.9060102@youngman.org.uk>

On 09/18/2016 04:21 PM, Wols Lists wrote:
> On 18/09/16 20:58, Benjammin2068 wrote:
>> Aha! That's what I needed to know.
>>
>> I was wondering if I can make a partition (I think) that's 3/4 of a block larger (3072bytes) than the original /dev/sdX1's on the old HD103SJs drives.
> Good. It's a bit like string logic - if the buffer is bigger than the
> string everything's fine, but if the string is bigger than the buffer,
> well, ooopppssssss.
>
> Basically, I think the root cause of all this mess is that drive
> sectors/blocks/whatever used to be 512 bytes. So, obviously, it made
> sense to have sector 0 be the boot sector, and your first partition
> started in sector 1. If your drives are small, you don't want to waste
> space.
>
> Then the new drives came along with 4K sectors. Aarghh. Put an old-style
> partition scheme on a new-style drive, and every OS 4K block would start
> in the 2nd 512-byte block of a 4K drive sector. So every disk write from
> the OS would force the drive to read two sectors from disk, overlay the
> OS block over them, and write them both back. Not nice. And the latest
> drives refuse to do that!

hah.. yea.. I remember when it happened (and why). (I still have a Seagate ST-251 40MB MFM HD sitting in a box with my Atari software on it. Right now, it's Schrodinger's drive. It's still working as long as I don't pull it out and test it. LoL....)

Drive companies claimed (and maybe rightfully so) that the 512B sector with all the seeks required to read data was wasteful. (considering the armature movement needed for scattered files and people who didn't defrag their drives.)

Also, the number of sectors that could be numbered on a drive was an issue with the sizes of drives coming out.

2^32 sectors @ 512 bytes = 2,199,023,255,552 bytes <-- doesn't that number ring a bell? ;)

So they moved to bigger sector sizes.
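
The addressing limit behind that figure is easy to recompute. A minimal sketch, assuming 32-bit sector numbers as in the figure above; max_capacity_bytes is just an illustrative helper name:

```python
def max_capacity_bytes(sector_size: int, address_bits: int = 32) -> int:
    """Largest capacity reachable with address_bits-wide sector numbers."""
    return (2 ** address_bits) * sector_size


print(max_capacity_bytes(512))   # 2199023255552  (2 TiB)
print(max_capacity_bytes(4096))  # 17592186044416 (16 TiB)
```

With the same 32-bit sector number, going from 512-byte to 4096-byte sectors raises the ceiling by a factor of eight.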

> Which is one of the reasons why modern partitioning programs start the
> first partition - iirc - at the start of the 3rd megabyte of the disk.
> Leaving plenty of space for the boot/startup code.

Yup. Now with all the bootloaders...

>
> So it's not worth replicating your old partitions directly on the new
> drives. Just make sure the new drives are the same size (or a bit
> larger) than the old ones, and move the data across. Bit like copying a
> string :-)

Sounds good. I was more worried about the specifics of the partition and how mdadm sees a larger sized partition -- NOT just a larger sized drive. (on which a same size partition could be built)

Thanks again,

 -Ben

^ permalink raw reply

* Re: Best tool to partition Drives with new sector geometry - (WAS: Need Help with crashed RAID5 (that was rebuilding and then had SATA error on another drive))
From: Wols Lists @ 2016-09-18 21:21 UTC (permalink / raw)
  To: Benjammin2068, Chris Murphy; +Cc: Linux-RAID
In-Reply-To: <4e08d03d-6759-84fd-6467-3bb2d1d9d320@gmail.com>

On 18/09/16 20:58, Benjammin2068 wrote:
> Aha! That's what I needed to know.
> 
> I was wondering if I can make a partition (I think) that's 3/4 of a block larger (3072bytes) than the original /dev/sdX1's on the old HD103SJs drives.

Good. It's a bit like string logic - if the buffer is bigger than the
string everything's fine, but if the string is bigger than the buffer,
well, ooopppssssss.

Basically, I think the root cause of all this mess is that drive
sectors/blocks/whatever used to be 512 bytes. So, obviously, it made
sense to have sector 0 be the boot sector, and your first partition
started in sector 1. If your drives are small, you don't want to waste
space.

Then the new drives came along with 4K sectors. Aarghh. Put an old-style
partition scheme on a new-style drive, and every OS 4K block would start
in the 2nd 512-byte block of a 4K drive sector. So every disk write from
the OS would force the drive to read two sectors from disk, overlay the
OS block over them, and write them both back. Not nice. And the latest
drives refuse to do that!

Which is one of the reasons why modern partitioning programs start the
first partition - iirc - at the start of the 3rd megabyte of the disk.
Leaving plenty of space for the boot/startup code.
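
The read-modify-write problem described above can be checked with a few lines of arithmetic. This is a sketch under stated assumptions: 512-byte logical sectors on a drive with 4096-byte physical sectors, and example start sectors (1, 63 and 2048) that are illustrative rather than taken from anyone's actual partition table. Sector 2048, i.e. 1 MiB, is a start that current partitioning tools commonly default to.

```python
LOGICAL_SECTOR = 512     # bytes per logical sector (what the OS addresses)
PHYSICAL_SECTOR = 4096   # bytes per physical sector on a 4K drive


def is_aligned(start_lba: int) -> bool:
    """True if a partition starting at start_lba begins on a 4K boundary."""
    return (start_lba * LOGICAL_SECTOR) % PHYSICAL_SECTOR == 0


def physical_sectors_touched(start_lba: int, length_bytes: int = 4096) -> int:
    """How many physical sectors one write at the partition start spans."""
    first = (start_lba * LOGICAL_SECTOR) // PHYSICAL_SECTOR
    last = (start_lba * LOGICAL_SECTOR + length_bytes - 1) // PHYSICAL_SECTOR
    return last - first + 1


for lba in (1, 63, 2048):
    print(lba, is_aligned(lba), physical_sectors_touched(lba))
# 1 False 2
# 63 False 2
# 2048 True 1
```

A misaligned start makes every 4K write from the OS straddle two physical sectors, which is exactly the extra read and rewrite the drive has to do.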

So it's not worth replicating your old partitions directly on the new
drives. Just make sure the new drives are the same size (or a bit
larger) than the old ones, and move the data across. Bit like copying a
string :-)

Cheers,
Wol

^ permalink raw reply

* Re: Best tool to partition Drives with new sector geometry - (WAS: Need Help with crashed RAID5 (that was rebuilding and then had SATA error on another drive))
From: Benjammin2068 @ 2016-09-18 19:58 UTC (permalink / raw)
  To: Wols Lists, Chris Murphy; +Cc: Linux-RAID
In-Reply-To: <57DEE853.4060001@youngman.org.uk>



On 09/18/2016 02:17 PM, Wols Lists wrote:
>
> I'm sure you know this, but getting the physical/logical block size
> out-of-sync hurts disk performance. And copying a smaller partition into
> a larger allocated space is perfectly harmless. So...
>
> I'd simply use a modern partition manager (such as gdisk) to partition
> your new drives such that the new partitions are larger than the
> existing ones, and are properly aligned relative to the drive geometry.
>
> Then copy the old partitions across however you were planning - whether
> it's "mdadm --replace" or stopping the array and "dd old-device
> new-device" or whatever.
>
> If you've got a bit of wasted space, or whatever, who cares.
> You can resize your file-systems to use all available space, if you wish
> (can't remember how, whenever I've done that sort of stuff it hasn't
> been hard).
>
> But I'd certainly try and avoid those offset warnings - it smacks to me
> of a mismatch between 512-byte blocks and 4K disk sectors, and I
> wouldn't want the drive firmware messing about correcting mismatches
> between OS 4K blocks and drive 4K blocks. I don't fully understand it
> but I know there was a lot of grief with exactly this sort of thing in
> the transition from 512-byte to 4K.
>


Aha! That's what I needed to know.

I was wondering if I can make a partition (I think) that's 3/4 of a block larger (3072bytes) than the original /dev/sdX1's on the old HD103SJs drives.

You've answered my question perfectly.

I can use sfdisk or parted to get that done...

Thanks a bunch!

 -Ben


^ permalink raw reply

* Re: Best tool to partition Drives with new sector geometry - (WAS: Need Help with crashed RAID5 (that was rebuilding and then had SATA error on another drive))
From: Wols Lists @ 2016-09-18 19:17 UTC (permalink / raw)
  To: Benjammin2068, Chris Murphy; +Cc: Linux-RAID
In-Reply-To: <484e25ca-0a8e-f666-b1c8-ebc92a49f999@gmail.com>

On 18/09/16 19:41, Benjammin2068 wrote:
> I'll check - this is CentOS... but I've (as shown in followup email) played with fdisk (which doesn't bother me) and some of the others...
> 
> now I just have to sort out this offset issue which I think I'm stuck with due to different partition sizes.

Don't quite understand what you're trying to do, but ...

I'm sure you know this, but getting the physical/logical block size
out-of-sync hurts disk performance. And copying a smaller partition into
a larger allocated space is perfectly harmless. So...

I'd simply use a modern partition manager (such as gdisk) to partition
your new drives such that the new partitions are larger than the
existing ones, and are properly aligned relative to the drive geometry.

Then copy the old partitions across however you were planning - whether
it's "mdadm --replace" or stopping the array and "dd old-device
new-device" or whatever.

If you've got a bit of wasted space, or whatever, who cares.
You can resize your file-systems to use all available space, if you wish
(can't remember how, whenever I've done that sort of stuff it hasn't
been hard).

But I'd certainly try and avoid those offset warnings - it smacks to me
of a mismatch between 512-byte blocks and 4K disk sectors, and I
wouldn't want the drive firmware messing about correcting mismatches
between OS 4K blocks and drive 4K blocks. I don't fully understand it
but I know there was a lot of grief with exactly this sort of thing in
the transition from 512-byte to 4K.

Cheers,
Wol

^ permalink raw reply

* Re: Best tool to partition Drives with new sector geometry - (WAS: Need Help with crashed RAID5 (that was rebuilding and then had SATA error on another drive))
From: Benjammin2068 @ 2016-09-18 18:41 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Linux-RAID
In-Reply-To: <CAJCQCtQAazjCmXfSgbcJNDxtM2pHRcqbmEo1j=Og-dQRPEAzaQ@mail.gmail.com>

On 09/18/2016 12:50 PM, Chris Murphy wrote:
>
> This is one of the dumbest things, haha. I do not for the life of me
> understand what distribution won't backport this, if they're unwilling
> to put modern tools for modern hardware in their distributions. It's
> one of the simplest, safest backports they could do and yet they
> don't. Incredible to me.

Yeaaaa.... and considering how often I have to do these kinds of installs or admin... it's... well.. yea.


> Any version of gdisk will do this correctly out of the box, so you can
> just install that from your existing old distro presumably. And if you
> can't, then get a recent live CD from pretty much anybody: Fedora 23
> or Fedora 24 has gdisk already on the media, and its version of parted
> and fdisk, also included, all do alignment to 4KiB sectors correctly.
>
> Actually, on either Fedora live media version you can do
>
> dnf install https://kojipkgs.fedoraproject.org//packages/blivet-gui/2.0.1/1.fc25/noarch/blivet-gui-2.0.1-1.fc25.noarch.rpm
>
> Which is the current version, and it will work on F24 for sure and
> maybe/probably F23 also. And dnf will sort out any additional
> dependencies needed. It has a similar gparted style UI, but it will do
> all kinds of wild things: mdadm raid, LVM raid, Btrfs. It'll create
> the partitions, RAID, LV's, file systems, and it will discover things
> already on the drive and properly wipe their signatures with a proper
> tear down before creating the new things. So you don't end up with
> crusty old stuff coming back to haunt you some other day.
>

I'll check - this is CentOS... but I've (as shown in followup email) played with fdisk (which doesn't bother me) and some of the others...

now I just have to sort out this offset issue which I think I'm stuck with due to different partition sizes.

 -Ben


^ permalink raw reply

* Re: Best tool to partition Drives with new sector geometry - (WAS: Need Help with crashed RAID5 (that was rebuilding and then had SATA error on another drive))
From: Benjammin2068 @ 2016-09-18 18:08 UTC (permalink / raw)
  To: linux-raid
In-Reply-To: <e6180f36-f246-b4ad-ef53-8012e583b09e@gmail.com>

As an update to this, here's some data:


the older Samsung HD103SJ drives (3 of the 4 drive RAID5 are still alive and well in this stack) have partition#1 (/dev/sdX1) which lists out at:

> [root@quantum myth]# sfdisk -l -uM /dev/sdc        <-- this is the output from one of the 3 HD103SJ drives. The partition was originally created by palimpsest.
>
> Disk /dev/sdc: 121601 cylinders, 255 heads, 63 sectors/track
> Units = mebibytes of 1048576 bytes, blocks of 1024 bytes, counting from 0
>
>    Device Boot Start   End    MiB    #blocks   Id  System
> /dev/sdc1         0+ 953867- 953868- 976760001   fd  Linux raid autodetect
> /dev/sdc2         0      -      0          0    0  Empty
> /dev/sdc3         0      -      0          0    0  Empty
> /dev/sdc4         0      -      0          0    0  Empty

When I do the math:

976,760,001 * 1024  = 1,000,202,241,024 bytes --- ok, so that's /dev/sdX1

Now we take 1,000,202,241,024 / 4096 (block size of new drives) = 244190000.25 -- so I have a 1024-byte (two 512-byte sectors) difference between the 2 models when trying to switch over.
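
The same arithmetic as a short script, in case anyone wants to rerun it for another partition. It only uses the block count from the sfdisk output above (1024-byte blocks); the variable names are illustrative.

```python
BLOCK_BYTES = 1024          # sfdisk reports "blocks of 1024 bytes"
PHYSICAL_SECTOR = 4096      # sector size of the new drives

blocks = 976_760_001        # "#blocks" for /dev/sdc1
partition_bytes = blocks * BLOCK_BYTES

# Bytes left over past the last whole 4096-byte sector.
remainder = partition_bytes % PHYSICAL_SECTOR
# Bytes to add so the size becomes a whole number of 4096-byte sectors.
pad = (PHYSICAL_SECTOR - remainder) % PHYSICAL_SECTOR

print(partition_bytes)  # 1000202241024
print(remainder)        # 1024
print(pad)              # 3072
```

So the old partition size overshoots a 4096-byte multiple by 1024 bytes, and growing it by 3072 bytes (or anything larger that lands on a 4K multiple) makes the size divide evenly.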

Is there a best practice for how to contend with this? (Resize the partition somehow on the RAID and then alter the partition sizes by -2 sectors to make them divide by 8 nicely? I know. Sounds insane. I have backups. I'd do it. :P )

Should I just eat the performance hit for now?

Thanks,

 -Ben



^ permalink raw reply

* Re: Best tool to partition Drives with new sector geometry - (WAS: Need Help with crashed RAID5 (that was rebuilding and then had SATA error on another drive))
From: Chris Murphy @ 2016-09-18 17:50 UTC (permalink / raw)
  To: Benjammin2068; +Cc: Linux-RAID
In-Reply-To: <e6180f36-f246-b4ad-ef53-8012e583b09e@gmail.com>

On Sun, Sep 18, 2016 at 11:13 AM, Benjammin2068 <benjammin2068@gmail.com> wrote:
> In a followup question to my arrays, I have a question about the new WDs that have the larger sector size geometry but support 512B sectors.
>
> I bought some WD Reds (WD10EFRX) drives.
>
> When I let the linux "Disk Utility" (palimpsest <- who the heck named that anyway?) do the RAID management with a new drive, it partitions on cyls and not sectors.
>
> So it makes a partition and then complains to me it's off by 512bytes which could affect performance.

This is one of the dumbest things, haha. I do not for the life of me
understand what distribution won't backport this, if they're unwilling
to put modern tools for modern hardware in their distributions. It's
one of the simplest, safest backports they could do and yet they
don't. Incredible to me.

Anyway, yeah partition with something not from the Pleistocene.
Seriously, it's that old, it's that much of a solved problem, for
probably 5 years, maybe even longer.

Any version of gdisk will do this correctly out of the box, so you can
presumably just install it from your existing old distro. And if you
can't, then get a recent live CD from pretty much anybody: Fedora 23
and Fedora 24 have gdisk already on the media, and the versions of
parted and fdisk they include also align to 4KiB sectors correctly.
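
As a rough illustration of what "aligning correctly" means, here is a minimal Python sketch. It assumes a 1 MiB (2048-sector) alignment, which is a common default in current partitioning tools; the function names are hypothetical and this is not how any particular tool is implemented.

```python
LOGICAL_SECTOR = 512                                    # bytes per logical sector
ALIGNMENT_BYTES = 1024 * 1024                           # assumed 1 MiB alignment
ALIGNMENT_SECTORS = ALIGNMENT_BYTES // LOGICAL_SECTOR   # 2048 sectors

def align_up(sector, alignment=ALIGNMENT_SECTORS):
    """Round a start sector up to the next multiple of `alignment`."""
    return -(-sector // alignment) * alignment

def is_4k_aligned(sector):
    """True if the byte offset of `sector` is a multiple of 4096."""
    return (sector * LOGICAL_SECTOR) % 4096 == 0

print(align_up(1))                   # 2048
print(align_up(2048))                # 2048
print(align_up(2049))                # 4096
print(is_4k_aligned(align_up(1)))    # True
```

Any start sector that is a multiple of 2048 is automatically a multiple of 8, so a 1 MiB boundary always satisfies the 4KiB requirement as well.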

Actually, on either Fedora live media version you can do

dnf install https://kojipkgs.fedoraproject.org//packages/blivet-gui/2.0.1/1.fc25/noarch/blivet-gui-2.0.1-1.fc25.noarch.rpm

That is the current version, and it will work on F24 for sure and
maybe/probably F23 also. And dnf will sort out any additional
dependencies needed. It has a gparted-style UI, but it will do
all kinds of wild things: mdadm raid, LVM raid, Btrfs. It'll create
the partitions, RAID, LVs, file systems, and it will discover things
already on the drive and properly wipe their signatures with a proper
tear down before creating the new things. So you don't end up with
crusty old stuff coming back to haunt you some other day.

-- 
Chris Murphy

^ permalink raw reply

* Best tool to partition Drives with new sector geometry - (WAS: Need Help with crashed RAID5 (that was rebuilding and then had SATA error on another drive))
From: Benjammin2068 @ 2016-09-18 17:13 UTC (permalink / raw)
  To: linux-raid
In-Reply-To: <57C41A47.5050506@youngman.org.uk>

In a followup question to my arrays, I have a question about the new WDs that have the larger sector size geometry but still support 512B sectors.

I bought some WD Reds (WD10EFRX) drives.

When I let the linux "Disk Utility" (palimpsest <- who the heck named that anyway?) do the RAID management with a new drive, it partitions on cylinders and not sectors.

So it makes a partition and then complains to me it's off by 512 bytes which could affect performance.

Gee. Thanks.
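
One plausible reading of that 512-byte complaint, sketched in Python: cylinder-based tools traditionally start the first partition at sector 63, which is one 512-byte sector short of a 4096-byte boundary. The start sector used here is an assumption for illustration, not something read from the drive, and the function name is hypothetical.

```python
LOGICAL_SECTOR = 512                       # bytes per logical sector
SECTORS_PER_4K = 4096 // LOGICAL_SECTOR    # 8 logical sectors per 4 KiB

def bytes_to_next_4k_boundary(start_sector):
    """Bytes between the start sector and the next 4096-byte boundary."""
    misaligned = start_sector % SECTORS_PER_4K
    if misaligned == 0:
        return 0
    return (SECTORS_PER_4K - misaligned) * LOGICAL_SECTOR

# Sector 63 is the assumed legacy cylinder-aligned start; 2048 is a 1 MiB start.
print(bytes_to_next_4k_boundary(63))     # 512
print(bytes_to_next_4k_boundary(2048))   # 0
```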

So I can use g/parted -- or fdisk....

But I thought I'd ask for suggestions on the preferred tool and any pitfalls to watch out for.

Thanks,

 -Ben

^ permalink raw reply
