* Bug#837964: 95a05b3 broke mdadm --add on my superblock 1.0 array
From: Anthony DeRobertis @ 2016-09-19 16:32 UTC (permalink / raw)
To: linux-raid; +Cc: 837964

(please cc me, I'm not subscribed.)

mdadm 3.4 cannot manage to add a spare to my array; it fails like this:

# mdadm -a /dev/md/pv0 /dev/sdc3
mdadm: add new device failed for /dev/sdc3 as 8: Invalid argument

and the kernel logs:

md: sdc3 does not have a valid v1.0 superblock, not importing!
md: md_import_device returned -22

This worked in 3.3.4. I performed two git bisects and found that:

a) it was broken by 95a05b37e8eb2bc0803b1a0298fce6adc60eff16
b) it is sort-of fixed by 81306e021ebdcc4baef866da82d25c3f0a415d2d
   (which AFAIK isn't yet released)

I say sort-of fixed in that it adds the disk to the array, but spits out
some worrying errors (and I have no idea if it'd actually work, e.g.,
whether it'd assemble again):

# ./mdadm -a /dev/md/pv0 /dev/sdc3
mdadm: Warning: cluster md only works with superblock 1.2
mdadm: Failed to write metadata to /dev/sdc3

I'm not using cluster md (or at least not on purpose!), as these are
internal SATA drives accessible by only one machine.

I wasn't able to reproduce this on a new (small) test array, so it might
be something specific to this array.

I've reported this bug to Debian; that bug report contains a lot of
system information that I won't repeat here (because it's quite long):

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=837964

It includes mdadm -E output, the mdadm.conf, etc.

^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: 95a05b3 broke mdadm --add on my superblock 1.0 array
From: Guoqing Jiang @ 2016-09-20 5:38 UTC (permalink / raw)
To: Anthony DeRobertis, linux-raid, 837964

On 09/19/2016 12:32 PM, Anthony DeRobertis wrote:
> (please cc me, I'm not subscribed.)
>
> mdadm 3.4 can not manage to add a spare to my array, it fails like:
>
> # mdadm -a /dev/md/pv0 /dev/sdc3
> mdadm: add new device failed for /dev/sdc3 as 8: Invalid argument
>
> and the kernel logs:
>
> md: sdc3 does not have a valid v1.0 superblock, not importing!
> md: md_import_device returned -22
>
> This worked in 3.3.4. I performed two git bisects and found that:
>
> a) it was broken by 95a05b37e8eb2bc0803b1a0298fce6adc60eff16
> b) it is sort-of fixed by 81306e021ebdcc4baef866da82d25c3f0a415d2d
>    (which AFAIK isn't yet released)
>
> I say sort of fixed in that it adds it to the array, but spits out some
> worrying errors (and I have no idea if it'd actually work, e.g., if it'd
> assemble again):
>
> # ./mdadm -a /dev/md/pv0 /dev/sdc3
> mdadm: Warning: cluster md only works with superblock 1.2
> mdadm: Failed to write metadata to /dev/sdc3

Thanks for the report. Could you try the latest tree,
git://git.kernel.org/pub/scm/utils/mdadm/mdadm.git?
I guess 45a87c2f31335a759190dff663a881bc78ca5443 should resolve it, and
I can add a spare disk to a native raid (internal bitmap) with different
metadata versions (0.9, 1.0 and 1.2).

Please let me know the result; I will look into it if the issue still
exists.

Thanks,
Guoqing
* Re: 95a05b3 broke mdadm --add on my superblock 1.0 array
From: Anthony DeRobertis @ 2016-09-20 7:02 UTC (permalink / raw)
To: Guoqing Jiang, linux-raid, 837964

On 09/20/2016 01:38 AM, Guoqing Jiang wrote:
>
> Thanks for report, could you try the latest tree
> git://git.kernel.org/pub/scm/utils/mdadm/mdadm.git?
> I guess 45a87c2f31335a759190dff663a881bc78ca5443 should resolve it ,
> and I can add a spare disk
> to native raid (internal bitmap) with different metadatas (0.9, 1.0
> to 1.2).

(please keep me cc'd, I'm not subscribed)

$ git rev-parse --short HEAD
676e87a
$ make -j4
...

# ./mdadm -a /dev/md/pv0 /dev/sdc3
mdadm: add new device failed for /dev/sdc3 as 8: Invalid argument

[375036.613907] md: sdc3 does not have a valid v1.0 superblock, not importing!
[375036.613926] md: md_import_device returned -22

So current master seems to be completely broken again. It's 3 AM here;
I'll check more (and do another bisect, to find when it went from
weird-errors-but-works to broken) tomorrow.
* Re: 95a05b3 broke mdadm --add on my superblock 1.0 array
From: Guoqing Jiang @ 2016-09-20 9:36 UTC (permalink / raw)
To: Anthony DeRobertis, linux-raid, 837964

On 09/20/2016 03:02 AM, Anthony DeRobertis wrote:
> On 09/20/2016 01:38 AM, Guoqing Jiang wrote:
>>
>> Thanks for report, could you try the latest tree
>> git://git.kernel.org/pub/scm/utils/mdadm/mdadm.git?
>> I guess 45a87c2f31335a759190dff663a881bc78ca5443 should resolve it ,
>> and I can add a spare disk
>> to native raid (internal bitmap) with different metadatas (0.9, 1.0
>> to 1.2).
>
> (please keep me cc'd, I'm not subscribed)
>
> $ git rev-parse --short HEAD
> 676e87a
> $ make -j4
> ...
>
> # ./mdadm -a /dev/md/pv0 /dev/sdc3
> mdadm: add new device failed for /dev/sdc3 as 8: Invalid argument
>
> [375036.613907] md: sdc3 does not have a valid v1.0 superblock, not
> importing!
> [375036.613926] md: md_import_device returned -22

The md-cluster code should only apply to raid1 with 1.2 metadata, and
your array (/dev/md/pv0) is raid10 with 1.0 metadata (if I read the bug
correctly), so it is weird that your array invokes the md-cluster code
at all.

I assume it only happens with an existing array, and a newly created one
doesn't have the problem, right? I can't reproduce it on my side.

Which kernel version did you use to create the array, in case the kernel
was updated since? Also, please show the output of "mdadm -X $DISK".
Your bitmap location is a little odd (though I haven't tried level 10
before, so maybe it is correct):

Internal Bitmap : -234 sectors from superblock

Thanks,
Guoqing
* Re: 95a05b3 broke mdadm --add on my superblock 1.0 array
From: Anthony DeRobertis @ 2016-09-20 17:12 UTC (permalink / raw)
To: Guoqing Jiang; +Cc: linux-raid, 837964

On Tue, Sep 20, 2016 at 05:36:17AM -0400, Guoqing Jiang wrote:
> The md-cluster code should only work raid1 with 1.2 metadata, and your
> array (/dev/md/pv0) is raid10 with 1.0 metadata (if I read bug
> correctly),

Correct, it's 1.0 metadata and raid10.

> I assume it only happens with existed array, a new created one doesn't
> have the problem, right? And I can't reproduce it from my side.

Yeah, I made a 1.0 raid10 with 4 LVs and couldn't trigger the bug. Of
course, that array was also much smaller.

> Which kernel version are you used to created the array in case the
> kernel was updated?

I've had the array for a while (the superblocks with -E show a creation
time of Wed Jun 16 14:25:08 2010). If I had to take a guess, I'd guess
it was created with the Debian squeeze alpha1 installer... so probably
2.6.30 or 2.6.32.

> Also pls show the output of "mdadm -X $DISK", and your bitmap is a
> little weird (but I don't try with 10 level before, so maybe it is
> correct).

I'd guess it was created by doing 'mdadm --grow --bitmap internal' at
some point after the array was created, but I'm not sure. It could also
have been created with the array.
# ./mdadm -X /dev/sd[abde]3
        Filename : /dev/sda3
           Magic : 6d746962
         Version : 4
            UUID : c840d0de:0626d783:3f1b28dc:c5ec649a
          Events : 1124478
  Events Cleared : 1124478
           State : OK
       Chunksize : 2 MB
          Daemon : 5s flush period
      Write Mode : Normal
       Sync Size : 1953005568 (1862.53 GiB 1999.88 GB)
          Bitmap : 953616 bits (chunks), 54 dirty (0.0%)

        Filename : /dev/sdb3
           Magic : 6d746962
         Version : 4
            UUID : c840d0de:0626d783:3f1b28dc:c5ec649a
          Events : 1124478
  Events Cleared : 1124478
           State : OK
       Chunksize : 2 MB
          Daemon : 5s flush period
      Write Mode : Normal
       Sync Size : 1953005568 (1862.53 GiB 1999.88 GB)
          Bitmap : 953616 bits (chunks), 54 dirty (0.0%)

        Filename : /dev/sdd3
           Magic : 6d746962
         Version : 4
            UUID : c840d0de:0626d783:3f1b28dc:c5ec649a
          Events : 1124478
  Events Cleared : 1124478
           State : OK
       Chunksize : 2 MB
          Daemon : 5s flush period
      Write Mode : Normal
       Sync Size : 1953005568 (1862.53 GiB 1999.88 GB)
          Bitmap : 953616 bits (chunks), 54 dirty (0.0%)

        Filename : /dev/sde3
           Magic : 6d746962
         Version : 4
            UUID : c840d0de:0626d783:3f1b28dc:c5ec649a
          Events : 1124478
  Events Cleared : 1124478
           State : OK
       Chunksize : 2 MB
          Daemon : 5s flush period
      Write Mode : Normal
       Sync Size : 1953005568 (1862.53 GiB 1999.88 GB)
          Bitmap : 953616 bits (chunks), 84 dirty (0.0%)
* Re: 95a05b3 broke mdadm --add on my superblock 1.0 array
From: Guoqing Jiang @ 2016-09-21 6:40 UTC (permalink / raw)
To: Anthony DeRobertis, linux-raid, 837964

On 09/20/2016 01:12 PM, Anthony DeRobertis wrote:
>
>> Which kernel version are you used to created the array in case the
>> kernel was updated?
>
> I've had the array for a while (the superblocks with -E show a creation
> time of Wed Jun 16 14:25:08 2010). If I had to take a guess, I'd guess
> it was created with the Debian squeeze alpha1 installer... So probably
> 2.6.30 or 2.6.32.

Hmm, a lot has changed since 2.6.30, so it is possible that the latest
mdadm can't work well with an array that was created with such an old
kernel.

Thanks,
Guoqing
* Re: 95a05b3 broke mdadm --add on my superblock 1.0 array
From: Anthony DeRobertis @ 2016-09-20 17:52 UTC (permalink / raw)
To: Guoqing Jiang, linux-raid, 837964

BTW: the change from apparently working but spewing errors back to not
working is:

bbc24bb35020793b9e6fa2111b15882f0dbfe36e is the first broken commit
commit bbc24bb35020793b9e6fa2111b15882f0dbfe36e
Author: Guoqing Jiang <gqjiang@suse.com>
Date:   Mon May 9 10:22:58 2016 +0800

    super1: make the check for NodeNumUpdate more accurate

    We missed to check the version is BITMAP_MAJOR_CLUSTERED or not,
    otherwise mdadm can't create array with other 1.x metadatas
    (1.0 and 1.1).

    Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
    Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>

:100644 100644 972b4700455426d47f52141416d873b6c745fa07 fa933676621f6431398192b1c0b26f3ce53deac3 M super1.c
* Re: 95a05b3 broke mdadm --add on my superblock 1.0 array
From: Anthony DeRobertis @ 2016-09-20 18:31 UTC (permalink / raw)
To: Guoqing Jiang, linux-raid, 837964

[-- Attachment #1: Type: text/plain, Size: 1212 bytes --]

Sorry for the number of emails I'm sending, but I noticed something
that's probably important. I'm also appending some gdb log from tracing
through the function (trying to answer why it's doing cluster-mode stuff
at all).

While tracing through, I noticed that *before* the write-bitmap loop,
mdadm -E considers the superblock valid. That agrees with what I saw
from strace, I suppose. At first glance, it figures out how much to
write by calling this function:

static unsigned int calc_bitmap_size(bitmap_super_t *bms, unsigned int boundary)
{
	unsigned long long bits, bytes;

	bits = __le64_to_cpu(bms->sync_size) / (__le32_to_cpu(bms->chunksize)>>9);
	bytes = (bits+7) >> 3;
	bytes += sizeof(bitmap_super_t);
	bytes = ROUND_UP(bytes, boundary);

	return bytes;
}

That code looked familiar, and I figured out where from: it's also in
95a05b37e8eb2bc0803b1a0298fce6adc60eff16, the commit that I found
originally broke it. But that commit makes a change to it: it changed
the ROUND_UP boundary from 512 to 4096 (and from the gdb trace,
boundary==4096).

I tested changing that line to "bytes = ROUND_UP(bytes, 512);", and it
works: it adds the new disk to the array and produces no warnings or
errors.
[-- Attachment #2: gdb.txt --]
[-- Type: text/plain, Size: 13228 bytes --]

Starting program: /var/tmp/mdadm/mdadm/mdadm -a /dev/md/pv0 /dev/sdc3

Breakpoint 1, write_bitmap1 (st=0x6b0780, fd=5, update=NodeNumUpdate) at super1.c:2351
2351		struct mdp_superblock_1 *sb = st->sb;
st = 0x6b0780
fd = 5
update = NodeNumUpdate
$1 = (struct supertype *) 0x6b0780
$2 = {ss = 0x69c060 <super1>, minor_version = 0, max_devs = 1920, container_devnm = '\000' <repeats 31 times>, sb = 0x6c7000, info = 0x6c6450, other = 0x0, devsize = 0, data_offset = 0, ignore_hw_compat = 0, updates = 0x0, update_tail = 0x0, arrays = 0x0, sock = 0, devnm = "md127", '\000' <repeats 26 times>, devcnt = 0, retry_soon = 0, nodes = 0, cluster_name = 0x0, devs = 0x0}
#0  write_bitmap1 (st=0x6b0780, fd=5, update=NodeNumUpdate) at super1.c:2351
        sb = 0x6c8000
        bms = 0x8e6492c800000400
        rv = 0
        buf = 0x15250a2b
        towrite = 1953005968
        n = 0
        len = 0
        afd = {fd = 243328694, blk_sz = 5}
        i = 7106560
        total_bm_space = 2199023255557
        bm_space_per_node = 7110656
#1  0x000000000044530c in write_init_super1 (st=0x6b0780) at super1.c:1851
        sb = 0x6c7000
        refst = 0x6c6490
        rv = 0
        bm_space = 264
        di = 0x6c6450
        dsize = 1953005985
        array_size = 1953005568
        sb_offset = 1953005968
        data_offset = 0
#2  0x00000000004169d0 in Manage_add (fd=3, tfd=4, dv=0x6b0040, tst=0x6b0780, array=0x7fffffffda40, force=0, verbose=0, devname=0x7fffffffe4b7 "/dev/md/pv0", update=0x0, rdev=2083, array_size=1953005568, raid_slot=-1) at Manage.c:971
        dfd = 5
        ldsize = 999939064320
        dev_st = 0x6c6390
        j = 8
        disc = {number = 8, major = 8, minor = 35, raid_disk = -1, state = 0}
#3  0x00000000004183f5 in Manage_subdevs (devname=0x7fffffffe4b7 "/dev/md/pv0", fd=3, devlist=0x6b0040, verbose=0, test=0, update=0x0, force=0) at Manage.c:1617
        rdev = 2083
        rv = 0
        mj = -142377600
        mn = 32767
        array = {major_version = 1, minor_version = 0, patch_version = 3, ctime = 1276712708, level = 10, size = 976502784, nr_disks = 4, raid_disks = 4, md_minor = 127, not_persistent = 0, utime = 1474393877, state = 256, active_disks = 4, working_disks = 4, failed_disks = 0, spare_disks = 0, layout = 513, chunk_size = 524288}
        array_size = 1953005568
        dv = 0x6b0040
        tfd = 4
        tst = 0x6b0780
        subarray = 0x0
        sysfd = -1
        count = 0
        info = {array = {major_version = -9784, minor_version = 32767, patch_version = -136434289, ctime = 32767, level = 2, size = 0, nr_disks = -134254776, raid_disks = 32767, md_minor = 1, not_persistent = 0, utime = 0, state = 0, active_disks = 1, working_disks = 0, failed_disks = -134225560, spare_disks = 32767, layout = -7824, chunk_size = 32767}, disk = {number = -10032, major = 32767, minor = -117177849, raid_disk = 0, state = 0}, events = 140737354130624, uuid = {-9968, 32767, 0, 1}, name = "\000\331\377\377\377\177\000\000\354\222s\360\000\000\000\000\223\024@\000\000\000\000\000\377\377\377\377\000\000\000\000@", data_offset = 140737346016776, new_data_offset = 140737354099120, component_size = 140737488345760, custom_array_size = 140737351942788, reshape_active = 1, reshape_progress = 140737354129344, recovery_blocked = 0, journal_device_required = 0, journal_clean = -136478512, space_before = 140737351876824, space_after = 140737351876808, {resync_start = 140737349770912, recovery_start = 140737349770912}, bitmap_offset = 140737488345760, safe_mode_delay = 0, new_level = 6905808, delta_disks = 0, new_layout = 4206336, new_chunk = 0, errors = -7872, cache_size = 0, mismatch_cnt = 0, text_version = "\000\000\000\000`\340\377\377\377\177\000\000\326w\336\367\377\177\000\000\001", '\000' <repeats 23 times>, "\b\026\204\367\377\177", container_member = -9504, container_enough = 32767, sys_name = "md127", '\000' <repeats 26 times>, devs = 0xff000000000000, next = 0x0, recovery_fd = -16777216, state_fd = -65536, prev_state = 0, curr_state = 0, next_state = 0, sysfs_array_state = "\000\000\377\377", '\000' <repeats 15 times>}
        devinfo = {array = {major_version = -142323768, minor_version = 32767, patch_version = 0, ctime = 0, level = -2147483646, size = 0, nr_disks = 4706142, raid_disks = 0, md_minor = 4, not_persistent = 0, utime = 4272203, state = 0, active_disks = -10000, working_disks = 32767, failed_disks = -136412540, spare_disks = 32767, layout = -142323768, chunk_size = 32767}, disk = {number = -134225984, major = 32767, minor = -9856, raid_disk = 32767, state = -10144}, events = 140737488345192, uuid = {-10145, 32767, -136395088, 32767}, name = "p\330\377\377\377\177\000\000\310d\377\367\377\177\000\000\225W\275\367\002\000\000\000`\330\377\377\377\177\000\000t", data_offset = 140737488345183, new_data_offset = 1627, component_size = 140737354099120, custom_array_size = 140737345977728, reshape_active = -142323768, reshape_progress = 140737351919787, recovery_blocked = 1627, journal_device_required = 0, journal_clean = -142323768, space_before = 140737354099120, space_after = 140737488345144, {resync_start = 140737488345140, recovery_start = 140737488345140}, bitmap_offset = 140737351918145, safe_mode_delay = 7, new_level = 4199571, delta_disks = 0, new_layout = 4196120, new_chunk = 0, errors = -10184, cache_size = 4034106092, mismatch_cnt = 63032907, text_version = "\000\000\000\000,\000\000\000\000\000\000\000\020\331\377\377\377\177\000\000\310O\204\367\377\177\000\000\200}\203\367\377\177\000\000\064\330\377\377\377\177\000\000\000\331\377\377\377\177", container_member = -134254856, container_enough = 32767, sys_name = "\004\000\000\000\000\000\000\000ibcm\000\000\000\000o.4\000\377\177\000\000\376\377\377\377\000\000\000", devs = 0x0, next = 0x7fffffffd998, recovery_fd = -134224704, state_fd = 32767, prev_state = -9824, curr_state = 32767, next_state = -134254776, sysfs_array_state = "\377\177\000\000\000\000\000\000\000\000\000\000h\341\377\367\377\177\000"}
        frozen = 1
        busy = 0
        raid_slot = -1
#4  0x0000000000406948 in main (argc=4, argv=0x7fffffffe148) at mdadm.c:1368
        mode = 4
        opt = -1
        option_index = -1
        rv = 0
        i = 0
        array_size = 0
        data_offset = 1
        ident = {devname = 0x7fffffffdff8 "\340C\204", <incomplete sequence \367>, uuid_set = 0, uuid = {32767, 2, 0, -134254776}, name = "\000\177\000\000\001", '\000' <repeats 15 times>, "\001\000\000\000\000\000\000\000h\341\377\367\377", super_minor = 65534, devices = 0x0, level = 65534, raid_disks = 65534, spare_disks = 0, st = 0x0, autof = 0, spare_group = 0x0, bitmap_file = 0x0, bitmap_fd = -1, container = 0x0, member = 0x0, next = 0x7ffff7ffe168, {assembled = -142326816}}
        configfile = 0x0
        devmode = 97
        bitmap_fd = -1
        devlist = 0x6b0010
        devlistend = 0x6b0060
        dv = 0x6b0040
        devs_found = 2
        symlinks = 0x0
        grow_continue = 0
        c = {readonly = 0, runstop = 0, verbose = 0, brief = 0, force = 0, homehost = 0x7fffffffdcd0 "Zia", require_homehost = 1, prefer = 0x0, export = 0, test = 0, subarray = 0x0, update = 0x0, scan = 0, SparcAdjust = 0, autof = 0, delay = 0, freeze_reshape = 0, backup_file = 0x0, invalid_backup = 0, action = 0x0, nodes = 0, homecluster = 0x0}
        s = {raiddisks = 0, sparedisks = 0, journaldisks = 0, level = 65534, layout = 65534, layout_str = 0x0, chunk = 0, bitmap_chunk = 65534, bitmap_file = 0x0, assume_clean = 0, write_behind = 0, size = 0}
        sys_hostname = "Zia\000\377\177\000\000\360\303\373\367\377\177\000\000\000\000\000\000\000\000\000\000\330\331\377\367\377\177\000\000\340\336\377\377\377\177\000\000\217-\336\367\377\177\000\000\002\000\000\000\000\000\000\000\360\303\373\367\377\177\000\000\001", '\000' <repeats 15 times>, "\001\000\000\000\000\000\000\000\330\331\377\367\377\177\000\000\000\000 \271\377\377\377\377\000\000\342\004\275\357\377\377`\\i", '\000' <repeats 13 times>, "\300\344\377\367\377\177\000\000\220\335\377\377\377\177\000\000\000\000\200\271\001\000\000\000\200\335\377\377\377\177\000\000\307\016\340=\000\000\000\000t \336\367\377\177\000\000\377\377\377\377\000\000\000\000D\b\000\000\000\000\000\000\260i\377\367\377\177\000\000"...
        mailaddr = 0x0
        program = 0x0
        increments = 20
        daemonise = 0
        pidfile = 0x0
        oneshot = 0
        spare_sharing = 1
        ss = 0x0
        writemostly = 0
        shortopt = 0x6965a0 <short_bitmap_options> "-ABCDEFGIQhVXYWZ:vqb:c:i:l:p:m:n:x:u:c:d:z:U:N:sarfRSow1tye:"
        dosyslog = 0
        rebuild_map = 0
        remove_path = 0x0
        udev_filename = 0x0
        dump_directory = 0x0
        print_help = 0
        outf = 0x0
        mdfd = 3
2352		bitmap_super_t *bms = (bitmap_super_t*)(((char*)sb)+MAX_SB_SIZE);
2353		int rv = 0;
$3 = {magic = 1836345698, version = 4, uuid = "\310@\320\336\006&׃?\033(\334\305\354d\232", events = 1124486, events_cleared = 1124486, sync_size = 3906011136, state = 0, chunksize = 2097152, daemon_sleep = 5, write_behind = 0, sectors_reserved = 0, nodes = 0, cluster_name = '\000' <repeats 63 times>, pad = '\000' <repeats 119 times>}
$4 = (void *) 0x6c7000
2357		unsigned int i = 0;
2360		switch (update) {
2373		if (st->minor_version != 2 && bms->version == BITMAP_MAJOR_CLUSTERED) {
2378		if (bms->version == BITMAP_MAJOR_CLUSTERED) {
2394		if (st->nodes)
No symbol "BITMAP_MAJOR_CLUSTERED" in current context.
$5 = 4
2396			break;
2419		init_afd(&afd, fd);
2421		locate_bitmap1(st, fd, 0);
$6 = {fd = 5, blk_sz = 512}
2423		if (posix_memalign(&buf, 4096, 4096))
$7 = (struct supertype *) 0x6b0780
$8 = {ss = 0x69c060 <super1>, minor_version = 0, max_devs = 1920, container_devnm = '\000' <repeats 31 times>, sb = 0x6c7000, info = 0x6c6450, other = 0x0, devsize = 0, data_offset = 0, ignore_hw_compat = 0, updates = 0x0, update_tail = 0x0, arrays = 0x0, sock = 0, devnm = "md127", '\000' <repeats 26 times>, devcnt = 0, retry_soon = 0, nodes = 0, cluster_name = 0x0, devs = 0x0}
2430		if (i)
2433		memset(buf, 0xff, 4096);
2434		memcpy(buf, (char *)bms, sizeof(bitmap_super_t));
2436		towrite = calc_bitmap_size(bms, 4096);
2437		while (towrite > 0) {
$9 = 122880
2438			n = towrite;
2439			if (n > 4096)
2440				n = 4096;
2441			n = awrite(&afd, buf, n);
2442			if (n > 0)
2443				towrite -= n;
2446			if (i)
2449			memset(buf, 0xff, 4096);
2437		while (towrite > 0) {
2438			n = towrite;
2439			if (n > 4096)
2440				n = 4096;
2441			n = awrite(&afd, buf, n);
2442			if (n > 0)
2443				towrite -= n;
2446			if (i)
2449			memset(buf, 0xff, 4096);
2437		while (towrite > 0) {
2438			n = towrite;
2439			if (n > 4096)
2440				n = 4096;
2441			n = awrite(&afd, buf, n);
2442			if (n > 0)
2443			towrite -= n;
2446			if (i)
2449			memset(buf, 0xff, 4096);
2437		while (towrite > 0) {
2438			n = towrite;
2439			if (n > 4096)
$10 = 110592
Continue program being debugged, after signal or breakpoint.
Usage: continue [N]
If proceeding from breakpoint, a number N may be used as an argument,
which means to set the ignore count of that breakpoint to N - 1 (so that
the breakpoint won't break until the Nth time it is reached).
If non-stop mode is enabled, continue only the current thread, otherwise
all the threads in the program are continued.
To continue all stopped threads in non-stop mode, use the -a option.
Specifying -a and an ignore count simultaneously is an error.
Execute until the program reaches a source line greater than the current
or a specified location (same args as break command) within the current frame.
write_bitmap1 (st=0x6b0780, fd=5, update=NodeNumUpdate) at super1.c:2451
2451		fsync(fd);
Continuing.
[Inferior 1 (process 23866) exited with code 01]
Breakpoint 2 at 0x440d25: file super1.c, line 165.
Starting program: /var/tmp/mdadm/mdadm/mdadm -a /dev/md/pv0 /dev/sdc3

Breakpoint 1, write_bitmap1 (st=0x6b0780, fd=5, update=NodeNumUpdate) at super1.c:2351
2351		struct mdp_superblock_1 *sb = st->sb;
Continuing.

Breakpoint 2, calc_bitmap_size (bms=0x6c8000, boundary=4096) at super1.c:165
165		bits = __le64_to_cpu(bms->sync_size) / (__le32_to_cpu(bms->chunksize)>>9);
bms = 0x6c8000
boundary = 4096
$11 = {magic = 1836345698, version = 4, uuid = "\310@\320\336\006&׃?\033(\334\305\354d\232", events = 1124486, events_cleared = 1124486, sync_size = 3906011136, state = 0, chunksize = 2097152, daemon_sleep = 5, write_behind = 0, sectors_reserved = 0, nodes = 0, cluster_name = '\000' <repeats 63 times>, pad = '\000' <repeats 119 times>}
166		bytes = (bits+7) >> 3;
167		bytes += sizeof(bitmap_super_t);
168		bytes = ROUND_UP(bytes, boundary);
$12 = 119458
170		return bytes;
$13 = 122880
Continuing.
[Inferior 1 (process 25040) exited with code 01]
quit
* Re: 95a05b3 broke mdadm --add on my superblock 1.0 array
From: Guoqing Jiang @ 2016-09-21 6:45 UTC (permalink / raw)
To: Anthony DeRobertis, linux-raid, 837964

On 09/20/2016 02:31 PM, Anthony DeRobertis wrote:
> Sorry for the amount of emails I'm sending, but I noticed something
> that's probably important. I'm also appending some gdb log from
> tracing through the function (trying to answer why it's doing cluster
> mode stuff at all).
>
> While tracing through, I noticed that *before* the write-bitmap loop,
> mdadm -E considers the superblock valid. That agrees with what I saw
> from strace, I suppose. To my first glance, it figures out how much
> to write by calling this function:
>
> static unsigned int calc_bitmap_size(bitmap_super_t *bms, unsigned
> int boundary)
> {
>         unsigned long long bits, bytes;
>
>         bits = __le64_to_cpu(bms->sync_size) /
> (__le32_to_cpu(bms->chunksize)>>9);
>         bytes = (bits+7) >> 3;
>         bytes += sizeof(bitmap_super_t);
>         bytes = ROUND_UP(bytes, boundary);
>
>         return bytes;
> }
>
> That code looked familiar, and I figured out where—it's also in
> 95a05b37e8eb2bc0803b1a0298fce6adc60eff16, the commit that I found
> originally broke it. But that commit is making a change to it: it
> changed the ROUND_UP line from 512 to 4096 (and from the gdb trace,
> boundary==4096).
>
> I tested changing that line to "bytes = ROUND_UP(bytes, 512);", and it
> works. Adds the new disk to the array and produces no warnings or errors.

I think it is a coincidence that the above change works; commit 4a3d29e
made that change, but it didn't change the logic at all.

Also, the problem seems not to be related to the md-cluster code, since
your gdb debug shows it ran into the part below because the version is 4.

/* no need to change bms->nodes for other bitmap types */

Thanks,
Guoqing
* Re: 95a05b3 broke mdadm --add on my superblock 1.0 array
From: Guoqing Jiang @ 2016-09-22 2:40 UTC (permalink / raw)
To: Anthony DeRobertis, linux-raid, 837964

On 09/21/2016 02:45 AM, Guoqing Jiang wrote:
>
> On 09/20/2016 02:31 PM, Anthony DeRobertis wrote:
>> Sorry for the amount of emails I'm sending, but I noticed something
>> that's probably important. I'm also appending some gdb log from
>> tracing through the function (trying to answer why it's doing cluster
>> mode stuff at all).
>>
>> While tracing through, I noticed that *before* the write-bitmap loop,
>> mdadm -E considers the superblock valid. That agrees with what I saw
>> from strace, I suppose. To my first glance, it figures out how much
>> to write by calling this function:
>>
>> static unsigned int calc_bitmap_size(bitmap_super_t *bms, unsigned
>> int boundary)
>> {
>>         unsigned long long bits, bytes;
>>
>>         bits = __le64_to_cpu(bms->sync_size) /
>> (__le32_to_cpu(bms->chunksize)>>9);
>>         bytes = (bits+7) >> 3;
>>         bytes += sizeof(bitmap_super_t);
>>         bytes = ROUND_UP(bytes, boundary);
>>
>>         return bytes;
>> }
>>
>> That code looked familiar, and I figured out where—it's also in
>> 95a05b37e8eb2bc0803b1a0298fce6adc60eff16, the commit that I found
>> originally broke it. But that commit is making a change to it: it
>> changed the ROUND_UP line from 512 to 4096 (and from the gdb trace,
>> boundary==4096).
>>
>> I tested changing that line to "bytes = ROUND_UP(bytes, 512);", and
>> it works. Adds the new disk to the array and produces no warnings or
>> errors.
>
> I think it is is a coincidence that above change works, 4a3d29e
> commit made the change but it didn't change the logic at all.

Hmm, it seems the bitmap was aligned to 512 bytes by previous mdadm, but
with commit 95a05b3 we made it aligned to 4K, so the latest mdadm can't
work with a previously created array.

Does the change below work? Thanks.

diff --git a/super1.c b/super1.c
index 9f62d23..6a0b075 100644
--- a/super1.c
+++ b/super1.c
@@ -2433,7 +2433,10 @@ static int write_bitmap1(struct supertype *st, int fd, enum bitmap_update update
 			memset(buf, 0xff, 4096);
 		memcpy(buf, (char *)bms, sizeof(bitmap_super_t));
 
-		towrite = calc_bitmap_size(bms, 4096);
+		if (__le32_to_cpu(bms->nodes) == 0)
+			towrite = calc_bitmap_size(bms, 512);
+		else
+			towrite = calc_bitmap_size(bms, 4096);
 		while (towrite > 0) {

Regards,
Guoqing
* Re: 95a05b3 broke mdadm --add on my superblock 1.0 array
From: Anthony DeRobertis @ 2016-09-22 16:53 UTC (permalink / raw)
To: Guoqing Jiang; +Cc: linux-raid, 837964

On Wed, Sep 21, 2016 at 10:40:22PM -0400, Guoqing Jiang wrote:
>
> Does the below change work? Thanks.

Yes, this patch (after un-email-mangling it) fixes it.

> diff --git a/super1.c b/super1.c
> index 9f62d23..6a0b075 100644
> --- a/super1.c
> +++ b/super1.c
> @@ -2433,7 +2433,10 @@ static int write_bitmap1(struct supertype *st, int fd, enum bitmap_update update
>  			memset(buf, 0xff, 4096);
>  		memcpy(buf, (char *)bms, sizeof(bitmap_super_t));
>
> -		towrite = calc_bitmap_size(bms, 4096);
> +		if (__le32_to_cpu(bms->nodes) == 0)
> +			towrite = calc_bitmap_size(bms, 512);
> +		else
> +			towrite = calc_bitmap_size(bms, 4096);
>  		while (towrite > 0) {
>
> Regards,
> Guoqing
* Re: Bug#837964: 95a05b3 broke mdadm --add on my superblock 1.0 array
From: NeilBrown @ 2016-10-06 1:26 UTC (permalink / raw)
To: Guoqing Jiang, 837964, Anthony DeRobertis, linux-raid

[-- Attachment #1: Type: text/plain, Size: 4054 bytes --]

On Thu, Sep 22 2016, Guoqing Jiang wrote:

> On 09/21/2016 02:45 AM, Guoqing Jiang wrote:
>>
>> On 09/20/2016 02:31 PM, Anthony DeRobertis wrote:
>>> Sorry for the amount of emails I'm sending, but I noticed something
>>> that's probably important. I'm also appending some gdb log from
>>> tracing through the function (trying to answer why it's doing cluster
>>> mode stuff at all).
>>>
>>> While tracing through, I noticed that *before* the write-bitmap loop,
>>> mdadm -E considers the superblock valid. That agrees with what I saw
>>> from strace, I suppose. To my first glance, it figures out how much
>>> to write by calling this function:
>>>
>>> static unsigned int calc_bitmap_size(bitmap_super_t *bms, unsigned
>>> int boundary)
>>> {
>>>         unsigned long long bits, bytes;
>>>
>>>         bits = __le64_to_cpu(bms->sync_size) /
>>> (__le32_to_cpu(bms->chunksize)>>9);
>>>         bytes = (bits+7) >> 3;
>>>         bytes += sizeof(bitmap_super_t);
>>>         bytes = ROUND_UP(bytes, boundary);
>>>
>>>         return bytes;
>>> }
>>>
>>> That code looked familiar, and I figured out where—it's also in
>>> 95a05b37e8eb2bc0803b1a0298fce6adc60eff16, the commit that I found
>>> originally broke it. But that commit is making a change to it: it
>>> changed the ROUND_UP line from 512 to 4096 (and from the gdb trace,
>>> boundary==4096).
>>>
>>> I tested changing that line to "bytes = ROUND_UP(bytes, 512);", and
>>> it works. Adds the new disk to the array and produces no warnings or
>>> errors.
>>
>> I think it is is a coincidence that above change works, 4a3d29e
>> commit made the change but it didn't change the logic at all.
>
> Hmm, seems bitmap is aligned to 512 in previous mdadm, but with commit
> 95a05b3 we made it aligned to 4k, so it causes the latest mdadm can't
> work with previous created array.
>
> Does the below change work? Thanks.
>
> diff --git a/super1.c b/super1.c
> index 9f62d23..6a0b075 100644
> --- a/super1.c
> +++ b/super1.c
> @@ -2433,7 +2433,10 @@ static int write_bitmap1(struct supertype *st, int fd, enum bitmap_update update
>  			memset(buf, 0xff, 4096);
>  		memcpy(buf, (char *)bms, sizeof(bitmap_super_t));
>
> -		towrite = calc_bitmap_size(bms, 4096);
> +		if (__le32_to_cpu(bms->nodes) == 0)
> +			towrite = calc_bitmap_size(bms, 512);
> +		else
> +			towrite = calc_bitmap_size(bms, 4096);
>  		while (towrite > 0) {

(sorry for the late reply ... travel, jetlag, ....)

I think a better, simpler fix is:

> -		towrite = calc_bitmap_size(bms, 4096);
> +		towrite = calc_bitmap_size(bms, 512);

The only reason we are rounding up here is that we are using O_DIRECT
writes, and they require 512-byte alignment. Any bytes beyond the end of
the actual bitmap will be ignored, so it doesn't matter whether they are
written or not.

Current mdadm always aligns bitmaps on a 4K boundary, but older versions
of mdadm didn't. If the bitmap was less than 4K before the superblock
(quite possible), writing 4K for the bitmap would corrupt the
superblock. This can certainly happen with 1.0 metadata.

However ... the reason that everything is now 4K-aligned is that some
drives use a 4K block size. For those, we really should be doing 4K
writes, not 512-byte writes. So it would make sense to round up to 4K
sometimes and use 512 at other times. However, the correct test isn't
whether cluster raid is in use. The metadata has always been aligned on
a 4K boundary. If data_offset and bblog_offset and bitmap_offset all
have 4K alignment, then rounding up to 4K for the bitmap writes would be
correct. If anything has a smaller alignment, then it isn't necessary
and so should be avoided.

So the best fix would be to test those 3 offsets, and round up to a
multiple of 4096 only if all of them are on a 4K boundary.

NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 800 bytes --]
Thread overview: 12+ messages (newest: 2016-10-06 1:26 UTC)

2016-09-19 16:32 Bug#837964: 95a05b3 broke mdadm --add on my superblock 1.0 array  Anthony DeRobertis
2016-09-20  5:38 ` Guoqing Jiang
2016-09-20  7:02   ` Anthony DeRobertis
2016-09-20  9:36     ` Guoqing Jiang
2016-09-20 17:12       ` Anthony DeRobertis
2016-09-21  6:40         ` Guoqing Jiang
2016-09-20 17:52       ` Anthony DeRobertis
2016-09-20 18:31       ` Anthony DeRobertis
2016-09-21  6:45         ` Guoqing Jiang
2016-09-22  2:40           ` Guoqing Jiang
2016-09-22 16:53             ` Anthony DeRobertis
2016-10-06  1:26             ` Bug#837964: NeilBrown