Linux RAID subsystem development
* Re: Second of 3 drives in RAID5 missing
From: Alex Elder @ 2023-05-07  1:29 UTC
  To: Wol, Hannes Reinecke, linux-raid
In-Reply-To: <0b5a2849-90ec-573c-03ed-0847135a4e9d@youngman.org.uk>

On 5/6/23 6:28 PM, Wol wrote:
> 
> 
> On 06/05/2023 23:29, Hannes Reinecke wrote:
>>>     Device Role : Active device 2
>>>     Array State : AAA ('A' == active, '.' == missing, 'R' == replacing)
>>> root@meat:/#
>>>
>> mdadm manage /dev/md127 --add /dev/sdd ?
> 
> OMG NO!
> 
> That really will trash your array ...
> 
> Cheers,
> Wol

This is why I sent the original message; I really
want to avoid losing my data because of a dumb
misunderstanding.  I did look at the Linux_Raid
page on raid.wiki.kernel.org but was not confident
I knew the right thing to do.  I'm very familiar
(as a developer) with storage software, just not
MD and the tools to manage its volumes.

I suspect that putting a proper MD superblock on the
middle partition (sdc1, out of sd{b,c,d}1) might be
enough to get it to assemble again.  After that I
think I'll be able to rebuild the newly replaced
drive and also rename it to /dev/md/z.

Is it an easy command?  Is any more information required?

Thanks.

					-Alex


* Re: Second of 3 drives in RAID5 missing
From: Wols Lists @ 2023-05-07  7:35 UTC
  To: Alex Elder, Hannes Reinecke, linux-raid
In-Reply-To: <8f046b28-f187-66d8-f67c-3e5821f66e92@ieee.org>

On 07/05/2023 02:29, Alex Elder wrote:
> On 5/6/23 6:28 PM, Wol wrote:
>>
>>
>> On 06/05/2023 23:29, Hannes Reinecke wrote:
>>>>     Device Role : Active device 2
>>>>     Array State : AAA ('A' == active, '.' == missing, 'R' == replacing)
>>>> root@meat:/#
>>>>
>>> mdadm manage /dev/md127 --add /dev/sdd ?
>>
>> OMG NO!
>>
>> That really will trash your array ...
>>
>> Cheers,
>> Wol
> 
> This is why I sent the original message; I really
> want to avoid losing my data because of a dumb
> misunderstanding.  I did look at the Linux_Raid
> page on raid.wiki.kernel.org but was not confident
> I knew the right thing to do.  I'm very familiar
> (as a developer) with storage software, just not
> MD and the tools to manage its volumes.
> 
> I suspect that putting a proper MD superblock on the
> middle partition (sdc1, out of sd{b,c,d}1) might be
> enough to get it to assemble again.  After that I
> think I'll be able to rebuild the newly replaced
> drive and also rename it to /dev/md/z.
> 
> Is it an easy command?  Is any more information required?
> 
mdadm /dev/md127 --add /dev/sdc1

The reason I reacted with horror at the previous message is that 
/dev/sdd1 is already part of the array. Adding /dev/sdd (which is quite 
possible) would destroy /dev/sdd1 and you'd be left with only one 
working partition out of three - that's the array gone ...

Read the wiki on how to add a drive. I suspect that's where you went 
wrong in the first place. Make sure you've got the right drive to add - 
it said you had sdb1 and sdd1, so sdc1 is missing and that's the one you 
want to add (CHECK BEFORE YOU ADD). The kernel can move things around so 
make sure between booting and adding that nothing "weird" has happened.

Then you should be good to go. Just CHECK. And DOUBLE CHECK.
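
For what it's worth, the check can be scripted. This is only a minimal
sketch: it assumes traditional sdX / sdXN device names and mdstat-style
text on stdin, and md_members and safe_to_add are made-up helper names,
not mdadm features.

```shell
#!/bin/sh
# List the member device names (e.g. sdb1) of one array.
# Reads /proc/mdstat-style text on stdin; $1 is the array name (e.g. md127).
md_members() {
    awk -v md="$1" '$1 == md && $2 == ":" {
        for (i = 3; i <= NF; i++)
            if ($i ~ /\[[0-9]+\]/) { sub(/\[.*/, "", $i); print $i }
    }'
}

# Succeed only if the candidate ($2, e.g. sdc1) is neither already a member
# of array $1 nor the whole disk underneath an existing member.
# Assumption: partitions are named <disk><number>, as with sdX devices.
safe_to_add() {
    md=$1 cand=$2
    for m in $(md_members "$md"); do
        case "$m" in
            "$cand")
                echo "$cand is already a member of $md"; return 1 ;;
            "$cand"[0-9]*)
                echo "$cand is the whole disk under member $m"; return 1 ;;
        esac
    done
    return 0
}
```

Run it as "safe_to_add md127 sdc1 < /proc/mdstat" before the real --add.
It only compares names, so it does not replace looking at the partition
with mdadm --examine.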

Cheers,
Wol



* Re: Raid5 to raid6 grow interrupted, mdadm hangs on assemble command
From: Jove @ 2023-05-07 11:30 UTC
  To: Wol; +Cc: Yu Kuai, linux-raid, yukuai (C)
In-Reply-To: <90efe591-999e-93b4-5c52-440fe4aff161@youngman.org.uk>

Hi Wol,

> I wouldn't think it necessary to scrap the array, but if you've backed
> it up and are happier doing so ...

Not particularly. I have taken backups and I am reshaping it to 5 raid
devices; if it works, I'll keep it.

> As for the "invalid backup" problem, you should never have given it a
> backup in the first place, and (while I don't know the code) I very much
> expect it ignored the option completely.

I don't know, Wol. I added the option because the wiki recommended it.
All I know is that when I tried to resume the reshape without the
option or without the --invalid-backup option, mdadm complained it
could not restore the critical section and refused to assemble the
array.

> Once an array goes through reshapes, it can be a lot harder to work
> out the layout if you have to rescue the array by recreating it.

I am no longer going to rely on the array alone to keep my data safe.
Should this array ever fail again, there will be backups to recover
from.

Thanks,

    Johan



On Sat, May 6, 2023 at 11:59 PM Wol <antlists@youngman.org.uk> wrote:
>
> On 06/05/2023 14:07, Jove wrote:
> > Hi Kuai,
> >
> > Just to confirm, the array seems fine after the reshape. Copying files now.
> >
> > Would it be best if I scrap this array and create a new one or is this
> > array safe to use in the long term? It had to use the --invalid-backup
> > flag to get it to reshape, so there might be corruption before that
> > resume point?
> >
> > I have to do a reshape anyway, to 5 raid devices.
> >
> I wouldn't think it necessary to scrap the array, but if you've backed
> it up and are happier doing so ...
>
> AIUI it was an external program squeezing in where it shouldn't that
> (quite literally) threw a spanner in the works and jammed things up. The
> array itself should be perfectly okay.
>
> As for the "invalid backup" problem, you should never have given it a
> backup in the first place, and (while I don't know the code) I very much
> expect it ignored the option completely. You have superblock 1.2, which
> has a chunk of space "reserved for internal use", one of which is to
> provide this backup.
>
> The only real good reason I can think of for scrapping and recreating
> the array is that it will give you a clean array, with ALL THE CURRENT
> DEFAULTS. This is important if anything goes wrong in future, if you
> have an array with a known creation date, that has not been "messed
> about" with since, it's easier to recover if you're really stupid and
> damage it and lose your records of the layout. Once an array goes
> through reshapes, it can be a lot harder to work out the layout if you
> have to rescue the array by recreating it.
>
> Cheers,
> Wol


* Re: Second of 3 drives in RAID5 missing
From: Alex Elder @ 2023-05-07 16:47 UTC
  To: Wols Lists, Hannes Reinecke, linux-raid
In-Reply-To: <b754545f-c505-71d9-6da0-2df8c607ae52@youngman.org.uk>

On 5/7/23 2:35 AM, Wols Lists wrote:
>>
>> Is it an easy command?  Is any more information required?
>>
> mdadm array --add /dev/sdc1

I think I'm on my way back now.  Thank you very much.	-Alex

root@meat:/home/elder# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md127 : active raid5 sdc1[4] sdb1[0] sdd1[3]
       15627786240 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/2] [U_U]
       [>....................]  recovery =  1.2% (96275752/7813893120) finish=639.3min speed=201178K/sec
       bitmap: 0/59 pages [0KB], 65536KB chunk

unused devices: <none>
root@meat:/home/elder# mdadm --detail /dev/md127
/dev/md127:
            Version : 1.2
      Creation Time : Sun Oct 22 21:19:23 2017
         Raid Level : raid5
         Array Size : 15627786240 (14.55 TiB 16.00 TB)
      Used Dev Size : 7813893120 (7.28 TiB 8.00 TB)
       Raid Devices : 3
      Total Devices : 3
        Persistence : Superblock is persistent

      Intent Bitmap : Internal

        Update Time : Sun May  7 11:45:04 2023
              State : clean, degraded, recovering
     Active Devices : 2
    Working Devices : 3
     Failed Devices : 0
      Spare Devices : 1

             Layout : left-symmetric
         Chunk Size : 512K

Consistency Policy : bitmap

     Rebuild Status : 1% complete

               Name : meat:z  (local to host meat)
               UUID : 8a021a34:f19bbc01:7bcf6f8e:3bea43a9
             Events : 9636

     Number   Major   Minor   RaidDevice State
        0       8       17        0      active sync   /dev/sdb1
        4       8       33        1      spare rebuilding   /dev/sdc1
        3       8       49        2      active sync   /dev/sdd1
root@meat:/home/elder#
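
To keep an eye on the rebuild without reading the whole file each time,
something like the following might do. It is a rough sketch that assumes
the progress line looks like the one above (with "recovery", "resync" or
"reshape" followed by "="); recovery_progress is just a name picked for
this example.

```shell
#!/bin/sh
# Print "<array> <percent> <estimated finish>" for every array that has a
# recovery, resync or reshape in progress.
# Reads /proc/mdstat-style text on stdin.
recovery_progress() {
    awk '
        /^md[^ ]* :/ { array = $1 }          # remember the current array name
        /(recovery|resync|reshape) *=/ {
            pct = ""; eta = ""
            for (i = 1; i <= NF; i++) {
                if ($i ~ /%$/)       pct = $i             # e.g. 1.2%
                if ($i ~ /^finish=/) eta = substr($i, 8)  # text after "finish="
            }
            print array, pct, eta
        }
    '
}
```

Typical use would be "recovery_progress < /proc/mdstat", perhaps under
watch(1).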



* Re: Second of 3 drives in RAID5 missing
From: Wol @ 2023-05-07 21:17 UTC
  To: Alex Elder, Hannes Reinecke, linux-raid
In-Reply-To: <fc55e52f-f97c-03b1-04a2-c2c300f9550b@ieee.org>

On 07/05/2023 17:47, Alex Elder wrote:
> I think I'm on my way back now.  Thank you very much.    -Alex
> 
> root@meat:/home/elder# cat /proc/mdstat
> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
> md127 : active raid5 sdc1[4] sdb1[0] sdd1[3]
>        15627786240 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/2] [U_U]
>        [>....................]  recovery =  1.2% (96275752/7813893120) finish=639.3min speed=201178K/sec
>        bitmap: 0/59 pages [0KB], 65536KB chunk

Looking good!

Just one last bit of advice - if you can, get another drive. Okay, you 
may not have a bay, or a sata port, or the cash ...

Add it to the array same as before. That'll give you a 3-drive raid plus 
spare. If another drive fails, it will then just start rebuilding 
straight away.
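
If you want to confirm the spare really is there afterwards, the fields
of "mdadm --detail" can be pulled out with a few lines of shell. This is
a sketch only; detail_field is a hypothetical helper name, and it simply
assumes the "Key : value" layout shown in your output.

```shell
#!/bin/sh
# Print the value of one "Key : value" field from "mdadm --detail" output.
# Reads the output on stdin; $1 is the field name, e.g. "Spare Devices".
detail_field() {
    awk -F' : ' -v key="$1" '
        { k = $1; gsub(/^ +| +$/, "", k) }   # trim padding around the key
        k == key { print $2; exit }
    '
}
```

For example: mdadm --detail /dev/md127 | detail_field 'Spare Devices'.
Bear in mind that while a rebuild is running the device being rebuilt is
counted as a spare (your output shows "Spare Devices : 1" with sdc1 as
"spare rebuilding"), so check again once the State is back to clean.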

You'll get various people saying "if you've got 4 drives, just go 
raid-6". I'm not going to advise either way, other than to say "don't do 
it just now". If you read the list archive you'll see a couple of arrays 
have got wedged upgrading - udevd quite literally threw a monkey wrench 
into the works. Looks like an easy fix, but better not to risk it - just 
wait until the problem is fixed :-)

Nice drives btw, I've got a 3-drive raid-5 plus spare setup - 3TB 
Barracuda (DON'T use those!), two by 4TB Ironwolf, and an 8TB N300.

Cheers,
Wol


* Re: mdadm grow raid 5 to 6 failure (crash)
From: Yu Kuai @ 2023-05-08  1:23 UTC
  To: David Gilmour, Yu Kuai; +Cc: linux-raid, Song Liu, yukuai (C)
In-Reply-To: <CAO2ABiq5bB0cD7c+cS1Vw2PqSZNadyXUgonfEH6Gwsz8d9OiTQ@mail.gmail.com>

Hi,

On 2023/05/06 21:19, David Gilmour wrote:
> From what I can tell it does look very similar. I stopped the
> systemd-udevd service and renamed it to systemd-udevd.bak. My system
> still hung on the assemble command. I'm not savvy enough to decode the
> details here but does the "mddev_suspend.part.0+0xdf/0x150" line in
> the process stack output suggest the same i/o block the other post
> indicates?
> 
> × systemd-udevd.service - Rule-based Manager for Device Events and Files
>       Loaded: loaded (/usr/lib/systemd/system/systemd-udevd.service; static)
>       Active: failed (Result: exit-code) since Sat 2023-05-06 06:59:11
> MDT; 1min 27s ago
>     Duration: 1d 20h 16min 29.633s
> TriggeredBy: × systemd-udevd-kernel.socket
>               × systemd-udevd-control.socket
>         Docs: man:systemd-udevd.service(8)
>               man:udev(7)
>      Process: 27440 ExecStart=/usr/lib/systemd/systemd-udevd
> (code=exited, status=203/EXEC)
>     Main PID: 27440 (code=exited, status=203/EXEC)
>          CPU: 5ms
> 
> ----------------------
> #mdadm --assemble --verbose --backup-file=/root/mdadm5-6_backup_md127
> --invalid-backup /dev/md127 /dev/sda /dev/sdh /dev/sdg /dev/sdc
> /dev/sdb /dev/sdf --force
> mdadm: looking for devices for /dev/md127
> mdadm: /dev/sda is identified as a member of /dev/md127, slot 0.
> mdadm: /dev/sdh is identified as a member of /dev/md127, slot 1.
> mdadm: /dev/sdg is identified as a member of /dev/md127, slot 2.
> mdadm: /dev/sdc is identified as a member of /dev/md127, slot 3.
> mdadm: /dev/sdb is identified as a member of /dev/md127, slot 4.
> mdadm: /dev/sdf is identified as a member of /dev/md127, slot 5.
> mdadm: /dev/md127 has an active reshape - checking if critical section
> needs to be restored
> mdadm: No backup metadata on /root/mdadm5-6_backup_md127
> mdadm: Failed to find backup of critical section
> mdadm: continuing without restoring backup
> mdadm: added /dev/sdh to /dev/md127 as 1
> mdadm: added /dev/sdg to /dev/md127 as 2
> mdadm: added /dev/sdc to /dev/md127 as 3
> mdadm: added /dev/sdb to /dev/md127 as 4
> mdadm: added /dev/sdf to /dev/md127 as 5 (possibly out of date)
> mdadm: added /dev/sda to /dev/md127 as 0
> 
> #hangs indefinitely at this point in the output
> 
> ------------------------------------------
> 
> 
> root       27454  0.0  0.0   3812  2656 pts/1    D+   07:00   0:00
> mdadm --assemble --verbose --backup-file=/root/mdadm5-6_backup_md127
> --invalid-backup /dev/md127 /dev/sda /dev/sdh /dev/sdg /dev/sdc
> /dev/sdb /dev/sdf --force
> root       27457  0.0  0.0      0     0 ?        S    07:00   0:00 [md127_raid6]
> 
> #cat /proc/27454/stack
> [<0>] mddev_suspend.part.0+0xdf/0x150
> [<0>] suspend_lo_store+0xc5/0xf0
> [<0>] md_attr_store+0x83/0xf0
> [<0>] kernfs_fop_write_iter+0x124/0x1b0
> [<0>] new_sync_write+0xff/0x190
> [<0>] vfs_write+0x1ef/0x280
> [<0>] ksys_write+0x5f/0xe0
> [<0>] do_syscall_64+0x5c/0x90
> [<0>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
> 
> #cat /proc/27457/stack
> [<0>] md_thread+0x122/0x160
> [<0>] kthread+0xe0/0x100
> [<0>] ret_from_fork+0x22/0x30
> 

Is there any thread stuck in raid5_make_request? Something like the one below:

Apr 23 19:17:22 atom kernel: task:systemd-udevd   state:D stack:    0 pid: 8121 ppid:   706 flags:0x00000006
Apr 23 19:17:22 atom kernel: Call Trace:
Apr 23 19:17:22 atom kernel:  <TASK>
Apr 23 19:17:22 atom kernel:  __schedule+0x20a/0x550
Apr 23 19:17:22 atom kernel:  schedule+0x5a/0xc0
Apr 23 19:17:22 atom kernel:  schedule_timeout+0x11f/0x160
Apr 23 19:17:22 atom kernel:  ? make_stripe_request+0x284/0x490 [raid456]
Apr 23 19:17:22 atom kernel:  wait_woken+0x50/0x70
Apr 23 19:17:22 atom kernel:  raid5_make_request+0x2cb/0x3e0 [raid456]
Apr 23 19:17:22 atom kernel:  ? sched_show_numa+0xf0/0xf0
Apr 23 19:17:22 atom kernel:  md_handle_request+0x132/0x1e0
Apr 23 19:17:22 atom kernel:  ? do_mpage_readpage+0x282/0x6b0
Apr 23 19:17:22 atom kernel:  __submit_bio+0x86/0x130
Apr 23 19:17:22 atom kernel:  __submit_bio_noacct+0x81/0x1f0
Apr 23 19:17:22 atom kernel:  mpage_readahead+0x15c/0x1d0
Apr 23 19:17:22 atom kernel:  ? blkdev_write_begin+0x20/0x20
Apr 23 19:17:22 atom kernel:  read_pages+0x58/0x2f0
Apr 23 19:17:22 atom kernel:  page_cache_ra_unbounded+0x137/0x180
Apr 23 19:17:22 atom kernel:  force_page_cache_ra+0xc5/0xf0
Apr 23 19:17:22 atom kernel:  filemap_get_pages+0xe4/0x350
Apr 23 19:17:22 atom kernel:  filemap_read+0xbe/0x3c0
Apr 23 19:17:22 atom kernel:  ? make_kgid+0x13/0x20
Apr 23 19:17:22 atom kernel:  ? deactivate_locked_super+0x90/0xa0
Apr 23 19:17:22 atom kernel:  blkdev_read_iter+0xaf/0x170
Apr 23 19:17:22 atom kernel:  new_sync_read+0xf9/0x180
Apr 23 19:17:22 atom kernel:  vfs_read+0x13c/0x190
Apr 23 19:17:22 atom kernel:  ksys_read+0x5f/0xe0
Apr 23 19:17:22 atom kernel:  do_syscall_64+0x59/0x90

By the way, cat /sys/block/mdxx/inflight can confirm this as well.

If this is the case, can you find out who is accessing the array?
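
The inflight file holds two counters, the reads and the writes currently
in flight, so a non-zero sum means some I/O is pending on the array. A
small sketch of that check (inflight_total and array_busy are names made
up for this example):

```shell
#!/bin/sh
# Sum the read and write counters of an inflight file read on stdin.
inflight_total() {
    awk '{ print $1 + $2 }'
}

# Succeed if the given md device (e.g. md127) has any I/O in flight.
array_busy() {
    [ "$(inflight_total < "/sys/block/$1/inflight")" -gt 0 ]
}
```

For example: array_busy md127 && echo "something is accessing md127".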

Thanks,
Kuai



* Re: [PATCH] Fix race of "mdadm --add" and "mdadm --incremental"
From: Li Xiao Keng @ 2023-05-08  1:30 UTC
  To: jes, mwilck, pmenzel, colyli, linux-raid; +Cc: miaoguanqin, louhongxiang
In-Reply-To: <b72aacfa-f99f-0322-5247-aa25aa30cd96@huawei.com>

ping

On 2023/4/23 9:30, Li Xiao Keng wrote:
> ping
> 
> On 2023/4/17 22:01, Li Xiao Keng wrote:
>> When we add a new disk to a raid, it may return -EBUSY.
>>
>> The main process of --add:
>> 1. dev_open
>> 2. store_super1(st, di->fd) in write_init_super1
>> 3. fsync(di->fd) in write_init_super1
>> 4. close(di->fd)
>> 5. ioctl(ADD_NEW_DISK)
>>
>> However, a udev (change) event is generated after step 4. Then
>> "/usr/sbin/mdadm --incremental ..." is run, and the new disk
>> is added to the md device. After that, the ioctl returns -EBUSY.
>>
>> Here we add map_lock before write_init_super in "mdadm --add"
>> to fix this race.
>>
>> Signed-off-by: Li Xiao Keng <lixiaokeng@huawei.com>
>> Signed-off-by: Guanqin Miao <miaoguanqin@huawei.com>
>> ---
>>  Assemble.c |  5 ++++-
>>  Manage.c   | 25 +++++++++++++++++--------
>>  2 files changed, 21 insertions(+), 9 deletions(-)
>>
>> diff --git a/Assemble.c b/Assemble.c
>> index 49804941..086890ed 100644
>> --- a/Assemble.c
>> +++ b/Assemble.c
>> @@ -1479,8 +1479,11 @@ try_again:
>>  	 * to our list.  We flag them so that we don't try to re-add,
>>  	 * but can remove if they turn out to not be wanted.
>>  	 */
>> -	if (map_lock(&map))
>> +	if (map_lock(&map)) {
>>  		pr_err("failed to get exclusive lock on mapfile - continue anyway...\n");
>> +		return 1;
>> +	}
>> +
>>  	if (c->update == UOPT_UUID)
>>  		mp = NULL;
>>  	else
>> diff --git a/Manage.c b/Manage.c
>> index f54de7c6..6a101bae 100644
>> --- a/Manage.c
>> +++ b/Manage.c
>> @@ -703,6 +703,7 @@ int Manage_add(int fd, int tfd, struct mddev_dev *dv,
>>  	struct supertype *dev_st;
>>  	int j;
>>  	mdu_disk_info_t disc;
>> +	struct map_ent *map = NULL;
>>  
>>  	if (!get_dev_size(tfd, dv->devname, &ldsize)) {
>>  		if (dv->disposition == 'M')
>> @@ -900,6 +901,10 @@ int Manage_add(int fd, int tfd, struct mddev_dev *dv,
>>  		disc.raid_disk = 0;
>>  	}
>>  
>> +	if (map_lock(&map)) {
>> +		pr_err("failed to get exclusive lock on mapfile when add disk\n");
>> +		return -1;
>> +	}
>>  	if (array->not_persistent==0) {
>>  		int dfd;
>>  		if (dv->disposition == 'j')
>> @@ -911,9 +916,9 @@ int Manage_add(int fd, int tfd, struct mddev_dev *dv,
>>  		dfd = dev_open(dv->devname, O_RDWR | O_EXCL|O_DIRECT);
>>  		if (tst->ss->add_to_super(tst, &disc, dfd,
>>  					  dv->devname, INVALID_SECTORS))
>> -			return -1;
>> +			goto unlock;
>>  		if (tst->ss->write_init_super(tst))
>> -			return -1;
>> +			goto unlock;
>>  	} else if (dv->disposition == 'A') {
>>  		/*  this had better be raid1.
>>  		 * As we are "--re-add"ing we must find a spare slot
>> @@ -971,14 +976,14 @@ int Manage_add(int fd, int tfd, struct mddev_dev *dv,
>>  			pr_err("add failed for %s: could not get exclusive access to container\n",
>>  			       dv->devname);
>>  			tst->ss->free_super(tst);
>> -			return -1;
>> +			goto unlock;
>>  		}
>>  
>>  		/* Check if metadata handler is able to accept the drive */
>>  		if (!tst->ss->validate_geometry(tst, LEVEL_CONTAINER, 0, 1, NULL,
>>  		    0, 0, dv->devname, NULL, 0, 1)) {
>>  			close(container_fd);
>> -			return -1;
>> +			goto unlock;
>>  		}
>>  
>>  		Kill(dv->devname, NULL, 0, -1, 0);
>> @@ -987,7 +992,7 @@ int Manage_add(int fd, int tfd, struct mddev_dev *dv,
>>  					  dv->devname, INVALID_SECTORS)) {
>>  			close(dfd);
>>  			close(container_fd);
>> -			return -1;
>> +			goto unlock;
>>  		}
>>  		if (!mdmon_running(tst->container_devnm))
>>  			tst->ss->sync_metadata(tst);
>> @@ -998,7 +1003,7 @@ int Manage_add(int fd, int tfd, struct mddev_dev *dv,
>>  			       dv->devname);
>>  			close(container_fd);
>>  			tst->ss->free_super(tst);
>> -			return -1;
>> +			goto unlock;
>>  		}
>>  		sra->array.level = LEVEL_CONTAINER;
>>  		/* Need to set data_offset and component_size */
>> @@ -1013,7 +1018,7 @@ int Manage_add(int fd, int tfd, struct mddev_dev *dv,
>>  			pr_err("add new device to external metadata failed for %s\n", dv->devname);
>>  			close(container_fd);
>>  			sysfs_free(sra);
>> -			return -1;
>> +			goto unlock;
>>  		}
>>  		ping_monitor(devnm);
>>  		sysfs_free(sra);
>> @@ -1027,7 +1032,7 @@ int Manage_add(int fd, int tfd, struct mddev_dev *dv,
>>  			else
>>  				pr_err("add new device failed for %s as %d: %s\n",
>>  				       dv->devname, j, strerror(errno));
>> -			return -1;
>> +			goto unlock;
>>  		}
>>  		if (dv->disposition == 'j') {
>>  			pr_err("Journal added successfully, making %s read-write\n", devname);
>> @@ -1038,7 +1043,11 @@ int Manage_add(int fd, int tfd, struct mddev_dev *dv,
>>  	}
>>  	if (verbose >= 0)
>>  		pr_err("added %s\n", dv->devname);
>> +	map_unlock(&map);
>>  	return 1;
>> +unlock:
>> +	map_unlock(&map);
>> +	return -1;
>>  }
>>  
>>  int Manage_remove(struct supertype *tst, int fd, struct mddev_dev *dv,
>>
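
To illustrate the idea outside of mdadm: the fix amounts to doing the
superblock write and the ADD_NEW_DISK step under one exclusive lock, so
that a second actor that takes the same lock cannot slip in between. The
sketch below shows that pattern with flock(1) as a stand-in. It is not
mdadm code; the lock file path and the helper names are assumptions for
illustration, and it presumes (as the patch description implies) that the
--incremental path takes the same lock.

```shell
#!/bin/sh
# Hypothetical lock file for this demo; not the path mdadm really uses.
LOCKFILE="${LOCKFILE:-/tmp/md-map-demo.lock}"

# Run a command while holding an exclusive lock on LOCKFILE,
# waiting for the lock if someone else holds it.
with_map_lock() {
    (
        flock -x 9 || exit 1
        "$@"
    ) 9>"$LOCKFILE"
}

# Same, but give up immediately (exit status 1) if the lock is taken.
try_map_lock() {
    (
        flock -n -x 9 || exit 1
        "$@"
    ) 9>"$LOCKFILE"
}
```

While one caller is inside with_map_lock, a concurrent try_map_lock fails
instead of interleaving with it, which is the behaviour the patch wants
between "mdadm --add" and "mdadm --incremental".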


* Re: [PATCH] Fix race of "mdadm --add" and "mdadm --incremental"
From: Coly Li @ 2023-05-08  1:40 UTC
  To: Li Xiao Keng; +Cc: jes, mwilck, pmenzel, linux-raid, miaoguanqin, louhongxiang
In-Reply-To: <8c6579f8-4510-a3f9-6161-0ea1ee34fec6@huawei.com>



> On 2023-05-08 09:30, Li Xiao Keng <lixiaokeng@huawei.com> wrote:
> 
> ping
> 
> On 2023/4/23 9:30, Li Xiao Keng wrote:
>> ping
>> 
>> On 2023/4/17 22:01, Li Xiao Keng wrote:
>>> When we add a new disk to a raid, it may return -EBUSY.
>>> 
>>> The main process of --add:
>>> 1. dev_open
>>> 2. store_super1(st, di->fd) in write_init_super1
>>> 3. fsync(di->fd) in write_init_super1
>>> 4. close(di->fd)
>>> 5. ioctl(ADD_NEW_DISK)
>>> 
>>> However, a udev (change) event is generated after step 4. Then
>>> "/usr/sbin/mdadm --incremental ..." is run, and the new disk
>>> is added to the md device. After that, the ioctl returns -EBUSY.
>>> 

Hi Xiao Keng,

The above description of the race is not informative enough. I am not aware of exactly where the race comes from, therefore I am not able to respond to the fix.

Coly Li




>>> Here we add map_lock before write_init_super in "mdadm --add"
>>> to fix this race.
>>> 
>>> Signed-off-by: Li Xiao Keng <lixiaokeng@huawei.com>
>>> Signed-off-by: Guanqin Miao <miaoguanqin@huawei.com>
>>> ---
>>> Assemble.c |  5 ++++-
>>> Manage.c   | 25 +++++++++++++++++--------
>>> 2 files changed, 21 insertions(+), 9 deletions(-)
>>> 
>>> diff --git a/Assemble.c b/Assemble.c
>>> index 49804941..086890ed 100644
>>> --- a/Assemble.c
>>> +++ b/Assemble.c
>>> @@ -1479,8 +1479,11 @@ try_again:
>>> * to our list.  We flag them so that we don't try to re-add,
>>> * but can remove if they turn out to not be wanted.
>>> */
>>> - if (map_lock(&map))
>>> + if (map_lock(&map)) {
>>> pr_err("failed to get exclusive lock on mapfile - continue anyway...\n");
>>> + return 1;
>>> + }
>>> +
>>> if (c->update == UOPT_UUID)
>>> mp = NULL;
>>> else
>>> diff --git a/Manage.c b/Manage.c
>>> index f54de7c6..6a101bae 100644
>>> --- a/Manage.c
>>> +++ b/Manage.c
>>> @@ -703,6 +703,7 @@ int Manage_add(int fd, int tfd, struct mddev_dev *dv,
>>> struct supertype *dev_st;
>>> int j;
>>> mdu_disk_info_t disc;
>>> + struct map_ent *map = NULL;
>>> 
>>> if (!get_dev_size(tfd, dv->devname, &ldsize)) {
>>> if (dv->disposition == 'M')
>>> @@ -900,6 +901,10 @@ int Manage_add(int fd, int tfd, struct mddev_dev *dv,
>>> disc.raid_disk = 0;
>>> }
>>> 
>>> + if (map_lock(&map)) {
>>> + pr_err("failed to get exclusive lock on mapfile when add disk\n");
>>> + return -1;
>>> + }
>>> if (array->not_persistent==0) {
>>> int dfd;
>>> if (dv->disposition == 'j')
>>> @@ -911,9 +916,9 @@ int Manage_add(int fd, int tfd, struct mddev_dev *dv,
>>> dfd = dev_open(dv->devname, O_RDWR | O_EXCL|O_DIRECT);
>>> if (tst->ss->add_to_super(tst, &disc, dfd,
>>>  dv->devname, INVALID_SECTORS))
>>> - return -1;
>>> + goto unlock;
>>> if (tst->ss->write_init_super(tst))
>>> - return -1;
>>> + goto unlock;
>>> } else if (dv->disposition == 'A') {
>>> /*  this had better be raid1.
>>> * As we are "--re-add"ing we must find a spare slot
>>> @@ -971,14 +976,14 @@ int Manage_add(int fd, int tfd, struct mddev_dev *dv,
>>> pr_err("add failed for %s: could not get exclusive access to container\n",
>>>       dv->devname);
>>> tst->ss->free_super(tst);
>>> - return -1;
>>> + goto unlock;
>>> }
>>> 
>>> /* Check if metadata handler is able to accept the drive */
>>> if (!tst->ss->validate_geometry(tst, LEVEL_CONTAINER, 0, 1, NULL,
>>>    0, 0, dv->devname, NULL, 0, 1)) {
>>> close(container_fd);
>>> - return -1;
>>> + goto unlock;
>>> }
>>> 
>>> Kill(dv->devname, NULL, 0, -1, 0);
>>> @@ -987,7 +992,7 @@ int Manage_add(int fd, int tfd, struct mddev_dev *dv,
>>>  dv->devname, INVALID_SECTORS)) {
>>> close(dfd);
>>> close(container_fd);
>>> - return -1;
>>> + goto unlock;
>>> }
>>> if (!mdmon_running(tst->container_devnm))
>>> tst->ss->sync_metadata(tst);
>>> @@ -998,7 +1003,7 @@ int Manage_add(int fd, int tfd, struct mddev_dev *dv,
>>>       dv->devname);
>>> close(container_fd);
>>> tst->ss->free_super(tst);
>>> - return -1;
>>> + goto unlock;
>>> }
>>> sra->array.level = LEVEL_CONTAINER;
>>> /* Need to set data_offset and component_size */
>>> @@ -1013,7 +1018,7 @@ int Manage_add(int fd, int tfd, struct mddev_dev *dv,
>>> pr_err("add new device to external metadata failed for %s\n", dv->devname);
>>> close(container_fd);
>>> sysfs_free(sra);
>>> - return -1;
>>> + goto unlock;
>>> }
>>> ping_monitor(devnm);
>>> sysfs_free(sra);
>>> @@ -1027,7 +1032,7 @@ int Manage_add(int fd, int tfd, struct mddev_dev *dv,
>>> else
>>> pr_err("add new device failed for %s as %d: %s\n",
>>>       dv->devname, j, strerror(errno));
>>> - return -1;
>>> + goto unlock;
>>> }
>>> if (dv->disposition == 'j') {
>>> pr_err("Journal added successfully, making %s read-write\n", devname);
>>> @@ -1038,7 +1043,11 @@ int Manage_add(int fd, int tfd, struct mddev_dev *dv,
>>> }
>>> if (verbose >= 0)
>>> pr_err("added %s\n", dv->devname);
>>> + map_unlock(&map);
>>> return 1;
>>> +unlock:
>>> + map_unlock(&map);
>>> + return -1;
>>> }
>>> 
>>> int Manage_remove(struct supertype *tst, int fd, struct mddev_dev *dv,
>>> 



* Re: mdadm grow raid 5 to 6 failure (crash)
From: David Gilmour @ 2023-05-08  5:57 UTC
  To: Yu Kuai; +Cc: linux-raid, Song Liu, yukuai (C)
In-Reply-To: <04036a22-c0b0-8ca1-0220-a531c47a1e25@huaweicloud.com>


I'm not sure what I'm looking for here but here is the output of the
inflight file immediately after the mdadm assemble hangs. Does this
indicate something accessing the array?

#cat /sys/block/md127/inflight
       1        0

Also attached is an strace of my mdadm command that hung in case that
reveals something relevant:
strace mdadm --assemble --verbose
--backup-file=/root/mdadm5-6_backup_md127 --invalid-backup /dev/md127
/dev/sda /dev/sdh /dev/sdg /dev/sdc /dev/sde /dev/sdf --force 2>&1 |
tee mdadm_strace_output.txt

On Sun, May 7, 2023 at 7:23 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> Hi,
>
> On 2023/05/06 21:19, David Gilmour wrote:
> > From what I can tell it does look very similar. I stopped the
> > systemd-udevd service and renamed it to systemd-udevd.bak. My system
> > still hung on the assemble command. I'm not savvy enough to decode the
> > details here but does the "mddev_suspend.part.0+0xdf/0x150" line in
> > the process stack output suggest the same i/o block the other post
> > indicates?
> >
> > × systemd-udevd.service - Rule-based Manager for Device Events and Files
> >       Loaded: loaded (/usr/lib/systemd/system/systemd-udevd.service; static)
> >       Active: failed (Result: exit-code) since Sat 2023-05-06 06:59:11
> > MDT; 1min 27s ago
> >     Duration: 1d 20h 16min 29.633s
> > TriggeredBy: × systemd-udevd-kernel.socket
> >               × systemd-udevd-control.socket
> >         Docs: man:systemd-udevd.service(8)
> >               man:udev(7)
> >      Process: 27440 ExecStart=/usr/lib/systemd/systemd-udevd
> > (code=exited, status=203/EXEC)
> >     Main PID: 27440 (code=exited, status=203/EXEC)
> >          CPU: 5ms
> >
> > ----------------------
> > #mdadm --assemble --verbose --backup-file=/root/mdadm5-6_backup_md127
> > --invalid-backup /dev/md127 /dev/sda /dev/sdh /dev/sdg /dev/sdc
> > /dev/sdb /dev/sdf --force
> > mdadm: looking for devices for /dev/md127
> > mdadm: /dev/sda is identified as a member of /dev/md127, slot 0.
> > mdadm: /dev/sdh is identified as a member of /dev/md127, slot 1.
> > mdadm: /dev/sdg is identified as a member of /dev/md127, slot 2.
> > mdadm: /dev/sdc is identified as a member of /dev/md127, slot 3.
> > mdadm: /dev/sdb is identified as a member of /dev/md127, slot 4.
> > mdadm: /dev/sdf is identified as a member of /dev/md127, slot 5.
> > mdadm: /dev/md127 has an active reshape - checking if critical section
> > needs to be restored
> > mdadm: No backup metadata on /root/mdadm5-6_backup_md127
> > mdadm: Failed to find backup of critical section
> > mdadm: continuing without restoring backup
> > mdadm: added /dev/sdh to /dev/md127 as 1
> > mdadm: added /dev/sdg to /dev/md127 as 2
> > mdadm: added /dev/sdc to /dev/md127 as 3
> > mdadm: added /dev/sdb to /dev/md127 as 4
> > mdadm: added /dev/sdf to /dev/md127 as 5 (possibly out of date)
> > mdadm: added /dev/sda to /dev/md127 as 0
> >
> > #hangs indefinitely at this point in the output
> >
> > ------------------------------------------
> >
> >
> > root       27454  0.0  0.0   3812  2656 pts/1    D+   07:00   0:00
> > mdadm --assemble --verbose --backup-file=/root/mdadm5-6_backup_md127
> > --invalid-backup /dev/md127 /dev/sda /dev/sdh /dev/sdg /dev/sdc
> > /dev/sdb /dev/sdf --force
> > root       27457  0.0  0.0      0     0 ?        S    07:00   0:00 [md127_raid6]
> >
> > #cat /proc/27454/stack
> > [<0>] mddev_suspend.part.0+0xdf/0x150
> > [<0>] suspend_lo_store+0xc5/0xf0
> > [<0>] md_attr_store+0x83/0xf0
> > [<0>] kernfs_fop_write_iter+0x124/0x1b0
> > [<0>] new_sync_write+0xff/0x190
> > [<0>] vfs_write+0x1ef/0x280
> > [<0>] ksys_write+0x5f/0xe0
> > [<0>] do_syscall_64+0x5c/0x90
> > [<0>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
> >
> > #cat /proc/27457/stack
> > [<0>] md_thread+0x122/0x160
> > [<0>] kthread+0xe0/0x100
> > [<0>] ret_from_fork+0x22/0x30
> >
>
> Is there any thread stuck at raid5_make_request? something like below:
>
> Apr 23 19:17:22 atom kernel: task:systemd-udevd   state:D stack:    0
> pid: 8121 ppid:   706 flags:0x00000006
> Apr 23 19:17:22 atom kernel: Call Trace:
> Apr 23 19:17:22 atom kernel:  <TASK>
> Apr 23 19:17:22 atom kernel:  __schedule+0x20a/0x550
> Apr 23 19:17:22 atom kernel:  schedule+0x5a/0xc0
> Apr 23 19:17:22 atom kernel:  schedule_timeout+0x11f/0x160
> Apr 23 19:17:22 atom kernel:  ? make_stripe_request+0x284/0x490 [raid456]
> Apr 23 19:17:22 atom kernel:  wait_woken+0x50/0x70
> Apr 23 19:17:22 atom kernel:  raid5_make_request+0x2cb/0x3e0 [raid456]
> Apr 23 19:17:22 atom kernel:  ? sched_show_numa+0xf0/0xf0
> Apr 23 19:17:22 atom kernel:  md_handle_request+0x132/0x1e0
> Apr 23 19:17:22 atom kernel:  ? do_mpage_readpage+0x282/0x6b0
> Apr 23 19:17:22 atom kernel:  __submit_bio+0x86/0x130
> Apr 23 19:17:22 atom kernel:  __submit_bio_noacct+0x81/0x1f0
> Apr 23 19:17:22 atom kernel:  mpage_readahead+0x15c/0x1d0
> Apr 23 19:17:22 atom kernel:  ? blkdev_write_begin+0x20/0x20
> Apr 23 19:17:22 atom kernel:  read_pages+0x58/0x2f0
> Apr 23 19:17:22 atom kernel:  page_cache_ra_unbounded+0x137/0x180
> Apr 23 19:17:22 atom kernel:  force_page_cache_ra+0xc5/0xf0
> Apr 23 19:17:22 atom kernel:  filemap_get_pages+0xe4/0x350
> Apr 23 19:17:22 atom kernel:  filemap_read+0xbe/0x3c0
> Apr 23 19:17:22 atom kernel:  ? make_kgid+0x13/0x20
> Apr 23 19:17:22 atom kernel:  ? deactivate_locked_super+0x90/0xa0
> Apr 23 19:17:22 atom kernel:  blkdev_read_iter+0xaf/0x170
> Apr 23 19:17:22 atom kernel:  new_sync_read+0xf9/0x180
> Apr 23 19:17:22 atom kernel:  vfs_read+0x13c/0x190
> Apr 23 19:17:22 atom kernel:  ksys_read+0x5f/0xe0
> Apr 23 19:17:22 atom kernel:  do_syscall_64+0x59/0x90
>
> By the way, cat /sys/block/mdxx/inflight can prove this as well.
>
> If this is the case, can you find out who is accessing the array?
>
> Thanks,
> Kuai
>

[-- Attachment #2: mdadm_strace_output.zip --]
[-- Type: application/x-zip-compressed, Size: 24214 bytes --]


* Re: mdadm grow raid 5 to 6 failure (crash)
From: Yu Kuai @ 2023-05-08  7:08 UTC
  To: David Gilmour, Yu Kuai; +Cc: linux-raid, Song Liu, yukuai (C)
In-Reply-To: <CAO2ABioUC9Wy=7FaPAP+AUmd5S-Xanj2d9JJYkqU4BL8DxW5Bg@mail.gmail.com>

Hi,

On 2023/05/08 13:54, David Gilmour wrote:
> I'm not sure what I'm looking for here but here is the output of the
> inflight file immediately after the mdadm assemble hangs. Does this
> indicate something accessing the array?
> 
> #cat /sys/block/md127/inflight
>         1        0
> 

Yes, something is accessing the array. Have you tried grepping for all
tasks that are in "D" state?

ps -elf | grep " D "

Is there any task stuck in raid5_make_request?

cat /proc/$pid/stack
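
The two checks above can be combined into one pass. What follows is only
a rough sketch, not an mdadm feature: is_d_state, stack_has_symbol and
report_blocked are hypothetical helper names, and it assumes a procps
style ps that accepts -o pid=,stat=,comm= and that it is run as root,
since /proc/<pid>/stack is normally readable only by root.

```shell
# is_d_state STAT: succeeds if a ps STAT value means uninterruptible
# sleep. The value may carry suffixes, e.g. "D+" for a foreground task.
is_d_state() {
    case $1 in
        D*) return 0 ;;
        *)  return 1 ;;
    esac
}

# stack_has_symbol FILE SYMBOL: succeeds if SYMBOL appears in the stack
# dump stored in FILE (for a live task, /proc/<pid>/stack).
stack_has_symbol() {
    grep -q -- "$2" "$1" 2>/dev/null
}

# report_blocked [SYMBOL]: print every D-state task and say whether its
# kernel stack mentions SYMBOL (default: raid5_make_request).
report_blocked() {
    symbol=${1:-raid5_make_request}
    ps -eo pid=,stat=,comm= | while read -r pid stat comm; do
        is_d_state "$stat" || continue
        if stack_has_symbol "/proc/$pid/stack" "$symbol"; then
            echo "$pid $comm: stack contains $symbol"
        else
            echo "$pid $comm: D state, $symbol not seen (or stack unreadable)"
        fi
    done
}
```

Running report_blocked with no argument right after the assemble hangs
should list the same tasks that ps -elf | grep " D " shows.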

> Also attached is an strace of my mdadm command that hung in case that
> reveals something relevant:
> strace mdadm --assemble --verbose
> --backup-file=/root/mdadm5-6_backup_md127 --invalid-backup /dev/md127
> /dev/sda /dev/sdh /dev/sdg /dev/sdc /dev/sde /dev/sdf --force 2>&1 |
> tee mdadm_strace_output.txt

I don't think this will be helpful; mdadm is unlikely to be the task
that is accessing the array.

Thanks,
Kuai
> 
> 
> 
> 
> 
> On Sun, May 7, 2023 at 7:23 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>>
>> Hi,
>>
>> On 2023/05/06 21:19, David Gilmour wrote:
>>> From what I can tell it does look very similar. I stopped the
>>> systemd-udevd service and renamed it to systemd-udevd.bak. My system
>>> still hung on the assemble command. I'm not savvy enough to decode the
>>> details here but does the "mddev_suspend.part.0+0xdf/0x150" line in
>>> the process stack output suggest the same i/o block the other post
>>> indicates?
>>>
>>> × systemd-udevd.service - Rule-based Manager for Device Events and Files
>>>        Loaded: loaded (/usr/lib/systemd/system/systemd-udevd.service; static)
>>>        Active: failed (Result: exit-code) since Sat 2023-05-06 06:59:11
>>> MDT; 1min 27s ago
>>>      Duration: 1d 20h 16min 29.633s
>>> TriggeredBy: × systemd-udevd-kernel.socket
>>>                × systemd-udevd-control.socket
>>>          Docs: man:systemd-udevd.service(8)
>>>                man:udev(7)
>>>       Process: 27440 ExecStart=/usr/lib/systemd/systemd-udevd
>>> (code=exited, status=203/EXEC)
>>>      Main PID: 27440 (code=exited, status=203/EXEC)
>>>           CPU: 5ms
>>>
>>> ----------------------
>>> #mdadm --assemble --verbose --backup-file=/root/mdadm5-6_backup_md127
>>> --invalid-backup /dev/md127 /dev/sda /dev/sdh /dev/sdg /dev/sdc
>>> /dev/sdb /dev/sdf --force
>>> mdadm: looking for devices for /dev/md127
>>> mdadm: /dev/sda is identified as a member of /dev/md127, slot 0.
>>> mdadm: /dev/sdh is identified as a member of /dev/md127, slot 1.
>>> mdadm: /dev/sdg is identified as a member of /dev/md127, slot 2.
>>> mdadm: /dev/sdc is identified as a member of /dev/md127, slot 3.
>>> mdadm: /dev/sdb is identified as a member of /dev/md127, slot 4.
>>> mdadm: /dev/sdf is identified as a member of /dev/md127, slot 5.
>>> mdadm: /dev/md127 has an active reshape - checking if critical section
>>> needs to be restored
>>> mdadm: No backup metadata on /root/mdadm5-6_backup_md127
>>> mdadm: Failed to find backup of critical section
>>> mdadm: continuing without restoring backup
>>> mdadm: added /dev/sdh to /dev/md127 as 1
>>> mdadm: added /dev/sdg to /dev/md127 as 2
>>> mdadm: added /dev/sdc to /dev/md127 as 3
>>> mdadm: added /dev/sdb to /dev/md127 as 4
>>> mdadm: added /dev/sdf to /dev/md127 as 5 (possibly out of date)
>>> mdadm: added /dev/sda to /dev/md127 as 0
>>>
>>> #hangs indefinitely at this point in the output
>>>
>>> ------------------------------------------
>>>
>>>
>>> root       27454  0.0  0.0   3812  2656 pts/1    D+   07:00   0:00
>>> mdadm --assemble --verbose --backup-file=/root/mdadm5-6_backup_md127
>>> --invalid-backup /dev/md127 /dev/sda /dev/sdh /dev/sdg /dev/sdc
>>> /dev/sdb /dev/sdf --force
>>> root       27457  0.0  0.0      0     0 ?        S    07:00   0:00 [md127_raid6]
>>>
>>> #cat /proc/27454/stack
>>> [<0>] mddev_suspend.part.0+0xdf/0x150
>>> [<0>] suspend_lo_store+0xc5/0xf0
>>> [<0>] md_attr_store+0x83/0xf0
>>> [<0>] kernfs_fop_write_iter+0x124/0x1b0
>>> [<0>] new_sync_write+0xff/0x190
>>> [<0>] vfs_write+0x1ef/0x280
>>> [<0>] ksys_write+0x5f/0xe0
>>> [<0>] do_syscall_64+0x5c/0x90
>>> [<0>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
>>>
>>> #cat /proc/27457/stack
>>> [<0>] md_thread+0x122/0x160
>>> [<0>] kthread+0xe0/0x100
>>> [<0>] ret_from_fork+0x22/0x30
>>>
>>
>> Is there any thread stuck at raid5_make_request? something like below:
>>
>> Apr 23 19:17:22 atom kernel: task:systemd-udevd   state:D stack:    0
>> pid: 8121 ppid:   706 flags:0x00000006
>> Apr 23 19:17:22 atom kernel: Call Trace:
>> Apr 23 19:17:22 atom kernel:  <TASK>
>> Apr 23 19:17:22 atom kernel:  __schedule+0x20a/0x550
>> Apr 23 19:17:22 atom kernel:  schedule+0x5a/0xc0
>> Apr 23 19:17:22 atom kernel:  schedule_timeout+0x11f/0x160
>> Apr 23 19:17:22 atom kernel:  ? make_stripe_request+0x284/0x490 [raid456]
>> Apr 23 19:17:22 atom kernel:  wait_woken+0x50/0x70
>> Apr 23 19:17:22 atom kernel:  raid5_make_request+0x2cb/0x3e0 [raid456]
>> Apr 23 19:17:22 atom kernel:  ? sched_show_numa+0xf0/0xf0
>> Apr 23 19:17:22 atom kernel:  md_handle_request+0x132/0x1e0
>> Apr 23 19:17:22 atom kernel:  ? do_mpage_readpage+0x282/0x6b0
>> Apr 23 19:17:22 atom kernel:  __submit_bio+0x86/0x130
>> Apr 23 19:17:22 atom kernel:  __submit_bio_noacct+0x81/0x1f0
>> Apr 23 19:17:22 atom kernel:  mpage_readahead+0x15c/0x1d0
>> Apr 23 19:17:22 atom kernel:  ? blkdev_write_begin+0x20/0x20
>> Apr 23 19:17:22 atom kernel:  read_pages+0x58/0x2f0
>> Apr 23 19:17:22 atom kernel:  page_cache_ra_unbounded+0x137/0x180
>> Apr 23 19:17:22 atom kernel:  force_page_cache_ra+0xc5/0xf0
>> Apr 23 19:17:22 atom kernel:  filemap_get_pages+0xe4/0x350
>> Apr 23 19:17:22 atom kernel:  filemap_read+0xbe/0x3c0
>> Apr 23 19:17:22 atom kernel:  ? make_kgid+0x13/0x20
>> Apr 23 19:17:22 atom kernel:  ? deactivate_locked_super+0x90/0xa0
>> Apr 23 19:17:22 atom kernel:  blkdev_read_iter+0xaf/0x170
>> Apr 23 19:17:22 atom kernel:  new_sync_read+0xf9/0x180
>> Apr 23 19:17:22 atom kernel:  vfs_read+0x13c/0x190
>> Apr 23 19:17:22 atom kernel:  ksys_read+0x5f/0xe0
>> Apr 23 19:17:22 atom kernel:  do_syscall_64+0x59/0x90
>>
>> By the way, cat /sys/block/mdxx/inflight can prove this as well.
>>
>> If this is the case, can you find out who is accessing the array?
>>
>> Thanks,
>> Kuai
>>


^ permalink raw reply

* Re: mdadm grow raid 5 to 6 failure (crash)
From: David Gilmour @ 2023-05-08  7:43 UTC (permalink / raw)
  To: Yu Kuai; +Cc: linux-raid, Song Liu, yukuai (C)
In-Reply-To: <b1252ee9-4309-a1a9-d2c4-3e278a3e70b6@huaweicloud.com>

Two mdadm processes show up:

#ps -elf | grep " D "
1 D root        1251       1  0  80   0 -   936 wait_w 00:28 ?
00:00:00 /sbin/mdadm --monitor --scan --syslog -f
--pid-file=/run/mdadm/mdadm.pid
4 D root        4130    4091  0  80   0 -   953 mddev_ 01:33 pts/1
00:00:00 mdadm --assemble --verbose
--backup-file=/root/mdadm5-6_backup_md127 --invalid-backup /dev/md127
/dev/sda /dev/sdh /dev/sdg /dev/sdc /dev/sde /dev/sdf --force

Process 1251 (mdadm --monitor) has raid5_make_request in its stack:

#cat /proc/1251/stack
[<0>] wait_woken+0x50/0x70
[<0>] raid5_make_request+0x2cb/0x3e0 [raid456]
[<0>] md_handle_request+0x135/0x1e0
[<0>] __submit_bio+0x89/0x130
[<0>] __submit_bio_noacct+0x81/0x1f0
[<0>] submit_bh_wbc+0x11e/0x140
[<0>] block_read_full_folio+0x1d5/0x290
[<0>] filemap_read_folio+0x43/0x300
[<0>] do_read_cache_folio+0x112/0x3f0
[<0>] read_cache_page+0x15/0x90
[<0>] read_part_sector+0x3a/0x160
[<0>] read_lba+0xff/0x260
[<0>] is_gpt_valid.part.0+0x66/0x3d0
[<0>] find_valid_gpt.constprop.0+0x20e/0x540
[<0>] efi_partition+0x80/0x390
[<0>] check_partition+0x103/0x1d0
[<0>] bdev_disk_changed.part.0+0xb5/0x200
[<0>] blkdev_get_whole+0x7a/0x90
[<0>] blkdev_get_by_dev.part.0+0x13b/0x300
[<0>] blkdev_open+0x4c/0x90
[<0>] do_dentry_open+0x14f/0x380
[<0>] do_open+0x21a/0x3d0
[<0>] path_openat+0x10f/0x2b0
[<0>] do_filp_open+0xb2/0x160
[<0>] do_sys_openat2+0x9a/0x160
[<0>] __x64_sys_openat+0x53/0xa0
[<0>] do_syscall_64+0x5c/0x90
[<0>] entry_SYSCALL_64_after_hwframe+0x63/0xcd

#cat /proc/4130/stack
[<0>] mddev_suspend.part.0+0xdf/0x150
[<0>] suspend_lo_store+0xc5/0xf0
[<0>] md_attr_store+0x83/0xf0
[<0>] kernfs_fop_write_iter+0x124/0x1b0
[<0>] new_sync_write+0xff/0x190
[<0>] vfs_write+0x1ef/0x280
[<0>] ksys_write+0x5f/0xe0
[<0>] do_syscall_64+0x5c/0x90
[<0>] entry_SYSCALL_64_after_hwframe+0x63/0xcd


On Mon, May 8, 2023 at 1:08 AM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> Hi,
>
> On 2023/05/08 13:54, David Gilmour wrote:
> > I'm not sure what I'm looking for here but here is the output of the
> > inflight file immediately after the mdadm assemble hangs. Does this
> > indicate something accessing the array?
> >
> > #cat /sys/block/md127/inflight
> >         1        0
> >
>
> Yes, something is accessing the array. Do you try to grep all the task
> that is "D" state?
>
> ps -elf | grep " D "
>
> Is there any task stuck in raid5_make_request?
>
> cat /proc/$pid/stack
>
> > Also attached is an strace of my mdadm command that hung in case that
> > reveals something relevant:
> > strace mdadm --assemble --verbose
> > --backup-file=/root/mdadm5-6_backup_md127 --invalid-backup /dev/md127
> > /dev/sda /dev/sdh /dev/sdg /dev/sdc /dev/sde /dev/sdf --force 2>&1 |
> > tee mdadm_strace_output.txt
>
> I don't think this will be helpful, mdadm is unlikely the task that
> is accessing the array.
>
> Thanks,
> Kuai
> >
> >
> >
> >
> >
> > On Sun, May 7, 2023 at 7:23 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
> >>
> >> Hi,
> >>
> >> On 2023/05/06 21:19, David Gilmour wrote:
> >>> From what I can tell it does look very similar. I stopped the
> >>> systemd-udevd service and renamed it to systemd-udevd.bak. My system
> >>> still hung on the assemble command. I'm not savvy enough to decode the
> >>> details here but does the "mddev_suspend.part.0+0xdf/0x150" line in
> >>> the process stack output suggest the same i/o block the other post
> >>> indicates?
> >>>
> >>> × systemd-udevd.service - Rule-based Manager for Device Events and Files
> >>>        Loaded: loaded (/usr/lib/systemd/system/systemd-udevd.service; static)
> >>>        Active: failed (Result: exit-code) since Sat 2023-05-06 06:59:11
> >>> MDT; 1min 27s ago
> >>>      Duration: 1d 20h 16min 29.633s
> >>> TriggeredBy: × systemd-udevd-kernel.socket
> >>>                × systemd-udevd-control.socket
> >>>          Docs: man:systemd-udevd.service(8)
> >>>                man:udev(7)
> >>>       Process: 27440 ExecStart=/usr/lib/systemd/systemd-udevd
> >>> (code=exited, status=203/EXEC)
> >>>      Main PID: 27440 (code=exited, status=203/EXEC)
> >>>           CPU: 5ms
> >>>
> >>> ----------------------
> >>> #mdadm --assemble --verbose --backup-file=/root/mdadm5-6_backup_md127
> >>> --invalid-backup /dev/md127 /dev/sda /dev/sdh /dev/sdg /dev/sdc
> >>> /dev/sdb /dev/sdf --force
> >>> mdadm: looking for devices for /dev/md127
> >>> mdadm: /dev/sda is identified as a member of /dev/md127, slot 0.
> >>> mdadm: /dev/sdh is identified as a member of /dev/md127, slot 1.
> >>> mdadm: /dev/sdg is identified as a member of /dev/md127, slot 2.
> >>> mdadm: /dev/sdc is identified as a member of /dev/md127, slot 3.
> >>> mdadm: /dev/sdb is identified as a member of /dev/md127, slot 4.
> >>> mdadm: /dev/sdf is identified as a member of /dev/md127, slot 5.
> >>> mdadm: /dev/md127 has an active reshape - checking if critical section
> >>> needs to be restored
> >>> mdadm: No backup metadata on /root/mdadm5-6_backup_md127
> >>> mdadm: Failed to find backup of critical section
> >>> mdadm: continuing without restoring backup
> >>> mdadm: added /dev/sdh to /dev/md127 as 1
> >>> mdadm: added /dev/sdg to /dev/md127 as 2
> >>> mdadm: added /dev/sdc to /dev/md127 as 3
> >>> mdadm: added /dev/sdb to /dev/md127 as 4
> >>> mdadm: added /dev/sdf to /dev/md127 as 5 (possibly out of date)
> >>> mdadm: added /dev/sda to /dev/md127 as 0
> >>>
> >>> #hangs indefinitely at this point in the output
> >>>
> >>> ------------------------------------------
> >>>
> >>>
> >>> root       27454  0.0  0.0   3812  2656 pts/1    D+   07:00   0:00
> >>> mdadm --assemble --verbose --backup-file=/root/mdadm5-6_backup_md127
> >>> --invalid-backup /dev/md127 /dev/sda /dev/sdh /dev/sdg /dev/sdc
> >>> /dev/sdb /dev/sdf --force
> >>> root       27457  0.0  0.0      0     0 ?        S    07:00   0:00 [md127_raid6]
> >>>
> >>> #cat /proc/27454/stack
> >>> [<0>] mddev_suspend.part.0+0xdf/0x150
> >>> [<0>] suspend_lo_store+0xc5/0xf0
> >>> [<0>] md_attr_store+0x83/0xf0
> >>> [<0>] kernfs_fop_write_iter+0x124/0x1b0
> >>> [<0>] new_sync_write+0xff/0x190
> >>> [<0>] vfs_write+0x1ef/0x280
> >>> [<0>] ksys_write+0x5f/0xe0
> >>> [<0>] do_syscall_64+0x5c/0x90
> >>> [<0>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
> >>>
> >>> #cat /proc/27457/stack
> >>> [<0>] md_thread+0x122/0x160
> >>> [<0>] kthread+0xe0/0x100
> >>> [<0>] ret_from_fork+0x22/0x30
> >>>
> >>
> >> Is there any thread stuck at raid5_make_request? something like below:
> >>
> >> Apr 23 19:17:22 atom kernel: task:systemd-udevd   state:D stack:    0
> >> pid: 8121 ppid:   706 flags:0x00000006
> >> Apr 23 19:17:22 atom kernel: Call Trace:
> >> Apr 23 19:17:22 atom kernel:  <TASK>
> >> Apr 23 19:17:22 atom kernel:  __schedule+0x20a/0x550
> >> Apr 23 19:17:22 atom kernel:  schedule+0x5a/0xc0
> >> Apr 23 19:17:22 atom kernel:  schedule_timeout+0x11f/0x160
> >> Apr 23 19:17:22 atom kernel:  ? make_stripe_request+0x284/0x490 [raid456]
> >> Apr 23 19:17:22 atom kernel:  wait_woken+0x50/0x70
> >> Apr 23 19:17:22 atom kernel:  raid5_make_request+0x2cb/0x3e0 [raid456]
> >> Apr 23 19:17:22 atom kernel:  ? sched_show_numa+0xf0/0xf0
> >> Apr 23 19:17:22 atom kernel:  md_handle_request+0x132/0x1e0
> >> Apr 23 19:17:22 atom kernel:  ? do_mpage_readpage+0x282/0x6b0
> >> Apr 23 19:17:22 atom kernel:  __submit_bio+0x86/0x130
> >> Apr 23 19:17:22 atom kernel:  __submit_bio_noacct+0x81/0x1f0
> >> Apr 23 19:17:22 atom kernel:  mpage_readahead+0x15c/0x1d0
> >> Apr 23 19:17:22 atom kernel:  ? blkdev_write_begin+0x20/0x20
> >> Apr 23 19:17:22 atom kernel:  read_pages+0x58/0x2f0
> >> Apr 23 19:17:22 atom kernel:  page_cache_ra_unbounded+0x137/0x180
> >> Apr 23 19:17:22 atom kernel:  force_page_cache_ra+0xc5/0xf0
> >> Apr 23 19:17:22 atom kernel:  filemap_get_pages+0xe4/0x350
> >> Apr 23 19:17:22 atom kernel:  filemap_read+0xbe/0x3c0
> >> Apr 23 19:17:22 atom kernel:  ? make_kgid+0x13/0x20
> >> Apr 23 19:17:22 atom kernel:  ? deactivate_locked_super+0x90/0xa0
> >> Apr 23 19:17:22 atom kernel:  blkdev_read_iter+0xaf/0x170
> >> Apr 23 19:17:22 atom kernel:  new_sync_read+0xf9/0x180
> >> Apr 23 19:17:22 atom kernel:  vfs_read+0x13c/0x190
> >> Apr 23 19:17:22 atom kernel:  ksys_read+0x5f/0xe0
> >> Apr 23 19:17:22 atom kernel:  do_syscall_64+0x59/0x90
> >>
> >> By the way, cat /sys/block/mdxx/inflight can prove this as well.
> >>
> >> If this is the case, can you find out who is accessing the array?
> >>
> >> Thanks,
> >> Kuai
> >>
>

^ permalink raw reply

* Re: mdadm grow raid 5 to 6 failure (crash)
From: Yu Kuai @ 2023-05-08  7:55 UTC (permalink / raw)
  To: David Gilmour, Yu Kuai; +Cc: linux-raid, Song Liu, yukuai (C)
In-Reply-To: <CAO2ABioXHT9c4qPx5S4dKsMZLyE0xLGBzST5tSTu8YPmX4FxYQ@mail.gmail.com>

Hi,

On 2023/05/08 15:43, David Gilmour wrote:
> Two mdadm processes show up:
> 
> #ps -elf | grep " D "
> 1 D root        1251       1  0  80   0 -   936 wait_w 00:28 ?
> 00:00:00 /sbin/mdadm --monitor --scan --syslog -f
> --pid-file=/run/mdadm/mdadm.pid

So this is the one accessing the array. Can you stop it before running
the assemble?
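
As a rough sketch of one way to do that: the monitor in the ps output
above was started with --pid-file=/run/mdadm/mdadm.pid, so it can be
signalled through that file. stop_by_pidfile is a hypothetical helper
name, not part of mdadm. It has to run before the assemble, because a
task that is already in D state will not act on the signal until its
I/O completes. If a service manager supervises the monitor, it may need
to be stopped there instead so that it is not restarted.

```shell
# stop_by_pidfile PIDFILE: send SIGTERM to the PID recorded in PIDFILE
# and wait up to about 5 seconds for the process to disappear.
# Returns 0 if nothing is left to stop, 1 on a missing/empty PID file
# or if the process is still present after the wait.
stop_by_pidfile() {
    pidfile=$1
    pid=$(cat "$pidfile" 2>/dev/null) || return 1
    [ -n "$pid" ] || return 1
    # kill fails if no such process exists (or we lack permission);
    # this sketch treats that as nothing left to stop.
    kill "$pid" 2>/dev/null || return 0
    tries=0
    while kill -0 "$pid" 2>/dev/null; do
        tries=$((tries + 1))
        [ "$tries" -ge 5 ] && return 1
        sleep 1
    done
    return 0
}
```

For example: stop_by_pidfile /run/mdadm/mdadm.pid, then check that
ps -elf | grep " D " is empty before running mdadm --assemble again.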

Thanks,
Kuai
> 4 D root        4130    4091  0  80   0 -   953 mddev_ 01:33 pts/1
> 00:00:00 mdadm --assemble --verbose
> --backup-file=/root/mdadm5-6_backup_md127 --invalid-backup /dev/md127
> /dev/sda /dev/sdh /dev/sdg /dev/sdc /dev/sde /dev/sdf --force
> 
> Process 1251 (mdadm) has a raid5_make_request descriptor in it:
> 
> #cat /proc/1251/stack
> [<0>] wait_woken+0x50/0x70
> [<0>] raid5_make_request+0x2cb/0x3e0 [raid456]
> [<0>] md_handle_request+0x135/0x1e0
> [<0>] __submit_bio+0x89/0x130
> [<0>] __submit_bio_noacct+0x81/0x1f0
> [<0>] submit_bh_wbc+0x11e/0x140
> [<0>] block_read_full_folio+0x1d5/0x290
> [<0>] filemap_read_folio+0x43/0x300
> [<0>] do_read_cache_folio+0x112/0x3f0
> [<0>] read_cache_page+0x15/0x90
> [<0>] read_part_sector+0x3a/0x160
> [<0>] read_lba+0xff/0x260
> [<0>] is_gpt_valid.part.0+0x66/0x3d0
> [<0>] find_valid_gpt.constprop.0+0x20e/0x540
> [<0>] efi_partition+0x80/0x390
> [<0>] check_partition+0x103/0x1d0
> [<0>] bdev_disk_changed.part.0+0xb5/0x200
> [<0>] blkdev_get_whole+0x7a/0x90
> [<0>] blkdev_get_by_dev.part.0+0x13b/0x300
> [<0>] blkdev_open+0x4c/0x90
> [<0>] do_dentry_open+0x14f/0x380
> [<0>] do_open+0x21a/0x3d0
> [<0>] path_openat+0x10f/0x2b0
> [<0>] do_filp_open+0xb2/0x160
> [<0>] do_sys_openat2+0x9a/0x160
> [<0>] __x64_sys_openat+0x53/0xa0
> [<0>] do_syscall_64+0x5c/0x90
> [<0>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
> 
> #cat /proc/4130/stack
> [<0>] mddev_suspend.part.0+0xdf/0x150
> [<0>] suspend_lo_store+0xc5/0xf0
> [<0>] md_attr_store+0x83/0xf0
> [<0>] kernfs_fop_write_iter+0x124/0x1b0
> [<0>] new_sync_write+0xff/0x190
> [<0>] vfs_write+0x1ef/0x280
> [<0>] ksys_write+0x5f/0xe0
> [<0>] do_syscall_64+0x5c/0x90
> [<0>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
> 
> 
> On Mon, May 8, 2023 at 1:08 AM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>>
>> Hi,
>>
>> On 2023/05/08 13:54, David Gilmour wrote:
>>> I'm not sure what I'm looking for here but here is the output of the
>>> inflight file immediately after the mdadm assemble hangs. Does this
>>> indicate something accessing the array?
>>>
>>> #cat /sys/block/md127/inflight
>>>          1        0
>>>
>>
>> Yes, something is accessing the array. Do you try to grep all the task
>> that is "D" state?
>>
>> ps -elf | grep " D "
>>
>> Is there any task stuck in raid5_make_request?
>>
>> cat /proc/$pid/stack
>>
>>> Also attached is an strace of my mdadm command that hung in case that
>>> reveals something relevant:
>>> strace mdadm --assemble --verbose
>>> --backup-file=/root/mdadm5-6_backup_md127 --invalid-backup /dev/md127
>>> /dev/sda /dev/sdh /dev/sdg /dev/sdc /dev/sde /dev/sdf --force 2>&1 |
>>> tee mdadm_strace_output.txt
>>
>> I don't think this will be helpful, mdadm is unlikely the task that
>> is accessing the array.
>>
>> Thanks,
>> Kuai
>>>
>>>
>>>
>>>
>>>
>>> On Sun, May 7, 2023 at 7:23 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> On 2023/05/06 21:19, David Gilmour wrote:
>>>>> From what I can tell it does look very similar. I stopped the
>>>>> systemd-udevd service and renamed it to systemd-udevd.bak. My system
>>>>> still hung on the assemble command. I'm not savvy enough to decode the
>>>>> details here but does the "mddev_suspend.part.0+0xdf/0x150" line in
>>>>> the process stack output suggest the same i/o block the other post
>>>>> indicates?
>>>>>
>>>>> × systemd-udevd.service - Rule-based Manager for Device Events and Files
>>>>>         Loaded: loaded (/usr/lib/systemd/system/systemd-udevd.service; static)
>>>>>         Active: failed (Result: exit-code) since Sat 2023-05-06 06:59:11
>>>>> MDT; 1min 27s ago
>>>>>       Duration: 1d 20h 16min 29.633s
>>>>> TriggeredBy: × systemd-udevd-kernel.socket
>>>>>                 × systemd-udevd-control.socket
>>>>>           Docs: man:systemd-udevd.service(8)
>>>>>                 man:udev(7)
>>>>>        Process: 27440 ExecStart=/usr/lib/systemd/systemd-udevd
>>>>> (code=exited, status=203/EXEC)
>>>>>       Main PID: 27440 (code=exited, status=203/EXEC)
>>>>>            CPU: 5ms
>>>>>
>>>>> ----------------------
>>>>> #mdadm --assemble --verbose --backup-file=/root/mdadm5-6_backup_md127
>>>>> --invalid-backup /dev/md127 /dev/sda /dev/sdh /dev/sdg /dev/sdc
>>>>> /dev/sdb /dev/sdf --force
>>>>> mdadm: looking for devices for /dev/md127
>>>>> mdadm: /dev/sda is identified as a member of /dev/md127, slot 0.
>>>>> mdadm: /dev/sdh is identified as a member of /dev/md127, slot 1.
>>>>> mdadm: /dev/sdg is identified as a member of /dev/md127, slot 2.
>>>>> mdadm: /dev/sdc is identified as a member of /dev/md127, slot 3.
>>>>> mdadm: /dev/sdb is identified as a member of /dev/md127, slot 4.
>>>>> mdadm: /dev/sdf is identified as a member of /dev/md127, slot 5.
>>>>> mdadm: /dev/md127 has an active reshape - checking if critical section
>>>>> needs to be restored
>>>>> mdadm: No backup metadata on /root/mdadm5-6_backup_md127
>>>>> mdadm: Failed to find backup of critical section
>>>>> mdadm: continuing without restoring backup
>>>>> mdadm: added /dev/sdh to /dev/md127 as 1
>>>>> mdadm: added /dev/sdg to /dev/md127 as 2
>>>>> mdadm: added /dev/sdc to /dev/md127 as 3
>>>>> mdadm: added /dev/sdb to /dev/md127 as 4
>>>>> mdadm: added /dev/sdf to /dev/md127 as 5 (possibly out of date)
>>>>> mdadm: added /dev/sda to /dev/md127 as 0
>>>>>
>>>>> #hangs indefinitely at this point in the output
>>>>>
>>>>> ------------------------------------------
>>>>>
>>>>>
>>>>> root       27454  0.0  0.0   3812  2656 pts/1    D+   07:00   0:00
>>>>> mdadm --assemble --verbose --backup-file=/root/mdadm5-6_backup_md127
>>>>> --invalid-backup /dev/md127 /dev/sda /dev/sdh /dev/sdg /dev/sdc
>>>>> /dev/sdb /dev/sdf --force
>>>>> root       27457  0.0  0.0      0     0 ?        S    07:00   0:00 [md127_raid6]
>>>>>
>>>>> #cat /proc/27454/stack
>>>>> [<0>] mddev_suspend.part.0+0xdf/0x150
>>>>> [<0>] suspend_lo_store+0xc5/0xf0
>>>>> [<0>] md_attr_store+0x83/0xf0
>>>>> [<0>] kernfs_fop_write_iter+0x124/0x1b0
>>>>> [<0>] new_sync_write+0xff/0x190
>>>>> [<0>] vfs_write+0x1ef/0x280
>>>>> [<0>] ksys_write+0x5f/0xe0
>>>>> [<0>] do_syscall_64+0x5c/0x90
>>>>> [<0>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
>>>>>
>>>>> #cat /proc/27457/stack
>>>>> [<0>] md_thread+0x122/0x160
>>>>> [<0>] kthread+0xe0/0x100
>>>>> [<0>] ret_from_fork+0x22/0x30
>>>>>
>>>>
>>>> Is there any thread stuck at raid5_make_request? something like below:
>>>>
>>>> Apr 23 19:17:22 atom kernel: task:systemd-udevd   state:D stack:    0
>>>> pid: 8121 ppid:   706 flags:0x00000006
>>>> Apr 23 19:17:22 atom kernel: Call Trace:
>>>> Apr 23 19:17:22 atom kernel:  <TASK>
>>>> Apr 23 19:17:22 atom kernel:  __schedule+0x20a/0x550
>>>> Apr 23 19:17:22 atom kernel:  schedule+0x5a/0xc0
>>>> Apr 23 19:17:22 atom kernel:  schedule_timeout+0x11f/0x160
>>>> Apr 23 19:17:22 atom kernel:  ? make_stripe_request+0x284/0x490 [raid456]
>>>> Apr 23 19:17:22 atom kernel:  wait_woken+0x50/0x70
>>>> Apr 23 19:17:22 atom kernel:  raid5_make_request+0x2cb/0x3e0 [raid456]
>>>> Apr 23 19:17:22 atom kernel:  ? sched_show_numa+0xf0/0xf0
>>>> Apr 23 19:17:22 atom kernel:  md_handle_request+0x132/0x1e0
>>>> Apr 23 19:17:22 atom kernel:  ? do_mpage_readpage+0x282/0x6b0
>>>> Apr 23 19:17:22 atom kernel:  __submit_bio+0x86/0x130
>>>> Apr 23 19:17:22 atom kernel:  __submit_bio_noacct+0x81/0x1f0
>>>> Apr 23 19:17:22 atom kernel:  mpage_readahead+0x15c/0x1d0
>>>> Apr 23 19:17:22 atom kernel:  ? blkdev_write_begin+0x20/0x20
>>>> Apr 23 19:17:22 atom kernel:  read_pages+0x58/0x2f0
>>>> Apr 23 19:17:22 atom kernel:  page_cache_ra_unbounded+0x137/0x180
>>>> Apr 23 19:17:22 atom kernel:  force_page_cache_ra+0xc5/0xf0
>>>> Apr 23 19:17:22 atom kernel:  filemap_get_pages+0xe4/0x350
>>>> Apr 23 19:17:22 atom kernel:  filemap_read+0xbe/0x3c0
>>>> Apr 23 19:17:22 atom kernel:  ? make_kgid+0x13/0x20
>>>> Apr 23 19:17:22 atom kernel:  ? deactivate_locked_super+0x90/0xa0
>>>> Apr 23 19:17:22 atom kernel:  blkdev_read_iter+0xaf/0x170
>>>> Apr 23 19:17:22 atom kernel:  new_sync_read+0xf9/0x180
>>>> Apr 23 19:17:22 atom kernel:  vfs_read+0x13c/0x190
>>>> Apr 23 19:17:22 atom kernel:  ksys_read+0x5f/0xe0
>>>> Apr 23 19:17:22 atom kernel:  do_syscall_64+0x59/0x90
>>>>
>>>> By the way, cat /sys/block/mdxx/inflight can prove this as well.
>>>>
>>>> If this is the case, can you find out who is accessing the array?
>>>>
>>>> Thanks,
>>>> Kuai
>>>>
>>


^ permalink raw reply

* Re: mdadm grow raid 5 to 6 failure (crash)
From: David Gilmour @ 2023-05-08  8:27 UTC (permalink / raw)
  To: Yu Kuai; +Cc: linux-raid, Song Liu, yukuai (C)
In-Reply-To: <51a28406-f850-5f4e-1d2d-87c06df75a9d@huaweicloud.com>

This seems to be kicked off somehow from the mdadm assemble command.
There is no additional mdadm process before I run the assemble command
but there is after:

#ps -elf | grep " D "
0 S root        3993    3953  0  80   0 - 55416 pipe_r 02:18 pts/1
00:00:00 grep --color=auto  D

#mdadm --assemble --verbose --backup-file=/root/mdadm5-6_backup_md127
--invalid-backup /dev/md127 /dev/sda /dev/sdh /dev/sdg /dev/sdc
/dev/sde /dev/sdf --force
mdadm: looking for devices for /dev/md127
mdadm: /dev/sda is identified as a member of /dev/md127, slot 0.
mdadm: /dev/sdh is identified as a member of /dev/md127, slot 1.
mdadm: /dev/sdg is identified as a member of /dev/md127, slot 2.
mdadm: /dev/sdc is identified as a member of /dev/md127, slot 3.
mdadm: /dev/sde is identified as a member of /dev/md127, slot 4.
mdadm: /dev/sdf is identified as a member of /dev/md127, slot 5.
mdadm: /dev/md127 has an active reshape - checking if critical section
needs to be restored
mdadm: No backup metadata on /root/mdadm5-6_backup_md127
mdadm: Failed to find backup of critical section
mdadm: continuing without restoring backup
mdadm: added /dev/sdh to /dev/md127 as 1
mdadm: added /dev/sdg to /dev/md127 as 2
mdadm: added /dev/sdc to /dev/md127 as 3
mdadm: added /dev/sde to /dev/md127 as 4
mdadm: added /dev/sdf to /dev/md127 as 5 (possibly out of date)
mdadm: added /dev/sda to /dev/md127 as 0
<hangs here>

#Then in a separate terminal:
#ps -elf | grep " D "
1 D root        1249       1  0  80   0 -   936 wait_w 02:15 ?
00:00:00 /sbin/mdadm --monitor --scan --syslog -f
--pid-file=/run/mdadm/mdadm.pid
4 D root        4086    4038  0  80   0 -   953 mddev_ 02:22 pts/2
00:00:00 mdadm --assemble --verbose
--backup-file=/root/mdadm5-6_backup_md127 --invalid-backup /dev/md127
/dev/sda /dev/sdh /dev/sdg /dev/sdc /dev/sde /dev/sdf --force
0 S root        4102    3869  0  80   0 - 55449 pipe_r 02:25 pts/0
00:00:00 grep --color=auto  D
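
The before/after comparison above could be scripted roughly as follows.
This is only a sketch: d_state_snapshot and new_entries are hypothetical
helper names, and it assumes a procps style ps plus the standard awk,
sort and comm tools.

```shell
# d_state_snapshot: print one sorted "PID COMMAND" line for every task
# whose ps STAT value starts with D (uninterruptible sleep).
d_state_snapshot() {
    ps -eo pid=,stat=,comm= | awk '$2 ~ /^D/ { print $1, $3 }' | sort
}

# new_entries BEFORE AFTER: print the lines that appear in AFTER but not
# in BEFORE. Both files must be sorted, as d_state_snapshot output is.
new_entries() {
    comm -13 "$1" "$2"
}
```

Usage would be: d_state_snapshot > before.txt, start the assemble in one
terminal, then in another run d_state_snapshot > after.txt followed by
new_entries before.txt after.txt.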



On Mon, May 8, 2023 at 1:56 AM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> Hi,
>
> On 2023/05/08 15:43, David Gilmour wrote:
> > Two mdadm processes show up:
> >
> > #ps -elf | grep " D "
> > 1 D root        1251       1  0  80   0 -   936 wait_w 00:28 ?
> > 00:00:00 /sbin/mdadm --monitor --scan --syslog -f
> > --pid-file=/run/mdadm/mdadm.pid
>
> So this is the one accessing the array, can you stop this before
> assemble?
>
> Thanks,
> Kuai
> > 4 D root        4130    4091  0  80   0 -   953 mddev_ 01:33 pts/1
> > 00:00:00 mdadm --assemble --verbose
> > --backup-file=/root/mdadm5-6_backup_md127 --invalid-backup /dev/md127
> > /dev/sda /dev/sdh /dev/sdg /dev/sdc /dev/sde /dev/sdf --force
> >
> > Process 1251 (mdadm) has a raid5_make_request descriptor in it:
> >
> > #cat /proc/1251/stack
> > [<0>] wait_woken+0x50/0x70
> > [<0>] raid5_make_request+0x2cb/0x3e0 [raid456]
> > [<0>] md_handle_request+0x135/0x1e0
> > [<0>] __submit_bio+0x89/0x130
> > [<0>] __submit_bio_noacct+0x81/0x1f0
> > [<0>] submit_bh_wbc+0x11e/0x140
> > [<0>] block_read_full_folio+0x1d5/0x290
> > [<0>] filemap_read_folio+0x43/0x300
> > [<0>] do_read_cache_folio+0x112/0x3f0
> > [<0>] read_cache_page+0x15/0x90
> > [<0>] read_part_sector+0x3a/0x160
> > [<0>] read_lba+0xff/0x260
> > [<0>] is_gpt_valid.part.0+0x66/0x3d0
> > [<0>] find_valid_gpt.constprop.0+0x20e/0x540
> > [<0>] efi_partition+0x80/0x390
> > [<0>] check_partition+0x103/0x1d0
> > [<0>] bdev_disk_changed.part.0+0xb5/0x200
> > [<0>] blkdev_get_whole+0x7a/0x90
> > [<0>] blkdev_get_by_dev.part.0+0x13b/0x300
> > [<0>] blkdev_open+0x4c/0x90
> > [<0>] do_dentry_open+0x14f/0x380
> > [<0>] do_open+0x21a/0x3d0
> > [<0>] path_openat+0x10f/0x2b0
> > [<0>] do_filp_open+0xb2/0x160
> > [<0>] do_sys_openat2+0x9a/0x160
> > [<0>] __x64_sys_openat+0x53/0xa0
> > [<0>] do_syscall_64+0x5c/0x90
> > [<0>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
> >
> > #cat /proc/4130/stack
> > [<0>] mddev_suspend.part.0+0xdf/0x150
> > [<0>] suspend_lo_store+0xc5/0xf0
> > [<0>] md_attr_store+0x83/0xf0
> > [<0>] kernfs_fop_write_iter+0x124/0x1b0
> > [<0>] new_sync_write+0xff/0x190
> > [<0>] vfs_write+0x1ef/0x280
> > [<0>] ksys_write+0x5f/0xe0
> > [<0>] do_syscall_64+0x5c/0x90
> > [<0>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
> >
> >
> > On Mon, May 8, 2023 at 1:08 AM Yu Kuai <yukuai1@huaweicloud.com> wrote:
> >>
> >> Hi,
> >>
> >> On 2023/05/08 13:54, David Gilmour wrote:
> >>> I'm not sure what I'm looking for here but here is the output of the
> >>> inflight file immediately after the mdadm assemble hangs. Does this
> >>> indicate something accessing the array?
> >>>
> >>> #cat /sys/block/md127/inflight
> >>>          1        0
> >>>
> >>
> >> Yes, something is accessing the array. Do you try to grep all the task
> >> that is "D" state?
> >>
> >> ps -elf | grep " D "
> >>
> >> Is there any task stuck in raid5_make_request?
> >>
> >> cat /proc/$pid/stack
> >>
> >>> Also attached is an strace of my mdadm command that hung in case that
> >>> reveals something relevant:
> >>> strace mdadm --assemble --verbose
> >>> --backup-file=/root/mdadm5-6_backup_md127 --invalid-backup /dev/md127
> >>> /dev/sda /dev/sdh /dev/sdg /dev/sdc /dev/sde /dev/sdf --force 2>&1 |
> >>> tee mdadm_strace_output.txt
> >>
> >> I don't think this will be helpful, mdadm is unlikely the task that
> >> is accessing the array.
> >>
> >> Thanks,
> >> Kuai
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> On Sun, May 7, 2023 at 7:23 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
> >>>>
> >>>> Hi,
> >>>>
> >>>> On 2023/05/06 21:19, David Gilmour wrote:
> >>>>> From what I can tell it does look very similar. I stopped the
> >>>>> systemd-udevd service and renamed it to systemd-udevd.bak. My system
> >>>>> still hung on the assemble command. I'm not savvy enough to decode the
> >>>>> details here but does the "mddev_suspend.part.0+0xdf/0x150" line in
> >>>>> the process stack output suggest the same i/o block the other post
> >>>>> indicates?
> >>>>>
> >>>>> × systemd-udevd.service - Rule-based Manager for Device Events and Files
> >>>>>         Loaded: loaded (/usr/lib/systemd/system/systemd-udevd.service; static)
> >>>>>         Active: failed (Result: exit-code) since Sat 2023-05-06 06:59:11
> >>>>> MDT; 1min 27s ago
> >>>>>       Duration: 1d 20h 16min 29.633s
> >>>>> TriggeredBy: × systemd-udevd-kernel.socket
> >>>>>                 × systemd-udevd-control.socket
> >>>>>           Docs: man:systemd-udevd.service(8)
> >>>>>                 man:udev(7)
> >>>>>        Process: 27440 ExecStart=/usr/lib/systemd/systemd-udevd
> >>>>> (code=exited, status=203/EXEC)
> >>>>>       Main PID: 27440 (code=exited, status=203/EXEC)
> >>>>>            CPU: 5ms
> >>>>>
> >>>>> ----------------------
> >>>>> #mdadm --assemble --verbose --backup-file=/root/mdadm5-6_backup_md127
> >>>>> --invalid-backup /dev/md127 /dev/sda /dev/sdh /dev/sdg /dev/sdc
> >>>>> /dev/sdb /dev/sdf --force
> >>>>> mdadm: looking for devices for /dev/md127
> >>>>> mdadm: /dev/sda is identified as a member of /dev/md127, slot 0.
> >>>>> mdadm: /dev/sdh is identified as a member of /dev/md127, slot 1.
> >>>>> mdadm: /dev/sdg is identified as a member of /dev/md127, slot 2.
> >>>>> mdadm: /dev/sdc is identified as a member of /dev/md127, slot 3.
> >>>>> mdadm: /dev/sdb is identified as a member of /dev/md127, slot 4.
> >>>>> mdadm: /dev/sdf is identified as a member of /dev/md127, slot 5.
> >>>>> mdadm: /dev/md127 has an active reshape - checking if critical section
> >>>>> needs to be restored
> >>>>> mdadm: No backup metadata on /root/mdadm5-6_backup_md127
> >>>>> mdadm: Failed to find backup of critical section
> >>>>> mdadm: continuing without restoring backup
> >>>>> mdadm: added /dev/sdh to /dev/md127 as 1
> >>>>> mdadm: added /dev/sdg to /dev/md127 as 2
> >>>>> mdadm: added /dev/sdc to /dev/md127 as 3
> >>>>> mdadm: added /dev/sdb to /dev/md127 as 4
> >>>>> mdadm: added /dev/sdf to /dev/md127 as 5 (possibly out of date)
> >>>>> mdadm: added /dev/sda to /dev/md127 as 0
> >>>>>
> >>>>> #hangs indefinitely at this point in the output
> >>>>>
> >>>>> ------------------------------------------
> >>>>>
> >>>>>
> >>>>> root       27454  0.0  0.0   3812  2656 pts/1    D+   07:00   0:00
> >>>>> mdadm --assemble --verbose --backup-file=/root/mdadm5-6_backup_md127
> >>>>> --invalid-backup /dev/md127 /dev/sda /dev/sdh /dev/sdg /dev/sdc
> >>>>> /dev/sdb /dev/sdf --force
> >>>>> root       27457  0.0  0.0      0     0 ?        S    07:00   0:00 [md127_raid6]
> >>>>>
> >>>>> #cat /proc/27454/stack
> >>>>> [<0>] mddev_suspend.part.0+0xdf/0x150
> >>>>> [<0>] suspend_lo_store+0xc5/0xf0
> >>>>> [<0>] md_attr_store+0x83/0xf0
> >>>>> [<0>] kernfs_fop_write_iter+0x124/0x1b0
> >>>>> [<0>] new_sync_write+0xff/0x190
> >>>>> [<0>] vfs_write+0x1ef/0x280
> >>>>> [<0>] ksys_write+0x5f/0xe0
> >>>>> [<0>] do_syscall_64+0x5c/0x90
> >>>>> [<0>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
> >>>>>
> >>>>> #cat /proc/27457/stack
> >>>>> [<0>] md_thread+0x122/0x160
> >>>>> [<0>] kthread+0xe0/0x100
> >>>>> [<0>] ret_from_fork+0x22/0x30
> >>>>>
> >>>>
> >>>> Is there any thread stuck at raid5_make_request? something like below:
> >>>>
> >>>> Apr 23 19:17:22 atom kernel: task:systemd-udevd   state:D stack:    0
> >>>> pid: 8121 ppid:   706 flags:0x00000006
> >>>> Apr 23 19:17:22 atom kernel: Call Trace:
> >>>> Apr 23 19:17:22 atom kernel:  <TASK>
> >>>> Apr 23 19:17:22 atom kernel:  __schedule+0x20a/0x550
> >>>> Apr 23 19:17:22 atom kernel:  schedule+0x5a/0xc0
> >>>> Apr 23 19:17:22 atom kernel:  schedule_timeout+0x11f/0x160
> >>>> Apr 23 19:17:22 atom kernel:  ? make_stripe_request+0x284/0x490 [raid456]
> >>>> Apr 23 19:17:22 atom kernel:  wait_woken+0x50/0x70
> >>>> Apr 23 19:17:22 atom kernel:  raid5_make_request+0x2cb/0x3e0 [raid456]
> >>>> Apr 23 19:17:22 atom kernel:  ? sched_show_numa+0xf0/0xf0
> >>>> Apr 23 19:17:22 atom kernel:  md_handle_request+0x132/0x1e0
> >>>> Apr 23 19:17:22 atom kernel:  ? do_mpage_readpage+0x282/0x6b0
> >>>> Apr 23 19:17:22 atom kernel:  __submit_bio+0x86/0x130
> >>>> Apr 23 19:17:22 atom kernel:  __submit_bio_noacct+0x81/0x1f0
> >>>> Apr 23 19:17:22 atom kernel:  mpage_readahead+0x15c/0x1d0
> >>>> Apr 23 19:17:22 atom kernel:  ? blkdev_write_begin+0x20/0x20
> >>>> Apr 23 19:17:22 atom kernel:  read_pages+0x58/0x2f0
> >>>> Apr 23 19:17:22 atom kernel:  page_cache_ra_unbounded+0x137/0x180
> >>>> Apr 23 19:17:22 atom kernel:  force_page_cache_ra+0xc5/0xf0
> >>>> Apr 23 19:17:22 atom kernel:  filemap_get_pages+0xe4/0x350
> >>>> Apr 23 19:17:22 atom kernel:  filemap_read+0xbe/0x3c0
> >>>> Apr 23 19:17:22 atom kernel:  ? make_kgid+0x13/0x20
> >>>> Apr 23 19:17:22 atom kernel:  ? deactivate_locked_super+0x90/0xa0
> >>>> Apr 23 19:17:22 atom kernel:  blkdev_read_iter+0xaf/0x170
> >>>> Apr 23 19:17:22 atom kernel:  new_sync_read+0xf9/0x180
> >>>> Apr 23 19:17:22 atom kernel:  vfs_read+0x13c/0x190
> >>>> Apr 23 19:17:22 atom kernel:  ksys_read+0x5f/0xe0
> >>>> Apr 23 19:17:22 atom kernel:  do_syscall_64+0x59/0x90
> >>>>
> >>>> By the way, cat /sys/block/mdxx/inflight can prove this as well.
> >>>>
> >>>> If this is the case, can you find out who is accessing the array?
> >>>>
> >>>> Thanks,
> >>>> Kuai
> >>>>
> >>
> > .
> >
>

^ permalink raw reply

* Re: mdadm grow raid 5 to 6 failure (crash)
From: Yu Kuai @ 2023-05-08  9:53 UTC (permalink / raw)
  To: David Gilmour, Yu Kuai; +Cc: linux-raid, Song Liu, yukuai (C)
In-Reply-To: <CAO2ABiqEoi4iB__b7KdYu_jQqmeB8joh5xurHNeXj9583mcjjA@mail.gmail.com>

Hi,

On 2023/05/08 16:27, David Gilmour wrote:
> This seems to be kicked off somehow from the mdadm assemble command.
> There is no additional mdadm process before I run the assemble command
> but there is after:

If this is the case, I'm sorry, but I'm not familiar with how mdadm works or
how this deadlock can be bypassed. If you can compile a new kernel, I
can give you a patch that makes such I/O fail, but I'm not sure whether mdadm
would still make progress or just fail to assemble.

Thanks,
Kuai
> 
> #ps -elf | grep " D "
> 0 S root        3993    3953  0  80   0 - 55416 pipe_r 02:18 pts/1
> 00:00:00 grep --color=auto  D
> 
> #mdadm --assemble --verbose --backup-file=/root/mdadm5-6_backup_md127
> --invalid-backup /dev/md127 /dev/sda /dev/sdh /dev/sdg /dev/sdc
> /dev/sde /dev/sdf --force
> mdadm: looking for devices for /dev/md127
> mdadm: /dev/sda is identified as a member of /dev/md127, slot 0.
> mdadm: /dev/sdh is identified as a member of /dev/md127, slot 1.
> mdadm: /dev/sdg is identified as a member of /dev/md127, slot 2.
> mdadm: /dev/sdc is identified as a member of /dev/md127, slot 3.
> mdadm: /dev/sde is identified as a member of /dev/md127, slot 4.
> mdadm: /dev/sdf is identified as a member of /dev/md127, slot 5.
> mdadm: /dev/md127 has an active reshape - checking if critical section
> needs to be restored
> mdadm: No backup metadata on /root/mdadm5-6_backup_md127
> mdadm: Failed to find backup of critical section
> mdadm: continuing without restoring backup
> mdadm: added /dev/sdh to /dev/md127 as 1
> mdadm: added /dev/sdg to /dev/md127 as 2
> mdadm: added /dev/sdc to /dev/md127 as 3
> mdadm: added /dev/sde to /dev/md127 as 4
> mdadm: added /dev/sdf to /dev/md127 as 5 (possibly out of date)
> mdadm: added /dev/sda to /dev/md127 as 0
> <hangs here>
> 
> #Then in a separate terminal:
> #ps -elf | grep " D "
> 1 D root        1249       1  0  80   0 -   936 wait_w 02:15 ?
> 00:00:00 /sbin/mdadm --monitor --scan --syslog -f
> --pid-file=/run/mdadm/mdadm.pid
> 4 D root        4086    4038  0  80   0 -   953 mddev_ 02:22 pts/2
> 00:00:00 mdadm --assemble --verbose
> --backup-file=/root/mdadm5-6_backup_md127 --invalid-backup /dev/md127
> /dev/sda /dev/sdh /dev/sdg /dev/sdc /dev/sde /dev/sdf --force
> 0 S root        4102    3869  0  80   0 - 55449 pipe_r 02:25 pts/0
> 00:00:00 grep --color=auto  D
> 
> 
> 
> On Mon, May 8, 2023 at 1:56 AM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>>
>> Hi,
>>
>> On 2023/05/08 15:43, David Gilmour wrote:
>>> Two mdadm processes show up:
>>>
>>> #ps -elf | grep " D "
>>> 1 D root        1251       1  0  80   0 -   936 wait_w 00:28 ?
>>> 00:00:00 /sbin/mdadm --monitor --scan --syslog -f
>>> --pid-file=/run/mdadm/mdadm.pid
>>
>> So this is the one accessing the array, can you stop this before
>> assemble?
>>
>> Thanks,
>> Kuai
>>> 4 D root        4130    4091  0  80   0 -   953 mddev_ 01:33 pts/1
>>> 00:00:00 mdadm --assemble --verbose
>>> --backup-file=/root/mdadm5-6_backup_md127 --invalid-backup /dev/md127
>>> /dev/sda /dev/sdh /dev/sdg /dev/sdc /dev/sde /dev/sdf --force
>>>
>>> Process 1251 (mdadm) has raid5_make_request in its stack:
>>>
>>> #cat /proc/1251/stack
>>> [<0>] wait_woken+0x50/0x70
>>> [<0>] raid5_make_request+0x2cb/0x3e0 [raid456]
>>> [<0>] md_handle_request+0x135/0x1e0
>>> [<0>] __submit_bio+0x89/0x130
>>> [<0>] __submit_bio_noacct+0x81/0x1f0
>>> [<0>] submit_bh_wbc+0x11e/0x140
>>> [<0>] block_read_full_folio+0x1d5/0x290
>>> [<0>] filemap_read_folio+0x43/0x300
>>> [<0>] do_read_cache_folio+0x112/0x3f0
>>> [<0>] read_cache_page+0x15/0x90
>>> [<0>] read_part_sector+0x3a/0x160
>>> [<0>] read_lba+0xff/0x260
>>> [<0>] is_gpt_valid.part.0+0x66/0x3d0
>>> [<0>] find_valid_gpt.constprop.0+0x20e/0x540
>>> [<0>] efi_partition+0x80/0x390
>>> [<0>] check_partition+0x103/0x1d0
>>> [<0>] bdev_disk_changed.part.0+0xb5/0x200
>>> [<0>] blkdev_get_whole+0x7a/0x90
>>> [<0>] blkdev_get_by_dev.part.0+0x13b/0x300
>>> [<0>] blkdev_open+0x4c/0x90
>>> [<0>] do_dentry_open+0x14f/0x380
>>> [<0>] do_open+0x21a/0x3d0
>>> [<0>] path_openat+0x10f/0x2b0
>>> [<0>] do_filp_open+0xb2/0x160
>>> [<0>] do_sys_openat2+0x9a/0x160
>>> [<0>] __x64_sys_openat+0x53/0xa0
>>> [<0>] do_syscall_64+0x5c/0x90
>>> [<0>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
>>>
>>> #cat /proc/4130/stack
>>> [<0>] mddev_suspend.part.0+0xdf/0x150
>>> [<0>] suspend_lo_store+0xc5/0xf0
>>> [<0>] md_attr_store+0x83/0xf0
>>> [<0>] kernfs_fop_write_iter+0x124/0x1b0
>>> [<0>] new_sync_write+0xff/0x190
>>> [<0>] vfs_write+0x1ef/0x280
>>> [<0>] ksys_write+0x5f/0xe0
>>> [<0>] do_syscall_64+0x5c/0x90
>>> [<0>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
>>>
>>>
>>> On Mon, May 8, 2023 at 1:08 AM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> On 2023/05/08 13:54, David Gilmour wrote:
>>>>> I'm not sure what I'm looking for here but here is the output of the
>>>>> inflight file immediately after the mdadm assemble hangs. Does this
>>>>> indicate something accessing the array?
>>>>>
>>>>> #cat /sys/block/md127/inflight
>>>>>           1        0
>>>>>
>>>>
>>>> Yes, something is accessing the array. Did you try to grep for all tasks
>>>> that are in "D" state?
>>>>
>>>> ps -elf | grep " D "
>>>>
>>>> Is there any task stuck in raid5_make_request?
>>>>
>>>> cat /proc/$pid/stack
>>>>
>>>>> Also attached is an strace of my mdadm command that hung in case that
>>>>> reveals something relevant:
>>>>> strace mdadm --assemble --verbose
>>>>> --backup-file=/root/mdadm5-6_backup_md127 --invalid-backup /dev/md127
>>>>> /dev/sda /dev/sdh /dev/sdg /dev/sdc /dev/sde /dev/sdf --force 2>&1 |
>>>>> tee mdadm_strace_output.txt
>>>>
>>>> I don't think this will be helpful; mdadm is unlikely to be the task that
>>>> is accessing the array.
>>>>
>>>> Thanks,
>>>> Kuai
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Sun, May 7, 2023 at 7:23 PM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> On 2023/05/06 21:19, David Gilmour wrote:
>>>>>>> From what I can tell it does look very similar. I stopped the
>>>>>>> systemd-udevd service and renamed it to systemd-udevd.bak. My system
>>>>>>> still hung on the assemble command. I'm not savvy enough to decode the
>>>>>>> details here but does the "mddev_suspend.part.0+0xdf/0x150" line in
>>>>>>> the process stack output suggest the same i/o block the other post
>>>>>>> indicates?
>>>>>>>
>>>>>>> × systemd-udevd.service - Rule-based Manager for Device Events and Files
>>>>>>>          Loaded: loaded (/usr/lib/systemd/system/systemd-udevd.service; static)
>>>>>>>          Active: failed (Result: exit-code) since Sat 2023-05-06 06:59:11
>>>>>>> MDT; 1min 27s ago
>>>>>>>        Duration: 1d 20h 16min 29.633s
>>>>>>> TriggeredBy: × systemd-udevd-kernel.socket
>>>>>>>                  × systemd-udevd-control.socket
>>>>>>>            Docs: man:systemd-udevd.service(8)
>>>>>>>                  man:udev(7)
>>>>>>>         Process: 27440 ExecStart=/usr/lib/systemd/systemd-udevd
>>>>>>> (code=exited, status=203/EXEC)
>>>>>>>        Main PID: 27440 (code=exited, status=203/EXEC)
>>>>>>>             CPU: 5ms
>>>>>>>
>>>>>>> ----------------------
>>>>>>> #mdadm --assemble --verbose --backup-file=/root/mdadm5-6_backup_md127
>>>>>>> --invalid-backup /dev/md127 /dev/sda /dev/sdh /dev/sdg /dev/sdc
>>>>>>> /dev/sdb /dev/sdf --force
>>>>>>> mdadm: looking for devices for /dev/md127
>>>>>>> mdadm: /dev/sda is identified as a member of /dev/md127, slot 0.
>>>>>>> mdadm: /dev/sdh is identified as a member of /dev/md127, slot 1.
>>>>>>> mdadm: /dev/sdg is identified as a member of /dev/md127, slot 2.
>>>>>>> mdadm: /dev/sdc is identified as a member of /dev/md127, slot 3.
>>>>>>> mdadm: /dev/sdb is identified as a member of /dev/md127, slot 4.
>>>>>>> mdadm: /dev/sdf is identified as a member of /dev/md127, slot 5.
>>>>>>> mdadm: /dev/md127 has an active reshape - checking if critical section
>>>>>>> needs to be restored
>>>>>>> mdadm: No backup metadata on /root/mdadm5-6_backup_md127
>>>>>>> mdadm: Failed to find backup of critical section
>>>>>>> mdadm: continuing without restoring backup
>>>>>>> mdadm: added /dev/sdh to /dev/md127 as 1
>>>>>>> mdadm: added /dev/sdg to /dev/md127 as 2
>>>>>>> mdadm: added /dev/sdc to /dev/md127 as 3
>>>>>>> mdadm: added /dev/sdb to /dev/md127 as 4
>>>>>>> mdadm: added /dev/sdf to /dev/md127 as 5 (possibly out of date)
>>>>>>> mdadm: added /dev/sda to /dev/md127 as 0
>>>>>>>
>>>>>>> #hangs indefinitely at this point in the output
>>>>>>>
>>>>>>> ------------------------------------------
>>>>>>>
>>>>>>>
>>>>>>> root       27454  0.0  0.0   3812  2656 pts/1    D+   07:00   0:00
>>>>>>> mdadm --assemble --verbose --backup-file=/root/mdadm5-6_backup_md127
>>>>>>> --invalid-backup /dev/md127 /dev/sda /dev/sdh /dev/sdg /dev/sdc
>>>>>>> /dev/sdb /dev/sdf --force
>>>>>>> root       27457  0.0  0.0      0     0 ?        S    07:00   0:00 [md127_raid6]
>>>>>>>
>>>>>>> #cat /proc/27454/stack
>>>>>>> [<0>] mddev_suspend.part.0+0xdf/0x150
>>>>>>> [<0>] suspend_lo_store+0xc5/0xf0
>>>>>>> [<0>] md_attr_store+0x83/0xf0
>>>>>>> [<0>] kernfs_fop_write_iter+0x124/0x1b0
>>>>>>> [<0>] new_sync_write+0xff/0x190
>>>>>>> [<0>] vfs_write+0x1ef/0x280
>>>>>>> [<0>] ksys_write+0x5f/0xe0
>>>>>>> [<0>] do_syscall_64+0x5c/0x90
>>>>>>> [<0>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
>>>>>>>
>>>>>>> #cat /proc/27457/stack
>>>>>>> [<0>] md_thread+0x122/0x160
>>>>>>> [<0>] kthread+0xe0/0x100
>>>>>>> [<0>] ret_from_fork+0x22/0x30
>>>>>>>
>>>>>>
>>>>>> Is there any thread stuck at raid5_make_request? something like below:
>>>>>>
>>>>>> Apr 23 19:17:22 atom kernel: task:systemd-udevd   state:D stack:    0
>>>>>> pid: 8121 ppid:   706 flags:0x00000006
>>>>>> Apr 23 19:17:22 atom kernel: Call Trace:
>>>>>> Apr 23 19:17:22 atom kernel:  <TASK>
>>>>>> Apr 23 19:17:22 atom kernel:  __schedule+0x20a/0x550
>>>>>> Apr 23 19:17:22 atom kernel:  schedule+0x5a/0xc0
>>>>>> Apr 23 19:17:22 atom kernel:  schedule_timeout+0x11f/0x160
>>>>>> Apr 23 19:17:22 atom kernel:  ? make_stripe_request+0x284/0x490 [raid456]
>>>>>> Apr 23 19:17:22 atom kernel:  wait_woken+0x50/0x70
>>>>>> Apr 23 19:17:22 atom kernel:  raid5_make_request+0x2cb/0x3e0 [raid456]
>>>>>> Apr 23 19:17:22 atom kernel:  ? sched_show_numa+0xf0/0xf0
>>>>>> Apr 23 19:17:22 atom kernel:  md_handle_request+0x132/0x1e0
>>>>>> Apr 23 19:17:22 atom kernel:  ? do_mpage_readpage+0x282/0x6b0
>>>>>> Apr 23 19:17:22 atom kernel:  __submit_bio+0x86/0x130
>>>>>> Apr 23 19:17:22 atom kernel:  __submit_bio_noacct+0x81/0x1f0
>>>>>> Apr 23 19:17:22 atom kernel:  mpage_readahead+0x15c/0x1d0
>>>>>> Apr 23 19:17:22 atom kernel:  ? blkdev_write_begin+0x20/0x20
>>>>>> Apr 23 19:17:22 atom kernel:  read_pages+0x58/0x2f0
>>>>>> Apr 23 19:17:22 atom kernel:  page_cache_ra_unbounded+0x137/0x180
>>>>>> Apr 23 19:17:22 atom kernel:  force_page_cache_ra+0xc5/0xf0
>>>>>> Apr 23 19:17:22 atom kernel:  filemap_get_pages+0xe4/0x350
>>>>>> Apr 23 19:17:22 atom kernel:  filemap_read+0xbe/0x3c0
>>>>>> Apr 23 19:17:22 atom kernel:  ? make_kgid+0x13/0x20
>>>>>> Apr 23 19:17:22 atom kernel:  ? deactivate_locked_super+0x90/0xa0
>>>>>> Apr 23 19:17:22 atom kernel:  blkdev_read_iter+0xaf/0x170
>>>>>> Apr 23 19:17:22 atom kernel:  new_sync_read+0xf9/0x180
>>>>>> Apr 23 19:17:22 atom kernel:  vfs_read+0x13c/0x190
>>>>>> Apr 23 19:17:22 atom kernel:  ksys_read+0x5f/0xe0
>>>>>> Apr 23 19:17:22 atom kernel:  do_syscall_64+0x59/0x90
>>>>>>
>>>>>> By the way, cat /sys/block/mdxx/inflight can prove this as well.
>>>>>>
>>>>>> If this is the case, can you find out who is accessing the array?
>>>>>>
>>>>>> Thanks,
>>>>>> Kuai
>>>>>>
>>>>
>>> .
>>>
>>
> .
> 
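
The checks suggested in the quoted thread above (the inflight counters,
tasks in "D" state, and their kernel stacks) can be collected in one pass.
What follows is a minimal sketch, assuming a procps-style ps and root
access to /proc/PID/stack. The function names are made up for illustration
and are not part of mdadm or procps.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of mdadm or procps.

# Read "PID STAT COMMAND" lines on stdin and print the PIDs whose state
# starts with "D" (uninterruptible sleep), e.g. "D" or "D+".
filter_d_state() {
    awk '$2 ~ /^D/ { print $1 }'
}

# Sum the two counters of an inflight file (reads and writes in flight).
# Pass the path, for example /sys/block/md127/inflight.
inflight_total() {
    awk '{ print $1 + $2 }' "$1"
}

# Print the kernel stack of every D-state task.
# Reading /proc/PID/stack normally requires root.
dump_d_state_stacks() {
    ps -e -o pid= -o stat= -o comm= | filter_d_state | while read -r pid; do
        echo "== $pid =="
        cat "/proc/$pid/stack" 2>/dev/null
    done
}
```

A non-zero result from inflight_total on the md device, together with a
D-state task whose stack contains raid5_make_request, matches the situation
described in this thread. Run dump_d_state_stacks from a second terminal
while the assemble command hangs.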


^ permalink raw reply

* Re: mdadm grow raid 5 to 6 failure (crash)
From: David Gilmour @ 2023-05-08 11:20 UTC (permalink / raw)
  To: Yu Kuai; +Cc: linux-raid, Song Liu, yukuai (C)
In-Reply-To: <1392b816-bdaf-da5f-acc8-b6677aa71e3b@huaweicloud.com>

Ok, well I'm willing to try anything at this point. Do you need
anything from me for a patch? Here is my current kernel details:

Linux homer 5.14.0-305.el9.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Apr 27
11:32:15 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux



On Mon, May 8, 2023 at 3:53 AM Yu Kuai <yukuai1@huaweicloud.com> wrote:
>
> Hi,
>
> On 2023/05/08 16:27, David Gilmour wrote:
> > This seems to be kicked off somehow from the mdadm assemble command.
> > There is no additional mdadm process before I run the assemble command
> > but there is after:
>
> If this is the case, I'm sorry, but I'm not familiar with how mdadm works or
> how this deadlock can be bypassed. If you can compile a new kernel, I
> can give you a patch that makes such I/O fail, but I'm not sure whether mdadm
> would still make progress or just fail to assemble.
>
> Thanks,
> Kuai

^ permalink raw reply

* [PATCH 1/1] Stop mdcheck_continue timer when mdcheck_start service can finish check
From: Xiao Ni @ 2023-05-08 13:30 UTC (permalink / raw)
  To: jes; +Cc: linux-raid

mdcheck_continue is triggered by the mdcheck_start timer. It is used to
continue the check action when the array is too big for the mdcheck_start
service to finish the check. If the mdcheck_start service can finish the
check, the mdcheck_continue service is no longer needed, so stop its
timer in that case.

Signed-off-by: Xiao Ni <xni@redhat.com>
---
 misc/mdcheck | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/misc/mdcheck b/misc/mdcheck
index 700c3e252e72..f56972c8ed10 100644
--- a/misc/mdcheck
+++ b/misc/mdcheck
@@ -140,7 +140,13 @@ do
 		echo $a > $fl
 		any=yes
 	done
-	if [ -z "$any" ]; then exit 0; fi
+	if [ -z "$any" ]; then
+		#mdcheck_continue.timer is started by mdcheck_start.timer.
+		#When the check action can be finished in mdcheck_start.service,
+		#it doesn't need mdcheck_continue anymore.
+		systemctl stop mdcheck_continue.timer
+		exit 0;
+	fi
 	sleep 120
 done
 
-- 
2.32.0 (Apple Git-132)
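
The decision this hunk adds can be tried out without touching a real timer.
This is a small sketch, assuming that an empty $any means no array still
has a check in progress, as the hunk context suggests. The function name
and the SYSTEMCTL override are hypothetical and exist only so the call can
be stubbed for a dry run.

```shell
#!/bin/sh
# Sketch of the decision the patch adds to misc/mdcheck; the function name
# and the SYSTEMCTL override are assumptions made for illustration.

# $1: "yes" if at least one array still has a check in progress,
#     empty otherwise (mirrors the $any variable in the hunk above).
finish_or_continue() {
    any=$1
    if [ -z "$any" ]; then
        # Nothing left to check: the continue timer has no work to pick up.
        ${SYSTEMCTL:-systemctl} stop mdcheck_continue.timer
        return 0
    fi
    # Checks are still running: leave the timer alone so it can resume them.
    return 1
}
```

With SYSTEMCTL=echo the function only prints the command it would run, for
example `SYSTEMCTL=echo; finish_or_continue ""`.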


^ permalink raw reply related

* Re: [PATCH] Fix race of "mdadm --add" and "mdadm --incremental"
From: Martin Wilck @ 2023-05-08 13:35 UTC (permalink / raw)
  To: Li Xiao Keng, jes, pmenzel, colyli, linux-raid; +Cc: miaoguanqin, louhongxiang
In-Reply-To: <20230417140144.3013024-1-lixiaokeng@huawei.com>

Hello Li Xiao Keng,

On Mon, 2023-04-17 at 22:01 +0800, Li Xiao Keng wrote:
> When we add a new disk to a raid, it may return -EBUSY.
> 
> The main process of --add:
> 1. dev_open
> 2. store_super1(st, di->fd) in write_init_super1
> 3. fsync(di->fd) in write_init_super1
> 4. close(di->fd)
> 5. ioctl(ADD_NEW_DISK)
> 
> However, a udev "change" event is generated after step 4. Then
> "/usr/sbin/mdadm --incremental ..." is run, and the new disk
> is added to the md device. After that, the ioctl returns -EBUSY.
> 
> Here we add map_lock before write_init_super in "mdadm --add"
> to fix this race.
> 
> Signed-off-by: Li Xiao Keng <lixiaokeng@huawei.com>
> Signed-off-by: Guanqin Miao <miaoguanqin@huawei.com>

I don't feel familiar enough with the mdadm code to write an
authoritative review. In particular, I don't fully understand the 
intended semantics of map_lock(); thus while I believe the way you 
are using this lock is correct, I can't assert this with certainty.
Jes, Coly, or someone else should double-check that.
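
For readers less familiar with the pattern under discussion: assuming
map_lock() behaves as an exclusive advisory lock taken by both
"mdadm --add" and "mdadm --incremental", holding it across steps 1-5
quoted above would keep the udev-triggered run from interleaving with them.
A shell analogue using flock(1) from util-linux might look like this.
It is not mdadm code; with_lock and the lock file path are hypothetical.

```shell
#!/bin/sh
# Sketch only: with_lock and the lock file path are hypothetical and do not
# reproduce mdadm's map_lock(); they only illustrate the locking pattern.

# Run a command while holding an exclusive advisory lock on the file $1.
# File descriptor 9 is opened on the lock file for the subshell; the lock
# is dropped when the subshell exits, whether the command succeeded or not.
with_lock() {
    lockfile=$1
    shift
    (
        flock -x 9 || exit 1
        "$@"
    ) 9>"$lockfile"
}
```

Because the lock is tied to the subshell's file descriptor, every exit path
releases it, which is the property the "goto unlock" changes in the patch
are meant to guarantee in C. Example use:
`with_lock /tmp/demo.lock sh -c 'echo write superblock; echo add disk'`.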

Anyway, I'll try to get the communication on this issue going again.

I think you should expand some more on your commit description. It
makes sense for people who followed the previous discussion, but it
should also be self-explanatory for people reading the commit five years
from now.

Other than that, I only have one nitpick (see below).

Regards
Martin

> ---
>  Assemble.c |  5 ++++-
>  Manage.c   | 25 +++++++++++++++++--------
>  2 files changed, 21 insertions(+), 9 deletions(-)
> 
> diff --git a/Assemble.c b/Assemble.c
> index 49804941..086890ed 100644
> --- a/Assemble.c
> +++ b/Assemble.c
> @@ -1479,8 +1479,11 @@ try_again:
>          * to our list.  We flag them so that we don't try to re-add,
>          * but can remove if they turn out to not be wanted.
>          */
> -       if (map_lock(&map))
> +       if (map_lock(&map)) {
>                 pr_err("failed to get exclusive lock on mapfile - continue anyway...\n");

As you added a "return 1" here, the "continue anyway" message is wrong.
You need to change it.

> +               return 1;
> +       }
> +
>         if (c->update == UOPT_UUID)
>                 mp = NULL;
>         else
> diff --git a/Manage.c b/Manage.c
> index f54de7c6..6a101bae 100644
> --- a/Manage.c
> +++ b/Manage.c
> @@ -703,6 +703,7 @@ int Manage_add(int fd, int tfd, struct mddev_dev *dv,
>         struct supertype *dev_st;
>         int j;
>         mdu_disk_info_t disc;
> +       struct map_ent *map = NULL;
>  
>         if (!get_dev_size(tfd, dv->devname, &ldsize)) {
>                 if (dv->disposition == 'M')
> @@ -900,6 +901,10 @@ int Manage_add(int fd, int tfd, struct mddev_dev *dv,
>                 disc.raid_disk = 0;
>         }
>  
> +       if (map_lock(&map)) {
> +               pr_err("failed to get exclusive lock on mapfile when add disk\n");
> +               return -1;
> +       }
>         if (array->not_persistent==0) {
>                 int dfd;
>                 if (dv->disposition == 'j')
> @@ -911,9 +916,9 @@ int Manage_add(int fd, int tfd, struct mddev_dev *dv,
>                 dfd = dev_open(dv->devname, O_RDWR | O_EXCL|O_DIRECT);
>                 if (tst->ss->add_to_super(tst, &disc, dfd,
>                                           dv->devname, INVALID_SECTORS))
> -                       return -1;
> +                       goto unlock;
>                 if (tst->ss->write_init_super(tst))
> -                       return -1;
> +                       goto unlock;
>         } else if (dv->disposition == 'A') {
>                 /*  this had better be raid1.
>                  * As we are "--re-add"ing we must find a spare slot
> @@ -971,14 +976,14 @@ int Manage_add(int fd, int tfd, struct
> mddev_dev *dv,
>                         pr_err("add failed for %s: could not get
> exclusive access to container\n",
>                                dv->devname);
>                         tst->ss->free_super(tst);
> -                       return -1;
> +                       goto unlock;
>                 }
>  
>                 /* Check if metadata handler is able to accept the
> drive */
>                 if (!tst->ss->validate_geometry(tst, LEVEL_CONTAINER,
> 0, 1, NULL,
>                     0, 0, dv->devname, NULL, 0, 1)) {
>                         close(container_fd);
> -                       return -1;
> +                       goto unlock;
>                 }
>  
>                 Kill(dv->devname, NULL, 0, -1, 0);
> @@ -987,7 +992,7 @@ int Manage_add(int fd, int tfd, struct mddev_dev
> *dv,
>                                           dv->devname,
> INVALID_SECTORS)) {
>                         close(dfd);
>                         close(container_fd);
> -                       return -1;
> +                       goto unlock;
>                 }
>                 if (!mdmon_running(tst->container_devnm))
>                         tst->ss->sync_metadata(tst);
> @@ -998,7 +1003,7 @@ int Manage_add(int fd, int tfd, struct mddev_dev
> *dv,
>                                dv->devname);
>                         close(container_fd);
>                         tst->ss->free_super(tst);
> -                       return -1;
> +                       goto unlock;
>                 }
>                 sra->array.level = LEVEL_CONTAINER;
>                 /* Need to set data_offset and component_size */
> @@ -1013,7 +1018,7 @@ int Manage_add(int fd, int tfd, struct
> mddev_dev *dv,
>                         pr_err("add new device to external metadata
> failed for %s\n", dv->devname);
>                         close(container_fd);
>                         sysfs_free(sra);
> -                       return -1;
> +                       goto unlock;
>                 }
>                 ping_monitor(devnm);
>                 sysfs_free(sra);
> @@ -1027,7 +1032,7 @@ int Manage_add(int fd, int tfd, struct
> mddev_dev *dv,
>                         else
>                                 pr_err("add new device failed for %s
> as %d: %s\n",
>                                        dv->devname, j,
> strerror(errno));
> -                       return -1;
> +                       goto unlock;
>                 }
>                 if (dv->disposition == 'j') {
>                         pr_err("Journal added successfully, making %s
> read-write\n", devname);
> @@ -1038,7 +1043,11 @@ int Manage_add(int fd, int tfd, struct
> mddev_dev *dv,
>         }
>         if (verbose >= 0)
>                 pr_err("added %s\n", dv->devname);
> +       map_unlock(&map);
>         return 1;
> +unlock:
> +       map_unlock(&map);
> +       return -1;
>  }
>  
>  int Manage_remove(struct supertype *tst, int fd, struct mddev_dev
> *dv,
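
For readers skimming the diff, the locking pattern that the quoted Manage.c
hunks adopt (take the lock once, then route every later failure through a
single unlock label) can be sketched on its own. This is only a minimal
userspace illustration: demo_lock(), demo_unlock(), demo_add() and the
fail_step parameter are hypothetical stand-ins, not mdadm functions, and the
lock is just a flag so the pattern can be checked.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the mapfile lock: a flag, not a real file lock. */
static int lock_held;

/* Returns non-zero on failure, mirroring how the patch tests map_lock(). */
static int demo_lock(void)
{
	if (lock_held)
		return 1;
	lock_held = 1;
	return 0;
}

static void demo_unlock(void)
{
	/* Unlocking without holding the lock would be a bug in the caller. */
	assert(lock_held);
	lock_held = 0;
}

/* Reports whether the (hypothetical) lock is currently held. */
static int demo_lock_is_held(void)
{
	return lock_held;
}

/*
 * Returns 1 on success and -1 on failure, like Manage_add() in the patch.
 * fail_step picks which illustrative step fails; 0 means no failure.
 */
static int demo_add(int fail_step)
{
	if (demo_lock()) {
		fprintf(stderr, "failed to get exclusive lock\n");
		return -1;	/* lock was never taken, nothing to release */
	}

	if (fail_step == 1)	/* e.g. writing the superblock failed */
		goto unlock;
	if (fail_step == 2)	/* e.g. adding the disk failed */
		goto unlock;

	demo_unlock();
	return 1;

unlock:
	/* Single exit path for every failure after the lock was taken. */
	demo_unlock();
	return -1;
}
```

The point of the single label is that a new early exit added later only needs
a "goto unlock" to stay correct, instead of remembering to release the lock
by hand at each return.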


^ permalink raw reply

* Re: mdadm minimum kernel version requirements?
From: Jes Sorensen @ 2023-05-08 20:22 UTC (permalink / raw)
  To: NeilBrown; +Cc: Kernel.org-Linux-RAID, Mariusz Tkaczyk
In-Reply-To: <168116364433.24821.9557577764628245206@noble.neil.brown.name>

On 4/10/23 17:54, NeilBrown wrote:
> On Tue, 11 Apr 2023, Jes Sorensen wrote:
>> Hi,
>>
>> I bumped the minimum kernel version required for mdadm to 2.6.32.
>>
>> Should we drop support for anything prior to 3.10 at this point, since
>> RHEL7 is 3.10 based and SLES12 seems to be 3.12 based.
>>
>> Thoughts?
> 
> When you talk about changing the required kernel version, I would find
> it helpful if you at least mention what actual kernel features you now
> want to depend on - at least the more significant ones.
> 
> Aside from features, I'd rather think about how old the kernel is.
> 2.6.32 is over 13 years old.
> 3.10 is very nearly 10 years old.
> If there is something significant that landed in 3.10 that we want to
> depend on, then requiring that seems perfectly reasonable.
> 
> I think the oldest SLE kernel that you might care about would be 4.12
> (SLE12-SP5 - nearly 6 years old).  Anyone using an older SLE release
> values stability over new functionality and is not going to be trialling
> a new mdadm.

Hi Neil,

I guess my mindset is more that I don't expect RHEL/SLES grade distros
to fully upgrade mdadm, but I do see them backporting changes occasionally.

I was mostly basing my question on what I see us testing for in the
actual code. Dropping support for anything prior to SLES 12 (4.12) and
RHEL 8 (kernel 4.18) seems fair.
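
To make "testing for in the actual code" concrete, here is a minimal sketch
of how a userspace tool could compare the running kernel against a minimum
such as 4.12. It is not taken from mdadm: demo_parse_release(),
demo_running_kernel() and demo_kernel_supported() are hypothetical helpers,
and the major * 1000000 + minor * 1000 + patch encoding is an assumption
chosen for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/utsname.h>

/*
 * Hypothetical helper: encode "major.minor[.patch][-suffix]" as
 * major * 1000000 + minor * 1000 + patch. Returns -1 if the string
 * does not start with at least "major.minor".
 */
static int demo_parse_release(const char *release)
{
	int major = 0, minor = 0, patch = 0;
	int n;

	assert(release != NULL);
	n = sscanf(release, "%d.%d.%d", &major, &minor, &patch);
	if (n < 2 || major < 0 || minor < 0 || patch < 0)
		return -1;
	return major * 1000000 + minor * 1000 + patch;
}

/* Hypothetical helper: version code of the running kernel, or -1 on error. */
static int demo_running_kernel(void)
{
	struct utsname u;

	if (uname(&u) != 0)
		return -1;
	return demo_parse_release(u.release);
}

/* Hypothetical policy check: is the running kernel at least 4.12? */
static int demo_kernel_supported(void)
{
	return demo_running_kernel() >= demo_parse_release("4.12");
}
```

With a helper like this, raising the minimum would mostly mean that
comparisons against older version codes become dead branches that can be
dropped or simply left untested.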

Cheers,
Jes



^ permalink raw reply

* Re: [PATCH 0/4] Few config related refactors
From: Jes Sorensen @ 2023-05-08 20:26 UTC (permalink / raw)
  To: Mariusz Tkaczyk; +Cc: linux-raid, colyli, xni
In-Reply-To: <20230323165017.27121-1-mariusz.tkaczyk@linux.intel.com>

On 3/23/23 12:50, Mariusz Tkaczyk wrote:
> Hi Jes,
> These patches remove multiple inlines across the code and replace them
> by defines or functions. No functional changes intended. The goal
> is to make some of this code reusable for both config and cmdline
> (mdadm.c). In the next patchset I will start optimizing name verification
> (extended v2 of the previous patchset).

Applied!

I'll push them later; I left my key at home.

Thanks,
Jes



^ permalink raw reply

* Re: [PATCH] enable RAID for SATA under VMD
From: Jes Sorensen @ 2023-05-08 20:29 UTC (permalink / raw)
  To: Mariusz Tkaczyk, Kevin Friedberg; +Cc: Kinga Tanska, linux-raid
In-Reply-To: <20230505094410.00001aa3@linux.intel.com>

On 5/5/23 03:44, Mariusz Tkaczyk wrote:
> On Fri, 5 May 2023 03:31:11 -0400
> Kevin Friedberg <kev.friedberg@gmail.com> wrote:
> 
>> On Fri, Apr 28, 2023 at 3:31 AM Kinga Tanska
>> <kinga.tanska@linux.intel.com> wrote:
>>
>>> Hi,
>>>
>>> We've been able to test this change and we haven't found problems.
>>>
>>> Regards,
>>> Kinga  
>>
>> Great!  What are the next steps to get it included in a future release?
> See patchwork:
> https://patchwork.kernel.org/project/linux-raid/patch/20230216044134.30581-1-kev.friedberg@gmail.com/
> 
> I moved the patch to "awaiting upstream". Now it is up to Jes.
> You will get mail, like here:
> https://lore.kernel.org/linux-raid/5f493463-6e69-419f-affc-b0de8424fa1a@trained-monkey.org/

Applied!

Thanks Mariusz for reviewing.

Cheers,
Jes



^ permalink raw reply

* Re: [PATCH 0/2] Fix unsafe string functions
From: Jes Sorensen @ 2023-05-08 20:31 UTC (permalink / raw)
  To: Kinga Tanska, linux-raid; +Cc: colyli
In-Reply-To: <20230420234658.367-1-kinga.tanska@intel.com>

On 4/20/23 19:46, Kinga Tanska wrote:
> This series of patches contains fixes for unsafe uses of string
> functions. Unsafe functions were replaced with
> new ones that limit the input length.
> 
> Kinga Tanska (2):
>   Fix unsafe string functions
>   platform-intel: limit guid length
> 
>  mdmon.c          | 6 +++---
>  mdopen.c         | 4 ++--
>  platform-intel.c | 5 +----
>  platform-intel.h | 5 ++++-
>  super-intel.c    | 6 +++---
>  5 files changed, 13 insertions(+), 13 deletions(-)
> 

Hi Kinga,

This conflicts after applying Mariusz' changes.

Mind rebasing?

Thanks,
Jes


^ permalink raw reply

* Re: mdadm minimum kernel version requirements?
From: NeilBrown @ 2023-05-08 21:04 UTC (permalink / raw)
  To: Jes Sorensen; +Cc: Kernel.org-Linux-RAID, Mariusz Tkaczyk
In-Reply-To: <9bfd76c4-3775-4ba6-10c3-ac32b5389f63@trained-monkey.org>

On Tue, 09 May 2023, Jes Sorensen wrote:
> On 4/10/23 17:54, NeilBrown wrote:
> > On Tue, 11 Apr 2023, Jes Sorensen wrote:
> >> Hi,
> >>
> >> I bumped the minimum kernel version required for mdadm to 2.6.32.
> >>
> >> Should we drop support for anything prior to 3.10 at this point, since
> >> RHEL7 is 3.10 based and SLES12 seems to be 3.12 based.
> >>
> >> Thoughts?
> > 
> > When you talk about changing the required kernel version, I would find
> > it helpful if you at least mention what actual kernel features you now
> > want to depend on - at least the more significant ones.
> > 
> > Aside from features, I'd rather think about how old the kernel is.
> > 2.6.32 is over 13 years old.
> > 3.10 is very nearly 10 years old.
> > If there is something significant that landed in 3.10 that we want to
> > depend on, then requiring that seems perfectly reasonable.
> > 
> > I think the oldest SLE kernel that you might care about would be 4.12
> > (SLE12-SP5 - nearly 6 years old).  Anyone using an older SLE release
> > values stability over new functionality and is not going to be trialling
> > a new mdadm.
> 
> Hi Neil,
> 
> I guess my mindset is more that I don't expect RHEL/SLES grade distros
> to fully upgrade mdadm, but I do see them backporting changes occasionally.
> 
> I was mostly basing my question on what I see us testing for in the
> actual code. Dropping support for anything prior to SLES 12 (4.12) and
> RHEL 8 (kernel 4.18) seems fair.

So where you say "dropping support" you don't actually mean removing any
code, but only that you will document somewhere that no effort will be
made to support, or test against, earlier kernels. Is that correct?
Sounds reasonable to me.

NeilBrown

^ permalink raw reply

* Re: mdadm grow raid 5 to 6 failure (crash)
From: Roger Heflin @ 2023-05-08 22:53 UTC (permalink / raw)
  To: David Gilmour; +Cc: Yu Kuai, linux-raid, Song Liu, yukuai (C)
In-Reply-To: <CAO2ABiqkg7HobNvXRWrid36+uYwZ3yHqLmbft_FQwzD9-B7mRg@mail.gmail.com>

On Mon, May 8, 2023 at 6:57 AM David Gilmour <dgilmour76@gmail.com> wrote:
>
> Ok, well I'm willing to try anything at this point. Do you need
> anything from me for a patch? Here is my current kernel details:

grep -i mdadm /etc/udev/rules.d/* /lib/udev/rules.d/*

If you can find a udev rule that starts up the monitor then move that
rule out of the directory, so that on the next assemble try it does
not get started.

If this is the recent bug that is being discussed then anything
accessing the array after the reshape will deadlock the array and the
reshape.

^ permalink raw reply

* Re: Raid5 to raid6 grow interrupted, mdadm hangs on assemble command
From: Yu Kuai @ 2023-05-09  2:10 UTC (permalink / raw)
  To: Jove, Yu Kuai
  Cc: Wol, linux-raid, yukuai (C), songliubraving, Logan Gunthorpe
In-Reply-To: <CAFig2csN8qSEafSehM5oN-kO3FsK=6+vvyEeiYcbvqRkmoiN7Q@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 1278 bytes --]

Hi, Jove

On 2023/05/06 21:07, Jove wrote:
> Hi Kuai,
> 
> Just to confirm, the array seems fine after the reshape. Copying files now.
> 
> Would it be best if I scrap this array and create a new one or is this
> array safe to use in the long term? It had to use the --invalid-backup
> flag to get it to reshape, so there might be corruption before that
> resume point?
> 
> I have to do a reshape anyway, to 5 raid devices.
> 
>> In the meantime, I'll try to fix this deadlock, hope you don't mind a
>> reported-by tag.
> 
> I would not, thank you.
> 
> I still have the backup images of the drive in reshape. If you wish I
> can test any fix you create.

Here is the first version of the fix patch. It fails the io that is
waiting for reshape while reshape can't make progress. I tested it in my
VM and it works as I expected. Can you give it a try to see if mdadm
can still assemble?

Thanks,
Kuai
> 
>> I have no idea why systemd-udevd is accessing the array.
> 
> My guess is that it is accessing this array because it checks it for the
> lvm layout so it can automatically create the /dev/mapper entries.
> With systemd-udevd disabled, these entries do not automatically
> appear.
> 
> And thank you again for getting me my data back.
> 
> Best regards,
> 
>     Johan
> .
> 

[-- Attachment #2: 0001-md-fix-raid456-deadlock.patch --]
[-- Type: text/plain, Size: 5758 bytes --]

From 159ea7c8d591882dfbbdf30938c1c1d5bc9d4931 Mon Sep 17 00:00:00 2001
From: Yu Kuai <yukuai3@huawei.com>
Date: Tue, 9 May 2023 09:28:36 +0800
Subject: [PATCH] md: fix raid456 deadlock

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 drivers/md/md.c    | 20 ++++----------------
 drivers/md/md.h    | 18 ++++++++++++++++++
 drivers/md/raid5.c | 32 +++++++++++++++++++++++++++++++-
 3 files changed, 53 insertions(+), 17 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 8e344b4b3444..462529e47f19 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -93,18 +93,6 @@ static int remove_and_add_spares(struct mddev *mddev,
 				 struct md_rdev *this);
 static void mddev_detach(struct mddev *mddev);
 
-enum md_ro_state {
-	MD_RDWR,
-	MD_RDONLY,
-	MD_AUTO_READ,
-	MD_MAX_STATE
-};
-
-static bool md_is_rdwr(struct mddev *mddev)
-{
-	return (mddev->ro == MD_RDWR);
-}
-
 /*
  * Default number of read corrections we'll attempt on an rdev
  * before ejecting it from the array. We divide the read error
@@ -360,10 +348,6 @@ EXPORT_SYMBOL_GPL(md_new_event);
 static LIST_HEAD(all_mddevs);
 static DEFINE_SPINLOCK(all_mddevs_lock);
 
-static bool is_md_suspended(struct mddev *mddev)
-{
-	return percpu_ref_is_dying(&mddev->active_io);
-}
 /* Rather than calling directly into the personality make_request function,
  * IO requests come here first so that we can check if the device is
  * being suspended pending a reconfiguration.
@@ -464,6 +448,10 @@ void mddev_suspend(struct mddev *mddev)
 	wake_up(&mddev->sb_wait);
 	set_bit(MD_ALLOW_SB_UPDATE, &mddev->flags);
 	percpu_ref_kill(&mddev->active_io);
+
+	if (mddev->pers->prepare_suspend)
+		mddev->pers->prepare_suspend(mddev);
+
 	wait_event(mddev->sb_wait, percpu_ref_is_zero(&mddev->active_io));
 	mddev->pers->quiesce(mddev, 1);
 	clear_bit_unlock(MD_ALLOW_SB_UPDATE, &mddev->flags);
diff --git a/drivers/md/md.h b/drivers/md/md.h
index fd8f260ed5f8..292b96a15890 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -536,6 +536,23 @@ struct mddev {
 	bool	serialize_policy:1;
 };
 
+enum md_ro_state {
+	MD_RDWR,
+	MD_RDONLY,
+	MD_AUTO_READ,
+	MD_MAX_STATE
+};
+
+static inline bool md_is_rdwr(struct mddev *mddev)
+{
+	return (mddev->ro == MD_RDWR);
+}
+
+static inline bool is_md_suspended(struct mddev *mddev)
+{
+	return percpu_ref_is_dying(&mddev->active_io);
+}
+
 enum recovery_flags {
 	/*
 	 * If neither SYNC or RESHAPE are set, then it is a recovery.
@@ -614,6 +631,7 @@ struct md_personality
 	int (*start_reshape) (struct mddev *mddev);
 	void (*finish_reshape) (struct mddev *mddev);
 	void (*update_reshape_pos) (struct mddev *mddev);
+	void (*prepare_suspend) (struct mddev *mddev);
 	/* quiesce suspends or resumes internal processing.
 	 * 1 - stop new actions and wait for action io to complete
 	 * 0 - return to normal behaviour
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 812a12e3e41a..5a24935c113d 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -761,6 +761,7 @@ enum stripe_result {
 	STRIPE_RETRY,
 	STRIPE_SCHEDULE_AND_RETRY,
 	STRIPE_FAIL,
+	STRIPE_FAIL_AND_RETRY,
 };
 
 struct stripe_request_ctx {
@@ -5997,7 +5998,8 @@ static enum stripe_result make_stripe_request(struct mddev *mddev,
 			if (ahead_of_reshape(mddev, logical_sector,
 					     conf->reshape_safe)) {
 				spin_unlock_irq(&conf->device_lock);
-				return STRIPE_SCHEDULE_AND_RETRY;
+				ret = STRIPE_SCHEDULE_AND_RETRY;
+				goto out;
 			}
 		}
 		spin_unlock_irq(&conf->device_lock);
@@ -6076,6 +6078,18 @@ static enum stripe_result make_stripe_request(struct mddev *mddev,
 
 out_release:
 	raid5_release_stripe(sh);
+out:
+	/*
+	 * There is no point to wait for reshape because reshape can't make
+	 * progress if the array is suspended or is not read write.
+	 */
+	if (ret == STRIPE_SCHEDULE_AND_RETRY &&
+	    (is_md_suspended(mddev) || !md_is_rdwr(mddev))) {
+		bi->bi_status = BLK_STS_IOERR;
+		ret = STRIPE_FAIL;
+		pr_err("md/raid456:%s: array is suspended or not read write, io accross reshape position failed, please try again after reshape.\n",
+		       mdname(mddev));
+	}
 	return ret;
 }
 
@@ -8654,6 +8668,19 @@ static void raid5_finish_reshape(struct mddev *mddev)
 	}
 }
 
+static void raid5_prepare_suspend(struct mddev *mddev)
+{
+	struct r5conf *conf = mddev->private;
+
+	/*
+	 * Before waiting for active_io to be done, fail all the io that is
+	 * waiting for reshape because they can never be done after suspend.
+	 *
+	 * Perhaps it's better to let those io wait for resume than failing.
+	 */
+	wake_up(&conf->wait_for_overlap);
+}
+
 static void raid5_quiesce(struct mddev *mddev, int quiesce)
 {
 	struct r5conf *conf = mddev->private;
@@ -9020,6 +9047,7 @@ static struct md_personality raid6_personality =
 	.check_reshape	= raid6_check_reshape,
 	.start_reshape  = raid5_start_reshape,
 	.finish_reshape = raid5_finish_reshape,
+	.prepare_suspend = raid5_prepare_suspend,
 	.quiesce	= raid5_quiesce,
 	.takeover	= raid6_takeover,
 	.change_consistency_policy = raid5_change_consistency_policy,
@@ -9044,6 +9072,7 @@ static struct md_personality raid5_personality =
 	.check_reshape	= raid5_check_reshape,
 	.start_reshape  = raid5_start_reshape,
 	.finish_reshape = raid5_finish_reshape,
+	.prepare_suspend = raid5_prepare_suspend,
 	.quiesce	= raid5_quiesce,
 	.takeover	= raid5_takeover,
 	.change_consistency_policy = raid5_change_consistency_policy,
@@ -9069,6 +9098,7 @@ static struct md_personality raid4_personality =
 	.check_reshape	= raid5_check_reshape,
 	.start_reshape  = raid5_start_reshape,
 	.finish_reshape = raid5_finish_reshape,
+	.prepare_suspend = raid5_prepare_suspend,
 	.quiesce	= raid5_quiesce,
 	.takeover	= raid4_takeover,
 	.change_consistency_policy = raid5_change_consistency_policy,
-- 
2.39.2


^ permalink raw reply related

* Re: mdadm grow raid 5 to 6 failure (crash)
From: Yu Kuai @ 2023-05-09  2:33 UTC (permalink / raw)
  To: Roger Heflin, David Gilmour; +Cc: Yu Kuai, linux-raid, Song Liu, yukuai (C)
In-Reply-To: <CAAMCDec_qt0wsfQ6d1CWc4e3hYtzXabw_sK9ChjMUSkA0cPxXg@mail.gmail.com>

Hi,

On 2023/05/09 6:53, Roger Heflin wrote:
> On Mon, May 8, 2023 at 6:57 AM David Gilmour <dgilmour76@gmail.com> wrote:
>>
>> Ok, well I'm willing to try anything at this point. Do you need
>> anything from me for a patch? Here is my current kernel details:
> 
> grep -i mdadm /etc/udev/rules.d/* /lib/udev/rules.d/*
> 
> If you can find a udev rule that starts up the monitor then move that
> rule out of the directory, so that on the next assemble try it does
> not get started.
> 
> If this is the recent bug that is being discussed then anything
> accessing the array after the reshape will deadlock the array and the
> reshape.

It's not anything accessing the array; in fact, only io across the
reshape position can trigger the deadlock.

I just posted a fix patch in the other thread that fails such io while
reshape can't make progress. However, I'm not sure for now whether this
will break mdadm; for example, must mdadm read something from the array
to make progress?

Thanks,
Kuai
> .
> 


^ permalink raw reply

