Linux RAID subsystem development

Linux RAID subsystem development
 help / color / mirror / Atom feed

* Re: [PATCH 4/5] fs/autofs4: Fix a dead URL to ftp.kernel.org
From: Ian Kent @ 2017-03-28  2:02 UTC (permalink / raw)
  To: SeongJae Park, rml, axboe, shli, yamada.masahito, mmarek
  Cc: linux-raid, autofs, linux-kbuild, linux-kernel
In-Reply-To: <20170327054731.31882-5-sj38.park@gmail.com>

On Mon, 2017-03-27 at 14:47 +0900, SeongJae Park wrote:
> As ftp.kernel.org is closed [0], this commit fixes a dead URL to
> ftp.kernel.org in fs/autofs4/ to use www.kernel.org instead.
> 
> [0] https://www.kernel.org/shutting-down-ftp-services.html
> 
> Signed-off-by: SeongJae Park <sj38.park@gmail.com>

ACK, and thanks for fixing this.

> ---
>  fs/autofs4/Kconfig | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/fs/autofs4/Kconfig b/fs/autofs4/Kconfig
> index 1204d6384d39..44727bf18297 100644
> --- a/fs/autofs4/Kconfig
> +++ b/fs/autofs4/Kconfig
> @@ -7,7 +7,7 @@ config AUTOFS4_FS
>  	  automounter (amd), which is a pure user space daemon.
>  
>  	  To use the automounter you need the user-space tools from
> -	  <ftp://ftp.kernel.org/pub/linux/daemons/autofs/v4/>; you also
> +	  <https://www.kernel.org/pub/linux/daemons/autofs/v4/>; you also
>  	  want to answer Y to "NFS file system support", below.
>  
>  	  To compile this support as a module, choose M here: the module will
> be
--
To unsubscribe from this list: send the line "unsubscribe autofs" in

^ permalink raw reply

* 2541 linux-raid
From: een @ 2017-03-28  4:33 UTC (permalink / raw)
  To: linux-raid

[-- Attachment #1: 652972856923.zip --]
[-- Type: application/zip, Size: 2667 bytes --]

^ permalink raw reply

* [PATCH] Revert "md: raid1: use bio helper in process_checks()"
From: Arnd Bergmann @ 2017-03-28  9:49 UTC (permalink / raw)
  To: Shaohua Li
  Cc: Arnd Bergmann, NeilBrown, Ming Lei, Jens Axboe, colyli@suse.de,
	Guoqing Jiang, Mike Christie, linux-raid, linux-kernel

Commit 60928a91b0b3 ("md: raid1: use bio helper in process_checks()")
is probably correct, but I get a new compile-time warning after
it, and have trouble understanding what it fixes:

drivers/md/raid1.c: In function 'sync_request_write':
drivers/md/raid1.c:2172:9: error: 'page_len$' may be used uninitialized in this function [-Werror=maybe-uninitialized]
     if (memcmp(page_address(ppages[j]),
         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
         page_address(spages[j]),
         ~~~~~~~~~~~~~~~~~~~~~~~~
         page_len[j]))
         ~~~~~~~~~~~~
drivers/md/raid1.c:2160:7: note: 'page_len$' was declared here
   int page_len[RESYNC_PAGES];
       ^~~~~~~~

This reverts it to resolve the warning.

Fixes: 60928a91b0b3 ("md: raid1: use bio helper in process_checks()")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 drivers/md/raid1.c | 12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index b7d9651286d4..4d176c8abc33 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -2113,7 +2113,6 @@ static void process_checks(struct r1bio *r1_bio)
 		int j;
 		int size;
 		int error;
-		struct bio_vec *bi;
 		struct bio *b = r1_bio->bios[i];
 		struct resync_pages *rp = get_resync_pages(b);
 		if (b->bi_end_io != end_sync_read)
@@ -2132,7 +2131,9 @@ static void process_checks(struct r1bio *r1_bio)
 		b->bi_private = rp;
 
 		size = b->bi_iter.bi_size;
-		bio_for_each_segment_all(bi, b, j) {
+		for (j = 0; j < vcnt ; j++) {
+			struct bio_vec *bi;
+			bi = &b->bi_io_vec[j];
 			bi->bv_offset = 0;
 			if (size > PAGE_SIZE)
 				bi->bv_len = PAGE_SIZE;
@@ -2156,22 +2157,17 @@ static void process_checks(struct r1bio *r1_bio)
 		int error = sbio->bi_error;
 		struct page **ppages = get_resync_pages(pbio)->pages;
 		struct page **spages = get_resync_pages(sbio)->pages;
-		struct bio_vec *bi;
-		int page_len[RESYNC_PAGES];
 
 		if (sbio->bi_end_io != end_sync_read)
 			continue;
 		/* Now we can 'fixup' the error value */
 		sbio->bi_error = 0;
 
-		bio_for_each_segment_all(bi, sbio, j)
-			page_len[j] = bi->bv_len;
-
 		if (!error) {
 			for (j = vcnt; j-- ; ) {
 				if (memcmp(page_address(ppages[j]),
 					   page_address(spages[j]),
-					   page_len[j]))
+					   sbio->bi_io_vec[j].bv_len))
 					break;
 			}
 		} else
-- 
2.9.0

^ permalink raw reply related

* Re: [PATCH] Revert "md: raid1: use bio helper in process_checks()"
From: Ming Lei @ 2017-03-28 10:44 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Shaohua Li, NeilBrown, Jens Axboe, colyli@suse.de, Guoqing Jiang,
	Mike Christie, open list:SOFTWARE RAID (Multiple Disks) SUPPORT,
	Linux Kernel Mailing List
In-Reply-To: <20170328095029.3500369-1-arnd@arndb.de>

On Tue, Mar 28, 2017 at 5:49 PM, Arnd Bergmann <arnd@arndb.de> wrote:
> Commit 60928a91b0b3 ("md: raid1: use bio helper in process_checks()")
> is probably correct, but I get a new compile-time warning after
> it, and have trouble understanding what it fixes:
>
> drivers/md/raid1.c: In function 'sync_request_write':
> drivers/md/raid1.c:2172:9: error: 'page_len$' may be used uninitialized in this function [-Werror=maybe-uninitialized]
>      if (memcmp(page_address(ppages[j]),
>          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>          page_address(spages[j]),
>          ~~~~~~~~~~~~~~~~~~~~~~~~
>          page_len[j]))
>          ~~~~~~~~~~~~
> drivers/md/raid1.c:2160:7: note: 'page_len$' was declared here
>    int page_len[RESYNC_PAGES];
>        ^~~~~~~~
>
> This reverts it to resolve the warning.

Please try the following patch:

 https://lkml.org/lkml/2017/3/28/126

BTW, the compile failure is just a false positive.

Thanks,
Ming

^ permalink raw reply

* Re: [PATCH] Revert "md: raid1: use bio helper in process_checks()"
From: Arnd Bergmann @ 2017-03-28 11:35 UTC (permalink / raw)
  To: Ming Lei
  Cc: Shaohua Li, NeilBrown, Jens Axboe, colyli@suse.de, Guoqing Jiang,
	Mike Christie, open list:SOFTWARE RAID (Multiple Disks) SUPPORT,
	Linux Kernel Mailing List
In-Reply-To: <CACVXFVMX=0XBvSt7b-3XObw3SknPGqH9JZvT6jViYbZfzKqQEg@mail.gmail.com>

On Tue, Mar 28, 2017 at 12:44 PM, Ming Lei <tom.leiming@gmail.com> wrote:
> On Tue, Mar 28, 2017 at 5:49 PM, Arnd Bergmann <arnd@arndb.de> wrote:
>> Commit 60928a91b0b3 ("md: raid1: use bio helper in process_checks()")
>> is probably correct, but I get a new compile-time warning after
>> it, and have trouble understanding what it fixes:
>>
>> drivers/md/raid1.c: In function 'sync_request_write':
>> drivers/md/raid1.c:2172:9: error: 'page_len$' may be used uninitialized in this function [-Werror=maybe-uninitialized]
>>      if (memcmp(page_address(ppages[j]),
>>          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>          page_address(spages[j]),
>>          ~~~~~~~~~~~~~~~~~~~~~~~~
>>          page_len[j]))
>>          ~~~~~~~~~~~~
>> drivers/md/raid1.c:2160:7: note: 'page_len$' was declared here
>>    int page_len[RESYNC_PAGES];
>>        ^~~~~~~~
>>
>> This reverts it to resolve the warning.
>
> Please try the following patch:
>
>  https://lkml.org/lkml/2017/3/28/126


That patch will certainly shut up the warning, but will also prevent
the compiler from warning when the function gets changed in some
way that actually leads to an uninitialized use of the page_len array,
which is why I didn't suggest doing it that way.

       Arnd

^ permalink raw reply

* Re: [PATCH] Revert "md: raid1: use bio helper in process_checks()"
From: Ming Lei @ 2017-03-28 11:42 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Shaohua Li, NeilBrown, Jens Axboe, colyli@suse.de, Guoqing Jiang,
	Mike Christie, open list:SOFTWARE RAID (Multiple Disks) SUPPORT,
	Linux Kernel Mailing List
In-Reply-To: <CAK8P3a0k9S-t1CvpcabNVJOFMWoieuSn6EEr4oB72+T=zQT=aw@mail.gmail.com>

On Tue, Mar 28, 2017 at 7:35 PM, Arnd Bergmann <arnd@arndb.de> wrote:
> On Tue, Mar 28, 2017 at 12:44 PM, Ming Lei <tom.leiming@gmail.com> wrote:
>> On Tue, Mar 28, 2017 at 5:49 PM, Arnd Bergmann <arnd@arndb.de> wrote:
>>> Commit 60928a91b0b3 ("md: raid1: use bio helper in process_checks()")
>>> is probably correct, but I get a new compile-time warning after
>>> it, and have trouble understanding what it fixes:
>>>
>>> drivers/md/raid1.c: In function 'sync_request_write':
>>> drivers/md/raid1.c:2172:9: error: 'page_len$' may be used uninitialized in this function [-Werror=maybe-uninitialized]
>>>      if (memcmp(page_address(ppages[j]),
>>>          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>          page_address(spages[j]),
>>>          ~~~~~~~~~~~~~~~~~~~~~~~~
>>>          page_len[j]))
>>>          ~~~~~~~~~~~~
>>> drivers/md/raid1.c:2160:7: note: 'page_len$' was declared here
>>>    int page_len[RESYNC_PAGES];
>>>        ^~~~~~~~
>>>
>>> This reverts it to resolve the warning.
>>
>> Please try the following patch:
>>
>>  https://lkml.org/lkml/2017/3/28/126
>
>
> That patch will certainly shut up the warning, but will also prevent
> the compiler from warning when the function gets changed in some
> way that actually leads to an uninitialized use of the page_len array,

Why do you think that it leads to an uninitialized use of the page_len array?

The following code does initialize the array well enough for future use:

               bio_for_each_segment_all(bi, sbio, j)
                       page_len[j] = bi->bv_len;

That is why we don't need to initialize the array explicitly, but just
for killing
the warning.


Thanks,
Ming Lei

^ permalink raw reply

* Re: [PATCH] md:array cannot be opened again after 'md_set_readonly'
From: Zhilong Liu @ 2017-03-28 13:19 UTC (permalink / raw)
  To: Shaohua Li; +Cc: neilb, shli, linux-raid, Guoqing Jiang
In-Reply-To: <20170327182214.zde4kao2gz2lazgm@kernel.org>


On 03/28/2017 02:22 AM, Shaohua Li wrote:
> On Mon, Mar 27, 2017 at 03:52:25PM +0800, Zhilong Liu wrote:
>> This is a bug about array cannot be opened again after 'md_set_readonly',
>> because the MD_CLOSING bit is still waiting for clear.
>> MD_CLOSING should only be set for a short period or time to avoid certain
>> races. After the operation that set it completes, it should be cleared.
> where is the bit set? Why don't clear it after the operation but clear it in
> set_readonly?
>   

it goes two paths after set MD_CLOSING bit, do_md_stop and md_set_readonly,
the do_md_stop has done the md_clean to clear the mddev->flags, thus I 
put it in
md_set_readonly.
Thanks for your suggestion, is the following changing good for you? here 
it has
finished what it needs to do, so clear the MD_CLOSING bit in time.
if looks good, I would send this in new revision patch.

diff --git a/drivers/md/md.c b/drivers/md/md.c
index f6ae1d6..e8c1130 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -6868,6 +6868,7 @@ static int md_ioctl(struct block_device *bdev, 
fmode_t mode,
                 set_bit(MD_CLOSING, &mddev->flags);
                 mutex_unlock(&mddev->open_mutex);
                 sync_blockdev(bdev);
+               clear_bit(MD_CLOSING, &mddev->flags);
         }
         err = mddev_lock(mddev);
         if (err) {

Thanks,
-Zhilong
>> Reviewed-by: NeilBrown <neilb@suse.com>
>> Cc: Guoqing Jiang <gqjiang@suse.com>
>> Signed-off-by: Zhilong Liu <zlliu@suse.com>
>> ---
>>   drivers/md/md.c | 1 +
>>   1 file changed, 1 insertion(+)
>>
>> diff --git a/drivers/md/md.c b/drivers/md/md.c
>> index f6ae1d6..7f2db7c 100644
>> --- a/drivers/md/md.c
>> +++ b/drivers/md/md.c
>> @@ -5588,6 +5588,7 @@ static int md_set_readonly(struct mddev *mddev, struct block_device *bdev)
>>   	int err = 0;
>>   	int did_freeze = 0;
>>   
>> +	test_and_clear_bit(MD_CLOSING, &mddev->flags);
> I don't understand why this must be a test_and_clear.
>
> Thanks,
> Shaohua
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>


^ permalink raw reply related

* Re: [PATCH] Revert "md: raid1: use bio helper in process_checks()"
From: Arnd Bergmann @ 2017-03-28 13:20 UTC (permalink / raw)
  To: Ming Lei
  Cc: Shaohua Li, NeilBrown, Jens Axboe, colyli@suse.de, Guoqing Jiang,
	Mike Christie, open list:SOFTWARE RAID (Multiple Disks) SUPPORT,
	Linux Kernel Mailing List
In-Reply-To: <CACVXFVNzGV+P9CXcyqcV0_rJVM9F7dOR3_aX2xv0GQTsMaXX6A@mail.gmail.com>

On Tue, Mar 28, 2017 at 1:42 PM, Ming Lei <tom.leiming@gmail.com> wrote:
> On Tue, Mar 28, 2017 at 7:35 PM, Arnd Bergmann <arnd@arndb.de> wrote:
>> On Tue, Mar 28, 2017 at 12:44 PM, Ming Lei <tom.leiming@gmail.com> wrote:
>>> On Tue, Mar 28, 2017 at 5:49 PM, Arnd Bergmann <arnd@arndb.de> wrote:
>>>> Commit 60928a91b0b3 ("md: raid1: use bio helper in process_checks()")
>>>> is probably correct, but I get a new compile-time warning after
>>>> it, and have trouble understanding what it fixes:
>>>>
>>>> drivers/md/raid1.c: In function 'sync_request_write':
>>>> drivers/md/raid1.c:2172:9: error: 'page_len$' may be used uninitialized in this function [-Werror=maybe-uninitialized]
>>>>      if (memcmp(page_address(ppages[j]),
>>>>          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>>          page_address(spages[j]),
>>>>          ~~~~~~~~~~~~~~~~~~~~~~~~
>>>>          page_len[j]))
>>>>          ~~~~~~~~~~~~
>>>> drivers/md/raid1.c:2160:7: note: 'page_len$' was declared here
>>>>    int page_len[RESYNC_PAGES];
>>>>        ^~~~~~~~
>>>>
>>>> This reverts it to resolve the warning.
>>>
>>> Please try the following patch:
>>>
>>>  https://lkml.org/lkml/2017/3/28/126
>>
>>
>> That patch will certainly shut up the warning, but will also prevent
>> the compiler from warning when the function gets changed in some
>> way that actually leads to an uninitialized use of the page_len array,
>
> Why do you think that it leads to an uninitialized use of the page_len array?

What I meant is that a future change to the function might cause
another bug to go unnoticed later.

> The following code does initialize the array well enough for future use:
>
>                bio_for_each_segment_all(bi, sbio, j)
>                        page_len[j] = bi->bv_len;
>
> That is why we don't need to initialize the array explicitly, but just
> for killing the warning.

It's also a little less clear why that is safe than the original code:
We rely on sbio->bi_vcnt to be the same as vcnt, but it requires
careful reading of the function to see that this is always true.
gcc warns because it cannot prove this to be the case, so if
something changed here, it's likely that this would also not
get noticed.

     Arnd

^ permalink raw reply

* Re: constant array_state active after specific jobs
From: pdi @ 2017-03-28 13:44 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid, Shaohua Li
In-Reply-To: <87lgrr8xt6.fsf@notabene.neil.brown.name>

On Mon, 27 Mar 2017 09:42:29 +1100
NeilBrown <neilb@suse.com> wrote:

> On Fri, Mar 24 2017, pdi wrote:
> 
> > On Fri, 24 Mar 2017 16:25:35 +1100
> > NeilBrown <neilb@suse.com> wrote:
> >  
> >> On Thu, Mar 23 2017, pdi wrote:
> >>   
> >> > Greetings all,
> >> >
> >> > The problem in a nutshell is that an array is clean after boot,
> >> > until some specific jobs switch it to active where it remains
> >> > until reboot.
> >> >
> >> > A similar problem was discussed, and solved, in 
> >> > https://www.spinics.net/lists/raid/msg46450.html. However,
> >> > AFAICT, it is not the same issue.
> >> >
> >> > I would be grateful for any insights as to why this happens
> >> > and/or how to prevent it.
> >> >
> >> > The relevant info follows, please let me know if anything further
> >> > might help.
> >> >
> >> > Many thanks in advance.
> >> >
> >> > - uname -a
> >> >   Linux hostname 4.4.38 #1 SMP Sun Dec 11 16:03:41 CST 2016
> >> > x86_64 Intel(R) Core(TM)2 Duo CPU E8400 @ 3.00GHz GenuineIntel
> >> > GNU/Linux
> >> > - mdadm -V
> >> >   mdadm - v3.3.4 - 3rd August 2015
> >> > - Desktop drives without sct/erc,
> >> >   with timeout mismatch correction as per
> >> >   https://raid.wiki.kernel.org/index.php/Timeout_Mismatch
> >> > - /dev/md9 is a raid10 array, 4 devices, far=2,
> >> >   with various dirs used as samba and nfs shares
> >> > - The array is in *constant* array_state active
> >> > - mdadm -D /dev/md9 | grep 'State :'
> >> >   State : active
> >> > - cat /sys/block/md9/md/array_state
> >> >   active
> >> > - watch -d 'grep md9 /proc/diskstats'
> >> >   remain unchanged
> >> > - uptime
> >> >   load average: 0.00, 0.00, 0.00
> >> > - cat /sys/block/md9/md/safe_mode_delay
> >> >   0.201
> >> > - echo 0.1 > /sys/block/md9/md/safe_mode_delay
> >> >   array_state remains active
> >> > - echo clean > /sys/block/md9/md/array_state
> >> >   echo: write error: Device or resource busy
> >> > - reboot (with or without prior check)
> >> >   array_state clean
> >> > - After reboot, array remains clean until some specific
> >> >   jobs put it in constant active state. Such jobs so far
> >> >   identified:
> >> >   - echo check > /sys/block/md9/md/sync_action
> >> >   - run an rsnapshot job
> >> >   - start a qemu/kvm vm
> >> > - Other jobs, like text/doc editing, multimedia playback,
> >> >   etc retain array_state clean    
> >> 
> >> This bug was introduced by
> >> Commit: 20d0189b1012 ("block: Introduce new bio_split()")
> >> in 3.14, and fixed by
> >> Commit: 9b622e2bbcf0 ("raid10: increment write counter after bio is
> >> split") in 4.8.
> >> 
> >> Maybe the latter patch should be sent to -stable ??
> >> 
> >> NeilBrown  
> >
> > NeilBrown, thank you for your swift and concise answer.
> >
> > I gather you are referring to kernel version numbers. The described
> > behaviour was first noticed many months ago with kernel 2.6.37.6,
> > and persisted after a system upgrade and kernel 4.4.38. However,
> > after the upgrade two things were corrected, the timeout mismatch,
> > and a Current_Pending_Sector in one of the drives; which may, or
> > may not, explain the occurrence with the older kernel.
> >
> > Is this constant active state in the data array something to worry
> > about and try kernel >= 4.8, or shall I let be?  
> 
> The only important consequence of the constant active state is that if
> your machine crashes at a moment when the array would otherwise have
> been idle, then a resync will be needed after reboot.  Without the
> constant active state, that resync would not have been needed.
> 
> If you have a write-intent bitmap, this is not particularly relevant.
> 
> I cannot say how important it is to you to avoid a resync after a
> crash, so I don't know if you should just let it be or not.
> 
> NeilBrown

NeilBrown,

Thank you for your clear explanation.

Best regards,
pdi



^ permalink raw reply

* Re: md/raid5: report one failure scenario about growing array from mirror to level 5
From: Zhilong Liu @ 2017-03-28 13:45 UTC (permalink / raw)
  To: Shaohua Li; +Cc: NeilBrown, linux-raid@vger.kernel.org
In-Reply-To: <20170324155145.boo2h7tgawmujdqx@kernel.org>


On 03/24/2017 11:51 PM, Shaohua Li wrote:
> On Thu, Mar 16, 2017 at 10:14:04PM +0800, Zhilong Liu wrote:
>> hi, list;
>>
>>    I just trace the code of the following scenarios, grow array from mirror
>>    to level 5,  it would be stuck at the beginning of reshape_progress when
>>    using the loop devices, the same testing steps work well via to hard disk
>>    devices.  refer to the detail steps as follow.
>>    Is this issue only happened on my site?
>>
>> *Steps for loop devices:*
>> 1). create one mirror array via to mdadm source code.
>> #  cd mdadm/
>> #  ./test setup
>> #  ./mdadm -CR /dev/md0 --level 1 -b internal -n2 /dev/loop[0-1]
>> #  dmesg -c,    clean the dmesg.
>> 2). triggers the reshape_request, then it would be stuck when start
>> reshape_progress.
>> ... ...
>> linux-x4lv:~/mdadm-test # ./mdadm --grow /dev/md0 -l5 -n3 -a /dev/loop2
>> mdadm: level of /dev/md0 changed to raid5
>> mdadm: added /dev/loop2
>> mdadm: Need to backup 128K of critical section..
>>
>> linux-x4lv:~/mdadm-test # cat /proc/mdstat
>> Personalities : [raid1] [raid6] [raid5] [raid4]
>> md0 : active raid5 loop2[2] loop1[1] loop0[0]
>>        19968 blocks super 1.2 level 5, 64k chunk, algorithm 2 [3/3] [UUU]
>>        [>....................]  reshape =  0.0% (1/19968) finish=41.5min
>> speed=0K/sec
>>        bitmap: 0/1 pages [0KB], 65536KB chunk
>>
>> unused devices: <none>
>>
>> linux-x4lv:~/mdadm-test # dmesg -c
>> [111544.283359] md/raid:md0: device loop1 operational as raid disk 1
>> [111544.283362] md/raid:md0: device loop0 operational as raid disk 0
>> [111544.296161] md/raid:md0: raid level 5 active with 2 out of 2 devices,
>> algorithm 2
>> [111544.296178] md/raid456: discard support disabled due to uncertainty.
>> [111544.296178] Set raid456.devices_handle_discard_safely=Y to override.
>> [111544.932238] md: reshape of RAID array md0
>>
>> *Steps for hard disks:*
>> # ./mdadm --grow /dev/md0 -l5 -n3  -a /dev/sdd
>> mdadm: level of /dev/md0 changed to raid5
>> mdadm: added /dev/sdd
>> mdadm: Need to backup 128K of critical section..
>>
>> # cat /proc/mdstat
>> Personalities : [raid1] [raid0] [raid6] [raid5] [raid4]
>> md0 : active raid5 sdd[2] sdc[1] sdb[0]
>>        102272 blocks super 1.2 level 5, 64k chunk, algorithm 2 [3/3] [UUU]
>>        [====>................]  reshape = 22.5% (23168/102272) finish=0.5min
>> speed=2316K/sec
>>        bitmap: 0/1 pages [0KB], 65536KB chunk
>>
>> unused devices: <none>
> can't reproduce locally. does this only exist in specific mdadm/kernel version?

I have noticed that it causes by the failure of 
"mdadm-grow-continue@%s.service",
I have sent a patch to mailing-list named:
mdadm/grow: reshape would be stuck from raid1 to raid5

I tested it under built the latest linux source code(4.11.0-rc4) and 
latest mdadm.
I would continue to look at this issue.

Thanks,
-Zhilong
> Thanks,
> Shaohua
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>


^ permalink raw reply

* [PATCH v1] mdadm/Build:check the level parameter when build new array
From: Zhilong Liu @ 2017-03-28 13:52 UTC (permalink / raw)
  To: Jes.Sorensen; +Cc: linux-raid, Zhilong Liu
In-Reply-To: <wrfjshlyidnx.fsf@gmail.com>

check if user forgets to specify the --level
when build a new array. such as:
./mdadm -B /dev/md0 -n2 /dev/loop[0-1]

Signed-off-by: Zhilong Liu <zlliu@suse.com>
---
 Build.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/Build.c b/Build.c
index 74a440e..a5fcc06 100644
--- a/Build.c
+++ b/Build.c
@@ -56,6 +56,10 @@ int Build(char *mddev, struct mddev_dev *devlist,
 	int uuid[4] = {0,0,0,0};
 	struct map_ent *map = NULL;
 
+	if (s->level == UnSet) {
+		pr_err("a RAID level is needed to Build an array.\n");
+		return 1;
+	}
 	/* scan all devices, make sure they really are block devices */
 	for (dv = devlist; dv; dv=dv->next) {
 		subdevs++;
-- 
2.10.2


^ permalink raw reply related

* Re: [PATCH] mdadm/grow: reshape would be stuck from raid1 to raid5
From: Liu Zhilong @ 2017-03-28 14:03 UTC (permalink / raw)
  To: jes.sorensen; +Cc: linux-raid, Harald Hoyer, shli
In-Reply-To: <wrfjk27aid6n.fsf@gmail.com>

On 03/28/2017 06:10 AM, jes.sorensen@gmail.com wrote:
> Zhilong Liu <zlliu@suse.com> writes:
>> it would be stuck at the beginning of reshape progress
>> when grows array from raid1 to raid5, correct the name
>> of mdadm-grow-continue@.service in continue_via_systemd.
>>
>> reproduce steps:
>> ./mdadm -CR /dev/md0 -l1 -b internal -n2 /dev/loop[0-1]
>> ./mdadm --grow /dev/md0 -l5 -n3 -a /dev/loop2
>>
>> Signed-off-by: Zhilong Liu <zlliu@suse.com>
>> ---
>>   Grow.c | 6 ++----
>>   1 file changed, 2 insertions(+), 4 deletions(-)
>>
>> diff --git a/Grow.c b/Grow.c
>> index 455c5f9..10c02a1 100755
>> --- a/Grow.c
>> +++ b/Grow.c
>> @@ -2808,13 +2808,11 @@ static int continue_via_systemd(char *devnm)
>>   		 */
>>   		close(2);
>>   		open("/dev/null", O_WRONLY);
>> -		snprintf(pathbuf, sizeof(pathbuf), "mdadm-grow-continue@%s.service",
>> -			 devnm);
>> +		snprintf(pathbuf, sizeof(pathbuf), "mdadm-grow-continue@.service");
> My memory is rusty here, isn't systemctl interpreting the device name in
> mdadm-grow-continue@<device>.service as an argument?

actually, the service started failed. paste the journalctl log here when 
reshape from mirror to raid5.

command: ./mdadm --grow /dev/md0 -l5 -n3 -a /dev/loop2

Mar 28 21:43:47 linux-g0sr kernel:  --- level:5 rd:3 wd:3
Mar 28 21:43:47 linux-g0sr kernel:  disk 0, o:1, dev:loop0
Mar 28 21:43:47 linux-g0sr kernel:  disk 1, o:1, dev:loop1
Mar 28 21:43:47 linux-g0sr kernel:  disk 2, o:1, dev:loop2
Mar 28 21:43:47 linux-g0sr kernel: md: reshape of RAID array md0
Mar 28 21:43:47 linux-g0sr kernel: md: minimum _guaranteed_  speed: 1000 
KB/sec/disk.
Mar 28 21:43:47 linux-g0sr kernel: md: using maximum available idle IO 
bandwidth (but not more than 2000 KB/sec) for reshape.
Mar 28 21:43:47 linux-g0sr kernel: md: using 128k window, over a total 
of 19968k.
Mar 28 21:43:47 linux-g0sr systemd[1]: Started Manage MD Reshape on 
/dev/md0.
Mar 28 21:43:47 linux-g0sr systemd[1]: mdadm-grow-continue@md0.service: 
Main process exited, code=exited, status=2/INVALIDARGUMENT
Mar 28 21:43:47 linux-g0sr systemd[1]: mdadm-grow-continue@md0.service: 
Unit entered failed state.
Mar 28 21:43:47 linux-g0sr systemd[1]: mdadm-grow-continue@md0.service: 
Failed with result 'exit-code'.
Mar 28 21:44:03 linux-g0sr kernel: md: md0: reshape interrupted.

>
>>   		status = execl("/usr/bin/systemctl", "systemctl",
>>   			       "start",
>>   			       pathbuf, NULL);
>> -		status = execl("/bin/systemctl", "systemctl", "start",
>> -			       pathbuf, NULL);
>> +		pr_err("/usr/bin/systemctl %s got failure\n", pathbuf);
>>   		exit(1);
> This assumes systemctl is location in /usr/bin only - you removed the
> fallback case for it being location in /bin.
>
> In addition, instead of saying 'got failure' lets do something with the
> errno value so the user gets a more descriptive error message.

I would continue to look at this issue, thanks for your time.

Thanks,
-Zhilong
> Cheers,
> Jes
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

^ permalink raw reply

* Re: [PATCH v5 3/7] raid5-ppl: Partial Parity Log write logging implementation
From: Artur Paszkiewicz @ 2017-03-28 14:12 UTC (permalink / raw)
  To: Shaohua Li, NeilBrown; +Cc: shli, linux-raid
In-Reply-To: <20170324164651.hewssoahtcm4sdxh@kernel.org>

On 03/24/2017 05:46 PM, Shaohua Li wrote:
> On Wed, Mar 22, 2017 at 09:00:47AM +1100, Neil Brown wrote:
>> On Thu, Mar 09 2017, Artur Paszkiewicz wrote:
>>
>>> Implement the calculation of partial parity for a stripe and PPL write
>>> logging functionality. The description of PPL is added to the
>>> documentation. More details can be found in the comments in raid5-ppl.c.
>>>
>>> Attach a page for holding the partial parity data to stripe_head.
>>> Allocate it only if mddev has the MD_HAS_PPL flag set.
>>>
>>> Partial parity is the xor of not modified data chunks of a stripe and is
>>> calculated as follows:
>>>
>>> - reconstruct-write case:
>>>   xor data from all not updated disks in a stripe
>>>
>>> - read-modify-write case:
>>>   xor old data and parity from all updated disks in a stripe
>>>
>>> Implement it using the async_tx API and integrate into raid_run_ops().
>>> It must be called when we still have access to old data, so do it when
>>> STRIPE_OP_BIODRAIN is set, but before ops_run_prexor5(). The result is
>>> stored into sh->ppl_page.
>>>
>>> Partial parity is not meaningful for full stripe write and is not stored
>>> in the log or used for recovery, so don't attempt to calculate it when
>>> stripe has STRIPE_FULL_WRITE.
>>>
>>> Put the PPL metadata structures to md_p.h because userspace tools
>>> (mdadm) will also need to read/write PPL.
>>>
>>> Warn about using PPL with enabled disk volatile write-back cache for
>>> now. It can be removed once disk cache flushing before writing PPL is
>>> implemented.
>>>
>>> Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
>>
>> Sorry for the delay in getting to this for review...
>>
>>> +static struct ppl_io_unit *ppl_new_iounit(struct ppl_log *log,
>>> +					  struct stripe_head *sh)
>>> +{
>>> +	struct ppl_conf *ppl_conf = log->ppl_conf;
>>> +	struct ppl_io_unit *io;
>>> +	struct ppl_header *pplhdr;
>>> +
>>> +	io = mempool_alloc(ppl_conf->io_pool, GFP_ATOMIC);
>>> +	if (!io)
>>> +		return NULL;
>>> +
>>> +	memset(io, 0, sizeof(*io));
>>> +	io->log = log;
>>> +	INIT_LIST_HEAD(&io->log_sibling);
>>> +	INIT_LIST_HEAD(&io->stripe_list);
>>> +	atomic_set(&io->pending_stripes, 0);
>>> +	bio_init(&io->bio, io->biovec, PPL_IO_INLINE_BVECS);
>>> +
>>> +	io->header_page = mempool_alloc(ppl_conf->meta_pool, GFP_NOIO);
>>
>> I'm trying to understand how these two mempool_alloc()s relate, and
>> particularly why the first one needs to be GFP_ATOMIC, while the second
>> one can safely be GFP_NOIO.
>> I see that the allocated memory is freed in different places:
>> header_page is called from the bi_endio function as soon as the write
>> completes, while 'io' is freed later.  But I'm not sure that is enough
>> to make it safe.
>>
>> When working with mempools, you need to assume that the pool only
>> contains one element, and that every time you call mempool_alloc(), it
>> waits for that one element to be available.  While that doesn't usually
>> happen, it is possible and if that case isn't handled correctly, the
>> system can deadlock.
>>
>> If no memory is available when this mempool_alloc() is called, it will
>> block.  As it is called from the raid5d thread, the whole array will
>> block.  So this can only complete safely is the write request has
>> already been submitted - or if there is some other workqueue which
>> submit requests after a timeout or similar.
>> I don't see that in the code.  These ppl_io_unit structures can queue up
>> and are only submitted later by raid5d (I think).  So if raid5d waits
>> for one to become free, it will wait forever.
>>
>> One easy way around this problem (assuming my understanding is correct)
>> is to just have a single mempool which allocates both a struct
>> ppl_io_unit and a page.  You would need to define you own alloc/free
>> routines for the pool but that is easy enough.
>>
>> Then you only need a single mempool_alloc(), which can sensibly be
>> GFP_ATOMIC.
>> If that fails, you queue for later handling as you do now.  If it
>> succeeds, then you continue to use the memory without any risk of
>> deadlocking.
> 
> Maybe Artur is following the raid5-cache code, which uses GFP_ATOMIC with
> commit: 5036c39(raid5: allow r5l_io_unit allocations to fail). A single pool
> does make sense.

That's right, I based this on raid5-cache code. I like the approach that
Neil proposed, I'll test it some more and probably send a patch soon. Do
you think GFP_ATOMIC is necessary here or would GFP_NOWAIT be enough?

Thanks,
Artur

^ permalink raw reply

* Re: [PATCH] Fix dead URLs to ftp.kernel.org
From: Jiri Kosina @ 2017-03-28 14:20 UTC (permalink / raw)
  To: SeongJae Park
  Cc: rml, axboe, shli, raven, yamada.masahito, mmarek, linux-raid,
	autofs, linux-kbuild, linux-kernel
In-Reply-To: <20170327124406.6910-1-sj38.park@gmail.com>

On Mon, 27 Mar 2017, SeongJae Park wrote:

> URLs to ftp.kernel.org are still exist though the service is closed [0].
> This commit fixes the URLs to use www.kernel.org instead.
> 
> [0] https://www.kernel.org/shutting-down-ftp-services.html
> 
> Signed-off-by: SeongJae Park <sj38.park@gmail.com>

I've applied this to trivial.git#for-next.

-- 
Jiri Kosina
SUSE Labs

--
To unsubscribe from this list: send the line "unsubscribe autofs" in

^ permalink raw reply

* Re: [PATCH] Revert "md: raid1: use bio helper in process_checks()"
From: Ming Lei @ 2017-03-28 15:02 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Shaohua Li, NeilBrown, Jens Axboe, colyli@suse.de, Guoqing Jiang,
	Mike Christie, open list:SOFTWARE RAID (Multiple Disks) SUPPORT,
	Linux Kernel Mailing List
In-Reply-To: <CAK8P3a1yV0M_3TwM2QkEKqL-teKNEcWRXuG4_GT27pM+rFoBPw@mail.gmail.com>

On Tue, Mar 28, 2017 at 9:20 PM, Arnd Bergmann <arnd@arndb.de> wrote:
> On Tue, Mar 28, 2017 at 1:42 PM, Ming Lei <tom.leiming@gmail.com> wrote:
>> On Tue, Mar 28, 2017 at 7:35 PM, Arnd Bergmann <arnd@arndb.de> wrote:
>>> On Tue, Mar 28, 2017 at 12:44 PM, Ming Lei <tom.leiming@gmail.com> wrote:
>>>> On Tue, Mar 28, 2017 at 5:49 PM, Arnd Bergmann <arnd@arndb.de> wrote:
>>>>> Commit 60928a91b0b3 ("md: raid1: use bio helper in process_checks()")
>>>>> is probably correct, but I get a new compile-time warning after
>>>>> it, and have trouble understanding what it fixes:
>>>>>
>>>>> drivers/md/raid1.c: In function 'sync_request_write':
>>>>> drivers/md/raid1.c:2172:9: error: 'page_len$' may be used uninitialized in this function [-Werror=maybe-uninitialized]
>>>>>      if (memcmp(page_address(ppages[j]),
>>>>>          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>>>          page_address(spages[j]),
>>>>>          ~~~~~~~~~~~~~~~~~~~~~~~~
>>>>>          page_len[j]))
>>>>>          ~~~~~~~~~~~~
>>>>> drivers/md/raid1.c:2160:7: note: 'page_len$' was declared here
>>>>>    int page_len[RESYNC_PAGES];
>>>>>        ^~~~~~~~
>>>>>
>>>>> This reverts it to resolve the warning.
>>>>
>>>> Please try the following patch:
>>>>
>>>>  https://lkml.org/lkml/2017/3/28/126
>>>
>>>
>>> That patch will certainly shut up the warning, but will also prevent
>>> the compiler from warning when the function gets changed in some
>>> way that actually leads to an uninitialized use of the page_len array,
>>
>> Why do you think that it leads to an uninitialized use of the page_len array?
>
> What I meant is that a future change to the function might cause
> another bug to go unnoticed later.

What is the future change? And what is another bug? Please don't suppose or
assume anything in future.

BTW, I don't think it is a problem, and anyone who want to change the code
much should understand it first, right?

>
>> The following code does initialize the array well enough for future use:
>>
>>                bio_for_each_segment_all(bi, sbio, j)
>>                        page_len[j] = bi->bv_len;
>>
>> That is why we don't need to initialize the array explicitly, but just
>> for killing the warning.
>
> It's also a little less clear why that is safe than the original code:
> We rely on sbio->bi_vcnt to be the same as vcnt, but it requires

That is absolutely true because all read bios in process_checks()
have same vector number, do you think it will be changed in future?

And what we really rely on is that RESYNC_PAGES is equal to or bigger
than the vector number, and that is what we can guarantee.

> careful reading of the function to see that this is always true.
> gcc warns because it cannot prove this to be the case, so if
> something changed here, it's likely that this would also not
> get noticed.

The compiler can't understand runtime behaviour, and
we try to let gcc check more, but that is just fine if gcc can't.

One big purpose of this patch is to remove direct access to
bvec table, so it can't be reverted, or do you have better idea?


Thanks,
Ming Lei

^ permalink raw reply

* Re: [PATCH] Revert "md: raid1: use bio helper in process_checks()"
From: Arnd Bergmann @ 2017-03-28 15:37 UTC (permalink / raw)
  To: Ming Lei
  Cc: Shaohua Li, NeilBrown, Jens Axboe, colyli@suse.de, Guoqing Jiang,
	Mike Christie, open list:SOFTWARE RAID (Multiple Disks) SUPPORT,
	Linux Kernel Mailing List
In-Reply-To: <CACVXFVMWORnxVtuokRp=N1ELAk4UbkOr0n-hM2MOTbO7HLhLtw@mail.gmail.com>

On Tue, Mar 28, 2017 at 5:02 PM, Ming Lei <tom.leiming@gmail.com> wrote:
> On Tue, Mar 28, 2017 at 9:20 PM, Arnd Bergmann <arnd@arndb.de> wrote:
>> On Tue, Mar 28, 2017 at 1:42 PM, Ming Lei <tom.leiming@gmail.com> wrote:

>> What I meant is that a future change to the function might cause
>> another bug to go unnoticed later.
>
> What is the future change? And what is another bug? Please don't suppose or
> assume anything in future.
>
> BTW, I don't think it is a problem, and anyone who want to change the code
> much should understand it first, right?

I did not have any specific change in mind, I was just following the
principle from https://rusty.ozlabs.org/?p=232

I tend to do a lot of fixes for warnings like this, and in general
adding a fake initialization will lead to worse object code while
changing the code to be easier to understand by the compiler
will also make it easier to understand by human readers, lead
to better object code and to being more robust against future
changes that would have otherwise triggered a correct warning.

>>> The following code does initialize the array well enough for future use:
>>>
>>>                bio_for_each_segment_all(bi, sbio, j)
>>>                        page_len[j] = bi->bv_len;
>>>
>>> That is why we don't need to initialize the array explicitly, but just
>>> for killing the warning.
>>
>> It's also a little less clear why that is safe than the original code:
>> We rely on sbio->bi_vcnt to be the same as vcnt, but it requires
>
> That is absolutely true because all read bios in process_checks()
> have same vector number, do you think it will be changed in future?

No, I have no reason to assume it would be changed.

> And what we really rely on is that RESYNC_PAGES is equal to or bigger
> than the vector number, and that is what we can guarantee.
>
>> careful reading of the function to see that this is always true.
>> gcc warns because it cannot prove this to be the case, so if
>> something changed here, it's likely that this would also not
>> get noticed.
>
> The compiler can't understand runtime behaviour, and
> we try to let gcc check more, but that is just fine if gcc can't.
>
> One big purpose of this patch is to remove direct access to
> bvec table, so it can't be reverted, or do you have better idea?

From what I can tell, the following walk of the pages does not
have to be done in reverse, so we can perhaps use a single
bio_for_each_segment_all() (only for illustration, haven't
thought this through completely) :

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 4d176c8abc33..e4bcc7cec8da 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -2157,25 +2157,30 @@ static void process_checks(struct r1bio *r1_bio)
                int error = sbio->bi_error;
                struct page **ppages = get_resync_pages(pbio)->pages;
                struct page **spages = get_resync_pages(sbio)->pages;
+               struct bio_vec *bi;
+               int page_len[RESYNC_PAGES];
+

                if (sbio->bi_end_io != end_sync_read)
                        continue;
                /* Now we can 'fixup' the error value */
                sbio->bi_error = 0;

-               if (!error) {
-                       for (j = vcnt; j-- ; ) {
-                               if (memcmp(page_address(ppages[j]),
-                                          page_address(spages[j]),
-                                          sbio->bi_io_vec[j].bv_len))
-                                       break;
-                       }
-               } else
-                       j = 0;
-               if (j >= 0)
+               if (error) {
                        atomic64_add(r1_bio->sectors,
&mddev->resync_mismatches);
-               if (j < 0 || (test_bit(MD_RECOVERY_CHECK, &mddev->recovery)
-                             && !error)) {
+                       bio_copy_data(sbio, pbio);
+                       continue;
+               }
+
+               bio_for_each_segment_all(bi, sbio, j) {
+                       if (memcmp(page_address(ppages[j]),
+                                  page_address(spages[j]),
+                                          sbio->bi_io_vec[j].bv_len)) {
+                               atomic64_add(r1_bio->sectors,
&mddev->resync_mismatches);
+                               break;
+                       }
+               }
+               if (j == vcnt || (test_bit(MD_RECOVERY_CHECK,
&mddev->recovery))) {
                        /* No need to write to this device. */
                        sbio->bi_end_io = NULL;
                        rdev_dec_pending(conf->mirrors[i].rdev, mddev);

^ permalink raw reply related

* Re: [PATCH] Revert "md: raid1: use bio helper in process_checks()"
From: Wols Lists @ 2017-03-28 15:43 UTC (permalink / raw)
  To: Ming Lei, Arnd Bergmann
  Cc: Shaohua Li, NeilBrown, Jens Axboe, colyli@suse.de, Guoqing Jiang,
	Mike Christie, open list:SOFTWARE RAID (Multiple Disks) SUPPORT,
	Linux Kernel Mailing List
In-Reply-To: <CACVXFVMWORnxVtuokRp=N1ELAk4UbkOr0n-hM2MOTbO7HLhLtw@mail.gmail.com>

On 28/03/17 16:02, Ming Lei wrote:
>> What I meant is that a future change to the function might cause
>> > another bug to go unnoticed later.

> What is the future change? And what is another bug? Please don't suppose or
> assume anything in future.

What was that about some American General demanding a list of "unknown
unknowns"?
> 
> BTW, I don't think it is a problem, and anyone who want to change the code
> much should understand it first, right?
> 
I'm very sorry, but I think you are assuming facts not in evidence (or
rather, facts that the evidence says are wrong).

In real life, it is normal for people to change things without
understanding them. Are you saying that *you* - a couple of years down
the line - will remember this bit of code, and will block a mistaken patch?

What Arnd is doing is commonly called "defensive programming", and
unfortunately reality shows us that it is usually worth its weight in
gold. That's why you put ASSERTs in code - so that if somebody does
something stupid by accident, it blows up. This is just more of the same.

Cheers,
Wol

^ permalink raw reply

* Re: [PATCH] md:array cannot be opened again after 'md_set_readonly'
From: Shaohua Li @ 2017-03-28 16:12 UTC (permalink / raw)
  To: Zhilong Liu; +Cc: neilb, shli, linux-raid, Guoqing Jiang
In-Reply-To: <0c5289f3-0160-8d2d-a5d8-da126b2fb468@suse.com>

On Tue, Mar 28, 2017 at 09:19:12PM +0800, Zhilong Liu wrote:
> 
> On 03/28/2017 02:22 AM, Shaohua Li wrote:
> > On Mon, Mar 27, 2017 at 03:52:25PM +0800, Zhilong Liu wrote:
> > > This is a bug about array cannot be opened again after 'md_set_readonly',
> > > because the MD_CLOSING bit is still waiting for clear.
> > > MD_CLOSING should only be set for a short period or time to avoid certain
> > > races. After the operation that set it completes, it should be cleared.
> > where is the bit set? Why don't clear it after the operation but clear it in
> > set_readonly?
> 
> it goes two paths after set MD_CLOSING bit, do_md_stop and md_set_readonly,
> the do_md_stop has done the md_clean to clear the mddev->flags, thus I put
> it in
> md_set_readonly.
> Thanks for your suggestion, is the following changing good for you? here it
> has
> finished what it needs to do, so clear the MD_CLOSING bit in time.
> if looks good, I would send this in new revision patch.
> 
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index f6ae1d6..e8c1130 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -6868,6 +6868,7 @@ static int md_ioctl(struct block_device *bdev, fmode_t
> mode,
>                 set_bit(MD_CLOSING, &mddev->flags);
>                 mutex_unlock(&mddev->open_mutex);
>                 sync_blockdev(bdev);
> +               clear_bit(MD_CLOSING, &mddev->flags);
>         }
>         err = mddev_lock(mddev);
>         if (err) {

It would be better done at the end of set_readonly to guarantee set ro done.

Thanks,
Shaohua
> Thanks,
> -Zhilong
> > > Reviewed-by: NeilBrown <neilb@suse.com>
> > > Cc: Guoqing Jiang <gqjiang@suse.com>
> > > Signed-off-by: Zhilong Liu <zlliu@suse.com>
> > > ---
> > >   drivers/md/md.c | 1 +
> > >   1 file changed, 1 insertion(+)
> > > 
> > > diff --git a/drivers/md/md.c b/drivers/md/md.c
> > > index f6ae1d6..7f2db7c 100644
> > > --- a/drivers/md/md.c
> > > +++ b/drivers/md/md.c
> > > @@ -5588,6 +5588,7 @@ static int md_set_readonly(struct mddev *mddev, struct block_device *bdev)
> > >   	int err = 0;
> > >   	int did_freeze = 0;
> > > +	test_and_clear_bit(MD_CLOSING, &mddev->flags);
> > I don't understand why this must be a test_and_clear.
> > 
> > Thanks,
> > Shaohua
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > 
> 

^ permalink raw reply

* Re: [PATCH 01/23] block: renumber REQ_OP_WRITE_ZEROES
From: Bart Van Assche @ 2017-03-28 16:12 UTC (permalink / raw)
  To: agk@redhat.com, lars.ellenberg@linbit.com, snitzer@redhat.com,
	hch@lst.de, martin.petersen@oracle.com,
	philipp.reisner@linbit.com, axboe@kernel.dk, shli@kernel.org
  Cc: linux-scsi@vger.kernel.org, dm-devel@redhat.com,
	drbd-dev@lists.linbit.com, linux-block@vger.kernel.org,
	linux-raid@vger.kernel.org
In-Reply-To: <20170323143341.31549-2-hch@lst.de>

On Thu, 2017-03-23 at 10:33 -0400, Christoph Hellwig wrote:
> Make life easy for implementations that needs to send a data buffer
> to the device (e.g. SCSI) by numbering it as a data out command.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  include/linux/blk_types.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
> index d703acb55d0f..6393c13a6498 100644
> --- a/include/linux/blk_types.h
> +++ b/include/linux/blk_types.h
> @@ -160,7 +160,7 @@ enum req_opf {
>  	/* write the same sector many times */
>  	REQ_OP_WRITE_SAME	= 7,
>  	/* write the zero filled sector many times */
> -	REQ_OP_WRITE_ZEROES	= 8,
> +	REQ_OP_WRITE_ZEROES	= 9,
>  
>  	/* SCSI passthrough using struct scsi_request */
>  	REQ_OP_SCSI_IN		= 32,

Hello Christoph,

Since REQ_OP_WRITE_ZEROES was introduced in kernel v4.10, do we need
"Cc: stable" and "Fixes: a6f0788ec2881" tags for this patch?

Thanks,

Bart.

^ permalink raw reply

* Re: [PATCH] Revert "md: raid1: use bio helper in process_checks()"
From: Ming Lei @ 2017-03-28 16:13 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Shaohua Li, NeilBrown, Jens Axboe, colyli@suse.de, Guoqing Jiang,
	Mike Christie, open list:SOFTWARE RAID (Multiple Disks) SUPPORT,
	Linux Kernel Mailing List
In-Reply-To: <CAK8P3a0NfY56C_L+ds2VOXFtdT--WrZ=CoMMmd1JTM84KoYs0A@mail.gmail.com>

On Tue, Mar 28, 2017 at 11:37 PM, Arnd Bergmann <arnd@arndb.de> wrote:
> On Tue, Mar 28, 2017 at 5:02 PM, Ming Lei <tom.leiming@gmail.com> wrote:
>> On Tue, Mar 28, 2017 at 9:20 PM, Arnd Bergmann <arnd@arndb.de> wrote:
>>> On Tue, Mar 28, 2017 at 1:42 PM, Ming Lei <tom.leiming@gmail.com> wrote:
>
>>> What I meant is that a future change to the function might cause
>>> another bug to go unnoticed later.
>>
>> What is the future change? And what is another bug? Please don't suppose or
>> assume anything in future.
>>
>> BTW, I don't think it is a problem, and anyone who want to change the code
>> much should understand it first, right?
>
> I did not have any specific change in mind, I was just following the
> principle from https://rusty.ozlabs.org/?p=232
>
> I tend to do a lot of fixes for warnings like this, and in general
> adding a fake initialization will lead to worse object code while
> changing the code to be easier to understand by the compiler
> will also make it easier to understand by human readers, lead
> to better object code and to being more robust against future
> changes that would have otherwise triggered a correct warning.

What gcc can't understand is the following situation:

1) int a[N];

2) for(i = 0; i < vcnt1; i++)
       a[i] = b[i];

3) for (i = 0; i < vcnt2; i++)
        c[i] = a[i];

The warning is in 3), since gcc may think vcnt2 isn't same with vcnt1, but as
I mentioned that two number is absolutely same, and shouldn't be
changed in future.

That is why I suggest to fix the warning via a[N] = {0}.

>
>>>> The following code does initialize the array well enough for future use:
>>>>
>>>>                bio_for_each_segment_all(bi, sbio, j)
>>>>                        page_len[j] = bi->bv_len;
>>>>
>>>> That is why we don't need to initialize the array explicitly, but just
>>>> for killing the warning.
>>>
>>> It's also a little less clear why that is safe than the original code:
>>> We rely on sbio->bi_vcnt to be the same as vcnt, but it requires
>>
>> That is absolutely true because all read bios in process_checks()
>> have same vector number, do you think it will be changed in future?
>
> No, I have no reason to assume it would be changed.
>
>> And what we really rely on is that RESYNC_PAGES is equal to or bigger
>> than the vector number, and that is what we can guarantee.
>>
>>> careful reading of the function to see that this is always true.
>>> gcc warns because it cannot prove this to be the case, so if
>>> something changed here, it's likely that this would also not
>>> get noticed.
>>
>> The compiler can't understand runtime behaviour, and
>> we try to let gcc check more, but that is just fine if gcc can't.
>>
>> One big purpose of this patch is to remove direct access to
>> bvec table, so it can't be reverted, or do you have better idea?
>
> From what I can tell, the following walk of the pages does not
> have to be done in reverse, so we can perhaps use a single
> bio_for_each_segment_all() (only for illustration, haven't
> thought this through completely) :
>
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index 4d176c8abc33..e4bcc7cec8da 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -2157,25 +2157,30 @@ static void process_checks(struct r1bio *r1_bio)
>                 int error = sbio->bi_error;
>                 struct page **ppages = get_resync_pages(pbio)->pages;
>                 struct page **spages = get_resync_pages(sbio)->pages;
> +               struct bio_vec *bi;
> +               int page_len[RESYNC_PAGES];
> +
>
>                 if (sbio->bi_end_io != end_sync_read)
>                         continue;
>                 /* Now we can 'fixup' the error value */
>                 sbio->bi_error = 0;
>
> -               if (!error) {
> -                       for (j = vcnt; j-- ; ) {
> -                               if (memcmp(page_address(ppages[j]),
> -                                          page_address(spages[j]),
> -                                          sbio->bi_io_vec[j].bv_len))
> -                                       break;
> -                       }
> -               } else
> -                       j = 0;
> -               if (j >= 0)
> +               if (error) {
>                         atomic64_add(r1_bio->sectors,
> &mddev->resync_mismatches);
> -               if (j < 0 || (test_bit(MD_RECOVERY_CHECK, &mddev->recovery)
> -                             && !error)) {
> +                       bio_copy_data(sbio, pbio);
> +                       continue;
> +               }
> +
> +               bio_for_each_segment_all(bi, sbio, j) {
> +                       if (memcmp(page_address(ppages[j]),
> +                                  page_address(spages[j]),
> +                                          sbio->bi_io_vec[j].bv_len)) {

The above line should be bi->bv_len, and this way should make the array
not necessary, but still a bit ugly.

I suggest to apply the simple fix first since it is correct, then we
can refactor the code in this way
later.

Thanks,
Ming Lei

^ permalink raw reply

* Re: [PATCH v5 3/7] raid5-ppl: Partial Parity Log write logging implementation
From: Shaohua Li @ 2017-03-28 16:16 UTC (permalink / raw)
  To: Artur Paszkiewicz; +Cc: NeilBrown, shli, linux-raid
In-Reply-To: <ab4ce9c9-46c6-4f5c-3821-964fc7c9744c@intel.com>

On Tue, Mar 28, 2017 at 04:12:37PM +0200, Artur Paszkiewicz wrote:
> On 03/24/2017 05:46 PM, Shaohua Li wrote:
> > On Wed, Mar 22, 2017 at 09:00:47AM +1100, Neil Brown wrote:
> >> On Thu, Mar 09 2017, Artur Paszkiewicz wrote:
> >>
> >>> Implement the calculation of partial parity for a stripe and PPL write
> >>> logging functionality. The description of PPL is added to the
> >>> documentation. More details can be found in the comments in raid5-ppl.c.
> >>>
> >>> Attach a page for holding the partial parity data to stripe_head.
> >>> Allocate it only if mddev has the MD_HAS_PPL flag set.
> >>>
> >>> Partial parity is the xor of not modified data chunks of a stripe and is
> >>> calculated as follows:
> >>>
> >>> - reconstruct-write case:
> >>>   xor data from all not updated disks in a stripe
> >>>
> >>> - read-modify-write case:
> >>>   xor old data and parity from all updated disks in a stripe
> >>>
> >>> Implement it using the async_tx API and integrate into raid_run_ops().
> >>> It must be called when we still have access to old data, so do it when
> >>> STRIPE_OP_BIODRAIN is set, but before ops_run_prexor5(). The result is
> >>> stored into sh->ppl_page.
> >>>
> >>> Partial parity is not meaningful for full stripe write and is not stored
> >>> in the log or used for recovery, so don't attempt to calculate it when
> >>> stripe has STRIPE_FULL_WRITE.
> >>>
> >>> Put the PPL metadata structures to md_p.h because userspace tools
> >>> (mdadm) will also need to read/write PPL.
> >>>
> >>> Warn about using PPL with enabled disk volatile write-back cache for
> >>> now. It can be removed once disk cache flushing before writing PPL is
> >>> implemented.
> >>>
> >>> Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
> >>
> >> Sorry for the delay in getting to this for review...
> >>
> >>> +static struct ppl_io_unit *ppl_new_iounit(struct ppl_log *log,
> >>> +					  struct stripe_head *sh)
> >>> +{
> >>> +	struct ppl_conf *ppl_conf = log->ppl_conf;
> >>> +	struct ppl_io_unit *io;
> >>> +	struct ppl_header *pplhdr;
> >>> +
> >>> +	io = mempool_alloc(ppl_conf->io_pool, GFP_ATOMIC);
> >>> +	if (!io)
> >>> +		return NULL;
> >>> +
> >>> +	memset(io, 0, sizeof(*io));
> >>> +	io->log = log;
> >>> +	INIT_LIST_HEAD(&io->log_sibling);
> >>> +	INIT_LIST_HEAD(&io->stripe_list);
> >>> +	atomic_set(&io->pending_stripes, 0);
> >>> +	bio_init(&io->bio, io->biovec, PPL_IO_INLINE_BVECS);
> >>> +
> >>> +	io->header_page = mempool_alloc(ppl_conf->meta_pool, GFP_NOIO);
> >>
> >> I'm trying to understand how these two mempool_alloc()s relate, and
> >> particularly why the first one needs to be GFP_ATOMIC, while the second
> >> one can safely be GFP_NOIO.
> >> I see that the allocated memory is freed in different places:
> >> header_page is called from the bi_endio function as soon as the write
> >> completes, while 'io' is freed later.  But I'm not sure that is enough
> >> to make it safe.
> >>
> >> When working with mempools, you need to assume that the pool only
> >> contains one element, and that every time you call mempool_alloc(), it
> >> waits for that one element to be available.  While that doesn't usually
> >> happen, it is possible and if that case isn't handled correctly, the
> >> system can deadlock.
> >>
> >> If no memory is available when this mempool_alloc() is called, it will
> >> block.  As it is called from the raid5d thread, the whole array will
> >> block.  So this can only complete safely is the write request has
> >> already been submitted - or if there is some other workqueue which
> >> submit requests after a timeout or similar.
> >> I don't see that in the code.  These ppl_io_unit structures can queue up
> >> and are only submitted later by raid5d (I think).  So if raid5d waits
> >> for one to become free, it will wait forever.
> >>
> >> One easy way around this problem (assuming my understanding is correct)
> >> is to just have a single mempool which allocates both a struct
> >> ppl_io_unit and a page.  You would need to define you own alloc/free
> >> routines for the pool but that is easy enough.
> >>
> >> Then you only need a single mempool_alloc(), which can sensibly be
> >> GFP_ATOMIC.
> >> If that fails, you queue for later handling as you do now.  If it
> >> succeeds, then you continue to use the memory without any risk of
> >> deadlocking.
> > 
> > Maybe Artur is following the raid5-cache code, which uses GFP_ATOMIC with
> > commit: 5036c39(raid5: allow r5l_io_unit allocations to fail). A single pool
> > does make sense.
> 
> That's right, I based this on raid5-cache code. I like the approach that
> Neil proposed, I'll test it some more and probably send a patch soon. Do
> you think GFP_ATOMIC is necessary here or would GFP_NOWAIT be enough?

Since we can handle error, I don't see the reason why we should use GFP_ATOMIC,
GFP_NOWAIT is good to me.

Thanks,
Shaohua

^ permalink raw reply

* Re: [PATCH 12/23] sd: handle REQ_UNMAP
From: Bart Van Assche @ 2017-03-28 16:48 UTC (permalink / raw)
  To: agk-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
	lars.ellenberg-63ez5xqkn6DQT0dZR+AlfA@public.gmane.org,
	snitzer-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
	hch-jcswGhMUV9g@public.gmane.org,
	martin.petersen-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org,
	philipp.reisner-63ez5xqkn6DQT0dZR+AlfA@public.gmane.org,
	axboe-tSWWG44O7X1aa/9Udqfwiw@public.gmane.org,
	shli-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org
  Cc: linux-block-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-raid-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	dm-devel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	drbd-dev-cunTk1MwBs8qoQakbn7OcQ@public.gmane.org
In-Reply-To: <20170323143341.31549-13-hch-jcswGhMUV9g@public.gmane.org>

On Thu, 2017-03-23 at 10:33 -0400, Christoph Hellwig wrote:
> Try to use a write same with unmap bit variant if the device supports it
> and the caller asks for it.
> 
> Signed-off-by: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
> ---
>  drivers/scsi/sd.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index b6f70a09a301..ca96bb33471b 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -871,6 +871,16 @@ static int sd_setup_write_zeroes_cmnd(struct scsi_cmnd *cmd)
>  			return BLKPREP_INVALID;
>  		return sd_setup_ata_trim_cmnd(cmd);
>  	}
> +
> +	if (rq->cmd_flags & REQ_UNMAP) {
> +		switch (sdkp->provisioning_mode) {
> +		case SD_LBP_WS16:
> +			return sd_setup_write_same16_cmnd(cmd, true);
> +		case SD_LBP_WS10:
> +			return sd_setup_write_same10_cmnd(cmd, true);
> +		}
> +	}
> +
>  	if (sdp->no_write_same)
>  		return BLKPREP_INVALID;
>  	if (sdkp->ws16 || sector > 0xffffffff || nr_sectors > 0xffff)

Users can change the provisioning mode from user space from SD_LBP_WS16 into
SD_LBP_WS10 so I'm not sure it's safe to skip the (sdkp->ws16 || sector >
0xffffffff || nr_sectors > 0xffff) check if REQ_UNMAP is set.

Bart.

^ permalink raw reply

* Re: [PATCH 11/23] block_dev: use blkdev_issue_zerout for hole punches
From: Bart Van Assche @ 2017-03-28 16:50 UTC (permalink / raw)
  To: agk@redhat.com, lars.ellenberg@linbit.com, snitzer@redhat.com,
	hch@lst.de, martin.petersen@oracle.com,
	philipp.reisner@linbit.com, axboe@kernel.dk, shli@kernel.org
  Cc: linux-scsi@vger.kernel.org, dm-devel@redhat.com,
	drbd-dev@lists.linbit.com, linux-block@vger.kernel.org,
	linux-raid@vger.kernel.org
In-Reply-To: <20170323143341.31549-12-hch@lst.de>

On Thu, 2017-03-23 at 10:33 -0400, Christoph Hellwig wrote:
> This gets us support for non-discard efficient write of zeroes (e.g. NVMe)
> and preparse for removing the discard_zeroes_data flag.

Hello Christoph,

"preparse" probably should have been "prepare"?

Thanks,

Bart.

^ permalink raw reply

* Re: [PATCH 23/23] block: remove the discard_zeroes_data flag
From: Bart Van Assche @ 2017-03-28 17:00 UTC (permalink / raw)
  To: agk@redhat.com, lars.ellenberg@linbit.com, snitzer@redhat.com,
	hch@lst.de, martin.petersen@oracle.com,
	philipp.reisner@linbit.com, axboe@kernel.dk, shli@kernel.org
  Cc: linux-scsi@vger.kernel.org, dm-devel@redhat.com,
	drbd-dev@lists.linbit.com, linux-block@vger.kernel.org,
	linux-raid@vger.kernel.org
In-Reply-To: <20170323143341.31549-24-hch@lst.de>

On Thu, 2017-03-23 at 10:33 -0400, Christoph Hellwig wrote:
> Now that we use the proper REQ_OP_WRITE_ZEROES operation everywhere we can
> kill this hack.
> 
> [ ... ]
>
> diff --git a/Documentation/ABI/testing/sysfs-block b/Documentation/ABI/testing/sysfs-block
> index 2da04ce6aeef..dea212db9df3 100644
> --- a/Documentation/ABI/testing/sysfs-block
> +++ b/Documentation/ABI/testing/sysfs-block
> @@ -213,14 +213,8 @@ What:		/sys/block/<disk>/queue/discard_zeroes_data
>  Date:		May 2011
>  Contact:	Martin K. Petersen <martin.petersen@oracle.com>
>  Description:
> -		Devices that support discard functionality may return
> -		stale or random data when a previously discarded block
> -		is read back. This can cause problems if the filesystem
> -		expects discarded blocks to be explicitly cleared. If a
> -		device reports that it deterministically returns zeroes
> -		when a discarded area is read the discard_zeroes_data
> -		parameter will be set to one. Otherwise it will be 0 and
> -		the result of reading a discarded area is undefined.
> +		Will always return 0.  Don't rely on any specific behavior
> +		for discards, and don't read this file.
>  
>  What:		/sys/block/<disk>/queue/write_same_max_bytes
>  Date:		January 2012
>
> [ ... ]
>
> --- a/block/blk-sysfs.c
> +++ b/block/blk-sysfs.c
> @@ -208,7 +208,7 @@ static ssize_t queue_discard_max_store(struct request_queue *q,
>  
>  static ssize_t queue_discard_zeroes_data_show(struct request_queue *q, char *page)
>  {
> -	return queue_var_show(queue_discard_zeroes_data(q), page);
> +	return 0;
>  }

Hello Christoph,

It seems to me like the documentation in Documentation/ABI/testing/sysfs-block
and the above code are not in sync. I think the above code will cause reading
from the discard_zeroes_data attribute to return an empty string ("") instead
of "0\n".

BTW, my personal preference is to remove this attribute entirely because keeping
it will cause confusion, no matter how well we document the behavior of this
attribute.

Thanks,

Bart.

^ permalink raw reply

* Re: [PATCH v3] mdadm/r5cache: allow adding journal to array without journal
From: jes.sorensen @ 2017-03-28 17:52 UTC (permalink / raw)
  To: Song Liu; +Cc: linux-raid, shli, neilb, kernel-team, dan.j.williams, hch
In-Reply-To: <20170317233111.2452831-2-songliubraving@fb.com>

Song Liu <songliubraving@fb.com> writes:
> Currently, --add-journal can be only used to recreate broken journal
> for arrays with journal since  creation. As the kernel code getting
> more mature, this constraint is no longer necessary.
>
> This patch allows --add-journal to add journal to array without
> journal.
>
> Signed-off-by: Song Liu <songliubraving@fb.com>
> ---
>  Manage.c   | 9 ---------
>  mdadm.8.in | 5 ++---
>  2 files changed, 2 insertions(+), 12 deletions(-)
>
> diff --git a/Manage.c b/Manage.c
> index 5c3d2b9..d038b68 100644
> --- a/Manage.c
> +++ b/Manage.c
> @@ -946,7 +946,6 @@ int Manage_add(int fd, int tfd, struct mddev_dev *dv,
>  
>  	/* only add journal to array that supports journaling */
>  	if (dv->disposition == 'j') {
> -		struct mdinfo mdi;
>  		struct mdinfo *mdp;
>  
>  		mdp = sysfs_read(fd, NULL, GET_ARRAY_STATE);
> @@ -960,14 +959,6 @@ int Manage_add(int fd, int tfd, struct mddev_dev *dv,
>  			pr_err("%s is not readonly, cannot add journal.\n", devname);
>  			return -1;
>  		}
> -
> -		sysfs_free(mdp);
> -

Song,

Sorry for nagging you again on this one, however removing
sysfs_free(mdp) means you are leaking mbp here.

Cheers,
Jes

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox