Linux RAID subsystem development

Linux RAID subsystem development
 help / color / mirror / Atom feed

* Re: Using mdadm --grow to resize a RAID1
From: Ben Kamen @ 2017-01-02 17:59 UTC (permalink / raw)
  To: Linux-RAID
In-Reply-To: <e37c42da-71f5-e83e-4e9f-2ec3961a14a0@gmail.com>

On Mon, Jan 2, 2017 at 10:38 AM, Benjammin2068 <benjammin2068@gmail.com> wrote:
>
>
> Yep -- sizes are off and --rereadpt returns "device busy".
>
> Be right back....
>

ok. back.

the block sizes are the same...

There's still no bitmap active (from my turning it off previously)

So I issued the command again to expand the size... and my system has
been acting weird... my access to /proc/mdstat gets stalled if I'm
watching it..

The grow command "mdadm --grow /dev/md1 --size=max" seems to be
"blocking". I don't get back a prompt to enter the --wait command (and
the final bitmap command)

I can't even CTRL-C out of it. I had to reboot once already and the
filesystem comes back up as it was. But "reboot" even gets stuck. I
have to eventually reset.

Sounds like I have to do this from single user mode or an boot disk/CD?

 -Ben

^ permalink raw reply

* Urgent Please;;
From: Dr. David White @ 2017-01-02 17:34 UTC (permalink / raw)


Dear,
With due respect to your person and much sincerity of purpose. I have a business proposal which I will like to handle with you and $14.5 Million USD is involve. But be rest assured that everything is legal and risk free as I have concluded all the arrangements and the legal papers that will back the transaction up. Kindly indicate your interest as to enable me tell you more detail of the proposal. Waiting for your urgent response.

Yours Faithfully,
Dr. David White

^ permalink raw reply

* Re: Using mdadm --grow to resize a RAID1
From: Benjammin2068 @ 2017-01-02 16:38 UTC (permalink / raw)
  To: Linux-RAID
In-Reply-To: <20170102212412.19832496@natsu>



On 01/02/2017 10:24 AM, Roman Mamedov wrote:
> On Mon, 2 Jan 2017 09:55:34 -0600
> Benjammin2068 <benjammin2068@gmail.com> wrote:
>
>>
>> On 01/02/2017 06:19 AM, Wols Lists wrote:
>>> On 02/01/17 10:18, Benjammin2068 wrote:
>>>> I get stuck at the --size max.
>>>>
>>>> it seems the correct syntax is "--size=" but that "max" is not supported as an argument.
> First you say it's unsupported as an argument, but then actually 
>
>>> [root@quantum ~]# mdadm --grow /dev/md1 --size max
>>> mdadm: component size of /dev/md1 unchanged at 239490048K
>>> [root@quantum ~]# mdadm --grow /dev/md1 --size=max
>>> mdadm: component size of /dev/md1 unchanged at 239490048K
> Unsupported as an argument looks like that:
>
>> mdadm --grow /dev/md1 --size=blah
>> mdadm: invalid size: blah

Sorry bout that -- when I said  "unsupported" - it was from the combined results that there is no mention of it in the man page even though the Wiki mentions it and the results are ambiguous in actual use.


> In your case mdadm appears to think there is nowhere to grow the array.
>
> Check with "blockdev --getsz /dev/sda3" and sdb3 that they actually do have a
> proper size (the result will be in 512-byte sectors). 
>
> If not, maybe your kernel didn't re-read the partition table after resizing
> partitions (if that was what you did)?
>
> If the blockdevice size is wrong, do a "blockdev --rereadpt /dev/sda" and sdb,
> or just rebooting (as in some cases rereading will fail with a "device busy"
> error).
>
Yep -- sizes are off and --rereadpt returns "device busy".

Be right back....

 -Ben




^ permalink raw reply

* mdadm git 32bit build errors
From: Thomas Backlund @ 2017-01-02 16:35 UTC (permalink / raw)
  To: Linux-RAID


Trying to build current mdadm git fails on 32bit with:



super-intel.c: In function 'copy_metadata_imsm':
super-intel.c:1948:35: error: comparison between signed and unsigned 
integer expressions [-Werror=sign-compare]
   if (read(from, buf, sector_size) != sector_size)
                                    ^
super-intel.c: In function 'read_imsm_migr_rec':
super-intel.c:2820:40: error: comparison between signed and unsigned 
integer expressions [-Werror=sign-compare]
       MIGR_REC_BUF_SECTORS*sector_size) !=
                                         ^
super-intel.c: In function 'write_imsm_migr_rec':
super-intel.c:3022:41: error: comparison between signed and unsigned 
integer expressions [-Werror=sign-compare]
        MIGR_REC_BUF_SECTORS*sector_size) !=
                                          ^
super-intel.c: In function 'load_imsm_mpb':
super-intel.c:4125:36: error: comparison between signed and unsigned 
integer expressions [-Werror=sign-compare]
   if (read(fd, anchor, sector_size) != sector_size) {
                                     ^
super-intel.c: In function 'add_to_super_imsm':
super-intel.c:5641:48: error: comparison between signed and unsigned 
integer expressions [-Werror=sign-compare]
        MIGR_REC_BUF_SECTORS*super->sector_size) !=
                                                 ^
super-intel.c: In function 'write_super_imsm':
super-intel.c:5851:43: error: comparison between signed and unsigned 
integer expressions [-Werror=sign-compare]
          MIGR_REC_BUF_SECTORS*sector_size) !=
                                            ^
super-intel.c: In function 'store_imsm_mpb':
super-intel.c:8093:34: error: comparison between signed and unsigned 
integer expressions [-Werror=sign-compare]
   if (write(fd, buf, sector_size) != sector_size)
                                   ^
super-intel.c: In function 'imsm_manage_reshape':
super-intel.c:11505:42: error: comparison between signed and unsigned 
integer expressions [-Werror=sign-compare]
         MIGR_REC_BUF_SECTORS*sector_size) !=
                                           ^
cc1: all warnings being treated as errors


Seems to be introduced by:
https://git.kernel.org/cgit/utils/mdadm/mdadm.git/commit/super-intel.c?id=de44e46fd4703ea286987d1d0cf775efa62700fd


64bit build is ok.

gcc is 5.4.0

--
Thomas


^ permalink raw reply

* Re: Using mdadm --grow to resize a RAID1
From: Roman Mamedov @ 2017-01-02 16:24 UTC (permalink / raw)
  To: Benjammin2068; +Cc: Wols Lists, Linux-RAID
In-Reply-To: <ca87be95-bfb1-10c6-18d4-9aca4a6e3558@gmail.com>

On Mon, 2 Jan 2017 09:55:34 -0600
Benjammin2068 <benjammin2068@gmail.com> wrote:

> 
> 
> On 01/02/2017 06:19 AM, Wols Lists wrote:
> > On 02/01/17 10:18, Benjammin2068 wrote:
> >> I get stuck at the --size max.
> >>
> >> it seems the correct syntax is "--size=" but that "max" is not supported as an argument.

First you say it's unsupported as an argument, but then actually 

> > [root@quantum ~]# mdadm --grow /dev/md1 --size max
> > mdadm: component size of /dev/md1 unchanged at 239490048K
> 
> > [root@quantum ~]# mdadm --grow /dev/md1 --size=max
> > mdadm: component size of /dev/md1 unchanged at 239490048K

Unsupported as an argument looks like that:

> mdadm --grow /dev/md1 --size=blah
> mdadm: invalid size: blah

In your case mdadm appears to think there is nowhere to grow the array.

Check with "blockdev --getsz /dev/sda3" and sdb3 that they actually do have a
proper size (the result will be in 512-byte sectors). 

If not, maybe your kernel didn't re-read the partition table after resizing
partitions (if that was what you did)?

If the blockdevice size is wrong, do a "blockdev --rereadpt /dev/sda" and sdb,
or just rebooting (as in some cases rereading will fail with a "device busy"
error).

-- 
With respect,
Roman

^ permalink raw reply

* Re: Using mdadm --grow to resize a RAID1
From: Wols Lists @ 2017-01-02 16:19 UTC (permalink / raw)
  To: Jack Wang, Benjammin2068; +Cc: Linux-RAID
In-Reply-To: <CA+res+RgdS92HZ7ZQEs_D8TrQntuhGW-UYEjjARoxeJHKEZhZw@mail.gmail.com>

On 02/01/17 16:14, Jack Wang wrote:
> 2017-01-02 16:55 GMT+01:00 Benjammin2068 <benjammin2068@gmail.com>:
>>
>>
>> On 01/02/2017 06:19 AM, Wols Lists wrote:
>>> On 02/01/17 10:18, Benjammin2068 wrote:
>>>> I get stuck at the --size max.
>>>>
>>>> it seems the correct syntax is "--size=" but that "max" is not supported as an argument.
>>>>
>>>> little help?
>>> Off the top of my head - try with no argument. iirc it defaults to max,
>>> you don't need to tell it.
>>>
>>
>> Did that.
>>
>>
>>> [root@quantum ~]# mdadm --grow /dev/md1
>>> mdadm: no changes to --grow
>>
>>> [root@quantum ~]# mdadm --grow /dev/md1 --size
>>> mdadm: option '--size' requires an argument
>>> Usage: mdadm --help
>>>   for help
>>
>>> [root@quantum ~]# mdadm --grow /dev/md1 --size max
>>> mdadm: component size of /dev/md1 unchanged at 239490048K
>>
>>> [root@quantum ~]# mdadm --grow /dev/md1 --size=max
>>> mdadm: component size of /dev/md1 unchanged at 239490048K
> 
> So we need someone who has edit permission to update the the wiki page
> in https://raid.wiki.kernel.org/index.php/Growing
> 
I've just checked the wiki page - it's not one of the ones I've yet
updated. I will be only too happy to do so, as soon as we can work out
(or someone tells us) the correct syntax to do so.

I must admit, using "man mdadm", it looks like Benjammin is doing the
right thing, so something's not quite right here ...

Cheers,
Wol

^ permalink raw reply

* Re: Using mdadm --grow to resize a RAID1
From: Jack Wang @ 2017-01-02 16:14 UTC (permalink / raw)
  To: Benjammin2068; +Cc: Wols Lists, Linux-RAID
In-Reply-To: <ca87be95-bfb1-10c6-18d4-9aca4a6e3558@gmail.com>

2017-01-02 16:55 GMT+01:00 Benjammin2068 <benjammin2068@gmail.com>:
>
>
> On 01/02/2017 06:19 AM, Wols Lists wrote:
>> On 02/01/17 10:18, Benjammin2068 wrote:
>>> I get stuck at the --size max.
>>>
>>> it seems the correct syntax is "--size=" but that "max" is not supported as an argument.
>>>
>>> little help?
>> Off the top of my head - try with no argument. iirc it defaults to max,
>> you don't need to tell it.
>>
>
> Did that.
>
>
>> [root@quantum ~]# mdadm --grow /dev/md1
>> mdadm: no changes to --grow
>
>> [root@quantum ~]# mdadm --grow /dev/md1 --size
>> mdadm: option '--size' requires an argument
>> Usage: mdadm --help
>>   for help
>
>> [root@quantum ~]# mdadm --grow /dev/md1 --size max
>> mdadm: component size of /dev/md1 unchanged at 239490048K
>
>> [root@quantum ~]# mdadm --grow /dev/md1 --size=max
>> mdadm: component size of /dev/md1 unchanged at 239490048K

So we need someone who has edit permission to update the the wiki page
in https://raid.wiki.kernel.org/index.php/Growing

^ permalink raw reply

* Re: Using mdadm --grow to resize a RAID1
From: Benjammin2068 @ 2017-01-02 15:55 UTC (permalink / raw)
  To: Wols Lists, Linux-RAID
In-Reply-To: <586A454F.1010308@youngman.org.uk>



On 01/02/2017 06:19 AM, Wols Lists wrote:
> On 02/01/17 10:18, Benjammin2068 wrote:
>> I get stuck at the --size max.
>>
>> it seems the correct syntax is "--size=" but that "max" is not supported as an argument.
>>
>> little help?
> Off the top of my head - try with no argument. iirc it defaults to max,
> you don't need to tell it.
>

Did that.


> [root@quantum ~]# mdadm --grow /dev/md1
> mdadm: no changes to --grow

> [root@quantum ~]# mdadm --grow /dev/md1 --size
> mdadm: option '--size' requires an argument
> Usage: mdadm --help
>   for help

> [root@quantum ~]# mdadm --grow /dev/md1 --size max
> mdadm: component size of /dev/md1 unchanged at 239490048K

> [root@quantum ~]# mdadm --grow /dev/md1 --size=max
> mdadm: component size of /dev/md1 unchanged at 239490048K


^ permalink raw reply

* Re: [dm-devel] [PATCH v2 1/1] block: fix blk_queue_split() resource exhaustion
From: Jack Wang @ 2017-01-02 14:33 UTC (permalink / raw)
  To: Lars Ellenberg
  Cc: Michael Wang, Jens Axboe, linux-block, Martin K. Petersen,
	Mike Snitzer, Peter Zijlstra, Jiri Kosina, Ming Lei,
	Kirill A. Shutemov, NeilBrown, linux-kernel, linux-raid,
	Takashi Iwai, linux-bcache@vger.kernel.org, Zheng Liu,
	Kent Overstreet, Keith Busch, device-mapper development,
	Shaohua Li, Ingo Molnar
In-Reply-To: <20161223114553.GP4138@soda.linbit>

[-- Attachment #1: Type: text/plain, Size: 1285 bytes --]

2016-12-23 12:45 GMT+01:00 Lars Ellenberg <lars.ellenberg@linbit.com>:
> On Fri, Dec 23, 2016 at 09:49:53AM +0100, Michael Wang wrote:
>> Dear Maintainers
>>
>> I'd like to ask for the status of this patch since we hit the
>> issue too during our testing on md raid1.
>>
>> Split remainder bio_A was queued ahead, following by bio_B for
>> lower device, at this moment raid start freezing, the loop take
>> out bio_A firstly and deliver it, which will hung since raid is
>> freezing, while the freezing never end since it waiting for
>> bio_B to finish, and bio_B is still on the queue, waiting for
>> bio_A to finish...
>>
>> We're looking for a good solution and we found this patch
>> already progressed a lot, but we can't find it on linux-next,
>> so we'd like to ask are we still planning to have this fix
>> in upstream?
>
> I don't see why not, I'd even like to have it in older kernels,
> but did not have the time and energy to push it.
>
> Thanks for the bump.
>
>         Lars
>
Hi folks,

As Michael mentioned, we hit a bug this patch is trying to fix.
Neil suggested another way to fix it.  I attached below.
I personal prefer Neil's version as it's less code change, and straight forward.

Could you share your comments, we can get one fix into mainline.

Thanks,
Jinpu

[-- Attachment #2: 0001-block-fix-deadlock-between-freeze_array-and-wait_bar.patch --]
[-- Type: text/x-patch, Size: 2366 bytes --]

From 69a4829a55503e496ce9c730d2c8e3dd8a08874a Mon Sep 17 00:00:00 2001
From: NeilBrown <neilb@suse.com>
Date: Wed, 14 Dec 2016 16:55:52 +0100
Subject: [PATCH] block: fix deadlock between freeze_array() and wait_barrier()

When we call wait_barrier, we might have some bios waiting
in current->bio_list, which prevents the array_freeze call to
complete. Those can only be internal READs, which have already
passed the wait_barrier call (thus incrementing nr_pending), but
still were not submitted to the lower level, due to generic_make_request
logic to avoid recursive calls. In such case, we have a deadlock:
- array_frozen is already set to 1, so wait_barrier unconditionally waits, so
- internal READ bios will not be submitted, thus freeze_array will
never completes.

To fix this, modify generic_make_request to always sort bio_list_on_stack
first with lowest level, then higher, until same level.

Sent to linux-raid mail list:
https://marc.info/?l=linux-raid&m=148232453107685&w=2

Suggested-by: NeilBrown <neilb@suse.com>
Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
---
 block/blk-core.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/block/blk-core.c b/block/blk-core.c
index 9e3ac56..47ef373 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -2138,10 +2138,30 @@ blk_qc_t generic_make_request(struct bio *bio)
 		struct request_queue *q = bdev_get_queue(bio->bi_bdev);
 
 		if (likely(blk_queue_enter(q, __GFP_DIRECT_RECLAIM) == 0)) {
+			struct bio_list lower, same, hold;
+
+			/* Create a fresh bio_list for all subordinate requests */
+			bio_list_init(&hold);
+			bio_list_merge(&hold, &bio_list_on_stack);
+			bio_list_init(&bio_list_on_stack);
 
 			ret = q->make_request_fn(q, bio);
 
 			blk_queue_exit(q);
+			/* sort new bios into those for a lower level
+			 * and those for the same level
+			 */
+			bio_list_init(&lower);
+			bio_list_init(&same);
+			while ((bio = bio_list_pop(&bio_list_on_stack)) != NULL)
+				if (q == bdev_get_queue(bio->bi_bdev))
+					bio_list_add(&same, bio);
+				else
+					bio_list_add(&lower, bio);
+			/* now assemble so we handle the lowest level first */
+			bio_list_merge(&bio_list_on_stack, &lower);
+			bio_list_merge(&bio_list_on_stack, &same);
+			bio_list_merge(&bio_list_on_stack, &hold);
 
 			bio = bio_list_pop(current->bio_list);
 		} else {
-- 
2.7.4


^ permalink raw reply related

* Re: Using mdadm --grow to resize a RAID1
From: Wols Lists @ 2017-01-02 12:19 UTC (permalink / raw)
  To: Benjammin2068, Linux-RAID
In-Reply-To: <4716208e-12a9-1ae8-46f4-a89ae15d9077@gmail.com>

On 02/01/17 10:18, Benjammin2068 wrote:
> I get stuck at the --size max.
> 
> it seems the correct syntax is "--size=" but that "max" is not supported as an argument.
> 
> little help?

Off the top of my head - try with no argument. iirc it defaults to max,
you don't need to tell it.

Cheers,
Wol

^ permalink raw reply

* Using mdadm --grow to resize a RAID1
From: Benjammin2068 @ 2017-01-02 10:18 UTC (permalink / raw)
  To: Linux-RAID

hey all,

Quick question...

I've replaced and resync'd my root drives /dev/sda and /dev/sdb with 1TB drives up from the 250GB drives I started with.

/boot and swap are fine as is, but I'd like the root mount "/" to use the max available. (as anyone would)

My partition setup looks like:


> Disk /dev/sda: 1000.2 GB, 1000204886016 bytes
> 255 heads, 63 sectors/track, 121601 cylinders, total 1953525168 sectors
> Units = sectors of 1 * 512 = 512 bytes
> Sector size (logical/physical): 512 bytes / 4096 bytes
> I/O size (minimum/optimal): 4096 bytes / 4096 bytes
> Disk identifier: 0x0007868a
>
>    Device Boot      Start         End      Blocks   Id  System
> /dev/sda1            2048     8390655     4194304   fd  Linux raid autodetect
> /dev/sda2   *     8390656     9414655      512000   fd  Linux raid autodetect
> /dev/sda3         9414656  1953525167   972055256   fd  Linux raid autodetect
>
> Disk /dev/sdb: 1000.2 GB, 1000204886016 bytes
> 255 heads, 63 sectors/track, 121601 cylinders, total 1953525168 sectors
> Units = sectors of 1 * 512 = 512 bytes
> Sector size (logical/physical): 512 bytes / 4096 bytes
> I/O size (minimum/optimal): 4096 bytes / 4096 bytes
> Disk identifier: 0x000d707c
>
>    Device Boot      Start         End      Blocks   Id  System
> /dev/sdb1            2048     8390655     4194304   fd  Linux raid autodetect
> /dev/sdb2   *     8390656     9414655      512000   fd  Linux raid autodetect
> /dev/sdb3         9414656  1953525167   972055256   fd  Linux raid autodetect
>
>

Currently, the array I want to resize has:

> [root@quantum ~]# mdadm --detail /dev/md1
> /dev/md1:
>         Version : 1.1
>   Creation Time : Sat Jul 30 15:21:45 2011
>      Raid Level : raid1
>      Array Size : 239490048 (228.40 GiB 245.24 GB)
>   Used Dev Size : 239490048 (228.40 GiB 245.24 GB)
>    Raid Devices : 2
>   Total Devices : 2
>     Persistence : Superblock is persistent
>
>     Update Time : Mon Jan  2 04:17:51 2017
>           State : clean
>  Active Devices : 2
> Working Devices : 2
>  Failed Devices : 0
>   Spare Devices : 0
>
>            Name : localhost.localdomain:1
>            UUID : 8d487d89:63690945:85c66cf4:e9885093
>          Events : 161820
>
>     Number   Major   Minor   RaidDevice State
>        2       8       19        0      active sync   /dev/sdb3
>        4       8        3        1      active sync   /dev/sda3

When I reference https://raid.wiki.kernel.org/index.php/Growing

>  mdadm --grow /dev/mdX --bitmap none
>  mdadm --grow /dev/mdX --size max
>  mdadm --wait /dev/mdX
>  mdadm --grow /dev/mdX --bitmap internal


I get stuck at the --size max.

it seems the correct syntax is "--size=" but that "max" is not supported as an argument.

little help?


Thanks,

 -Ben



^ permalink raw reply

* Re: [RFC PATCH v2] crypto: Add IV generation algorithms
From: Binoy Jayan @ 2017-01-02  7:05 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Milan Broz, Oded, Ofir, David S. Miller, linux-crypto, Mark Brown,
	Arnd Bergmann, Linux kernel mailing list, Alasdair Kergon,
	Mike Snitzer, dm-devel, Shaohua Li, linux-raid, Rajendra
In-Reply-To: <20170102065325.GA19553@gondor.apana.org.au>

On 2 January 2017 at 12:23, Herbert Xu <herbert@gondor.apana.org.au> wrote:
> On Mon, Jan 02, 2017 at 12:16:45PM +0530, Binoy Jayan wrote:
>>
>> Even if ciphers are allocated this way, all the encryption requests
>> for cbc should still go through IV generators? So that should mean,
>> create one instance of IV generator using 'crypto_alloc_skcipher'
>> and create tfms_count instances of the generator depending on the
>> number of keys.
>
> Right.  The actual number of underlying tfms that do the work
> won't change compared to the status quo.  We're just structuring
> it such that if the overall scheme is supported by the hardware
> then we can feed more than one sector at a time to it.

Thank you Herbert.

^ permalink raw reply

* Re: [RFC PATCH v2] crypto: Add IV generation algorithms
From: Herbert Xu @ 2017-01-02  6:53 UTC (permalink / raw)
  To: Binoy Jayan
  Cc: Milan Broz, Oded, Ofir, David S. Miller, linux-crypto, Mark Brown,
	Arnd Bergmann, Linux kernel mailing list, Alasdair Kergon,
	Mike Snitzer, dm-devel, Shaohua Li, linux-raid, Rajendra
In-Reply-To: <CAHv-k_8FmeKk_3zUAVCqHp82nHmiWsyfZ_BW+z=SC5VVOrFsAA@mail.gmail.com>

On Mon, Jan 02, 2017 at 12:16:45PM +0530, Binoy Jayan wrote:
> 
> Even if ciphers are allocated this way, all the encryption requests
> for cbc should still go through IV generators? So that should mean,
> create one instance of IV generator using 'crypto_alloc_skcipher'
> and create tfms_count instances of the generator depending on the
> number of keys.

Right.  The actual number of underlying tfms that do the work
won't change compared to the status quo.  We're just structuring
it such that if the overall scheme is supported by the hardware
then we can feed more than one sector at a time to it.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [RFC PATCH v2] crypto: Add IV generation algorithms
From: Binoy Jayan @ 2017-01-02  6:46 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Milan Broz, Oded, Ofir, David S. Miller, linux-crypto, Mark Brown,
	Arnd Bergmann, Linux kernel mailing list, Alasdair Kergon,
	Mike Snitzer, dm-devel, Shaohua Li, linux-raid, Rajendra
In-Reply-To: <20161230102723.GA15713@gondor.apana.org.au>

Hi Herbert,

On 30 December 2016 at 15:57, Herbert Xu <herbert@gondor.apana.org.au> wrote:

> This is just a matter of structuring the key for the IV generator.
> The IV generator's key in this case should be a combination of the
> key to the underlying CBC plus the set of all keys for the IV
> generator itself.  It should then allocate the required number of
> tfms as is currently done by crypt_alloc_tfms in dm-crypt.

Since I used template ciphers for the iv algorithms, I use
crypto_spawn_skcipher_alg and skcipher_register_instance
for creating the underlying cbc algorithm. I guess you suggest
to change that to make use of crypto_alloc_skcipher.

Even if ciphers are allocated this way, all the encryption requests
for cbc should still go through IV generators? So that should mean,
create one instance of IV generator using 'crypto_alloc_skcipher'
and create tfms_count instances of the generator depending on the
number of keys.

Thanks,
Binoy

^ permalink raw reply

* Write mostly flag
From: Victor Garcia @ 2017-01-01  7:37 UTC (permalink / raw)
  To: linux-raid

Hello all

I am trying to set up a raid 1 array using an SSD and a standard SAT
HDD drives. As I read, it's better to set the write-mostly flag to the
HDD partition of the array to keep SSD performance times.

This is the command I'm using:

# mdadm --create /dev/md1 -n 2 -l 1 --metadata=0.90 --bitmap=internal
/dev/nvme0n1p5 -W /dev/sda8 --write-behind

I expect seeing the (W) indicator in mdstat, but this is not showing:

# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sda8[1] nvme0n1p5[0]
      52400064 blocks [2/2] [UU]
      [>....................]  resync =  0.3% (175936/52400064)
finish=9.8min speed=87968K/sec

unused devices: <none>

 But if I force later on the flag by means of the sys FS handle, then
it does appear:

# cat /sys/block/md1/md/dev-sda8/state
in_sync
# echo writemostly > /sys/block/md1/md/dev-sda8/state
# cat /sys/block/md1/md/dev-sda8/state
in_sync,write_mostly
# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sda8[1](W) nvme0n1p5[0]
      52400064 blocks [2/2] [UU]
      [==>..................]  resync = 11.0% (5806336/52400064)
finish=9.8min speed=78966K/sec
      bitmap: 1/1 pages [4KB], 65536KB chunk

unused devices: <none>

How is it that the (W) is not showing in the first place?

BTW, this is the system being used:

# uname -a
Linux ubuntu 4.8.0-22-generic #24-Ubuntu SMP Sat Oct 8 09:15:00 UTC
2016 x86_64 x86_64 x86_64 GNU/Linu

Thanks and regards
Victor

^ permalink raw reply

* Re: [PATCH v1 23/54] bcache: handle bio_clone() & bvec updating for multipage bvecs
From: Ming Lei @ 2016-12-31 10:29 UTC (permalink / raw)
  To: Coly Li
  Cc: Jens Axboe, Linux Kernel Mailing List, linux-block,
	Christoph Hellwig, Kent Overstreet, Shaohua Li, Mike Christie,
	Guoqing Jiang, open list:BCACHE (BLOCK LAYER CACHE),
	open list:SOFTWARE RAID (Multiple Disks) SUPPORT
In-Reply-To: <4d015585-c830-17ff-de4a-60f2d5f06a0d@coly.li>

Hi Coly,

On Fri, Dec 30, 2016 at 7:01 PM, Coly Li <i@coly.li> wrote:
> On 2016/12/27 下午11:56, Ming Lei wrote:
>> The incoming bio may be too big to be cloned into
>> one singlepage bvecs bio, so split the bio and
>> check the splitted bio one by one.
>>
>> Signed-off-by: Ming Lei <tom.leiming@gmail.com>
>> ---
>>  drivers/md/bcache/debug.c | 24 ++++++++++++++++++++++--
>>  1 file changed, 22 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/md/bcache/debug.c b/drivers/md/bcache/debug.c
>> index 48d03e8b3385..18b2d2d138e3 100644
>> --- a/drivers/md/bcache/debug.c
>> +++ b/drivers/md/bcache/debug.c
>> @@ -103,7 +103,7 @@ void bch_btree_verify(struct btree *b)
>>       up(&b->io_mutex);
>>  }
>>
>> -void bch_data_verify(struct cached_dev *dc, struct bio *bio)
>> +static void __bch_data_verify(struct cached_dev *dc, struct bio *bio)
>>  {
>>       char name[BDEVNAME_SIZE];
>>       struct bio *check;
>> @@ -116,7 +116,7 @@ void bch_data_verify(struct cached_dev *dc, struct bio *bio)
>>        * in the new cloned bio because each single page need
>>        * to assign to each bvec of the new bio.
>>        */
>> -     check = bio_clone(bio, GFP_NOIO);
>> +     check = bio_clone_sp(bio, GFP_NOIO);
>>       if (!check)
>>               return;
>>       check->bi_opf = REQ_OP_READ;
>> @@ -151,6 +151,26 @@ void bch_data_verify(struct cached_dev *dc, struct bio *bio)
>>       bio_put(check);
>>  }
>>
>> +void bch_data_verify(struct cached_dev *dc, struct bio *bio)
>> +{
>> +     struct request_queue *q = bdev_get_queue(bio->bi_bdev);
>> +     struct bio *clone = bio_clone_fast(bio, GFP_NOIO, q->bio_split);
>> +     unsigned sectors;
>> +
>> +     while (!bio_can_convert_to_sp(clone, &sectors)) {
>> +             struct bio *split = bio_split(clone, sectors,
>> +                                           GFP_NOIO, q->bio_split);
>> +
>> +             __bch_data_verify(dc, split);
>> +             bio_put(split);
>> +     }
>> +
>> +     if (bio_sectors(clone))
>> +             __bch_data_verify(dc, clone);
>> +
>> +     bio_put(clone);
>> +}
>> +
>
> Hi Lei,
>
> The above patch is good IMHO. Just wondering why not use the classical
> style ? something like,

I don't know there is the classical style, :-)

>
>
> do {
>         if (!bio_can_convert_to_sp(clone, &sectors))
>                 split = bio_split(clone, sectors,
>                                   GFP_NOIO, q->bio_split);
>         else
>                 split = clone;
>
>         __bch_data_verity(gc, split);
>         bio_put(split);
> } while (split != clone);
>
>
> I guess maybe the above style generates less binary code.

Maybe, will take this style in V2.

Thanks for the review!

-- 
Ming Lei

^ permalink raw reply

* Re: PROBLEM: Kernel BUG with raid5 soft + Xen + DRBD - invalid opcode
From: MasterPrenium @ 2016-12-31  0:00 UTC (permalink / raw)
  To: Jes Sorensen
  Cc: linux-kernel, xen-users, linux-raid, shli,
	MasterPrenium@gmail.com, xen-devel
In-Reply-To: <wrfj8tqxf89j.fsf@redhat.com>

Hello,

Thanks for your reply. DRBD isn't part of the kernel ? I was thinking it 
has been included since 2.6.3x ?

I've just tested without DRBD, the issue seems to remain. Can't see the 
"BUG", but the kernel crashed also. (A little bit later)
I don't have full dump since I lost my network connection and my serial 
connection.
Here is a picture of what I got : 
http://img15.hostingpics.net/pics/113882KernelError6.png
Another one : http://img11.hostingpics.net/pics/164702KernelError7.png

It also seems to me that having the "glances" monitoring software 
running in dom0, makes the kernel crashes quicker, don't think this can 
help but... just in case...

Any idea / test I can make ? This is really a blocking issue with 
potential data loss...

Best regards,
MasterPrenium

Le 30/12/2016 21:54, Jes Sorensen a écrit :
> MasterPrenium<masterprenium.lkml@gmail.com>  writes:
>> Hello Guys,
>>
>> I've having some trouble on a new system I'm setting up. I'm getting a
>> kernel BUG message, seems to be related with the use of Xen (when I
>> boot the system _without_ Xen, I don't get any crash).
>> Here is configuration :
>> - 3x Hard Drives running on RAID 5 Software raid created by mdadm
>> - On top of it, DRBD for replication over another node (Active/passive cluster)
>> - On top of it, a BTRFS FileSystem with a few subvolumes
>> - On top of it, XEN VMs running.
>>
>> The BUG is happening when I'm making "huge" I/O (20MB/s with a rsync
>> for example) on the RAID5 stack.
>> I've to reset system to make it work again.
>>
>> Reproducible : ALWAYS (making the i/o, it crash in 2-5mins). Also
>> reproducible on another system with the same hardware.
>>
>> Kernel versions impacted (at least): kernel-4.4.26, kernel-4.8.15, kernel-4.9.0
> Well you have one foreign object in there that is not part of the
> kernel and which shows up in the OOPS: DRDB
>
> What happens when you remove that from the equation?
>
> Jes

^ permalink raw reply

* Re: [PATCH v1] dm-crypt: replace custom implementation of hex2bin()
From: Andy Shevchenko @ 2016-12-30 21:51 UTC (permalink / raw)
  To: dm-devel, Mike Snitzer, linux-raid; +Cc: Andy Shevchenko
In-Reply-To: <1462922651-6465-1-git-send-email-andy.shevchenko@gmail.com>

On Wed, May 11, 2016 at 2:24 AM, Andy Shevchenko
<andy.shevchenko@gmail.com> wrote:
> There is no need to have a duplication of the generic library, i.e. hex2bin().
> Replace the open coded variant.
>

Ping?

> Signed-off-by: Andy Shevchenko <andy.shevchenko@gmail.com>
> ---
>  drivers/md/dm-crypt.c | 27 ++-------------------------
>  1 file changed, 2 insertions(+), 25 deletions(-)
>
> diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
> index 4f3cb35..eb30284 100644
> --- a/drivers/md/dm-crypt.c
> +++ b/drivers/md/dm-crypt.c
> @@ -1407,30 +1407,6 @@ static void kcryptd_queue_crypt(struct dm_crypt_io *io)
>         queue_work(cc->crypt_queue, &io->work);
>  }
>
> -/*
> - * Decode key from its hex representation
> - */
> -static int crypt_decode_key(u8 *key, char *hex, unsigned int size)
> -{
> -       char buffer[3];
> -       unsigned int i;
> -
> -       buffer[2] = '\0';
> -
> -       for (i = 0; i < size; i++) {
> -               buffer[0] = *hex++;
> -               buffer[1] = *hex++;
> -
> -               if (kstrtou8(buffer, 16, &key[i]))
> -                       return -EINVAL;
> -       }
> -
> -       if (*hex != '\0')
> -               return -EINVAL;
> -
> -       return 0;
> -}
> -
>  static void crypt_free_tfms(struct crypt_config *cc)
>  {
>         unsigned i;
> @@ -1502,7 +1478,8 @@ static int crypt_set_key(struct crypt_config *cc, char *key)
>         if (!cc->key_size && strcmp(key, "-"))
>                 goto out;
>
> -       if (cc->key_size && crypt_decode_key(cc->key, key, cc->key_size) < 0)
> +       /* Decode key from its hex representation. */
> +       if (cc->key_size && hex2bin(cc->key, key, cc->key_size) < 0)
>                 goto out;
>
>         set_bit(DM_CRYPT_KEY_VALID, &cc->flags);
> --
> 2.8.2
>



-- 
With Best Regards,
Andy Shevchenko

^ permalink raw reply

* Re: PROBLEM: Kernel BUG with raid5 soft + Xen + DRBD - invalid opcode
From: Jes Sorensen @ 2016-12-30 20:54 UTC (permalink / raw)
  To: MasterPrenium
  Cc: linux-kernel, xen-users, linux-raid, shli,
	MasterPrenium@gmail.com, xen-devel
In-Reply-To: <585D6C34.2020908@gmail.com>

MasterPrenium <masterprenium.lkml@gmail.com> writes:
> Hello Guys,
>
> I've having some trouble on a new system I'm setting up. I'm getting a
> kernel BUG message, seems to be related with the use of Xen (when I
> boot the system _without_ Xen, I don't get any crash).
> Here is configuration :
> - 3x Hard Drives running on RAID 5 Software raid created by mdadm
> - On top of it, DRBD for replication over another node (Active/passive cluster)
> - On top of it, a BTRFS FileSystem with a few subvolumes
> - On top of it, XEN VMs running.
>
> The BUG is happening when I'm making "huge" I/O (20MB/s with a rsync
> for example) on the RAID5 stack.
> I've to reset system to make it work again.
>
> Reproducible : ALWAYS (making the i/o, it crash in 2-5mins). Also
> reproducible on another system with the same hardware.
>
> Kernel versions impacted (at least): kernel-4.4.26, kernel-4.8.15, kernel-4.9.0

Well you have one foreign object in there that is not part of the
kernel and which shows up in the OOPS: DRDB

What happens when you remove that from the equation?

Jes

^ permalink raw reply

* Re: Intel SSD or other brands
From: Doug Dumitru @ 2016-12-30 18:23 UTC (permalink / raw)
  To: Pasi Kärkkäinen; +Cc: Adam Goryachev, linux-raid@vger.kernel.org
In-Reply-To: <20161230163210.GW28824@reaktio.net>

Pasi,

The software is proprietary, but can run on most 64-bit Linux systems
in a server setting.  It also works wonders with "embedded Flash" (SD,
eMMC, USB, etc.) for 32-bit x86 and ARM systems.

Please send me a direct email, off-list, and I can give you more details.

Doug Dumitru

On Fri, Dec 30, 2016 at 8:32 AM, Pasi Kärkkäinen <pasik@iki.fi> wrote:
> Hello,
>
> On Thu, Dec 29, 2016 at 05:24:16PM -0800, Doug Dumitru wrote:
>>
>> My test is of a "managed" array with a "host side Flash Translation
>> Layer".  This means that software is linearizing the writes before
>> RAID-5 sees them.  This is how the major "storage appliance" vendors
>> get really fast performance.  One vendor, running an earlier version
>> of the software I am running here, was able to support 5000 ESXI VDI
>> clients from a single 2U storage server (with a lot of FC cards).  The
>> boot storm took about 3 minutes to settle.
>>
>
> Does this software happen to be opensource / publicly available ?
>
>
> Thanks,
>
> -- Pasi
>
>> Single drives are around 500 MB/sec which is 125K IOPS through our
>> engine.  Eight drives are (8-1)x500=3500 MB/sec or 900K IOPS.  This is
>> actually faster than FIO can generate a test pattern from a single
>> job.  It is also faster than stock RAID-5 can linearly write without
>> patches.
>>
>> In terms of wear, lots of users are running very light write
>> environments.  This is good as many configurations are > 50:1 write
>> amp if you measure "end to end".  By end to end, I mean, how many
>> flash writes happen when you create a small file inside of a file
>> system.  This leads to "file system write amp" x "raid write amp" x
>> "SSD write amp".  Some people don't like this approach as the file
>> system is often "off limits" and a black box.  Then again, some file
>> systems are better than others (for 10K sync creates, EXT4 and XFS are
>> both about 4.4:1 whereas ZFS is a lot worse).  And with EXT4/XFS, you
>> can mitigate some of this with an SSD or mapping layer that compresses
>> blocks.
>>
>> Doug Dumitru
>>
>



-- 
Doug Dumitru
EasyCo LLC

^ permalink raw reply

* Re: [PATCH v1 07/54] bcache: comment on direct access to bvec table
From: Coly Li @ 2016-12-30 16:56 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-kernel, linux-block, Christoph Hellwig,
	Kent Overstreet, Shaohua Li, Guoqing Jiang, Zheng Liu,
	Mike Christie, Jiri Kosina, Eric Wheeler, Yijing Wang, Al Viro,
	open list:BCACHE (BLOCK LAYER CACHE),
	open list:SOFTWARE RAID (Multiple Disks) SUPPORT
In-Reply-To: <1482854250-13481-8-git-send-email-tom.leiming@gmail.com>

On 2016/12/27 下午11:55, Ming Lei wrote:
> Looks all are safe after multipage bvec is supported.
> 
> Signed-off-by: Ming Lei <tom.leiming@gmail.com>
> ---
>  drivers/md/bcache/btree.c | 1 +
>  drivers/md/bcache/super.c | 6 ++++++
>  drivers/md/bcache/util.c  | 7 +++++++
>  3 files changed, 14 insertions(+)
> 
> diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
> index a43eedd5804d..fc35cfb4d0f1 100644
> --- a/drivers/md/bcache/btree.c
> +++ b/drivers/md/bcache/btree.c
> @@ -428,6 +428,7 @@ static void do_btree_node_write(struct btree *b)
>  
>  		continue_at(cl, btree_node_write_done, NULL);
>  	} else {
> +		/* No harm for multipage bvec since the new is just allocated */
>  		b->bio->bi_vcnt = 0;
>  		bch_bio_map(b->bio, i);
>  
> diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
> index 3a19cbc8b230..607b022259dc 100644
> --- a/drivers/md/bcache/super.c
> +++ b/drivers/md/bcache/super.c
> @@ -208,6 +208,7 @@ static void write_bdev_super_endio(struct bio *bio)
>  
>  static void __write_super(struct cache_sb *sb, struct bio *bio)
>  {
> +	/* single page bio, safe for multipage bvec */
>  	struct cache_sb *out = page_address(bio->bi_io_vec[0].bv_page);
>  	unsigned i;
>  
> @@ -1156,6 +1157,8 @@ static void register_bdev(struct cache_sb *sb, struct page *sb_page,
>  	dc->bdev->bd_holder = dc;
>  
>  	bio_init(&dc->sb_bio, dc->sb_bio.bi_inline_vecs, 1);
> +
> +	/* single page bio, safe for multipage bvec */
>  	dc->sb_bio.bi_io_vec[0].bv_page = sb_page;
>  	get_page(sb_page);
>  
> @@ -1799,6 +1802,7 @@ void bch_cache_release(struct kobject *kobj)
>  	for (i = 0; i < RESERVE_NR; i++)
>  		free_fifo(&ca->free[i]);
>  
> +	/* single page bio, safe for multipage bvec */
>  	if (ca->sb_bio.bi_inline_vecs[0].bv_page)
>  		put_page(ca->sb_bio.bi_io_vec[0].bv_page);
>  
> @@ -1854,6 +1858,8 @@ static int register_cache(struct cache_sb *sb, struct page *sb_page,
>  	ca->bdev->bd_holder = ca;
>  
>  	bio_init(&ca->sb_bio, ca->sb_bio.bi_inline_vecs, 1);
> +
> +	/* single page bio, safe for multipage bvec */
>  	ca->sb_bio.bi_io_vec[0].bv_page = sb_page;
>  	get_page(sb_page);
>  
> diff --git a/drivers/md/bcache/util.c b/drivers/md/bcache/util.c
> index dde6172f3f10..5cc0b49a65fb 100644
> --- a/drivers/md/bcache/util.c
> +++ b/drivers/md/bcache/util.c
> @@ -222,6 +222,13 @@ uint64_t bch_next_delay(struct bch_ratelimit *d, uint64_t done)
>  		: 0;
>  }
>  
> +/*
> + * Generally it isn't good to access .bi_io_vec and .bi_vcnt
> + * directly, the preferred way is bio_add_page, but in
> + * this case, bch_bio_map() supposes that the bvec table
> + * is empty, so it is safe to access .bi_vcnt & .bi_io_vec
> + * in this way even after multipage bvec is supported.
> + */
>  void bch_bio_map(struct bio *bio, void *base)
>  {
>  	size_t size = bio->bi_iter.bi_size;
> 

Acked-by: Coly Li <colyli@suse.de>



^ permalink raw reply

* Re: Intel SSD or other brands
From: Pasi Kärkkäinen @ 2016-12-30 16:32 UTC (permalink / raw)
  To: Doug Dumitru; +Cc: Adam Goryachev, linux-raid@vger.kernel.org
In-Reply-To: <CAFx4rwRUnfCXgM66oa+-GYbtthVZvtHQkb2-J3XqUU9YXWF6sA@mail.gmail.com>

Hello,

On Thu, Dec 29, 2016 at 05:24:16PM -0800, Doug Dumitru wrote:
> 
> My test is of a "managed" array with a "host side Flash Translation
> Layer".  This means that software is linearizing the writes before
> RAID-5 sees them.  This is how the major "storage appliance" vendors
> get really fast performance.  One vendor, running an earlier version
> of the software I am running here, was able to support 5000 ESXI VDI
> clients from a single 2U storage server (with a lot of FC cards).  The
> boot storm took about 3 minutes to settle.
>

Does this software happen to be opensource / publicly available ? 


Thanks,

-- Pasi
 
> Single drives are around 500 MB/sec which is 125K IOPS through our
> engine.  Eight drives are (8-1)x500=3500 MB/sec or 900K IOPS.  This is
> actually faster than FIO can generate a test pattern from a single
> job.  It is also faster than stock RAID-5 can linearly write without
> patches.
> 
> In terms of wear, lots of users are running very light write
> environments.  This is good as many configurations are > 50:1 write
> amp if you measure "end to end".  By end to end, I mean, how many
> flash writes happen when you create a small file inside of a file
> system.  This leads to "file system write amp" x "raid write amp" x
> "SSD write amp".  Some people don't like this approach as the file
> system is often "off limits" and a black box.  Then again, some file
> systems are better than others (for 10K sync creates, EXT4 and XFS are
> both about 4.4:1 whereas ZFS is a lot worse).  And with EXT4/XFS, you
> can mitigate some of this with an SSD or mapping layer that compresses
> blocks.
> 
> Doug Dumitru
> 


^ permalink raw reply

* Re: [PATCH v1 08/54] block: comment on bio_alloc_pages()
From: Coly Li @ 2016-12-30 11:06 UTC (permalink / raw)
  To: Ming Lei, Jens Axboe, linux-kernel
  Cc: linux-block, Christoph Hellwig, Jens Axboe, Kent Overstreet,
	Shaohua Li, Mike Christie, Guoqing Jiang, Hannes Reinecke,
	open list:BCACHE (BLOCK LAYER CACHE),
	open list:SOFTWARE RAID (Multiple Disks) SUPPORT
In-Reply-To: <1482854250-13481-9-git-send-email-tom.leiming@gmail.com>

On 2016/12/27 下午11:55, Ming Lei wrote:
> This patch adds comment on usage of bio_alloc_pages(),
> also comments on one special case of bch_data_verify().
> 
> Signed-off-by: Ming Lei <tom.leiming@gmail.com>
> ---
>  block/bio.c               | 4 +++-
>  drivers/md/bcache/debug.c | 6 ++++++
>  2 files changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/block/bio.c b/block/bio.c
> index 2b375020fc49..d4a1e0b63ea0 100644
> --- a/block/bio.c
> +++ b/block/bio.c
> @@ -961,7 +961,9 @@ EXPORT_SYMBOL(bio_advance);
>   * @bio: bio to allocate pages for
>   * @gfp_mask: flags for allocation
>   *
> - * Allocates pages up to @bio->bi_vcnt.
> + * Allocates pages up to @bio->bi_vcnt, and this function should only
> + * be called on a new initialized bio, which means all pages aren't added
> + * to the bio via bio_add_page() yet.
>   *
>   * Returns 0 on success, -ENOMEM on failure. On failure, any allocated pages are
>   * freed.
> diff --git a/drivers/md/bcache/debug.c b/drivers/md/bcache/debug.c
> index 06f55056aaae..48d03e8b3385 100644
> --- a/drivers/md/bcache/debug.c
> +++ b/drivers/md/bcache/debug.c
> @@ -110,6 +110,12 @@ void bch_data_verify(struct cached_dev *dc, struct bio *bio)
>  	struct bio_vec bv, cbv;
>  	struct bvec_iter iter, citer = { 0 };
>  
> +	/*
> +	 * Once multipage bvec is supported, the bio_clone()
> +	 * has to make sure page count in this bio can be held
> +	 * in the new cloned bio because each single page need
> +	 * to assign to each bvec of the new bio.
> +	 */
>  	check = bio_clone(bio, GFP_NOIO);
>  	if (!check)
>  		return;
> 
Acked-by: Coly Li <colyli@suse.de>

^ permalink raw reply

* Re: [PATCH v1 23/54] bcache: handle bio_clone() & bvec updating for multipage bvecs
From: Coly Li @ 2016-12-30 11:01 UTC (permalink / raw)
  To: Ming Lei
  Cc: Jens Axboe, linux-kernel, linux-block, Christoph Hellwig,
	Kent Overstreet, Shaohua Li, Mike Christie, Guoqing Jiang,
	open list:BCACHE (BLOCK LAYER CACHE),
	open list:SOFTWARE RAID (Multiple Disks) SUPPORT
In-Reply-To: <1482854250-13481-24-git-send-email-tom.leiming@gmail.com>

On 2016/12/27 下午11:56, Ming Lei wrote:
> The incoming bio may be too big to be cloned into
> one singlepage bvecs bio, so split the bio and
> check the splitted bio one by one.
> 
> Signed-off-by: Ming Lei <tom.leiming@gmail.com>
> ---
>  drivers/md/bcache/debug.c | 24 ++++++++++++++++++++++--
>  1 file changed, 22 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/md/bcache/debug.c b/drivers/md/bcache/debug.c
> index 48d03e8b3385..18b2d2d138e3 100644
> --- a/drivers/md/bcache/debug.c
> +++ b/drivers/md/bcache/debug.c
> @@ -103,7 +103,7 @@ void bch_btree_verify(struct btree *b)
>  	up(&b->io_mutex);
>  }
>  
> -void bch_data_verify(struct cached_dev *dc, struct bio *bio)
> +static void __bch_data_verify(struct cached_dev *dc, struct bio *bio)
>  {
>  	char name[BDEVNAME_SIZE];
>  	struct bio *check;
> @@ -116,7 +116,7 @@ void bch_data_verify(struct cached_dev *dc, struct bio *bio)
>  	 * in the new cloned bio because each single page need
>  	 * to assign to each bvec of the new bio.
>  	 */
> -	check = bio_clone(bio, GFP_NOIO);
> +	check = bio_clone_sp(bio, GFP_NOIO);
>  	if (!check)
>  		return;
>  	check->bi_opf = REQ_OP_READ;
> @@ -151,6 +151,26 @@ void bch_data_verify(struct cached_dev *dc, struct bio *bio)
>  	bio_put(check);
>  }
>  
> +void bch_data_verify(struct cached_dev *dc, struct bio *bio)
> +{
> +	struct request_queue *q = bdev_get_queue(bio->bi_bdev);
> +	struct bio *clone = bio_clone_fast(bio, GFP_NOIO, q->bio_split);
> +	unsigned sectors;
> +
> +	while (!bio_can_convert_to_sp(clone, &sectors)) {
> +		struct bio *split = bio_split(clone, sectors,
> +					      GFP_NOIO, q->bio_split);
> +
> +		__bch_data_verify(dc, split);
> +		bio_put(split);
> +	}
> +
> +	if (bio_sectors(clone))
> +		__bch_data_verify(dc, clone);
> +
> +	bio_put(clone);
> +}
> +

Hi Lei,

The above patch is good IMHO. Just wondering why not use the classical
style ? something like,


do {
	if (!bio_can_convert_to_sp(clone, &sectors))
		split = bio_split(clone, sectors,
				  GFP_NOIO, q->bio_split);
	else
		split = clone;

	__bch_data_verity(gc, split);
	bio_put(split);
} while (split != clone);


I guess maybe the above style generates less binary code.


-- 
Coly Li

^ permalink raw reply

* Re: [PATCH v1 08/54] block: comment on bio_alloc_pages()
From: Coly Li @ 2016-12-30 10:40 UTC (permalink / raw)
  To: Ming Lei, Jens Axboe, linux-kernel
  Cc: linux-block, Christoph Hellwig, Jens Axboe, Kent Overstreet,
	Shaohua Li, Mike Christie, Guoqing Jiang, Hannes Reinecke,
	open list:BCACHE (BLOCK LAYER CACHE),
	open list:SOFTWARE RAID (Multiple Disks) SUPPORT
In-Reply-To: <1482854250-13481-9-git-send-email-tom.leiming@gmail.com>

On 2016/12/27 下午11:55, Ming Lei wrote:
> This patch adds comment on usage of bio_alloc_pages(),
> also comments on one special case of bch_data_verify().
> 
> Signed-off-by: Ming Lei <tom.leiming@gmail.com>
> ---
>  block/bio.c               | 4 +++-
>  drivers/md/bcache/debug.c | 6 ++++++
>  2 files changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/block/bio.c b/block/bio.c
> index 2b375020fc49..d4a1e0b63ea0 100644
> --- a/block/bio.c
> +++ b/block/bio.c
> @@ -961,7 +961,9 @@ EXPORT_SYMBOL(bio_advance);
>   * @bio: bio to allocate pages for
>   * @gfp_mask: flags for allocation
>   *
> - * Allocates pages up to @bio->bi_vcnt.
> + * Allocates pages up to @bio->bi_vcnt, and this function should only
> + * be called on a new initialized bio, which means all pages aren't added
> + * to the bio via bio_add_page() yet.
>   *
>   * Returns 0 on success, -ENOMEM on failure. On failure, any allocated pages are
>   * freed.
> diff --git a/drivers/md/bcache/debug.c b/drivers/md/bcache/debug.c
> index 06f55056aaae..48d03e8b3385 100644
> --- a/drivers/md/bcache/debug.c
> +++ b/drivers/md/bcache/debug.c
> @@ -110,6 +110,12 @@ void bch_data_verify(struct cached_dev *dc, struct bio *bio)
>  	struct bio_vec bv, cbv;
>  	struct bvec_iter iter, citer = { 0 };
>  
> +	/*
> +	 * Once multipage bvec is supported, the bio_clone()
> +	 * has to make sure page count in this bio can be held
> +	 * in the new cloned bio because each single page need
> +	 * to assign to each bvec of the new bio.
> +	 */
>  	check = bio_clone(bio, GFP_NOIO);
>  	if (!check)
>  		return;
> 
Acked-by: Coly Li <colyli@suse.de>

-- 
Coly Li

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox