* Re: Why not just return an error?
From: keld @ 2016-10-07 9:30 UTC (permalink / raw)
To: Rudy Zijlstra; +Cc: Dark Penguin, linux-raid
In-Reply-To: <0d0109f4-2c03-484c-9f70-008a7a1a0d67@grumpydevil.homelinux.org>
On Fri, Oct 07, 2016 at 10:21:26AM +0200, Rudy Zijlstra wrote:
>
>
> Op 07-10-16 om 07:26 schreef keld@keldix.com:
> >On Fri, Oct 07, 2016 at 02:32:40AM +0300, Dark Penguin wrote:
> >>Greetings!
> >>
> >>The more I read about md-raid, the more I notice that the biggest
> >>problem of it: if you hit an error on a degraded RAID, it falls apart.
> >>Because of this, it is possible to lose a huge amount of data due to one
> >>tiny read error, which particularly makes raid5 the sword of Damocles.
> >>
> >>But one question keeps me increasingly frustrated. Yes, during its
> >>normal functioning, it totally makes sense to kick a faulty device out
> >>of an array. But if we're running a degraded array, and doing so will
> >>definitely result is massive data loss, why not just return a read error
> >>instead? Just add a little check: on error, if degraded -> then just
> >>return an error. I believe this is the dream of everyone who had ever
> >>dealt with RAIDs.
> >>
> >>With RAID, the first proprity is keeping data safe. Yes, it's not an
> >>alternative to backups and all that, but still - if we hit an error on a
> >>degraded array, the array should scream and panic and send all kinds of
> >>warnings, but definitely NOT collapse and warrant a visit to the RAID
> >>recovery laboratory (or this mailing list). Imagine how much headache
> >>and lost hair would that relieve!..
> >>
> >>Now, I'm probably not the first one to think of such a bright idea. So
> >>there must be a very good reason why this is not possible; I don't think
> >>the problem is just that "the existing behaviour is preferred, and
> >>anyone who does not agree is an idiot". If not for enterprise use, then
> >>at least it would be very useful for the "home archive" scenario when
> >>"uptime" and "absense of errors" hold much less meaning than "losing one
> >>file and not all the data". So, why is this not possible?..
> >Likewise, when the first disk fails, one could mark it as kind of in an
> >error state,
> >and keep it running, and if one gets a read error, then you could get
> >the data from the good disks.
> >
> >Often read errors can be remedied by writing data to the failing disk.
> >The good data could then be obtained from the good parts of the array.
> >
> >This behaviour could be optional and could even be set during operation.
> >
> >Best regards
> >keld
> >--
> >To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> >the body of a message to majordomo@vger.kernel.org
> >More majordomo info at http://vger.kernel.org/majordomo-info.html
>
> One big reason is human behaviour. And it is human behaviour that in the
> end causes all the collapsed raids. I have lost count how often i have
> seen requests for help once the raid had collapsed. But the earlier
> signal, where the RAID had become degraded was ignored. This means that
> if you only give an error message and continue going you will -- most
> likely in increasing rate -- have errors in the files. Very quickly it
> will become impossible to state which file is correct and which is not.
> Essentially you have lost at that point all information with NO ability
> to recover. Unless you have a backup....
>
> That is one of the big reasons the behaviour is as it is. RAID is
> intented to guarantee the consistency and correctness of the stored
> data. When this becomes impossible, the only way out is to clearly
> signal this. Even a collapsed RAID has more consistent data (although
> it takes effort to recover) then a corrupted RAID which would be the
> result of your proposal. The corruption resulting from your proposal
> above CANNOT be recovered.
I believe you are incorrect. As long as it is marked which parts of the array
that are in error, we know which data is good. Of cause some data may be unobtainable,
but this may well be just a few files, and the rest will be good. A much better result
than all data being lost!
Anyway this could be an optional feature, so that it can be chosen or not be chosen.
Best regards
keld
^ permalink raw reply
* Re: md/raid1: Improve another size determination in setup_conf()
From: Richard Weinberger @ 2016-10-07 9:06 UTC (permalink / raw)
To: SF Markus Elfring, Dan Carpenter
Cc: linux-raid@vger.kernel.org, Christoph Hellwig, Guoqing Jiang,
Jens Axboe, Mike Christie, Neil Brown, Shaohua Li,
Tomasz Majchrzak, LKML, kernel-janitors@vger.kernel.org,
Julia Lawall
In-Reply-To: <77d68bcd-1ae4-4808-fc0b-6183ae5fb6c4@users.sourceforge.net>
On 07.10.2016 10:53, SF Markus Elfring wrote:
>>>> Replace the specification of a data structure by a pointer dereference
>>>> as the parameter for the operator "sizeof" to make the corresponding size
>>>> determination a bit safer.
>>>
>>> Isn't this pure matter of taste?
>>> Some developers prefer sizeof(*ptr) because it is easier to type, other
>>> developers prefer sizeof(struct foo) because you can determine the type
>>> at first sight and makes review more easy.
>>
>> sizeof(*ptr) is more future proof and normally more obvious and easier
>> to review.
>
> Is it interesting to see how different the software development opinions
> can be for such an implementation detail?
>
>
>> That said, I've tried to tell Markus to only send bugfix patches
>
> Can any deviations from the Linux coding style become "bugs" also in
> your view of the software situation?
>
>
>> because these are a waste of time
>
> How do you value compliance with coding styles?
Just stop sending these kind of patches, *please*.
Linux has tons of issues, fixes for real problems are very welcome.
But coding style bike shedding is just a waste of time.
Thanks,
//richard
^ permalink raw reply
* Re: md/raid1: Improve another size determination in setup_conf()
From: SF Markus Elfring @ 2016-10-07 8:53 UTC (permalink / raw)
To: Dan Carpenter, Richard Weinberger
Cc: linux-raid@vger.kernel.org, Christoph Hellwig, Guoqing Jiang,
Jens Axboe, Mike Christie, Neil Brown, Shaohua Li,
Tomasz Majchrzak, LKML, kernel-janitors@vger.kernel.org,
Julia Lawall
In-Reply-To: <20161007075345.GB6039@mwanda>
>>> Replace the specification of a data structure by a pointer dereference
>>> as the parameter for the operator "sizeof" to make the corresponding size
>>> determination a bit safer.
>>
>> Isn't this pure matter of taste?
>> Some developers prefer sizeof(*ptr) because it is easier to type, other
>> developers prefer sizeof(struct foo) because you can determine the type
>> at first sight and makes review more easy.
>
> sizeof(*ptr) is more future proof and normally more obvious and easier
> to review.
Is it interesting to see how different the software development opinions
can be for such an implementation detail?
> That said, I've tried to tell Markus to only send bugfix patches
Can any deviations from the Linux coding style become "bugs" also in
your view of the software situation?
> because these are a waste of time
How do you value compliance with coding styles?
> and regularly introduce bugs.
Really?
Would you like to discuss concrete incidents any further?
Regards,
Markus
^ permalink raw reply
* Re: md-cluster: Delete unnecessary braces in unlock_all_bitmaps()
From: SF Markus Elfring @ 2016-10-07 8:37 UTC (permalink / raw)
To: Dan Carpenter
Cc: linux-raid, Guoqing Jiang, Shaohua Li, LKML, kernel-janitors,
Julia Lawall
In-Reply-To: <20161007074646.GA6039@mwanda>
>> Do not use curly brackets at one source code place
>> where a single statement should be sufficient.
>
> The original style was correct and this is wrong. I have explained this before.
Did I change a bit too much in the proposed step according to the following
update suggestion?
elfring@Sonne:~/Projekte/Linux/next-patched> git checkout d6385db94196b253ae5eb3678fa95cdf1f839fcc && scripts/checkpatch.pl --types BRACES -f drivers/md/md-cluster.c
…
WARNING: braces {} are not necessary for single statement blocks
#1228: FILE: drivers/md/md-cluster.c:1228:
+ if (cinfo->other_bitmap_lockres[i]) {
+ lockres_free(cinfo->other_bitmap_lockres[i]);
+ }
…
How do you think about to adjust this source code place a bit?
Regards,
Markus
^ permalink raw reply
* Re: Why not just return an error?
From: Rudy Zijlstra @ 2016-10-07 8:21 UTC (permalink / raw)
To: keld, Dark Penguin; +Cc: linux-raid
In-Reply-To: <20161007052651.GA11868@www5.open-std.org>
Op 07-10-16 om 07:26 schreef keld@keldix.com:
> On Fri, Oct 07, 2016 at 02:32:40AM +0300, Dark Penguin wrote:
>> Greetings!
>>
>> The more I read about md-raid, the more I notice that the biggest
>> problem of it: if you hit an error on a degraded RAID, it falls apart.
>> Because of this, it is possible to lose a huge amount of data due to one
>> tiny read error, which particularly makes raid5 the sword of Damocles.
>>
>> But one question keeps me increasingly frustrated. Yes, during its
>> normal functioning, it totally makes sense to kick a faulty device out
>> of an array. But if we're running a degraded array, and doing so will
>> definitely result is massive data loss, why not just return a read error
>> instead? Just add a little check: on error, if degraded -> then just
>> return an error. I believe this is the dream of everyone who had ever
>> dealt with RAIDs.
>>
>> With RAID, the first proprity is keeping data safe. Yes, it's not an
>> alternative to backups and all that, but still - if we hit an error on a
>> degraded array, the array should scream and panic and send all kinds of
>> warnings, but definitely NOT collapse and warrant a visit to the RAID
>> recovery laboratory (or this mailing list). Imagine how much headache
>> and lost hair would that relieve!..
>>
>> Now, I'm probably not the first one to think of such a bright idea. So
>> there must be a very good reason why this is not possible; I don't think
>> the problem is just that "the existing behaviour is preferred, and
>> anyone who does not agree is an idiot". If not for enterprise use, then
>> at least it would be very useful for the "home archive" scenario when
>> "uptime" and "absense of errors" hold much less meaning than "losing one
>> file and not all the data". So, why is this not possible?..
> Likewise, when the first disk fails, one could mark it as kind of in an error state,
> and keep it running, and if one gets a read error, then you could get
> the data from the good disks.
>
> Often read errors can be remedied by writing data to the failing disk.
> The good data could then be obtained from the good parts of the array.
>
> This behaviour could be optional and could even be set during operation.
>
> Best regards
> keld
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
One big reason is human behaviour. And it is human behaviour that in the
end causes all the collapsed raids. I have lost count how often i have
seen requests for help once the raid had collapsed. But the earlier
signal, where the RAID had become degraded was ignored. This means that
if you only give an error message and continue going you will -- most
likely in increasing rate -- have errors in the files. Very quickly it
will become impossible to state which file is correct and which is not.
Essentially you have lost at that point all information with NO ability
to recover. Unless you have a backup....
That is one of the big reasons the behaviour is as it is. RAID is
intented to guarantee the consistency and correctness of the stored
data. When this becomes impossible, the only way out is to clearly
signal this. Even a collapsed RAID has more consistent data (although
it takes effort to recover) then a corrupted RAID which would be the
result of your proposal. The corruption resulting from your proposal
above CANNOT be recovered.
Cheers
Rudy
^ permalink raw reply
* Re: [PATCH 24/54] md/raid1: Improve another size determination in setup_conf()
From: Richard Weinberger @ 2016-10-07 8:15 UTC (permalink / raw)
To: Dan Carpenter
Cc: SF Markus Elfring, linux-raid@vger.kernel.org, Christoph Hellwig,
Guoqing Jiang, Jens Axboe, Mike Christie, Neil Brown, Shaohua Li,
Tomasz Majchrzak, LKML, kernel-janitors@vger.kernel.org,
Julia Lawall
In-Reply-To: <20161007075345.GB6039@mwanda>
On 07.10.2016 09:53, Dan Carpenter wrote:
> On Thu, Oct 06, 2016 at 11:29:20AM +0200, Richard Weinberger wrote:
>> On Thu, Oct 6, 2016 at 11:22 AM, SF Markus Elfring
>> <elfring@users.sourceforge.net> wrote:
>>> From: Markus Elfring <elfring@users.sourceforge.net>
>>> Date: Tue, 4 Oct 2016 21:46:18 +0200
>>>
>>> Replace the specification of a data structure by a pointer dereference
>>> as the parameter for the operator "sizeof" to make the corresponding size
>>> determination a bit safer.
>>
>> Isn't this pure matter of taste?
>> Some developers prefer sizeof(*ptr) because it is easier to type, other
>> developers prefer sizeof(struct foo) because you can determine the type
>> at first sight and makes review more easy.
>
> sizeof(*ptr) is more future proof and normally more obvious and easier
> to review.
Also a matter of taste.
See http://yarchive.net/comp/linux/struct_init.html
Thanks,
//richard
^ permalink raw reply
* Re: [PATCH 24/54] md/raid1: Improve another size determination in setup_conf()
From: Dan Carpenter @ 2016-10-07 7:53 UTC (permalink / raw)
To: Richard Weinberger
Cc: SF Markus Elfring, linux-raid@vger.kernel.org, Christoph Hellwig,
Guoqing Jiang, Jens Axboe, Mike Christie, Neil Brown, Shaohua Li,
Tomasz Majchrzak, LKML, kernel-janitors@vger.kernel.org,
Julia Lawall
In-Reply-To: <CAFLxGvx0Fr-WNvKYACdvNiZwYzHW4ocna9HPH8J_WPSXWc+ngg@mail.gmail.com>
On Thu, Oct 06, 2016 at 11:29:20AM +0200, Richard Weinberger wrote:
> On Thu, Oct 6, 2016 at 11:22 AM, SF Markus Elfring
> <elfring@users.sourceforge.net> wrote:
> > From: Markus Elfring <elfring@users.sourceforge.net>
> > Date: Tue, 4 Oct 2016 21:46:18 +0200
> >
> > Replace the specification of a data structure by a pointer dereference
> > as the parameter for the operator "sizeof" to make the corresponding size
> > determination a bit safer.
>
> Isn't this pure matter of taste?
> Some developers prefer sizeof(*ptr) because it is easier to type, other
> developers prefer sizeof(struct foo) because you can determine the type
> at first sight and makes review more easy.
sizeof(*ptr) is more future proof and normally more obvious and easier
to review. That said, I've tried to tell Markus to only send bugfix
patches because these are a waste of time and regularly introduce bugs.
regards,
dan carpenter
^ permalink raw reply
* Re: [PATCH 15/15] md-cluster: Delete unnecessary braces in unlock_all_bitmaps()
From: Dan Carpenter @ 2016-10-07 7:46 UTC (permalink / raw)
To: SF Markus Elfring
Cc: linux-raid, Guoqing Jiang, Shaohua Li, LKML, kernel-janitors,
Julia Lawall
In-Reply-To: <ba60b210-40c6-b021-780c-33ecea7bee9e@users.sourceforge.net>
On Sat, Oct 01, 2016 at 05:00:07PM +0200, SF Markus Elfring wrote:
> From: Markus Elfring <elfring@users.sourceforge.net>
> Date: Sat, 1 Oct 2016 16:15:55 +0200
>
> Do not use curly brackets at one source code place
> where a single statement should be sufficient.
>
The original style was correct and this is wrong. I have explained this
before.
regards,
dan carpenter
^ permalink raw reply
* Re: [PATCH 30/54] md/raid5: Delete two error messages for a failed memory allocation
From: Hannes Reinecke @ 2016-10-07 5:51 UTC (permalink / raw)
To: SF Markus Elfring, linux-raid, Christoph Hellwig, Guoqing Jiang,
Jens Axboe, Mike Christie, Neil Brown, Shaohua Li,
Tomasz Majchrzak
Cc: LKML, kernel-janitors, Julia Lawall, Wolfram Sang
In-Reply-To: <46ceb7a1-5b8e-989b-a896-9a6ff7fdf833@users.sourceforge.net>
On 10/06/2016 11:30 AM, SF Markus Elfring wrote:
> From: Markus Elfring <elfring@users.sourceforge.net>
> Date: Wed, 5 Oct 2016 09:43:40 +0200
>
> Omit extra messages for a memory allocation failure in this function.
>
> Link: http://events.linuxfoundation.org/sites/events/files/slides/LCJ16-Refactor_Strings-WSang_0.pdf
>
> Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
> ---
> drivers/md/raid5.c | 13 +++----------
> 1 file changed, 3 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index d864871..ef180c0 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -6613,12 +6613,9 @@ static struct r5conf *setup_conf(struct mddev *mddev)
> memory = conf->min_nr_stripes * (sizeof(struct stripe_head) +
> max_disks * ((sizeof(struct bio) + PAGE_SIZE))) / 1024;
> atomic_set(&conf->empty_inactive_list_nr, NR_STRIPE_HASH_LOCKS);
> - if (grow_stripes(conf, conf->min_nr_stripes)) {
> - printk(KERN_ERR
> - "md/raid:%s: couldn't allocate %dkB for buffers\n",
> - mdname(mddev), memory);
> + if (grow_stripes(conf, conf->min_nr_stripes))
> goto free_conf;
> - } else
> + else
> printk(KERN_INFO "md/raid:%s: allocated %dkB\n",
> mdname(mddev), memory);
> /*
> @@ -6640,12 +6637,8 @@ static struct r5conf *setup_conf(struct mddev *mddev)
>
> sprintf(pers_name, "raid%d", mddev->new_level);
> conf->thread = md_register_thread(raid5d, mddev, pers_name);
> - if (!conf->thread) {
> - printk(KERN_ERR
> - "md/raid:%s: couldn't allocate thread.\n",
> - mdname(mddev));
> + if (!conf->thread)
> goto free_conf;
> - }
>
> return conf;
> free_conf:
>
Actually I prefer having error messages, especially if you have several
possible failures all leading to the same return value.
Without it debugging becomes really hard.
Cheers,
Hannes
--
Dr. Hannes Reinecke Teamlead Storage & Networking
hare@suse.de +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
^ permalink raw reply
* Re: [GIT PULL] MD update for 4.9
From: Doug Dumitru @ 2016-10-07 5:39 UTC (permalink / raw)
To: Shaohua Li; +Cc: torvalds, linux-kernel, linux-raid, neilb
In-Reply-To: <20161007003850.GA7197@kernel.org>
Mr. Li,
There is another thread in [linux-raid] discussing pre-fetches in the
raid-6 AVX2 code. My testing implies that the prefetch distance is
too short. In your new AVX512 code, it looks like there are 24
instructions, each with latencies of 1, between the prefetch and the
actual memory load. I don't have a AVX512 CPU to try this on, but the
prefetch might do better at a bigger distance. If I am not mistaken,
it takes a lot longer than 24 clocks to fetch 4 cache lines.
Just a comment while the code is still fluid.
Doug Dumitru
EasyCo LLC
On Thu, Oct 6, 2016 at 5:38 PM, Shaohua Li <shli@kernel.org> wrote:
> Hi Linus,
> Please pull MD update for 4.9. This update includes:
> - new AVX512 instruction based raid6 gen/recovery algorithm
> - A couple of md-cluster related bug fixes
> - Fix a potential deadlock
> - Set nonrotational bit for raid array with SSD
> - Set correct max_hw_sectors for raid5/6, which hopefuly can improve
> performance a little bit
> - Other minor fixes
>
> Thanks,
> Shaohua
>
> The following changes since commit 7d1e042314619115153a0f6f06e4552c09a50e13:
>
> Merge tag 'usercopy-v4.8-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux (2016-09-20 17:11:19 -0700)
>
> are available in the git repository at:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/shli/md.git tags/md/4.9-rc1
>
> for you to fetch changes up to bb086a89a406b5d877ee616f1490fcc81f8e1b2b:
>
> md: set rotational bit (2016-10-03 10:20:27 -0700)
>
> ----------------------------------------------------------------
> Chao Yu (1):
> raid5: fix to detect failure of register_shrinker
>
> Gayatri Kammela (5):
> lib/raid6: Add AVX512 optimized gen_syndrome functions
> lib/raid6: Add AVX512 optimized recovery functions
> lib/raid6/test/Makefile: Add avx512 gen_syndrome and recovery functions
> lib/raid6: Add AVX512 optimized xor_syndrome functions
> raid6/test/test.c: bug fix: Specify aligned(alignment) attributes to the char arrays
>
> Guoqing Jiang (9):
> md-cluster: call md_kick_rdev_from_array once ack failed
> md-cluster: use FORCEUNLOCK in lockres_free
> md-cluster: remove some unnecessary dlm_unlock_sync
> md: changes for MD_STILL_CLOSED flag
> md-cluster: clean related infos of cluster
> md-cluster: protect md_find_rdev_nr_rcu with rcu lock
> md-cluster: convert the completion to wait queue
> md-cluster: introduce dlm_lock_sync_interruptible to fix tasks hang
> md-cluster: make resync lock also could be interruptted
>
> Shaohua Li (5):
> raid5: allow arbitrary max_hw_sectors
> md/bitmap: fix wrong cleanup
> md: fix a potential deadlock
> raid5: handle register_shrinker failure
> md: set rotational bit
>
> arch/x86/Makefile | 5 +-
> drivers/md/bitmap.c | 4 +-
> drivers/md/md-cluster.c | 99 ++++++---
> drivers/md/md.c | 44 +++-
> drivers/md/md.h | 5 +-
> drivers/md/raid5.c | 11 +-
> include/linux/raid/pq.h | 4 +
> lib/raid6/Makefile | 2 +-
> lib/raid6/algos.c | 12 +
> lib/raid6/avx512.c | 569 +++++++++++++++++++++++++++++++++++++++++++++++
> lib/raid6/recov_avx512.c | 388 ++++++++++++++++++++++++++++++++
> lib/raid6/test/Makefile | 5 +-
> lib/raid6/test/test.c | 7 +-
> lib/raid6/x86.h | 10 +
> 14 files changed, 1111 insertions(+), 54 deletions(-)
> create mode 100644 lib/raid6/avx512.c
> create mode 100644 lib/raid6/recov_avx512.c
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
Doug Dumitru
EasyCo LLC
^ permalink raw reply
* Re: Why not just return an error?
From: keld @ 2016-10-07 5:26 UTC (permalink / raw)
To: Dark Penguin; +Cc: linux-raid
In-Reply-To: <57F6DF18.40703@yandex.ru>
On Fri, Oct 07, 2016 at 02:32:40AM +0300, Dark Penguin wrote:
> Greetings!
>
> The more I read about md-raid, the more I notice that the biggest
> problem of it: if you hit an error on a degraded RAID, it falls apart.
> Because of this, it is possible to lose a huge amount of data due to one
> tiny read error, which particularly makes raid5 the sword of Damocles.
>
> But one question keeps me increasingly frustrated. Yes, during its
> normal functioning, it totally makes sense to kick a faulty device out
> of an array. But if we're running a degraded array, and doing so will
> definitely result is massive data loss, why not just return a read error
> instead? Just add a little check: on error, if degraded -> then just
> return an error. I believe this is the dream of everyone who had ever
> dealt with RAIDs.
>
> With RAID, the first proprity is keeping data safe. Yes, it's not an
> alternative to backups and all that, but still - if we hit an error on a
> degraded array, the array should scream and panic and send all kinds of
> warnings, but definitely NOT collapse and warrant a visit to the RAID
> recovery laboratory (or this mailing list). Imagine how much headache
> and lost hair would that relieve!..
>
> Now, I'm probably not the first one to think of such a bright idea. So
> there must be a very good reason why this is not possible; I don't think
> the problem is just that "the existing behaviour is preferred, and
> anyone who does not agree is an idiot". If not for enterprise use, then
> at least it would be very useful for the "home archive" scenario when
> "uptime" and "absense of errors" hold much less meaning than "losing one
> file and not all the data". So, why is this not possible?..
Likewise, when the first disk fails, one could mark it as kind of in an error state,
and keep it running, and if one gets a read error, then you could get
the data from the good disks.
Often read errors can be remedied by writing data to the failing disk.
The good data could then be obtained from the good parts of the array.
This behaviour could be optional and could even be set during operation.
Best regards
keld
^ permalink raw reply
* [PATCH - mdadm] Fix some issues found by clang
From: NeilBrown @ 2016-10-07 3:55 UTC (permalink / raw)
To: Jes Sorensen; +Cc: Linux-RAID
[-- Attachment #1: Type: text/plain, Size: 2325 bytes --]
The clang compiler complained about each of these.
The mdmon.h error will only affect 'far' RAID10 arrays using intel or DDF
metadata, and there is no such thing.
The mdopen.c will cause a problem if there are no free md device
numbers in the first 512. That is fairly unlikely.
The restripe.c error would only affect the 'test_stripe' command, and
probably doesn't change its behaviour.
The super-intel.c fix is purely cosmetic.
Signed-off-by: NeilBrown <neilb@suse.com>
---
mdmon.h | 2 +-
mdopen.c | 2 +-
restripe.c | 2 +-
super-intel.c | 2 +-
4 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/mdmon.h b/mdmon.h
index aa750c6811e1..0b08c3d7da2a 100644
--- a/mdmon.h
+++ b/mdmon.h
@@ -101,7 +101,7 @@ static inline int is_resync_complete(struct mdinfo *array)
break;
case 10:
l = array->array.layout;
- ncopies = (l & 0xff) * ((l >> 8) && 0xff);
+ ncopies = (l & 0xff) * ((l >> 8) & 0xff);
sync_size = array->component_size * array->array.raid_disks;
sync_size /= ncopies;
break;
diff --git a/mdopen.c b/mdopen.c
index f818fdf339eb..287b5211631a 100644
--- a/mdopen.c
+++ b/mdopen.c
@@ -318,7 +318,7 @@ int create_mddev(char *dev, char *name, int autof, int trustworthy,
else if (num < 0) {
/* need to choose a free number. */
char *_devnm = find_free_devnm(use_mdp);
- if (devnm == NULL) {
+ if (_devnm == NULL) {
pr_err("No avail md devices - aborting\n");
return -1;
}
diff --git a/restripe.c b/restripe.c
index 56dca73ea86b..94342c7bc6f9 100644
--- a/restripe.c
+++ b/restripe.c
@@ -462,7 +462,7 @@ int raid6_check_disks(int data_disks, int start, int chunk_size,
}
if((Px == 0) && (Qx == 0))
- curr_broken_disk = curr_broken_disk;
+ curr_broken_disk = prev_broken_disk;
if(curr_broken_disk >= data_disks + 2)
broken_status = 2;
diff --git a/super-intel.c b/super-intel.c
index 92817e9ec875..b24777ee3559 100644
--- a/super-intel.c
+++ b/super-intel.c
@@ -10141,7 +10141,7 @@ enum imsm_reshape_type imsm_analyze_change(struct supertype *st,
geo->size = max_size;
}
- if ((direction == ROLLBACK_METADATA_CHANGES)) {
+ if (direction == ROLLBACK_METADATA_CHANGES) {
/* accept size for rollback only
*/
} else {
--
2.10.0
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 800 bytes --]
^ permalink raw reply related
* [GIT PULL] MD update for 4.9
From: Shaohua Li @ 2016-10-07 0:38 UTC (permalink / raw)
To: torvalds; +Cc: linux-kernel, linux-raid, neilb
Hi Linus,
Please pull MD update for 4.9. This update includes:
- new AVX512 instruction based raid6 gen/recovery algorithm
- A couple of md-cluster related bug fixes
- Fix a potential deadlock
- Set nonrotational bit for raid array with SSD
- Set correct max_hw_sectors for raid5/6, which hopefuly can improve
performance a little bit
- Other minor fixes
Thanks,
Shaohua
The following changes since commit 7d1e042314619115153a0f6f06e4552c09a50e13:
Merge tag 'usercopy-v4.8-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux (2016-09-20 17:11:19 -0700)
are available in the git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/shli/md.git tags/md/4.9-rc1
for you to fetch changes up to bb086a89a406b5d877ee616f1490fcc81f8e1b2b:
md: set rotational bit (2016-10-03 10:20:27 -0700)
----------------------------------------------------------------
Chao Yu (1):
raid5: fix to detect failure of register_shrinker
Gayatri Kammela (5):
lib/raid6: Add AVX512 optimized gen_syndrome functions
lib/raid6: Add AVX512 optimized recovery functions
lib/raid6/test/Makefile: Add avx512 gen_syndrome and recovery functions
lib/raid6: Add AVX512 optimized xor_syndrome functions
raid6/test/test.c: bug fix: Specify aligned(alignment) attributes to the char arrays
Guoqing Jiang (9):
md-cluster: call md_kick_rdev_from_array once ack failed
md-cluster: use FORCEUNLOCK in lockres_free
md-cluster: remove some unnecessary dlm_unlock_sync
md: changes for MD_STILL_CLOSED flag
md-cluster: clean related infos of cluster
md-cluster: protect md_find_rdev_nr_rcu with rcu lock
md-cluster: convert the completion to wait queue
md-cluster: introduce dlm_lock_sync_interruptible to fix tasks hang
md-cluster: make resync lock also could be interruptted
Shaohua Li (5):
raid5: allow arbitrary max_hw_sectors
md/bitmap: fix wrong cleanup
md: fix a potential deadlock
raid5: handle register_shrinker failure
md: set rotational bit
arch/x86/Makefile | 5 +-
drivers/md/bitmap.c | 4 +-
drivers/md/md-cluster.c | 99 ++++++---
drivers/md/md.c | 44 +++-
drivers/md/md.h | 5 +-
drivers/md/raid5.c | 11 +-
include/linux/raid/pq.h | 4 +
lib/raid6/Makefile | 2 +-
lib/raid6/algos.c | 12 +
lib/raid6/avx512.c | 569 +++++++++++++++++++++++++++++++++++++++++++++++
lib/raid6/recov_avx512.c | 388 ++++++++++++++++++++++++++++++++
lib/raid6/test/Makefile | 5 +-
lib/raid6/test/test.c | 7 +-
lib/raid6/x86.h | 10 +
14 files changed, 1111 insertions(+), 54 deletions(-)
create mode 100644 lib/raid6/avx512.c
create mode 100644 lib/raid6/recov_avx512.c
^ permalink raw reply
* Re: RAID6 - CPU At 100% Usage After Reassembly
From: Shaohua Li @ 2016-10-06 23:55 UTC (permalink / raw)
To: Francisco Parada; +Cc: mdraid
In-Reply-To: <CAOW94uuALbxOpfb5Rgzp=Fgvy8zfKt94zCcgs7T=U-m8jWGq3g@mail.gmail.com>
On Wed, Oct 05, 2016 at 07:12:59AM -0400, Francisco Parada wrote:
> Hi all,
>
> Just wanted to send one last ping on this. I didn't hear back and
> really don't know where else to turn to before abandoning hope.
> Should I just start wiping the drives so I can start from scratch at
> this point?
>
> I rebuilt the kernel with debug patch that Shaohua provided and I sent
> the list the output of the trace. Does any one have any other
> suggestions?
Sorry for the long delay. I got a chance to look at the log today. Looks it shows
there are read errors in two disks and the raid6 thread is keeping loop to
handle the case, but doesn't success. The disks are sdd and sdg. I have no idea
how to fix this so far though, sorry.
Thanks,
Shaohua
^ permalink raw reply
* Why not just return an error?
From: Dark Penguin @ 2016-10-06 23:32 UTC (permalink / raw)
To: linux-raid
Greetings!
The more I read about md-raid, the more I notice that the biggest
problem of it: if you hit an error on a degraded RAID, it falls apart.
Because of this, it is possible to lose a huge amount of data due to one
tiny read error, which particularly makes raid5 the sword of Damocles.
But one question keeps me increasingly frustrated. Yes, during its
normal functioning, it totally makes sense to kick a faulty device out
of an array. But if we're running a degraded array, and doing so will
definitely result is massive data loss, why not just return a read error
instead? Just add a little check: on error, if degraded -> then just
return an error. I believe this is the dream of everyone who had ever
dealt with RAIDs.
With RAID, the first proprity is keeping data safe. Yes, it's not an
alternative to backups and all that, but still - if we hit an error on a
degraded array, the array should scream and panic and send all kinds of
warnings, but definitely NOT collapse and warrant a visit to the RAID
recovery laboratory (or this mailing list). Imagine how much headache
and lost hair would that relieve!..
Now, I'm probably not the first one to think of such a bright idea. So
there must be a very good reason why this is not possible; I don't think
the problem is just that "the existing behaviour is preferred, and
anyone who does not agree is an idiot". If not for enterprise use, then
at least it would be very useful for the "home archive" scenario when
"uptime" and "absense of errors" hold much less meaning than "losing one
file and not all the data". So, why is this not possible?..
--
darkpenguin
^ permalink raw reply
* [PATCH 2/2] RAID10: ignore discard error
From: Shaohua Li @ 2016-10-06 21:20 UTC (permalink / raw)
To: linux-raid; +Cc: Kernel-team, sitsofe, v3.6
In-Reply-To: <36b0390e404e6021f2aec98ee87c074eef6dc6bb.1475788662.git.shli@fb.com>
This is the counterpart of raid10 fix. If a write error occurs, raid10
will try to rewrite the bio in small chunk size. If the rewrite fails,
raid10 will record the error in bad block. narrow_write_error will
always use WRITE for the bio, but actually it could be a discard. Since
discard bio hasn't payload, write the bio will cause different issues.
But discard error isn't fatal, we can safely ignore it. This is what
this patch does.
This issue should exist since discard is added, but only exposed with
recent arbitrary bio size feature.
Cc: Sitsofe Wheeler <sitsofe@gmail.com>
Cc: stable@vger.kernel.org (v3.6)
Signed-off-by: Shaohua Li <shli@fb.com>
---
drivers/md/raid10.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index be1a9fc..39fddda 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -447,6 +447,9 @@ static void raid10_end_write_request(struct bio *bio)
struct r10conf *conf = r10_bio->mddev->private;
int slot, repl;
struct md_rdev *rdev = NULL;
+ bool discard_error;
+
+ discard_error = bio->bi_error && bio_op(bio) == REQ_OP_DISCARD;
dev = find_bio_disk(conf, r10_bio, bio, &slot, &repl);
@@ -460,7 +463,7 @@ static void raid10_end_write_request(struct bio *bio)
/*
* this branch is our 'one mirror IO has finished' event handler:
*/
- if (bio->bi_error) {
+ if (bio->bi_error && !discard_error) {
if (repl)
/* Never record new bad blocks to replacement,
* just fail it.
@@ -503,7 +506,7 @@ static void raid10_end_write_request(struct bio *bio)
if (is_badblock(rdev,
r10_bio->devs[slot].addr,
r10_bio->sectors,
- &first_bad, &bad_sectors)) {
+ &first_bad, &bad_sectors) && !discard_error) {
bio_put(bio);
if (repl)
r10_bio->devs[slot].repl_bio = IO_MADE_GOOD;
--
2.9.3
^ permalink raw reply related
* [PATCH 1/2] RAID1: ignore discard error
From: Shaohua Li @ 2016-10-06 21:20 UTC (permalink / raw)
To: linux-raid; +Cc: Kernel-team, sitsofe, v3.6
Sitsofe,
could you try if this patch fixes the issue too please?
Thanks,
Shaohua
If a write error occurs, raid1 will try to rewrite the bio in small
chunk size. If the rewrite fails, raid1 will record the error in bad
block. narrow_write_error will always use WRITE for the bio, but
actually it could be a discard. Since discard bio hasn't payload, write
the bio will cause different issues. But discard error isn't fatal, we
can safely ignore it. This is what this patch does.
This issue should exist since discard is added, but only exposed with
recent arbitrary bio size feature.
Reported-by: Sitsofe Wheeler <sitsofe@gmail.com>
Cc: stable@vger.kernel.org (v3.6)
Signed-off-by: Shaohua Li <shli@fb.com>
---
drivers/md/raid1.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 21dc00e..95bf4cd 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -407,11 +407,14 @@ static void raid1_end_write_request(struct bio *bio)
struct bio *to_put = NULL;
int mirror = find_bio_disk(r1_bio, bio);
struct md_rdev *rdev = conf->mirrors[mirror].rdev;
+ bool discard_error;
+
+ discard_error = bio->bi_error && bio_op(bio) == REQ_OP_DISCARD;
/*
* 'one mirror IO has finished' event handler:
*/
- if (bio->bi_error) {
+ if (bio->bi_error && !discard_error) {
set_bit(WriteErrorSeen, &rdev->flags);
if (!test_and_set_bit(WantReplacement, &rdev->flags))
set_bit(MD_RECOVERY_NEEDED, &
@@ -448,7 +451,7 @@ static void raid1_end_write_request(struct bio *bio)
/* Maybe we can clear some bad blocks. */
if (is_badblock(rdev, r1_bio->sector, r1_bio->sectors,
- &first_bad, &bad_sectors)) {
+ &first_bad, &bad_sectors) && !discard_error) {
r1_bio->bios[mirror] = IO_MADE_GOOD;
set_bit(R1BIO_MadeGood, &r1_bio->state);
}
--
2.9.3
^ permalink raw reply related
* Re: [PATCH 37/54] md/raid5: Replace a seq_printf() call by seq_puts() in raid5_status()
From: Bernd Petrovitsch @ 2016-10-06 18:33 UTC (permalink / raw)
To: SF Markus Elfring, Joe Perches
Cc: linux-raid, Christoph Hellwig, Guoqing Jiang, Jens Axboe,
Mike Christie, Neil Brown, Shaohua Li, Tomasz Majchrzak, LKML,
kernel-janitors, Julia Lawall
In-Reply-To: <c158cfe0-d8d8-e433-0cea-2ef1fc51e48b@users.sourceforge.net>
Hi all!
On Thu, 2016-10-06 at 19:49 +0200, SF Markus Elfring wrote:
> > diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> > > > []
> > > > >
> > > > > @@ -7044,7 +7044,7 @@ static void raid5_status(struct
> > > > > seq_file *seq, struct mddev *mddev)
> > > > > rdev && test_bit(In_sync, &rdev-
> > > > > >flags) ? "U" : "_");
> > > > > }
> > > > > rcu_read_unlock();
> > > > > - seq_printf (seq, "]");
> > > > > + seq_puts(seq, "]");
> > > > seq_putc
> > > How do you think about the possibility that the script
> > > "checkpatch.pl" can also point
> > > such a source code transformation out directly?
> >
> > Why don't _you_ try to implement that in checkpatch instead?
>
> How are the chances that any other software developer would be
> quicker (than me) for such
> an addition because of more practical knowledge for the programming
> language "Perl"?
The above is BTW a pretty simple thing and thus good for a learning
experience for regular expressions and copy-pasting a few lines in that
perl-script and editing them.
MfG,
Bernd
PS: Sry for the noise - it's somewhat OT here ....
--
Bernd Petrovitsch Email : bernd@petrovitsch.priv.at
LUGA : http://www.luga.at
^ permalink raw reply
* Re: md/raid10: Replace printk() calls by the usage of higher level interfaces
From: SF Markus Elfring @ 2016-10-06 18:32 UTC (permalink / raw)
To: Joe Perches
Cc: linux-raid, Christoph Hellwig, Guoqing Jiang, Jens Axboe,
Mike Christie, Neil Brown, Shaohua Li, Tomasz Majchrzak, LKML,
kernel-janitors, Julia Lawall
In-Reply-To: <1475777887.1820.1.camel@perches.com>
>> Would it be nicer if a single call will be sufficient for such a log message?
>>
>> Can an idea picked up which was discussed for the update suggestion
>> "[PATCH 4/4] Input-gameport: Replace some printk() calls by pr_info() in joydump_connect()"
>> for a moment (on 2016-09-24)?
>> https://lkml.kernel.org/r/<1474735656.23838.6.camel@perches.com>;
>> https://lkml.org/lkml/2016/9/24/149
>
> That'd be great.
> Pease come back only after you've implemented that.
Does this use case show another improvement opportunity for involved
software developers (besides me)?
Are there any more contributors around who would like to tackle such a challenge?
Regards,
Markus
^ permalink raw reply
* Re: md/raid5: Replace a seq_printf() call by seq_puts() in raid5_status()
From: SF Markus Elfring @ 2016-10-06 18:22 UTC (permalink / raw)
To: Joe Perches
Cc: linux-raid, Christoph Hellwig, Guoqing Jiang, Jens Axboe,
Mike Christie, Neil Brown, Shaohua Li, Tomasz Majchrzak, LKML,
kernel-janitors, Julia Lawall
In-Reply-To: <1475776304.1914.17.camel@perches.com>
>>> Why don't _you_ try to implement that in checkpatch instead?
>> How are the chances that any other software developer would be quicker (than me) for such
>> an addition because of more practical knowledge for the programming language "Perl"?
>
> Extremely high.
Thanks for this feedback.
> What are the chances you can add useful attributes to your skilz?
Unfortunately, "Perl" does not belong to a favourite in my current collection
of programming languages I learned through my software development activities.
I imagine that I would prefer to improve other Linux software modules
for a while instead?
Would it be interesting if the Coccinelle software could help a bit more with
corresponding semantic patches?
Regards,
Markus
^ permalink raw reply
* Re: md/raid10: Replace printk() calls by the usage of higher level interfaces
From: Joe Perches @ 2016-10-06 18:18 UTC (permalink / raw)
To: SF Markus Elfring
Cc: linux-raid, Christoph Hellwig, Guoqing Jiang, Jens Axboe,
Mike Christie, Neil Brown, Shaohua Li, Tomasz Majchrzak, LKML,
kernel-janitors, Julia Lawall
In-Reply-To: <21fb08b6-d5b7-0dd6-0cbd-0f0a4bf56a9e@users.sourceforge.net>
On Thu, 2016-10-06 at 20:04 +0200, SF Markus Elfring wrote:
> Would it be nicer if a single call will be sufficient for such a log message?
>
> Can an idea picked up which was discussed for the update suggestion
> "[PATCH 4/4] Input-gameport: Replace some printk() calls by pr_info() in joydump_connect()"
> for a moment (on 2016-09-24)?
> https://lkml.kernel.org/r/<1474735656.23838.6.camel@perches.com>;
> https://lkml.org/lkml/2016/9/24/149
That'd be great.
Pease come back only after you've implemented that.
^ permalink raw reply
* Re: md/raid10: Replace printk() calls by the usage of higher level interfaces
From: SF Markus Elfring @ 2016-10-06 18:04 UTC (permalink / raw)
To: Joe Perches
Cc: linux-raid, Christoph Hellwig, Guoqing Jiang, Jens Axboe,
Mike Christie, Neil Brown, Shaohua Li, Tomasz Majchrzak, LKML,
kernel-janitors, Julia Lawall
In-Reply-To: <1475775546.1914.15.camel@perches.com>
>> How should a multiline log message be achieved as it was constructed
>> in the function "raid10_error" (for example)?
>> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/tree/drivers/md/raid10.c?id=c802e87fbe2d4dd58982d01b3c39bc5a781223aa#n1589
>
> As two individual calls to pr_alert
I know also such an approach.
Would it be nicer if a single call will be sufficient for such a log message?
Can an idea picked up which was discussed for the update suggestion
"[PATCH 4/4] Input-gameport: Replace some printk() calls by pr_info() in joydump_connect()"
for a moment (on 2016-09-24)?
https://lkml.kernel.org/r/<1474735656.23838.6.camel@perches.com>
https://lkml.org/lkml/2016/9/24/149
Regards,
Markus
^ permalink raw reply
* Re: [PATCH 37/54] md/raid5: Replace a seq_printf() call by seq_puts() in raid5_status()
From: Joe Perches @ 2016-10-06 17:51 UTC (permalink / raw)
To: SF Markus Elfring
Cc: linux-raid, Christoph Hellwig, Guoqing Jiang, Jens Axboe,
Mike Christie, Neil Brown, Shaohua Li, Tomasz Majchrzak, LKML,
kernel-janitors, Julia Lawall
In-Reply-To: <c158cfe0-d8d8-e433-0cea-2ef1fc51e48b@users.sourceforge.net>
On Thu, 2016-10-06 at 19:49 +0200, SF Markus Elfring wrote:
> > Why don't _you_ try to implement that in checkpatch instead?
> How are the chances that any other software developer would be quicker (than me) for such
> an addition because of more practical knowledge for the programming language "Perl"?
Extremely high.
What are the chances you can add useful attributes to your skilz?
^ permalink raw reply
* Re: [PATCH 37/54] md/raid5: Replace a seq_printf() call by seq_puts() in raid5_status()
From: SF Markus Elfring @ 2016-10-06 17:49 UTC (permalink / raw)
To: Joe Perches
Cc: linux-raid, Christoph Hellwig, Guoqing Jiang, Jens Axboe,
Mike Christie, Neil Brown, Shaohua Li, Tomasz Majchrzak, LKML,
kernel-janitors, Julia Lawall
In-Reply-To: <1475775403.1914.13.camel@perches.com>
>>>> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
>>> []
>>>> @@ -7044,7 +7044,7 @@ static void raid5_status(struct seq_file *seq, struct mddev *mddev)
>>>> rdev && test_bit(In_sync, &rdev->flags) ? "U" : "_");
>>>> }
>>>> rcu_read_unlock();
>>>> - seq_printf (seq, "]");
>>>> + seq_puts(seq, "]");
>>> seq_putc
>> How do you think about the possibility that the script "checkpatch.pl" can also point
>> such a source code transformation out directly?
>
> Why don't _you_ try to implement that in checkpatch instead?
How are the chances that any other software developer would be quicker (than me) for such
an addition because of more practical knowledge for the programming language "Perl"?
Regards,
Markus
^ permalink raw reply
* Re: [PATCH 49/54] md/raid10: Replace printk() calls by the usage of higher level interfaces
From: Joe Perches @ 2016-10-06 17:39 UTC (permalink / raw)
To: SF Markus Elfring
Cc: linux-raid, Christoph Hellwig, Guoqing Jiang, Jens Axboe,
Mike Christie, Neil Brown, Shaohua Li, Tomasz Majchrzak, LKML,
kernel-janitors, Julia Lawall
In-Reply-To: <321cc4aa-679a-a2fe-be0b-96285b19e489@users.sourceforge.net>
On Thu, 2016-10-06 at 19:20 +0200, SF Markus Elfring wrote:
> How should a multiline log message be achieved as it was constructed
> in the function "raid10_error" (for example)?
> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/tree/drivers/md/raid10.c?id=c802e87fbe2d4dd58982d01b3c39bc5a781223aa#n1589
As two individual calls to pr_alert
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox