* RAID6 - RMW logic @ 2014-07-30 20:24 Markus Stockhausen 2014-07-30 21:30 ` NeilBrown 0 siblings, 1 reply; 7+ messages in thread From: Markus Stockhausen @ 2014-07-30 20:24 UTC (permalink / raw) To: linux-raid@vger.kernel.org [-- Attachment #1: Type: text/plain, Size: 1727 bytes --] Hi, the last days I tried to understand the RAID6 logic when recalculating P/Q parity (or syndrome) if only parts of a stripe are updated. As far as I get the current implementation right it will always read the whole stripe and build parities from scratch. E.g. updating a single chunk on a 10 disk RAID6 will read the other 7 data blocks to write the block plus 2 P/Q parities afterwards. Compared to the best case 3 read plus 3 write operations this can be quite a heavy impact. RAID5 at least has a shortcut for that. Being no specialist at all - but having some ideas - I would like to get some feedback about the following way to improve the scenario. 1) Write a modified gen_syndrome function (call it xor_syndrome) like in lib/raid6/xxx.c that allows to XOR an existing P/Q parity without overwriting it. The cost of that function would be the same as it only changes a few assembler commands. 2) Enhance the ops_run_prexor to call this xor_syndrome function for an RAID6. Fill the not relevant pages (R5_wantdrain) with an empty page or NULL. So the gen_syndrome will only find zeroes. With this call P/Q will have a pre-xored version. 3) Enhance ops_run_reconstruct6 to allow processing of changed blocks only. We will call xor_snydrome from there too, to XOR the prexored P/Q once again. 4) Enhance handle_stripe_dirtying so it will allow calculation of RCW versus RMW costs for the RAID6 case. Did I miss something? And if not: Will the doubled CPU consumption for P/Q calculation justify that way of implementation? At least we can avoid IOs on slow disks. Maybe it has been discussed before but I did not find it on the mailing list. Thanks in advance. Markus = [-- Attachment #2: InterScan_Disclaimer.txt --] [-- Type: text/plain, Size: 1650 bytes --] **************************************************************************** Diese E-Mail enthält vertrauliche und/oder rechtlich geschützte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese E-Mail irrtümlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Mail. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Mail ist nicht gestattet. Ãber das Internet versandte E-Mails können unter fremden Namen erstellt oder manipuliert werden. Deshalb ist diese als E-Mail verschickte Nachricht keine rechtsverbindliche Willenserklärung. Collogia Unternehmensberatung AG Ubierring 11 D-50678 Köln Vorstand: Kadir Akin Dr. Michael Höhnerbach Vorsitzender des Aufsichtsrates: Hans Kristian Langva Registergericht: Amtsgericht Köln Registernummer: HRB 52 497 This e-mail may contain confidential and/or privileged information. If you are not the intended recipient (or have received this e-mail in error) please notify the sender immediately and destroy this e-mail. Any unauthorized copying, disclosure or distribution of the material in this e-mail is strictly forbidden. e-mails sent over the internet may have been written under a wrong name or been manipulated. That is why this message sent as an e-mail is not a legally binding declaration of intention. Collogia Unternehmensberatung AG Ubierring 11 D-50678 Köln executive board: Kadir Akin Dr. Michael Höhnerbach President of the supervisory board: Hans Kristian Langva Registry office: district court Cologne Register number: HRB 52 497 **************************************************************************** ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: RAID6 - RMW logic 2014-07-30 20:24 RAID6 - RMW logic Markus Stockhausen @ 2014-07-30 21:30 ` NeilBrown 2014-07-30 21:55 ` Ethan Wilson 2014-07-31 6:43 ` AW: " Markus Stockhausen 0 siblings, 2 replies; 7+ messages in thread From: NeilBrown @ 2014-07-30 21:30 UTC (permalink / raw) To: Markus Stockhausen; +Cc: linux-raid@vger.kernel.org [-- Attachment #1: Type: text/plain, Size: 2114 bytes --] On Wed, 30 Jul 2014 20:24:30 +0000 Markus Stockhausen <stockhausen@collogia.de> wrote: > Hi, > > the last days I tried to understand the RAID6 logic when recalculating > P/Q parity (or syndrome) if only parts of a stripe are updated. As far > as I get the current implementation right it will always read the whole > stripe and build parities from scratch. E.g. updating a single chunk > on a 10 disk RAID6 will read the other 7 data blocks to write the block > plus 2 P/Q parities afterwards. Compared to the best case 3 read plus > 3 write operations this can be quite a heavy impact. RAID5 at least > has a shortcut for that. > > Being no specialist at all - but having some ideas - I would like to > get some feedback about the following way to improve the scenario. > > 1) Write a modified gen_syndrome function (call it xor_syndrome) > like in lib/raid6/xxx.c that allows to XOR an existing P/Q parity > without overwriting it. The cost of that function would be the same > as it only changes a few assembler commands. > > 2) Enhance the ops_run_prexor to call this xor_syndrome function > for an RAID6. Fill the not relevant pages (R5_wantdrain) with an > empty page or NULL. So the gen_syndrome will only find zeroes. > With this call P/Q will have a pre-xored version. > > 3) Enhance ops_run_reconstruct6 to allow processing of changed > blocks only. We will call xor_snydrome from there too, to XOR > the prexored P/Q once again. > > 4) Enhance handle_stripe_dirtying so it will allow calculation of > RCW versus RMW costs for the RAID6 case. > > Did I miss something? And if not: Will the doubled CPU consumption > for P/Q calculation justify that way of implementation? At least > we can avoid IOs on slow disks. > > Maybe it has been discussed before but I did not find it on the > mailing list. > > Thanks in advance. > > Markus Please see http://comments.gmane.org/gmane.linux.raid/42559 Yes, this is something we probably want. The previous effort stalled somehow. Maybe it just needs someone to start pushing again. NeilBrown [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 828 bytes --] ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: RAID6 - RMW logic 2014-07-30 21:30 ` NeilBrown @ 2014-07-30 21:55 ` Ethan Wilson 2014-07-31 6:43 ` AW: " Markus Stockhausen 1 sibling, 0 replies; 7+ messages in thread From: Ethan Wilson @ 2014-07-30 21:55 UTC (permalink / raw) Cc: linux-raid@vger.kernel.org On 30/07/2014 23:30, NeilBrown wrote: > On Wed, 30 Jul 2014 20:24:30 +0000 Markus Stockhausen > <stockhausen@collogia.de> wrote: > >> Hi, >> >> the last days I tried to understand the RAID6 logic when recalculating >> P/Q parity (or syndrome) if only parts of a stripe are updated.... >> ... >> Markus > Please see > http://comments.gmane.org/gmane.linux.raid/42559 > > Yes, this is something we probably want. > The previous effort stalled somehow. Maybe it just needs someone to start > pushing again. > > NeilBrown This feature would be wonderful !! upvote! ^ permalink raw reply [flat|nested] 7+ messages in thread
* AW: RAID6 - RMW logic 2014-07-30 21:30 ` NeilBrown 2014-07-30 21:55 ` Ethan Wilson @ 2014-07-31 6:43 ` Markus Stockhausen 2014-08-04 1:22 ` NeilBrown 1 sibling, 1 reply; 7+ messages in thread From: Markus Stockhausen @ 2014-07-31 6:43 UTC (permalink / raw) To: NeilBrown; +Cc: linux-raid@vger.kernel.org [-- Attachment #1: Type: text/plain, Size: 2359 bytes --] > Von: linux-raid-owner@vger.kernel.org > Gesendet: Mittwoch, 30. Juli 2014 23:30 > An: Markus Stockhausen > Cc: linux-raid@vger.kernel.org > Betreff: Re: RAID6 - RMW logic > > On Wed, 30 Jul 2014 20:24:30 +0000 Markus Stockhausen > <stockhausen@collogia.de> wrote: > > > Hi, > > > > the last days I tried to understand the RAID6 logic when recalculating > > P/Q parity (or syndrome) if only parts of a stripe are updated. As far > > ... > > Please see > http://comments.gmane.org/gmane.linux.raid/42559 > > Yes, this is something we probably want. > The previous effort stalled somehow. Maybe it just needs someone to start > pushing again. > > NeilBrown Hi, thanks for the link. Crawling through the modifcation I isolated two steps that we must achieve in first place to get it on track. I'm far away from implementing a full patch so I focus on what I understand. 1) Implement a generic switch so we can configure rmw/rcw handling on the fly. Without any RAID6 rmw patches yet it will simply focus on the current RAID5 implementation. Later on RAID6 can use it too and we are able to compare rmw versus rcw performance in all cases. I would name the parameter enable_rmw and default it to 1. In RAID6 case it will be ignored. -> Ok with that? 2) The previous patch was quite tricky with handling the P/Q calculation. It combined a gen_syndrome run with 2 extra xor runs. Additionally it saved P/Q delta in spare pages. Maybe to avoid patching gen_syndrome functions. I understand the discussion that the flag "subtract" and the handling of the second spare page is not easy to understand. As explained in my last mail I would enhance the syndrome functions with the option to XOR the target P/Q pages instead of only storing the calculated values. This would allow to work the same way the RAID5 shortcut does. See ops_run_prexor that uses parity page to store the interim result. From a performance perspective I would write separate logic in /lib/raid6 by copying the existing functions. Another approach could be to change all gen_syndrome functions to xor the destination page - and empty the target page in advance for the rcw case. In all cases this patch will be quite huge but I should be easy to understand and to verify. -> Suggestions? Best regards. Markus = [-- Attachment #2: InterScan_Disclaimer.txt --] [-- Type: text/plain, Size: 1650 bytes --] **************************************************************************** Diese E-Mail enthält vertrauliche und/oder rechtlich geschützte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese E-Mail irrtümlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Mail. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Mail ist nicht gestattet. Ãber das Internet versandte E-Mails können unter fremden Namen erstellt oder manipuliert werden. Deshalb ist diese als E-Mail verschickte Nachricht keine rechtsverbindliche Willenserklärung. Collogia Unternehmensberatung AG Ubierring 11 D-50678 Köln Vorstand: Kadir Akin Dr. Michael Höhnerbach Vorsitzender des Aufsichtsrates: Hans Kristian Langva Registergericht: Amtsgericht Köln Registernummer: HRB 52 497 This e-mail may contain confidential and/or privileged information. If you are not the intended recipient (or have received this e-mail in error) please notify the sender immediately and destroy this e-mail. Any unauthorized copying, disclosure or distribution of the material in this e-mail is strictly forbidden. e-mails sent over the internet may have been written under a wrong name or been manipulated. That is why this message sent as an e-mail is not a legally binding declaration of intention. Collogia Unternehmensberatung AG Ubierring 11 D-50678 Köln executive board: Kadir Akin Dr. Michael Höhnerbach President of the supervisory board: Hans Kristian Langva Registry office: district court Cologne Register number: HRB 52 497 **************************************************************************** ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: RAID6 - RMW logic 2014-07-31 6:43 ` AW: " Markus Stockhausen @ 2014-08-04 1:22 ` NeilBrown 2014-08-10 17:49 ` AW: " Markus Stockhausen 0 siblings, 1 reply; 7+ messages in thread From: NeilBrown @ 2014-08-04 1:22 UTC (permalink / raw) To: Markus Stockhausen; +Cc: linux-raid@vger.kernel.org [-- Attachment #1: Type: text/plain, Size: 3943 bytes --] On Thu, 31 Jul 2014 06:43:52 +0000 Markus Stockhausen <stockhausen@collogia.de> wrote: > > Von: linux-raid-owner@vger.kernel.org > > Gesendet: Mittwoch, 30. Juli 2014 23:30 > > An: Markus Stockhausen > > Cc: linux-raid@vger.kernel.org > > Betreff: Re: RAID6 - RMW logic > > > > On Wed, 30 Jul 2014 20:24:30 +0000 Markus Stockhausen > > <stockhausen@collogia.de> wrote: > > > > > Hi, > > > > > > the last days I tried to understand the RAID6 logic when recalculating > > > P/Q parity (or syndrome) if only parts of a stripe are updated. As far > > > ... > > > > Please see > > http://comments.gmane.org/gmane.linux.raid/42559 > > > > Yes, this is something we probably want. > > The previous effort stalled somehow. Maybe it just needs someone to start > > pushing again. > > > > NeilBrown > > Hi, > > thanks for the link. Crawling through the modifcation I isolated two steps > that we must achieve in first place to get it on track. I'm far away from > implementing a full patch so I focus on what I understand. > > 1) Implement a generic switch so we can configure rmw/rcw handling > on the fly. Without any RAID6 rmw patches yet it will simply focus on the > current RAID5 implementation. Later on RAID6 can use it too and we > are able to compare rmw versus rcw performance in all cases. > I would name the parameter enable_rmw and default it to 1. In RAID6 case > it will be ignored. > > -> Ok with that? No, sorry. Or not very. In that email thread I pointed you to I wrote: - Can you explain *why* rcw is sometimes better than rmw even on large arrays? Even a fairly hand-wavy arguement would help. And it would go in the comment at the top of the patch that adds enable_rmw. I see you've posted a patch, but there is no "why". I don't like adding configuration options. If there is some clear and easy to understand benefit, like "this trades throughput against latency", then I might be able to live with one, because it would be easy to tell people how to tune it. Why would I ever disable rmw? Don't say "choose the option that performs best for your workload", because that is nearly meaningless: workloads change from moment to moment. If rwm is good in some cases and bad in others, then we should at least make sure we understand why, and then hopefully get the md driver to auto-detect the different cases. There might be a case for allowing an option like that to support a "developer only preview" of the code. i.e. Add the rmw-for-RAID6 code, find that is slows down some workloads, get confused about why, ask for help, people are only happy to test if it is in mainline, so use a developer-only config option. Then at least I could tell people when to turn it on: only if you are a developer. NeilBrown > > 2) The previous patch was quite tricky with handling the P/Q calculation. > It combined a gen_syndrome run with 2 extra xor runs. Additionally it > saved P/Q delta in spare pages. Maybe to avoid patching gen_syndrome > functions. I understand the discussion that the flag "subtract" and the handling > of the second spare page is not easy to understand. As explained in my > last mail I would enhance the syndrome functions with the option to XOR the > target P/Q pages instead of only storing the calculated values. This would allow > to work the same way the RAID5 shortcut does. See ops_run_prexor that uses > parity page to store the interim result. > > >From a performance perspective I would write separate logic in /lib/raid6 > by copying the existing functions. Another approach could be to change > all gen_syndrome functions to xor the destination page - and empty the > target page in advance for the rcw case. In all cases this patch will be quite > huge but I should be easy to understand and to verify. > > -> Suggestions? > > Best regards. > > Markus [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 828 bytes --] ^ permalink raw reply [flat|nested] 7+ messages in thread
* AW: RAID6 - RMW logic 2014-08-04 1:22 ` NeilBrown @ 2014-08-10 17:49 ` Markus Stockhausen 2014-08-12 7:14 ` NeilBrown 0 siblings, 1 reply; 7+ messages in thread From: Markus Stockhausen @ 2014-08-10 17:49 UTC (permalink / raw) To: NeilBrown; +Cc: linux-raid@vger.kernel.org [-- Attachment #1: Type: text/plain, Size: 3168 bytes --] > Von: NeilBrown [neilb@suse.de] > Gesendet: Montag, 4. August 2014 03:22 > An: Markus Stockhausen > Cc: linux-raid@vger.kernel.org > Betreff: Re: RAID6 - RMW logic > > > On Thu, 31 Jul 2014 06:43:52 +0000 Markus Stockhausen > > <stockhausen@collogia.de> wrote: > > > > Hi, > > > > thanks for the link. Crawling through the modifcation I isolated two steps > > that we must achieve in first place to get it on track. I'm far away from > > implementing a full patch so I focus on what I understand. > > > > 1) Implement a generic switch so we can configure rmw/rcw handling > > on the fly. Without any RAID6 rmw patches yet it will simply focus on the > > current RAID5 implementation. Later on RAID6 can use it too and we > > are able to compare rmw versus rcw performance in all cases. > > I would name the parameter enable_rmw and default it to 1. In RAID6 case > > it will be ignored. > > > > -> Ok with that? > > No, sorry. Or not very. > > In that email thread I pointed you to I wrote: > > - Can you explain *why* rcw is sometimes better than rmw even on large > arrays? Even a fairly hand-wavy arguement would help. And it would go in > the comment at the top of the patch that adds enable_rmw. > > > I see you've posted a patch, but there is no "why". > I don't like adding configuration options. If there is some clear and easy > to understand benefit, like "this trades throughput against latency", then I > might be able to live with one, because it would be easy to tell people how > to tune it. > > Why would I ever disable rmw? Don't say "choose the option that performs best > for your workload", because that is nearly meaningless: workloads change from > moment to moment. If rwm is good in some cases and bad in others, then we > should at least make sure we understand why, and then hopefully get the md > driver to auto-detect the different cases. > > There might be a case for allowing an option like that to support a > "developer only preview" of the code. i.e. Add the rmw-for-RAID6 code, find > that is slows down some workloads, get confused about why, ask for help, > people are only happy to test if it is in mainline, so use a developer-only > config option. > Then at least I could tell people when to turn it on: only if you are a > developer. As you might have seen I posted a complete rmw patch to the mailing list. It is the first test version with the "developer switch". Sorry for being very defensive in that way. Working only one week with the md raid code I wanted to ensure that nothing gets broken. Especially I had hard times to figure out the logic of the async layer. Therefore I'm very unsure if a system with hardware assisted P/Q calculation will benefit straight forward from my patches. Additionaly I thought about some corner cases that might work better with one special switch option. To detect them automatically in md might be beyond the scope of this patch. Hopefully you can allay my concerns. If you like I can simply drop that switch in the next version. > > NeilBrown Thanks for your thoughts. Markus= [-- Attachment #2: InterScan_Disclaimer.txt --] [-- Type: text/plain, Size: 1650 bytes --] **************************************************************************** Diese E-Mail enthält vertrauliche und/oder rechtlich geschützte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese E-Mail irrtümlich erhalten haben, informieren Sie bitte sofort den Absender und vernichten Sie diese Mail. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser Mail ist nicht gestattet. Ãber das Internet versandte E-Mails können unter fremden Namen erstellt oder manipuliert werden. Deshalb ist diese als E-Mail verschickte Nachricht keine rechtsverbindliche Willenserklärung. Collogia Unternehmensberatung AG Ubierring 11 D-50678 Köln Vorstand: Kadir Akin Dr. Michael Höhnerbach Vorsitzender des Aufsichtsrates: Hans Kristian Langva Registergericht: Amtsgericht Köln Registernummer: HRB 52 497 This e-mail may contain confidential and/or privileged information. If you are not the intended recipient (or have received this e-mail in error) please notify the sender immediately and destroy this e-mail. Any unauthorized copying, disclosure or distribution of the material in this e-mail is strictly forbidden. e-mails sent over the internet may have been written under a wrong name or been manipulated. That is why this message sent as an e-mail is not a legally binding declaration of intention. Collogia Unternehmensberatung AG Ubierring 11 D-50678 Köln executive board: Kadir Akin Dr. Michael Höhnerbach President of the supervisory board: Hans Kristian Langva Registry office: district court Cologne Register number: HRB 52 497 **************************************************************************** ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: RAID6 - RMW logic 2014-08-10 17:49 ` AW: " Markus Stockhausen @ 2014-08-12 7:14 ` NeilBrown 0 siblings, 0 replies; 7+ messages in thread From: NeilBrown @ 2014-08-12 7:14 UTC (permalink / raw) To: Markus Stockhausen; +Cc: linux-raid@vger.kernel.org [-- Attachment #1: Type: text/plain, Size: 3461 bytes --] On Sun, 10 Aug 2014 17:49:58 +0000 Markus Stockhausen <stockhausen@collogia.de> wrote: > > Von: NeilBrown [neilb@suse.de] > > Gesendet: Montag, 4. August 2014 03:22 > > An: Markus Stockhausen > > Cc: linux-raid@vger.kernel.org > > Betreff: Re: RAID6 - RMW logic > > > > > On Thu, 31 Jul 2014 06:43:52 +0000 Markus Stockhausen > > > <stockhausen@collogia.de> wrote: > > > > > > Hi, > > > > > > thanks for the link. Crawling through the modifcation I isolated two steps > > > that we must achieve in first place to get it on track. I'm far away from > > > implementing a full patch so I focus on what I understand. > > > > > > 1) Implement a generic switch so we can configure rmw/rcw handling > > > on the fly. Without any RAID6 rmw patches yet it will simply focus on the > > > current RAID5 implementation. Later on RAID6 can use it too and we > > > are able to compare rmw versus rcw performance in all cases. > > > I would name the parameter enable_rmw and default it to 1. In RAID6 case > > > it will be ignored. > > > > > > -> Ok with that? > > > > No, sorry. Or not very. > > > > In that email thread I pointed you to I wrote: > > > > - Can you explain *why* rcw is sometimes better than rmw even on large > > arrays? Even a fairly hand-wavy arguement would help. And it would go in > > the comment at the top of the patch that adds enable_rmw. > > > > > > I see you've posted a patch, but there is no "why". > > I don't like adding configuration options. If there is some clear and easy > > to understand benefit, like "this trades throughput against latency", then I > > might be able to live with one, because it would be easy to tell people how > > to tune it. > > > > Why would I ever disable rmw? Don't say "choose the option that performs best > > for your workload", because that is nearly meaningless: workloads change from > > moment to moment. If rwm is good in some cases and bad in others, then we > > should at least make sure we understand why, and then hopefully get the md > > driver to auto-detect the different cases. > > > > There might be a case for allowing an option like that to support a > > "developer only preview" of the code. i.e. Add the rmw-for-RAID6 code, find > > that is slows down some workloads, get confused about why, ask for help, > > people are only happy to test if it is in mainline, so use a developer-only > > config option. > > Then at least I could tell people when to turn it on: only if you are a > > developer. > > As you might have seen I posted a complete rmw patch to the mailing list. > It is the first test version with the "developer switch". Sorry for being very > defensive in that way. Working only one week with the md raid code I > wanted to ensure that nothing gets broken. Especially I had hard times to > figure out the logic of the async layer. Therefore I'm very unsure if a system > with hardware assisted P/Q calculation will benefit straight forward from > my patches. > > Additionaly I thought about some corner cases that might work better with > one special switch option. To detect them automatically in md might be beyond > the scope of this patch. > > Hopefully you can allay my concerns. If you like I can simply drop that switch > in the next version. Thanks for the patches. I'll try to have a proper look sometime soon, but it might not be until next week. NeilBrown [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 828 bytes --] ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2014-08-12 7:14 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2014-07-30 20:24 RAID6 - RMW logic Markus Stockhausen 2014-07-30 21:30 ` NeilBrown 2014-07-30 21:55 ` Ethan Wilson 2014-07-31 6:43 ` AW: " Markus Stockhausen 2014-08-04 1:22 ` NeilBrown 2014-08-10 17:49 ` AW: " Markus Stockhausen 2014-08-12 7:14 ` NeilBrown
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).