* [PATCH v2 1/2] lib/raid6: skip benchmark of non-chosen xor_syndrome functions
@ 2022-01-05 16:38 Dirk Müller
2022-01-05 16:38 ` [PATCH v2 2/2] lib/raid6: Use strict priority ranking for pq gen() benchmarking Dirk Müller
2022-01-05 18:55 ` [PATCH v2 1/2] lib/raid6: skip benchmark of non-chosen xor_syndrome functions Song Liu
0 siblings, 2 replies; 3+ messages in thread
From: Dirk Müller @ 2022-01-05 16:38 UTC
To: linux-raid; +Cc: Dirk Müller
Commit fe5cbc6e06c7 ("md/raid6 algorithms: delta syndrome functions")
also added xor_syndrome() benchmarking to the raid6_choose_gen()
function. However, the results of that benchmarking were intentionally
discarded and did not influence the choice: the function always picked
the xor_syndrome() variant belonging to the best-performing
gen_syndrome().

Reduce the runtime of raid6_choose_gen() without changing its outcome by
benchmarking only the xor_syndrome() of the best gen_syndrome() variant.

For a HZ=250 x86_64 system with AVX2 and without AVX512, this removes
5 out of 6 xor() benchmarks, saving 340 ms of raid6 initialization time.
Signed-off-by: Dirk Müller <dmueller@suse.de>
---
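A back-of-the-envelope check of the 340 ms figure, as a minimal sketch.
It assumes that RAID6_TIME_JIFFIES_LG2 is 4 (a 16-jiffy timing window)
and that syncing to the next jiffy edge costs up to one further jiffy
per run; neither value is stated in this patch. The helper name
saved_ms() is hypothetical.

```c
/*
 * Estimate the initialization time saved by skipping benchmark runs.
 * ASSUMPTIONS (not stated in this patch): the timing window is
 * 1 << 4 jiffies, and waiting for the next jiffy edge costs up to
 * one extra jiffy per run.
 */
#define ASSUMED_TIME_JIFFIES_LG2 4

/* Hypothetical helper: upper-bound saving in milliseconds. */
static unsigned long saved_ms(unsigned long skipped_runs, unsigned long hz)
{
	unsigned long jiffies_per_run = (1UL << ASSUMED_TIME_JIFFIES_LG2) + 1;

	return skipped_runs * jiffies_per_run * 1000UL / hz;
}
```

Under these assumptions saved_ms(5, 250) evaluates to 340; without the
extra jiffy per run the estimate would be 320.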
lib/raid6/algos.c | 76 +++++++++++++++++++++++------------------------
1 file changed, 37 insertions(+), 39 deletions(-)
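The shift expressions in the pr_info() calls of this diff convert a loop
count into MB/s. The user-space sketch below spells that arithmetic out.
The function names, and passing HZ, PAGE_SHIFT and
RAID6_TIME_JIFFIES_LG2 as plain parameters, are illustrative choices
and not part of the patch.

```c
/*
 * perf loop iterations were counted during (1 << time_lg2) jiffies, and
 * each call covers (disks - 2) data pages of (1 << page_shift) bytes:
 *
 *   bytes/s = perf * (disks - 2) * 2^page_shift * hz / 2^time_lg2
 *   MB/s    = bytes/s >> 20
 *
 * Folding the powers of two together gives the single right shift below.
 */
static unsigned long gen_mb_per_sec(unsigned long perf, unsigned long hz,
				    int disks, int page_shift, int time_lg2)
{
	return (perf * hz * (unsigned long)(disks - 2)) >>
	       (20 - page_shift + time_lg2);
}

/*
 * The xor() figure uses one extra bit of shift, which halves the result.
 * Presumably this is because xor_syndrome() is only run over the second
 * half of the disks (start = (disks >> 1) - 1); that reading is an
 * assumption based on the comment in the code.
 */
static unsigned long xor_mb_per_sec(unsigned long perf, unsigned long hz,
				    int disks, int page_shift, int time_lg2)
{
	return (perf * hz * (unsigned long)(disks - 2)) >>
	       (20 - page_shift + time_lg2 + 1);
}
```

As an example with made-up inputs, perf = 4096, hz = 250, disks = 8,
page_shift = 12 and time_lg2 = 4 give 1500 for gen_mb_per_sec() and 750
for xor_mb_per_sec().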
diff --git a/lib/raid6/algos.c b/lib/raid6/algos.c
index 6d5e5000fdd7..9b7e8a837b27 100644
--- a/lib/raid6/algos.c
+++ b/lib/raid6/algos.c
@@ -145,12 +145,12 @@ static inline const struct raid6_recov_calls *raid6_choose_recov(void)
 static inline const struct raid6_calls *raid6_choose_gen(
 	void *(*const dptrs)[RAID6_TEST_DISKS], const int disks)
 {
-	unsigned long perf, bestgenperf, bestxorperf, j0, j1;
+	unsigned long perf, bestgenperf, j0, j1;
 	int start = (disks>>1)-1, stop = disks-3;	/* work on the second half of the disks */
 	const struct raid6_calls *const *algo;
 	const struct raid6_calls *best;
-	for (bestgenperf = 0, bestxorperf = 0, best = NULL, algo = raid6_algos; *algo; algo++) {
+	for (bestgenperf = 0, best = NULL, algo = raid6_algos; *algo; algo++) {
 		if (!best || (*algo)->prefer >= best->prefer) {
 			if ((*algo)->valid && !(*algo)->valid())
 				continue;
@@ -180,50 +180,48 @@ static inline const struct raid6_calls *raid6_choose_gen(
 			pr_info("raid6: %-8s gen() %5ld MB/s\n", (*algo)->name,
 				(perf * HZ * (disks-2)) >>
 				(20 - PAGE_SHIFT + RAID6_TIME_JIFFIES_LG2));
+		}
+	}
-			if (!(*algo)->xor_syndrome)
-				continue;
+	if (!best) {
+		pr_err("raid6: Yikes! No algorithm found!\n");
+		goto out;
+	}
-			perf = 0;
+	raid6_call = *best;
-			preempt_disable();
-			j0 = jiffies;
-			while ((j1 = jiffies) == j0)
-				cpu_relax();
-			while (time_before(jiffies,
-					   j1 + (1<<RAID6_TIME_JIFFIES_LG2))) {
-				(*algo)->xor_syndrome(disks, start, stop,
-						      PAGE_SIZE, *dptrs);
-				perf++;
-			}
-			preempt_enable();
-
-			if (best == *algo)
-				bestxorperf = perf;
+	if (!IS_ENABLED(CONFIG_RAID6_PQ_BENCHMARK)) {
+		pr_info("raid6: skipped pq benchmark and selected %s\n",
+			best->name);
+		goto out;
+	}
-			pr_info("raid6: %-8s xor() %5ld MB/s\n", (*algo)->name,
-				(perf * HZ * (disks-2)) >>
-				(20 - PAGE_SHIFT + RAID6_TIME_JIFFIES_LG2 + 1));
+	pr_info("raid6: using algorithm %s gen() %ld MB/s\n",
+		best->name,
+		(bestgenperf * HZ * (disks - 2)) >>
+		(20 - PAGE_SHIFT + RAID6_TIME_JIFFIES_LG2));
+
+	if (best->xor_syndrome) {
+		perf = 0;
+
+		preempt_disable();
+		j0 = jiffies;
+		while ((j1 = jiffies) == j0)
+			cpu_relax();
+		while (time_before(jiffies,
+				   j1 + (1 << RAID6_TIME_JIFFIES_LG2))) {
+			best->xor_syndrome(disks, start, stop,
+					   PAGE_SIZE, *dptrs);
+			perf++;
 		}
-	}
+		preempt_enable();
-	if (best) {
-		if (IS_ENABLED(CONFIG_RAID6_PQ_BENCHMARK)) {
-			pr_info("raid6: using algorithm %s gen() %ld MB/s\n",
-				best->name,
-				(bestgenperf * HZ * (disks-2)) >>
-				(20 - PAGE_SHIFT+RAID6_TIME_JIFFIES_LG2));
-			if (best->xor_syndrome)
-				pr_info("raid6: .... xor() %ld MB/s, rmw enabled\n",
-					(bestxorperf * HZ * (disks-2)) >>
-					(20 - PAGE_SHIFT + RAID6_TIME_JIFFIES_LG2 + 1));
-		} else
-			pr_info("raid6: skip pq benchmark and using algorithm %s\n",
-				best->name);
-		raid6_call = *best;
-	} else
-		pr_err("raid6: Yikes! No algorithm found!\n");
+		pr_info("raid6: .... xor() %ld MB/s, rmw enabled\n",
+			(perf * HZ * (disks - 2)) >>
+			(20 - PAGE_SHIFT + RAID6_TIME_JIFFIES_LG2 + 1));
+	}
+out:
 	return best;
 }
--
2.34.1
* [PATCH v2 2/2] lib/raid6: Use strict priority ranking for pq gen() benchmarking
From: Dirk Müller @ 2022-01-05 16:38 UTC
To: linux-raid; +Cc: Dirk Müller, Paul Menzel
On x86_64, 3 variants of AVX512, 3 variants of AVX2 and 3 variants
of SSE2 are currently benchmarked on initialization, taking between
144 and 153 jiffies. Testing across a hardware pool of various
generations of Intel CPUs, I could not find a single case where SSE2
won over AVX2 or AVX512. There are, however, cases where AVX2 wins
over AVX512.

Change "prefer" into an integer priority field (similar to how recov
selection works) to have more than one ranking level available, which
is backwards compatible with existing behavior.

Give AVX2/AVX512 variants higher priority than SSE2 in order to skip
SSE2 testing when AVX is available. In an AVX2/x86_64/HZ=250 case this
saves on the order of 200 ms of initialization time.
Signed-off-by: Dirk Müller <dmueller@suse.de>
Acked-by: Paul Menzel <pmenzel@molgen.mpg.de>
---
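A simplified user-space sketch of how the priority check prunes the
benchmark list. struct algo, choose(), the sample table and all perf
numbers are made up for illustration. The sketch also assumes that the
table is ordered with the highest-priority entries first and that the
fastest benchmarked entry becomes the best one; neither is shown in
this patch.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct raid6_calls. */
struct algo {
	const char *name;
	int priority;		/* higher value outranks lower value */
	unsigned long perf;	/* canned result instead of a timed run */
};

/*
 * Walk a NULL-terminated table and return the fastest entry among those
 * that are not outranked by the current best. *runs counts how many
 * entries would have been benchmarked.
 */
static const struct algo *choose(const struct algo *const *table, int *runs)
{
	const struct algo *best = NULL;
	unsigned long bestperf = 0;

	*runs = 0;
	for (; *table; table++) {
		/* Negation of: !best || (*algo)->priority >= best->priority */
		if (best && (*table)->priority < best->priority)
			continue;

		(*runs)++;	/* a real benchmark would run here */
		if ((*table)->perf > bestperf) {
			bestperf = (*table)->perf;
			best = *table;
		}
	}
	return best;
}

/* Made-up sample data: AVX2 at priority 2, SSE2 at priority 1. */
static const struct algo avx2x1 = { "avx2x1", 2, 100 };
static const struct algo avx2x2 = { "avx2x2", 2, 120 };
static const struct algo avx2x4 = { "avx2x4", 2, 110 };
static const struct algo sse2x1 = { "sse2x1", 1, 60 };
static const struct algo sse2x2 = { "sse2x2", 1, 70 };
static const struct algo sse2x4 = { "sse2x4", 1, 65 };

static const struct algo *const sample_table[] = {
	&avx2x1, &avx2x2, &avx2x4, &sse2x1, &sse2x2, &sse2x4, NULL
};
```

Calling choose(sample_table, &runs) on this table selects avx2x2 after
three runs; the three SSE2 entries are never benchmarked. With nine
variants taking 144 to 153 jiffies, each run costs 16 to 17 jiffies, so
skipping three runs saves about 48 to 51 jiffies, or roughly 192 to
204 ms at HZ=250.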
include/linux/raid/pq.h | 2 +-
lib/raid6/algos.c | 2 +-
lib/raid6/avx2.c | 8 ++++----
lib/raid6/avx512.c | 6 +++---
4 files changed, 9 insertions(+), 9 deletions(-)
diff --git a/include/linux/raid/pq.h b/include/linux/raid/pq.h
index 154e954b711d..d6e5a1feb947 100644
--- a/include/linux/raid/pq.h
+++ b/include/linux/raid/pq.h
@@ -81,7 +81,7 @@ struct raid6_calls {
 	void (*xor_syndrome)(int, int, int, size_t, void **);
 	int (*valid)(void);	/* Returns 1 if this routine set is usable */
 	const char *name;	/* Name of this routine set */
-	int prefer;		/* Has special performance attribute */
+	int priority;		/* Relative priority ranking if non-zero */
 };
 /* Selected algorithm */
diff --git a/lib/raid6/algos.c b/lib/raid6/algos.c
index 9b7e8a837b27..39b74221f4a7 100644
--- a/lib/raid6/algos.c
+++ b/lib/raid6/algos.c
@@ -151,7 +151,7 @@ static inline const struct raid6_calls *raid6_choose_gen(
 	const struct raid6_calls *best;
 	for (bestgenperf = 0, best = NULL, algo = raid6_algos; *algo; algo++) {
-		if (!best || (*algo)->prefer >= best->prefer) {
+		if (!best || (*algo)->priority >= best->priority) {
 			if ((*algo)->valid && !(*algo)->valid())
 				continue;
diff --git a/lib/raid6/avx2.c b/lib/raid6/avx2.c
index f299476e1d76..059024234dce 100644
--- a/lib/raid6/avx2.c
+++ b/lib/raid6/avx2.c
@@ -132,7 +132,7 @@ const struct raid6_calls raid6_avx2x1 = {
 	raid6_avx21_xor_syndrome,
 	raid6_have_avx2,
 	"avx2x1",
-	1			/* Has cache hints */
+	.priority = 2		/* Prefer AVX2 over priority 1 (SSE2 and others) */
 };
 /*
@@ -262,7 +262,7 @@ const struct raid6_calls raid6_avx2x2 = {
 	raid6_avx22_xor_syndrome,
 	raid6_have_avx2,
 	"avx2x2",
-	1			/* Has cache hints */
+	.priority = 2		/* Prefer AVX2 over priority 1 (SSE2 and others) */
 };
 #ifdef CONFIG_X86_64
@@ -465,6 +465,6 @@ const struct raid6_calls raid6_avx2x4 = {
 	raid6_avx24_xor_syndrome,
 	raid6_have_avx2,
 	"avx2x4",
-	1			/* Has cache hints */
+	.priority = 2		/* Prefer AVX2 over priority 1 (SSE2 and others) */
 };
-#endif
+#endif /* CONFIG_X86_64 */
diff --git a/lib/raid6/avx512.c b/lib/raid6/avx512.c
index bb684d144ee2..9c3e822e1adf 100644
--- a/lib/raid6/avx512.c
+++ b/lib/raid6/avx512.c
@@ -162,7 +162,7 @@ const struct raid6_calls raid6_avx512x1 = {
 	raid6_avx5121_xor_syndrome,
 	raid6_have_avx512,
 	"avx512x1",
-	1			/* Has cache hints */
+	.priority = 2		/* Prefer AVX512 over priority 1 (SSE2 and others) */
 };
 /*
@@ -319,7 +319,7 @@ const struct raid6_calls raid6_avx512x2 = {
 	raid6_avx5122_xor_syndrome,
 	raid6_have_avx512,
 	"avx512x2",
-	1			/* Has cache hints */
+	.priority = 2		/* Prefer AVX512 over priority 1 (SSE2 and others) */
 };
 #ifdef CONFIG_X86_64
@@ -557,7 +557,7 @@ const struct raid6_calls raid6_avx512x4 = {
 	raid6_avx5124_xor_syndrome,
 	raid6_have_avx512,
 	"avx512x4",
-	1			/* Has cache hints */
+	.priority = 2		/* Prefer AVX512 over priority 1 (SSE2 and others) */
 };
 #endif
--
2.34.1
* Re: [PATCH v2 1/2] lib/raid6: skip benchmark of non-chosen xor_syndrome functions
From: Song Liu @ 2022-01-05 18:55 UTC
To: Dirk Müller; +Cc: linux-raid
On Wed, Jan 5, 2022 at 8:39 AM Dirk Müller <dmueller@suse.de> wrote:
>
> Commit fe5cbc6e06c7 ("md/raid6 algorithms: delta syndrome functions")
> also added xor_syndrome() benchmarking to the raid6_choose_gen()
> function. However, the results of that benchmarking were intentionally
> discarded and did not influence the choice: the function always picked
> the xor_syndrome() variant belonging to the best-performing
> gen_syndrome().
>
> Reduce the runtime of raid6_choose_gen() without changing its outcome by
> benchmarking only the xor_syndrome() of the best gen_syndrome() variant.
>
> For a HZ=250 x86_64 system with AVX2 and without AVX512, this removes
> 5 out of 6 xor() benchmarks, saving 340 ms of raid6 initialization time.
>
> Signed-off-by: Dirk Müller <dmueller@suse.de>
Applied both patches to md-next.
Thanks,
Song