* [PATCH 1/2] crypto: wait for a full jiffy in do_xor_speed
From: Jim Kukunas @ 2012-04-03 23:56 UTC
To: linux-raid; +Cc: linux-crypto
In the existing do_xor_speed(), there is no guarantee that we actually
run do_2() for a full jiffy. We read the current jiffies value and then
run do_2() until it changes, so the measurement window can begin partway
through a jiffy and cover less than a full one.
Instead, read the current jiffies value, wait until the next jiffy
begins, and only then start the test.
Signed-off-by: Jim Kukunas <james.t.kukunas@linux.intel.com>
---
crypto/xor.c | 8 +++++---
1 files changed, 5 insertions(+), 3 deletions(-)
diff --git a/crypto/xor.c b/crypto/xor.c
index b75182d..8788443 100644
--- a/crypto/xor.c
+++ b/crypto/xor.c
@@ -63,7 +63,7 @@ static void
 do_xor_speed(struct xor_block_template *tmpl, void *b1, void *b2)
 {
 	int speed;
-	unsigned long now;
+	unsigned long now, j;
 	int i, count, max;
 
 	tmpl->next = template_list;
@@ -76,9 +76,11 @@ do_xor_speed(struct xor_block_template *tmpl, void *b1, void *b2)
 	 */
 	max = 0;
 	for (i = 0; i < 5; i++) {
-		now = jiffies;
+		j = jiffies;
 		count = 0;
-		while (jiffies == now) {
+		while ((now = jiffies) == j)
+			cpu_relax();
+		while (time_before(jiffies, now + 1)) {
 			mb(); /* prevent loop optimzation */
 			tmpl->do_2(BENCH_SIZE, b1, b2);
 			mb();
--
1.7.8.5
* [PATCH 2/2] crypto: disable preemption while benchmarking RAID5 xor checksumming
From: Jim Kukunas @ 2012-04-03 23:56 UTC
To: linux-raid; +Cc: linux-crypto
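With CONFIG_PREEMPT=y, we need to disable preemption while benchmarking
RAID5 xor checksumming to ensure we're actually measuring what we think
we're measuring: if the benchmark is preempted inside a timed jiffy, the
time spent running other tasks is charged to the xor routine and lowers
its measured speed. The RAID6 benchmark already disables preemption.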
Signed-off-by: Jim Kukunas <james.t.kukunas@linux.intel.com>
---
crypto/xor.c | 5 +++++
1 files changed, 5 insertions(+), 0 deletions(-)
diff --git a/crypto/xor.c b/crypto/xor.c
index 8788443..84daa11 100644
--- a/crypto/xor.c
+++ b/crypto/xor.c
@@ -21,6 +21,7 @@
 #include <linux/gfp.h>
 #include <linux/raid/xor.h>
 #include <linux/jiffies.h>
+#include <linux/preempt.h>
 #include <asm/xor.h>
 
 /* The xor routines to use. */
@@ -69,6 +70,8 @@ do_xor_speed(struct xor_block_template *tmpl, void *b1, void *b2)
 	tmpl->next = template_list;
 	template_list = tmpl;
 
+	preempt_disable();
+
 	/*
 	 * Count the number of XORs done during a whole jiffy, and use
 	 * this to calculate the speed of checksumming. We use a 2-page
@@ -91,6 +94,8 @@ do_xor_speed(struct xor_block_template *tmpl, void *b1, void *b2)
 			max = count;
 	}
 
+	preempt_enable();
+
 	speed = max * (HZ * BENCH_SIZE / 1024);
 	tmpl->speed = speed;
 
--
1.7.8.5
* Re: RAID5 XOR speed vs RAID6 Q speed (was Re: AVX RAID5 xor checksumming)
From: Dan Williams @ 2012-04-06 20:43 UTC
To: Jim Kukunas; +Cc: linux-raid, linux-crypto, bharrosh
[adding Boaz since he also made an attempt at fixing this]
http://marc.info/?l=linux-crypto-vger&m=131829241111450&w=2
...I had meant to follow up on this, but was buried in 'isci' issues.
On Tue, Apr 3, 2012 at 4:56 PM, Jim Kukunas
<james.t.kukunas@linux.intel.com> wrote:
> On Tue, Apr 03, 2012 at 11:23:16AM +0100, John Robinson wrote:
>> On 02/04/2012 23:48, Jim Kukunas wrote:
>> > On Sat, Mar 31, 2012 at 12:38:56PM +0100, John Robinson wrote:
>> [...]
>> >> I just noticed in my logs the other day (recent el5 kernel on a Core 2):
>> >>
>> >> raid5: automatically using best checksumming function: generic_sse
>> >> generic_sse: 7805.000 MB/sec
>> >> raid5: using function: generic_sse (7805.000 MB/sec)
>> [...]
>> >> raid6: using algorithm sse2x4 (8237 MB/s)
>> >>
>> >> I was just wondering how it's possible to do the RAID6 Q calculation
>> >> faster than the RAID5 XOR calculation - or am I reading this log excerpt
>> >> wrongly?
>> >
>> > Out of curiosity, are you running with CONFIG_PREEMPT=y?
>>
>> No. Here's an excerpt from my .config:
>>
>> # CONFIG_PREEMPT_NONE is not set
>> CONFIG_PREEMPT_VOLUNTARY=y
>> # CONFIG_PREEMPT is not set
>> CONFIG_PREEMPT_BKL=y
>> CONFIG_PREEMPT_NOTIFIERS=y
>>
>> But this is a Xen dom0 kernel, 2.6.18-308.1.1.el5.centos.plusxen. Now, a
>> non-Xen kernel (2.6.18-308.1.1.el5) says:
>> raid5: automatically using best checksumming function: generic_sse
>> generic_sse: 11892.000 MB/sec
>> raid5: using function: generic_sse (11892.000 MB/sec)
>> raid6: int64x1 2644 MB/s
>> raid6: int64x2 3238 MB/s
>> raid6: int64x4 3011 MB/s
>> raid6: int64x8 2503 MB/s
>> raid6: sse2x1 5375 MB/s
>> raid6: sse2x2 5851 MB/s
>> raid6: sse2x4 9136 MB/s
>> raid6: using algorithm sse2x4 (9136 MB/s)
>>
>> Looks like it loses a chunk of performance running as a Xen dom0.
>>
>> Even still, 11892 MB/s for XOR vs 9136 MB/s for XOR+Q - it still seems
>> remarkable that the XOR can't be done several times faster than the Q.
>
> Taking a look at do_xor_speed, I see two issues which might be the cause
> of the disparity you reported.
>
> 0) In the RAID5 xor benchmark, we get the current jiffy, then run do_2() until
> the jiffy increments. This means we could potentially be testing for less
> than a full jiffy. The RAID6 benchmark handles this by obtaining the current
> jiffy, then calling cpu_relax() until the jiffy increments, and then running
> the test. This is addressed by my first patch.
>
> 1) The only way I could reproduce your findings of a higher throughput for
> RAID6 than for RAID5 xor checksumming was with CONFIG_PREEMPT=y. It seems
> that you encountered this while running as a Xen dom0. Currently, we disable
> preemption during the RAID6 benchmark, but don't in the RAID5 benchmark.
> This is addressed by my second patch.
>
> I've added linux-crypto to the discussion as both of these patches affect
> code in crypto/
>
> Thanks.
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: RAID5 XOR speed vs RAID6 Q speed (was Re: AVX RAID5 xor checksumming)
From: Boaz Harrosh @ 2012-04-17 15:32 UTC
To: Dan Williams; +Cc: Jim Kukunas, linux-raid, linux-crypto
On 04/06/2012 11:43 PM, Dan Williams wrote:
> [adding Boaz since he also made an attempt at fixing this]
>
> http://marc.info/?l=linux-crypto-vger&m=131829241111450&w=2
>
> ...I had meant to follow up on this, but was buried in 'isci' issues.
>
>
Sorry, I was traveling.
Yes, I have an old fix for this, which I need to clean up and retest.
My original problem was a hang in UML, but I noticed the timing problems
as well.
Please give me until the end of the week to settle in and come up to speed.
[Current patch: http://marc.info/?l=linux-crypto-vger&m=131829242311458&w=2]
Thanks
Boaz