From: Andreas Schwab <schwab@linux-m68k.org>
To: Palmer Dabbelt <palmer@dabbelt.com>
Cc: akira.tsukamoto@gmail.com,
	 Paul Walmsley <paul.walmsley@sifive.com>,
	linux@roeck-us.net,  geert@linux-m68k.org,
	 qiuwenbo@kylinos.com.cn, aou@eecs.berkeley.edu,
	 linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/1] riscv: __asm_copy_to-from_user: Improve using word copy if size < 9*SZREG
Date: Mon, 16 Aug 2021 21:00:16 +0200
Message-ID: <87zgthjjun.fsf@igel.home>
In-Reply-To: <mhng-f83b1d51-c006-4b01-830a-0f827f0c56a1@palmerdabbelt-glaptop> (Palmer Dabbelt's message of "Mon, 16 Aug 2021 11:09:45 -0700 (PDT)")

On Aug 16 2021, Palmer Dabbelt wrote:

> On Fri, 30 Jul 2021 06:52:44 PDT (-0700), akira.tsukamoto@gmail.com wrote:
>> Reduce the number of slow byte_copy iterations when the size is between
>> 2*SZREG and 9*SZREG by using the non-unrolled word_copy.
>>
>> Without it, any size smaller than 9*SZREG uses the slow byte_copy
>> instead of the non-unrolled word_copy.
>>
>> Signed-off-by: Akira Tsukamoto <akira.tsukamoto@gmail.com>
>> ---
>>  arch/riscv/lib/uaccess.S | 46 ++++++++++++++++++++++++++++++++++++----
>>  1 file changed, 42 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/riscv/lib/uaccess.S b/arch/riscv/lib/uaccess.S
>> index 63bc691cff91..6a80d5517afc 100644
>> --- a/arch/riscv/lib/uaccess.S
>> +++ b/arch/riscv/lib/uaccess.S
>> @@ -34,8 +34,10 @@ ENTRY(__asm_copy_from_user)
>>  	/*
>>  	 * Use byte copy only if too small.
>>  	 * SZREG holds 4 for RV32 and 8 for RV64
>> +	 * a3 - 2*SZREG is minimum size for word_copy
>> +	 *      1*SZREG for aligning dst + 1*SZREG for word_copy
>>  	 */
>> -	li	a3, 9*SZREG /* size must be larger than size in word_copy */
>> +	li	a3, 2*SZREG
>>  	bltu	a2, a3, .Lbyte_copy_tail
>>
>>  	/*
>> @@ -66,9 +68,40 @@ ENTRY(__asm_copy_from_user)
>>  	andi	a3, a1, SZREG-1
>>  	bnez	a3, .Lshift_copy
>>
>> +.Lcheck_size_bulk:
>> +	/*
>> +	 * Evaluate the size if possible to use unrolled.
>> +	 * The word_copy_unlrolled requires larger than 8*SZREG
>> +	 */
>> +	li	a3, 8*SZREG
>> +	add	a4, a0, a3
>> +	bltu	a4, t0, .Lword_copy_unlrolled
>> +
>>  .Lword_copy:
>> -        /*
>> -	 * Both src and dst are aligned, unrolled word copy
>> +	/*
>> +	 * Both src and dst are aligned
>> +	 * None unrolled word copy with every 1*SZREG iteration
>> +	 *
>> +	 * a0 - start of aligned dst
>> +	 * a1 - start of aligned src
>> +	 * t0 - end of aligned dst
>> +	 */
>> +	bgeu	a0, t0, .Lbyte_copy_tail /* check if end of copy */
>> +	addi	t0, t0, -(SZREG) /* not to over run */
>> +1:
>> +	REG_L	a5, 0(a1)
>> +	addi	a1, a1, SZREG
>> +	REG_S	a5, 0(a0)
>> +	addi	a0, a0, SZREG
>> +	bltu	a0, t0, 1b
>> +
>> +	addi	t0, t0, SZREG /* revert to original value */
>> +	j	.Lbyte_copy_tail
>> +
>> +.Lword_copy_unlrolled:
>> +	/*
>> +	 * Both src and dst are aligned
>> +	 * Unrolled word copy with every 8*SZREG iteration
>>  	 *
>>  	 * a0 - start of aligned dst
>>  	 * a1 - start of aligned src
>> @@ -97,7 +130,12 @@ ENTRY(__asm_copy_from_user)
>>  	bltu	a0, t0, 2b
>>
>>  	addi	t0, t0, 8*SZREG /* revert to original value */
>> -	j	.Lbyte_copy_tail
>> +
>> +	/*
>> +	 * Remaining might large enough for word_copy to reduce slow byte
>> +	 * copy
>> +	 */
>> +	j	.Lcheck_size_bulk
>>
>>  .Lshift_copy:
>
> I'm still not convinced that going all the way to such a large unrolling
> factor is a net win, but this at least provides a much smoother cost 
> curve.
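
For reference, the entry thresholds in the quoted hunks can be modeled in C.
This is only a sketch: it assumes dst and src are already SZREG-aligned on
entry, and pick_before, pick_after and the enum are hypothetical names that
do not exist in uaccess.S.

```c
#include <stddef.h>

enum copy_path { BYTE_COPY, WORD_COPY, WORD_COPY_UNROLLED };

/* Before the patch: "li a3, 9*SZREG; bltu a2, a3, .Lbyte_copy_tail". */
static enum copy_path pick_before(size_t size, size_t szreg)
{
	return size < 9 * szreg ? BYTE_COPY : WORD_COPY_UNROLLED;
}

/* After the patch: 2*SZREG entry check, then the .Lcheck_size_bulk test. */
static enum copy_path pick_after(size_t size, size_t szreg)
{
	size_t aligned_len;

	if (size < 2 * szreg)
		return BYTE_COPY;

	/* Models t0 - a0 when dst starts aligned: size rounded down to SZREG. */
	aligned_len = size - size % szreg;

	/* "add a4, a0, a3; bltu a4, t0" means 8*SZREG < aligned length. */
	return 8 * szreg < aligned_len ? WORD_COPY_UNROLLED : WORD_COPY;
}
```

With SZREG = 8, a 64-byte aligned copy moves from the byte loop to the word
loop, and only aligned lengths above 8*SZREG reach the unrolled loop.
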
>
> That said, this is causing my 32-bit configs to hang.

It's missing fixups for the load and store in the new loop.
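
As background, the fixup macro in uaccess.S records the address of the
wrapped instruction together with a recovery label (10f here) in the
exception table. When such an access faults, the trap handler looks up the
faulting PC there and resumes at the recovery label; an access without an
entry gives it nowhere to resume. The C model below is a sketch of that
lookup only: extable_entry and find_fixup are hypothetical names, and the
linear scan is for brevity, not a copy of the kernel's implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one exception-table entry. */
struct extable_entry {
	uintptr_t insn;   /* address of an instruction that may fault */
	uintptr_t fixup;  /* address to resume at if it does */
};

/* Return the recovery address for fault_pc, or 0 if none is registered. */
static uintptr_t find_fixup(const struct extable_entry *tbl, size_t n,
			    uintptr_t fault_pc)
{
	for (size_t i = 0; i < n; i++)
		if (tbl[i].insn == fault_pc)
			return tbl[i].fixup;
	return 0;
}
```

In this model, a bare REG_L or REG_S is an instruction whose address never
appears in the table, so find_fixup returns 0 for it.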

diff --git a/arch/riscv/lib/uaccess.S b/arch/riscv/lib/uaccess.S
index a835df6bd68f..12ed1f76bd1f 100644
--- a/arch/riscv/lib/uaccess.S
+++ b/arch/riscv/lib/uaccess.S
@@ -89,9 +89,9 @@ ENTRY(__asm_copy_from_user)
 	bgeu	a0, t0, .Lbyte_copy_tail /* check if end of copy */
 	addi	t0, t0, -(SZREG) /* not to over run */
 1:
-	REG_L	a5, 0(a1)
+	fixup REG_L	a5, 0(a1), 10f
 	addi	a1, a1, SZREG
-	REG_S	a5, 0(a0)
+	fixup REG_S	a5, 0(a0), 10f
 	addi	a0, a0, SZREG
 	bltu	a0, t0, 1b
 

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."
