From: Andreas Schwab <schwab@linux-m68k.org>
To: Palmer Dabbelt <palmer@dabbelt.com>
Cc: akira.tsukamoto@gmail.com,
Paul Walmsley <paul.walmsley@sifive.com>,
linux@roeck-us.net, geert@linux-m68k.org,
qiuwenbo@kylinos.com.cn, aou@eecs.berkeley.edu,
linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/1] riscv: __asm_copy_to-from_user: Improve using word copy if size < 9*SZREG
Date: Mon, 16 Aug 2021 21:00:16 +0200
Message-ID: <87zgthjjun.fsf@igel.home>
In-Reply-To: <mhng-f83b1d51-c006-4b01-830a-0f827f0c56a1@palmerdabbelt-glaptop> (Palmer Dabbelt's message of "Mon, 16 Aug 2021 11:09:45 -0700 (PDT)")
On Aug 16 2021, Palmer Dabbelt wrote:
> On Fri, 30 Jul 2021 06:52:44 PDT (-0700), akira.tsukamoto@gmail.com wrote:
>> Reduce the number of slow byte_copy when the size is in between
>> 2*SZREG to 9*SZREG by using none unrolled word_copy.
>>
>> Without it any size smaller than 9*SZREG will be using slow byte_copy
>> instead of none unrolled word_copy.
>>
>> Signed-off-by: Akira Tsukamoto <akira.tsukamoto@gmail.com>
>> ---
>> arch/riscv/lib/uaccess.S | 46 ++++++++++++++++++++++++++++++++++++----
>> 1 file changed, 42 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/riscv/lib/uaccess.S b/arch/riscv/lib/uaccess.S
>> index 63bc691cff91..6a80d5517afc 100644
>> --- a/arch/riscv/lib/uaccess.S
>> +++ b/arch/riscv/lib/uaccess.S
>> @@ -34,8 +34,10 @@ ENTRY(__asm_copy_from_user)
>> /*
>> * Use byte copy only if too small.
>> * SZREG holds 4 for RV32 and 8 for RV64
>> + * a3 - 2*SZREG is minimum size for word_copy
>> + * 1*SZREG for aligning dst + 1*SZREG for word_copy
>> */
>> - li a3, 9*SZREG /* size must be larger than size in word_copy */
>> + li a3, 2*SZREG
>> bltu a2, a3, .Lbyte_copy_tail
>>
>> /*
>> @@ -66,9 +68,40 @@ ENTRY(__asm_copy_from_user)
>> andi a3, a1, SZREG-1
>> bnez a3, .Lshift_copy
>>
>> +.Lcheck_size_bulk:
>> + /*
>> + * Evaluate the size if possible to use unrolled.
>> + * The word_copy_unlrolled requires larger than 8*SZREG
>> + */
>> + li a3, 8*SZREG
>> + add a4, a0, a3
>> + bltu a4, t0, .Lword_copy_unlrolled
>> +
>> .Lword_copy:
>> - /*
>> - * Both src and dst are aligned, unrolled word copy
>> + /*
>> + * Both src and dst are aligned
>> + * None unrolled word copy with every 1*SZREG iteration
>> + *
>> + * a0 - start of aligned dst
>> + * a1 - start of aligned src
>> + * t0 - end of aligned dst
>> + */
>> + bgeu a0, t0, .Lbyte_copy_tail /* check if end of copy */
>> + addi t0, t0, -(SZREG) /* not to over run */
>> +1:
>> + REG_L a5, 0(a1)
>> + addi a1, a1, SZREG
>> + REG_S a5, 0(a0)
>> + addi a0, a0, SZREG
>> + bltu a0, t0, 1b
>> +
>> + addi t0, t0, SZREG /* revert to original value */
>> + j .Lbyte_copy_tail
>> +
>> +.Lword_copy_unlrolled:
>> + /*
>> + * Both src and dst are aligned
>> + * Unrolled word copy with every 8*SZREG iteration
>> *
>> * a0 - start of aligned dst
>> * a1 - start of aligned src
>> @@ -97,7 +130,12 @@ ENTRY(__asm_copy_from_user)
>> bltu a0, t0, 2b
>>
>> addi t0, t0, 8*SZREG /* revert to original value */
>> - j .Lbyte_copy_tail
>> +
>> + /*
>> + * Remaining might large enough for word_copy to reduce slow byte
>> + * copy
>> + */
>> + j .Lcheck_size_bulk
>>
>> .Lshift_copy:
>
> I'm still not convinced that going all the way to such a large unrolling
> factor is a net win, but this at least provides a much smoother cost
> curve.
>
> That said, this is causing my 32-bit configs to hang.
It's missing fixups for the loads in the loop.
diff --git a/arch/riscv/lib/uaccess.S b/arch/riscv/lib/uaccess.S
index a835df6bd68f..12ed1f76bd1f 100644
--- a/arch/riscv/lib/uaccess.S
+++ b/arch/riscv/lib/uaccess.S
@@ -89,9 +89,9 @@ ENTRY(__asm_copy_from_user)
bgeu a0, t0, .Lbyte_copy_tail /* check if end of copy */
addi t0, t0, -(SZREG) /* not to over run */
1:
- REG_L a5, 0(a1)
+ fixup REG_L a5, 0(a1), 10f
addi a1, a1, SZREG
- REG_S a5, 0(a0)
+ fixup REG_S a5, 0(a0), 10f
addi a0, a0, SZREG
bltu a0, t0, 1b
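
For readers unfamiliar with the mechanism: wrapping an access in the
fixup macro registers that instruction in the exception table together
with a recovery label (10f here), so a fault on a user address has
somewhere to resume. The following is only a toy model in C, not kernel
code. The names ex_entry, find_fixup, toy_copy and the PC_* constants
are hypothetical, and the "fault" is simulated by a readable-length
limit rather than by a real page fault.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one exception table entry: the location of
 * an access that may fault, and where to resume if it does. */
struct ex_entry {
	int insn;	/* toy "address" of the guarded access */
	int fixup;	/* toy "address" of the recovery code */
};

/* Toy addresses, chosen arbitrarily for this sketch. */
enum { PC_WORD_LOAD = 1, PC_WORD_STORE = 2, PC_FIXUP = 10 };

/* Return the fixup address registered for pc, or -1 if there is none. */
static int find_fixup(const struct ex_entry *tbl, size_t n, int pc)
{
	for (size_t i = 0; i < n; i++)
		if (tbl[i].insn == pc)
			return tbl[i].fixup;
	return -1;
}

/*
 * Toy word copy loop. Only the first `readable` bytes of src can be
 * loaded; touching anything beyond that simulates a faulting load.
 * Returns the number of bytes not copied, or -1 when the fault has no
 * registered fixup (the situation the diff above addresses).
 * Any tail shorter than one word is left uncopied and counted.
 */
static long toy_copy(unsigned char *dst, const unsigned char *src,
		     size_t len, size_t readable,
		     const struct ex_entry *tbl, size_t n)
{
	const size_t word = sizeof(long);
	size_t done = 0;

	assert(dst != NULL && src != NULL);

	while (done + word <= len) {
		if (done + word > readable) {	/* simulated load fault */
			if (find_fixup(tbl, n, PC_WORD_LOAD) < 0)
				return -1;	/* no recovery address */
			return (long)(len - done); /* fixup: report leftover */
		}
		memcpy(dst + done, src + done, word);
		done += word;
	}
	return (long)(len - done);
}
```

With both accesses registered, a partial fault returns the leftover
count; with an empty table the same fault cannot be recovered, which the
sketch reports as -1.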
Andreas.
--
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1
"And now for something completely different."