From: Catalin Marinas <catalin.marinas@arm.com>
To: Will Deacon <will@kernel.org>
Cc: Robin Murphy <robin.murphy@arm.com>,
Yang Yingliang <yangyingliang@huawei.com>,
linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, guohanjun@huawei.com
Subject: Re: [PATCH 2/3] arm64: lib: improve copy performance when size is ge 128 bytes
Date: Tue, 23 Mar 2021 15:03:59 +0000 [thread overview]
Message-ID: <20210323150358.GA10576@arm.com> (raw)
In-Reply-To: <20210323133217.GA11802@willie-the-truck>
On Tue, Mar 23, 2021 at 01:32:18PM +0000, Will Deacon wrote:
> On Tue, Mar 23, 2021 at 12:08:56PM +0000, Robin Murphy wrote:
> > On 2021-03-23 07:34, Yang Yingliang wrote:
> > > When copy over 128 bytes, src/dst is added after
> > > each ldp/stp instruction, it will cost more time.
> > > To improve this, we only add src/dst after load
> > > or store 64 bytes.
> >
> > This breaks the required behaviour for copy_*_user(), since the fault
> > handler expects the base address to be up-to-date at all times. Say you're
> > copying 128 bytes and fault on the 4th store, it should return 80 bytes not
> > copied; the code below would return 128 bytes not copied, even though 48
> > bytes have actually been written to the destination.
> >
> > We've had a couple of tries at updating this code (because the whole
> > template is frankly a bit terrible, and a long way from the well-optimised
> > code it was derived from), but getting the fault-handling behaviour right
> > without making the handler itself ludicrously complex has proven tricky. And
> > then it got bumped down the priority list while the uaccess behaviour in
> > general was in flux - now that the dust has largely settled on that I should
> > probably try to find time to pick this up again...
>
> I think the v5 from Oli was pretty close, but it didn't get any review:
>
> https://lore.kernel.org/r/20200914151800.2270-1-oli.swede@arm.com
These are still unread in my inbox as I was planning to look at them
again. However, I think we discussed a few options on how to proceed but
no concrete plans:
1. Merge Oli's patches as they are, with some potential complexity
issues as fixing the user copy accuracy was non-trivial. I think the
latest version uses a two-stage approach: when taking a fault, it
falls back to to byte-by-byte with the expectation that it faults
again and we can then report the correct fault address.
2. Only use Cortex Strings for in-kernel memcpy() while the uaccess
routines are some simple loops that align the uaccess part only
(unlike Cortex Strings which usually to align the source).
3. Similar to 2 but with Cortex Strings imported automatically with some
script to make it easier to keep the routines up to date.
If having non-optimal (but good enough) uaccess routines is acceptable,
I'd go for (2) with a plan to move to (3) at the next Cortex Strings
update.
I also need to look again at option (1) to see how complex it is but
given the time one spends on importing a new Cortex Strings library, I
don't think (1) scales well on the long term. We could, however, go for
(1) now and look at (3) with the next update.
--
Catalin
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
WARNING: multiple messages have this Message-ID (diff)
From: Catalin Marinas <catalin.marinas@arm.com>
To: Will Deacon <will@kernel.org>
Cc: Robin Murphy <robin.murphy@arm.com>,
Yang Yingliang <yangyingliang@huawei.com>,
linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org, guohanjun@huawei.com
Subject: Re: [PATCH 2/3] arm64: lib: improve copy performance when size is ge 128 bytes
Date: Tue, 23 Mar 2021 15:03:59 +0000 [thread overview]
Message-ID: <20210323150358.GA10576@arm.com> (raw)
In-Reply-To: <20210323133217.GA11802@willie-the-truck>
On Tue, Mar 23, 2021 at 01:32:18PM +0000, Will Deacon wrote:
> On Tue, Mar 23, 2021 at 12:08:56PM +0000, Robin Murphy wrote:
> > On 2021-03-23 07:34, Yang Yingliang wrote:
> > > When copy over 128 bytes, src/dst is added after
> > > each ldp/stp instruction, it will cost more time.
> > > To improve this, we only add src/dst after load
> > > or store 64 bytes.
> >
> > This breaks the required behaviour for copy_*_user(), since the fault
> > handler expects the base address to be up-to-date at all times. Say you're
> > copying 128 bytes and fault on the 4th store, it should return 80 bytes not
> > copied; the code below would return 128 bytes not copied, even though 48
> > bytes have actually been written to the destination.
> >
> > We've had a couple of tries at updating this code (because the whole
> > template is frankly a bit terrible, and a long way from the well-optimised
> > code it was derived from), but getting the fault-handling behaviour right
> > without making the handler itself ludicrously complex has proven tricky. And
> > then it got bumped down the priority list while the uaccess behaviour in
> > general was in flux - now that the dust has largely settled on that I should
> > probably try to find time to pick this up again...
>
> I think the v5 from Oli was pretty close, but it didn't get any review:
>
> https://lore.kernel.org/r/20200914151800.2270-1-oli.swede@arm.com
These are still unread in my inbox as I was planning to look at them
again. However, I think we discussed a few options on how to proceed but
no concrete plans:
1. Merge Oli's patches as they are, with some potential complexity
issues as fixing the user copy accuracy was non-trivial. I think the
latest version uses a two-stage approach: when taking a fault, it
falls back to to byte-by-byte with the expectation that it faults
again and we can then report the correct fault address.
2. Only use Cortex Strings for in-kernel memcpy() while the uaccess
routines are some simple loops that align the uaccess part only
(unlike Cortex Strings which usually to align the source).
3. Similar to 2 but with Cortex Strings imported automatically with some
script to make it easier to keep the routines up to date.
If having non-optimal (but good enough) uaccess routines is acceptable,
I'd go for (2) with a plan to move to (3) at the next Cortex Strings
update.
I also need to look again at option (1) to see how complex it is but
given the time one spends on importing a new Cortex Strings library, I
don't think (1) scales well on the long term. We could, however, go for
(1) now and look at (3) with the next update.
--
Catalin
next prev parent reply other threads:[~2021-03-23 15:05 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-03-23 7:34 [PATCH 0/3] arm64: lib: improve copy performance Yang Yingliang
2021-03-23 7:34 ` Yang Yingliang
2021-03-23 7:34 ` [PATCH 1/3] arm64: lib: introduce ldp2/stp2 macro Yang Yingliang
2021-03-23 7:34 ` Yang Yingliang
2021-03-23 7:34 ` [PATCH 2/3] arm64: lib: improve copy performance when size is ge 128 bytes Yang Yingliang
2021-03-23 7:34 ` Yang Yingliang
2021-03-23 12:08 ` Robin Murphy
2021-03-23 12:08 ` Robin Murphy
2021-03-23 13:32 ` Will Deacon
2021-03-23 13:32 ` Will Deacon
2021-03-23 14:28 ` Robin Murphy
2021-03-23 14:28 ` Robin Murphy
2021-03-23 15:03 ` Catalin Marinas [this message]
2021-03-23 15:03 ` Catalin Marinas
2021-03-24 16:38 ` David Laight
2021-03-24 16:38 ` David Laight
2021-03-24 19:36 ` Robin Murphy
2021-03-24 19:36 ` Robin Murphy
2021-03-23 7:34 ` [PATCH 3/3] arm64: lib: improve copy performance when size is less than 128 and ge 64 bytes Yang Yingliang
2021-03-23 7:34 ` Yang Yingliang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20210323150358.GA10576@arm.com \
--to=catalin.marinas@arm.com \
--cc=guohanjun@huawei.com \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=robin.murphy@arm.com \
--cc=will@kernel.org \
--cc=yangyingliang@huawei.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.