* Discontiguous memory and cacheflush
[not found] <447177957.1189939.1344577602635.JavaMail.root@mozilla.com>
@ 2012-08-10 5:52 ` Martin Rosenberg
2012-08-13 14:42 ` Jonathan Austin
0 siblings, 1 reply; 9+ messages in thread
From: Martin Rosenberg @ 2012-08-10 5:52 UTC (permalink / raw)
To: linux-arm-kernel
A couple of days ago, I started a project to coalesce calls to
cacheflush in Mozilla's JIT. After working out most of the bugs that
I could find, there were still some lingering failures, which I
believe I finally tracked down to do_cache_op, where it looks like only
the first contiguous region of virtual memory is consulted. I don't
actually know about any of the functions that are being called, nor
exactly what the data structures represent, but if my understanding of
the code is correct, then it is at odds with the documentation that
is available (namely, man cacheflush(2)). Is there something I've
overlooked, or are there any suggested workarounds?
Thanks --Marty
* Discontiguous memory and cacheflush
2012-08-10 5:52 ` Discontiguous memory and cacheflush Martin Rosenberg
@ 2012-08-13 14:42 ` Jonathan Austin
2012-08-13 16:00 ` Russell King - ARM Linux
0 siblings, 1 reply; 9+ messages in thread
From: Jonathan Austin @ 2012-08-13 14:42 UTC (permalink / raw)
To: linux-arm-kernel
Hi Martin,
On 10/08/12 06:52, Martin Rosenberg wrote:
> A couple of days ago, I started a project to coalesce calls to
> cacheflush in Mozilla's JIT. After working out most of the bugs that
> I could find, there were still some lingering failures, which I
> believe I finally tracked down to do_cache_op, where it looks like only
> the first contiguous region of virtual memory is consulted. I don't
> actually know about any of the functions that are being called, nor
> exactly what the datastructures represent, but if my understanding of
> the code is correct, then it is at odds with the documentation that
> is available (namely, man cacheflush(2)). Is there something I've
> overlooked, any suggested workarounds? Thanks --Marty
>
On my system the cacheflush(2) documentation suggests it is MIPS only:
"This Linux-specific system call is only available on MIPS based
systems. It should not be used in programs intended to be portable."
As you've established by looking at the code, on ARM we intentionally
attempt to flush only the part of the given range that occurs inside the
vma containing 'start'.
When flushing some huge range that spans multiple vmas, there might be
junk in the middle that it doesn't make sense to flush (eg devices), so
a fix that guarantees to flush an entire range isn't trivial and there
need to be some decisions about how to handle nonsensical requests. If
it is *really* necessary to flush large ranges like this it would be
good to understand why.
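To make the truncation concrete, here is a small userspace model of the behaviour as described in this message. It is only a sketch: `struct mapping` and `clip_to_first_mapping()` are hypothetical names invented for illustration, and the lookup is deliberately simplified rather than copied from the kernel source.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a vma: a half-open address range [start, end). */
struct mapping {
    unsigned long start;
    unsigned long end;
};

/*
 * Model of the truncation: find the mapping that contains 'start' and clip
 * the requested range to it. Returns 0 and fills *out on success, or -1 if
 * no mapping contains 'start'.
 */
int clip_to_first_mapping(const struct mapping *maps, size_t n,
                          unsigned long start, unsigned long end,
                          struct mapping *out)
{
    assert(start <= end);
    for (size_t i = 0; i < n; i++) {
        if (start >= maps[i].start && start < maps[i].end) {
            out->start = start;
            /* Anything past the end of this mapping is silently dropped. */
            out->end = end < maps[i].end ? end : maps[i].end;
            return 0;
        }
    }
    return -1;
}
```

With two mappings [0x1000, 0x3000) and [0x8000, 0x9000), a request for 0x2000..0x8800 is clipped to 0x2000..0x3000 in this model, so the part in the second mapping is never flushed.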
That said, now that it is possible to return errors from the cache
flushing syscalls[1] we should probably at least consider returning
something to report that the range had been truncated. There'd need to
be some thought given to how to represent errors where the range is
truncated *and* the flush_cache_user_range() function returns an error,
as well as a justification of why reporting truncation is
useful/necessary: For example the gcc builtin function:
void __builtin___clear_cache (char *begin, char *end)
has no return value to propagate error data[2], so it is unlikely that
much of userspace could take advantage of the return codes. There's some
historical discussion of this issue at [3]
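For userspace that does want the return code, a wrapper along the following lines might be one way to get at it. This is a sketch under assumptions: `flush_range_checked()` is a made-up name, the direct call is only compiled where the ARM headers define `__ARM_NR_cacheflush`, and everywhere else it falls back to the builtin, which cannot report anything.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <unistd.h>
#include <sys/syscall.h>

/*
 * Flush [begin, end) and report failure where the platform allows it.
 * Returns 0 on success, or -1 with errno set if the ARM syscall fails.
 */
int flush_range_checked(char *begin, char *end)
{
    assert(begin <= end);
#if defined(__arm__) && defined(__ARM_NR_cacheflush)
    /* ARM Linux private syscall; the third argument (flags) must be 0. */
    return (int)syscall(__ARM_NR_cacheflush, (long)begin, (long)end, 0L);
#else
    /* Assumption: on other targets, use the builtin, which returns nothing. */
    __builtin___clear_cache(begin, end);
    return 0;
#endif
}
```

Note that a return value of 0 here only means the flush itself did not fail; going by the discussion above, it says nothing about whether the range was truncated.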
Hope that helps,
Jonny
[1] http://lists.infradead.org/pipermail/linux-arm-kernel/2012-April/096299.html
(which was committed as c5102f593550 - ARM: 7408/1: cacheflush: return
error to userspace when flushing syscall fails)
[2] http://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html
[3] http://lists.infradead.org/pipermail/linux-arm-kernel/2012-April/094869.html
* Discontiguous memory and cacheflush
2012-08-13 14:42 ` Jonathan Austin
@ 2012-08-13 16:00 ` Russell King - ARM Linux
2012-08-15 22:08 ` Martin Rosenberg
0 siblings, 1 reply; 9+ messages in thread
From: Russell King - ARM Linux @ 2012-08-13 16:00 UTC (permalink / raw)
To: linux-arm-kernel
On Mon, Aug 13, 2012 at 03:42:52PM +0100, Jonathan Austin wrote:
> On my system the cacheflush(2) documentation suggests it is MIPS only:
> "This Linux-specific system call is only available on MIPS based
> systems. It should not be used in programs intended to be portable."
It's been part of the ARM kernel API for a very long time.
> As you've established by looking at the code, on ARM we intentionally
> attempt to flush only the part of the given range that occurs inside the
> vma containing 'start'.
Its intention is to support self-modifying userspace code only, and it is
also intended to be used over a _short_ range of addresses only - it works
by flushing each _individual_ cache line over the range of addresses
requested.
If it's going to be used for significantly larger areas, then we need to
think about imposing a limit, upon which we just flush the entire cache
and be done with it.
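As a rough illustration of such a limit, the sketch below counts the cache lines a range touches and picks a strategy. The line size, the cut-over value and all of the names are assumptions chosen for the example; none of them come from the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values for illustration only; real line sizes and limits vary. */
#define CACHE_LINE_SIZE 64u
#define FLUSH_ALL_LINES 4096u /* hypothetical cut-over point */

enum flush_strategy { FLUSH_BY_LINE, FLUSH_WHOLE_CACHE };

/* Number of cache lines touched by the half-open range [start, end). */
size_t lines_in_range(uintptr_t start, uintptr_t end)
{
    assert(start <= end);
    if (start == end)
        return 0;
    uintptr_t first = start / CACHE_LINE_SIZE;
    uintptr_t last = (end - 1) / CACHE_LINE_SIZE;
    return (size_t)(last - first + 1);
}

/* Above the limit, flushing everything is assumed to be cheaper. */
enum flush_strategy choose_strategy(uintptr_t start, uintptr_t end)
{
    return lines_in_range(start, end) > FLUSH_ALL_LINES
               ? FLUSH_WHOLE_CACHE
               : FLUSH_BY_LINE;
}
```

Where the cut-over really belongs would have to be measured per CPU; the constant above is only a placeholder.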
* Discontiguous memory and cacheflush
2012-08-13 16:00 ` Russell King - ARM Linux
@ 2012-08-15 22:08 ` Martin Rosenberg
2012-08-16 8:27 ` Dave Martin
2012-08-17 15:53 ` Jonathan Austin
0 siblings, 2 replies; 9+ messages in thread
From: Martin Rosenberg @ 2012-08-15 22:08 UTC (permalink / raw)
To: linux-arm-kernel
----- Original Message -----
From: "Russell King - ARM Linux" <linux@arm.linux.org.uk>
To: "Jonathan Austin" <jonathan.austin@arm.com>
Cc: "Martin Rosenberg" <mrosenberg@mozilla.com>, "Will Deacon" <will.deacon@arm.com>, linux-arm-kernel at lists.infradead.org
Sent: Monday, August 13, 2012 9:00:08 AM
Subject: Re: Discontiguous memory and cacheflush
> Its intention is to support self-modifying userspace code only and also
> intended to be used over a _short_ range of addresses only - it works by
> flushing each _individual_ cache line over the range of addresses
> requested.
Documenting the current behavior would mostly be acceptable, but it is rather confusing (and took quite some time to track down).
> If it's going to be used for significantly larger areas, then we need to
> think about imposing a limit, upon which we just flush the entire cache
> and be done with it.
I found that there was still a net win making fewer syscalls, even with the overhead of flushing extra cache lines.
In the cases where the range is discontiguous, I usually need to do something silly, like flush 20 individual instructions that are scattered throughout several hundred MB of memory. I think the fastest method for flushing in this case would be to have a different call, with a different API, where userspace can provide an array of addresses that need to be flushed, but that sounds like material for a new thread.
Thanks --Marty
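Until such a call exists, one possible userspace approach is to coalesce only patch sites that are close together and issue one flush per cluster, so that no single request spans a large gap. The sketch below is hypothetical (it is not the JIT's actual code); `coalesce_sites()`, `struct flush_range` and the `max_gap` parameter are names invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct flush_range {
    uintptr_t start; /* inclusive */
    uintptr_t end;   /* exclusive */
};

/*
 * Merge sorted patch sites (each 'len' bytes long) into ranges, joining
 * neighbours whose gap is at most 'max_gap' bytes. 'out' must have room
 * for n entries. Returns the number of ranges written.
 */
size_t coalesce_sites(const uintptr_t *addrs, size_t n, size_t len,
                      size_t max_gap, struct flush_range *out)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        assert(i == 0 || addrs[i] >= addrs[i - 1]); /* input must be sorted */
        uintptr_t site_end = addrs[i] + len;
        if (count > 0 && addrs[i] <= out[count - 1].end + max_gap) {
            /* Close enough to the previous cluster: extend it. */
            if (site_end > out[count - 1].end)
                out[count - 1].end = site_end;
        } else {
            /* Too far away: start a new cluster. */
            out[count].start = addrs[i];
            out[count].end = site_end;
            count++;
        }
    }
    return count;
}
```

Each resulting range can then be flushed with its own call. A small `max_gap` keeps clusters tight, at the cost of more syscalls; the right value is something to measure rather than assume.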
* Discontiguous memory and cacheflush
2012-08-15 22:08 ` Martin Rosenberg
@ 2012-08-16 8:27 ` Dave Martin
2012-08-16 9:19 ` Russell King - ARM Linux
2012-08-17 15:53 ` Jonathan Austin
1 sibling, 1 reply; 9+ messages in thread
From: Dave Martin @ 2012-08-16 8:27 UTC (permalink / raw)
To: linux-arm-kernel
On Wed, Aug 15, 2012 at 03:08:29PM -0700, Martin Rosenberg wrote:
>
>
> ----- Original Message -----
> From: "Russell King - ARM Linux" <linux@arm.linux.org.uk>
> To: "Jonathan Austin" <jonathan.austin@arm.com>
> Cc: "Martin Rosenberg" <mrosenberg@mozilla.com>, "Will Deacon" <will.deacon@arm.com>, linux-arm-kernel at lists.infradead.org
> Sent: Monday, August 13, 2012 9:00:08 AM
> Subject: Re: Discontiguous memory and cacheflush
>
>
> > Its intention is to support self-modifying userspace code only and also
> > intended to be used over a _short_ range of addresses only - it works by
> > flushing each _individual_ cache line over the range of addresses
> > requested.
> Documenting the current behavior would mostly be acceptable, but it is rather confusing (and took quite some time to track down)
>
> > If it's going to be used for significantly larger areas, then we need to
> > think about imposing a limit, upon which we just flush the entire cache
> > and be done with it.
>
> I found that there was still a net win making fewer syscalls, even with the overhead of flushing extra cache lines.
> In the cases where the range is discontiguous, I usually need to do something silly, like flush 20 individual instructions that are scattered throughout several hundred MB of memory. I think the fastest method for flushing in this case would be to have a different call, with a different API, where userspace can provide an array of addresses that need to be flushed, but that sounds like material for a new thread. Thanks --Marty
Could we use the currently must-be-zero flags parameter to add new
functionality to this syscall, or are there legacy uses of that
parameter (or history of userspace not bothering to zero it?)
For example, we could have something like
do_cache_op(struct iovec *iov, int iovcnt, CF_IOVEC)
to flush a discontiguous set of ranges described by an iovec (though we
can of course also describe the ranges in other ways)
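To show what the calling side of such an interface might look like, here is a userspace emulation that walks an iovec and flushes each entry with the gcc builtin. It is purely a sketch: `flush_iovec()` is a hypothetical name, no `CF_IOVEC` flag exists today, and a real implementation would make one syscall rather than one flush per entry.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h> /* struct iovec */

/*
 * Hypothetical emulation of an iovec-based flush: one
 * __builtin___clear_cache() call per entry. Returns the total number of
 * bytes described by the iovec.
 */
size_t flush_iovec(const struct iovec *iov, int iovcnt)
{
    size_t total = 0;
    assert(iovcnt >= 0);
    for (int i = 0; i < iovcnt; i++) {
        char *begin = iov[i].iov_base;
        char *end = begin + iov[i].iov_len;
        /* The builtin has no return value, so errors cannot be reported. */
        __builtin___clear_cache(begin, end);
        total += iov[i].iov_len;
    }
    return total;
}
```

A caller would fill one `struct iovec` per modified region (`iov_base` pointing at the first byte, `iov_len` giving its length) and pass the array in a single call.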
Cheers
---Dave
* Discontiguous memory and cacheflush
2012-08-16 8:27 ` Dave Martin
@ 2012-08-16 9:19 ` Russell King - ARM Linux
2012-08-16 9:43 ` Dave Martin
0 siblings, 1 reply; 9+ messages in thread
From: Russell King - ARM Linux @ 2012-08-16 9:19 UTC (permalink / raw)
To: linux-arm-kernel
On Thu, Aug 16, 2012 at 09:27:26AM +0100, Dave Martin wrote:
> Could we use the currently must-be-zero flags parameter to add new
> functionality to this syscall, or are there legacy uses of that
> parameter (or history of userspace not bothering to zero it?)
>
> For example, we could have something like
>
> do_cache_op(struct iovec *iov, int iovcnt, CF_IOVEC)
>
> to flush a discontiguous set of ranges described by an iovec (though we
> can of course also describe the ranges in other ways)
Why overload an existing syscall with multiple different argument types
rather than having a new syscall with a sane API?
* Discontiguous memory and cacheflush
2012-08-16 9:19 ` Russell King - ARM Linux
@ 2012-08-16 9:43 ` Dave Martin
0 siblings, 0 replies; 9+ messages in thread
From: Dave Martin @ 2012-08-16 9:43 UTC (permalink / raw)
To: linux-arm-kernel
On Thu, Aug 16, 2012 at 10:19:53AM +0100, Russell King - ARM Linux wrote:
> On Thu, Aug 16, 2012 at 09:27:26AM +0100, Dave Martin wrote:
> > Could we use the currently must-be-zero flags parameter to add new
> > functionality to this syscall, or are there legacy uses of that
> > parameter (or history of userspace not bothering to zero it?)
> >
> > For example, we could have something like
> >
> > do_cache_op(struct iovec *iov, int iovcnt, CF_IOVEC)
> >
> > to flush a discontiguous set of ranges described by an iovec (though we
> > can of course also describe the ranges in other ways)
>
> Why overload an existing syscall with multiple different argument types
> rather than having a new syscall with a sane API?
Sure, if it's OK to add a new system call, that would be a cleaner
approach.
Cheers
---Dave
* Discontiguous memory and cacheflush
2012-08-15 22:08 ` Martin Rosenberg
2012-08-16 8:27 ` Dave Martin
@ 2012-08-17 15:53 ` Jonathan Austin
2012-08-17 21:03 ` Russell King - ARM Linux
1 sibling, 1 reply; 9+ messages in thread
From: Jonathan Austin @ 2012-08-17 15:53 UTC (permalink / raw)
To: linux-arm-kernel
On 15/08/12 23:08, Martin Rosenberg wrote:
>> Its intention is to support self-modifying userspace code only and
>> also intended to be used over a _short_ range of addresses only -
>> it works by flushing each _individual_ cache line over the range of
>> addresses requested.
> Documenting the current behavior would mostly be acceptable, but it
> is rather confusing (and took quite some time to track down)
>
>> If it's going to be used for significantly larger areas, then we
>> need to think about imposing a limit, upon which we just flush the
>> entire cache and be done with it.
>
> I found that there was still a net win making fewer syscalls, even
> with the overhead of flushing extra cache lines. In the cases where
> the range is discontiguous,
Was this net win on platforms other than ARM? Was it still evident after
working around the truncation/multiple vma issue that occurs?
I'm curious as to whether some (even most?) of your performance
improvement could be coming from the range truncation. Perhaps the
performance is better because you're actually doing less flushing, and
not governed by the syscall overhead?
Consider an extreme version of the case you describe below: let's say
instruction 1/20 was in the first vma and 19/20 are in subsequent vmas.
In the un-coalesced case you are going to flush several pages covering
the addresses of all 20 instructions, but in your coalesced case I think
only one page that contains the first instruction[1] will get
flushed...(which, as you know, means you've got a bug!)
> I usually need to do something silly,
> like flush 20 individual instructions that are scattered throughout
> several hundred MB of memory. I think the fastest method for
> flushing in this case would be to shave a different call, with a
> different api, where userspace can provide an array of addresses that
> need to be flushed, but that sounds like material for a new thread.
> Thanks --Marty
A better understanding of the details of your case would be a good
starting point for a discussion. For example it might be nice to know
how significant the syscall overhead is compared to the time to flush
ranges you're interested in...
Also, as Russell says, there'll be a point at which flushing the whole
cache is better than flushing a load of odd chunks here and there...
Jonny
[1] Had you noticed that our implementation flushes a minimum of 1-page?
(cacheflush.h:257)
#define flush_cache_user_range(start,end) \
__cpuc_coherent_user_range((start) & PAGE_MASK, PAGE_ALIGN(end))
Does that have a bearing on what you're doing?
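For reference, the rounding done by that macro can be reproduced in userspace as below. This is a sketch assuming 4 KiB pages; the `EX_` constants and helper names are made up for the example and are not the kernel's definitions.

```c
#include <assert.h>

/* Assumption: 4 KiB pages; the kernel's PAGE_SIZE depends on configuration. */
#define EX_PAGE_SIZE 4096ul
#define EX_PAGE_MASK (~(EX_PAGE_SIZE - 1))

/* Round down to a page boundary, like (start) & PAGE_MASK. */
unsigned long page_floor(unsigned long addr)
{
    return addr & EX_PAGE_MASK;
}

/* Round up to a page boundary, like PAGE_ALIGN(end). */
unsigned long page_ceil(unsigned long addr)
{
    return (addr + EX_PAGE_SIZE - 1) & EX_PAGE_MASK;
}

/* Bytes actually covered once [start, end) is widened to whole pages. */
unsigned long widened_length(unsigned long start, unsigned long end)
{
    assert(start <= end);
    return page_ceil(end) - page_floor(start);
}
```

Under these assumptions a 4-byte request inside one page is widened to 4096 bytes, and an 8-byte request that straddles a page boundary is widened to 8192 bytes.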
* Discontiguous memory and cacheflush
2012-08-17 15:53 ` Jonathan Austin
@ 2012-08-17 21:03 ` Russell King - ARM Linux
0 siblings, 0 replies; 9+ messages in thread
From: Russell King - ARM Linux @ 2012-08-17 21:03 UTC (permalink / raw)
To: linux-arm-kernel
On Fri, Aug 17, 2012 at 04:53:44PM +0100, Jonathan Austin wrote:
> (cacheflush.h:257)
> #define flush_cache_user_range(start,end) \
> __cpuc_coherent_user_range((start) & PAGE_MASK, PAGE_ALIGN(end))
>
> Does that have a bearing on what you're doing?
I'm going to do some digging over the weekend, and work out why that
was added. I don't think it used to be the case that it flushed an
entire page.