* [PATCH] ARM: cacheflush: disallow pending signals during cacheflush
@ 2014-11-13 7:29 Chanho Min
2014-11-13 11:26 ` Will Deacon
0 siblings, 1 reply; 4+ messages in thread
From: Chanho Min @ 2014-11-13 7:29 UTC (permalink / raw)
To: linux-arm-kernel
Since commit 28256d612726 ("ARM: cacheflush: split user cache-flushing
into interruptible chunks"), cacheflush can be interrupted by signal.
But, cacheflush doesn't resume from where we left off if process has
user-defined signal handlers. It returns -EINTR then cacheflush
should be re-invoked from the start of address until cache-flushing
of whole address ranges is completed (restart_syscall isn't available
in userspace). It may cause regression. So I suggest to disallow
pending signals during cacheflush.
This partially reverts commit 28256d612726a28a8b9d3c49f2b74198c4423d6a.
Signed-off-by: Chanho Min <chanho.min@lge.com>
---
arch/arm/kernel/traps.c | 19 -------------------
1 file changed, 19 deletions(-)
diff --git a/arch/arm/kernel/traps.c b/arch/arm/kernel/traps.c
index abd2fc0..275e086 100644
--- a/arch/arm/kernel/traps.c
+++ b/arch/arm/kernel/traps.c
@@ -521,25 +521,6 @@ __do_cache_op(unsigned long start, unsigned long end)
do {
unsigned long chunk = min(PAGE_SIZE, end - start);
- if (signal_pending(current)) {
- struct thread_info *ti = current_thread_info();
-
- ti->restart_block = (struct restart_block) {
- .fn = do_cache_op_restart,
- };
-
- ti->arm_restart_block = (struct arm_restart_block) {
- {
- .cache = {
- .start = start,
- .end = end,
- },
- },
- };
-
- return -ERESTART_RESTARTBLOCK;
- }
-
ret = flush_cache_user_range(start, start + chunk);
if (ret)
return ret;
--
1.7.9.5
^ permalink raw reply related [flat|nested] 4+ messages in thread
* [PATCH] ARM: cacheflush: disallow pending signals during cacheflush
2014-11-13 7:29 [PATCH] ARM: cacheflush: disallow pending signals during cacheflush Chanho Min
@ 2014-11-13 11:26 ` Will Deacon
2014-11-13 17:39 ` Peter Maydell
2014-11-14 8:40 ` Chanho Min
0 siblings, 2 replies; 4+ messages in thread
From: Will Deacon @ 2014-11-13 11:26 UTC (permalink / raw)
To: linux-arm-kernel
Hello,
[adding linux-api, linux-man]
On Thu, Nov 13, 2014 at 07:29:53AM +0000, Chanho Min wrote:
> Since commit 28256d612726 ("ARM: cacheflush: split user cache-flushing
> into interruptible chunks"), cacheflush can be interrupted by signal.
>
> But, cacheflush doesn't resume from where we left off if process has
> user-defined signal handlers. It returns -EINTR then cacheflush
> should be re-invoked from the start of address until cache-flushing
> of whole address ranges is completed (restart_syscall isn't available
> in userspace). It may cause regression. So I suggest to disallow
> pending signals during cacheflush.
>
> This partially reverts commit 28256d612726a28a8b9d3c49f2b74198c4423d6a.
Whilst I don't think this is the correct solution, I agree that there's
a potential issue here. We could change the restart return value to
-ERESTARTNOINTR instead, but I can imagine something like a periodic
SIGALRM which could prevent a large cacheflush from ever completing.
Do we actually care about making forward progress in such a scenario?
It is interesting to note that this change has been in mainline since
May last year without any reported issues. That could be down to a number
of reasons:
(1) People are using old kernels on ARM
(2) Code doesn't check the return value from the cacheflush system call,
because it historically always returned 0
(3) People are getting lucky with timing, as this is likely difficult
to hit
Related to (2) is that a `man cacheflush' invocation returns something
about the MIPs system call, that doesn't match what we do for ARM. The
(relatively recent) history of the system call on ARM is:
< v3.5 [*]
- Always returns 0
- Restricts virtual address range to a single VMA
- Page-aligns the region limits (over flushing for smaller ranges)
- Terminates on the first fault
- Flags are ignored but must "ALWAYS be passed as ZERO"
v3.5 - v3.12
- Returns -EINVAL if flags is set or if end < start
- Returns -EINVAL if we couldn't find a vma
- Terminates on the first fault and returns -EFAULT
v3.12 - HEAD
- No longer page-aligns region
- Removes VMA checking as this had a deadlock bug with mmap_sem
and we could handle faults by this point anyway
- Returns -EINVAL if !access_ok for the range
- Splits the range into PAGE_SIZE chunks, checking for reschedule
and pending signals to avoid DoSing the system (the hardware can
only clean by cacheline). This is where the -ERESTART_RESTARTBLOCK
behaviour came in, potentially returning -EINTR to userspace.
This leaves me with the following questions:
- Has this change been shown to break anything in practice?
- Can we change the internal return value to -ERESTARTNOINTR?
- What do we do about kernels that *do* return -EINTR? (>=3.12?)
- Can we get a manpage put together to describe this mess?
Cheers,
Will
[*] rmk may have some more ancient history kicking around, if you like!
> diff --git a/arch/arm/kernel/traps.c b/arch/arm/kernel/traps.c
> index abd2fc0..275e086 100644
> --- a/arch/arm/kernel/traps.c
> +++ b/arch/arm/kernel/traps.c
> @@ -521,25 +521,6 @@ __do_cache_op(unsigned long start, unsigned long end)
> do {
> unsigned long chunk = min(PAGE_SIZE, end - start);
>
> - if (signal_pending(current)) {
> - struct thread_info *ti = current_thread_info();
> -
> - ti->restart_block = (struct restart_block) {
> - .fn = do_cache_op_restart,
> - };
> -
> - ti->arm_restart_block = (struct arm_restart_block) {
> - {
> - .cache = {
> - .start = start,
> - .end = end,
> - },
> - },
> - };
> -
> - return -ERESTART_RESTARTBLOCK;
> - }
> -
> ret = flush_cache_user_range(start, start + chunk);
> if (ret)
> return ret;
> --
> 1.7.9.5
>
>
^ permalink raw reply [flat|nested] 4+ messages in thread
* [PATCH] ARM: cacheflush: disallow pending signals during cacheflush
2014-11-13 11:26 ` Will Deacon
@ 2014-11-13 17:39 ` Peter Maydell
2014-11-14 8:40 ` Chanho Min
1 sibling, 0 replies; 4+ messages in thread
From: Peter Maydell @ 2014-11-13 17:39 UTC (permalink / raw)
To: linux-arm-kernel
On 13 November 2014 11:26, Will Deacon <will.deacon@arm.com> wrote:
> Whilst I don't think this is the correct solution, I agree that there's
> a potential issue here. We could change the restart return value to
> -ERESTARTNOINTR instead, but I can imagine something like a periodic
> SIGALRM which could prevent a large cacheflush from ever completing.
> Do we actually care about making forward progress in such a scenario?
>
> It is interesting to note that this change has been in mainline since
> May last year without any reported issues. That could be down to a number
> of reasons:
>
> (1) People are using old kernels on ARM
>
> (2) Code doesn't check the return value from the cacheflush system call,
> because it historically always returned 0
...and the documentation comment in the source code didn't say
anything about the syscall having a return value; it only
described the input parameters. I would actually be surprised
if any userspace caller of this syscall checked its return value
(the libgcc cacheflush function used by gcc's clear_cache builtin
doesn't, to pick one popularly used example).
> (3) People are getting lucky with timing, as this is likely difficult
> to hit
(4) The resulting misbehaviour ("my JIT crashes occasionally and
non-reproducibly at some point possibly some while after the
cacheflush call") will be extremely hard to track back
to this kernel change
> This leaves me with the following questions:
>
> - Has this change been shown to break anything in practice?
> - Can we change the internal return value to -ERESTARTNOINTR?
> - What do we do about kernels that *do* return -EINTR? (>=3.12?)
My suggestion would be "treat this as a bugfix, put it into
stable kernels in the usual way (and assume distros will pick
it up if appropriate)".
> - Can we get a manpage put together to describe this mess?
That would be nice :-)
-- PMM
^ permalink raw reply [flat|nested] 4+ messages in thread
* [PATCH] ARM: cacheflush: disallow pending signals during cacheflush
2014-11-13 11:26 ` Will Deacon
2014-11-13 17:39 ` Peter Maydell
@ 2014-11-14 8:40 ` Chanho Min
1 sibling, 0 replies; 4+ messages in thread
From: Chanho Min @ 2014-11-14 8:40 UTC (permalink / raw)
To: linux-arm-kernel
> -----Original Message-----
> From: Will Deacon [mailto:will.deacon at arm.com]
> Whilst I don't think this is the correct solution, I agree that there's
> a potential issue here. We could change the restart return value to
> -ERESTARTNOINTR instead, but I can imagine something like a periodic
> SIGALRM which could prevent a large cacheflush from ever completing.
> Do we actually care about making forward progress in such a scenario?
It's not complete solution. But, I don't think this is incorrect solution
as well. Potential issue could be more serious than improvement of signal
responsiveness.
>
> It is interesting to note that this change has been in mainline since
> May last year without any reported issues. That could be down to a number
> of reasons:
>
> (1) People are using old kernels on ARM
>
> (2) Code doesn't check the return value from the cacheflush system call,
> because it historically always returned 0
>
> (3) People are getting lucky with timing, as this is likely difficult
> to hit
>
> Related to (2) is that a `man cacheflush' invocation returns something
> about the MIPs system call, that doesn't match what we do for ARM. The
> (relatively recent) history of the system call on ARM is:
>
> < v3.5 [*]
>
> - Always returns 0
> - Restricts virtual address range to a single VMA
> - Page-aligns the region limits (over flushing for smaller ranges)
> - Terminates on the first fault
> - Flags are ignored but must "ALWAYS be passed as ZERO"
>
> v3.5 - v3.12
> - Returns -EINVAL if flags is set or if end < start
> - Returns -EINVAL if we couldn't find a vma
> - Terminates on the first fault and returns -EFAULT
>
> v3.12 - HEAD
>
> - No longer page-aligns region
> - Removes VMA checking as this had a deadlock bug with mmap_sem
> and we could handle faults by this point anyway
> - Returns -EINVAL if !access_ok for the range
> - Splits the range into PAGE_SIZE chunks, checking for reschedule
> and pending signals to avoid DoSing the system (the hardware can
> only clean by cacheline). This is where the -ERESTART_RESTARTBLOCK
> behaviour came in, potentially returning -EINTR to userspace.
>
> This leaves me with the following questions:
>
> - Has this change been shown to break anything in practice?
In practice, node.js (Currently, It doesn't check -EINTR of cacheflush)
crashes occasionally and non-reproducibly at some point some while after
the cacheflush call. At that time, strace tells cacheflush returns -EINTR.
> - Can we change the internal return value to -ERESTARTNOINTR?
In worst case, I can imagine that periodic signal interrupts cacheflush
and it repeats restart of syscall from start of address with unlucky timing.
> - What do we do about kernels that *do* return -EINTR? (>=3.12?)
> - Can we get a manpage put together to describe this mess?
>
> Cheers,
>
> Will
>
> [*] rmk may have some more ancient history kicking around, if you like!
>
> > diff --git a/arch/arm/kernel/traps.c b/arch/arm/kernel/traps.c
> > index abd2fc0..275e086 100644
> > --- a/arch/arm/kernel/traps.c
> > +++ b/arch/arm/kernel/traps.c
> > @@ -521,25 +521,6 @@ __do_cache_op(unsigned long start, unsigned long end)
> > do {
> > unsigned long chunk = min(PAGE_SIZE, end - start);
> >
> > - if (signal_pending(current)) {
> > - struct thread_info *ti = current_thread_info();
> > -
> > - ti->restart_block = (struct restart_block) {
> > - .fn = do_cache_op_restart,
> > - };
> > -
> > - ti->arm_restart_block = (struct arm_restart_block) {
> > - {
> > - .cache = {
> > - .start = start,
> > - .end = end,
> > - },
> > - },
> > - };
> > -
> > - return -ERESTART_RESTARTBLOCK;
> > - }
> > -
> > ret = flush_cache_user_range(start, start + chunk);
> > if (ret)
> > return ret;
> > --
> > 1.7.9.5
> >
> >
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2014-11-14 8:40 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-11-13 7:29 [PATCH] ARM: cacheflush: disallow pending signals during cacheflush Chanho Min
2014-11-13 11:26 ` Will Deacon
2014-11-13 17:39 ` Peter Maydell
2014-11-14 8:40 ` Chanho Min
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).