* longjmp question
@ 2011-10-07 23:22 Jurij Smakov
2011-10-07 23:42 ` David Miller
` (23 more replies)
0 siblings, 24 replies; 25+ messages in thread
From: Jurij Smakov @ 2011-10-07 23:22 UTC (permalink / raw)
To: sparclinux
Hello,
Over the last few days I had some quality time trying to track down
the segfault in one of Ruby tests, involving continuations (which are
implemented on top of setjmp/longjmp):
http://redmine.ruby-lang.org/issues/5244
After lots of experiments I came to a conclusion that something is
wrong with libc (eglibc in this case) implementation of longjmp. In
particular, this code from sysdeps/sparc/sparc32/__longjmp.S is used
to perform the longjmp in the Ruby test:
[...]
#define RW_FP [%fp + 0x48]
ENTRY(__longjmp)
/* Store our arguments in global registers so we can still
use them while unwinding frames and their register windows. */
ld ENV(o0,JB_FP), %g3 /* Cache target FP in register %g3. */
#ifdef PTR_DEMANGLE
PTR_DEMANGLE (%g3, %g3, %g4)
#endif
[...]
LOC(thread):
/*
* Do a "flush register windows trap". The trap handler in the
* kernel writes all the register windows to their stack slots, and
* marks them all as invalid (needing to be sucked up from the
* stack when used). This ensures that all information needed to
* unwind to these callers is in memory, not in the register
* windows.
*/
ta ST_FLUSH_WINDOWS
#ifdef PTR_DEMANGLE
ld ENV(g1,JB_PC), %g5 /* Set return PC. */
ld ENV(g1,JB_SP), %g1 /* Set saved SP on restore below. */
PTR_DEMANGLE2 (%o7, %g5, %g4)
PTR_DEMANGLE2 (%fp, %g1, %g4)
#else
ld ENV(g1,JB_PC), %o7 /* Set return PC. */
ld ENV(g1,JB_SP), %fp /* Set saved SP on restore below. */
#endif
sub %fp, 64, %sp /* Allocate a register frame. */
st %g3, RW_FP /* Set saved FP on restore below. */
retl
restore %g2, 0, %o0 /* Restore values from above register frame. */
At the end of the procedure %g3 contains the target (post-jump) frame
pointer address, which we would like to end up in %fp as a result of
restore instruction in the retl delay slot. To that end we write it to
a location determined by RW_FP, which is [%fp + 0x48]. However,
according to http://www.sics.se/~psm/sparcstack.html, which nicely
describes the layout of registers in memory and the effect of
save/restore operations, the post-restore frame pointer value should
be located at [%fp + 0x38], as we are only saving 16 word-sized
registers, and %fp is 15-th out of them (see Figure 4), so offset
should be 14 * 4 = 56 = 0x38. I'm not sure whether if that's just an
off-by-0x10, or I don't understand how this things work - anyone has
any ideas?
There are a lot of things here I don't understand. For example, Ruby
test only fails if the code is compiled with -O2, but works with -O0,
and with -O0 we get the correct value of target frame pointer at
[%fp + 0x38], so it does not matter whether we write another one at
the "wrong" location. With -O2 though the memory layout is different,
and we end up with broken value at [%fp + 0x38], and this is what gets
restored into %fp, eventually causing a segfault. Also, it's probably
not the whole story, because after changing the offset to 0x38 I still
get a test failure, but it now segfaults in a different place. I could
not, however, explain the 0x48 vs 0x38 discrepancy, so taking things
one step at a time here :-).
Best regards,
--
Jurij Smakov jurij@wooyd.org
Key: http://www.wooyd.org/pgpkey/ KeyID: C99E03CC
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: longjmp question
2011-10-07 23:22 longjmp question Jurij Smakov
@ 2011-10-07 23:42 ` David Miller
2011-10-08 6:43 ` David Miller
` (22 subsequent siblings)
23 siblings, 0 replies; 25+ messages in thread
From: David Miller @ 2011-10-07 23:42 UTC (permalink / raw)
To: sparclinux
From: Jurij Smakov <jurij@wooyd.org>
Date: Sat, 8 Oct 2011 00:22:10 +0100
> At the end of the procedure %g3 contains the target (post-jump) frame
> pointer address, which we would like to end up in %fp as a result of
> restore instruction in the retl delay slot. To that end we write it to
> a location determined by RW_FP, which is [%fp + 0x48]. However,
> according to http://www.sics.se/~psm/sparcstack.html, which nicely
> describes the layout of registers in memory and the effect of
> save/restore operations, the post-restore frame pointer value should
> be located at [%fp + 0x38], as we are only saving 16 word-sized
> registers, and %fp is 15-th out of them (see Figure 4), so offset
> should be 14 * 4 = 56 = 0x38. I'm not sure whether if that's just an
> off-by-0x10, or I don't understand how this things work - anyone has
> any ideas?
Strange... your analysis seems correct, but we might be missing something.
You'll probably find the following glibc commit amusing, and the commit
message is scant on details too :-/
commit ab52d206a3bd4dbd671b46a1797dee063926604e
Author: Ulrich Drepper <drepper@redhat.com>
Date: Mon Sep 11 07:01:16 2000 +0000
Update.
2000-09-10 David S. Miller <davem@redhat.com>
* sysdeps/sparc/sparc32/__longjmp.S (__longjmp): Correct %fp
frame pointer offset for non-fast path.
2000-09-10 Ulrich Drepper <drepper@redhat.com>
* locale/programs/3level.h (*_init): Initialize level1, level2,
and level3 as well.
(*_add): Remove a few unnecessary conditionals.
diff --git a/ChangeLog b/ChangeLog
index 007e433..8d9ddf7 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,14 @@
+2000-09-10 David S. Miller <davem@redhat.com>
+
+ * sysdeps/sparc/sparc32/__longjmp.S (__longjmp): Correct %fp
+ frame pointer offset for non-fast path.
+
+2000-09-10 Ulrich Drepper <drepper@redhat.com>
+
+ * locale/programs/3level.h (*_init): Initialize level1, level2,
+ and level3 as well.
+ (*_add): Remove a few unnecessary conditionals.
+
2000-09-05 Wolfram Gloger <wg@malloc.de>
* malloc/thread-m.h [_LIBC]: Even if not linking with libpthread,
diff --git a/locale/programs/3level.h b/locale/programs/3level.h
index d829332..5bb8929 100644
--- a/locale/programs/3level.h
+++ b/locale/programs/3level.h
@@ -66,8 +66,11 @@ struct TABLE
static inline void
CONCAT(TABLE,_init) (struct TABLE *t)
{
+ t->level1 = NULL;
t->level1_alloc = t->level1_size = 0;
+ t->level2 = NULL;
t->level2_alloc = t->level2_size = 0;
+ t->level3 = NULL;
t->level3_alloc = t->level3_size = 0;
}
@@ -116,10 +119,8 @@ CONCAT(TABLE,_add) (struct TABLE *t, uint32_t wc, ELEMENT value)
size_t alloc = 2 * t->level1_alloc;
if (alloc <= index1)
alloc = index1 + 1;
- t->level1 = (t->level1_alloc > 0
- ? (uint32_t *) xrealloc ((char *) t->level1,
- alloc * sizeof (uint32_t))
- : (uint32_t *) xmalloc (alloc * sizeof (uint32_t)));
+ t->level1 = (uint32_t *) xrealloc ((char *) t->level1,
+ alloc * sizeof (uint32_t));
t->level1_alloc = alloc;
}
while (index1 >= t->level1_size)
@@ -131,10 +132,8 @@ CONCAT(TABLE,_add) (struct TABLE *t, uint32_t wc, ELEMENT value)
if (t->level2_size = t->level2_alloc)
{
size_t alloc = 2 * t->level2_alloc + 1;
- t->level2 = (t->level2_alloc > 0
- ? (uint32_t *) xrealloc ((char *) t->level2,
- (alloc << t->q) * sizeof (uint32_t))
- : (uint32_t *) xmalloc ((alloc << t->q) * sizeof (uint32_t)));
+ t->level2 = (uint32_t *) xrealloc ((char *) t->level2,
+ (alloc << t->q) * sizeof (uint32_t));
t->level2_alloc = alloc;
}
i1 = t->level2_size << t->q;
@@ -151,10 +150,8 @@ CONCAT(TABLE,_add) (struct TABLE *t, uint32_t wc, ELEMENT value)
if (t->level3_size = t->level3_alloc)
{
size_t alloc = 2 * t->level3_alloc + 1;
- t->level3 = (t->level3_alloc > 0
- ? (ELEMENT *) xrealloc ((char *) t->level3,
- (alloc << t->p) * sizeof (ELEMENT))
- : (ELEMENT *) xmalloc ((alloc << t->p) * sizeof (ELEMENT)));
+ t->level3 = (ELEMENT *) xrealloc ((char *) t->level3,
+ (alloc << t->p) * sizeof (ELEMENT));
t->level3_alloc = alloc;
}
i1 = t->level3_size << t->p;
diff --git a/sysdeps/sparc/sparc32/__longjmp.S b/sysdeps/sparc/sparc32/__longjmp.S
index 1d49b26..ef83dee 100644
--- a/sysdeps/sparc/sparc32/__longjmp.S
+++ b/sysdeps/sparc/sparc32/__longjmp.S
@@ -1,4 +1,4 @@
-/* Copyright (C) 1991, 93, 96, 97, 98, 99 Free Software Foundation, Inc.
+/* Copyright (C) 1991, 93, 96, 97, 98, 99, 2000 Free Software Foundation, Inc.
This file is part of the GNU C Library.
The GNU C Library is free software; you can redistribute it and/or
@@ -23,7 +23,7 @@
#include <bits/setjmp.h>
#define ENV(base,reg) [%base + (reg * 4)]
#define ST_FLUSH_WINDOWS 3
-#define RW_FP [%fp + 0x38]
+#define RW_FP [%fp + 0x48]
ENTRY(__longjmp)
/* Store our arguments in global registers so we can still
^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: longjmp question
2011-10-07 23:22 longjmp question Jurij Smakov
2011-10-07 23:42 ` David Miller
@ 2011-10-08 6:43 ` David Miller
2011-10-08 12:55 ` Jurij Smakov
` (21 subsequent siblings)
23 siblings, 0 replies; 25+ messages in thread
From: David Miller @ 2011-10-08 6:43 UTC (permalink / raw)
To: sparclinux
From: Jurij Smakov <jurij@wooyd.org>
Date: Sat, 8 Oct 2011 00:22:10 +0100
> #define RW_FP [%fp + 0x48]
Let's just do away with this value entirely and do this the
safest way possible, since this is the slow path:
diff --git a/sysdeps/sparc/sparc32/__longjmp.S b/sysdeps/sparc/sparc32/__longjmp.S
index a5453b4..70e8e29 100644
--- a/sysdeps/sparc/sparc32/__longjmp.S
+++ b/sysdeps/sparc/sparc32/__longjmp.S
@@ -22,7 +22,6 @@
#include <jmpbuf-offsets.h>
#define ENV(base,reg) [%base + (reg * 4)]
#define ST_FLUSH_WINDOWS 3
-#define RW_FP [%fp + 0x48]
ENTRY(__longjmp)
/* Store our arguments in global registers so we can still
@@ -55,6 +54,7 @@ LOC(loop):
ld ENV(g1,JB_SP), %o0 /* Delay slot: extract target SP. */
LOC(thread):
+ save %sp, -96, %sp
/*
* Do a "flush register windows trap". The trap handler in the
* kernel writes all the register windows to their stack slots, and
@@ -67,15 +67,13 @@ LOC(thread):
#ifdef PTR_DEMANGLE
ld ENV(g1,JB_PC), %g5 /* Set return PC. */
ld ENV(g1,JB_SP), %g1 /* Set saved SP on restore below. */
- PTR_DEMANGLE2 (%o7, %g5, %g4)
+ PTR_DEMANGLE2 (%i7, %g5, %g4)
PTR_DEMANGLE2 (%fp, %g1, %g4)
#else
- ld ENV(g1,JB_PC), %o7 /* Set return PC. */
+ ld ENV(g1,JB_PC), %i7 /* Set return PC. */
ld ENV(g1,JB_SP), %fp /* Set saved SP on restore below. */
#endif
- sub %fp, 64, %sp /* Allocate a register frame. */
- st %g3, RW_FP /* Set saved FP on restore below. */
- retl
+ jmp %i7 + 8
restore %g2, 0, %o0 /* Restore values from above register frame. */
LOC(found):
^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: longjmp question
2011-10-07 23:22 longjmp question Jurij Smakov
2011-10-07 23:42 ` David Miller
2011-10-08 6:43 ` David Miller
@ 2011-10-08 12:55 ` Jurij Smakov
2011-10-08 19:22 ` David Miller
` (20 subsequent siblings)
23 siblings, 0 replies; 25+ messages in thread
From: Jurij Smakov @ 2011-10-08 12:55 UTC (permalink / raw)
To: sparclinux
On Sat, Oct 08, 2011 at 02:43:55AM -0400, David Miller wrote:
> From: Jurij Smakov <jurij@wooyd.org>
> Date: Sat, 8 Oct 2011 00:22:10 +0100
>
> > #define RW_FP [%fp + 0x48]
>
> Let's just do away with this value entirely and do this the
> safest way possible, since this is the slow path:
>
> diff --git a/sysdeps/sparc/sparc32/__longjmp.S b/sysdeps/sparc/sparc32/__longjmp.S
> index a5453b4..70e8e29 100644
> --- a/sysdeps/sparc/sparc32/__longjmp.S
> +++ b/sysdeps/sparc/sparc32/__longjmp.S
> @@ -22,7 +22,6 @@
> #include <jmpbuf-offsets.h>
> #define ENV(base,reg) [%base + (reg * 4)]
> #define ST_FLUSH_WINDOWS 3
> -#define RW_FP [%fp + 0x48]
>
> ENTRY(__longjmp)
> /* Store our arguments in global registers so we can still
> @@ -55,6 +54,7 @@ LOC(loop):
> ld ENV(g1,JB_SP), %o0 /* Delay slot: extract target SP. */
>
> LOC(thread):
> + save %sp, -96, %sp
> /*
> * Do a "flush register windows trap". The trap handler in the
> * kernel writes all the register windows to their stack slots, and
> @@ -67,15 +67,13 @@ LOC(thread):
> #ifdef PTR_DEMANGLE
> ld ENV(g1,JB_PC), %g5 /* Set return PC. */
> ld ENV(g1,JB_SP), %g1 /* Set saved SP on restore below. */
> - PTR_DEMANGLE2 (%o7, %g5, %g4)
> + PTR_DEMANGLE2 (%i7, %g5, %g4)
> PTR_DEMANGLE2 (%fp, %g1, %g4)
> #else
> - ld ENV(g1,JB_PC), %o7 /* Set return PC. */
> + ld ENV(g1,JB_PC), %i7 /* Set return PC. */
> ld ENV(g1,JB_SP), %fp /* Set saved SP on restore below. */
> #endif
> - sub %fp, 64, %sp /* Allocate a register frame. */
> - st %g3, RW_FP /* Set saved FP on restore below. */
> - retl
> + jmp %i7 + 8
> restore %g2, 0, %o0 /* Restore values from above register frame. */
I don't really see anything in the code which would ensure that after
we execute this restore, fp is going to be set to target fp value
(available as ENV(g1,JB_FP) or g3. It might be the responsibility of
the unwinding part, but then it would not explain why such code was
explicitly present in the old version.
Best regards,
--
Jurij Smakov jurij@wooyd.org
Key: http://www.wooyd.org/pgpkey/ KeyID: C99E03CC
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: longjmp question
2011-10-07 23:22 longjmp question Jurij Smakov
` (2 preceding siblings ...)
2011-10-08 12:55 ` Jurij Smakov
@ 2011-10-08 19:22 ` David Miller
2011-10-12 21:22 ` Jurij Smakov
` (19 subsequent siblings)
23 siblings, 0 replies; 25+ messages in thread
From: David Miller @ 2011-10-08 19:22 UTC (permalink / raw)
To: sparclinux
From: Jurij Smakov <jurij@wooyd.org>
Date: Sat, 8 Oct 2011 13:55:35 +0100
> On Sat, Oct 08, 2011 at 02:43:55AM -0400, David Miller wrote:
>> @@ -67,15 +67,13 @@ LOC(thread):
>> #ifdef PTR_DEMANGLE
>> ld ENV(g1,JB_PC), %g5 /* Set return PC. */
>> ld ENV(g1,JB_SP), %g1 /* Set saved SP on restore below. */
>> - PTR_DEMANGLE2 (%o7, %g5, %g4)
>> + PTR_DEMANGLE2 (%i7, %g5, %g4)
>> PTR_DEMANGLE2 (%fp, %g1, %g4)
>> #else
>> - ld ENV(g1,JB_PC), %o7 /* Set return PC. */
>> + ld ENV(g1,JB_PC), %i7 /* Set return PC. */
>> ld ENV(g1,JB_SP), %fp /* Set saved SP on restore below. */
>> #endif
>> - sub %fp, 64, %sp /* Allocate a register frame. */
>> - st %g3, RW_FP /* Set saved FP on restore below. */
>> - retl
>> + jmp %i7 + 8
>> restore %g2, 0, %o0 /* Restore values from above register frame. */
>
> I don't really see anything in the code which would ensure that after
> we execute this restore, fp is going to be set to target fp value
> (available as ENV(g1,JB_FP) or g3. It might be the responsibility of
> the unwinding part, but then it would not explain why such code was
> explicitly present in the old version.
Because the restore traps, causing a window fill, and that trap
handler will load up the register window at JB_SP, inside of which
JB_FP is contained.
The only reason we store JB_FP is for the implementation the optimized
longjmp loop.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: longjmp question
2011-10-07 23:22 longjmp question Jurij Smakov
` (3 preceding siblings ...)
2011-10-08 19:22 ` David Miller
@ 2011-10-12 21:22 ` Jurij Smakov
2011-10-12 22:09 ` David Miller
` (18 subsequent siblings)
23 siblings, 0 replies; 25+ messages in thread
From: Jurij Smakov @ 2011-10-12 21:22 UTC (permalink / raw)
To: sparclinux
On Sat, Oct 08, 2011 at 03:22:51PM -0400, David Miller wrote:
> From: Jurij Smakov <jurij@wooyd.org>
> Date: Sat, 8 Oct 2011 13:55:35 +0100
>
> > On Sat, Oct 08, 2011 at 02:43:55AM -0400, David Miller wrote:
> >> @@ -67,15 +67,13 @@ LOC(thread):
> >> #ifdef PTR_DEMANGLE
> >> ld ENV(g1,JB_PC), %g5 /* Set return PC. */
> >> ld ENV(g1,JB_SP), %g1 /* Set saved SP on restore below. */
> >> - PTR_DEMANGLE2 (%o7, %g5, %g4)
> >> + PTR_DEMANGLE2 (%i7, %g5, %g4)
> >> PTR_DEMANGLE2 (%fp, %g1, %g4)
> >> #else
> >> - ld ENV(g1,JB_PC), %o7 /* Set return PC. */
> >> + ld ENV(g1,JB_PC), %i7 /* Set return PC. */
> >> ld ENV(g1,JB_SP), %fp /* Set saved SP on restore below. */
> >> #endif
> >> - sub %fp, 64, %sp /* Allocate a register frame. */
> >> - st %g3, RW_FP /* Set saved FP on restore below. */
> >> - retl
> >> + jmp %i7 + 8
> >> restore %g2, 0, %o0 /* Restore values from above register frame. */
> >
> > I don't really see anything in the code which would ensure that after
> > we execute this restore, fp is going to be set to target fp value
> > (available as ENV(g1,JB_FP) or g3. It might be the responsibility of
> > the unwinding part, but then it would not explain why such code was
> > explicitly present in the old version.
>
> Because the restore traps, causing a window fill, and that trap
> handler will load up the register window at JB_SP, inside of which
> JB_FP is contained.
>
> The only reason we store JB_FP is for the implementation the optimized
> longjmp loop.
Thanks for the explanation.
I've built eglibc with this patch, and nothing exploded in an obvious
way. Ruby test is still failing though, so I'm pretty sure now that
this was not a problem even before the patch.
On an unrelated note, I was not able to decipher what's the purpose of
the following code in __longjmp:
0:
xor %fp, %g3, %o0
add %fp, 512, %o1
andncc %o0, 4095, %o0
bne LOC(thread)
cmp %o1, %g3
bl LOC(thread)
It looks like this code is *very* old (was added back in 1998, it
seems [0]), so I was not able to find any description for it. If you
don't mind explaining what's going on here, I would appreciate it.
[0] http://lwn.net/1998/1022/a/sparc-glibc-patch.html
Best regards,
--
Jurij Smakov jurij@wooyd.org
Key: http://www.wooyd.org/pgpkey/ KeyID: C99E03CC
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: longjmp question
2011-10-07 23:22 longjmp question Jurij Smakov
` (4 preceding siblings ...)
2011-10-12 21:22 ` Jurij Smakov
@ 2011-10-12 22:09 ` David Miller
2011-10-12 22:17 ` David Miller
` (17 subsequent siblings)
23 siblings, 0 replies; 25+ messages in thread
From: David Miller @ 2011-10-12 22:09 UTC (permalink / raw)
To: sparclinux
From: Jurij Smakov <jurij@wooyd.org>
Date: Wed, 12 Oct 2011 22:22:51 +0100
> 0:
> xor %fp, %g3, %o0
> add %fp, 512, %o1
> andncc %o0, 4095, %o0
> bne LOC(thread)
> cmp %o1, %g3
> bl LOC(thread)
>
> It looks like this code is *very* old (was added back in 1998, it
> seems [0]), so I was not able to find any description for it. If you
> don't mind explaining what's going on here, I would appreciate it.
>
> [0] http://lwn.net/1998/1022/a/sparc-glibc-patch.html
>
> Best regards,
This code is seeing if the current stack/frame pointers are on a
different "page" or very far from the stack/frame we are unwinding
into.
The thinking is that if the distance is very large, then the loop is
unlikely to be more efficient, or we are unwinding out-of or into
a different kind of stack (thread stack, process stack, signal stack,
etc.).
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: longjmp question
2011-10-07 23:22 longjmp question Jurij Smakov
` (5 preceding siblings ...)
2011-10-12 22:09 ` David Miller
@ 2011-10-12 22:17 ` David Miller
2011-10-12 23:06 ` David Miller
` (16 subsequent siblings)
23 siblings, 0 replies; 25+ messages in thread
From: David Miller @ 2011-10-12 22:17 UTC (permalink / raw)
To: sparclinux
FYI, I'm taking a quick look into this specific Ruby situation.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: longjmp question
2011-10-07 23:22 longjmp question Jurij Smakov
` (6 preceding siblings ...)
2011-10-12 22:17 ` David Miller
@ 2011-10-12 23:06 ` David Miller
2011-10-12 23:21 ` Jurij Smakov
` (15 subsequent siblings)
23 siblings, 0 replies; 25+ messages in thread
From: David Miller @ 2011-10-12 23:06 UTC (permalink / raw)
To: sparclinux
Jurij, how do I setup this testcase?
I checked out Ruby from SVN and built it, but I can't find this
miniruby thing so that I can run the command line in that Ruby bug
report.
Thanks.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: longjmp question
2011-10-07 23:22 longjmp question Jurij Smakov
` (7 preceding siblings ...)
2011-10-12 23:06 ` David Miller
@ 2011-10-12 23:21 ` Jurij Smakov
2011-10-12 23:42 ` David Miller
` (14 subsequent siblings)
23 siblings, 0 replies; 25+ messages in thread
From: Jurij Smakov @ 2011-10-12 23:21 UTC (permalink / raw)
To: sparclinux
[-- Attachment #1: Type: text/plain, Size: 845 bytes --]
On Wed, Oct 12, 2011 at 07:06:17PM -0400, David Miller wrote:
>
> Jurij, how do I setup this testcase?
>
> I checked out Ruby from SVN and built it, but I can't find this
> miniruby thing so that I can run the command line in that Ruby bug
> report.
>
> Thanks.
Thanks for looking at it! Attached is a script which I use to set up
the environment and start gdb for the binary (you probably will need
directory names adjusted, if you are building from directly from svn
and not from Debian package). I'm still in the process of trying to
understand what's Ruby is trying to do with all the machine state
saves and restores, so I was not able to make a lot of progress so
far.
Best regads,
--
Jurij Smakov jurij@wooyd.org
Key: http://www.wooyd.org/pgpkey/ KeyID: C99E03CC
[-- Attachment #2: run --]
[-- Type: text/plain, Size: 772 bytes --]
#!/bin/sh
export RUBY=/home/jurij/ruby/ruby1.9.1-1.9.3~preview1+svn33236/ruby1.9.1
export PATH=/home/jurij/ruby/ruby1.9.1-1.9.3~preview1+svn33236:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
export RUBYLIB=/home/jurij/ruby/ruby1.9.1-1.9.3~preview1+svn33236:/home/jurij/ruby/ruby1.9.1-1.9.3~preview1+svn33236/.ext/common:/home/jurij/ruby/ruby1.9.1-1.9.3~preview1+svn33236/.ext/sparc-linux:/home/jurij/ruby/ruby1.9.1-1.9.3~preview1+svn33236/lib
export LD_LIBRARY_PATH=/home/jurij/ruby/ruby1.9.1-1.9.3~preview1+svn33236
export LD_PRELOAD=/home/jurij/ruby/ruby1.9.1-1.9.3~preview1+svn33236/libruby-1.9.1.so.1.9.1
cat > script <<EOF
file ./ruby1.9.1
set args -rcontinuation -e 'callcc { |c| c.call }'
set break pending on
break cont_capture
run
EOF
gdb -x script
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: longjmp question
2011-10-07 23:22 longjmp question Jurij Smakov
` (8 preceding siblings ...)
2011-10-12 23:21 ` Jurij Smakov
@ 2011-10-12 23:42 ` David Miller
2011-10-13 22:06 ` Jurij Smakov
` (13 subsequent siblings)
23 siblings, 0 replies; 25+ messages in thread
From: David Miller @ 2011-10-12 23:42 UTC (permalink / raw)
To: sparclinux
From: Jurij Smakov <jurij@wooyd.org>
Date: Thu, 13 Oct 2011 00:21:28 +0100
> On Wed, Oct 12, 2011 at 07:06:17PM -0400, David Miller wrote:
>>
>> Jurij, how do I setup this testcase?
>>
>> I checked out Ruby from SVN and built it, but I can't find this
>> miniruby thing so that I can run the command line in that Ruby bug
>> report.
>>
>> Thanks.
>
> Thanks for looking at it! Attached is a script which I use to set up
> the environment and start gdb for the binary (you probably will need
> directory names adjusted, if you are building from directly from svn
> and not from Debian package). I'm still in the process of trying to
> understand what's Ruby is trying to do with all the machine state
> saves and restores, so I was not able to make a lot of progress so
> far.
Thanks for the script.
Ruby is too clever for it's own good. Right before it calls setjmp()
it copies the stack frame down to the current bottom of the stack
into a save area.
Later, right before it longjmp()'s, it copies back to the stack from
the save area.
Look at the "workaround" for x86-86 in cont_restore_1(), and all of
the special case code they need in order to get IA64 right.
What a mess.
This is only going to get worse when GCC's support for shrink-wrapping
and other interesting features propagates. There is no guarentee that
the stack won't expand further downward between when Ruby saves the
stack frames and when it does it's setjmp() call, and it very much
relies upon that such a stack expansion not happening.
Anyways I'll see if there is some way to salvage this and make it
work. I suspect that Solaris doesn't have the restore loop
optimization we do in longjmp, and that's why Ruby works there with
the same compilers on sparc.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: longjmp question
2011-10-07 23:22 longjmp question Jurij Smakov
` (9 preceding siblings ...)
2011-10-12 23:42 ` David Miller
@ 2011-10-13 22:06 ` Jurij Smakov
2011-10-13 22:35 ` David Miller
` (12 subsequent siblings)
23 siblings, 0 replies; 25+ messages in thread
From: Jurij Smakov @ 2011-10-13 22:06 UTC (permalink / raw)
To: sparclinux
On Wed, Oct 12, 2011 at 07:42:28PM -0400, David Miller wrote:
> From: Jurij Smakov <jurij@wooyd.org>
> Date: Thu, 13 Oct 2011 00:21:28 +0100
>
> > On Wed, Oct 12, 2011 at 07:06:17PM -0400, David Miller wrote:
> >>
> >> Jurij, how do I setup this testcase?
> >>
> >> I checked out Ruby from SVN and built it, but I can't find this
> >> miniruby thing so that I can run the command line in that Ruby bug
> >> report.
> >>
> >> Thanks.
> >
> > Thanks for looking at it! Attached is a script which I use to set up
> > the environment and start gdb for the binary (you probably will need
> > directory names adjusted, if you are building from directly from svn
> > and not from Debian package). I'm still in the process of trying to
> > understand what's Ruby is trying to do with all the machine state
> > saves and restores, so I was not able to make a lot of progress so
> > far.
>
> Thanks for the script.
>
> Ruby is too clever for it's own good. Right before it calls setjmp()
> it copies the stack frame down to the current bottom of the stack
> into a save area.
>
> Later, right before it longjmp()'s, it copies back to the stack from
> the save area.
>
> Look at the "workaround" for x86-86 in cont_restore_1(), and all of
> the special case code they need in order to get IA64 right.
>
> What a mess.
>
> This is only going to get worse when GCC's support for shrink-wrapping
> and other interesting features propagates. There is no guarentee that
> the stack won't expand further downward between when Ruby saves the
> stack frames and when it does it's setjmp() call, and it very much
> relies upon that such a stack expansion not happening.
>
> Anyways I'll see if there is some way to salvage this and make it
> work. I suspect that Solaris doesn't have the restore loop
> optimization we do in longjmp, and that's why Ruby works there with
> the same compilers on sparc.
I think I've figured it out (famous last words :-). The problem
appears to be in cont_save_machine_stack in cont.c. The part where new
memory is allocated and the machine state is saved using memcpy from
cont->machine_stack_src to cont->machine_stack generates the following
assembler code:
0xf7f4d728 <+584>: mov %i4, %o0 // %o0 = 437, size of memory to allocate in words
0xf7f4d72c <+588>: call 0xf7fb4004 <ruby_xmalloc2@plt>
0xf7f4d730 <+592>: mov 4, %o1 // %o1 = 4, word size
0xf7f4d734 <+596>: ld [ %fp + -12 ], %g3 // load 'cont' address into g3.
=> 0xf7f4d738 <+600>: st %o0, [ %g3 + 0x1c ] // %o0 contains the address returned by ruby_xmalloc2, store it in cont->machine_stack
0xf7f4d73c <+604>: flushw // flush register windows
0xf7f4d740 <+608>: ld [ %fp + -12 ], %g1 // load 'cont' address into g1. But flushw might have caused a spill trap and changed fp!
0xf7f4d744 <+612>: sll %i4, 2, %o2 // 437*4, total amount of memory to copy, goes into o2, third arg for memcpy
0xf7f4d748 <+616>: call 0xf7fb2b1c <memcpy@plt>
0xf7f4d74c <+620>: ld [ %g1 + 0x20 ], %o1 // we load what we think to be cont->machine_stack_src into second arg
0xf7f4d750 <+624>: ld [ %fp + -12 ], %g2
0xf7f4d754 <+628>: call 0xf7fb29b4 <_setjmp@plt>
0xf7f4d758 <+632>: add %g2, 0x278, %o0
0xf7f4d75c <+636>: cmp %o0, 0
I believe that whenever flushw causes a spill trap, we are going to
load an incorrect source address (cont->machine_stack_src) as a second
memcpy argument. A couple of observations support it: if you
insert a breakpoint right after memcpy, you find that memory regions
pointed to by cont->machine_stack and cont->machine_stack_src are not
synchronized, as one would expect. Furthermore, breaking anywhere
*before* will make the problem magically go away (perhaps because gdb
flushes register windows itself on breakpoints, and then flushw in
cont_capture is effectively a noop?)
I hope it makes at least some sense :-).
Best regards,
--
Jurij Smakov jurij@wooyd.org
Key: http://www.wooyd.org/pgpkey/ KeyID: C99E03CC
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: longjmp question
2011-10-07 23:22 longjmp question Jurij Smakov
` (10 preceding siblings ...)
2011-10-13 22:06 ` Jurij Smakov
@ 2011-10-13 22:35 ` David Miller
2011-10-14 23:06 ` Jurij Smakov
` (11 subsequent siblings)
23 siblings, 0 replies; 25+ messages in thread
From: David Miller @ 2011-10-13 22:35 UTC (permalink / raw)
To: sparclinux
From: Jurij Smakov <jurij@wooyd.org>
Date: Thu, 13 Oct 2011 23:06:17 +0100
> I believe that whenever flushw causes a spill trap, we are going to
> load an incorrect source address (cont->machine_stack_src) as a second
> memcpy argument. A couple of observations support it: if you
> insert a breakpoint right after memcpy, you find that memory regions
> pointed to by cont->machine_stack and cont->machine_stack_src are not
> synchronized, as one would expect. Furthermore, breaking anywhere
> *before* will make the problem magically go away (perhaps because gdb
> flushes register windows itself on breakpoints, and then flushw in
> cont_capture is effectively a noop?)
>
> I hope it makes at least some sense :-).
Good detective work, did I mention that this Ruby continuation stuff
is extremely fragile?
Can you show me what values %sp and %fp have right before the flushw
is executed?
The effect of taking a breakpoint right before the flushw ought to
be the same as executing a flushw. When a process being debugged
by GDB takes a breakpoint, we flush all the user register windows
out of the cpu and onto the process stack, the wake up the parent
(GDB) and context switch.
Obviously, something different is happening when you just let the
flushw execute without an immediately preceeding breakpoint, so
we have to figure out exactly what that is :-)
Something you might want to try, compile cont.c into an assembler
file cont.s, then insert the following around the flushw
mov %fp, %g1
flushw
mov %fp, %g2
Then compile that into an object and link up ruby.
In the debugger, breakpoint right after that "mov %fp, %g2" and
print out from GDB the values of %g1 and %g2. This might give
some hints as to what's going on exactly.
Another test, go into Ruby's defines.h and get rid of the:
# if defined(__sparc_v9__) || defined(__sparcv9) || defined(__arch64__)
("flushw")
# else
and make it always use "ta 0x03" instead of "flushw". This might
explain why the Ruby developers can't reproduce this on Solaris. That
could happen if for some reason their Solaris build isn't setting the
defines that guard the flushw instruction usage.
If using "ta 0x03" instead of "flushw" makes a difference that would
be a huge clue.
Thanks!
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: longjmp question
2011-10-07 23:22 longjmp question Jurij Smakov
` (11 preceding siblings ...)
2011-10-13 22:35 ` David Miller
@ 2011-10-14 23:06 ` Jurij Smakov
2011-10-14 23:26 ` David Miller
` (10 subsequent siblings)
23 siblings, 0 replies; 25+ messages in thread
From: Jurij Smakov @ 2011-10-14 23:06 UTC (permalink / raw)
To: sparclinux
On Thu, Oct 13, 2011 at 06:35:18PM -0400, David Miller wrote:
> Something you might want to try, compile cont.c into an assembler
> file cont.s, then insert the following around the flushw
>
> mov %fp, %g1
> flushw
> mov %fp, %g2
>
> Then compile that into an object and link up ruby.
>
> In the debugger, breakpoint right after that "mov %fp, %g2" and
> print out from GDB the values of %g1 and %g2. This might give
> some hints as to what's going on exactly.
>
> Another test, go into Ruby's defines.h and get rid of the:
>
> # if defined(__sparc_v9__) || defined(__sparcv9) || defined(__arch64__)
> ("flushw")
> # else
>
> and make it always use "ta 0x03" instead of "flushw". This might
> explain why the Ruby developers can't reproduce this on Solaris. That
> could happen if for some reason their Solaris build isn't setting the
> defines that guard the flushw instruction usage.
>
> If using "ta 0x03" instead of "flushw" makes a difference that would
> be a huge clue.
Replacing "flushw" with "ta 0x03" makes the problem go away. What is
the difference between the two? I would naively think that the effect
of both should be saving register windows on the stack, allocating a
new stack frame for each of them, so fp would get adjusted in either
case. Then I would expect that the correct fix would be to indicate to
the compiler that flushw is clobbering fp/sp registers, so it cannot
rely on their contents afterwards. The fact that using "ta 0x03" fixes
it makes me feel lost again :-).
Unfortunately, I was unable to do a conclusive second test (copying fp
to some registers before and after flushw) so far, as inserting a
break even after flushw but before memcpy() call makes the problem go
away as well. I can break after memcpy, but it might clobber the
registers, so the saving part would need to be a bit more elaborate.
Best regards,
--
Jurij Smakov jurij@wooyd.org
Key: http://www.wooyd.org/pgpkey/ KeyID: C99E03CC
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: longjmp question
2011-10-07 23:22 longjmp question Jurij Smakov
` (12 preceding siblings ...)
2011-10-14 23:06 ` Jurij Smakov
@ 2011-10-14 23:26 ` David Miller
2011-10-16 17:07 ` Jurij Smakov
` (9 subsequent siblings)
23 siblings, 0 replies; 25+ messages in thread
From: David Miller @ 2011-10-14 23:26 UTC (permalink / raw)
To: sparclinux
From: Jurij Smakov <jurij@wooyd.org>
Date: Sat, 15 Oct 2011 00:06:53 +0100
> Replacing "flushw" with "ta 0x03" makes the problem go away. What is
> the difference between the two? I would naively think that the effect
> of both should be saving register windows on the stack, allocating a
> new stack frame for each of them, so fp would get adjusted in either
> case. Then I would expect that the correct fix would be to indicate to
> the compiler that flushw is clobbering fp/sp registers, so it cannot
> rely on their contents afterwards. The fact that using "ta 0x03" fixes
> it makes me feel lost again :-).
Taking a trap has a side effect, in that the %g* and %o* registers
will be saved and restored by the trap entry and exit respectively.
Trap entry also grabs a register window (for the kernel), which
is restored from on trap exit.
The register window flush is performed between this trap statesave and
restore.
Furthermore, it also means that the current register window will be
saved by the "ta 0x03" case.
This is probably why certain gdb breakpoints also make the problem go
away.
Essentially, "ta 0x03" is kind of like:
save %sp, -WHATEVER, %sp
flushw
restore
so it will restore one more register window than an actual 'flushw'
in userspace would.
I'm starting to become convinced that if you look at the stack
backtrace at the time of the flushw done by ruby, you'll see that
there are multiple stack frames using the same memory regions.
I'll try to look at this more closely myself, especially since you've
given me excellent tips on how to reproduce this and run it under gdb,
but I'm currently fighting a gcc bug which I want to clear away first.
Thanks!
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: longjmp question
2011-10-07 23:22 longjmp question Jurij Smakov
` (13 preceding siblings ...)
2011-10-14 23:26 ` David Miller
@ 2011-10-16 17:07 ` Jurij Smakov
2011-10-17 0:56 ` David Miller
` (8 subsequent siblings)
23 siblings, 0 replies; 25+ messages in thread
From: Jurij Smakov @ 2011-10-16 17:07 UTC (permalink / raw)
To: sparclinux
On Fri, Oct 14, 2011 at 07:26:50PM -0400, David Miller wrote:
> From: Jurij Smakov <jurij@wooyd.org>
> Date: Sat, 15 Oct 2011 00:06:53 +0100
>
> > Replacing "flushw" with "ta 0x03" makes the problem go away. What is
> > the difference between the two? I would naively think that the effect
> > of both should be saving register windows on the stack, allocating a
> > new stack frame for each of them, so fp would get adjusted in either
> > case. Then I would expect that the correct fix would be to indicate to
> > the compiler that flushw is clobbering fp/sp registers, so it cannot
> > rely on their contents afterwards. The fact that using "ta 0x03" fixes
> > it makes me feel lost again :-).
>
> Taking a trap has a side effect, in that the %g* and %o* registers
> will be saved and restored by the trap entry and exit respectively.
>
> Trap entry also grabs a register window (for the kernel), which
> is restored from on trap exit.
>
> The register window flush is performed between this trap statesave and
> restore.
>
> Furthermore, it also means that the current register window will be
> saved by the "ta 0x03" case.
>
> This is probably why certain gdb breakpoints also make the problem go
> away.
>
> Essentially, "ta 0x03" is kind of like:
>
> save %sp, -WHATEVER, %sp
> flushw
> restore
>
> so it will restore one more register window than an actual 'flushw'
> in userspace would.
>
> I'm starting to become convinced that if you look at the stack
> backtrace at the time of the flushw done by ruby, you'll see that
> there are multiple stack frames using the same memory regions.
>
> I'll try to look at this more closely myself, especially since you've
> given me excellent tips on how to reproduce this and run it under gdb,
> but I'm currently fighting a gcc bug which I want to clear away first.
>
> Thanks!
Thank you!
In the meantime, I've recognized that I can store fp before and after
flushw in %l0 and %l1, and memcpy is not allowed to touch them. After
changing the code to
mov %fp, %l0
flushw
mov %fp, %l1
I've found that value of %fp does not change as a result of flushw
after all, even in the case when it crashes later. So, as far as I can
tell, memcpy is receiving correct arguments. Furthermore, looking at
memcpy implementation (backed by __memcpy_ultra3 in my case), I see
that it's likely (I've not examined all possible paths) that before it
branches to 'out', o1 will contain the current source address and
o3 will contain the distance between source and destination. I checked
these values after our memcpy call, and they are consistent, i.e. o1
points at the end of source region, and o3 is the difference between
the end of source and destination regions. That made me wonder whether
we do copy at least part of the data, and it appears that only the
beginning of the memory regions is not copied correctly. Here's an
example dump of the first 32 bytes in the crashing case after memcpy:
(gdb) x/32xw cont->machine_stack_src
0xffffc96c: 0x00000001 0xffffc9e0 0xffffc9e0 0x00000000
0xffffc97c: 0x00000000 0x00000000 0x00000000 0x00000000
0xffffc98c: 0xf7fb1cb8 0xffffca40 0x00003910 0x00000000
0xffffc99c: 0x00022b88 0x00022b88 0x000001b5 0xffffc9e0
0xffffc9ac: 0xf7f4d7a4 0x00000000 0xf7ffc4d0 0xf7decc32
0xffffc9bc: 0xf7de8888 0xf7de46c8 0xffffc9f8 0x00000000
0xffffc9cc: 0x00000001 0xffffffff 0x001d6620 0x00047508
0xffffc9dc: 0x00047508 0x00000000 0x00000000 0x00000000
(gdb) x/32xw cont->machine_stack
0x1d6938: 0x00000001 0x00000000 0x00000000 0x00000000
0x1d6948: 0xf7fb1cb8 0x000c8170 0x00000001 0x000c76a9
0x1d6958: 0xffffcc94 0x00000001 0x000c76a8 0xffffcc30
0x1d6968: 0xf7ebc03c 0x00000000 0x00000000 0x00000005
0x1d6978: 0x000001db 0x00000000 0xf7ffc4d0 0xf7decc32
0x1d6988: 0xf7de8888 0xf7de46c8 0xffffc9f8 0x00000000
0x1d6998: 0x00000001 0xffffffff 0x001d6620 0x00047508
0x1d69a8: 0x00047508 0x00000000 0x00000000 0x00000000
As you can see, the first 17 words of the memory regions differ, but
after the data appears to be copied correctly (total amount of data
copied in this case is 437 words).
Assuming that the analysis is correct, and memcpy does receive correct
arguments, it might be a bug in __memcpy_ultra3 (which would be very
exciting :-). If you are using an UltraSparc III machine as well, and
could try it on a different architecture, I would be a very interested
in the result.
Best regards,
--
Jurij Smakov jurij@wooyd.org
Key: http://www.wooyd.org/pgpkey/ KeyID: C99E03CC
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: longjmp question
2011-10-07 23:22 longjmp question Jurij Smakov
` (14 preceding siblings ...)
2011-10-16 17:07 ` Jurij Smakov
@ 2011-10-17 0:56 ` David Miller
2011-10-18 20:46 ` Jurij Smakov
` (7 subsequent siblings)
23 siblings, 0 replies; 25+ messages in thread
From: David Miller @ 2011-10-17 0:56 UTC (permalink / raw)
To: sparclinux
From: Jurij Smakov <jurij@wooyd.org>
Date: Sun, 16 Oct 2011 18:07:38 +0100
> Assuming that the analysis is correct, and memcpy does receive correct
> arguments, it might be a bug in __memcpy_ultra3 (which would be very
> exciting :-). If you are using an UltraSparc III machine as well, and
> could try it on a different architecture, I would be a very interested
> in the result.
I reproduced your crash last week on a Niagara3 system, therefore I
don't think it is dependent upon the memcpy implementation.
If you are still convinced it is some memcpy issue :-) I can only
suggest that you check those buffer pointers passed to memcpy, if
(given the size) they overlap at all, that would be a bug. You can't
use memcpy() for overlapping buffers, one must use memmove() instead.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: longjmp question
2011-10-07 23:22 longjmp question Jurij Smakov
` (15 preceding siblings ...)
2011-10-17 0:56 ` David Miller
@ 2011-10-18 20:46 ` Jurij Smakov
2011-10-18 20:53 ` David Miller
` (6 subsequent siblings)
23 siblings, 0 replies; 25+ messages in thread
From: Jurij Smakov @ 2011-10-18 20:46 UTC (permalink / raw)
To: sparclinux
On Sun, Oct 16, 2011 at 08:56:52PM -0400, David Miller wrote:
> From: Jurij Smakov <jurij@wooyd.org>
> Date: Sun, 16 Oct 2011 18:07:38 +0100
>
> > Assuming that the analysis is correct, and memcpy does receive correct
> > arguments, it might be a bug in __memcpy_ultra3 (which would be very
> > exciting :-). If you are using an UltraSparc III machine as well, and
> > could try it on a different architecture, I would be a very interested
> > in the result.
>
> I reproduced your crash last week on a Niagara3 system, therefore I
> don't think it is dependent upon the memcpy implementation.
>
> If you are still convinced it is some memcpy issue :-) I can only
> suggest that you check those buffer pointers passed to memcpy, if
> (given the size) they overlap at all, that would be a bug. You can't
> use memcpy() for overlapping buffers, one must use memmove() instead.
Ok, I'm now convinced that memcpy() is not broken :-). However, it
took a while to explain the behaviour I see. Here's what happens:
1. We flush the windows before performing the memcpy. This should
flush all register windows to memory, except the current one in use
(this is a crucial observation). So, for the current window we have
valid values in registers, but junk in memory, pointed to by sp.
2. The source address happen to differ from the current sp only by 4
bytes:
(gdb) info reg sp
sp 0xffffc970 0xffffc970
(gdb) print cont->machine_stack_src
$3 = (VALUE *) 0xffffc96c
(gdb)
I guess that the expectation is that all register windows (including
the current one!) are correctly represented in memory after flushw.
But for the current register window this is not true, because flushw
is not supposed to flush it (according to Sparc Architecture Manual,
3.2.7).
3. We perform the memcpy(), copying the junk we believe to be valid
register values (sp to sp + 16 words) to cont->machine_stack.
Restoring it later causes a crash.
That explains why 'ta 0x03' works - when it is executed, it actually
does save, which increments register window, then flushw, making sure
that the register window corresponding to cont_capture state is
getting flushed as well, and then restores/returns. As a result, we
end up with correct register content for the current window in memory
before memcpy.
Finally, the "failing" memcpy is also easy to explain now: when we
invoke the memcpy, we do it with the correct arguments, but we really
do have incorrect memory contents in the source buffer, which end up
in the destination. However, as soon as gdb breaks, it flushes *all*
register windows of the interrupted process, including the current
one, altering the memory contents of the source buffer, and making it
appear like memcpy failed to synchronize the source and the
destination.
Best regards,
--
Jurij Smakov jurij@wooyd.org
Key: http://www.wooyd.org/pgpkey/ KeyID: C99E03CC
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: longjmp question
2011-10-07 23:22 longjmp question Jurij Smakov
` (16 preceding siblings ...)
2011-10-18 20:46 ` Jurij Smakov
@ 2011-10-18 20:53 ` David Miller
2011-10-22 9:00 ` Jurij Smakov
` (5 subsequent siblings)
23 siblings, 0 replies; 25+ messages in thread
From: David Miller @ 2011-10-18 20:53 UTC (permalink / raw)
To: sparclinux
From: Jurij Smakov <jurij@wooyd.org>
Date: Tue, 18 Oct 2011 21:46:20 +0100
> Finally, the "failing" memcpy is also easy to explain now: when we
> invoke the memcpy, we do it with the correct arguments, but we really
> do have incorrect memory contents in the source buffer, which end up
> in the destination. However, as soon as gdb breaks, it flushes *all*
> register windows of the interrupted process, including the current
> one, altering the memory contents of the source buffer, and making it
> appear like memcpy failed to synchronize the source and the
> destination.
Now this makes a lot of sense, thanks for working to get to the bottom
of this.
I would suggest fixing this in Ruby by one or two possible methods:
1) Always do "ta 0x03", this is the simplest but most expensive fix.
2) Force the "flushw" into a helper function and force GCC to allocate
a register window by clobbering the return address register, something
like:
void flushw_helper(void)
{
/* Clobber %o7 so that the compiler is forced to allocate
a register window for this function. */
__asm__ __volatile__("flushw" : : : "o7");
}
That results (on 32-bit) in something like:
save %sp, -96, %sp
flushw
return %i7 + 8
nop
which gives us exactly what we need.
Thanks again Jurij.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: longjmp question
2011-10-07 23:22 longjmp question Jurij Smakov
` (17 preceding siblings ...)
2011-10-18 20:53 ` David Miller
@ 2011-10-22 9:00 ` Jurij Smakov
2011-10-22 9:05 ` David Miller
` (4 subsequent siblings)
23 siblings, 0 replies; 25+ messages in thread
From: Jurij Smakov @ 2011-10-22 9:00 UTC (permalink / raw)
To: sparclinux
On Tue, Oct 18, 2011 at 04:53:53PM -0400, David Miller wrote:
> From: Jurij Smakov <jurij@wooyd.org>
> Date: Tue, 18 Oct 2011 21:46:20 +0100
>
> > Finally, the "failing" memcpy is also easy to explain now: when we
> > invoke the memcpy, we do it with the correct arguments, but we really
> > do have incorrect memory contents in the source buffer, which end up
> > in the destination. However, as soon as gdb breaks, it flushes *all*
> > register windows of the interrupted process, including the current
> > one, altering the memory contents of the source buffer, and making it
> > appear like memcpy failed to synchronize the source and the
> > destination.
>
> Now this makes a lot of sense, thanks for working to get to the bottom
> of this.
>
> I would suggest fixing this in Ruby by one or two possible methods:
>
> 1) Always do "ta 0x03", this is the simplest but most expensive fix.
It turns out that freebsd/sparc64 is also suffering from this bug,
and, as far as I can tell, there is no equivalent of 'ta 0x03' there.
So, to fix it in a more generic way we probably should push for the
second option.
> 2) Force the "flushw" into a helper function and force GCC to allocate
> a register window by clobbering the return address register, something
> like:
>
> void flushw_helper(void)
> {
> /* Clobber %o7 so that the compiler is forced to allocate
> a register window for this function. */
> __asm__ __volatile__("flushw" : : : "o7");
> }
>
> That results (on 32-bit) in something like:
>
> save %sp, -96, %sp
> flushw
> return %i7 + 8
> nop
>
> which gives us exactly what we need.
Is something conceptually wrong with just doing this:
asm volatile ("save; flush; restore");
It fixes the problem for freebsd (will test on linux later today), and
does not rely on compiler doing (or not doing) the "right"
optimization.
Best regards,
--
Jurij Smakov jurij@wooyd.org
Key: http://www.wooyd.org/pgpkey/ KeyID: C99E03CC
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: longjmp question
2011-10-07 23:22 longjmp question Jurij Smakov
` (18 preceding siblings ...)
2011-10-22 9:00 ` Jurij Smakov
@ 2011-10-22 9:05 ` David Miller
2011-10-22 9:43 ` Jurij Smakov
` (3 subsequent siblings)
23 siblings, 0 replies; 25+ messages in thread
From: David Miller @ 2011-10-22 9:05 UTC (permalink / raw)
To: sparclinux
From: Jurij Smakov <jurij@wooyd.org>
Date: Sat, 22 Oct 2011 10:00:24 +0100
> Is something conceptually wrong with just doing this:
>
> asm volatile ("save; flush; restore");
>
> It fixes the problem for freebsd (will test on linux later today), and
> does not rely on compiler doing (or not doing) the "right"
> optimization.
I would not do a "naked" save here, please just use the construct
I showed you with the %o7 register clobber.
If you don't allocate space for the new register window, you will
be writing the local registers to the parent's register window.
Maybe you get lucky and this works, but after spending multiple weeks
on this bug why take any chances at all at this point?
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: longjmp question
2011-10-07 23:22 longjmp question Jurij Smakov
` (19 preceding siblings ...)
2011-10-22 9:05 ` David Miller
@ 2011-10-22 9:43 ` Jurij Smakov
2011-10-22 22:54 ` David Miller
` (2 subsequent siblings)
23 siblings, 0 replies; 25+ messages in thread
From: Jurij Smakov @ 2011-10-22 9:43 UTC (permalink / raw)
To: sparclinux
On Sat, Oct 22, 2011 at 05:05:05AM -0400, David Miller wrote:
> From: Jurij Smakov <jurij@wooyd.org>
> Date: Sat, 22 Oct 2011 10:00:24 +0100
>
> > Is something conceptually wrong with just doing this:
> >
> > asm volatile ("save; flush; restore");
> >
> > It fixes the problem for freebsd (will test on linux later today), and
> > does not rely on compiler doing (or not doing) the "right"
> > optimization.
>
> I would not do a "naked" save here, please just use the construct
> I showed you with the %o7 register clobber.
Yes, I missed that save is not allocating any memory, which is
wrong. What I actually had in mind was this:
asm volatile ("save %sp, -96, %sp; flushw; restore");
My concern with the approach you are proposing is that compiler
may (in the future, perhaps) just optimize out the helper function,
making arrangements to save %o7 in one of the unused registers, thus
removing allocation of another register window and breaking it again.
Best regards,
--
Jurij Smakov jurij@wooyd.org
Key: http://www.wooyd.org/pgpkey/ KeyID: C99E03CC
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: longjmp question
2011-10-07 23:22 longjmp question Jurij Smakov
` (20 preceding siblings ...)
2011-10-22 9:43 ` Jurij Smakov
@ 2011-10-22 22:54 ` David Miller
2011-10-23 13:47 ` Jurij Smakov
2011-10-23 20:35 ` David Miller
23 siblings, 0 replies; 25+ messages in thread
From: David Miller @ 2011-10-22 22:54 UTC (permalink / raw)
To: sparclinux
From: Jurij Smakov <jurij@wooyd.org>
Date: Sat, 22 Oct 2011 10:43:39 +0100
> On Sat, Oct 22, 2011 at 05:05:05AM -0400, David Miller wrote:
>> From: Jurij Smakov <jurij@wooyd.org>
>> Date: Sat, 22 Oct 2011 10:00:24 +0100
>>
>> > Is something conceptually wrong with just doing this:
>> >
>> > asm volatile ("save; flush; restore");
>> >
>> > It fixes the problem for freebsd (will test on linux later today), and
>> > does not rely on compiler doing (or not doing) the "right"
>> > optimization.
>>
>> I would not do a "naked" save here, please just use the construct
>> I showed you with the %o7 register clobber.
>
> Yes, I missed that save is not allocating any memory, which is
> wrong. What I actually had in mind was this:
>
> asm volatile ("save %sp, -96, %sp; flushw; restore");
Only works on 32-bit, will crash on 64-bit.
Just let the compiler do it for you, please... the compiler absolutely cannot
optimize away a function that uses a volatile asm statement. And it absolutely
cannot make a function that clobbers %o7 a leaf function.
Please just use the approach I suggested, it covers all cases and is in fact
future proof.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: longjmp question
2011-10-07 23:22 longjmp question Jurij Smakov
` (21 preceding siblings ...)
2011-10-22 22:54 ` David Miller
@ 2011-10-23 13:47 ` Jurij Smakov
2011-10-23 20:35 ` David Miller
23 siblings, 0 replies; 25+ messages in thread
From: Jurij Smakov @ 2011-10-23 13:47 UTC (permalink / raw)
To: sparclinux
On Sat, Oct 22, 2011 at 06:54:05PM -0400, David Miller wrote:
>
> Just let the compiler do it for you, please... the compiler absolutely cannot
> optimize away a function that uses a volatile asm statement. And it absolutely
> cannot make a function that clobbers %o7 a leaf function.
That's not what I see. I tried writing up a patch for Ruby using this,
and here's what I get:
1. Preprocessed source of cont.c contains flush_register_windows
definition:
# 219 "./include/ruby/defines.h"
static void
flush_register_windows(void)
{
asm __volatile__ ("flushw" : : : "o7");
}
2. Preprocessed code of cont_save_machine_stack contains a call to
flush_register_windows.
3. In cont.s both cont_save_machine_stack and flush_register_windows
are completely optimized out when compiled with -O2. Relevant
snippet:
.LL47:
call ruby_xmalloc2, 0
mov 4, %o1
ld [%fp-12], %g3
st %o0, [%g3+28]
.LLBB115:
.LLBB114:
.loc 2 222 0
#APP
! 222 "./include/ruby/defines.h" 1
flushw
! 0 "" 2
#NO_APP
.LLBE114:
.LLBE115:
.loc 1 353 0
ld [%fp-12], %g1
sll %i4, 2, %o2
call memcpy, 0
ld [%g1+32], %o1
In assembled form (objdump -d) it looks like this:
54c: 40 00 00 00 call 54c <cont_capture+0x24c>
550: 92 10 20 04 mov 4, %o1
554: c6 07 bf f4 ld [ %fp + -12 ], %g3
558: d0 20 e0 1c st %o0, [ %g3 + 0x1c ]
55c: 81 58 00 00 flushw
560: c2 07 bf f4 ld [ %fp + -12 ], %g1
564: 95 2f 20 02 sll %i4, 2, %o2
568: 40 00 00 00 call 568 <cont_capture+0x268>
56c: d2 00 60 20 ld [ %g1 + 0x20 ], %o1
I also tried listing the clobbered register as "%o7" instead of "o7",
as the examples I could find prepend the register names with % in the
clobber list. The results were the same.
Best regards,
--
Jurij Smakov jurij@wooyd.org
Key: http://www.wooyd.org/pgpkey/ KeyID: C99E03CC
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: longjmp question
2011-10-07 23:22 longjmp question Jurij Smakov
` (22 preceding siblings ...)
2011-10-23 13:47 ` Jurij Smakov
@ 2011-10-23 20:35 ` David Miller
23 siblings, 0 replies; 25+ messages in thread
From: David Miller @ 2011-10-23 20:35 UTC (permalink / raw)
To: sparclinux
From: Jurij Smakov <jurij@wooyd.org>
Date: Sun, 23 Oct 2011 14:47:27 +0100
> On Sat, Oct 22, 2011 at 06:54:05PM -0400, David Miller wrote:
>>
>> Just let the compiler do it for you, please... the compiler absolutely cannot
>> optimize away a function that uses a volatile asm statement. And it absolutely
>> cannot make a function that clobbers %o7 a leaf function.
>
> That's not what I see. I tried writing up a patch for Ruby using this,
> and here's what I get:
>
> 1. Preprocessed source of cont.c contains flush_register_windows
> definition:
>
> # 219 "./include/ruby/defines.h"
> static void
> flush_register_windows(void)
> {
> asm __volatile__ ("flushw" : : : "o7");
> }
You can't put it in a header file, GCC will inline it. You have to
put it in a function in a seperate C file, and this is what I said to
do the other week in my original email.
You're only other option is to explicitly mark this
__attribute__((__noinline__))
^ permalink raw reply [flat|nested] 25+ messages in thread
end of thread, other threads:[~2011-10-23 20:35 UTC | newest]
Thread overview: 25+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-10-07 23:22 longjmp question Jurij Smakov
2011-10-07 23:42 ` David Miller
2011-10-08 6:43 ` David Miller
2011-10-08 12:55 ` Jurij Smakov
2011-10-08 19:22 ` David Miller
2011-10-12 21:22 ` Jurij Smakov
2011-10-12 22:09 ` David Miller
2011-10-12 22:17 ` David Miller
2011-10-12 23:06 ` David Miller
2011-10-12 23:21 ` Jurij Smakov
2011-10-12 23:42 ` David Miller
2011-10-13 22:06 ` Jurij Smakov
2011-10-13 22:35 ` David Miller
2011-10-14 23:06 ` Jurij Smakov
2011-10-14 23:26 ` David Miller
2011-10-16 17:07 ` Jurij Smakov
2011-10-17 0:56 ` David Miller
2011-10-18 20:46 ` Jurij Smakov
2011-10-18 20:53 ` David Miller
2011-10-22 9:00 ` Jurij Smakov
2011-10-22 9:05 ` David Miller
2011-10-22 9:43 ` Jurij Smakov
2011-10-22 22:54 ` David Miller
2011-10-23 13:47 ` Jurij Smakov
2011-10-23 20:35 ` David Miller
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.