kernel-testers.vger.kernel.org archive mirror
* Reproducible rRootage segfault with 2.6.25 and above (regression?)
From: Sitsofe Wheeler @ 2008-08-24 22:29 UTC (permalink / raw)
  To: linux-kernel; +Cc: kernel-testers

I've found that running certain levels of the game rRootage on kernels
later than 2.6.24 causes a segfault. The segfault does not occur on
2.6.24 and below, though...

Frustratingly, I have been unable to bisect my way to the offending
kernel change: while bisecting between 2.6.24 and 2.6.25 I hit a USB
timeout issue that made booting impossible. Furthermore, a number of
conditions seem to need to be met before the problem manifests itself:

1. The compiler optimisation used to build rRootage must be -O1 or
higher (-Os also triggers the problem).
2. The running kernel (going by release) must be 2.6.25 or later.
3. The gcc used to compile the game must (seemingly) not be 3.3.
Building with 4.2 shows the problem; other versions may show it too.
4. Not every level in every mode shows the problem (it seems linked to
certain patterns). I have found that level 9A in the green "GigaWing"
mode is usually quick to trigger the issue, but you may have to kill the
first enemy once to see it (if you get past that point without a crash,
the problem is probably not present).

I have seen the issue on a range of 2.6.25+ kernels (both hand-compiled
kernels on openSUSE and the prebuilt 2.6.26-5 kernel shipped with
Ubuntu 8.10).

The segfault in question is due to an array being accessed beyond its
bounds (the array sctbl on this line:
http://www.koders.com/cpp/fid93F842B399CA68D754CADEC374AE934EED72C07D.aspx#L246 ).
Running the game under valgrind on a 2.6.24 kernel did not generate any
warnings about that array (and using MALLOC_CHECK_=2 didn't generate any
warnings either). The problem has been reproduced on two different
machines (a Thinkpad T60 and an eeePC).

Finally, this also afflicts a prebuilt binary from 2004 (which probably
wasn't built using gcc 4.x):
http://sourceforge.net/project/showfiles.php?group_id=112441 .

The issue is fiddly but reproducible. All help in pinpointing the 
problem source is appreciated.

-- 
Sitsofe | http://sucs.org/~sits/
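The overrun described above (an index into sctbl running past the end of the array) is the kind of bug that can be surfaced by checking the index at the point of use. A minimal C++ sketch, assuming a hypothetical table named kTable rather than rRootage's actual sctbl, whose size and contents are not reproduced here: an assert() catches the bad index in debug builds, and std::array::at() throws std::out_of_range instead of silently reading neighbouring memory.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <stdexcept>  // std::out_of_range, thrown by std::array::at

// Hypothetical stand-in for a lookup table such as sctbl; the real
// size and contents in rRootage are not reproduced here.
constexpr std::size_t kTableSize = 8;
constexpr std::array<int, kTableSize> kTable = {0, 1, 2, 3, 4, 5, 6, 7};

// Debug-build check: aborts with a message if idx is out of range.
// With NDEBUG defined the assert compiles away and the access is unchecked.
int lookup_asserted(std::size_t idx) {
    assert(idx < kTable.size() && "table index out of range");
    return kTable[idx];
}

// Always-on check: std::array::at throws std::out_of_range on a bad
// index instead of reading memory beyond the end of the array.
int lookup_checked(std::size_t idx) {
    return kTable.at(idx);
}
```

The assert variant costs nothing in release builds but also protects nothing there; at() stays active in every build, at the price of a comparison on each access.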


* Re: Reproducible rRootage segfault with 2.6.25 and above (regression?)
From: Alan Cox @ 2008-08-25 12:16 UTC (permalink / raw)
  To: Sitsofe Wheeler
  Cc: linux-kernel, kernel-testers

> The issue is fiddly but reproducible. All help in pinpointing the 
> problem source is appreciated.

For the kernel bisect, if you get stuck at a point where it fails,
remember that point, then lie (say either yes or no to it working) and
carry on. If need be you can go back the other way.

Another completely off-the-wall guess would be that your client code is
causing gcc to generate something that uses data which has ended up
below the stack pointer, and the timings have changed. That could happen
either through a gcc bug or by passing around the address of an object
that has gone out of scope. At that point a signal will rewrite the data
in fun ways, producing results like you describe.

Alan
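The second cause suggested above, passing around the address of an object that has gone out of scope, can be illustrated with a short C++ sketch. The Vec2 type and make_offset() function are hypothetical and not taken from rRootage. Once a function returns, its locals sit in stack memory below the stack pointer; anything pushed onto the stack later, such as another call's frame or a signal handler's frame, may overwrite them, so what the caller reads through a dangling pointer depends on timing. Returning the object by value avoids the problem.

```cpp
// Hypothetical type for illustration only.
struct Vec2 {
    float x;
    float y;
};

// BROKEN pattern, kept in a comment so it is never compiled or called:
//
//   const Vec2* make_offset_broken(float dx, float dy) {
//       Vec2 v{dx, dy};
//       return &v;  // v is destroyed on return; the pointer now refers
//   }               // to stack memory below the stack pointer that a
//                   // later call or a signal handler may overwrite.

// Safe version: return by value so the caller gets its own copy.
Vec2 make_offset(float dx, float dy) {
    Vec2 v{dx, dy};
    return v;
}
```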


* Re: Reproducible rRootage segfault with 2.6.25 and above (solved)
From: Sitsofe Wheeler @ 2008-08-25 20:30 UTC (permalink / raw)
  To: Alan Cox
  Cc: linux-kernel, kernel-testers



Alan Cox wrote:
> For the kernel bisect, if you get stuck at a point where it fails,
> remember that point, then lie (say either yes or no to it working) and
> carry on. If need be you can go back the other way.

I tried this quite a few times before posting (you can always use bisect
replay and edit out the lie, and I also used gitk to pick commits), but it
seems that huge swathes of what I was interested in were inside this USB
issue. Eventually I broke down and used a loan laptop that didn't need
to boot from USB. I narrowed the issue down to 10 or so patches (the
range from 8a423ff0c4a0472607bbed6790fdaeec54af2ebb to
0249c9c1e7505c2b020bcc6deaf1e0415de9943e, which covers patches that
randomize brk and change the vDSO), but after further bisecting landed,
incorrectly, on a single patch, it now looks like the segfault was
totally legitimate...

> Another completely off-the-wall guess would be that your client code is
> causing gcc to generate something that uses data which has ended up
> below the stack pointer, and the timings have changed. That could happen
> either through a gcc bug or by passing around the address of an object
> that has gone out of scope. At that point a signal will rewrite the data
> in fun ways, producing results like you describe.

After reading this I went back and stuffed a bunch of asserts into the
rRootage code to see what was going on, and found what looks like a bug
in rRootage. It turns out valgrind can't do bounds checking on global or
stack arrays; this is what I get for not reading the FAQ:
http://valgrind.org/docs/manual/faq.html#faq.overruns . A workaround
seems to be to cap the value used to index the array:
https://bugs.launchpad.net/ubuntu/+source/rrootage/+bug/261189/comments/4 .
I also tried mudflap, but it brought up so many spurious warnings
(supposedly it doesn't currently handle C++ well) that it wasn't
helpful.


-- 
Sitsofe | http://sucs.org/~sits/
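The workaround linked above caps the value used to index the array. A minimal C++17 sketch of that idea, assuming a hypothetical table kSineTable in place of sctbl (the actual Launchpad patch is not reproduced here): the computed index is clamped into the valid range with std::clamp before the lookup, so an out-of-range value reads the nearest valid entry instead of memory past the end of the array.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>

// Hypothetical table standing in for sctbl; the real size and
// contents in rRootage are not reproduced here.
constexpr std::array<int, 8> kSineTable = {0, 10, 20, 30, 40, 50, 60, 70};

// Clamp the computed index into [0, size - 1] before the lookup.
// Requires C++17 for std::clamp.
int lookup_capped(long idx) {
    const long last = static_cast<long>(kSineTable.size()) - 1;
    const long capped = std::clamp(idx, 0L, last);
    return kSineTable[static_cast<std::size_t>(capped)];
}
```

For example, lookup_capped(100) returns the last entry (70) and lookup_capped(-5) returns the first (0). Capping keeps the lookup in bounds, but it masks rather than explains whatever calculation produced the out-of-range index.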


