* frequent slab corruption (since a long time)
@ 2006-08-02 2:16 Dave Jones
2006-08-02 2:34 ` Roland Dreier
` (2 more replies)
0 siblings, 3 replies; 14+ messages in thread
From: Dave Jones @ 2006-08-02 2:16 UTC (permalink / raw)
To: Linux Kernel
Every so often, I see a slab corruption bug reported against
the Fedora kernels (going back as far as 2.6.11), and it's
still plagueing us.
It seems to have turned up in a number of different scenarios,
which makes it all the more complicated, but the footprint is
always the same. We write ffffffff00000000 to freed memory.
All the example cases seen so far have been on 32-bit x86.
Anyone have any clues where that value could be coming from?
There's a collection of corruption reports at
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=160878
Dave
--
http://www.codemonkey.org.uk
^ permalink raw reply [flat|nested] 14+ messages in thread* Re: frequent slab corruption (since a long time) 2006-08-02 2:16 frequent slab corruption (since a long time) Dave Jones @ 2006-08-02 2:34 ` Roland Dreier 2006-08-02 3:37 ` Andi Kleen 2006-08-02 5:05 ` David Miller 2 siblings, 0 replies; 14+ messages in thread From: Roland Dreier @ 2006-08-02 2:34 UTC (permalink / raw) To: Dave Jones; +Cc: Linux Kernel > There's a collection of corruption reports at > https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=160878 I notice that the first few reports (kernels <= 2.6.14) have len=4096, while the later reports (kernels >= 2.6.16) have len=2048. So assuming it's the same bug, the use after free has moved from a 4096-byte slab to a 2048-byte slab. Were there ny data structures we shrank between 2.6.14 and 2.6.16? ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: frequent slab corruption (since a long time) 2006-08-02 2:16 frequent slab corruption (since a long time) Dave Jones 2006-08-02 2:34 ` Roland Dreier @ 2006-08-02 3:37 ` Andi Kleen 2006-08-02 4:22 ` Dave Jones 2006-08-02 5:05 ` David Miller 2 siblings, 1 reply; 14+ messages in thread From: Andi Kleen @ 2006-08-02 3:37 UTC (permalink / raw) To: Dave Jones; +Cc: linux-kernel Dave Jones <davej@redhat.com> writes: > Every so often, I see a slab corruption bug reported against > the Fedora kernels (going back as far as 2.6.11), and it's > still plagueing us. > > It seems to have turned up in a number of different scenarios, > which makes it all the more complicated, but the footprint is > always the same. We write ffffffff00000000 to freed memory. DEBUG_PAGEALLOC + a small slab patch to force the 2k slab to be only a single object per page (so that a kfree() immediately triggers an unmap) would catch it I guess. -Andi ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: frequent slab corruption (since a long time) 2006-08-02 3:37 ` Andi Kleen @ 2006-08-02 4:22 ` Dave Jones 2006-08-02 4:35 ` Andi Kleen 0 siblings, 1 reply; 14+ messages in thread From: Dave Jones @ 2006-08-02 4:22 UTC (permalink / raw) To: Andi Kleen; +Cc: linux-kernel On Wed, Aug 02, 2006 at 05:37:28AM +0200, Andi Kleen wrote: > Dave Jones <davej@redhat.com> writes: > > > Every so often, I see a slab corruption bug reported against > > the Fedora kernels (going back as far as 2.6.11), and it's > > still plagueing us. > > > > It seems to have turned up in a number of different scenarios, > > which makes it all the more complicated, but the footprint is > > always the same. We write ffffffff00000000 to freed memory. > > DEBUG_PAGEALLOC + a small slab patch to force the 2k slab to be > only a single object per page (so that a kfree() immediately > triggers an unmap) would catch it I guess. Problem with that approach is that DEBUG_PAGEALLOC makes things so damned slow that it's pretty much unusable, and this bug doesn't seem to want to repeat itself to order, so I doubt many people would put up with the slowdown long enough to chase it down. Dave -- http://www.codemonkey.org.uk ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: frequent slab corruption (since a long time) 2006-08-02 4:22 ` Dave Jones @ 2006-08-02 4:35 ` Andi Kleen 2006-08-02 4:46 ` Dave Jones 0 siblings, 1 reply; 14+ messages in thread From: Andi Kleen @ 2006-08-02 4:35 UTC (permalink / raw) To: Dave Jones; +Cc: linux-kernel On Wednesday 02 August 2006 06:22, Dave Jones wrote: > Problem with that approach is that DEBUG_PAGEALLOC makes things > so damned slow that it's pretty much unusable, and this bug > doesn't seem to want to repeat itself to order, so I doubt > many people would put up with the slowdown long enough to chase it down. Really? It shouldn't be that much slower in theory. Do you have numbers? If it's a big problem it could probably be made faster by batching the TLB flushes more. -Andi ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: frequent slab corruption (since a long time) 2006-08-02 4:35 ` Andi Kleen @ 2006-08-02 4:46 ` Dave Jones 0 siblings, 0 replies; 14+ messages in thread From: Dave Jones @ 2006-08-02 4:46 UTC (permalink / raw) To: Andi Kleen; +Cc: linux-kernel On Wed, Aug 02, 2006 at 06:35:00AM +0200, Andi Kleen wrote: > On Wednesday 02 August 2006 06:22, Dave Jones wrote: > > > Problem with that approach is that DEBUG_PAGEALLOC makes things > > so damned slow that it's pretty much unusable, and this bug > > doesn't seem to want to repeat itself to order, so I doubt > > many people would put up with the slowdown long enough to chase it down. > > Really? It shouldn't be that much slower in theory. Do you > have numbers? You need slower boxes :-) Every time I enable it to try and diagnose a bug in the Fedora kernel I get a flood of "hey what gives, everything got slow" emails. That speaks louder than any numbers to me. It could be less of an issue on modern CPUs than it used to be, but it has been painful enough in the past that I've really only enabled it when I've been desperately trying to chase something down. > If it's a big problem it could probably be made faster by batching > the TLB flushes more. Maybe, though that gives me the creeps a little for some reason. Dave -- http://www.codemonkey.org.uk ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: frequent slab corruption (since a long time) 2006-08-02 2:16 frequent slab corruption (since a long time) Dave Jones 2006-08-02 2:34 ` Roland Dreier 2006-08-02 3:37 ` Andi Kleen @ 2006-08-02 5:05 ` David Miller 2006-08-02 5:31 ` David Miller 2 siblings, 1 reply; 14+ messages in thread From: David Miller @ 2006-08-02 5:05 UTC (permalink / raw) To: davej; +Cc: linux-kernel From: Dave Jones <davej@redhat.com> Date: Tue, 1 Aug 2006 22:16:17 -0400 > Anyone have any clues where that value could be coming from? Some of the dumps in there looks like ethernet headers. For example, in comment #7: 000: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 010: 5a 5a 00 11 85 6a 0f ef 00 e0 52 cf 6c 00 08 00 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ That looks like an ethernet header for an IPv4 packet to ethernet destination MAC 00:11:85:6a:0f:ef from ethernet source MAC 00:e0:52:cf:6c:00 But this chunk is OK since we are looking at a neighbouring INUSE object. The reocurring corruption: 0b0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b ff ff ff ff 0c0: 00 00 00 00 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b is less identifyable. Although tg3_alloc_rx_skb() shows up consistently in the neighbouring objects, the actually corrupted piece is marked by release_mem() which is the TTY layer. The corruption is always a 32-bit 0xffffffff followed by a 32-bit 0x00000000, 12 bytes into the object. If it's sitting next to RX ethernet packets, it's probably something in the vacinity of 1500+ bytes in size as that's how large the RX skb data areas will be that the tg3 driver allocates unless it is in Jumbo MTU mode. By default, do_tty_write() will use a chunk size of 2048 for the write buffer, tty->write_buf, and this is freed up as part of release_mem() processing. Another possibility, is the struct tty_struct itself since that is a sizable structure too. And this theory is supported by alloc_tty_struct() being in some of the triggering backtraces. Perhaps a TTY refcounting problem or race condition of some sort. What is 12-bytes into the tty_struct on x86? The struct tty_ldisc, "ldisc" is. Oddly enough, this doesn't match up, since we'd expect TTY_LDISC_MAGIC (0x5403) and instead we see 0xffffffff there. Also, after the magic, we'd expect the address of the "n_tty" string in tty_ldisc_N_TTY to show up in the next word, instead we find NULL (0x00000000) there. Something is clobbering the ldisc member of this free'd tty_struct is seems. Maybe there is some problem with ldisc refcounting. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: frequent slab corruption (since a long time) 2006-08-02 5:05 ` David Miller @ 2006-08-02 5:31 ` David Miller 2006-08-02 22:23 ` Dave Jones 0 siblings, 1 reply; 14+ messages in thread From: David Miller @ 2006-08-02 5:31 UTC (permalink / raw) To: davej; +Cc: linux-kernel From: David Miller <davem@davemloft.net> Date: Tue, 01 Aug 2006 22:05:38 -0700 (PDT) > The corruption is always a 32-bit 0xffffffff followed by > a 32-bit 0x00000000, 12 bytes into the object. This analysis is wrong, it's "0xb0 + 12" bytes into the object which is 188 bytes. For x86, this lands us at the "count" member of the tty_struct, and it shows that the tty count has decremented to -1 (0xffffffff) which is a serious bug. Note also that the ws_row and ws_col fields of the tty->winsize are next and has been zero'd out (0x00000000) by the corruption. There aren't too many places that write to tty->winsize, the most notable is tiocswinsz(), called from tty_ioctl. For a PTY with sub-type PTY_TYPE_MASTE, which is very likely the case we have here, it assigns the user provided winsize to both tty->winsize and tty->link->winsz (via the real_tty) argument to tiocswinsz(). Actually, there is one other notable spot that writes to tty->winsize, drivers/char/vt.c:vc_resize(), which copies it to vc->vc_tty->winsize. Perhaps this is a clue... but wait there is a better clue. con_open() sets values there, and in particular it only assigns the ws_row and ws_col members, which matches up with the fact that we see only the first two members of tty->winsize spammed to zero. If the guilty code path involved vc_resize() or tiocswinsz() we'd see 8 bytes, not 4 bytes, of the tty->winsize set to something other than the SLAB free poison chars. So it seems, from this perspective, that con_open()'s assignments which are causing the corruption. This is backed up by the trace found in bugzilla #200928 being early during bootup. But even this doesn't add up because con_open() has to be seeing zero's there at the time it runs, not the poison values, since it's assignments are guarded like this: if (!tty->winsize.ws_row && !tty->winsize.ws_col) { tty->winsize.ws_row = vc_cons[currcons].d->vc_rows; tty->winsize.ws_col = vc_cons[currcons].d->vc_cols; } Hmmm... ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: frequent slab corruption (since a long time) 2006-08-02 5:31 ` David Miller @ 2006-08-02 22:23 ` Dave Jones 2006-08-02 22:49 ` David Miller 2006-08-02 23:14 ` Alan Cox 0 siblings, 2 replies; 14+ messages in thread From: Dave Jones @ 2006-08-02 22:23 UTC (permalink / raw) To: David Miller; +Cc: linux-kernel, jbaron On Tue, Aug 01, 2006 at 10:31:10PM -0700, David Miller wrote: > From: David Miller <davem@davemloft.net> > Date: Tue, 01 Aug 2006 22:05:38 -0700 (PDT) > > > The corruption is always a 32-bit 0xffffffff followed by > > a 32-bit 0x00000000, 12 bytes into the object. > > This analysis is wrong, it's "0xb0 + 12" bytes into the object > which is 188 bytes. For x86, this lands us at the "count" > member of the tty_struct, and it shows that the tty count > has decremented to -1 (0xffffffff) which is a serious bug. <stabbing in the dark here> None of the code manipulating tty->count seems to be under the tty_mutex. Should it be ? Or is this protected through some other means? Jason, ISTR you've done some digging around this area wrt races, any insight ? Dave -- http://www.codemonkey.org.uk ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: frequent slab corruption (since a long time) 2006-08-02 22:23 ` Dave Jones @ 2006-08-02 22:49 ` David Miller 2006-08-03 17:40 ` Alan Cox 2006-08-02 23:14 ` Alan Cox 1 sibling, 1 reply; 14+ messages in thread From: David Miller @ 2006-08-02 22:49 UTC (permalink / raw) To: davej; +Cc: linux-kernel, jbaron From: Dave Jones <davej@redhat.com> Date: Wed, 2 Aug 2006 18:23:21 -0400 > None of the code manipulating tty->count seems to be under > the tty_mutex. Should it be ? > Or is this protected through some other means? It is in the primary code paths at least, all callers of init_dev() (which increments tty->count) grab the mutex and also release_dev() grabs the mutex around tty->count manipulations. I'm surprised that when this triggers we don't get one of these two messages: if (pty_master) { if (--o_tty->count < 0) { printk(KERN_WARNING "release_dev: bad pty slave count " "(%d) for %s\n", o_tty->count, tty_name(o_tty, buf)); o_tty->count = 0; } } if (--tty->count < 0) { printk(KERN_WARNING "release_dev: bad tty->count (%d) for %s\n", tty->count, tty_name(tty, buf)); tty->count = 0; } However, there seems to be some kind of dependency of TTY opennings holding the BKL, as least as far as this comment on con_close() is concerned: /* * tty_mutex is released, but we still hold BKL, so there is * still exclusion against init_dev() */ But it is not clear to me that tty_open() and ptmx_open() always run with the BKL held. chrdev_open() wraps the ->open call with the BKL held, but then it plugs in the device's fops which should allow a direct filp->fops->open() call from the VFS layer without the BKL grabbing right? chrdev_open() should catch /dev/foo char device opens, but what about the sysfs instances? They might bypass this path too somehow, thus another case where the BKL won't be held on open(). Hmmm... ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: frequent slab corruption (since a long time) 2006-08-02 22:49 ` David Miller @ 2006-08-03 17:40 ` Alan Cox 2006-08-03 17:56 ` Dave Jones 0 siblings, 1 reply; 14+ messages in thread From: Alan Cox @ 2006-08-03 17:40 UTC (permalink / raw) To: David Miller; +Cc: davej, linux-kernel, jbaron Ar Mer, 2006-08-02 am 15:49 -0700, ysgrifennodd David Miller: > > None of the code manipulating tty->count seems to be under > > the tty_mutex. Should it be ? > > Or is this protected through some other means? > > It is in the primary code paths at least, all callers of init_dev() > (which increments tty->count) grab the mutex and also release_dev() > grabs the mutex around tty->count manipulations. I've been auditing tty code and its joyously bad but only in harmless places so far except for one. init_dev (and caller) relies on tty_mutex to ensure that the driver->ttys list is protected from things going away. release_mem() removes stuff from the said list and frees memory. It is not always called under tty_mutex and that appears very dubious to me at the moment although tty->closing and the BKL *might* be sufficient. Alan ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: frequent slab corruption (since a long time) 2006-08-03 17:40 ` Alan Cox @ 2006-08-03 17:56 ` Dave Jones 2006-08-03 20:48 ` Alan Cox 0 siblings, 1 reply; 14+ messages in thread From: Dave Jones @ 2006-08-03 17:56 UTC (permalink / raw) To: Alan Cox; +Cc: David Miller, linux-kernel, jbaron On Thu, Aug 03, 2006 at 06:40:42PM +0100, Alan Cox wrote: > Ar Mer, 2006-08-02 am 15:49 -0700, ysgrifennodd David Miller: > > > None of the code manipulating tty->count seems to be under > > > the tty_mutex. Should it be ? > > > Or is this protected through some other means? > > > > It is in the primary code paths at least, all callers of init_dev() > > (which increments tty->count) grab the mutex and also release_dev() > > grabs the mutex around tty->count manipulations. > > I've been auditing tty code and its joyously bad but only in harmless > places so far except for one. > > init_dev (and caller) relies on tty_mutex to ensure that the > driver->ttys list is protected from things going away. > > release_mem() removes stuff from the said list and frees memory. It is > not always called under tty_mutex and that appears very dubious to me at > the moment although tty->closing and the BKL *might* be sufficient. Against my better judgment I was poring over that code until the wee hours last night, and one thing crossed my mind re: the assumptions made about the BKL in that subsystem. Now that the BKL is preemtible, do any of those assumptions break ? Dave -- http://www.codemonkey.org.uk ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: frequent slab corruption (since a long time) 2006-08-03 17:56 ` Dave Jones @ 2006-08-03 20:48 ` Alan Cox 0 siblings, 0 replies; 14+ messages in thread From: Alan Cox @ 2006-08-03 20:48 UTC (permalink / raw) To: Dave Jones; +Cc: David Miller, linux-kernel, jbaron Ar Iau, 2006-08-03 am 13:56 -0400, ysgrifennodd Dave Jones: > Against my better judgment I was poring over that code until the wee > hours last night, and one thing crossed my mind re: the assumptions made > about the BKL in that subsystem. Now that the BKL is preemtible, do > any of those assumptions break ? >From the walking of the code so far I think quite a few of them were already broken. I'm still annotating and I hope by the end of this evening I'll have a patch to post that documents the current state of affairs better, including some gems 8) ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: frequent slab corruption (since a long time) 2006-08-02 22:23 ` Dave Jones 2006-08-02 22:49 ` David Miller @ 2006-08-02 23:14 ` Alan Cox 1 sibling, 0 replies; 14+ messages in thread From: Alan Cox @ 2006-08-02 23:14 UTC (permalink / raw) To: Dave Jones; +Cc: David Miller, linux-kernel, jbaron Ar Mer, 2006-08-02 am 18:23 -0400, ysgrifennodd Dave Jones: > None of the code manipulating tty->count seems to be under > the tty_mutex. Should it be ? > Or is this protected through some other means? Its old lock_kernel() code so its a prime suspect for most offences. Given the age of the reports it appears that the tty buffering changes are too new. The ldisc locking changes are a candidate but shouldn't be doing anything that breaks the kernel lock stuff. Will look tomorrow see if anything strikes me about that lot. The tty_mutex primarily protected current->signal.tty (except if you are using SELinux which has some extremely dubious looking code for tty handling). Someone ought to review flush_unauthorized_files() although it doesn't fit this problem. Alan ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2006-08-03 20:29 UTC | newest] Thread overview: 14+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2006-08-02 2:16 frequent slab corruption (since a long time) Dave Jones 2006-08-02 2:34 ` Roland Dreier 2006-08-02 3:37 ` Andi Kleen 2006-08-02 4:22 ` Dave Jones 2006-08-02 4:35 ` Andi Kleen 2006-08-02 4:46 ` Dave Jones 2006-08-02 5:05 ` David Miller 2006-08-02 5:31 ` David Miller 2006-08-02 22:23 ` Dave Jones 2006-08-02 22:49 ` David Miller 2006-08-03 17:40 ` Alan Cox 2006-08-03 17:56 ` Dave Jones 2006-08-03 20:48 ` Alan Cox 2006-08-02 23:14 ` Alan Cox
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox