netdev.vger.kernel.org archive mirror
* Re: Kernel crash in 2.6.0-test9-mm3
@ 2003-11-19  2:22 Krishna Kumar
  2003-11-19  2:24 ` David S. Miller
  2003-11-19  2:58 ` Reuben Farrelly
  0 siblings, 2 replies; 12+ messages in thread
From: Krishna Kumar @ 2003-11-19  2:22 UTC (permalink / raw)
  To: Reuben Farrelly; +Cc: Andrew Morton, David S. Miller, netdev





Could this be happening on SMP systems only? If so, the e100intr routine
services the tx queues (e100_tx_srv) without holding a lock. Can't
multiple rx interrupts be scheduled on different CPUs at the same time,
each executing dev_kfree_skb_irq() and decrementing the ref count too
many times? But the softirq handler (net_tx_action) seems to clean up the
skb only once, as the dec_test returns 1 only if the count is zero, so I
don't see where the dst ref is being decremented wrongly in this case.

Can someone explain why the intr handler doesn't need locks to stop other
interrupts on different CPUs from going through the same device's memory
at the same time?

Thanks,

- KK
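
The dec-and-test behaviour described above can be sketched in userspace C. This is only an illustration using C11 atomics, not the kernel's code: `struct buf`, `dec_and_test()` and `buf_put()` are hypothetical names standing in for the skb and its release path. It shows that an extra release does not run the free path a second time, but it does push the count negative.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a refcounted buffer; not the kernel's sk_buff. */
struct buf {
    atomic_int users;   /* reference count */
    int frees;          /* how many times the "free" path ran */
};

/* True only for the caller whose decrement takes the count to zero. */
static bool dec_and_test(atomic_int *v)
{
    /* atomic_fetch_sub returns the value held before the subtraction. */
    return atomic_fetch_sub(v, 1) == 1;
}

/* Drop one reference; "free" the buffer when the last one goes away. */
static void buf_put(struct buf *b)
{
    if (dec_and_test(&b->users))
        b->frees++;
}
```

With two references held, three calls to `buf_put()` run the free path once and leave `users` at -1, which is the kind of underflow the mm debugging patch is reported to warn about.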



Reuben Farrelly <reuben-linux@reub.net>
Sent by: netdev-bounce@oss.sgi.com
11/18/2003 05:22 PM

To:       "David S. Miller" <davem@redhat.com>, Andrew Morton <akpm@osdl.org>
cc:       netdev@oss.sgi.com
Subject:  Re: Kernel crash in 2.6.0-test9-mm3




FWIW I'm compiling with:

[root@tornado log]# gcc -v
Reading specs from /usr/lib/gcc-lib/i386-redhat-linux/3.3.2/specs
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man
--infodir=/usr/share/info --enable-shared --enable-threads=posix
--disable-checking --with-system-zlib --enable-__cxa_atexit
--host=i386-redhat-linux
Thread model: posix
gcc version 3.3.2 20031107 (Red Hat Linux 3.3.2-2)
[root@tornado log]#

Reuben


At 13:49 19/11/2003, David S. Miller wrote:
>On Tue, 18 Nov 2003 11:01:39 -0800
>Andrew Morton <akpm@osdl.org> wrote:
>
> > It's one for the networking guys.
> >
> > The mm kernels have a patch which detects when atomic_dec_and_test
> > takes an atomic_t negative - it is assumed that this is a bug so
> > a warning is generated.
>
>Andrew, I've analyzed this a bit.  There is incredible evidence in
>these dumps that either there is a bug in Linus's atomic_dec_and_test()
>debugging hack or GCC is miscompiling it in certain cases with certain
>versions of the compiler.
>
>Look at this:
>
> > > Nov 18 23:09:00 tornado kernel:  [<c029203c>] skb_release_data+0x14c/0x160
> > > Nov 18 23:09:00 tornado kernel:  [<c0292063>] kfree_skbmem+0x13/0x30
> > > Nov 18 23:09:00 tornado kernel:  [<c0292138>] __kfree_skb+0xb8/0x1b0
> > > Nov 18 23:09:00 tornado kernel:  [<c0218815>] e100intr+0x1e5/0x290
>
>Ok, releasing an SKB data area twice.
>
> > > Nov 18 23:09:00 tornado kernel: BUG: dst underflow 0: c02921ef
>
>Freeing a 'dst' entry one too many times.
>
> > > Nov 18 23:09:00 tornado kernel: Attempt to release alive inet socket dfd4c780
>
>A socket refcount dropping to zero too early, before it's marked dead.
>
>These last two problems are very serious errors, and would have
>printed out debugging messages before the atomic_dec_and_test() patch.
>If these last two messages don't show up without the
>atomic_dec_and_test() debugging patch applied, well there you
>go... :-)
>
>In that debugging patch, I'm wondering something about x86.
>When one goes "sete %reg; sets %reg" does the first 'sete' modify
>the condition codes by chance?  Probably not...
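
For readers following the discussion of the debugging patch, here is a minimal userspace sketch of a dec-and-test that also flags an underflow. It is an assumption-laden illustration in C11, not the actual mm patch: `checked_dec_and_test()` and `underflow_warnings` are hypothetical names. The "result is zero" and "result is negative" checks correspond to what the quoted text describes as `sete` and `sets` on x86; as far as the x86 instruction set reference goes, SETcc instructions only read the flags and do not modify them.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical counter of warnings issued, for illustration only. */
static int underflow_warnings;

/*
 * Decrement *v and report whether it reached zero.
 * Additionally warn if the decrement took the value below zero.
 */
static bool checked_dec_and_test(atomic_int *v)
{
    /* New value = value before the subtraction, minus one. */
    int newval = atomic_fetch_sub(v, 1) - 1;

    if (newval < 0) {           /* the "sign" condition */
        underflow_warnings++;
        fprintf(stderr, "BUG: refcount underflow (now %d)\n", newval);
    }
    return newval == 0;         /* the "zero" condition */
}
```

Starting from a count of 1, the first call returns true without a warning; a second call returns false and records one warning.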

* RE: Kernel crash in 2.6.0-test9-mm3
@ 2003-11-20  2:40 Feldman, Scott
  2003-11-23 20:29 ` Rask Ingemann Lambertsen
  0 siblings, 1 reply; 12+ messages in thread
From: Feldman, Scott @ 2003-11-20  2:40 UTC (permalink / raw)
  To: David S. Miller, Krishna Kumar; +Cc: reuben-linux, akpm, netdev

> However, with things like IOAPIC and such, it might be 
> possible for two cpus to enter e100intr() simultaneously, 
> both read the same status, both see that the interrupt is 
> pending, and both thus process the interrupt and race with each other.
> 
> Scott, what prevents the above from happening?

Whoa, this question is freaking me out just a little bit: my assumption
is that the device's interrupt line has been masked off at the CPU/PIC
before e100intr() is ever called, so 1) there really isn't any need to
disable the device's interrupts from the driver (see eepro100.c), 2) or
even to hold a lock unless we share something critical on the queuing
side (see e1000), and 3) only one e100intr is running.  [public spanking
in order?]

I'm not sure what's behind the rest of the bug report, but if you're
saying e100intr() can be running simultaneously on two different CPUs,
then there is a problem because the test for device interrupt and the
acking of device interrupt are two steps that need to be protected with
a lock.

-scott
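
The point about testing and acking as one protected step can be sketched in userspace C with a pthread mutex. This is a toy model under stated assumptions, not the e100 driver: `struct fake_dev` and `fake_intr()` are hypothetical, and a boolean stands in for the device's interrupt status register.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical device model: one pending bit and a count of serviced events. */
struct fake_dev {
    pthread_mutex_t lock;
    bool irq_pending;   /* stands in for the device status register */
    int serviced;       /* how many times the event was processed */
};

/*
 * Test the status and ack it while holding the lock, so that of several
 * concurrent callers only the one that sees the pending bit services it.
 */
static bool fake_intr(struct fake_dev *d)
{
    bool mine;

    pthread_mutex_lock(&d->lock);
    mine = d->irq_pending;      /* test */
    d->irq_pending = false;     /* ack */
    if (mine)
        d->serviced++;          /* process the event exactly once */
    pthread_mutex_unlock(&d->lock);

    return mine;
}
```

If two threads call `fake_intr()` for the same pending event, only one returns true. Without the lock, both could read the pending bit before either cleared it, and both would service the event.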

