netdev.vger.kernel.org archive mirror
* kernel panic removing devices from a teql queuing discipline
@ 2007-10-29 18:00 Chuck Ebbert
  2007-10-30  8:33 ` David Miller
  0 siblings, 1 reply; 5+ messages in thread
From: Chuck Ebbert @ 2007-10-29 18:00 UTC (permalink / raw)
  To: Netdev

https://bugzilla.redhat.com/show_bug.cgi?id=219488

Still happening in 2.6.22.9:

BUG: unable to handle kernel paging request at virtual address 66696674
 printing eip:
d098d4de
*pde = 00000000
Oops: 0000 [#1]
SMP 
last sysfs file: /class/net/lo/ifindex
Modules linked in: sch_teql netconsole autofs4 hidp rfcomm l2cap bluetooth sunrpc ipv6 dm_multipath video sbs i2c_ec button battery asus_acpi ac parport_pc lp parport floppy i2c_piix4 pcspkr i2c_core pcnet32 mii serio_raw ide_cd cdrom dm_snapshot dm_zero dm_mirror dm_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd
CPU:    0
EIP:    0060:[<d098d4de>]    Not tainted VLI
EFLAGS: 00010202   (2.6.18-1.2849.fc6 #1) 
EIP is at teql_master_xmit+0xdc/0x3aa [sch_teql]
eax: c06a82c0   ebx: cde25c80   ecx: 00000000   edx: c06ad680
esi: c12f8e00   edi: cfc73800   ebp: 66696670   esp: ca6c8bb8
ds: 007b   es: 007b   ss: 0068
Process ping (pid: 2275, ti=ca6c8000 task=cc7bd400 task.ti=ca6c8000)
Stack: cde25174 00000000 000004cc ca6f0800 c12f8e00 000004cc cc7f5280 ca6f0c00 
       cc7f5280 cc7f5280 00000000 00000000 00000000 c06a82c0 00000000 ca6f0800 
       c12f8e00 c12f8e00 c0823e08 c05b9606 ca6c8c20 00000000 c12f8e00 ca6f0800 
Call Trace:
 [<c05b9606>] dev_hard_start_xmit+0x1b9/0x218
 [<c05c72e1>] __qdisc_run+0xde/0x19b
 [<c05baeea>] dev_queue_xmit+0x147/0x265
 [<c05d8a0c>] ip_output+0x1df/0x20b
 [<c05d63bd>] ip_push_pending_frames+0x301/0x3c3
 [<c05ef0a6>] raw_sendmsg+0x62e/0x6f0
 [<c05f5913>] inet_sendmsg+0x3b/0x45
 [<c05af6a6>] sock_sendmsg+0xd0/0xeb
 [<c05afec9>] sys_sendmsg+0x192/0x1f7
 [<c05b1427>] sys_socketcall+0x240/0x261
 [<c0404013>] syscall_call+0x7/0xb

The panic is in __teql_resolve (which has been inlined into teql_master_xmit) in
net/sched/sch_teql.c at this line:

	if (n && n->tbl == mn->tbl &&

Specifically the dereference of n->tbl is faulting as n is not valid.

And the address looks like part of an ASCII string...  "fift"

* Re: kernel panic removing devices from a teql queuing discipline
  2007-10-29 18:00 kernel panic removing devices from a teql queuing discipline Chuck Ebbert
@ 2007-10-30  8:33 ` David Miller
  2007-11-05 20:08   ` Evgeniy Polyakov
  0 siblings, 1 reply; 5+ messages in thread
From: David Miller @ 2007-10-30  8:33 UTC (permalink / raw)
  To: cebbert; +Cc: netdev

From: Chuck Ebbert <cebbert@redhat.com>
Date: Mon, 29 Oct 2007 14:00:01 -0400

> The panic is in __teql_resolve (which has been inlined into teql_master_xmit) in
> net/sched/sch_teql.c at this line:
> 
> 	if (n && n->tbl == mn->tbl &&
> 
> Specifically the dereference of n->tbl is faulting as n is not valid.
> 
> And the address looks like part of an ASCII string...  "fift"

I studied sch_teql.c a bit, and I suspect that the slave list
management in teql_destroy() and teql_qdisc_init() might be
buggy.

If someone can take a closer look at this, I'd appreciate it.

* Re: kernel panic removing devices from a teql queuing discipline
  2007-10-30  8:33 ` David Miller
@ 2007-11-05 20:08   ` Evgeniy Polyakov
  2007-11-06 10:48     ` Evgeniy Polyakov
  0 siblings, 1 reply; 5+ messages in thread
From: Evgeniy Polyakov @ 2007-11-05 20:08 UTC (permalink / raw)
  To: David Miller; +Cc: cebbert, netdev

On Tue, Oct 30, 2007 at 01:33:41AM -0700, David Miller (davem@davemloft.net) wrote:
> > The panic is in __teql_resolve (which has been inlined into teql_master_xmit) in
> > net/sched/sch_teql.c at this line:
> > 
> > 	if (n && n->tbl == mn->tbl &&
> > 
> > Specifically the dereference of n->tbl is faulting as n is not valid.

n is not the invalid pointer (it is NULL); mn is garbage.

> > And the address looks like part of an ASCII string...  "fift"
> 
> I studied sch_teql.c a bit, and I suspect that the slave list
> management in teql_destroy() and teql_qdisc_init() might be
> buggy.

teql_reset() is called from dev_deactivate(), where dev->qdisc has
already been set to noop_qdisc, but the subsequent teql_master_xmit()
does not know about it: it dereferences the private data as if it
belonged to a teql qdisc and thus oopses. I will fix it tomorrow if
you don't get to it first :)

-- 
	Evgeniy Polyakov

* Re: kernel panic removing devices from a teql queuing discipline
  2007-11-05 20:08   ` Evgeniy Polyakov
@ 2007-11-06 10:48     ` Evgeniy Polyakov
  2007-11-06 11:08       ` David Miller
  0 siblings, 1 reply; 5+ messages in thread
From: Evgeniy Polyakov @ 2007-11-06 10:48 UTC (permalink / raw)
  To: David Miller; +Cc: cebbert, netdev

On Mon, Nov 05, 2007 at 11:08:00PM +0300, Evgeniy Polyakov (johnpol@2ka.mipt.ru) wrote:
> On Tue, Oct 30, 2007 at 01:33:41AM -0700, David Miller (davem@davemloft.net) wrote:
> > > The panic is in __teql_resolve (which has been inlined into teql_master_xmit) in
> > > net/sched/sch_teql.c at this line:
> > > 
> > > 	if (n && n->tbl == mn->tbl &&
> > > 
> > > Specifically the dereference of n->tbl is faulting as n is not valid.
> 
> n is not the invalid pointer (it is NULL); mn is garbage.

My fault; of course you are right. n is invalid because it is read
from the qdisc's private data, and that qdisc has been changed. It was
too late in Moscow for drawing conclusions...

> > > And the address looks like part of an ASCII string...  "fift"
> > 
> > I studied sch_teql.c a bit, and I suspect that the slave list
> > management in teql_destroy() and teql_qdisc_init() might be
> > buggy.
> 
> teql_reset() is called from dev_deactivate(), where dev->qdisc has
> already been set to noop_qdisc, but the subsequent teql_master_xmit()
> does not know about it: it dereferences the private data as if it
> belonged to a teql qdisc and thus oopses. I will fix it tomorrow if
> you don't get to it first :)

Looks like I got to it first.
Tested, works, fixed.

Signed-off-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru>

diff --git a/net/sched/sch_teql.c b/net/sched/sch_teql.c
index f05ad9a..e0a44b9 100644
--- a/net/sched/sch_teql.c
+++ b/net/sched/sch_teql.c
@@ -263,6 +276,9 @@ __teql_resolve(struct sk_buff *skb, struct sk_buff *skb_res, struct net_device *
 static __inline__ int
 teql_resolve(struct sk_buff *skb, struct sk_buff *skb_res, struct net_device *dev)
 {
+	if (dev->qdisc == &noop_qdisc)
+		return -ENODEV;
+
 	if (dev->hard_header == NULL ||
 	    skb->dst == NULL ||
 	    skb->dst->neighbour == NULL)

-- 
	Evgeniy Polyakov

* Re: kernel panic removing devices from a teql queuing discipline
  2007-11-06 10:48     ` Evgeniy Polyakov
@ 2007-11-06 11:08       ` David Miller
  0 siblings, 0 replies; 5+ messages in thread
From: David Miller @ 2007-11-06 11:08 UTC (permalink / raw)
  To: johnpol; +Cc: cebbert, netdev

From: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Date: Tue, 6 Nov 2007 13:48:55 +0300

> Tested, works, fixed.
> 
> Signed-off-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru>

Applied, thanks a lot Evgeniy!

I'll queue this up for -stable too.

