netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Oops in filter add
@ 2007-03-19 20:10 Chris Madden
  2007-03-20  2:22 ` David Miller
  0 siblings, 1 reply; 18+ messages in thread
From: Chris Madden @ 2007-03-19 20:10 UTC (permalink / raw)
  To: netdev; +Cc: jamal, tgraf, davem

Hi-

We're experiencing an oops when we attempt to add a filter to an ingress
qdisc while heavy traffic is passing through the device being
manipulated.    Currently, we're using 2.6.20.3.  Here is the setup:

We have traffic coming in a device, say, eth1.  While traffic comes in,
we use tc to add an ingress queuing discipline with "tc qdisc add dev
eth1 handle ffff: ingress".  This works, and a "tc qdisc show" confirms
that it is in place.

Its hard to boil down exactly where things are going wrong, but it seems
to center around adding a basic match filter to the ingress path we
created above.  

I did some digging, and it appears the filter add isn't mutexed right.  
Inside net/core/dev.c, ing_filter, I see:

        spin_lock(&dev->ingress_lock);
        if ((q = dev->qdisc_ingress) != NULL)
            result = q->enqueue(skb, q);
        spin_unlock(&dev->ingress_lock);

And unless I'm missing something, this is the only place this lock is
used ( other than initialization ).  In net/sched/cls_api.c, I see we do
qdisc_lock_tree/qdisc_unlock_tree (which locks dev->queue_lock).  As
near as I can tell, this is our problem ( our mutexes don't prohibit
manipulation while packets are flowing ).

It doesn't happen every time, but if we have a script that adds/removes
qdiscs and filters, and run a few hundred megabit through the interface,
it dies eventually.  I've been trying to puzzle out the locking myself,
but I fear I've been less than successful.
 
Oops below:

 [<c0220681>] tc_classify+0x34/0xbc
 [<f8a4d10b>] ingress_enqueue+0x16/0x55 [sch_ingress]
 [<c021451e>] netif_receive_skb+0x1de/0x2bf
 [<f88a6169>] e1000_clean_rx_irq+0x35b/0x42c [e1000]
 [<f88a5e0e>] e1000_clean_rx_irq+0x0/0x42c [e1000]
 [<f88a519b>] e1000_clean+0x6e/0x23d [e1000]
 [<c0215ed7>] net_rx_action+0xd5/0x1c6
 [<c011cafc>] __do_softirq+0x5d/0xba
 [<c0104c7f>] do_softirq+0x59/0xa9
 [<f8a50194>] basic_dump+0x0/0x115 [cls_basic]
 [<c022458e>] tcf_em_tree_dump+0x14d/0x2ff
 [<c01341db>] handle_fasteoi_irq+0x0/0xa0
 [<c0104d70>] do_IRQ+0xa1/0xb9
 [<c010367f>] common_interrupt+0x23/0x28
 [<f8a50497>] basic_change+0x167/0x370 [cls_basic]
 [<c02220df>] tc_ctl_tfilter+0x3ec/0x469
 [<c0221cf3>] tc_ctl_tfilter+0x0/0x469
 [<c021b250>] rtnetlink_rcv_msg+0x1b3/0x1d8
 [<c0221378>] tc_dump_qdisc+0x0/0xfe
 [<c02259cd>] netlink_run_queue+0x50/0xbe
 [<c021b09d>] rtnetlink_rcv_msg+0x0/0x1d8
 [<c021b05c>] rtnetlink_rcv+0x25/0x3d
 [<c0225e14>] netlink_data_ready+0x12/0x4c
 [<c0224e0e>] netlink_sendskb+0x19/0x30
 [<c0225df6>] netlink_sendmsg+0x242/0x24e
 [<c020ba1f>] sock_sendmsg+0xbc/0xd4
 [<c0128325>] autoremove_wake_function+0x0/0x35
 [<c0128325>] autoremove_wake_function+0x0/0x35
 [<c021180e>] verify_iovec+0x3e/0x70
 [<c020bbcb>] sys_sendmsg+0x194/0x1f9
 [<c0112bde>] __wake_up+0x32/0x43
 [<c022506a>] netlink_insert+0x106/0x110
 [<c022513b>] netlink_autobind+0xc7/0xe3
 [<c0226356>] netlink_bind+0x8d/0x127
 [<c013f61b>] do_wp_page+0x149/0x36c
 [<c015d17b>] d_alloc+0x138/0x17a
 [<c0140a24>] __handle_mm_fault+0x756/0x7a6
 [<c020ca68>] sys_socketcall+0x223/0x242
 [<c0263d8b>] do_page_fault+0x0/0x525
 [<c0102ce4>] syscall_call+0x7/0xb

Chris Madden
Reflex Security

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Oops in filter add
  2007-03-19 20:10 Oops in filter add Chris Madden
@ 2007-03-20  2:22 ` David Miller
  2007-03-20  6:54   ` jamal
  0 siblings, 1 reply; 18+ messages in thread
From: David Miller @ 2007-03-20  2:22 UTC (permalink / raw)
  To: chris; +Cc: netdev, hadi, tgraf

From: Chris Madden <chris@reflexsecurity.com>
Date: Mon, 19 Mar 2007 16:10:29 -0400

> I did some digging, and it appears the filter add isn't mutexed right.  
> Inside net/core/dev.c, ing_filter, I see:
> 
>         spin_lock(&dev->ingress_lock);
>         if ((q = dev->qdisc_ingress) != NULL)
>             result = q->enqueue(skb, q);
>         spin_unlock(&dev->ingress_lock);
> 
> And unless I'm missing something, this is the only place this lock is
> used ( other than initialization ).  In net/sched/cls_api.c, I see we do
> qdisc_lock_tree/qdisc_unlock_tree (which locks dev->queue_lock).  As
> near as I can tell, this is our problem ( our mutexes don't prohibit
> manipulation while packets are flowing ).

I think this should use dev->queue_lock.

It looks like the idea might have been to allow more parallelized
running of ingress filters, but this is done wrong and leads to
the crashes you are seeing.

Can you just replace the above with dev->queue_lock and see if
that makes your problem go away?  THanks.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Oops in filter add
  2007-03-20  2:22 ` David Miller
@ 2007-03-20  6:54   ` jamal
  2007-03-20  6:58     ` Patrick McHardy
  2007-03-20 14:15     ` Chris Madden
  0 siblings, 2 replies; 18+ messages in thread
From: jamal @ 2007-03-20  6:54 UTC (permalink / raw)
  To: David Miller; +Cc: chris, netdev, tgraf

On Mon, 2007-19-03 at 19:22 -0700, David Miller wrote:

> 
> I think this should use dev->queue_lock.
> 

It would slow down things if he is doing both ingress and egress
traffic as well as control changes.

> It looks like the idea might have been to allow more parallelized
> running of ingress filters, but this is done wrong and leads to
> the crashes you are seeing.

The main idea is to avoid one BigLock for both ingress and egress;
Which was/is still useful in the compat mode where netfilter is used
instead. 

> Can you just replace the above with dev->queue_lock and see if
> that makes your problem go away?  THanks.

It should; 
i will stare at the code later and see if i can send a better patch,
maybe a read_lock(qdisc_tree_lock). Chris, if you can experiment by just
locking against filters instead, that would be nicer for the reasons i
described above.

cheers,
jamal


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Oops in filter add
  2007-03-20  6:54   ` jamal
@ 2007-03-20  6:58     ` Patrick McHardy
  2007-03-20  7:18       ` jamal
  2007-03-20 14:15     ` Chris Madden
  1 sibling, 1 reply; 18+ messages in thread
From: Patrick McHardy @ 2007-03-20  6:58 UTC (permalink / raw)
  To: hadi; +Cc: David Miller, chris, netdev, tgraf

jamal wrote:
> On Mon, 2007-19-03 at 19:22 -0700, David Miller wrote:
> 
>>It looks like the idea might have been to allow more parallelized
>>running of ingress filters, but this is done wrong and leads to
>>the crashes you are seeing.
> 
> 
> The main idea is to avoid one BigLock for both ingress and egress;
> Which was/is still useful in the compat mode where netfilter is used
> instead. 


In that case is isn't even used.

>>Can you just replace the above with dev->queue_lock and see if
>>that makes your problem go away?  THanks.
> 
> 
> It should; 
> i will stare at the code later and see if i can send a better patch,
> maybe a read_lock(qdisc_tree_lock).


You would need to make qdisc_lock_tree() aware of the difference
between ingress and egress.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Oops in filter add
  2007-03-20  6:58     ` Patrick McHardy
@ 2007-03-20  7:18       ` jamal
  2007-03-20  7:29         ` Patrick McHardy
  0 siblings, 1 reply; 18+ messages in thread
From: jamal @ 2007-03-20  7:18 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: David Miller, chris, netdev, tgraf

On Tue, 2007-20-03 at 07:58 +0100, Patrick McHardy wrote:
> jamal wrote:

> > The main idea is to avoid one BigLock for both ingress and egress;
> > Which was/is still useful in the compat mode where netfilter is used
> > instead. 
> 
> 
> In that case is isn't even used.
> 

Ok. It certainly used to matter in the old days.

> You would need to make qdisc_lock_tree() aware of the difference
> between ingress and egress.

If you have the cycles please go ahead - I really dont have much time
today (have to present in a few hours and havent even started).
I will most certainly look in 1-2 days; for now the queue_lock should
suffice; 


cheers,
jamal


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Oops in filter add
  2007-03-20  7:18       ` jamal
@ 2007-03-20  7:29         ` Patrick McHardy
  2007-03-20  9:25           ` jamal
  0 siblings, 1 reply; 18+ messages in thread
From: Patrick McHardy @ 2007-03-20  7:29 UTC (permalink / raw)
  To: hadi; +Cc: David Miller, chris, netdev, tgraf

jamal wrote:
> On Tue, 2007-20-03 at 07:58 +0100, Patrick McHardy wrote:
> 
> Ok. It certainly used to matter in the old days.


Actually it has never been used anywhere else but in ing_filter,
it was introduced together with the TC actions.

>>You would need to make qdisc_lock_tree() aware of the difference
>>between ingress and egress.
> 
> 
> If you have the cycles please go ahead - I really dont have much time
> today (have to present in a few hours and havent even started).
> I will most certainly look in 1-2 days; for now the queue_lock should
> suffice; 


I'll try, but no promises, I'm a bit behind with various things myself.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Oops in filter add
  2007-03-20  7:29         ` Patrick McHardy
@ 2007-03-20  9:25           ` jamal
  2007-03-20 10:54             ` Chris Madden
  2007-03-20 10:58             ` Patrick McHardy
  0 siblings, 2 replies; 18+ messages in thread
From: jamal @ 2007-03-20  9:25 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: David Miller, chris, netdev, tgraf

On Tue, 2007-20-03 at 08:29 +0100, Patrick McHardy wrote:
> jamal wrote:
> > On Tue, 2007-20-03 at 07:58 +0100, Patrick McHardy wrote:
> > 
> > Ok. It certainly used to matter in the old days.
> 
> 
> Actually it has never been used anywhere else but in ing_filter,
> it was introduced together with the TC actions.
> 

You are correct. I looked at old 2.4 and all i saw was:

----------
/* 
 revisit later: Use a private since lock dev->queue_lock is also
 used on the egress (might slow things for an iota)
*/
 
         if (dev->qdisc_ingress) {
                 spin_lock(&dev->queue_lock);
                 if ((q = dev->qdisc_ingress) != NULL)
                         fwres = q->enqueue(skb, q);
                 spin_unlock(&dev->queue_lock);
         }
------

So the resolution (as Dave points out) was wrong. In any case, restoring
queue_lock for now would slow things but will remove the race.

> 
> I'll try, but no promises, I'm a bit behind with various things myself.

I will ping you in a few days and if you havent done anything i will
take it up.
I am almost tempted to make the ingress filters to not have any
dependencies on egress whatsoever. It will create more locks but
will make the datapath faster. Actions can still be shared, but thats
lesser of an overhead.

cheers,
jamal


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Oops in filter add
  2007-03-20  9:25           ` jamal
@ 2007-03-20 10:54             ` Chris Madden
  2007-03-20 11:02               ` Patrick McHardy
  2007-03-20 10:58             ` Patrick McHardy
  1 sibling, 1 reply; 18+ messages in thread
From: Chris Madden @ 2007-03-20 10:54 UTC (permalink / raw)
  To: hadi; +Cc: Patrick McHardy, David Miller, netdev, tgraf

Thanks for all your replies!

One thing I did notice in examining tc_ctl_tfilter was that there is
something like:

        qdisc_lock_tree(dev);
        tp->next = *back;
        *back = tp;
        qdisc_unlock_tree(dev);

And then proceed to the data structure down below with:

err = tp->ops->change(tp, cl, t->tcm_handle, tca, &fh);

Simply reordering these seems to ameliorate the problem greatly.  I
don't know if this is a generic solution or something specific to the
basic filter only. 

Chris Madden

jamal wrote:
> On Tue, 2007-20-03 at 08:29 +0100, Patrick McHardy wrote:
>   
>> jamal wrote:
>>     
>>> On Tue, 2007-20-03 at 07:58 +0100, Patrick McHardy wrote:
>>>
>>> Ok. It certainly used to matter in the old days.
>>>       
>> Actually it has never been used anywhere else but in ing_filter,
>> it was introduced together with the TC actions.
>>
>>     
>
> You are correct. I looked at old 2.4 and all i saw was:
>
> ----------
> /* 
>  revisit later: Use a private since lock dev->queue_lock is also
>  used on the egress (might slow things for an iota)
> */
>  
>          if (dev->qdisc_ingress) {
>                  spin_lock(&dev->queue_lock);
>                  if ((q = dev->qdisc_ingress) != NULL)
>                          fwres = q->enqueue(skb, q);
>                  spin_unlock(&dev->queue_lock);
>          }
> ------
>
> So the resolution (as Dave points out) was wrong. In any case, restoring
> queue_lock for now would slow things but will remove the race.
>
>   
>> I'll try, but no promises, I'm a bit behind with various things myself.
>>     
>
> I will ping you in a few days and if you havent done anything i will
> take it up.
> I am almost tempted to make the ingress filters to not have any
> dependencies on egress whatsoever. It will create more locks but
> will make the datapath faster. Actions can still be shared, but thats
> lesser of an overhead.
>
> cheers,
> jamal
>
>   


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Oops in filter add
  2007-03-20  9:25           ` jamal
  2007-03-20 10:54             ` Chris Madden
@ 2007-03-20 10:58             ` Patrick McHardy
  2007-03-21  9:33               ` jamal
  1 sibling, 1 reply; 18+ messages in thread
From: Patrick McHardy @ 2007-03-20 10:58 UTC (permalink / raw)
  To: hadi; +Cc: David Miller, chris, netdev, tgraf

jamal wrote:
> On Tue, 2007-20-03 at 08:29 +0100, Patrick McHardy wrote:
> 
>>Actually it has never been used anywhere else but in ing_filter,
>>it was introduced together with the TC actions.
>>
> 
> 
> You are correct. I looked at old 2.4 and all i saw was:
> 
> ----------
> /* 
>  revisit later: Use a private since lock dev->queue_lock is also
>  used on the egress (might slow things for an iota)
> */
>  
>          if (dev->qdisc_ingress) {
>                  spin_lock(&dev->queue_lock);
>                  if ((q = dev->qdisc_ingress) != NULL)
>                          fwres = q->enqueue(skb, q);
>                  spin_unlock(&dev->queue_lock);
>          }
> ------
> 
> So the resolution (as Dave points out) was wrong. In any case, restoring
> queue_lock for now would slow things but will remove the race.


Yes. I think thats what we should do for 2.6.21, since fixing
this while keeping ingress_lock is quite intrusive.


>>I'll try, but no promises, I'm a bit behind with various things myself.
> 
> 
> I will ping you in a few days and if you havent done anything i will
> take it up.
> I am almost tempted to make the ingress filters to not have any
> dependencies on egress whatsoever. It will create more locks but
> will make the datapath faster. Actions can still be shared, but thats
> lesser of an overhead.


I'm on it. I'm using the opportunity to try to simply the qdisc locking.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Oops in filter add
  2007-03-20 10:54             ` Chris Madden
@ 2007-03-20 11:02               ` Patrick McHardy
  0 siblings, 0 replies; 18+ messages in thread
From: Patrick McHardy @ 2007-03-20 11:02 UTC (permalink / raw)
  To: Chris Madden; +Cc: hadi, David Miller, netdev, tgraf

Chris Madden wrote:
> Thanks for all your replies!
> 
> One thing I did notice in examining tc_ctl_tfilter was that there is
> something like:
> 
>         qdisc_lock_tree(dev);
>         tp->next = *back;
>         *back = tp;
>         qdisc_unlock_tree(dev);
> 
> And then proceed to the data structure down below with:
> 
> err = tp->ops->change(tp, cl, t->tcm_handle, tca, &fh);
> 
> Simply reordering these seems to ameliorate the problem greatly.  I
> don't know if this is a generic solution or something specific to the
> basic filter only. 


It might hide the problem, but there are currently a lot of places
where things can go wrong.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Oops in filter add
  2007-03-20  6:54   ` jamal
  2007-03-20  6:58     ` Patrick McHardy
@ 2007-03-20 14:15     ` Chris Madden
  2007-03-20 14:48       ` Patrick McHardy
  1 sibling, 1 reply; 18+ messages in thread
From: Chris Madden @ 2007-03-20 14:15 UTC (permalink / raw)
  To: hadi; +Cc: netdev

jamal wrote:
> On Mon, 2007-19-03 at 19:22 -0700, David Miller wrote:
>> Can you just replace the above with dev->queue_lock and see if
>> that makes your problem go away?  THanks.
>>     
>
> It should; 
> i will stare at the code later and see if i can send a better patch,
> maybe a read_lock(qdisc_tree_lock). Chris, if you can experiment by just
> locking against filters instead, that would be nicer for the reasons i
> described above.
>
>   
Ok, I replaced ingress_lock with queue_lock in ing_filter and it died in
the same place.  Trace below (looks to be substantively the same)...

If I am reading tc_ctl_tfilter correctly, we are adding our new
tcf_proto to the end of the list, and it is getting used before the
change function is getting executed on it.   Locking the ingress_lock in
qdisc_lock_tree ( in addition to the extant queue_lock )seems to have
the same effect; I can get  a traceback from that if its useful.

BUG: unable to handle kernel NULL pointer dereference at virtual address
0000004
 printing
eip:                                                                 
f8a5f093                                                                       

*pde =
00000000                                                                
Oops: 0000
[#1]                                                                
SMP                                                                            

Modules linked in: af_packet cls_basic sch_ingress button ac battery
ts_match bn
CPU:   
0                                                                      
EIP:    0060:[<f8a5f093>]    Not tainted
VLI                                   
EFLAGS: 00010246   (2.6.20.3-x86
#1)                                           
EIP is at basic_classify+0xb/0x69
[cls_basic]                                  
eax: f3d57080   ebx: f3dd6f00   ecx: c0329f1c   edx:
f3dd6f00                  
esi: c0329f1c   edi: 00000000   ebp: f3d57080   esp:
c0329ee4                  
ds: 007b   es: 007b   ss:
0068                                                 
Process python (pid: 2503, ti=c0329000 task=f431e570
task.ti=f42ac000)         
Stack: f3dd6f00 f3d57080 f3dd6f00 00000008 c02206a1 00000286 f8aaad60
f3d57080 
       c0329f1c f3e606c0 f3d57080 00000000 df993000 f8a5c10b 00000080
c02108cd 
       df993000 f3d57080 c021454e df993000 00000040 f3db8780 00000008
00000000 
Call Trace:
 [<c02206a1>] tc_classify+0x34/0xbc                                      
 [<f8a5c10b>] ingress_enqueue+0x16/0x55 [sch_ingress]                    
 [<c02108cd>] __alloc_skb+0x53/0xfd                                      
 [<c021454e>] netif_receive_skb+0x1cd/0x2a6                              
 [<f88aa162>] e1000_clean_rx_irq+0x35b/0x42c [e1000]                     
 [<f88a9e07>] e1000_clean_rx_irq+0x0/0x42c [e1000]                       
 [<f88a9194>] e1000_clean+0x6e/0x23d [e1000]                             
 [<c011007b>] handle_vm86_fault+0x16/0x6f7                               
 [<c0215ef6>] net_rx_action+0xcc/0x1bc                                   
 [<c011cafc>] __do_softirq+0x5d/0xba                                     
 [<c0104c7f>] do_softirq+0x59/0xa9                                       
 [<c01341e3>] handle_fasteoi_irq+0x0/0xa0                                
 [<c0104d70>] do_IRQ+0xa1/0xb9                                           
 [<c010367f>] common_interrupt+0x23/0x28                                 
 [<c014c39a>] kmem_cache_zalloc+0x33/0x5c                                
 [<f8a5f4ac>] basic_change+0x17c/0x370 [cls_basic]                       
 [<c02220ff>] tc_ctl_tfilter+0x3ec/0x469                                 
 [<c0221d13>] tc_ctl_tfilter+0x0/0x469                                   
 [<c021b270>] rtnetlink_rcv_msg+0x1b3/0x1d8                              
 [<c0221398>] tc_dump_qdisc+0x0/0xfe                                     
 [<c02259ed>] netlink_run_queue+0x50/0xbe                                
 [<c021b0bd>] rtnetlink_rcv_msg+0x0/0x1d8                                
 [<c021b07c>] rtnetlink_rcv+0x25/0x3d
 [<c0225e34>] netlink_data_ready+0x12/0x4c
 [<c0224e2e>] netlink_sendskb+0x19/0x30
 [<c0225e16>]
netlink_sendmsg+0x242/0x24e                                      
 [<c020ba5f>]
sock_sendmsg+0xbc/0xd4                                           
 [<c012832d>]
autoremove_wake_function+0x0/0x35                                
 [<c012832d>]
autoremove_wake_function+0x0/0x35                                
 [<c010367f>]
common_interrupt+0x23/0x28                                       
 [<c021184e>]
verify_iovec+0x3e/0x70                                           
 [<c020bc0b>]
sys_sendmsg+0x194/0x1f9                                          
 [<c0112bde>]
__wake_up+0x32/0x43                                              
 [<c022508a>]
netlink_insert+0x106/0x110                                       
 [<c022515b>]
netlink_autobind+0xc7/0xe3                                       
 [<c0226376>]
netlink_bind+0x8d/0x127                                          
 [<c013f623>]
do_wp_page+0x149/0x36c                                           
 [<c015d183>]
d_alloc+0x138/0x17a                                              
 [<c0140a2c>]
__handle_mm_fault+0x756/0x7a6                                    
 [<c020caa8>]
sys_socketcall+0x223/0x242                                       
 [<c0263dc3>]
do_page_fault+0x0/0x525                                          
 [<c0102ce4>]
syscall_call+0x7/0xb                                             
 =======================                                                       

Code: 1a ff 43 08 8b 76 18 83 ee 18 8b 46 18 0f 18 00 90 8d 56 18 8d 47
04 39 c
EIP: [<f8a5f093>] basic_classify+0xb/0x69 [cls_basic] SS:ESP
0068:c0329ee4     
 <0>Kernel panic - not syncing: Fatal exception in
interrupt                   


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Oops in filter add
  2007-03-20 14:15     ` Chris Madden
@ 2007-03-20 14:48       ` Patrick McHardy
  2007-03-20 14:57         ` Patrick McHardy
  0 siblings, 1 reply; 18+ messages in thread
From: Patrick McHardy @ 2007-03-20 14:48 UTC (permalink / raw)
  To: Chris Madden; +Cc: hadi, netdev

Chris Madden wrote:
> Ok, I replaced ingress_lock with queue_lock in ing_filter and it died in
> the same place.  Trace below (looks to be substantively the same)...
> 
> If I am reading tc_ctl_tfilter correctly, we are adding our new
> tcf_proto to the end of the list, and it is getting used before the
> change function is getting executed on it.   Locking the ingress_lock in
> qdisc_lock_tree ( in addition to the extant queue_lock )seems to have
> the same effect; I can get  a traceback from that if its useful.


The problem is that some classifiers (like basic, fw and route)
implement empty ->init functions, although fw and route properly
deal with it in their classification functions. The ->init function
should allocate and initalize tp->root, which basic fails to do.
If you give me a few hours I'll cook up a patch for this.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Oops in filter add
  2007-03-20 14:48       ` Patrick McHardy
@ 2007-03-20 14:57         ` Patrick McHardy
  2007-03-20 15:11           ` Thomas Graf
  0 siblings, 1 reply; 18+ messages in thread
From: Patrick McHardy @ 2007-03-20 14:57 UTC (permalink / raw)
  To: Chris Madden; +Cc: hadi, netdev

[-- Attachment #1: Type: text/plain, Size: 1032 bytes --]

Patrick McHardy wrote:
> Chris Madden wrote:
> 
>>Ok, I replaced ingress_lock with queue_lock in ing_filter and it died in
>>the same place.  Trace below (looks to be substantively the same)...
>>
>>If I am reading tc_ctl_tfilter correctly, we are adding our new
>>tcf_proto to the end of the list, and it is getting used before the
>>change function is getting executed on it.   Locking the ingress_lock in
>>qdisc_lock_tree ( in addition to the extant queue_lock )seems to have
>>the same effect; I can get  a traceback from that if its useful.
> 
> 
> 
> The problem is that some classifiers (like basic, fw and route)
> implement empty ->init functions, although fw and route properly
> deal with it in their classification functions. The ->init function
> should allocate and initalize tp->root, which basic fails to do.
> If you give me a few hours I'll cook up a patch for this.


Actually its only cls_basic thats broken, in case of route and fw
its intentional for backwards-compatibility.

Can you try this patch please?


[-- Attachment #2: x --]
[-- Type: text/plain, Size: 822 bytes --]

diff --git a/net/sched/cls_basic.c b/net/sched/cls_basic.c
index fad08e5..ed788d0 100644
--- a/net/sched/cls_basic.c
+++ b/net/sched/cls_basic.c
@@ -81,6 +81,13 @@ static void basic_put(struct tcf_proto *
 
 static int basic_init(struct tcf_proto *tp)
 {
+	struct basic_head *head;
+
+	head = kzalloc(sizeof(*head), GFP_KERNEL);
+	if (head == NULL)
+		return -ENOMEM;
+	INIT_LIST_HEAD(&head->flist);
+	tp->root = head;
 	return 0;
 }
 
@@ -175,16 +182,6 @@ static int basic_change(struct tcf_proto
 		return basic_set_parms(tp, f, base, tb, tca[TCA_RATE-1]);
 	}
 
-	err = -ENOBUFS;
-	if (head == NULL) {
-		head = kzalloc(sizeof(*head), GFP_KERNEL);
-		if (head == NULL)
-			goto errout;
-
-		INIT_LIST_HEAD(&head->flist);
-		tp->root = head;
-	}
-
 	f = kzalloc(sizeof(*f), GFP_KERNEL);
 	if (f == NULL)
 		goto errout;

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: Oops in filter add
  2007-03-20 14:57         ` Patrick McHardy
@ 2007-03-20 15:11           ` Thomas Graf
  2007-03-20 15:13             ` Patrick McHardy
  0 siblings, 1 reply; 18+ messages in thread
From: Thomas Graf @ 2007-03-20 15:11 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: Chris Madden, hadi, netdev

* Patrick McHardy <kaber@trash.net> 2007-03-20 15:57
> Actually its only cls_basic thats broken, in case of route and fw
> its intentional for backwards-compatibility.

Absolutely right, thanks Patrick.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Oops in filter add
  2007-03-20 15:11           ` Thomas Graf
@ 2007-03-20 15:13             ` Patrick McHardy
  2007-03-20 16:27               ` Chris Madden
  0 siblings, 1 reply; 18+ messages in thread
From: Patrick McHardy @ 2007-03-20 15:13 UTC (permalink / raw)
  To: Thomas Graf; +Cc: Chris Madden, hadi, netdev

Thomas Graf wrote:
> * Patrick McHardy <kaber@trash.net> 2007-03-20 15:57
> 
>>Actually its only cls_basic thats broken, in case of route and fw
>>its intentional for backwards-compatibility.
> 
> 
> Absolutely right, thanks Patrick.


There was a small bug in my patch (broken return value on memory
allocation failure, not relevant for testing though). I'll push
the fixed patch to Dave once Chris confirms that it fixes the
problem he's seeing.


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Oops in filter add
  2007-03-20 15:13             ` Patrick McHardy
@ 2007-03-20 16:27               ` Chris Madden
  2007-03-20 17:06                 ` Patrick McHardy
  0 siblings, 1 reply; 18+ messages in thread
From: Chris Madden @ 2007-03-20 16:27 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: Thomas Graf, hadi, netdev

Patrick McHardy wrote:
>
> There was a small bug in my patch (broken return value on memory
> allocation failure, not relevant for testing though). I'll push
> the fixed patch to Dave once Chris confirms that it fixes the
> problem he's seeing.
>   
Looks like that may have done it.  I ramped up the traffic on it and did
thousands of inserts/removals and the box seems happy.

I had ing_filter still locking dev->ingress_lock ( instead of the
queue_lock suggestion earlier ).   So while that race may still be in
there, I haven't been able to hit the locking problem.

chris

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Oops in filter add
  2007-03-20 16:27               ` Chris Madden
@ 2007-03-20 17:06                 ` Patrick McHardy
  0 siblings, 0 replies; 18+ messages in thread
From: Patrick McHardy @ 2007-03-20 17:06 UTC (permalink / raw)
  To: Chris Madden; +Cc: Thomas Graf, hadi, netdev

Chris Madden wrote:
> Patrick McHardy wrote:
> 
>>There was a small bug in my patch (broken return value on memory
>>allocation failure, not relevant for testing though). I'll push
>>the fixed patch to Dave once Chris confirms that it fixes the
>>problem he's seeing.
>>  
> 
> Looks like that may have done it.  I ramped up the traffic on it and did
> thousands of inserts/removals and the box seems happy.


Thanks Chris.

> I had ing_filter still locking dev->ingress_lock ( instead of the
> queue_lock suggestion earlier ).   So while that race may still be in
> there, I haven't been able to hit the locking problem.


I'll push a patch for this as well, we can hopefully go back to using
ingress_lock in 2.6.22.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Oops in filter add
  2007-03-20 10:58             ` Patrick McHardy
@ 2007-03-21  9:33               ` jamal
  0 siblings, 0 replies; 18+ messages in thread
From: jamal @ 2007-03-21  9:33 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: David Miller, chris, netdev, tgraf

On Tue, 2007-20-03 at 11:58 +0100, Patrick McHardy wrote:
> jamal wrote:

> > So the resolution (as Dave points out) was wrong. In any case, restoring
> > queue_lock for now would slow things but will remove the race.
> 
> 
> Yes. I think thats what we should do for 2.6.21, since fixing
> this while keeping ingress_lock is quite intrusive.
> 

reasonable.

> I'm on it. I'm using the opportunity to try to simply the qdisc locking.

Ok, thanks Patrick. 
BTW, I was just staring at the code and i think i have found probably a
long standing minor bug on the holding of the tree lock. I will post
 a patch shortly if i dont get disrupted.

cheers,
jamal


^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2007-03-21  9:47 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-03-19 20:10 Oops in filter add Chris Madden
2007-03-20  2:22 ` David Miller
2007-03-20  6:54   ` jamal
2007-03-20  6:58     ` Patrick McHardy
2007-03-20  7:18       ` jamal
2007-03-20  7:29         ` Patrick McHardy
2007-03-20  9:25           ` jamal
2007-03-20 10:54             ` Chris Madden
2007-03-20 11:02               ` Patrick McHardy
2007-03-20 10:58             ` Patrick McHardy
2007-03-21  9:33               ` jamal
2007-03-20 14:15     ` Chris Madden
2007-03-20 14:48       ` Patrick McHardy
2007-03-20 14:57         ` Patrick McHardy
2007-03-20 15:11           ` Thomas Graf
2007-03-20 15:13             ` Patrick McHardy
2007-03-20 16:27               ` Chris Madden
2007-03-20 17:06                 ` Patrick McHardy

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).