public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* Re: Kernel oops when clearing bgp neighbor info with TCP MD5SUM enabled
       [not found]     ` <20091008.175703.83006470.davem@davemloft.net>
@ 2009-10-17 17:57       ` Anirban Sinha
  2009-10-18  2:35         ` Anirban Sinha
  0 siblings, 1 reply; 11+ messages in thread
From: Anirban Sinha @ 2009-10-17 17:57 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-kernel, ani

On Thu, 8 Oct 2009, David Miller wrote:

> >> > We are noticing a kernel OOPS on 2.6.26 kernel when we issue the command
> >> > "clear ip bgp <bgp-peer-ip>" on Quagga BGP routing software.
> >>
> >> You will need to update your kernel, there have been many TCP
> >> MD5 bug fixes since 2.6.26
> >>
> >
> > Sigh ... wish that were that easy!
>
> Contact your vendor for support :-)

but we *are* the vendors for our distribution and even though I may not be
a networking guru, I have little bit knowledge in working my way through the
kernel code. I have traced down the cause of the BUG() and when I did a git
pull against Linus' tree, I see the same issue in teh git tip as well.

The BUG() is triggered from kernel/timer.c, line 1037 within the
function __run_timers(). I am reporting these line numbers from the latest git
tip.

What happens is that before and after the callback, the code grabs the preempt
count and catches unbalanced preempt_enable() and preempt_disable() calls from
within the callback function.  In this case, the callback function is
inet_twdr_hangman() as can be seen from this instrumented log:

[02:15:15.941981] Kernel panic - not syncing: <3>huh, entered ffffffff803fbd60
(inet_twdr_hangman+0x0/0xe0)with preempt_count 00000102, exited with 00000101?

Clearly there is an extra unbalanced preempt_enable() somewhere within the
callback function.

When I looked at the function, I see that in net/ipv4/inet_timewait_sock.c
line 215, the function calls schedule_work(). schedule_work() calls
queue_work() which in turn calls put_cpu() that ultimately does a
preempt_enable(). It is this unbalanced preempt_enable() that decriments the
preempt_count by one as can be seen from the above trace.

I suspect that workqueue related operatios are illegal from a timer callback
function. In that case, the above mentioned callback function needs to be
fixed.

Yes, I can't explain why others are also not seeing the same bug crash. We
don't have the luxury to pull in the latest and greatest kernel from the git
tree everytime an update is made and try it out. So I am unable to repo the
issue with the latest kernel. But if that means that this issue should be
ignored, then that is fine by me. We will fix our private kenrel with an
appropriate patch as we continue to investigate more.

Cheers,

Ani


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Kernel oops when clearing bgp neighbor info with TCP MD5SUM enabled
  2009-10-17 17:57       ` Kernel oops when clearing bgp neighbor info with TCP MD5SUM enabled Anirban Sinha
@ 2009-10-18  2:35         ` Anirban Sinha
  2009-10-18 20:19           ` Anirban Sinha
  0 siblings, 1 reply; 11+ messages in thread
From: Anirban Sinha @ 2009-10-18  2:35 UTC (permalink / raw)
  To: linux-kernel; +Cc: David Miller, netdev, Anirban Sinha



Once upon a time, like on 09-10-17 10:57 AM, Anirban Sinha wrote:
> On Thu, 8 Oct 2009, David Miller wrote:
> 
>>>>> We are noticing a kernel OOPS on 2.6.26 kernel when we issue the command
>>>>> "clear ip bgp <bgp-peer-ip>" on Quagga BGP routing software.

and btw, this is the crash (on mips) we are talking about:

# [23:10:35.108808] Kernel bug detected[#1]:
[23:10:35.112527] Cpu 0
[23:10:35.114676] $ 0   : 0000000000000000 0000000014001fe0
0000000000000066 0000000000000004
[23:10:35.122845] $ 4   : ffffffff80516c10 0000000014001fe0
ffffffff8050c010 0000000000000004
[23:10:35.131015] $ 8   : 0000000000000000 0000000000000041
ffffffff805142e8 0000000000000001
[23:10:35.139184] $12   : ffffffff80600000 ffffffff805f0000
0000000000000064 0000000000000190
[23:10:35.147354] $16   : 0000000000000102 ffffffff803afdf0
ffffffff80539040 ffffffff80600780
[23:10:35.155526] $20   : ffffffff80540000 0000000000200200
ffffffff804c0000 000000000000000a
[23:10:35.163695] $24   : a3d70a3d70a3d70b 8000000000000003
[23:10:35.171865] $28   : ffffffff8050c000 ffffffff8050fd90
9000000010030000 ffffffff801487a8
[23:10:35.180035] Hi    : 0000000000000000
[23:10:35.183819] Lo    : 0000000000000000
[23:10:35.187603] epc   : ffffffff801487a8 run_timer_softirq+0x198/0x258
Tainted: P
[23:10:35.196032] ra    : ffffffff801487a8 run_timer_softirq+0x198/0x258
[23:10:35.202395] Status: 14001fe3    KX SX UX KERNEL EXL IE
[23:10:35.207814] Cause : 00808024
[23:10:35.210911] PrId  : 01041100 (SiByte SB1A)
[23:10:35.215209] Modules linked in: xt_state ipt_REJECT iptable_filter
nf_conntrack_ftp ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4
ip_tables ebtable_filter ebtables bridge llc zeug_ipmcdrv(P) irqdisp(P)
zvirt(P) zeugmod(P) softdog
[23:10:35.236024] Process swapper (pid: 0, threadinfo=ffffffff8050c000,
task=ffffffff805142e8, tls=0000000000000000)
[23:10:35.246169] Stack : ffffffff8050fd90 ffffffff8050fd90
0000000014001fe0 ffffffff805ff3e0
[23:10:35.254166]         ffffffff806003c4 0000000000000001
ffffffff8053f650 ffffffff805706d0
[23:10:35.262337]         ffffffff80572020 ffffffff80142280
ffffffff806003c0 0000000000000000
[23:10:35.270507]         0000000014001fe0 000000000000c5b0
ffffffff8fefc520 ffffffff8feea52c
[23:10:35.278676]         0000000000000015 0000000000004460
0000000000000940 ffffffff8fe1bf00
[23:10:35.286846]         ffffffff8fffdab0 ffffffff80142410
0000000000000000 ffffffff80142778
[23:10:35.295017]         ffffffff80103d20 ffffffff80103d20
0000000000000000 0000000014001fe1
[23:10:35.303187]         0000000000040000 ffffffff8050c010
0000000000000000 a80000017f87c138
[23:10:35.311357]         0000000014001fe0 ffffffffffff00fe
0000000000000004 a80000017e7e0680
[23:10:35.319528]         0000000000000000 000000000000001d
ffffffff8050ffe0 0000000000001f00
[23:10:35.327696]         ...
[23:10:35.330536] Call Trace:
[23:10:35.333201] [<ffffffff801487a8>] run_timer_softirq+0x198/0x258
[23:10:35.339224] [<ffffffff80142280>] __do_softirq+0x198/0x288
[23:10:35.344812] [<ffffffff80142410>] do_softirq+0xa0/0xa8
[23:10:35.350057] [<ffffffff80142778>] irq_exit+0x70/0x88
[23:10:35.355131] [<ffffffff80103d20>] ret_from_irq+0x0/0x4
[23:10:35.360377] [<ffffffff801063f4>] cpu_idle+0x1c/0x88
[23:10:35.365455]
[23:10:35.367171]
[23:10:35.367174] Code: 0040382d  0c04ef4c  00000000 <0200000d> 0c10ee9c
0260202d  dfa60000  17a6ffe5  00000000
[23:10:35.378822] Kernel panic - not syncing: Fatal exception in
interrupt


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Kernel oops when clearing bgp neighbor info with TCP MD5SUM enabled
  2009-10-18  2:35         ` Anirban Sinha
@ 2009-10-18 20:19           ` Anirban Sinha
  2009-10-19 12:13             ` Oleg Nesterov
  0 siblings, 1 reply; 11+ messages in thread
From: Anirban Sinha @ 2009-10-18 20:19 UTC (permalink / raw)
  To: linux-kernel, Oleg Nesterov; +Cc: David Miller, netdev, Anirban Sinha

Hi Oleg:

I have a question for you. The queue_work() routine which is called from schedule_work() does a put_cpu() which in turn does a enable_preempt(). Is this an attempt to trigger the scheduler? One of the side affects of this enable_preempt() is the crash that we see below. What is happening is that  a timer callback routine, in  this case inet_twdr_hangman(), tries a bunch of cleanup until a threshold is reached. If further cleanups needs to be done beyond the threshold, it queues a work function. Now when the timer callback is run in __run_timers(), the routine grabs the value of preempt_count before and after the callback function call. If the two counts do not match, it calls BUG() (line 1037 in kernel/timer.c). Is is it illegal to schedule a work function from within a timer callback? What would be a good solution? I have already posted in netdev but since workqueues and timers are general kernel infrastructure, I thought I might as well post the question in the main linux m
ailing list and to you.

Here's the output from my instrumented BUG() call:

[02:15:15.941981] Kernel panic - not syncing: <3>huh, entered ffffffff803fbd60
(inet_twdr_hangman+0x0/0xe0)with preempt_count 00000102, exited with 00000101?

 I was thinking of a hacky solution, to replace schedule_work() with schedule_delayed_work() just to get around the issue. But I am sure this is just too hacky and probably not the ideal solution ...

Cheers,

Ani


Once upon a time, like on 09-10-17 7:35 PM, Anirban Sinha wrote:
> 
> 
> Once upon a time, like on 09-10-17 10:57 AM, Anirban Sinha wrote:
>> On Thu, 8 Oct 2009, David Miller wrote:
>>
>>>>>> We are noticing a kernel OOPS on 2.6.26 kernel when we issue the command
>>>>>> "clear ip bgp <bgp-peer-ip>" on Quagga BGP routing software.
> 
> and btw, this is the crash (on mips) we are talking about:
> 
> # [23:10:35.108808] Kernel bug detected[#1]:
> [23:10:35.112527] Cpu 0
> [23:10:35.114676] $ 0   : 0000000000000000 0000000014001fe0
> 0000000000000066 0000000000000004
> [23:10:35.122845] $ 4   : ffffffff80516c10 0000000014001fe0
> ffffffff8050c010 0000000000000004
> [23:10:35.131015] $ 8   : 0000000000000000 0000000000000041
> ffffffff805142e8 0000000000000001
> [23:10:35.139184] $12   : ffffffff80600000 ffffffff805f0000
> 0000000000000064 0000000000000190
> [23:10:35.147354] $16   : 0000000000000102 ffffffff803afdf0
> ffffffff80539040 ffffffff80600780
> [23:10:35.155526] $20   : ffffffff80540000 0000000000200200
> ffffffff804c0000 000000000000000a
> [23:10:35.163695] $24   : a3d70a3d70a3d70b 8000000000000003
> [23:10:35.171865] $28   : ffffffff8050c000 ffffffff8050fd90
> 9000000010030000 ffffffff801487a8
> [23:10:35.180035] Hi    : 0000000000000000
> [23:10:35.183819] Lo    : 0000000000000000
> [23:10:35.187603] epc   : ffffffff801487a8 run_timer_softirq+0x198/0x258
> Tainted: P
> [23:10:35.196032] ra    : ffffffff801487a8 run_timer_softirq+0x198/0x258
> [23:10:35.202395] Status: 14001fe3    KX SX UX KERNEL EXL IE
> [23:10:35.207814] Cause : 00808024
> [23:10:35.210911] PrId  : 01041100 (SiByte SB1A)
> [23:10:35.215209] Modules linked in: xt_state ipt_REJECT iptable_filter
> nf_conntrack_ftp ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4
> ip_tables ebtable_filter ebtables bridge llc zeug_ipmcdrv(P) irqdisp(P)
> zvirt(P) zeugmod(P) softdog
> [23:10:35.236024] Process swapper (pid: 0, threadinfo=ffffffff8050c000,
> task=ffffffff805142e8, tls=0000000000000000)
> [23:10:35.246169] Stack : ffffffff8050fd90 ffffffff8050fd90
> 0000000014001fe0 ffffffff805ff3e0
> [23:10:35.254166]         ffffffff806003c4 0000000000000001
> ffffffff8053f650 ffffffff805706d0
> [23:10:35.262337]         ffffffff80572020 ffffffff80142280
> ffffffff806003c0 0000000000000000
> [23:10:35.270507]         0000000014001fe0 000000000000c5b0
> ffffffff8fefc520 ffffffff8feea52c
> [23:10:35.278676]         0000000000000015 0000000000004460
> 0000000000000940 ffffffff8fe1bf00
> [23:10:35.286846]         ffffffff8fffdab0 ffffffff80142410
> 0000000000000000 ffffffff80142778
> [23:10:35.295017]         ffffffff80103d20 ffffffff80103d20
> 0000000000000000 0000000014001fe1
> [23:10:35.303187]         0000000000040000 ffffffff8050c010
> 0000000000000000 a80000017f87c138
> [23:10:35.311357]         0000000014001fe0 ffffffffffff00fe
> 0000000000000004 a80000017e7e0680
> [23:10:35.319528]         0000000000000000 000000000000001d
> ffffffff8050ffe0 0000000000001f00
> [23:10:35.327696]         ...
> [23:10:35.330536] Call Trace:
> [23:10:35.333201] [<ffffffff801487a8>] run_timer_softirq+0x198/0x258
> [23:10:35.339224] [<ffffffff80142280>] __do_softirq+0x198/0x288
> [23:10:35.344812] [<ffffffff80142410>] do_softirq+0xa0/0xa8
> [23:10:35.350057] [<ffffffff80142778>] irq_exit+0x70/0x88
> [23:10:35.355131] [<ffffffff80103d20>] ret_from_irq+0x0/0x4
> [23:10:35.360377] [<ffffffff801063f4>] cpu_idle+0x1c/0x88
> [23:10:35.365455]
> [23:10:35.367171]
> [23:10:35.367174] Code: 0040382d  0c04ef4c  00000000 <0200000d> 0c10ee9c
> 0260202d  dfa60000  17a6ffe5  00000000
> [23:10:35.378822] Kernel panic - not syncing: Fatal exception in
> interrupt
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Kernel oops when clearing bgp neighbor info with TCP MD5SUM enabled
  2009-10-18 20:19           ` Anirban Sinha
@ 2009-10-19 12:13             ` Oleg Nesterov
  2009-10-19 15:32               ` Anirban Sinha
  2009-10-20  0:56               ` Anirban Sinha
  0 siblings, 2 replies; 11+ messages in thread
From: Oleg Nesterov @ 2009-10-19 12:13 UTC (permalink / raw)
  To: Anirban Sinha; +Cc: linux-kernel, David Miller, netdev, Anirban Sinha

Hi Anirban,

On 10/18, Anirban Sinha wrote:
>
> I have a question for you. The queue_work() routine which is called from
> schedule_work() does a put_cpu() which in turn does a enable_preempt(). Is
> this an attempt to trigger the scheduler?

No. please note that queue_work() does get_cpu() + put_cpu() to protect
against cpu_down() in between.

This can trigger the scheduler of course, but everything should be OK.

> One of the side affects of
> this enable_preempt() is the crash that we see below. What is happening
> is that  a timer callback routine, in  this case inet_twdr_hangman(),
> tries a bunch of cleanup until a threshold is reached. If further cleanups
> needs to be done beyond the threshold, it queues a work function. Now when
> the timer callback is run in __run_timers(), the routine grabs the value
> of preempt_count before and after the callback function call. If the two
> counts do not match, it calls BUG() (line 1037 in kernel/timer.c).

Yes, but I can't see how queue_work() can be involved, it doesn't change
->preempt_count. Note again it does put after get.

> Is is
> it illegal to schedule a work function from within a timer callback?

Yes sure.

I'd suppose that this unbalance comes from inet_twdr_hangman() pathes.

Could you verify this?

Oleg.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Kernel oops when clearing bgp neighbor info with TCP MD5SUM enabled
  2009-10-19 12:13             ` Oleg Nesterov
@ 2009-10-19 15:32               ` Anirban Sinha
  2009-10-19 15:36                 ` Oleg Nesterov
  2009-10-20  0:56               ` Anirban Sinha
  1 sibling, 1 reply; 11+ messages in thread
From: Anirban Sinha @ 2009-10-19 15:32 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: linux-kernel, David Miller, netdev, Anirban Sinha



Once upon a time, like on 09-10-19 5:13 AM, Oleg Nesterov wrote:
> Hi Anirban,
> 
> On 10/18, Anirban Sinha wrote:
>>
>> I have a question for you. The queue_work() routine which is called from
>> schedule_work() does a put_cpu() which in turn does a enable_preempt(). Is
>> this an attempt to trigger the scheduler?
> 
> No. please note that queue_work() does get_cpu() + put_cpu() to protect
> against cpu_down() in between.

grrr! Ah yes, my eyes failed me (or it saw what I wanted it to see :)). You do have a get_cpu() and put_cpu() together in the same code path. I guess I will have to keep looking at inet_twdr_hangman(). 

>> Is is
>> it illegal to schedule a work function from within a timer callback?
> 
> Yes sure.

hmm. may be in that case, that function needs to be re-written. 

> I'd suppose that this unbalance comes from inet_twdr_hangman() pathes.
> 
> Could you verify this?

I'll keep looking. Thanks for the help Oleg. 

Ani


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Kernel oops when clearing bgp neighbor info with TCP MD5SUM enabled
  2009-10-19 15:32               ` Anirban Sinha
@ 2009-10-19 15:36                 ` Oleg Nesterov
  2009-10-19 16:01                   ` Anirban Sinha
  0 siblings, 1 reply; 11+ messages in thread
From: Oleg Nesterov @ 2009-10-19 15:36 UTC (permalink / raw)
  To: Anirban Sinha; +Cc: linux-kernel, David Miller, netdev, Anirban Sinha

On 10/19, Anirban Sinha wrote:
>
> Once upon a time, like on 09-10-19 5:13 AM, Oleg Nesterov wrote:
>
> >> Is is
> >> it illegal to schedule a work function from within a timer callback?
> >
> > Yes sure.
>
> hmm. may be in that case, that function needs to be re-written.

OOPS!!!! I misread your question, didn't notice "il" above...

I meant: yes sure it _is legal_ to schedule a work from within a timer
callback (in fact it is legal from any context).

Sorry for confusion.

Oleg.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Kernel oops when clearing bgp neighbor info with TCP MD5SUM enabled
  2009-10-19 15:36                 ` Oleg Nesterov
@ 2009-10-19 16:01                   ` Anirban Sinha
  0 siblings, 0 replies; 11+ messages in thread
From: Anirban Sinha @ 2009-10-19 16:01 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: linux-kernel, David Miller, netdev, Anirban Sinha


> I meant: yes sure it _is legal_ to schedule a work from within a timer
> callback (in fact it is legal from any context).

ok, then that part of the function looks fine.

Ani

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Kernel oops when clearing bgp neighbor info with TCP MD5SUM enabled
  2009-10-19 12:13             ` Oleg Nesterov
  2009-10-19 15:32               ` Anirban Sinha
@ 2009-10-20  0:56               ` Anirban Sinha
  2009-10-20  1:08                 ` [PATCH] " Anirban Sinha
  1 sibling, 1 reply; 11+ messages in thread
From: Anirban Sinha @ 2009-10-20  0:56 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Anirban Sinha, linux-kernel, David Miller, netdev


> I'd suppose that this unbalance comes from inet_twdr_hangman() pathes.
>
> Could you verify this?


Yes, I have now verified this. There is indeed an issue with one of the
functions called by inet_twdr_hangman(). The call sequence is:

inet_twdr_hangman() -> inet_twdr_do_twkill_work() -> inet_twsk_put() ->
twsk_destructor().

In this case, the destructor callback is tcp_twsk_destructor() (installed
from line 1208 in net/ipv4/tcp_ipv4.c and line 906 in net/ipv6/tcp_ipv6.c) .
Without the TCP_MD5SUM compiled in, the function is a no-op. However, with the MD5SUM
compiled in, it calls tcp_put_md5_sig_pool() (when keylen is non zero) which
does an unbalanced put_cpu(). I did a grep across the entire tree.
tcp_put_md5_sig_pool() is a matching function for tcp_get_md5_sig_pool() and
in all other TCP IPV4 cases, it is called from net/ipv4/tcp_ipv4.c from
functions tcp_v4_md5_hash_hdr() and


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH] Re: Kernel oops when clearing bgp neighbor info with TCP MD5SUM enabled
  2009-10-20  0:56               ` Anirban Sinha
@ 2009-10-20  1:08                 ` Anirban Sinha
  2009-10-20  1:13                   ` David Miller
  0 siblings, 1 reply; 11+ messages in thread
From: Anirban Sinha @ 2009-10-20  1:08 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Anirban Sinha, linux-kernel, David Miller, netdev



> I'd suppose that this unbalance comes from inet_twdr_hangman() pathes.
>
> Could you verify this?

Yes, I have now verified this. There is indeed an issue with one of the
functions called by inet_twdr_hangman(). The call sequence is:

inet_twdr_hangman() -> inet_twdr_do_twkill_work() -> inet_twsk_put() ->
twsk_destructor().

In this case, the destructor callback is tcp_twsk_destructor() (installed
from line 1208 in net/ipv4/tcp_ipv4.c and line 906 in net/ipv6/tcp_ipv6.c) .
Without the TCP_MD5SUM compiled in, the function is a no-op. However, with the MD5SUM
compiled in, it calls tcp_put_md5_sig_pool() (when keylen is non zero) which
does an unbalanced put_cpu(). I did a grep across the entire tree.
tcp_put_md5_sig_pool() is a matching function for tcp_get_md5_sig_pool() and
in all other TCP IPV4 cases, it is called from net/ipv4/tcp_ipv4.c from
functions tcp_v4_md5_hash_hdr() and tcp_v4_hash_skb() along with the matching
get()
function. So I would think that in tcp_twsk_destructor(), the call should be
replaced by  tcp_free_md5_sig_pool() instead.

Signed-of-by: Anirban Sinha <asinha@zeugmasystems.com>
---

net/ipv4/tcp_minisocks.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index e48c37d..dccc01e 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -363,7 +363,7 @@ void tcp_twsk_destructor(struct sock *sk)
 #ifdef CONFIG_TCP_MD5SIG
        struct tcp_timewait_sock *twsk = tcp_twsk(sk);
        if (twsk->tw_md5_keylen)
-               tcp_put_md5sig_pool();
+               tcp_free_md5sig_pool();
 #endif
 }








^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH] Re: Kernel oops when clearing bgp neighbor info with TCP MD5SUM enabled
  2009-10-20  1:08                 ` [PATCH] " Anirban Sinha
@ 2009-10-20  1:13                   ` David Miller
  2009-10-20  1:17                     ` Anirban Sinha
  0 siblings, 1 reply; 11+ messages in thread
From: David Miller @ 2009-10-20  1:13 UTC (permalink / raw)
  To: asinha; +Cc: oleg, ani, linux-kernel, netdev

From: Anirban Sinha <asinha@zeugmasystems.com>
Date: Mon, 19 Oct 2009 18:08:21 -0700 (PDT)

> @@ -363,7 +363,7 @@ void tcp_twsk_destructor(struct sock *sk)
>  #ifdef CONFIG_TCP_MD5SIG
>         struct tcp_timewait_sock *twsk = tcp_twsk(sk);
>         if (twsk->tw_md5_keylen)
> -               tcp_put_md5sig_pool();
> +               tcp_free_md5sig_pool();
>  #endif
>  }

This has been fixed in the tree for a month of so:

commit 657e9649e745b06675aa5063c84430986cdc3afa
Author: Robert Varga <nite@hq.alert.sk>
Date:   Tue Sep 15 23:49:21 2009 -0700

    tcp: fix CONFIG_TCP_MD5SIG + CONFIG_PREEMPT timer BUG()
    
    I have recently came across a preemption imbalance detected by:
    
    <4>huh, entered ffffffff80644630 with preempt_count 00000102, exited with 00000101?
    <0>------------[ cut here ]------------
    <2>kernel BUG at /usr/src/linux/kernel/timer.c:664!
    <0>invalid opcode: 0000 [1] PREEMPT SMP
    
    with ffffffff80644630 being inet_twdr_hangman().
    
    This appeared after I enabled CONFIG_TCP_MD5SIG and played with it a
    bit, so I looked at what might have caused it.
    
    One thing that struck me as strange is tcp_twsk_destructor(), as it
    calls tcp_put_md5sig_pool() -- which entails a put_cpu(), causing the
    detected imbalance. Found on 2.6.23.9, but 2.6.31 is affected as well,
    as far as I can tell.
    
    Signed-off-by: Robert Varga <nite@hq.alert.sk>
    Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index 045bcfd..624c3c9 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -363,7 +363,7 @@ void tcp_twsk_destructor(struct sock *sk)
 #ifdef CONFIG_TCP_MD5SIG
 	struct tcp_timewait_sock *twsk = tcp_twsk(sk);
 	if (twsk->tw_md5_keylen)
-		tcp_put_md5sig_pool();
+		tcp_free_md5sig_pool();
 #endif
 }
 

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH] Re: Kernel oops when clearing bgp neighbor info with TCP MD5SUM enabled
  2009-10-20  1:13                   ` David Miller
@ 2009-10-20  1:17                     ` Anirban Sinha
  0 siblings, 0 replies; 11+ messages in thread
From: Anirban Sinha @ 2009-10-20  1:17 UTC (permalink / raw)
  To: David Miller; +Cc: oleg, ani, linux-kernel, netdev


> This has been fixed in the tree for a month of so:

Grrrr! Time for me to do a git pull again. The kernel source tree in my work
machine must be out of date by about the same time.


^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2009-10-20  1:17 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <Pine.LNX.4.64.0910081514050.18315@sleet.zeugmasystems.local>
     [not found] ` <20091008.155429.02850661.davem@davemloft.net>
     [not found]   ` <Pine.LNX.4.64.0910081622400.18315@sleet.zeugmasystems.local>
     [not found]     ` <20091008.175703.83006470.davem@davemloft.net>
2009-10-17 17:57       ` Kernel oops when clearing bgp neighbor info with TCP MD5SUM enabled Anirban Sinha
2009-10-18  2:35         ` Anirban Sinha
2009-10-18 20:19           ` Anirban Sinha
2009-10-19 12:13             ` Oleg Nesterov
2009-10-19 15:32               ` Anirban Sinha
2009-10-19 15:36                 ` Oleg Nesterov
2009-10-19 16:01                   ` Anirban Sinha
2009-10-20  0:56               ` Anirban Sinha
2009-10-20  1:08                 ` [PATCH] " Anirban Sinha
2009-10-20  1:13                   ` David Miller
2009-10-20  1:17                     ` Anirban Sinha

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox