* Re: System crash in tcp_fragment()
From: Andi Kleen @ 2002-05-20 20:29 UTC (permalink / raw)
To: george anzinger; +Cc: netdev, linux-net, davem, ak, kuznet, pekkas
> In case you cannot see this, it appears that the addresses
> of each skb are alright, so the assumption is that the skb
> passed to tcp_fragment() has been unlinked while
> tcp_fragment() was doing its thing. This implies a need for
> locking at some higher level and we don't know enough about
> the tcp code to divine where this might best be done.
2.4 TCP should in theory already have enough locking to prevent this
(the socket lock that is acquired by timers and user context socket users).
-Andi
* Re: System crash in tcp_fragment()
From: george anzinger @ 2002-05-20 21:18 UTC (permalink / raw)
To: Andi Kleen; +Cc: netdev, linux-net, davem, kuznet, pekkas
Andi Kleen wrote:
>
> > In case you cannot see this, it appears that the addresses
> > of each skb are alright, so the assumption is that the skb
> > passed to tcp_fragment() has been unlinked while
> > tcp_fragment() was doing its thing. This implies a need for
> > locking at some higher level and we don't know enough about
> > the tcp code to divine where this might best be done.
>
> 2.4 TCP should in theory already have enough locking to prevent this
> (the socket lock that is acquired by timers and user context socket users)
>
> -Andi
Here is another oops, not quite the same, AND with an assert
failure ahead of it. I append the whole report and some
observations:
We had two more panics over the weekend.
Here is the analysis from one of them.
---------comments from Dave Howell--------------
Looking at the sysint4l dump, some observations:
- Panic was due to an Oops (NULL pointer dereference kernel incident).
- Full system configuration is in the kernel startup logs (memory, disks,
  chipsets, etc).
- The last part of the kernel log has the oops info, which follows a kernel
  assertion-failed warning:
<4>KERNRL: assertion (atomic_read(&sk->wmem_alloc) == 0) failed at af_inet.c(174):inet_sock_destruct  <==============
<1>Unable to handle kernel NULL pointer dereference at virtual address 00000049
<4> printing eip:
<4>c0255196
<1>*pde = 00000000
<4>Oops: 0000
<4>CPU: 1
<4>EIP: 0010:[<c0255196>] Not tainted
<4>EFLAGS: 00010213
<4>eax: c6ace4c8 ebx: 00000000 ecx: 00000004 edx: 00000000
<4>esi: c6ace538 edi: c6ace460 ebp: 00000026 esp: c1219eb4
<4>ds: 0018 es: 0018 ss: 0018
<4>Process swapper (pid: 0, stackpage=c1219000)
<4>Stack: c6ace460 c6ace538 c6ace460 004ec3ef c025de3e c6ace460 00000000 c72011a0
<4>       c1218050 004ec2d2 c02395b2 c6ace460 c6ace538 c1218000 004ec3ef c025e056
<4>       c6ace460 c1218000 00000046 004ebfe7 00000000 c1218000 00cf70a0 c0128eaa
<4>Call Trace: [<c025de3e>] [<c02395b2>] [<c025e056>] [<c0128eaa>] [<c025df70>]
<4>   [<c0128fad>] [<c012483b>] [<c0109704>] [<c0105490>] [<c0105490>] [<c0105490>]
<4>   [<c01054bc>] [<c0105542>] [<c011d51b>] [<c011d8ad>]
<4>
<4>Code: 0f b6 4b 49 45 f6 c1 82 74 0c 31 d2 89 96 78 01 00 00 0f b6
- Finally, at the bottom of the trace is the active backtrace. It is a bit
  suspect because it is on the interrupt side (the suspect part is not the
  trace but the process it is attributed to).
===========================
STACK TRACE OF FAILING TASK
===========================
================================================================
STACK TRACE FOR TASK: 0xc1218000 (swapper)
0 tcp_enter_loss+198 [0xc0255196]
1 tcp_retransmit_timer+473 [0xc025de39]
2 tcp_write_timer+225 [0xc025e051]
3 timer_bh+710 [0xc0128ea6]
4 timer_softirq+40 [0xc0128fa8]
5 do_softirq+185 [0xc0124839]
6 do_IRQ+511 [0xc01096ff]
7 do_IRQ+511 [0xc01096ff]
TRACE ERROR 0x1
================================================================
- Compared with the previous dump, it looks like the same upstream event
  occurred, with a timer bottom half running and invoking
  tcp_retransmit_timer. The last one caught it oopsing in the tcp_fragment
  code; this one is a bit different, but the upstream path is the same.
- The same pile of unknown symbol references shows up when bringing up the
  dump manually in lcrash; system.0 or kerntypes.0 must be corrupt or the
  wrong one. Needs a look.
- Dumped tcp_enter_loss+0 to tcp_enter_loss+200 to see the site at
  tcp_enter_loss+198. Code at this site is:
      movzbl 0x49(%ebx),%ecx
  %ebx is NULL at this point (see above), hence the oops at 00000049.
  Code for the function is in net/ipv4/tcp_input.c starting at line 987.
- The failure is in the loop starting at line 1002:
      for_retrans_queue(skb, sk, tp) {
              cnt++;
              if (TCP_SKB_CB(skb)->sacked&TCPCB_RETRANS)
                      tp->undo_marker = 0;
              TCP_SKB_CB(skb)->sacked &= (~TCPCB_TAGBITS)|TCPCB_SACKED_ACKED;
              if (!(TCP_SKB_CB(skb)->sacked&TCPCB_SACKED_ACKED) || how) {
                      TCP_SKB_CB(skb)->sacked &= ~TCPCB_SACKED_ACKED;
                      TCP_SKB_CB(skb)->sacked |= TCPCB_LOST;
                      tp->lost_out++;
              } else {
                      tp->sacked_out++;
                      tp->fackets_out = cnt;
              }
      }
  I didn't fully map the code, but I think the expansion of
      if (TCP_SKB_CB(skb)->sacked&TCPCB_RETRANS)
  is where the zeroed pointer is used. It looks like the intent is that skb
  is the iterator variable looping through the retrans_queue, and it got the
  zero value on some iteration, not the first. So my guess is that a
  corrupted queue element pointer is being picked up and used.
- I would still look upstream at the timer bottom-half invocation: this
  upstream trace is present in both dumps, and it seems like an exception
  path for a timeout that leads to a retransmit.
- The kernel assertion that failed, and likely led to the oops, also needs
  a look. It looks a lot like an allocation failed and returned a NULL
  value; this would be my top culprit to pursue.
  Code from af_inet.c at line 174:
      void inet_sock_destruct(struct sock *sk)
      {
              __skb_queue_purge(&sk->receive_queue);
              __skb_queue_purge(&sk->error_queue);
              if (sk->type == SOCK_STREAM && sk->state != TCP_CLOSE) {
                      printk("Attempt to release TCP socket in state %d %p\n",
                             sk->state, sk);
                      return;
              }
              if (!sk->dead) {
                      printk("Attempt to release alive inet socket %p\n", sk);
                      return;
              }
              BUG_TRAP(atomic_read(&sk->rmem_alloc) == 0);
              BUG_TRAP(atomic_read(&sk->wmem_alloc) == 0);  <<-- assert reported here
              BUG_TRAP(sk->wmem_queued == 0);
              BUG_TRAP(sk->forward_alloc == 0);
              if (sk->protinfo.af_inet.opt)
                      kfree(sk->protinfo.af_inet.opt);
  Continuing on after this likely led to the oops that killed us.
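Dave's hypothesis (a queue element whose next pointer is corrupted partway through the walk) can be made concrete with a user-space sketch. It assumes, as in 2.4, that the write queue is a circular doubly linked list whose head acts as a sentinel and that the retransmit walk stops at send_head; the fake_skb type and checked_retrans_walk() are hypothetical stand-ins, not kernel code. Validating each link before following it reports the first bad element instead of faulting, and it also explains the fault address: reading a field at offset 0x49 through a NULL base pointer touches address 0x49.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for an sk_buff on the TCP write queue. */
struct fake_skb {
    struct fake_skb *next;
    struct fake_skb *prev;
};

/*
 * Walk from head->next toward the head sentinel, stopping early at
 * send_head (pass NULL to walk the whole queue), roughly as a
 * for_retrans_queue()-style loop does.  Each element is validated before
 * it is used: it must be non-NULL and its predecessor must point back at
 * it.  Returns the number of elements counted before send_head or the
 * sentinel, or -1 at the first corrupted link.
 */
static int checked_retrans_walk(struct fake_skb *head,
                                struct fake_skb *send_head)
{
    int cnt = 0;
    struct fake_skb *skb = head->next;

    while (skb != head) {
        if (skb == NULL || skb->prev == NULL || skb->prev->next != skb) {
            fprintf(stderr, "corrupt write-queue link after %d elements\n",
                    cnt);
            return -1;
        }
        if (skb == send_head)
            break;
        cnt++;          /* the real loop would examine the element here */
        skb = skb->next;
    }
    return cnt;
}
```

With send_head set to NULL the walk covers the whole queue; setting one element's next pointer to NULL makes it return -1 at that point rather than dereferencing NULL. A real debugging patch would run such a check under the same locking as the loop it guards.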
--
George Anzinger george@mvista.com
High-res-timers: http://sourceforge.net/projects/high-res-timers/
Real time sched: http://sourceforge.net/projects/rtsched/
Preemption patch: http://www.kernel.org/pub/linux/kernel/people/rml
* Re: System crash in tcp_fragment()
From: kuznet @ 2002-05-20 21:25 UTC (permalink / raw)
To: george anzinger; +Cc: netdev, linux-net, davem, ak, pekkas
Hello!
> Is it possible for the other cpu (or even this one given the
> ksoftirqd stuff) to remove or alter the skb that
> tcp_fragment() is processing?
No.
> What locks, if any, are needed to prevent this?
They are already applied.
Looking at the last lines of the bugzilla thread, I see the answer:
> - Observation, the BUG_TRAP assertion likely should have done something better
> than just report the condition, wonder if an earlier panic or other corrective
> action may have kept the TCP stack moving instead of the oops? Looks lazy to
> me...
Well, add BUG() to BUG_TRAP() to oops it earlier. Maybe this will move
you closer to the real problem.
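Alexey's suggestion can be sketched in user space. The message format ("KERNEL: assertion (...) failed at file(line):function") is the one visible in the logs in this thread; the macros below are a hypothetical imitation, written as expressions that yield 1 on failure so they can be tested, with abort() standing in for the kernel's BUG(). STRICT_BUG_TRAP is an illustrative name, not an existing kernel macro.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Print a failed check in the format seen in the logs above. */
#define REPORT_TRAP(x)                                                   \
    fprintf(stderr, "KERNEL: assertion (%s) failed at %s(%d):%s\n",      \
            #x, __FILE__, __LINE__, __func__)

/* Warn-only check, like 2.4's BUG_TRAP(): report and carry on.
 * Yields 1 on failure, 0 otherwise. */
#define BUG_TRAP(x) ((x) ? 0 : (REPORT_TRAP(x), 1))

/* Hypothetical strict variant: report, then stop at once.  abort() stands
 * in for the kernel's BUG(), so the crash happens at the first broken
 * invariant instead of at a later NULL dereference. */
#define STRICT_BUG_TRAP(x) ((x) ? 0 : (REPORT_TRAP(x), abort(), 1))
```

In the kernel itself the equivalent experiment is simply adding BUG() after the printk in BUG_TRAP() for a test build, so the oops carries the backtrace of whoever first broke the invariant.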
Also, your reasoning about smp_processor_id() sounds strange: the
preemption code must reschedule a thread preempted in the kernel on the
same CPU, and that is enough to avoid trouble with this.
Alexey
* Re: System crash in tcp_fragment()
From: David S. Miller @ 2002-05-20 22:08 UTC (permalink / raw)
To: george; +Cc: netdev, linux-net, ak, kuznet, pekkas
   From: george anzinger <george@mvista.com>
   Date: Mon, 20 May 2002 12:42:08 -0700

   I wonder if you could help me squash a bug in the tcp code.
   Here is what we know thus far:

   An SMP (x386 dual) 2.4.17 kernel crashes with an attempt to
   dereference NULL at the end of tcp_fragment() (in
   net/ipv4/tcp_output.c) while attempting to link in the newly
   created fragment.

   The bugzilla report is:

99% of all such bug reports turn out to be driver bugs
where the net driver frees SKBs improperly or there is some
missing internal locking in the net device driver.
I think your efforts are better spent auditing what net
drivers are being used on this machine :-)
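One way an improperly freed SKB could surface as the wmem_alloc assertion reported earlier: each transmit buffer is charged to its socket's write-memory counter when queued and uncharged when freed (assumed here to correspond to 2.4's skb_set_owner_w() and sock_wfree()). The user-space sketch below uses hypothetical types and names. A leaked buffer leaves the counter positive when the socket is destroyed; a double free, without the debugging flag, subtracts twice and leaves it wrong in the other direction.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical socket with a write-memory counter (cf. sk->wmem_alloc). */
struct fake_sock {
    int wmem_alloc;
};

/* Hypothetical transmit buffer charged to its owning socket. */
struct fake_buf {
    struct fake_sock *owner;
    int truesize;
    bool freed;          /* debugging aid: catches a second free */
};

/* Charge the buffer to the socket when it is queued for transmit. */
static void set_owner_w(struct fake_buf *b, struct fake_sock *sk, int truesize)
{
    b->owner = sk;
    b->truesize = truesize;
    b->freed = false;
    sk->wmem_alloc += truesize;
}

/* Release the buffer and uncharge the socket.  Returns false, leaving the
 * counter untouched, if the buffer was already freed (a double free). */
static bool free_buf(struct fake_buf *b)
{
    if (b->freed)
        return false;
    b->freed = true;
    b->owner->wmem_alloc -= b->truesize;
    return true;
}
```

Without the freed flag, the second free_buf() call would subtract truesize again, which is exactly the kind of accounting damage a destructor-time check such as BUG_TRAP(atomic_read(&sk->wmem_alloc) == 0) is there to catch.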
* Re: System crash in tcp_fragment()
From: george anzinger @ 2002-05-20 23:01 UTC (permalink / raw)
To: kuznet; +Cc: netdev, linux-net, davem, ak, pekkas
kuznet@ms2.inr.ac.ru wrote:
>
> Hello!
>
> > Is it possible for the other cpu (or even this one given the
> > ksoftirqd stuff) to remove or alter the skb that
> > tcp_fragment() is processing?
>
> No.
> > What locks, if any, are needed to prevent this.
>
> They are already applied.
>
> Looking at the last lines of the bugzilla thread, I see the answer:
>
> > - Observation, the BUG_TRAP assertion likely should have done something better
> > than just report the condition, wonder if an earlier panic or other corrective
> > action may have kept the TCP stack moving instead of the oops? Looks lazy to
> > me...
>
> Well, add BUG() to BUG_TRAP() to oops it earlier. Maybe this will move
> you closer to the real problem.
Right. Will do.
>
> And also your reasoning about smp_processor_id() sounds strange,
> preemption code must reschedule thread preempted in the kernel to the
> same cpu, it is enough to avoid troubles with this.
That (ahem) hack was tried and rejected by the powers that
be. I admit that it is more work to find the resulting
problems, but the fixes seem to be easy and do not add to
overhead, at least thus far.
>
> Alexey
* Re: System crash in tcp_fragment()
From: Andi Kleen @ 2002-05-20 23:54 UTC (permalink / raw)
To: kuznet; +Cc: george anzinger, netdev, linux-net, davem, ak, pekkas
> And also your reasoning about smp_processor_id() sounds strange,
> preemption code must reschedule thread preempted in the kernel to the
> same cpu, it is enough to avoid troubles with this.
Unfortunately, it's the truth. Even in_interrupt() is currently buggy
on SMP with preemption.
-Andi
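The smp_processor_id() concern is that a preemptible task can read its CPU number, be preempted and migrated, and then use the stale number to index another CPU's per-CPU data. The usual remedy, and the pattern behind the get_cpu()/put_cpu() pair in later Linux kernels, is to disable preemption around every use of the CPU number. The sketch below only simulates this in user space: the task struct, counters, and try_migrate() are hypothetical, and no real scheduling takes place.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_FAKE_CPUS 2

/* Simulated per-task state: current CPU and preemption-disable depth. */
struct fake_task {
    int cpu;
    int preempt_count;
};

static long per_cpu_counter[NR_FAKE_CPUS];

/* Simulated scheduler: migration is refused while preemption is off. */
static bool try_migrate(struct fake_task *t, int new_cpu)
{
    if (t->preempt_count > 0)
        return false;
    t->cpu = new_cpu;
    return true;
}

static void fake_preempt_disable(struct fake_task *t) { t->preempt_count++; }
static void fake_preempt_enable(struct fake_task *t)  { t->preempt_count--; }

/* Checked CPU-id read: the result is only stable while preemption is off. */
static int checked_processor_id(const struct fake_task *t)
{
    assert(t->preempt_count > 0 && "CPU id used while preemptible");
    return t->cpu;
}

/* Bump this CPU's counter safely: pin, read id, update, unpin. */
static void count_event(struct fake_task *t, int migrate_to)
{
    fake_preempt_disable(t);
    int cpu = checked_processor_id(t);
    (void)try_migrate(t, migrate_to);   /* a preemption attempt: refused */
    per_cpu_counter[cpu]++;
    fake_preempt_enable(t);
}
```

Because the migration attempt happens while the task is pinned, the counter that gets incremented still belongs to the CPU whose number was read. Dropping the disable/enable pair would trip the assertion in checked_processor_id(), which is the kind of runtime check later kernels added for this class of bug.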
* Network oops
From: george anzinger @ 2002-06-09 17:31 UTC (permalink / raw)
To: kuznet; +Cc: netdev, linux-net, davem, ak, pekkas
Is this sort of thing expected when the NIC is out of poop?
<4>eth1: card reports no resources.
<4>KERNEL: assertion ((int)tcp_packets_in_flight(tp) >= 0) failed at tcp_input.c(956):tcp_sacktag_write_queue
<7>Leak l=8 4
<1>Unable to handle kernel NULL pointer dereference at virtual address 00000049
<4> printing eip:
<4>c025b7aa
<1>*pde = 00000000
<4>Oops: 0000
<4>CPU: 0
<4>EIP: 0010:[<c025b7aa>] Not tainted
<4>EFLAGS: 00010207
<4>eax: c40d82c8 ebx: 00000000 ecx: 00000004 edx: 00000000
<4>esi: c40d8338 edi: c40d8260 ebp: c0437eb4 esp: c0437ea4
<4>ds: 0018 es: 0018 ss: 0018
<4>Process swapper (pid: 0, stackpage=c0437000)
<4>Stack: 0000002e c40d8260 c40d8338 c40d8260 c0437ee4 c0264e8b c40d8260 00000000
<4>       00000001 c73cd944 00000009 c0437f10 c887250f c40d8260 c40d8338 c0436000
<4>       c0437f08 c02650a6 c40d8260 c037edb8 00000094 00000000 c0437f08 c40d8260
<4>Call Trace: [<c0264e8b>] [<c887250f>] [<c02650a6>] [<c01296a9>] [<c0264fc0>]
<4>   [<c01297ac>] [<c0124ebb>] [<c0109ae2>] [<c01054f0>] [<c0105530>] [<c01055c2>]
<4>   [<c0105000>]
<4>
<4>Code: 0f b6 4b 49 f6 c1 82 74 0c 31 d2 89 96 78 01 00 00 0f b6 4b
STACK TRACE FOR TASK: 0xc0436000 (swapper)
0 tcp_enter_loss+202 [0xc025b7aa]
1 tcp_retransmit_timer+566 [0xc0264e86]
2 tcp_write_timer+225 [0xc02650a1]
3 timer_bh+710 [0xc01296a6]
4 timer_softirq+39 [0xc01297a7]
5 do_softirq+249 [0xc0124eb9]
6 do_IRQ+525 [0xc0109add]
7 do_IRQ+525 [0xc0109add]
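The assertion that precedes this oops checks that TCP's estimate of packets in flight has not gone negative. As we read the 2.4 headers (treat this as an assumption), the estimate is packets_out - left_out + retrans_out, with left_out = sacked_out + lost_out, computed in unsigned arithmetic. If the counters become inconsistent (for example, more segments marked SACKed or lost than are outstanding), the subtraction wraps to a huge unsigned value, and the (int) cast in the assertion exposes it as negative. A user-space sketch with hypothetical field names:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of the per-connection TCP counters. */
struct fake_tcp_counters {
    uint32_t packets_out;   /* segments sent and not yet cumulatively ACKed */
    uint32_t sacked_out;    /* of those, SACKed by the receiver */
    uint32_t lost_out;      /* of those, marked lost */
    uint32_t retrans_out;   /* retransmissions still outstanding */
};

/* Assumed formula: packets_out - (sacked_out + lost_out) + retrans_out,
 * computed in unsigned arithmetic, so an inconsistency wraps around. */
static uint32_t packets_in_flight(const struct fake_tcp_counters *c)
{
    uint32_t left_out = c->sacked_out + c->lost_out;
    return c->packets_out - left_out + c->retrans_out;
}

/* The sanity check from tcp_sacktag_write_queue(), as a predicate.  The
 * cast relies on two's-complement conversion, as on x86. */
static int in_flight_sane(const struct fake_tcp_counters *c)
{
    return (int32_t)packets_in_flight(c) >= 0;
}
```

With 10 packets out, 3 SACKed, 2 lost and 1 retransmission the estimate is 6; with only 4 out but 5 accounted as left, it wraps and the check fails, which is the state the log reports just before tcp_enter_loss() dereferences NULL.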
* Re: Network oops
From: David S. Miller @ 2002-06-10 4:31 UTC (permalink / raw)
To: george; +Cc: kuznet, netdev, linux-net, ak, pekkas
No mention of what kernel version, what patches applied, etc.
so we cannot help you.
* Re: Network oops
From: David S. Miller @ 2002-06-10 4:32 UTC (permalink / raw)
To: george; +Cc: kuznet, netdev, linux-net, ak, pekkas
Also we need to know what network driver, if preemption is being
used (being from mvista I assume you are using preemption, and if
so make sure you have the preemption networking patches applied).
* Re: Network oops
From: george anzinger @ 2002-06-10 4:48 UTC (permalink / raw)
To: David S. Miller; +Cc: kuznet, netdev, linux-net, ak, pekkas
"David S. Miller" wrote:
>
> No mention of what kernel version, what patches applied, etc.
> so we cannot help you.
Sorry about that. It is a 2.4.17 kernel and the test is to
verify that we have the preempt code right. The question at
hand is whether this is likely a preempt problem or just pure
overload. The stress on the network is rather high.
I would expect that the network code would recover from this
sort of thing, so we are looking for a preempt issue at the
moment. Still, it could just be the way things work in the
2.4.17 kernel so I thought I would ask.
* Re: Network oops
From: David S. Miller @ 2002-06-10 5:06 UTC (permalink / raw)
To: george; +Cc: kuznet, netdev, linux-net, ak, pekkas
   From: george anzinger <george@mvista.com>
   Date: Sun, 09 Jun 2002 21:48:15 -0700

   I would expect that the network code would recover from this
   sort of thing, so we are looking for a preempt issue at the
   moment. Still, it could just be the way things work in the
   2.4.17 kernel so I thought I would ask.

Even though 2.4.17 is pretty old I still think it's a preempt
problem.
* Re: Network oops
From: george anzinger @ 2002-06-20 17:19 UTC (permalink / raw)
To: David S. Miller, kuznet, netdev, linux-net, ak, pekkas, Cress, Andrew R
We need help from someone who knows the network code. I have tried to give all the relevant information below. The machine that last failed can still be examined with kgdb to answer any further questions.
We are working with a 2.4.17 kernel with all the latest preempt patches as well as the high-res-timers patch (which, by the way, has the proposed TIMER_BH conversion to softirq code). Other patches are applied but, when removed, do not affect the behavior described below. The system is a 2-processor SMP machine: <4>CPU0: Intel(R) Pentium(R) III CPU family 1266MHz stepping 01
A break point placed in deliver_to_old_ones() is never hit.
Failures under heavy network stress (4 machines, each measuring network performance, with all 4 machines in the test) occur very seldom. The last failure took 32 hours, the prior one 22 hours.
Failure appears to occur because a call is made to a bogus address that is pulled from an skb.
Here is a back trace of the latest failure:
Program received signal SIGEMT, Emulation trap.
0x00b24d18 in ?? () at af_packet.c:1891
(gdb) bt
#0 0x00b24d18 in ?? () at af_packet.c:1891
#1 0xc0267751 in tcp_v4_destroy_sock (sk=0xc7164260)
at /usr/src/linux-2.4.17-CLT/include/net/tcp.h:1673
#2 0xc02566f1 in tcp_destroy_sock (sk=0xc7164260) at tcp.c:1800
#3 0xc025731a in tcp_close (sk=0xc7164260, timeout=0) at tcp.c:1971
#4 0xc0274f67 in inet_release (sock=0xc2c94160) at af_inet.c:465
#5 0xc02304a2 in sock_release (sock=0xc2c94160) at socket.c:489
#6 0xc0230a50 in sock_close (inode=0xc2c94040, filp=0xc51297e0)
at socket.c:724
#7 0xc014d833 in fput (file=0xc51297e0) at file_table.c:113
#8 0xc014bfe3 in filp_close (filp=0xc51297e0, id=0xc2eb6d60) at open.c:838
#9 0xc014c0b2 in sys_close (fd=4) at open.c:862
#10 0xc010782b in system_call () at af_packet.c:1891
#11 0x40043507 in ?? () at af_packet.c:1891
kgdb caught this in:
0xc0119b3d is in do_page_fault (fault.c:329).
324 * terminate things with extreme prejudice.
325 */
326 #ifdef CONFIG_KGDB
327 if (!user_mode(regs)){
328 kgdb_handle_exception(14,SIGBUS, error_code, regs);
329 return;
330 }
331 #endif
332
333 bust_spinlocks(1);
In both failures we found this in the log buffer just prior to the failure:
<5>\n<4>KERNEL: assertion ((int)tcp_packets_in_flight(tp) >= 0) failed at tcp_input.c(956):tcp_sacktag_write_queue\n
On the assumption that preemption is the root cause of this problem, we have instrumented the preemption code to keep track of the last 100 preemptions. What follows is an edited log of these preemptions. Each entry consists of two words of time (sec, usec) followed by the address (hex and symbolic) and the pid of the process. Preemptions clearly unrelated to the network code have been removed. Most of these were in idle or sched_yield.
0xc0368e60 <preempt_log>: 0x3d101941 0xc9868 0xc025f4bb <tcp_transmit_skb+747> 0x15aa
0xc0368e80 <preempt_log+32>: 0x3d101941 0xec247 0xc025d821 <tcp_copy_to_iovec+129> 0x15aa
0xc0368ea0 <preempt_log+64>: 0x3d101942 0xc1f6 0xc0167ee0 <update_atime> 0x15ae
0xc0368ec0 <preempt_log+96>: 0x3d101942 0x24440 0xc025d821 <tcp_copy_to_iovec+129> 0x15aa
0xc0368ee0 <preempt_log+128>: 0x3d101942 0x26eae 0xc025d821 <tcp_copy_to_iovec+129> 0x15a9
0xc0368f00 <preempt_log+160>: 0x3d101942 0x2a4b4 0xc025d821 <tcp_copy_to_iovec+129> 0x15aa
0xc0368f20 <preempt_log+192>: 0x3d101942 0x2c74a 0xc02558b1 <tcp_prequeue_process+305> 0x15aa
0xc0368f40 <preempt_log+224>: 0x3d101942 0x2d5b6 0xc0255741 <tcp_data_wait+705> 0x15aa
0xc0368f60 <preempt_log+256>: 0x3d101942 0x2e634 0xc02558b1 <tcp_prequeue_process+305> 0x15a9
0xc0368f80 <preempt_log+288>: 0x3d101942 0x2fcaa 0xc024db81 <ip_output+401> 0x15aa
0xc0368fc0 <preempt_log+352>: 0x3d101942 0x30c4f 0xc02558b1 <tcp_prequeue_process+305> 0x15a6
0xc0368fe0 <preempt_log+384>: 0x3d101942 0x33603 0xc02558b1 <tcp_prequeue_process+305> 0x15a6
0xc0369000 <preempt_log+416>: 0x3d101942 0xc5137 0xc011d1ec <remove_wait_queue+156> 0x15a8
0xc0369060 <preempt_log+512>: 0x3d101944 0x6f2e4 0xc025561b <tcp_data_wait+411> 0x15a8
0xc0369120 <preempt_log+704>: 0x3d101948 0xae22f 0xc024dc28 <ip_queue_xmit+24> 0x15b2
0xc0369140 <preempt_log+736>: 0x3d101949 0xbe01 0xc02558b1 <tcp_prequeue_process+305> 0x15b2
0xc03691a0 <preempt_log+832>: 0x3d10194b 0xbd567 0xc025d821 <tcp_copy_to_iovec+129> 0x15b3
0xc0369250 <preempt_log+1008>: 0x3d10194b 0xbde39 0xc01054df <default_idle+47> 0x0
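The instrumentation described above (the last 100 preemptions, each recorded as seconds, microseconds, preempted address, and pid) amounts to a small overwrite-oldest ring buffer. The sketch below is a user-space guess at its shape, not the actual MontaVista patch; the struct layout and function names are hypothetical. It has no locking, whereas an in-kernel version would need to protect the update (for example with interrupts disabled), since preemptions can be recorded on any CPU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PREEMPT_LOG_SIZE 100

/* One record: time (sec, usec), preempted address, and pid. */
struct preempt_rec {
    uint32_t sec;
    uint32_t usec;
    uintptr_t addr;
    int pid;
};

struct preempt_log {
    struct preempt_rec rec[PREEMPT_LOG_SIZE];
    unsigned int next;    /* slot the next record goes into */
    unsigned int count;   /* records stored, saturating at the size */
};

/* Record one preemption, overwriting the oldest entry when full. */
static void preempt_log_add(struct preempt_log *log, uint32_t sec,
                            uint32_t usec, uintptr_t addr, int pid)
{
    struct preempt_rec *r = &log->rec[log->next];

    r->sec = sec;
    r->usec = usec;
    r->addr = addr;
    r->pid = pid;
    log->next = (log->next + 1) % PREEMPT_LOG_SIZE;
    if (log->count < PREEMPT_LOG_SIZE)
        log->count++;
}

/* Return the i-th oldest surviving record (0 = oldest), or NULL. */
static const struct preempt_rec *preempt_log_get(const struct preempt_log *log,
                                                 unsigned int i)
{
    unsigned int oldest;

    if (i >= log->count)
        return NULL;
    oldest = (log->next + PREEMPT_LOG_SIZE - log->count) % PREEMPT_LOG_SIZE;
    return &log->rec[(oldest + i) % PREEMPT_LOG_SIZE];
}
```

Reading entries oldest-first, as preempt_log_get() does, gives the chronological order used in the dump, and the saved address can then be resolved with gdb's "l *ADDR" exactly as in the listings.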
Each of these is pinned to source below:
(gdb) l *tcp_transmit_skb+747
0xc01f4a83 is in tcp_transmit_skb (/usr/src/cvs/hhl-kernel-cambell/linux_test/include/net/tcp.h:1422).
1417 TCPOLEN_TIMESTAMP);
1418 *ptr++ = htonl(tstamp);
1419 *ptr++ = htonl(tp->ts_recent);
1420 }
1421 if (tp->eff_sacks) {
1422 struct tcp_sack_block *sp = tp->dsack ? tp->duplicate_sack : tp->selective_acks;
1423 int this_sack;
1424
1425 *ptr++ = __constant_htonl((TCPOPT_NOP << 24) |
1426 (TCPOPT_NOP << 16) |
(gdb)l * tcp_copy_to_iovec+129
0xc01f2eed is in tcp_copy_to_iovec (tcp_input.c:3157).
3152 int chunk = skb->len - hlen;
3153 int err;
3154
3155 local_bh_enable();
3156 if (skb->ip_summed==CHECKSUM_UNNECESSARY)
3157 err = skb_copy_datagram_iovec(skb, hlen, tp->ucopy.iov, chunk);
3158 else
3159 err = skb_copy_and_csum_datagram_iovec(skb, hlen, tp->ucopy.iov);
3160
3161 if (!err) {
(gdb) l *tcp_prequeue_process+305
0xc01ebba5 is in tcp_recvmsg (tcp.c:1400).
1395 int err;
1396 int target; /* Read at least this many bytes */
1397 long timeo;
1398 struct task_struct *user_recv = NULL;
1399
1400 lock_sock(sk);
1401
1402 TCP_CHECK_TIMER(sk);
1403
1404 err = -ENOTCONN;
(gdb) l* tcp_data_wait+705
0xc01ebb49 is in tcp_prequeue_process (tcp.c:1376).
1371 while ((skb = __skb_dequeue(&tp->ucopy.prequeue)) != NULL)
1372 sk->backlog_rcv(sk, skb);
1373 local_bh_enable();
1374
1375 /* Clear memory counter. */
1376 tp->ucopy.memory = 0;
1377 }
1378
1379 /*
1380 * This routine copies from a sock struct into the user buffer.
(gdb) l *tcp_data_wait+411
0xc01eba23 is in tcp_data_wait (tcp.c:1354).
1349 release_sock(sk);
1350
1351 if (skb_queue_empty(&sk->receive_queue))
1352 timeo = schedule_timeout(timeo);
1353
1354 lock_sock(sk);
1355 clear_bit(SOCK_ASYNC_WAITDATA, &sk->socket->flags);
1356
1357 remove_wait_queue(sk->sleep, &wait);
1358 __set_current_state(TASK_RUNNING);
(gdb) l *ip_queue_xmit+24
0xc024dc28 is in ip_queue_xmit (ip_output.c:351).
346 struct iphdr *iph;
347
348 /* Skip all of this if the packet is already routed,
349 * f.e. by something like SCTP.
350 */
351 rt = (struct rtable *) skb->dst;
352 if (rt != NULL)
353 goto packet_routed;
354
355 /* Make sure we can route this packet. */
(gdb)
(gdb) l * ip_output+401
0xc01e55d9 is in ip_queue_xmit (ip_output.c:344).
339 }
340
341 int ip_queue_xmit(struct sk_buff *skb)
342 {
343 struct sock *sk = skb->sk;
344 struct ip_options *opt = sk->protinfo.af_inet.opt;
345 struct rtable *rt;
346 struct iphdr *iph;
347
348 /* Skip all of this if the packet is already routed,
Any help would be greatly appreciated. We can also probe the system to answer any further questions. As said above, we are assuming this is related to preemption; however, that assumption may be wrong.
Thanks
* Re: Network oops
From: David S. Miller @ 2002-06-21 0:38 UTC (permalink / raw)
To: george; +Cc: kuznet, netdev, linux-net, ak, pekkas, andrew.r.cress
I don't understand, you've completely turned the preemption model of
the kernel upside down and you want _US_ to debug this for you?
This is a lot of work, work I personally don't have time for.
It requires a full audit of the networking in the new preemption
environment.
* Re: Network oops
From: george anzinger @ 2002-06-21 14:16 UTC (permalink / raw)
To: David S. Miller; +Cc: kuznet, netdev, linux-net, ak, pekkas, andrew.r.cress
"David S. Miller" wrote:
>
> I don't understand, you've completely turned the preemption model of
> the kernel upside down and you want _US_ to debug this for you?
You could look at it this way :)
On the other hand, I prefer to think of the linux community as a group of folks working on making one of the best OSs on the planet even better. Its adoption by more and more companies attests to its place in the world. We who work on it, IMHO, should cooperate with each other in our efforts to make the system even better. Sure, we will disagree about some of the things others are doing, and thus the need for some central control of the system. For this we are indebted to Linus. And we may not have time to spend working on the system when others would most like us to, but that is the nature of life and we should respect others in this regard.
So while I would very much like your help, I understand that you may be busy with other things at this time and not have the time. Still, I must ask, just as others ask of me.
>
> This is a lot of work, work I personally don't have time for.
I can understand and accept that.
> It requires a full audit of the networking in the new preemption
> environment.
Any pointers on what to look for would be welcome.
* Re: Network oops
From: David S. Miller @ 2002-06-21 14:17 UTC (permalink / raw)
To: george; +Cc: kuznet, netdev, linux-net, ak, pekkas, andrew.r.cress
   From: george anzinger <george@mvista.com>
   Date: Fri, 21 Jun 2002 07:16:24 -0700

[ BTW please start using newlines in your emails, instead of 1,000
  character monstrosities lacking them. Thanks ]

   On the other hand, I prefer to think of the linux community as a
   group of folks working on making one of the best OSs on the planet
   even better.

Right, and I don't think CONFIG_PREEMPT makes the planet better.
In fact, now that you mention it, I think CONFIG_PREEMPT is a pile
of crap.

   > It requires a full audit of the networking in the new preemption
   > environment.

   Any pointers on what to look for would be welcome.

And you go right back to asking me to do the work for you.
What is wrong with you?
* Re: Network oops
From: george anzinger @ 2002-06-21 15:12 UTC (permalink / raw)
To: David S. Miller; +Cc: kuznet, netdev, linux-net, ak, pekkas, andrew.r.cress
"David S. Miller" wrote:
>
> From: george anzinger <george@mvista.com>
> Date: Fri, 21 Jun 2002 07:16:24 -0700
>
> [ BTW please start using newlines in your emails, instead of 1,000
> character monstrosities lacking them. Thanks ]
Sure. I am trying to find something that works with patches
as well as text. Still looking I guess.
>
> On the other hand, I prefer to think of the linux community as a
> group of folks working on making one of the best OSs on the planet
> even better.
>
> Right, and I don't think CONFIG_PREEMPT makes the planet better.
> In fact, now that you mention it, I think CONFIG_PREEMPT is a pile
> of crap.
Can we agree to disagree :)
>
> > It requires a full audit of the networking in the new preemption
> > environment.
>
> Any pointers on what to look for would be welcome.
>
> And you go right back to asking me to do the work for you.
> What is wrong with you?
I said they would be welcome, not that you have to reply...
* Re: Network oops
From: Andi Kleen @ 2002-06-22 0:55 UTC (permalink / raw)
To: george anzinger
Cc: David S. Miller, kuznet, netdev, linux-net, ak, pekkas, andrew.r.cress
> > It requires a full audit of the networking in the new preemption
> > environment.
>
> Any pointers on what to look for would be welcome.
I would look at the driver, especially races in its skb handling.
-Andi
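Andi's pointer toward skb races in the driver can be illustrated with a hedged sketch of a transmit ring shared by the transmit path and the TX-completion interrupt. Everything here is hypothetical (fake_skb, tx_ring, ring_xmit(), ring_tx_complete()), with a C11 atomic_flag spinlock standing in for a driver spinlock; it is not the eepro100 code. The point is that the ring indices and slot pointers are only touched under the lock, and each slot is cleared before its buffer is released, so the two paths cannot both free the same buffer.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define TX_RING_SIZE 8

/* Hypothetical transmit buffer; 'frees' counts how often it was released. */
struct fake_skb { int frees; };

struct tx_ring {
    atomic_flag lock;                     /* stands in for the driver spinlock */
    struct fake_skb *slot[TX_RING_SIZE];  /* buffer owned by each descriptor */
    unsigned int cur;                     /* next slot xmit will fill */
    unsigned int dirty;                   /* next slot completion will reap */
};

static void ring_init(struct tx_ring *r)
{
    atomic_flag_clear(&r->lock);
    for (size_t i = 0; i < TX_RING_SIZE; i++)
        r->slot[i] = NULL;
    r->cur = r->dirty = 0;
}

static void ring_lock(struct tx_ring *r)
{
    while (atomic_flag_test_and_set(&r->lock))
        ;   /* spin until the holder releases it */
}

static void ring_unlock(struct tx_ring *r) { atomic_flag_clear(&r->lock); }

/* Transmit path: queue a buffer; returns 0, or -1 if the ring is full. */
static int ring_xmit(struct tx_ring *r, struct fake_skb *skb)
{
    int ret = -1;

    ring_lock(r);
    if (r->cur - r->dirty < TX_RING_SIZE) {
        r->slot[r->cur % TX_RING_SIZE] = skb;
        r->cur++;
        ret = 0;
    }
    ring_unlock(r);
    return ret;
}

/* Completion path (the interrupt handler in a real driver): reap up to
 * 'done' finished descriptors, clearing each slot under the lock before
 * its buffer is released. */
static unsigned int ring_tx_complete(struct tx_ring *r, unsigned int done)
{
    unsigned int reaped = 0;

    ring_lock(r);
    while (reaped < done && r->dirty != r->cur) {
        struct fake_skb *skb = r->slot[r->dirty % TX_RING_SIZE];

        r->slot[r->dirty % TX_RING_SIZE] = NULL;
        r->dirty++;
        if (skb)
            skb->frees++;   /* stands in for dev_kfree_skb() */
        reaped++;
    }
    ring_unlock(r);
    return reaped;
}
```

A driver that reads a slot pointer outside the lock, or frees the buffer without clearing the slot, can release the same skb from both paths; that is the pattern worth auditing for, and it would match the accounting damage seen in the TCP counters above.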
* Re: Network oops
From: george anzinger @ 2002-06-10 8:11 UTC (permalink / raw)
To: David S. Miller; +Cc: kuznet, netdev, linux-net, ak, pekkas
"David S. Miller" wrote:
>
> Also we need to know what network driver, if preemption is being
> used (being from mvista I assume you are using preemption, and if
> so make sure you have the preemption networking patches applied).
Uh, are you saying there is a set of network patches? If
so, where might they be found? And yes, we are testing
preemption, trying to wring out the "last" bug :) (Could
you be referring to the patches we are developing?)
The driver (from the system log):
<4>eepro100.c:v1.09j-t 9/29/99 Donald Becker http://cesdis.gsfc.nasa.gov/linux/drivers/eepro100.html
<4>eepro100.c: $Revision: 1.36 $ 2000/11/17 Modified by Andrey V. Savochkin <saw@saw.sw.com.sg> and others
<6>eth0: OEM i82557/i82558 10/100 Ethernet, 00:03:47:BD:60:86, IRQ 21.
<6> Board assembly fab600-000, Physical connectors present: RJ45
<6> Primary interface chip i82555 PHY #1.
<6> General self-test: passed.
<6> Serial sub-system self-test: passed.
<6> Internal registers self-test: passed.
<6> ROM checksum self-test: passed (0xb874c1d3).
<6>eth1: OEM i82557/i82558 10/100 Ethernet, 00:03:47:BD:60:87, IRQ 20.
<6> Board assembly fab600-000, Physical connectors present: RJ45
<6> Primary interface chip i82555 PHY #1.
<6> General self-test: passed.
<6> Serial sub-system self-test: passed.
<6> Internal registers self-test: passed.
<6> ROM checksum self-test: passed (0xb874c1d3).
Looks like two NICs.
* Re: Network oops
[not found] ` <3D045F15.578E1DA9@mvista.com>
@ 2002-06-10 8:31 ` David S. Miller
[not found] ` <20020610.013110.81671593.davem@redhat.com>
1 sibling, 0 replies; 40+ messages in thread
From: David S. Miller @ 2002-06-10 8:31 UTC (permalink / raw)
To: george; +Cc: kuznet, netdev, linux-net, ak, pekkas
From: george anzinger <george@mvista.com>
Date: Mon, 10 Jun 2002 01:11:01 -0700
> "David S. Miller" wrote:
> >
> > Also we need to know what network driver, if preemption is being
> > used (being from mvista I assume you are using preemption, and if
> > so make sure you have the preemption networking patches applied).
>
> Uh, are you saying there is a set of network patches? If
> so, where might they be found? And yes, we are testing
> preemption, trying to wring out the "last" bug :) (Could
> you be referring to the patches we are developing?)
An mvista person recently put "make net preempt safe" patches
into 2.5.x. Basically it amounted to putting a few preempt_disable
sections into net/core/skbuff.c
* Re: Network oops
[not found] ` <20020610.013110.81671593.davem@redhat.com>
@ 2002-06-10 14:12 ` george anzinger
0 siblings, 0 replies; 40+ messages in thread
From: george anzinger @ 2002-06-10 14:12 UTC (permalink / raw)
To: David S. Miller; +Cc: kuznet, netdev, linux-net, ak, pekkas
"David S. Miller" wrote:
>
> From: george anzinger <george@mvista.com>
> Date: Mon, 10 Jun 2002 01:11:01 -0700
>
> "David S. Miller" wrote:
> >
> > Also we need to know what network driver, if preemption is being
> > used (being from mvista I assume you are using preemption, and if
> > so make sure you have the preemption networking patches applied).
>
> Uh, are you saying there is a set of network patches? If
> so, where might they be found? And yes, we are testing
> preemption, trying to wring out the "last" bug :) (Could
> you be referring to the patches we are developing?)
>
> An mvista person recently put "make net preempt safe" patches
> into 2.5.x. Basically it amounted to putting a few preempt_disable
> sections into net/core/skbuff.c
Ah, yes. Those are the ones I have found so far. Robert
Love would be the submitter. It would appear there is at
least one more hole :( Thus the query.
* Re: System crash in tcp_fragment()
[not found] ` <3CE9E466.AC2358EE@mvista.com>
@ 2002-05-21 6:00 ` David S. Miller
[not found] ` <20020520.230021.29510217.davem@redhat.com>
1 sibling, 0 replies; 40+ messages in thread
From: David S. Miller @ 2002-05-21 6:00 UTC (permalink / raw)
To: george; +Cc: niv, kuznet, ak, netdev, linux-net, ak, pekkas
From: george anzinger <george@mvista.com>
Date: Mon, 20 May 2002 23:08:38 -0700
> Nivedita Singhvi wrote:
> >
> > On Mon, 20 May 2002, David S. Miller wrote:
> >
> > > Such rule does not even make this piece of code legal. Consider:
> > >
> > > task1:cpu0: x = counters[smp_processor_id()];
> > > cpu0: PREEMPT
> > > task2:cpu0: x = counters[smp_processor_id()];
> > > task2:cpu0: counters[smp_processor_id()] = x + 1;
> > > cpu0: PREEMPT
> > > task1:cpu0: counters[smp_processor_id()] = x + 1;
> > > full garbage
>
> Maybe someone could tell me if these matter. If you are
> bumping a counter and you switch cpus in the middle, a.)
> does it matter? and b.) if so which cpu should get the
> count? I sort of thought that, if this were going on, it
> did not really matter as long as some counter was bumped.
That's not the problem. We use per-cpu values for each counter (and
when the user asks for the value, we add together the values from
each processor).
Please review the example I quote above, you aren't reading it
carefully enough.
Let us imagine that we are dealing with counter "X", and
that the values at the beginning of the example are:
X[0] = 5
X[1] = 7
X[2] = ...
Actually, no values matter for the purposes of this example
except the one for cpu 0. Here is what happens, watch carefully:
> > task1:cpu0: x = counters[smp_processor_id()];
> > cpu0: PREEMPT
task1 sees 'x' as '5'
> > task2:cpu0: x = counters[smp_processor_id()];
> > task2:cpu0: counters[smp_processor_id()] = x + 1;
> > cpu0: PREEMPT
task2 bumps the counter to '6'
> > task1:cpu0: counters[smp_processor_id()] = x + 1;
> > full garbage
task1 also bumps the counter to '6'
This is the problem. We make these counters non-atomic on purpose
for performance reasons, so do not mention that as a possible fix.
* Re: System crash in tcp_fragment()
[not found] ` <20020520.230021.29510217.davem@redhat.com>
@ 2002-05-21 7:25 ` george anzinger
2002-05-21 9:49 ` Andi Kleen
[not found] ` <3CE9F679.90ACF597@mvista.com>
2 siblings, 0 replies; 40+ messages in thread
From: george anzinger @ 2002-05-21 7:25 UTC (permalink / raw)
To: David S. Miller; +Cc: niv, kuznet, ak, netdev, linux-net, ak, pekkas
"David S. Miller" wrote:
>
> From: george anzinger <george@mvista.com>
> Date: Mon, 20 May 2002 23:08:38 -0700
>
> Nivedita Singhvi wrote:
> >
> > On Mon, 20 May 2002, David S. Miller wrote:
> >
> > > Such rule does not even make this piece of code legal. Consider:
> > >
> > > task1:cpu0: x = counters[smp_processor_id()];
> > > cpu0: PREEMPT
> > > task2:cpu0: x = counters[smp_processor_id()];
> > > task2:cpu0: counters[smp_processor_id()] = x + 1;
> > > cpu0: PREEMPT
> > > task1:cpu0: counters[smp_processor_id()] = x + 1;
> > > full garbage
>
> Maybe someone could tell me if these matter. If you are
> bumping a counter and you switch cpus in the middle, a.)
> does it matter? and b.) if so which cpu should get the
> count? I sort of thought that, if this were going on, it
> did not really matter as long as some counter was bumped.
>
> That's not the problem. We use per-cpu values for each counter (and
> when the user asks for the value, we add together the values from
> each processor).
>
> Please review the example I quote above, you aren't reading it
> carefully enough.
>
> Let us imagine that we are dealing with counter "X", and
> that the values at the beginning of the example are:
>
> X[0] = 5
> X[1] = 7
> X[2] = ...
>
> Actually, no values matter for the purposes of this example
> except the one for cpu 0. Here is what happens, watch carefully:
>
> > > task1:cpu0: x = counters[smp_processor_id()];
> > > cpu0: PREEMPT
>
> task1 sees 'x' as '5'
>
> > > task2:cpu0: x = counters[smp_processor_id()];
> > > task2:cpu0: counters[smp_processor_id()] = x + 1;
> > > cpu0: PREEMPT
>
> task2 bumps the counter to '6'
>
> > > task1:cpu0: counters[smp_processor_id()] = x + 1;
> > > full garbage
>
> task1 also bumps the counter to '6'
>
> This is the problem. We make these counters non-atomic on purpose
> for performance reasons, so do not mention that as a possible fix.
I understand the issue. The question is what is the
result. Bogus numbers do.. what? Does the kernel crash or
does the user think strange things while everything just
keeps on working?
As for the fix, I would think that would be more up to the
network folks than me. Atomic is one option, but you can
also disable preemption. It is really lightweight, and may
already be disabled in some of these cases.
* Re: System crash in tcp_fragment()
[not found] ` <20020520.230021.29510217.davem@redhat.com>
2002-05-21 7:25 ` george anzinger
@ 2002-05-21 9:49 ` Andi Kleen
[not found] ` <3CE9F679.90ACF597@mvista.com>
2 siblings, 0 replies; 40+ messages in thread
From: Andi Kleen @ 2002-05-21 9:49 UTC (permalink / raw)
To: David S. Miller; +Cc: george, niv, kuznet, ak, netdev, linux-net, ak, pekkas
> That's not the problem. We use per-cpu values for each counter (and
> when the user asks for the value, we add together the values from
> each processor).
At least on x86 gcc usually seems to just generate an incl, which should
be ok because it is atomic enough (even when a reschedule happens it will
act as a full memory barrier)
So it'll likely just be a problem for load-store architectures.
-Andi
* Re: System crash in tcp_fragment()
[not found] ` <3CE9F679.90ACF597@mvista.com>
@ 2002-05-21 7:22 ` David S. Miller
2002-05-21 12:47 ` kuznet
2002-05-21 12:54 ` Andi Kleen
2 siblings, 0 replies; 40+ messages in thread
From: David S. Miller @ 2002-05-21 7:22 UTC (permalink / raw)
To: george; +Cc: niv, kuznet, ak, netdev, linux-net, ak, pekkas
From: george anzinger <george@mvista.com>
Date: Tue, 21 May 2002 00:25:46 -0700
> I understand the issue. The question is what is the
> result. Bogus numbers do.. what? Does the kernel crash or
> does the user think strange things while everything just
> keeps on working?
It's a quality of implementation issue.
It is just one example of a place where we use these kinds
of assumptions to index tables and the like.
* Re: System crash in tcp_fragment()
[not found] ` <3CE9F679.90ACF597@mvista.com>
2002-05-21 7:22 ` David S. Miller
@ 2002-05-21 12:47 ` kuznet
2002-05-21 15:42 ` george anzinger
2002-05-21 12:54 ` Andi Kleen
2 siblings, 1 reply; 40+ messages in thread
From: kuznet @ 2002-05-21 12:47 UTC (permalink / raw)
To: george anzinger; +Cc: davem, niv, ak, netdev, linux-net, ak, pekkas
Hello!
> I understand the issue.
I do not.
+#define preempt_disable() \
+do { \
+ ++current->preempt_count; \
+ barrier(); \
+} while (0)
Why does this work?
Alexey
* Re: System crash in tcp_fragment()
[not found] ` <3CE9F679.90ACF597@mvista.com>
2002-05-21 7:22 ` David S. Miller
2002-05-21 12:47 ` kuznet
@ 2002-05-21 12:54 ` Andi Kleen
2 siblings, 0 replies; 40+ messages in thread
From: Andi Kleen @ 2002-05-21 12:54 UTC (permalink / raw)
To: george anzinger
Cc: David S. Miller, niv, kuznet, ak, netdev, linux-net, ak, pekkas
> As for the fix, I would think that would be more up to the
> network folks than me. Atomic is one option, but you can
> also disable preemption. It is really lightweight, and may
> already be disabled in some of these cases.
In most of them because it is running in a softirq.
Just the user context stuff has a possible problem on non i386/x86-64.
-Andi
* Re: System crash in tcp_fragment()
[not found] <Pine.LNX.4.33.0205201836160.9301-100000@w-nivedita2.des.beaverton.ibm.com>
[not found] ` <3CE9E466.AC2358EE@mvista.com>
@ 2002-05-21 6:08 ` george anzinger
1 sibling, 0 replies; 40+ messages in thread
From: george anzinger @ 2002-05-21 6:08 UTC (permalink / raw)
To: Nivedita Singhvi
Cc: David S. Miller, kuznet, ak, netdev, linux-net, ak, pekkas
Nivedita Singhvi wrote:
>
> On Mon, 20 May 2002, David S. Miller wrote:
>
> > Such rule does not even make this piece of code legal. Consider:
> >
> > task1:cpu0: x = counters[smp_processor_id()];
> > cpu0: PREEMPT
> > task2:cpu0: x = counters[smp_processor_id()];
> > task2:cpu0: counters[smp_processor_id()] = x + 1;
> > cpu0: PREEMPT
> > task1:cpu0: counters[smp_processor_id()] = x + 1;
> > full garbage
> >
> > But it does bring up important point, preemption people need to
> > fully audit entire networking.
> >
> > It is totally broken by preemption the more I think about it.
> >
> > At the very beginning, all the SNMP counter bumping tricks will
> > totally fail with preemption enabled.
Maybe someone could tell me if these matter. If you are
bumping a counter and you switch cpus in the middle, a.)
does it matter? and b.) if so which cpu should get the
count? I sort of thought that, if this were going on, it
did not really matter as long as some counter was bumped.
> >
>
> A lot of the synchronization between process context and interrupt
> context is based on per-cpu data structures or simple locks
> (without disabling irq's globally) eg:
>
> softnet_data queue (we only disable local interrupts), and
> synchronization between tcp_readmsg() and tcp_rcv() over
> the receive queue would get confused (lock.users flag would
> be different on another CPU)..
Disabling local interrupts also disables preemption, as does
interrupt context.
>
> Wonder how any of it could possibly work..
It seems to take a LOT of work to break it. Even then, I
think the problem at hand is in the driver (a new one from
the Intel folks).
* Re: System crash in tcp_fragment()
[not found] <20020520.173416.105610032.davem@redhat.com>
@ 2002-05-21 1:00 ` kuznet
2002-05-21 1:49 ` Nivedita Singhvi
1 sibling, 0 replies; 40+ messages in thread
From: kuznet @ 2002-05-21 1:00 UTC (permalink / raw)
To: David S. Miller; +Cc: george, ak, netdev, linux-net, ak, pekkas
Hello!
> Such rule does not even make this piece of code legal. Consider:
>
> task1:cpu0: x = counters[smp_processor_id()];
> cpu0: PREEMPT
> task2:cpu0: x = counters[smp_processor_id()];
> task2:cpu0: counters[smp_processor_id()] = x + 1;
> cpu0: PREEMPT
> task1:cpu0: counters[smp_processor_id()] = x + 1;
> full garbage
Yup. And this has nothing to do with SMP...
> But it does bring up important point, preemption people need to
> fully audit entire networking.
Well, we can make this. It is too serious. Anyway, this means that
preemptive patch for 2.4 is "tainting" :-)
Alexey
* Re: System crash in tcp_fragment()
[not found] <20020520.173416.105610032.davem@redhat.com>
2002-05-21 1:00 ` kuznet
@ 2002-05-21 1:49 ` Nivedita Singhvi
1 sibling, 0 replies; 40+ messages in thread
From: Nivedita Singhvi @ 2002-05-21 1:49 UTC (permalink / raw)
To: David S. Miller; +Cc: kuznet, george, ak, netdev, linux-net, ak, pekkas
On Mon, 20 May 2002, David S. Miller wrote:
> Such rule does not even make this piece of code legal. Consider:
>
> task1:cpu0: x = counters[smp_processor_id()];
> cpu0: PREEMPT
> task2:cpu0: x = counters[smp_processor_id()];
> task2:cpu0: counters[smp_processor_id()] = x + 1;
> cpu0: PREEMPT
> task1:cpu0: counters[smp_processor_id()] = x + 1;
> full garbage
>
> But it does bring up important point, preemption people need to
> fully audit entire networking.
>
> It is totally broken by preemption the more I think about it.
>
> At the very beginning, all the SNMP counter bumping tricks will
> totally fail with preemption enabled.
>
A lot of the synchronization between process context and interrupt
context is based on per-cpu data structures or simple locks
(without disabling irq's globally) eg:
softnet_data queue (we only disable local interrupts), and
synchronization between tcp_readmsg() and tcp_rcv() over
the receive queue would get confused (lock.users flag would
be different on another CPU)..
Wonder how any of it could possibly work..
thanks,
Nivedita
* Re: System crash in tcp_fragment()
[not found] ` <200205210041.EAA04407@sex.inr.ac.ru>
@ 2002-05-21 0:34 ` David S. Miller
0 siblings, 0 replies; 40+ messages in thread
From: David S. Miller @ 2002-05-21 0:34 UTC (permalink / raw)
To: kuznet; +Cc: george, ak, netdev, linux-net, ak, pekkas
From: kuznet@ms2.inr.ac.ru
Date: Tue, 21 May 2002 04:41:39 +0400 (MSD)
> +Two similar problems arise. An example code snippet:
> +
> + struct this_needs_locking tux[NR_CPUS];
> + tux[smp_processor_id()] = some_value;
> + /* task is preempted here... */
> + something = tux[smp_processor_id()];
>
> If you are not going to break the whole kernel, just make sure that
> tasks preempted in the kernel do not migrate. That's all, simple & stupid.
Such rule does not even make this piece of code legal. Consider:
task1:cpu0: x = counters[smp_processor_id()];
cpu0: PREEMPT
task2:cpu0: x = counters[smp_processor_id()];
task2:cpu0: counters[smp_processor_id()] = x + 1;
cpu0: PREEMPT
task1:cpu0: counters[smp_processor_id()] = x + 1;
full garbage
But it does bring up important point, preemption people need to
fully audit entire networking.
It is totally broken by preemption the more I think about it.
At the very beginning, all the SNMP counter bumping tricks will
totally fail with preemption enabled.
* Re: System crash in tcp_fragment()
[not found] <3CE9960D.15D41380@mvista.com>
[not found] ` <200205210041.EAA04407@sex.inr.ac.ru>
@ 2002-05-21 0:41 ` kuznet
1 sibling, 0 replies; 40+ messages in thread
From: kuznet @ 2002-05-21 0:41 UTC (permalink / raw)
To: george anzinger; +Cc: ak, netdev, linux-net, davem, ak, pekkas
Hello!
> > if [ "$CONFIG_SMP" = "n" ]; then
> > bool 'Preemptible Kernel' CONFIG_PREEMPT
> > fi
> >
> That is not a fix! It is dodging the issue. ;)
It is the only real fix, if you are going to follow this instruction:
+RULE #1: Per-CPU data structures need explicit protection
+
+
+Two similar problems arise. An example code snippet:
+
+ struct this_needs_locking tux[NR_CPUS];
+ tux[smp_processor_id()] = some_value;
+ /* task is preempted here... */
+ something = tux[smp_processor_id()];
+
+First, since the data is per-CPU, it may not have explicit SMP locking, but
+require it otherwise. Second, when a preempted task is finally rescheduled,
+the previous value of smp_processor_id may not equal the current. You must
+protect these situations by disabling preemption around them.
If you are not going to break the whole kernel, just make sure that
tasks preempted in the kernel do not migrate. That's all, simple & stupid.
Alexey
* Re: System crash in tcp_fragment()
[not found] <20020521015407.A1296@wotan.suse.de>
@ 2002-05-21 0:11 ` kuznet
[not found] ` <3CE99434.20E7479C@mvista.com>
` (3 more replies)
0 siblings, 4 replies; 40+ messages in thread
From: kuznet @ 2002-05-21 0:11 UTC (permalink / raw)
To: Andi Kleen; +Cc: george, netdev, linux-net, davem, ak, pekkas
Hello!
> It's unfortunately the truth. Even in_interrupt() is currently buggy
> on SMP preemptive kernels.
And why do the folks working on the preemptive kernel not repair this?
As far as I remember, it is a well-known issue.
Alexey
* Re: System crash in tcp_fragment()
[not found] ` <3CE99434.20E7479C@mvista.com>
@ 2002-05-21 0:18 ` David S. Miller
2002-05-21 0:39 ` Andi Kleen
1 sibling, 0 replies; 40+ messages in thread
From: David S. Miller @ 2002-05-21 0:18 UTC (permalink / raw)
To: george; +Cc: kuznet, ak, netdev, linux-net, ak, pekkas
From: george anzinger <george@mvista.com>
Date: Mon, 20 May 2002 17:26:29 -0700
> Ah, but we have. in_interrupt() was fixed about a month
> ago. I think Robert got it into all the patches. What's
> more, it is a bit faster too :)
Well, back to the main point, this code is pretty stable by
any measurement.
I truly believe it is some side effect of either a preemption
gotcha or a driver bug.
* Re: System crash in tcp_fragment()
[not found] ` <3CE99434.20E7479C@mvista.com>
2002-05-21 0:18 ` David S. Miller
@ 2002-05-21 0:39 ` Andi Kleen
1 sibling, 0 replies; 40+ messages in thread
From: Andi Kleen @ 2002-05-21 0:39 UTC (permalink / raw)
To: george anzinger; +Cc: kuznet, Andi Kleen, netdev, linux-net, davem, ak, pekkas
On Tue, May 21, 2002 at 02:26:29AM +0200, george anzinger wrote:
> kuznet@ms2.inr.ac.ru wrote:
> >
> > Hello!
> >
> > > It's unfortunately the truth. Even in_interrupt() is currently buggy
> > > on SMP preemptive kernels.
> >
> > And why do the folks working on the preemptive kernel not repair this?
> > As far as I remember, it is a well-known issue.
> >
> Ah, but we have. in_interrupt() was fixed about a month
> ago. I think Robert got it into all the patches. What's
> more, it is a bit faster too :)
It's not fixed in 2.5.15/16. I don't know about your patches.
-Andi
* Re: System crash in tcp_fragment()
2002-05-21 0:11 ` kuznet
[not found] ` <3CE99434.20E7479C@mvista.com>
@ 2002-05-21 0:20 ` Andi Kleen
2002-05-21 0:26 ` george anzinger
[not found] ` <20020521022007.A6248@wotan.suse.de>
3 siblings, 0 replies; 40+ messages in thread
From: Andi Kleen @ 2002-05-21 0:20 UTC (permalink / raw)
To: kuznet; +Cc: Andi Kleen, george, netdev, linux-net, davem, ak, pekkas
On Tue, May 21, 2002 at 04:11:00AM +0400, A.N.Kuznetsov wrote:
> Hello!
>
> > It's unfortunately the truth. Even in_interrupt() is currently buggy
> > on SMP preemptive kernels.
>
> And why do the folks working on the preemptive kernel not repair this?
> As far as I remember, it is a well-known issue.
It is fixed on x86-64 @)
if [ "$CONFIG_SMP" = "n" ]; then
bool 'Preemptible Kernel' CONFIG_PREEMPT
fi
-Andi
* Re: System crash in tcp_fragment()
2002-05-21 0:11 ` kuznet
[not found] ` <3CE99434.20E7479C@mvista.com>
2002-05-21 0:20 ` Andi Kleen
@ 2002-05-21 0:26 ` george anzinger
[not found] ` <20020521022007.A6248@wotan.suse.de>
3 siblings, 0 replies; 40+ messages in thread
From: george anzinger @ 2002-05-21 0:26 UTC (permalink / raw)
To: kuznet; +Cc: Andi Kleen, netdev, linux-net, davem, ak, pekkas
kuznet@ms2.inr.ac.ru wrote:
>
> Hello!
>
> > It's unfortunately the truth. Even in_interrupt() is currently buggy
> > on SMP preemptive kernels.
>
> And why do the folks working on the preemptive kernel not repair this?
> As far as I remember, it is a well-known issue.
>
Ah, but we have. in_interrupt() was fixed about a month
ago. I think Robert got it into all the patches. What's
more, it is a bit faster too :)
* System crash in tcp_fragment()
@ 2002-05-20 19:42 george anzinger
0 siblings, 0 replies; 40+ messages in thread
From: george anzinger @ 2002-05-20 19:42 UTC (permalink / raw)
To: netdev, linux-net, davem, ak, kuznet, pekkas
I wonder if you could help me squash a bug in the tcp code.
Here is what we know thus far:
An SMP (x386 dual) 2.4.17 kernel crashes with an attempt to
dereference NULL at the end of tcp_fragment() (in
net/ipv4/tcp_output.c) while attempting to link in the newly
created fragment.
The bugzilla report is:
http://www.telecomlinux.org/bugzilla/show_bug.cgi?id=503
In case you cannot see this, it appears that the addresses
of each skb are alright, so the assumption is that the skb
passed to tcp_fragment() has been unlinked while
tcp_fragment() was doing its thing. This implies a need for
locking at some higher level and we don't know enough about
the tcp code to divine where this might best be done.
Here is the call stack:
Panic screen:
<1>Unable to handle kernel NULL pointer dereference at virtual address 00000004
<4> printing eip:
<4>c0256fb2
<1>*pde = 00000000
<4>Oops: 0002
<4>CPU: 1
<4>EIP: 0010:[<c0256fb2>] Not tainted
<4>EFLAGS: 00010296
<4>eax: 00000000 ebx: c4d3ada0 ecx: c4d3ada0 edx: 00000000
<4>esi: c4e60780 edi: 000005a8 ebp: 00000610 esp: c1219e78
<4>ds: 0018 es: 0018 ss: 0018
<4>Process swapper (pid: 0, stackpage=c1219000)
<4>Stack: c4c84478 00000064 c88937cd 00006270 00000010 c4e60780 c4c84478 000005a8
<4> 000005a8 c025787f c4c843a0 c4e60780 000005a8 c4c843a0 c4c84478 c4c843a0
<4> 004bd6a9 c0259a32 c4c843a0 c4e60780 c4c843a0 00000000 c1219ee8 00004050
<4>Call Trace: [<c88937cd>] [<c025787f>] [<c0259a32>] [<c01bedc5>] [<c0259c36>]
<4> [<c0128d6a>] [<c0259b50>] [<c0128e6d>] [<c01246fb>] [<c0109604>] [<c0105490>]
<4> [<c0105490>] [<c0105490>] [<c01054bc>] [<c0105542>] [<c011d3db>] [<c011d76d>]
<4>
<4>Code: 89 5a 04 89 1e 89 43 08 ff 40 08 31 c0 83 c4 14 5b 5e 5f 5d
<1>Dumping from interrupt handler !
<1>Uncertain scenario - but will try my best
<4>
<4>dump: Dumping to device 0x806 [sd(8,6)] on CPU 1 ...
<4>dump: Compression value is 0x0, Writing dump header
<4>
<4>dump: Pass 1: Saving Reserved Pages:
<4>dump: Memory Bank[0]: 0 ... 7feffff:
[...]
lcrash backtrace:
>> bt
================================================================
STACK TRACE FOR TASK: 0xc1218000(swapper)
0 tcp_fragment+674 [0xc0256fb2]
1 tcp_retransmit_skb+170 [0xc025787a]
2 tcp_retransmit_timer+493 [0xc0259a2d]
3 tcp_write_timer+225 [0xc0259c31]
4 timer_bh+710 [0xc0128d66]
5 timer_softirq+40 [0xc0128e68]
6 do_softirq+185 [0xc01246f9]
7 do_IRQ+511 [0xc01095ff]
8 do_IRQ+511 [0xc01095ff]
TRACE ERROR 0x1
================================================================
We assumed that this might be related to preempt code in the
kernel; however, this now appears unlikely. The primary
cause of preempt-related failures is the use of
unprotected "cpu ids" to access "per cpu" data structures.
To address this, we have changed the "skb" management
code so that the smp_processor_id() calls are made inside
the relevant interrupts-off regions. This crash, however,
does not seem to involve any such per-CPU access.
Is it possible for the other cpu (or even this one, given the
ksoftirqd stuff) to remove or alter the skb that
tcp_fragment() is processing? What locks, if any, are
needed to prevent this?
end of thread, other threads:[~2002-06-28 19:56 UTC | newest]
Thread overview: 40+ messages
[not found] <3CE95190.75C52E2D@mvista.com>
2002-05-20 20:29 ` System crash in tcp_fragment() Andi Kleen
[not found] ` <20020520222937.A1467@averell>
2002-05-20 21:18 ` george anzinger
2002-05-20 21:25 ` kuznet
2002-05-20 22:08 ` David S. Miller
[not found] ` <200205202125.BAA03545@sex.inr.ac.ru>
2002-05-20 23:01 ` george anzinger
2002-05-20 23:54 ` Andi Kleen
2002-06-09 17:31 ` Network oops george anzinger
[not found] ` <3D0390E2.1B80ADEE@mvista.com>
2002-06-10 4:31 ` David S. Miller
2002-06-10 4:32 ` David S. Miller
[not found] ` <20020609.213150.32126725.davem@redhat.com>
2002-06-10 4:48 ` george anzinger
[not found] ` <3D042F8F.72764243@mvista.com>
2002-06-10 5:06 ` David S. Miller
2002-06-20 17:19 ` george anzinger
[not found] ` <3D120EAE.5A0D365E@mvista.com>
2002-06-21 0:38 ` David S. Miller
[not found] ` <20020620.173805.55219901.davem@redhat.com>
2002-06-21 14:16 ` george anzinger
[not found] ` <3D133538.60B6810C@mvista.com>
2002-06-21 14:17 ` David S. Miller
[not found] ` <20020621.071720.07439917.davem@redhat.com>
2002-06-21 15:12 ` george anzinger
2002-06-22 0:55 ` Andi Kleen
[not found] ` <20020622025551.A1919@averell>
2002-06-28 19:56 ` george anzinger
[not found] ` <20020609.213224.01016187.davem@redhat.com>
2002-06-10 8:11 ` george anzinger
[not found] ` <3D045F15.578E1DA9@mvista.com>
2002-06-10 8:31 ` David S. Miller
[not found] ` <20020610.013110.81671593.davem@redhat.com>
2002-06-10 14:12 ` george anzinger
[not found] <Pine.LNX.4.33.0205201836160.9301-100000@w-nivedita2.des.beaverton.ibm.com>
[not found] ` <3CE9E466.AC2358EE@mvista.com>
2002-05-21 6:00 ` System crash in tcp_fragment() David S. Miller
[not found] ` <20020520.230021.29510217.davem@redhat.com>
2002-05-21 7:25 ` george anzinger
2002-05-21 9:49 ` Andi Kleen
[not found] ` <3CE9F679.90ACF597@mvista.com>
2002-05-21 7:22 ` David S. Miller
2002-05-21 12:47 ` kuznet
2002-05-21 15:42 ` george anzinger
2002-05-21 12:54 ` Andi Kleen
2002-05-21 6:08 ` george anzinger
[not found] <20020520.173416.105610032.davem@redhat.com>
2002-05-21 1:00 ` kuznet
2002-05-21 1:49 ` Nivedita Singhvi
[not found] <3CE9960D.15D41380@mvista.com>
[not found] ` <200205210041.EAA04407@sex.inr.ac.ru>
2002-05-21 0:34 ` David S. Miller
2002-05-21 0:41 ` kuznet
[not found] <20020521015407.A1296@wotan.suse.de>
2002-05-21 0:11 ` kuznet
[not found] ` <3CE99434.20E7479C@mvista.com>
2002-05-21 0:18 ` David S. Miller
2002-05-21 0:39 ` Andi Kleen
2002-05-21 0:20 ` Andi Kleen
2002-05-21 0:26 ` george anzinger
[not found] ` <20020521022007.A6248@wotan.suse.de>
2002-05-21 0:34 ` george anzinger
2002-05-20 19:42 george anzinger