netdev.vger.kernel.org archive mirror
  • * Re: System crash in tcp_fragment()
           [not found] <Pine.LNX.4.33.0205201836160.9301-100000@w-nivedita2.des.beaverton.ibm.com>
           [not found] ` <3CE9E466.AC2358EE@mvista.com>
    @ 2002-05-21  6:08 ` george anzinger
      1 sibling, 0 replies; 25+ messages in thread
    From: george anzinger @ 2002-05-21  6:08 UTC
      To: Nivedita Singhvi
      Cc: David S. Miller, kuznet, ak, netdev, linux-net, ak, pekkas
    
    Nivedita Singhvi wrote:
    > 
    > On Mon, 20 May 2002, David S. Miller wrote:
    > 
    > > Such rule does not even make this piece of code legal.  Consider:
    > >
    > > task1:cpu0:   x = counters[smp_processor_id()];
    > >       cpu0:   PREEMPT
    > > task2:cpu0:   x = counters[smp_processor_id()];
    > > task2:cpu0:   counters[smp_processor_id()] = x + 1;
    > >       cpu0:   PREEMPT
    > > task1:cpu0:   counters[smp_processor_id()] = x + 1;
    > >               full garbage
    > >
    > > But it does bring up important point, preemption people need to
    > > fully audit entire networking.
    > >
    > > It is totally broken by preemption the more I think about it.
    > >
    > > At the very beginning, all the SNMP counter bumping tricks will
    > > totally fail with preemption enabled.
    
    Maybe someone could tell me whether these matter.  If you are
    bumping a counter and you switch CPUs in the middle, (a)
    does it matter, and (b) if so, which CPU should get the
    count?  I had assumed that, if this were going on, it
    did not really matter as long as some counter was bumped.
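
The interleaving David Miller lays out can be replayed deterministically in userspace, which speaks to the first question: one of the two increments is lost even though neither task changes CPU. This is only a sketch, not kernel code. `struct task_model`, `model_load()`, `model_store()` and the two `*_increments()` functions are hypothetical names, and a single `long` stands in for `counters[smp_processor_id()]`.

```c
/* Hypothetical userspace model of two tasks sharing one per-CPU counter slot. */
struct task_model {
    long x;   /* the task-local copy called "x" in the quoted schedule */
};

/* x = counters[smp_processor_id()]; */
static void model_load(struct task_model *t, const long *counter)
{
    t->x = *counter;
}

/* counters[smp_processor_id()] = x + 1; */
static void model_store(const struct task_model *t, long *counter)
{
    *counter = t->x + 1;
}

/* Replays the quoted schedule: task1 loads, is preempted, task2 performs a
 * complete increment, then task1 resumes and stores its stale value. */
long preempted_increments(void)
{
    long counter = 0;
    struct task_model task1, task2;

    model_load(&task1, &counter);    /* task1 reads 0 */
    /* PREEMPT */
    model_load(&task2, &counter);    /* task2 reads 0 */
    model_store(&task2, &counter);   /* counter becomes 1 */
    /* PREEMPT */
    model_store(&task1, &counter);   /* stale x + 1 overwrites task2's update */
    return counter;
}

/* Same two increments with no preemption between load and store. */
long unpreempted_increments(void)
{
    long counter = 0;
    struct task_model task1, task2;

    model_load(&task1, &counter);
    model_store(&task1, &counter);
    model_load(&task2, &counter);
    model_store(&task2, &counter);
    return counter;
}
```

In this model `preempted_increments()` ends with a count of 1 after two bumps, while `unpreempted_increments()` ends with 2, so the concern is a lost count rather than a count landing on the "wrong" CPU.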
    > >
    > 
    > A lot of the synchronization between process context and interrupt
    > context is based on per-CPU data structures or simple locks
    > (without disabling IRQs globally), e.g.:
    > 
    > softnet_data queue (we only disable local interrupts), and
    > synchronization between tcp_readmsg() and tcp_rcv() over
    > the receive queue would get confused (lock.users flag would
    > be different on another CPU)..
    
    Disabling local interrupts also disables preemption, as does
    interrupt context.
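
That rule can be written down as a predicate. A minimal sketch, assuming a simplified model in which a task may be preempted only when no preempt-disable is outstanding, local interrupts are enabled, and the CPU is not in interrupt context. `struct cpu_model` and `model_preemptible()` are hypothetical names, not kernel interfaces.

```c
#include <stdbool.h>

/* Hypothetical model of the per-CPU state that gates preemption. */
struct cpu_model {
    int  preempt_count;   /* outstanding preempt-disable nesting depth */
    bool irqs_disabled;   /* local interrupts are masked on this CPU */
    bool in_interrupt;    /* running in hard or soft interrupt context */
};

/* Preemption is possible only when none of the three conditions holds. */
bool model_preemptible(const struct cpu_model *cpu)
{
    return cpu->preempt_count == 0 &&
           !cpu->irqs_disabled &&
           !cpu->in_interrupt;
}
```

Under this model, code that runs with local interrupts disabled, such as the softnet_data queue handling mentioned above, cannot be preempted part-way through.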
    > 
    > Wonder how any of it could possibly work..
    
    It seems to take a LOT of work to break it.  Even then, I
    think the problem at hand is in the driver (a new one from
    the Intel folks).
    
    -- 
    George Anzinger   george@mvista.com
    High-res-timers:  http://sourceforge.net/projects/high-res-timers/
    Real time sched:  http://sourceforge.net/projects/rtsched/
    Preemption patch: http://www.kernel.org/pub/linux/kernel/people/rml
    
  • [parent not found: <20020520.173416.105610032.davem@redhat.com>]
    [parent not found: <3CE9960D.15D41380@mvista.com>]
    [parent not found: <20020521015407.A1296@wotan.suse.de>]
    [parent not found: <3CE95190.75C52E2D@mvista.com>]
    * System crash in tcp_fragment()
    @ 2002-05-20 19:42 george anzinger
      0 siblings, 0 replies; 25+ messages in thread
    From: george anzinger @ 2002-05-20 19:42 UTC
      To: netdev, linux-net, davem, ak, kuznet, pekkas
    
    I wonder if you could help me squash a bug in the tcp code. 
    Here is what we know thus far:
    
    An SMP (dual x86) 2.4.17 kernel crashes with an attempt to
    dereference NULL at the end of tcp_fragment() (in
    net/ipv4/tcp_output.c) while attempting to link in the newly
    created fragment.
    The bugzilla report is:

    http://www.telecomlinux.org/bugzilla/show_bug.cgi?id=503
    
    In case you cannot see this: the addresses of each skb
    appear to be valid, so our assumption is that the skb
    passed to tcp_fragment() was unlinked while
    tcp_fragment() was working on it.  This implies a need for
    locking at some higher level, and we don't know enough about
    the TCP code to divine where this might best be done.
    
    Here is the call stack:
    
    Panic screen:
        <1>Unable to handle kernel NULL pointer deference at virtual address 00000004
        <4> printing eip:
        <4>c0256fb2
        <1>*pde = 00000000
        <4>Oops: 0002
        <4>CPU:    1
        <4>EIP:    0010:[<c0256fb2>]    Not tainted
        <4>EFLAGS: 00010296
        <4>eax: 00000000   ebx: c4d3ada0   ecx: c4d3ada0   edx: 00000000
        <4>esi: c4e60780   edi: 000005a8   ebp: 00000610   esp: c1219e78
        <4>ds: 0018   es: 0018   ss: 0018
        <4>Process swapper (pid: 0, stackpage=c1219000)
        <4>Stack: c4c84478 00000064 c88937cd 00006270 00000010 c4e60780 c4c84478 000005a8
        <4>       000005a8 c025787f c4c843a0 c4e60780 000005a8 c4c843a0 c4c84478 c4c843a0
        <4>       004bd6a9 c0259a32 c4c843a0 c4e60780 c4c843a0 00000000 c1219ee8 00004050
        <4>Call Trace: [<c88937cd>] [<c025787f>] [<c0259a32>] [<c01bedc5>] [<c0259c36>]
        <4>   [<c0128d6a>] [<c0259b50>] [<c0128e6d>] [<c01246fb>] [<c0109604>] [<c0105490>]
        <4>   [<c0105490>] [<c0105490>] [<c01054bc>] [<c0105542>] [<c011d3db>] [<c011d76d>]
        <4>
        <4>Code: 89 5a 04 89 1e 89 43 08 ff 40 08 31 c0 83 c4 14 5b 5e 5f 5d
        <1>Dumping from interrupt handler !
        <1>Uncertain scenario - but will try my best
        <4>
        <4>dump: Dumping to device 0x806 [sd(8,6)] on CPU 1 ...
        <4>dump: Compression value is 0x0, Writing dump header 
        <4>
        <4>dump: Pass 1: Saving Reserved Pages: 
        <4>dump: Memory Bank[0]: 0 ... 7feffff: 
        [...]
    
    lcrash backtrace:
    >> bt
    ================================================================
    STACK TRACE FOR TASK: 0xc1218000(swapper)
    
     0 tcp_fragment+674 [0xc0256fb2]
     1 tcp_retransmit_skb+170 [0xc025787a]
     2 tcp_retransmit_timer+493 [0xc0259a2d]
     3 tcp_write_timer+225 [0xc0259c31]
     4 timer_bh+710 [0xc0128d66]
     5 timer_softirq+40 [0xc0128e68]
     6 do_softirq+185 [0xc01246f9]
     7 do_IRQ+511 [0xc01095ff]
     8 do_IRQ+511 [0xc01095ff]
    TRACE ERROR 0x1
    ================================================================
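
Reading the Code: bytes as i386 instructions gives `mov %ebx,0x4(%edx)` as the faulting store, followed by `mov %ebx,(%esi)`, `mov %eax,0x8(%ebx)` and `incl 0x8(%eax)`. With edx = 0 that is a write to address 4, which matches the oops, and the sequence looks like a list insert in which the old skb's next pointer (and, with eax = 0, its list pointer) was NULL. The model below is a sketch under that reading, assuming the list linkage is three pointers (next, prev, list) at the start of the skb. `struct skb_model`, `struct queue_model` and the `model_*` functions are hypothetical, and the NULL check in `model_append()` only marks where the unchecked store would fault; it is not a proposed fix.

```c
#include <stddef.h>

struct queue_model;

/* Hypothetical stand-in for the list linkage at the start of an skb. */
struct skb_model {
    struct skb_model   *next;   /* offset 0 */
    struct skb_model   *prev;   /* offset 4 with 32-bit pointers */
    struct queue_model *list;   /* offset 8 with 32-bit pointers */
};

/* Circular queue with a sentinel node and a length counter. */
struct queue_model {
    struct skb_model head;
    unsigned int     qlen;
};

void model_queue_init(struct queue_model *q)
{
    q->head.next = q->head.prev = &q->head;
    q->head.list = q;
    q->qlen = 0;
}

void model_queue_tail(struct queue_model *q, struct skb_model *skb)
{
    skb->next = &q->head;
    skb->prev = q->head.prev;
    q->head.prev->next = skb;
    q->head.prev = skb;
    skb->list = q;
    q->qlen++;
}

/* Unlinking clears the linkage, as a concurrent dequeue would. */
void model_unlink(struct skb_model *skb)
{
    skb->prev->next = skb->next;
    skb->next->prev = skb->prev;
    skb->list->qlen--;
    skb->next = skb->prev = NULL;
    skb->list = NULL;
}

/* Links newsk after old. Returns -1 when old is no longer queued, which is
 * the state in which an unchecked insert would write through NULL. */
int model_append(struct skb_model *old, struct skb_model *newsk)
{
    struct skb_model   *next = old->next;
    struct queue_model *list = old->list;

    if (next == NULL || list == NULL)
        return -1;

    newsk->next = next;
    newsk->prev = old;
    next->prev  = newsk;   /* the store that faults at address 4 if next is NULL */
    old->next   = newsk;
    newsk->list = list;
    list->qlen++;
    return 0;
}
```

Appending after a queued skb succeeds in this model, while appending after one that has been through `model_unlink()` returns -1, which is the situation the oops appears to show.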
    
    We assumed that this might be related to the preempt code in the
    kernel; however, this now appears unlikely.  The primary
    cause of preempt-related failures is the use of
    unprotected "cpu ids" to access "per cpu" data structures.
    To this end we have changed the "skb" management
    code to make the smp_processor_id() calls inside the relevant
    interrupts-off regions; however, this problem does not seem to
    involve any such issue.
    
    Is it possible for the other CPU (or even this one, given the
    ksoftirqd stuff) to remove or alter the skb that
    tcp_fragment() is processing?  What locks, if any, are
    needed to prevent this?
    -- 
    George Anzinger   george@mvista.com
    High-res-timers:  http://sourceforge.net/projects/high-res-timers/
    Real time sched:  http://sourceforge.net/projects/rtsched/
    Preemption patch: http://www.kernel.org/pub/linux/kernel/people/rml
    

    end of thread, other threads:[~2002-05-21 15:42 UTC | newest]
    
    Thread overview: 25+ messages
         [not found] <Pine.LNX.4.33.0205201836160.9301-100000@w-nivedita2.des.beaverton.ibm.com>
         [not found] ` <3CE9E466.AC2358EE@mvista.com>
    2002-05-21  6:00   ` System crash in tcp_fragment() David S. Miller
         [not found]   ` <20020520.230021.29510217.davem@redhat.com>
    2002-05-21  7:25     ` george anzinger
    2002-05-21  9:49     ` Andi Kleen
         [not found]     ` <3CE9F679.90ACF597@mvista.com>
    2002-05-21  7:22       ` David S. Miller
    2002-05-21 12:47       ` kuznet
    2002-05-21 15:42         ` george anzinger
    2002-05-21 12:54       ` Andi Kleen
    2002-05-21  6:08 ` george anzinger
         [not found] <20020520.173416.105610032.davem@redhat.com>
    2002-05-21  1:00 ` kuznet
    2002-05-21  1:49 ` Nivedita Singhvi
         [not found] <3CE9960D.15D41380@mvista.com>
         [not found] ` <200205210041.EAA04407@sex.inr.ac.ru>
    2002-05-21  0:34   ` David S. Miller
    2002-05-21  0:41 ` kuznet
         [not found] <20020521015407.A1296@wotan.suse.de>
    2002-05-21  0:11 ` kuznet
    2002-05-21  0:20   ` Andi Kleen
    2002-05-21  0:26   ` george anzinger
         [not found]   ` <20020521022007.A6248@wotan.suse.de>
    2002-05-21  0:34     ` george anzinger
         [not found]   ` <3CE99434.20E7479C@mvista.com>
    2002-05-21  0:18     ` David S. Miller
    2002-05-21  0:39     ` Andi Kleen
         [not found] <3CE95190.75C52E2D@mvista.com>
    2002-05-20 20:29 ` Andi Kleen
         [not found] ` <20020520222937.A1467@averell>
    2002-05-20 21:18   ` george anzinger
    2002-05-20 21:25 ` kuznet
    2002-05-20 22:08 ` David S. Miller
         [not found] ` <200205202125.BAA03545@sex.inr.ac.ru>
    2002-05-20 23:01   ` george anzinger
    2002-05-20 23:54   ` Andi Kleen
    2002-05-20 19:42 george anzinger
    
