netdev.vger.kernel.org archive mirror
  * Re: System crash in tcp_fragment()
           [not found] <3CE9960D.15D41380@mvista.com>
           [not found] ` <200205210041.EAA04407@sex.inr.ac.ru>
    @ 2002-05-21  0:41 ` kuznet
      1 sibling, 0 replies; 25+ messages in thread
    From: kuznet @ 2002-05-21  0:41 UTC (permalink / raw)
      To: george anzinger; +Cc: ak, netdev, linux-net, davem, ak, pekkas
    
    Hello!
    
    > > if [ "$CONFIG_SMP" = "n" ]; then
    > >    bool 'Preemptible Kernel' CONFIG_PREEMPT
    > > fi
    > > 
    > That is not a fix!  It is dodging the issue. ;)
    
    It is the only real fix, if you are going to follow this instruction:
    
    +RULE #1: Per-CPU data structures need explicit protection
    +
    +
    +Two similar problems arise. An example code snippet:
    +
    +       struct this_needs_locking tux[NR_CPUS];
    +       tux[smp_processor_id()] = some_value;
    +       /* task is preempted here... */
    +       something = tux[smp_processor_id()];
    +
    +First, since the data is per-CPU, it may not have explicit SMP locking, but
    +require it otherwise.  Second, when a preempted task is finally rescheduled,
    +the previous value of smp_processor_id may not equal the current.  You must
    +protect these situations by disabling preemption around them.
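
    For concreteness, the protection that rule calls for looks
    roughly like this with the preempt patch's primitives (tux and
    some_value are the hypothetical names from the snippet above):

        preempt_disable();      /* no preemption, hence no migration */
        tux[smp_processor_id()] = some_value;
        something = tux[smp_processor_id()];
        preempt_enable();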
    
    If you are not going to break all the kernel, just make sure that
    tasks preempted in the kernel do not migrate.  That's all, simple & stupid.
    
    Alexey
    
    * System crash in tcp_fragment()
    @ 2002-05-20 19:42 george anzinger
      0 siblings, 0 replies; 25+ messages in thread
    From: george anzinger @ 2002-05-20 19:42 UTC (permalink / raw)
      To: netdev, linux-net, davem, ak, kuznet, pekkas
    
    I wonder if you could help me squash a bug in the tcp code. 
    Here is what we know thus far:
    
    An SMP (dual i386) 2.4.17 kernel crashes with an attempt to
    dereference NULL at the end of tcp_fragment() (in
    net/ipv4/tcp_output.c) while attempting to link in the newly
    created fragment.
    The bugzilla report is:
    
    
    http://www.telecomlinux.org/bugzilla/show_bug.cgi?id=503
    
    In case you cannot see this: the addresses of each skb appear
    to be valid, so our assumption is that the skb passed to
    tcp_fragment() was unlinked while tcp_fragment() was doing its
    thing.  This implies a need for locking at some higher level,
    and we don't know enough about the tcp code to divine where
    this might best be done.
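
    For what it's worth, the faulting code bytes decode to
    list-link stores, and the fault address 00000004 looks like a
    write to offset 4 (the prev pointer) of a NULL skb.  An
    illustrative sketch of the suspected race, not the actual
    tcp_fragment() source:

        /* Linking the new fragment 'buff' in after 'skb': */
        buff->next = skb->next;
        buff->prev = skb;
        skb->next->prev = buff; /* faults here if skb->next is NULL */
        skb->next = buff;

    As we read the 2.4 code, skb_unlink() sets skb->next and
    skb->prev to NULL, so an skb unlinked by another context
    mid-fragment would produce exactly this kind of dereference.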
    
    Here is the call stack:
    
    Panic screen:
        <1>Unable to handle kernel NULL pointer dereference at virtual address 00000004
        <4> printing eip:
        <4>c0256fb2
        <1>*pde = 00000000
        <4>Oops: 0002
        <4>CPU:    1
        <4>EIP:    0010:[<c0256fb2>]    Not tainted
        <4>EFLAGS: 00010296
        <4>eax: 00000000   ebx: c4d3ada0   ecx: c4d3ada0   edx: 00000000
        <4>esi: c4e60780   edi: 000005a8   ebp: 00000610   esp: c1219e78
        <4>ds: 0018   es: 0018   ss: 0018
        <4>Process swapper (pid: 0, stackpage=c1219000)
        <4>Stack: c4c84478 00000064 c88937cd 00006270 00000010 c4e60780 c4c84478 000005a8
        <4>       000005a8 c025787f c4c843a0 c4e60780 000005a8 c4c843a0 c4c84478 c4c843a0
        <4>       004bd6a9 c0259a32 c4c843a0 c4e60780 c4c843a0 00000000 c1219ee8 00004050
        <4>Call Trace: [<c88937cd>] [<c025787f>] [<c0259a32>] [<c01bedc5>] [<c0259c36>]
        <4>   [<c0128d6a>] [<c0259b50>] [<c0128e6d>] [<c01246fb>] [<c0109604>] [<c0105490>]
        <4>   [<c0105490>] [<c0105490>] [<c01054bc>] [<c0105542>] [<c011d3db>] [<c011d76d>]
        <4>
        <4>Code: 89 5a 04 89 1e 89 43 08 ff 40 08 31 c0 83 c4 14 5b 5e 5f 5d
        <1>Dumping from interrupt handler !
        <1>Uncertain scenario - but will try my best
        <4>
        <4>dump: Dumping to device 0x806 [sd(8,6)] on CPU 1 ...
        <4>dump: Compression value is 0x0, Writing dump header 
        <4>
        <4>dump: Pass 1: Saving Reserved Pages: 
        <4>dump: Memory Bank[0]: 0 ... 7feffff: 
        [...]
    
    lcrash backtrace:
    >> bt
    ================================================================
    STACK TRACE FOR TASK: 0xc1218000(swapper)
    
     0 tcp_fragment+674 [0xc0256fb2]
     1 tcp_retransmit_skb+170 [0xc025787a]
     2 tcp_retransmit_timer+493 [0xc0259a2d]
     3 tcp_write_timer+225 [0xc0259c31]
     4 timer_bh+710 [0xc0128d66]
     5 timer_softirq+40 [0xc0128e68]
     6 do_softirq+185 [0xc01246f9]
     7 do_IRQ+511 [0xc01095ff]
     8 do_IRQ+511 [0xc01095ff]
    TRACE ERROR 0x1
    ================================================================
    
    We assumed that this might be related to the preempt code in
    the kernel; however, that now appears unlikely.  The primary
    source of preempt-related failures is the use of unprotected
    "cpu ids" to access "per cpu" data structures.  To that end we
    have changed the "skb" management code so that the
    smp_processor_id() calls sit inside the relevant
    interrupts-off regions (roughly as sketched below), but this
    problem does not seem to involve any such issue.
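
    (Roughly the shape of those skb changes; skb_pool here is a
    hypothetical per-CPU list name.  With interrupts off the task
    cannot be preempted, so the cpu id cannot go stale mid-access:)

        unsigned long flags;
        int cpu;

        local_irq_save(flags);
        cpu = smp_processor_id();       /* stable while irqs are off */
        __skb_queue_tail(&skb_pool[cpu], skb);
        local_irq_restore(flags);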
    
    Is it possible for the other cpu (or even this one, given the
    ksoftirqd stuff) to remove or alter the skb that
    tcp_fragment() is processing?  What locks, if any, are needed
    to prevent this?
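
    For reference, our reading is that the 2.4 timer path guards
    itself with the socket lock: it takes bh_lock_sock() and backs
    off if a user context currently owns the socket.  A sketch of
    that pattern as we understand it (not the exact 2.4.17
    tcp_write_timer() source; the retry interval, event dispatch,
    and helper name are illustrative):

        static void write_timer_sketch(unsigned long data)
        {
                struct sock *sk = (struct sock *)data;
                struct tcp_opt *tp = &sk->tp_pinfo.af_tcp;

                bh_lock_sock(sk);
                if (sk->lock.users) {
                        /* A user context owns the socket and may be
                         * editing the write queue; retry later instead
                         * of fragmenting/retransmitting now. */
                        if (!mod_timer(&tp->retransmit_timer, jiffies + HZ / 20))
                                sock_hold(sk);
                        goto out;
                }
                tcp_retransmit_timer(sk);       /* queue is stable here */
        out:
                bh_unlock_sock(sk);
                sock_put(sk);
        }
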
    -- 
    George Anzinger   george@mvista.com
    High-res-timers: 
    http://sourceforge.net/projects/high-res-timers/
    Real time sched:  http://sourceforge.net/projects/rtsched/
    Preemption patch:
    http://www.kernel.org/pub/linux/kernel/people/rml
    