From mboxrd@z Thu Jan 1 00:00:00 1970
From: Marcelo Tosatti
Subject: Re: 2.4.30-hf1 do_IRQ stack overflows
Date: Thu, 9 Jun 2005 12:00:26 -0300
Message-ID: <20050609150026.GA7900@logos.cnet>
References: <20050511124640.GE8541@logos.cnet> <13943.1118147881@www19.gmx.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: linux-kernel@vger.kernel.org, davem@redhat.com, netdev@oss.sgi.com, herbert@gondor.apana.org.au
Return-path:
To: Manfred Schwarb
Content-Disposition: inline
In-Reply-To: <13943.1118147881@www19.gmx.net>
Sender: linux-kernel-owner@vger.kernel.org
List-Id: netdev.vger.kernel.org

Hi,

On Tue, Jun 07, 2005 at 02:38:01PM +0200, Manfred Schwarb wrote:
> >
> > Hi Manfred,
> >
> > On Wed, May 11, 2005 at 10:15:02AM +0200, Manfred Schwarb wrote:
> > > Hi,
> > > with recent versions of the 2.4 kernel (Vanilla), I get an increasing
> > > amount of do_IRQ stack overflows.
> > > This night, I got 3 of them.
> > > With 2.4.28 I got an overflow about twice a year, with 2.4.29 nearly
> > > once a month and with 2.4.30 nearly every day 8-((
> >
> > The system is getting dangerously close to an actual stack overflow,
> > which would crash the system.
> >
> > "do_IRQ: stack overflow: " indicates how many bytes are still available.
> >
> > The traces show huge networking execution paths.
> >
> > It seems you are using some packet scheduler (CONFIG_NET_SCHED)? Pretty
> > much all traces show functions from sch_generic.c. Can you disable that
> > for a test?
>
> Sorry to bother you again, but the problem didn't vanish completely.
> This morning, I caught another one. I built a new kernel with
> CONFIG_NET_SCHED=n as suggested, uptime is now 25 days, and the following
> is the first do_IRQ since then (ksymoops -i):
>
> Jun 7 03:55:01 tp-meteodat7 kernel: f3238830 00000280 f49e7b80 00000000 00000042 cca1388e f4116980 f17aa000
> Jun 7 03:55:01 tp-meteodat7 kernel: c010d948 00000042 f4116980 00000000 cca1388e f4116980 f17aa000 00000042
> Jun 7 03:55:01 tp-meteodat7 kernel: 00000018 f61d0018 ffffff14 c023a039 00000010 00000246 ee5ea480 00000000
> Jun 7 03:55:01 tp-meteodat7 kernel: Call Trace: [call_do_IRQ+5/13] [skb_copy_and_csum_dev+73/256] [nfsd:__insmod_nfsd_O/lib/modules/2.4.30-hf1/kernel/fs/nfsd/nfsd.+4256445916/96] [qdisc_restart+114/432] [dev_queue_xmit+383/880]
> Jun 7 03:55:01 tp-meteodat7 kernel: Call Trace: [] [] [] [] []
> Jun 7 03:55:01 tp-meteodat7 kernel: [] [] [] [] [] []
> Jun 7 03:55:01 tp-meteodat7 kernel: [] [] [] [] [] []
> Jun 7 03:55:01 tp-meteodat7 kernel: [] [] [] [] [] []
> Jun 7 03:55:01 tp-meteodat7 kernel: [] [] [] [] [] []
> Jun 7 03:55:01 tp-meteodat7 kernel: [] [] [] [] [] []
> Jun 7 03:55:01 tp-meteodat7 kernel: [] [] [] [] [] []
> Jun 7 03:55:01 tp-meteodat7 kernel: [] [] [] [] [] []
> Jun 7 03:55:01 tp-meteodat7 kernel: [] [] [] [] [] []
> Jun 7 03:55:01 tp-meteodat7 kernel: [] [] [] [] [] []
> Jun 7 03:55:01 tp-meteodat7 kernel: [] [] [] [] [] []
> Jun 7 03:55:01 tp-meteodat7 kernel: [] [] [] [] [] []
> Jun 7 03:55:01 tp-meteodat7 kernel: [] [] [] [] [] []
> Jun 7 03:55:01 tp-meteodat7 kernel: [] [] [] [] [] []
> Jun 7 03:55:01 tp-meteodat7 kernel: [] [] [] [] [] []
> Jun 7 03:55:01 tp-meteodat7 kernel: [] [] [] [] [] []
> Jun 7 03:55:01 tp-meteodat7 kernel: [] [] [] [] [] []
> Jun 7 03:55:01 tp-meteodat7 kernel: [] [] [] [] [] []
> Jun 7 03:55:01 tp-meteodat7 kernel: [] [] [] [] [] []
> Warning (Oops_read): Code line not seen, dumping what data is available

Do you have the "do_IRQ stack overflow" output and the number of bytes left that it reports?
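For reference, that message comes from the CONFIG_DEBUG_STACKOVERFLOW check at the top of do_IRQ() (arch/i386/kernel/irq.c on i386). From memory it looks roughly like the sketch below; the exact constants and the stack-dump helper may differ in 2.4.30, so take it as an illustration only:

	/* Sketch (from memory) of the 2.4 CONFIG_DEBUG_STACKOVERFLOW check
	   in do_IRQ(); constants and helpers may differ from the real
	   2.4.30 source. */
	{
		long esp;

		/* Offset of the stack pointer within the 8KB task stack. */
		__asm__ __volatile__("andl %%esp,%0" : "=r" (esp) : "0" (8191));

		/* The task_struct lives at the bottom of that stack, so
		   esp - sizeof(struct task_struct) is how many bytes of
		   stack are still free.  Complain when less than ~1KB
		   remains. */
		if (esp < (sizeof(struct task_struct) + 1024)) {
			printk("do_IRQ: stack overflow: %ld\n",
			       esp - sizeof(struct task_struct));
			show_stack(NULL);	/* prints the Call Trace seen above */
		}
	}

So the number printed after "do_IRQ: stack overflow:" is the free stack space, in bytes, at the moment the interrupt arrived; the closer it is to zero, the closer the box was to overrunning the stack and crashing.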
overflow" output and the amount of bytes left it informs? > Trace; c010d948 > Trace; c023a039 > Trace; f90df5dc <[8139too]rtl8139_start_xmit+6c/180> > Trace; c0248402 > Trace; c023cc7f > Trace; c02561a8 > Trace; c02560f0 > Trace; c02560f0 I can't explain the "ip_finish_output2+0" entries. Odd. > Trace; c024760e > Trace; c02560f0 > Trace; c025492e > Trace; c02560f0 > Trace; c0256315 > Trace; c0256240 > Trace; c0256240 > Trace; c024760e > Trace; c0256240 > Trace; c0254d0d > Trace; c0256240 > Trace; c026daf0 > Trace; c0267c99 > Trace; c026a6f4 > Trace; c0259370 > Trace; c0259370 > Trace; c02661ca > Trace; c026edaa > Trace; c026f48e > Trace; c025174f > Trace; c02515f0 > Trace; c024760e > Trace; c02515f0 > Trace; c0251790 > Trace; c02510df > Trace; c02515f0 > Trace; c0251790 > Trace; c0251969 > Trace; c0251790 > Trace; c024760e > Trace; c0251790 > Trace; c02514b8 > Trace; c0251790 > Trace; c023d4d5 > Trace; c023d5a3 > Trace; c023d73a > Trace; c01254c6 > Trace; c010b094 > Trace; c010d948 I dont see any huge stack consumers on this callchain. David, Herbert, any clues what might be going on here?