From mboxrd@z Thu Jan 1 00:00:00 1970
From: Sven-Thorsten Dietrich
Subject: Re: 2.6.33.6-rt28 kernel oops while stressing network
Date: Mon, 16 Aug 2010 10:22:52 -0700
Message-ID: <4C6973EC.6000407@gmail.com>
References: <4C65A844.1010702@gmail.com> <1281740245.13891.11.camel@baracus>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: linux-rt-users@vger.kernel.org
To: John Culvertson
Return-path:
Received: from mail-ww0-f44.google.com ([74.125.82.44]:64792 "EHLO
	mail-ww0-f44.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1754703Ab0HPRW6 (ORCPT );
	Mon, 16 Aug 2010 13:22:58 -0400
Received: by wwj40 with SMTP id 40so6792346wwj.1 for ;
	Mon, 16 Aug 2010 10:22:57 -0700 (PDT)
In-Reply-To:
Sender: linux-rt-users-owner@vger.kernel.org
List-ID:

On 08/16/2010 10:16 AM, John Culvertson wrote:
> linkwatch_event is defined in net/core/link_watch.c.
>
> It appears to work fine if I disable PREEMPT_HARDIRQs.
>
> Thanks for the feedback. My main objective with this platform at the
> moment is to learn about preempt-rt and evaluate its stability and
> suitability for use.
>

Cool - that makes sense - like I said, I am slammed with other stuff, but
hopefully one of the heavyweights on this will have a patch for you at
some point - it's definitely a bug of some species.

Cheers

Sven

> On Fri, Aug 13, 2010 at 6:57 PM, Sven-Thorsten Dietrich
> wrote:
>> On Fri, 2010-08-13 at 17:20 -0400, John Culvertson wrote:
>>> I changed from PREEMPT_RT to PREEMPT_DESKTOP, and now it will not
>>> boot. I get the following error, then it hangs:
>> LOL. Sounds like there is a little more work to be done on the .33
>> release.
>>
>> I think you are onto something however - although I am not at the moment
>> blessed with the time to dig for you -
>>
>> linkwatch_event - is that part of e1000?
>>
>> If so, that's where I'd start digging.
>>
>> What happens if you first disable PREEMPT_HARDIRQs,
>> and then next also disable PREEMPT_SOFTIRQs?
>>
>> I would assume after the latter it would work just fine.
>>
>> Regards,
>>
>> Sven
>>
>>> [ 9.025745] BUG: sleeping function called from invalid context at
>>> mm/slab.c:3266
>>> [ 9.044859] pcnt: 1 0 in_atomic(): 1, irqs_disabled(): 0, pid: 16,
>>> name: events/0
>>> [ 9.064192] Pid: 16, comm: events/0 Not tainted 2.6.33.7-rt29 #3
>>> [ 9.076674] Call Trace:
>>> [ 9.085224] [] ? kmem_cache_alloc+0x1b/0x99
>>> [ 9.096780] [] ? __alloc_skb+0x2e/0x10a
>>> [ 9.108079] [] ? alloc_skb+0x9/0xb
>>> [ 9.118763] [] ? inet6_rt_notify+0x2f/0xb3
>>> [ 9.130285] [] ? fib6_add+0x21e/0x38e
>>> [ 9.141487] [] ? __ip6_ins_rt+0x23/0x35
>>> [ 9.152743] [] ? addrconf_add_mroute+0x6c/0x72
>>> [ 9.164855] [] ? addrconf_add_dev+0x3d/0x49
>>> [ 9.176495] [] ? addrconf_notify+0x4f6/0x6d7
>>> [ 9.188213] [] ? extract_entropy+0x45/0xfe
>>> [ 9.199753] [] ? need_resched+0x11/0x1a
>>> [ 9.210686] [] ? rt_do_flush+0x26/0x105
>>> [ 9.221572] [] ? notifier_call_chain+0x2a/0x52
>>> [ 9.233426] [] ? raw_notifier_call_chain+0x9/0xc
>>> [ 9.244968] [] ? netdev_state_change+0x18/0x29
>>> [ 9.256198] [] ? linkwatch_do_dev+0x9e/0xa7
>>> [ 9.267208] [] ? __linkwatch_run_queue+0xd4/0x108
>>> [ 9.278611] [] ? linkwatch_event+0x1d/0x22
>>> [ 9.289437] [] ? worker_thread+0xe1/0x15e
>>> [ 9.299891] [] ? linkwatch_event+0x0/0x22
>>> [ 9.310157] [] ? autoremove_wake_function+0x0/0x2d
>>> [ 9.321305] [] ? worker_thread+0x0/0x15e
>>> [ 9.331182] [] ? kthread+0x52/0x57
>>> [ 9.340335] [] ? kthread+0x0/0x57
>>> [ 9.349248] [] ? kernel_thread_helper+0x6/0x10
>>>
>>>
>>> On Fri, Aug 13, 2010 at 4:17 PM, Sven-Thorsten Dietrich
>>> wrote:
>>>> On 08/13/2010 11:07 AM, John Culvertson wrote:
>>>>> Thanks for the suggestions. I have tried the unpatched 2.6.33.7
>>>>> kernel, and the problem does not occur.
>>>>> The hardware is a single
>>>>> board industrial computer with the network controllers onboard, so I
>>>>> cannot easily try different NICs. I have not seen the problem occur
>>>>> with only one port in use, but I have not tested that long enough to
>>>>> be positive.
>>>>>
>>>>> One thing that may be a little odd about this computer is that both
>>>>> Ethernet controllers (Intel 82559) share the same PCI interrupt.
>>>>> Interrupt sharing should be OK, but since adjacent PCI slots in normal
>>>>> PCs generally use different interrupts, it may not occur often in
>>>>> other systems.
>>>>>
>>>> Does it reproduce when you turn off PREEMPT_RT, but leave all the IRQ
>>>> threading options enabled?
>>>>
>>>>
>>>>
>>
>>
>>