From mboxrd@z Thu Jan 1 00:00:00 1970
From: Sven-Thorsten Dietrich
Subject: Re: 2.6.33.6-rt28 kernel oops while stressing network
Date: Fri, 13 Aug 2010 15:57:25 -0700
Message-ID: <1281740245.13891.11.camel@baracus>
References: <4C65A844.1010702@gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
Cc: linux-rt-users@vger.kernel.org
To: John Culvertson
Return-path:
Received: from mail-yw0-f46.google.com ([209.85.213.46]:48165 "EHLO
	mail-yw0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1755842Ab0HMW54 (ORCPT );
	Fri, 13 Aug 2010 18:57:56 -0400
Received: by ywh1 with SMTP id 1so1134042ywh.19 for ;
	Fri, 13 Aug 2010 15:57:56 -0700 (PDT)
In-Reply-To:
Sender: linux-rt-users-owner@vger.kernel.org
List-ID:

On Fri, 2010-08-13 at 17:20 -0400, John Culvertson wrote:
> I changed from PREEMPT_RT to PREEMPT_DESKTOP, and now it will not
> boot. I get the following error, then it hangs:

LOL. Sounds like there is a little more work to be done on the .33
release.

I think you are onto something however - although I am not at the moment
blessed with the time to dig for you - linkwatch_event - is that part of
e1000?

If so, that's where I'd start digging.

What happens if you first disable PREEMPT_HARDIRQs, and then next also
disable PREEMPT_SOFTIRQs?

I would assume after the latter it would work just fine.

Regards,

Sven

>
> [ 9.025745] BUG: sleeping function called from invalid context at
> mm/slab.c:3266
> [ 9.044859] pcnt: 1 0 in_atomic(): 1, irqs_disabled(): 0, pid: 16,
> name: events/0
> [ 9.064192] Pid: 16, comm: events/0 Not tainted 2.6.33.7-rt29 #3
> [ 9.076674] Call Trace:
> [ 9.085224] [] ? kmem_cache_alloc+0x1b/0x99
> [ 9.096780] [] ? __alloc_skb+0x2e/0x10a
> [ 9.108079] [] ? alloc_skb+0x9/0xb
> [ 9.118763] [] ? inet6_rt_notify+0x2f/0xb3
> [ 9.130285] [] ? fib6_add+0x21e/0x38e
> [ 9.141487] [] ? __ip6_ins_rt+0x23/0x35
> [ 9.152743] [] ? addrconf_add_mroute+0x6c/0x72
> [ 9.164855] [] ? addrconf_add_dev+0x3d/0x49
> [ 9.176495] [] ? addrconf_notify+0x4f6/0x6d7
> [ 9.188213] [] ? extract_entropy+0x45/0xfe
> [ 9.199753] [] ? need_resched+0x11/0x1a
> [ 9.210686] [] ? rt_do_flush+0x26/0x105
> [ 9.221572] [] ? notifier_call_chain+0x2a/0x52
> [ 9.233426] [] ? raw_notifier_call_chain+0x9/0xc
> [ 9.244968] [] ? netdev_state_change+0x18/0x29
> [ 9.256198] [] ? linkwatch_do_dev+0x9e/0xa7
> [ 9.267208] [] ? __linkwatch_run_queue+0xd4/0x108
> [ 9.278611] [] ? linkwatch_event+0x1d/0x22
> [ 9.289437] [] ? worker_thread+0xe1/0x15e
> [ 9.299891] [] ? linkwatch_event+0x0/0x22
> [ 9.310157] [] ? autoremove_wake_function+0x0/0x2d
> [ 9.321305] [] ? worker_thread+0x0/0x15e
> [ 9.331182] [] ? kthread+0x52/0x57
> [ 9.340335] [] ? kthread+0x0/0x57
> [ 9.349248] [] ? kernel_thread_helper+0x6/0x10
>
>
> On Fri, Aug 13, 2010 at 4:17 PM, Sven-Thorsten Dietrich
> wrote:
> > On 08/13/2010 11:07 AM, John Culvertson wrote:
> >>
> >> Thanks for the suggestions. I have tried the unpatched 2.6.33.7
> >> kernel, and the problem does not occur. The hardware is a single
> >> board industrial computer with the network controllers onboard, so I
> >> cannot easily try different NICs. I have not seen the problem occur
> >> with only one port in use, but I have not tested that long enough to
> >> be positive.
> >>
> >> One thing that may be a little odd about this computer is that both
> >> Ethernet controllers (Intel 82559) share the same PCI interrupt.
> >> Interrupt sharing should be OK, but since adjacent PCI slots in normal
> >> PCs generally use different interrupts, it may not occur often in
> >> other systems.
> >>
> >
> > Does it reproduce when you turn off PREEMPT_RT, but leave all the IRQ
> > threading options enabled?
> >