linux-rt-users.vger.kernel.org archive mirror
* How do I get rid of these "BUG: sleeping function called from ... kernel/rtmutex.c:707"?
@ 2011-05-20 14:30 Emmanuel Deloget
  2011-05-24 13:05 ` Emmanuel Deloget
From: Emmanuel Deloget @ 2011-05-20 14:30 UTC
  To: linux-rt-users

Hello,

I hope this message will find its way to the linux-rt mailing list. I
subscribed, but for reasons unknown to me I cannot receive anything
from this list (I contacted the owner to sort out the problem). As a
side note, for this very reason, I would appreciate it if you CC'd me
whenever you reply to this mail, otherwise I might miss it. Thanks in
advance.

I am using 2.6.33.7-rt30 (platform: arm/mach-ixp4xx; the distro is
OpenWRT with 2.6.33.7 re-imported, since it has been removed from
OpenWRT).

When I bring up a network interface with ifconfig, I systematically get
the following error message in dmesg:

[   64.205417] BUG: sleeping function called from invalid context at kernel/rtmutex.c:707
[   64.205453] pcnt: 0 0 in_atomic(): 0, irqs_disabled(): 128, pid: 1047, name: ifconfig
[   64.205472] Backtrace:
[   64.205521] [<c002b6c0>] (dump_backtrace+0x0/0x110) from [<c02dbe20>] (dump_stack+0x18/0x1c)
[   64.205545]  r7:c780daa0 r6:c780daa0 r5:00000020 r4:00000000
[   64.205591] [<c02dbe08>] (dump_stack+0x0/0x1c) from [<c0032f7c>] (__might_sleep+0xf8/0x118)
[   64.205629] [<c0032e84>] (__might_sleep+0x0/0x118) from [<c02de468>] (rt_spin_lock+0x34/0x64)
[   64.205650]  r4:c03a18d8
[   64.205689] [<c02de434>] (rt_spin_lock+0x0/0x64) from [<c0095908>] (kmem_cache_alloc+0x40/0x15c)
[   64.205711]  r4:c5bd1df0
[   64.205744] [<c00958c8>] (kmem_cache_alloc+0x0/0x15c) from [<c01c749c>] (__alloc_skb+0x30/0x104)
[   64.205778] [<c01c746c>] (__alloc_skb+0x0/0x104) from [<c01c813c>] (dev_alloc_skb+0x20/0x44)
[   64.205800]  r8:a0000013 r7:c42f4340 r6:c42f4000 r5:c5ae1540 r4:c5bd1df0
[   64.205866] [<c01c811c>] (dev_alloc_skb+0x0/0x44) from [<bf0d9a88>] (do_dev_stop+0x11c/0x2e4 [ixp400_eth])
[   64.205909] [<bf0d9a60>] (do_dev_stop+0xf4/0x2e4 [ixp400_eth]) from [<bf0d9ba8>] (do_dev_stop+0x23c/0x2e4 [ixp400_eth])
[   64.205952] [<bf0d9b30>] (do_dev_stop+0x1c4/0x2e4 [ixp400_eth]) from [<bf0da230>] (do_dev_open+0x24/0xa4 [ixp400_eth])
[   64.205977]  r6:00001043 r5:bf0db448 r4:c42f4000
[   64.206020] [<bf0da20c>] (do_dev_open+0x0/0xa4 [ixp400_eth]) from [<c01d0e0c>] (dev_open+0xcc/0x134)
[   64.206042]  r5:bf0db1d0 r4:c42f4000
[   64.206086] [<c01d0d40>] (dev_open+0x0/0x134) from [<c01d0204>] (dev_change_flags+0xb0/0x190)
[   64.206107]  r5:00000041 r4:c42f4000
[   64.206154] [<c01d0154>] (dev_change_flags+0x0/0x190) from [<c0256444>] (devinet_ioctl+0x2f0/0x6b0)
[   64.206177]  r7:c5bd1e80 r6:c59e0600 r5:c788c4e0 r4:00000000
[   64.206226] [<c0256154>] (devinet_ioctl+0x0/0x6b0) from [<c02579a8>] (inet_ioctl+0xd0/0x10c)
[   64.206270] [<c02578d8>] (inet_ioctl+0x0/0x10c) from [<c01be7c8>] (sock_ioctl+0x1fc/0x25c)
[   64.206291]  r4:c4ab3ec0
[   64.206318] [<c01be5cc>] (sock_ioctl+0x0/0x25c) from [<c00a57bc>] (vfs_ioctl+0x34/0xb4)
[   64.206338]  r6:be8cdd60 r5:00008914 r4:c4ab3ec0
[   64.206375] [<c00a5788>] (vfs_ioctl+0x0/0xb4) from [<c00a5e98>] (do_vfs_ioctl+0x548/0x5b4)
[   64.206396]  r6:00000003 r5:c4ab3ec0 r4:c6014afc
[   64.206433] [<c00a5950>] (do_vfs_ioctl+0x0/0x5b4) from [<c00a5f44>] (sys_ioctl+0x40/0x60)
[   64.206477] [<c00a5f04>] (sys_ioctl+0x0/0x60) from [<c0027f20>] (ret_fast_syscall+0x0/0x28)
[   64.206499]  r7:00000036 r6:000001c3 r5:00000004 r4:000508e7

This message does not seem to prevent ifconfig from working correctly.
The code at kernel/rtmutex.c:707 reads:

/* 701 */ static inline void
/* 702 */ rt_spin_lock_fastlock(struct rt_mutex *lock,
/* 703 */         void  (*slowfn)(struct rt_mutex *lock))
/* 704 */ {
/* 705 */     /* Temporary HACK! */
/* 706 */     if (likely(!current->in_printk))
/* 707 */         might_sleep();
/* 708 */     else if (in_atomic() || irqs_disabled())
/* 709 */         /* don't grab locks for printk in atomic */
/* 710 */         return;
/* 711 */
/* 712 */     if (likely(rt_mutex_cmpxchg(lock, NULL, current)))
/* 713 */         rt_mutex_deadlock_account_lock(lock, current);
/* 714 */     else
/* 715 */         slowfn(lock);
/* 716 */ }

I did my homework, and found that similar messages have been seen here
and there ([1], [2]). But the solution provided in [1] (avoid the
dynamic allocation and reserve an automatic buffer instead, as in the C
'auto' keyword) does not help much in my case: I cannot ask
dev_alloc_skb() not to perform dynamic allocations. As for [2], it does
not provide any kind of solution.

Now, I want to get rid of these messages the correct way. I can
implement a bizarre hack to make the issue go away, but I have no
guarantee that such a hack would be sensible. What I am looking for is
a correct way (in the Linux kernel sense) to do this.

Can anybody help?

Thanks in advance,

-- Emmanuel Deloget

[1] http://www.spinics.net/lists/linux-rt-users/msg06048.html
[2] http://comments.gmane.org/gmane.linux.kernel.firewire.devel/14648



* Re: How do I get rid of these "BUG: sleeping function called from ... kernel/rtmutex.c:707"?
  2011-05-20 14:30 How do I get rid of these "BUG: sleeping function called from ... kernel/rtmutex.c:707"? Emmanuel Deloget
@ 2011-05-24 13:05 ` Emmanuel Deloget
From: Emmanuel Deloget @ 2011-05-24 13:05 UTC
  To: linux-rt-users

Hello again,

Although I am involuntarily breaching netiquette, I have to answer my
own mail.

I have gone through several rounds of investigation, and I have to
temper my words a bit. I am no longer trying to find a solution to this
issue, as there is no obvious one. Here is my analysis.

On 05/20/2011 04:30 PM, Emmanuel Deloget wrote:
> Hello,
>
> I hope this message will find its way to the linux-rt mailing list. I 
> subscribed but for reasons that are unknown to me I cannot receive 
> anything from this list (I contacted the owner to sort out the
> problem). As a side note, for this very reason, I would appreciate it
> if you CC'd me whenever you reply to this mail, otherwise I might
> miss it. Thanks in advance.
>
> I am using 2.6.33.7-rt30 (platform: arm/mach-ixp4xx; the distro is
> OpenWRT with 2.6.33.7 re-imported, since it has been removed from
> OpenWRT).
>
> When I bring up a network interface with ifconfig, I systematically
> get the following error message in dmesg:
>
> [   64.205417] BUG: sleeping function called from invalid context at kernel/rtmutex.c:707
> [   64.205453] pcnt: 0 0 in_atomic(): 0, irqs_disabled(): 128, pid: 1047, name: ifconfig
> [   64.205472] Backtrace:
<snip>

irqs_disabled() is the problem here. The RT kernel rightfully warns me
that I am trying to sleep in a context where interrupts are disabled.
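
For reference, the value 128 printed for irqs_disabled() is 0x80, which
on ARM is the I (IRQ mask) bit of the CPSR. As far as I can tell, the
ARM implementation returns the masked bit as-is rather than a
normalized 0/1. The user-space sketch below only models that test;
model_irqs_disabled() and the sample CPSR values are hypothetical names
and values of mine, not kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* ARM CPSR bit 7: when set, IRQs are masked on this CPU. */
#define PSR_I_BIT 0x00000080u

/*
 * Hypothetical user-space model of the ARM irqs_disabled() test.
 * It returns the raw masked bit, which is why the log shows 128
 * rather than 1.
 */
int model_irqs_disabled(uint32_t cpsr)
{
    return (int)(cpsr & PSR_I_BIT);
}

/* Self-check with two made-up CPSR values (SVC mode, I bit set/clear). */
void model_irqs_self_check(void)
{
    assert(model_irqs_disabled(0x00000093u) == 128); /* I bit set   */
    assert(model_irqs_disabled(0x00000013u) == 0);   /* I bit clear */
}
```

Any non-zero result counts as "disabled" in the __might_sleep() test, so
the exact value carries no extra meaning.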

> [   64.205689] [<c02de434>] (rt_spin_lock+0x0/0x64) from [<c0095908>] (kmem_cache_alloc+0x40/0x15c)
> [   64.205711]  r4:c5bd1df0
> [   64.205866] [<c01c811c>] (dev_alloc_skb+0x0/0x44) from [<bf0d9a88>] (do_dev_stop+0x11c/0x2e4 [ixp400_eth])
> [   64.205909] [<bf0d9a60>] (do_dev_stop+0xf4/0x2e4 [ixp400_eth]) from [<bf0d9ba8>] (do_dev_stop+0x23c/0x2e4 [ixp400_eth])
<snip>

The problem comes from the ixp400 Ethernet driver (from Intel; GPLv2,
as clearly stated in the source files, although the module does not
declare MODULE_LICENSE). I am going to file a bug about this if I can
find an Intel representative.

The issue really lies in Intel's driver architecture, which is not
PREEMPT-RT friendly. The driver maintains a list of skbs, and this list
is used by an ISR. When maintenance tasks run, the driver disables
IRQs to avoid concurrency issues. But then, with IRQs still disabled,
it allocates memory using dev_alloc_skb().
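
To make the pattern concrete, here is a minimal user-space sketch. It
is not the ixp400_eth code: every name in it (model_irqs_off,
model_alloc(), struct model_buf, the two refill functions) is a
hypothetical stand-in of mine. It only assumes what the backtrace
shows, namely that on this RT kernel the allocator ends up in
rt_spin_lock(), which may sleep. The second function shows one
conventional restructuring (allocate first, then disable IRQs only to
link the buffer); I have not tried it on the real driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for kernel state; none of these are kernel APIs. */
static bool model_irqs_off;        /* models an IRQs-disabled section     */
static int  model_sleep_warnings;  /* counts would-be "BUG: sleeping ..." */

struct model_buf {
    struct model_buf *next;
};

/* Models an allocator that may sleep, as kmem_cache_alloc() can on RT. */
static struct model_buf *model_alloc(void)
{
    if (model_irqs_off)
        model_sleep_warnings++;    /* what might_sleep() would report */
    return calloc(1, sizeof(struct model_buf));
}

/* Pattern described in the text: allocate while IRQs are disabled. */
int refill_inside_critical_section(struct model_buf **list)
{
    struct model_buf *b;

    assert(list != NULL);
    model_irqs_off = true;
    b = model_alloc();             /* counted as a warning */
    if (b) {
        b->next = *list;
        *list = b;
    }
    model_irqs_off = false;
    return b ? 0 : -1;
}

/* Alternative: allocate first, disable IRQs only to link the buffer. */
int refill_outside_critical_section(struct model_buf **list)
{
    struct model_buf *b;

    assert(list != NULL);
    b = model_alloc();             /* IRQs are still enabled here */
    if (!b)
        return -1;
    model_irqs_off = true;
    b->next = *list;
    *list = b;
    model_irqs_off = false;
    return 0;
}
```

In this model only refill_inside_critical_section() increments
model_sleep_warnings; the list ends up the same either way.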

Since I'm not willing to modify Intel's driver architecture, and I'm not
willing to modify the PREEMPT-RT patch (as I will not have enough cycles
to test even the simplest change), my only option is to leave this
problem as it is. Not only was the ixp400_eth driver not written with
the RT patch in mind, but this BUG message does not prevent the system
from working correctly.

Still, there is a question I would like to get an answer to, and it is
directly related to the code of __might_sleep() in kernel/sched.c (when
CONFIG_DEBUG_PREEMPT is defined):

/* 10115 */ void __might_sleep(char *file, int line, int preempt_offset)
/* 10116 */ {
/* 10117 */ #ifdef in_atomic
/* 10118 */     static unsigned long prev_jiffy;    /* ratelimiting */
/* 10119 */
/* 10120 */     if ((preempt_count_equals(preempt_offset) && !irqs_disabled()) ||
/* 10121 */         system_state != SYSTEM_RUNNING || oops_in_progress)
/* 10122 */         return;
/* 10123 */     if (time_before(jiffies, prev_jiffy + HZ) && prev_jiffy)
/* 10124 */         return;
/* 10125 */     prev_jiffy = jiffies;
/* 10126 */
/* 10127 */     printk(KERN_ERR
/* 10128 */         "BUG: sleeping function called from invalid context at %s:%d\n",
<...snip...>
/* 10139 */     dump_stack();
/* 10140 */ #endif
/* 10141 */ }
/* 10142 */ EXPORT_SYMBOL(__might_sleep);

(Keep in mind that this is an OpenWRT version; some patches other than
the preempt-rt patch might have been applied to this file, so the line
numbers might vary.)
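
To spell out how I read that function, here is a small user-space model
of its decision logic. It is only a sketch: model_should_warn(),
model_time_before() and MODEL_HZ are hypothetical names of mine, the
tick rate of 100 is an assumption, and the boolean parameters stand for
the kernel expressions named in the comments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_HZ 100UL   /* assumed tick rate, for illustration only */

/* Wrap-safe "a is before b", the same idea as the kernel's time_before(). */
static bool model_time_before(unsigned long a, unsigned long b)
{
    return (long)(a - b) < 0;
}

/*
 * Returns true when a "BUG: sleeping function ..." report would be
 * printed.
 *   preempt_ok    stands for preempt_count_equals(preempt_offset)
 *   irqs_disabled stands for the value of irqs_disabled()
 *   quiet         stands for system_state != SYSTEM_RUNNING ||
 *                 oops_in_progress
 */
bool model_should_warn(bool preempt_ok, int irqs_disabled, bool quiet,
                       unsigned long jiffies, unsigned long *prev_jiffy)
{
    assert(prev_jiffy != NULL);

    if ((preempt_ok && !irqs_disabled) || quiet)
        return false;   /* context is fine, or reporting is switched off */
    if (model_time_before(jiffies, *prev_jiffy + MODEL_HZ) && *prev_jiffy)
        return false;   /* rate limit: less than one HZ since last report */
    *prev_jiffy = jiffies;
    return true;
}
```

With the values from my log (in_atomic() is 0, irqs_disabled() is 128),
the first test cannot succeed, so the report is printed unless the rate
limit suppresses it.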

My question is related to line 10120, and more precisely to the
!irqs_disabled() test. I understand that when IRQs are disabled, it is a
good idea never to sleep. But then, not all IRQs are equal: some fire
quite rarely, or can tolerate being postponed. In other words, only a
limited set of interrupts is important enough to justify such a
behavior.

Wouldn't it be better to check for these interrupts instead of checking
for *all* interrupts, as irqs_disabled() does?

Best regards,

-- Emmanuel Deloget

