From: John Kacur <jkacur@redhat.com>
To: Thomas Schauss <schauss@tum.de>
Cc: Thomas Gleixner <tglx@linutronix.de>,
RT <linux-rt-users@vger.kernel.org>
Subject: Re: 3.2-rc1 and nvidia drivers
Date: Mon, 28 Nov 2011 12:31:48 +0100
Message-ID: <CAONaPpE7Jgz0GhFUB41ZKoa8dUKOqKVFKj6jHfi3EAnPX0SRbQ@mail.gmail.com>
In-Reply-To: <4ED35D9A.7090401@tum.de>

On Mon, Nov 28, 2011 at 11:08 AM, Thomas Schauss <schauss@tum.de> wrote:
> On 11/16/2011 04:06 PM, Thomas Gleixner wrote:
>>
>> On Wed, 16 Nov 2011, Thomas Schauss wrote:
>>>
>>> Unfortunately, with 3.0-rt and the nvidia-driver we get complete system
>>> freezes when starting X on several different hardware setups (a few
>>> systems
>>> work fine). This is certainly caused by this combination. When using the
>>> nouveau-driver everything works fine.
>>
>> Have you ever tried to run with CONFIG_PROVE_LOCKING=y ?
>>
>
> Hello,
>
> thank you for that tip. I have tried this now and have not found any
> warnings which seem related to the nvidia-driver. Further testing revealed,
> that the driver works fine with CONFIG_PREEMPT_RTB and the freezes when
> running startx occur as soon as we switch to CONFIG_PREEMPT_RT_FULL.
>
> Regarding lockdep, we do get some warnings in slab.c -> cache_flusharray
> that however seem unrelated to nvidia. As we could not find any other bugs
> with the same locking warning I attached one example below. You can find
> some complete bootlogs (all with deadlock-warnings, all with slightly
> different call-stack) and my kernel-config at
>
> http://www.lsr.ei.tum.de/team/schauss/lockdep/
>
> On rt-base I also get a lockdep-warning which however seems unrelated to the
> rt-full one (not in cache_flusharray). You can find that log on the same
> page.
>
> Best Regards,
> Thomas
>
>
>
> Nov 17 17:34:49 fix kernel: [ 30.750925]
> =============================================
> Nov 17 17:34:49 fix kernel: [ 30.750927] [ INFO: possible recursive
> locking detected ]
> Nov 17 17:34:49 fix kernel: [ 30.750930] 3.0.9-25-rt #0
> Nov 17 17:34:49 fix kernel: [ 30.750931]
> ---------------------------------------------
> Nov 17 17:34:49 fix kernel: [ 30.750933] udevd/517 is trying to acquire
> lock:
> Nov 17 17:34:49 fix kernel: [ 30.750935] (&parent->list_lock){+.+...}, at:
> [<ffffffff81613e63>] cache_flusharray+0x47/0xd6
> Nov 17 17:34:49 fix kernel: [ 30.750944]
> Nov 17 17:34:49 fix kernel: [ 30.750945] but task is already holding lock:
> Nov 17 17:34:49 fix kernel: [ 30.750946] (&parent->list_lock){+.+...}, at:
> [<ffffffff81613e63>] cache_flusharray+0x47/0xd6
> Nov 17 17:34:49 fix kernel: [ 30.750950]
> Nov 17 17:34:49 fix kernel: [ 30.750951] other info that might help us
> debug this:
> Nov 17 17:34:49 fix kernel: [ 30.750952] Possible unsafe locking
> scenario:
> Nov 17 17:34:49 fix kernel: [ 30.750953]
> Nov 17 17:34:49 fix kernel: [ 30.750954] CPU0
> Nov 17 17:34:49 fix kernel: [ 30.750955] ----
> Nov 17 17:34:49 fix kernel: [ 30.750956] lock(&parent->list_lock);
> Nov 17 17:34:49 fix kernel: [ 30.750958] lock(&parent->list_lock);
> Nov 17 17:34:49 fix kernel: [ 30.750959]
> Nov 17 17:34:49 fix kernel: [ 30.750960] *** DEADLOCK ***
> Nov 17 17:34:49 fix kernel: [ 30.750961]
> Nov 17 17:34:49 fix kernel: [ 30.750962] May be due to missing lock
> nesting notation
> Nov 17 17:34:49 fix kernel: [ 30.750963]
> Nov 17 17:34:49 fix kernel: [ 30.750964] 2 locks held by udevd/517:
> Nov 17 17:34:49 fix kernel: [ 30.750966] #0: (&per_cpu(slab_lock,
> __cpu).lock){+.+...}, at: [<ffffffff8116a5c6>] kfree+0xd6/0x380
> Nov 17 17:34:49 fix kernel: [ 30.750973] #1:
> (&parent->list_lock){+.+...}, at: [<ffffffff81613e63>]
> cache_flusharray+0x47/0xd6
> Nov 17 17:34:49 fix kernel: [ 30.750977]
> Nov 17 17:34:49 fix kernel: [ 30.750977] stack backtrace:
> Nov 17 17:34:49 fix kernel: [ 30.750980] Pid: 517, comm: udevd Not tainted
> 3.0.9-25-rt #0
> Nov 17 17:34:49 fix kernel: [ 30.750982] Call Trace:
> Nov 17 17:34:49 fix kernel: [ 30.750987] [<ffffffff810a0097>]
> print_deadlock_bug+0xf7/0x100
> Nov 17 17:34:49 fix kernel: [ 30.750991] [<ffffffff810a1add>]
> validate_chain.isra.37+0x67d/0x720
> Nov 17 17:34:49 fix kernel: [ 30.750995] [<ffffffff810a2478>]
> __lock_acquire+0x478/0x9c0
> Nov 17 17:34:49 fix kernel: [ 30.750999] [<ffffffff8162ae19>] ?
> sub_preempt_count+0x29/0x60
> Nov 17 17:34:49 fix kernel: [ 30.751003] [<ffffffff81627475>] ?
> _raw_spin_unlock+0x35/0x60
> Nov 17 17:34:49 fix kernel: [ 30.751007] [<ffffffff81625f0b>] ?
> rt_spin_lock_slowlock+0x2eb/0x340
> Nov 17 17:34:49 fix kernel: [ 30.751011] [<ffffffff81056be1>] ?
> get_parent_ip+0x11/0x50
> Nov 17 17:34:49 fix kernel: [ 30.751014] [<ffffffff81613e63>] ?
> cache_flusharray+0x47/0xd6
> Nov 17 17:34:49 fix kernel: [ 30.751015] [<ffffffff810a2f64>]
> lock_acquire+0x94/0x160
> Nov 17 17:34:49 fix kernel: [ 30.751015] [<ffffffff81613e63>] ?
> cache_flusharray+0x47/0xd6
> Nov 17 17:34:49 fix kernel: [ 30.751015] [<ffffffff81626999>]
> rt_spin_lock+0x39/0x40
> Nov 17 17:34:49 fix kernel: [ 30.751015] [<ffffffff81613e63>] ?
> cache_flusharray+0x47/0xd6
> Nov 17 17:34:49 fix kernel: [ 30.751015] [<ffffffff8105a90b>] ?
> migrate_disable+0x6b/0xe0
> Nov 17 17:34:49 fix kernel: [ 30.751015] [<ffffffff81613e63>]
> cache_flusharray+0x47/0xd6
> Nov 17 17:34:49 fix kernel: [ 30.751015] [<ffffffff81167a41>]
> kmem_cache_free+0x221/0x300
> Nov 17 17:34:49 fix kernel: [ 30.751015] [<ffffffff81167b8f>]
> slab_destroy+0x6f/0xa0
> Nov 17 17:34:49 fix kernel: [ 30.751015] [<ffffffff81167d32>]
> free_block+0x172/0x190
> Nov 17 17:34:49 fix kernel: [ 30.751015] [<ffffffff81613eb4>]
> cache_flusharray+0x98/0xd6
> Nov 17 17:34:49 fix kernel: [ 30.751015] [<ffffffff814f1110>] ?
> __sk_free+0x130/0x160
> Nov 17 17:34:49 fix kernel: [ 30.751015] [<ffffffff814f1110>] ?
> __sk_free+0x130/0x160
> Nov 17 17:34:49 fix kernel: [ 30.751015] [<ffffffff8116a806>]
> kfree+0x316/0x380
> Nov 17 17:34:49 fix kernel: [ 30.751015] [<ffffffff814f5328>] ?
> skb_queue_purge+0x28/0x40
> Nov 17 17:34:49 fix kernel: [ 30.751015] [<ffffffff814f1110>]
> __sk_free+0x130/0x160
> Nov 17 17:34:49 fix kernel: [ 30.751015] [<ffffffff814f11d5>]
> sk_free+0x25/0x30
> Nov 17 17:34:49 fix kernel: [ 30.751015] [<ffffffff8152d908>]
> netlink_release+0x128/0x200
> Nov 17 17:34:49 fix kernel: [ 30.751015] [<ffffffff814ea388>]
> sock_release+0x28/0x90
> Nov 17 17:34:49 fix kernel: [ 30.751015] [<ffffffff814eaa57>]
> sock_close+0x17/0x30
> Nov 17 17:34:49 fix kernel: [ 30.751015] [<ffffffff8117b914>]
> __fput+0xb4/0x200
> Nov 17 17:34:49 fix kernel: [ 30.751015] [<ffffffff8117ba85>]
> fput+0x25/0x30
> Nov 17 17:34:49 fix kernel: [ 30.751015] [<ffffffff81177d0c>]
> filp_close+0x6c/0x90
> Nov 17 17:34:49 fix kernel: [ 30.751015] [<ffffffff81177df0>]
> sys_close+0xc0/0x130
> Nov 17 17:34:49 fix kernel: [ 30.751015] [<ffffffff8162ed02>]
> system_call_fastpath+0x16/0x1b
>

Hmm, I think I see how this can happen.

cache_flusharray()
    spin_lock(&l3->list_lock);
    free_block(cachep, ac->entry, batchcount, node);
        slab_destroy()
            kmem_cache_free()
                __cache_free()
                    cache_flusharray()
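
The call chain above can be modelled in user space. What follows is only a rough sketch, not the kernel code: the struct, the mgmt_cache field, the helper names and the tiny lock-class tracker are all made up for illustration, and do_cache_free() stands in for __cache_free(). It assumes that slab_destroy() hands a slab descriptor back to a second cache, and that the list_lock of every cache belongs to one lock class, which is what the "May be due to missing lock nesting notation" hint in the report would suggest.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins; these are not the real mm/slab.c types. */
enum { LIST_LOCK_CLASS = 1, MAX_HELD = 8 };

struct cache {
	const char *name;
	int lock_class;           /* assumed: every list_lock shares one class */
	struct cache *mgmt_cache; /* assumed: cache holding slab descriptors, or NULL */
};

static int held[MAX_HELD]; /* lock classes held by the current task */
static int nheld;
static int recursion_reports;

/* Toy lockdep check: complain if a lock of the same class is already held. */
static void lock_list(struct cache *c)
{
	for (int i = 0; i < nheld; i++) {
		if (held[i] == c->lock_class) {
			recursion_reports++;
			printf("possible recursive locking: %s list_lock\n", c->name);
			break;
		}
	}
	assert(nheld < MAX_HELD);
	held[nheld++] = c->lock_class;
}

static void unlock_list(struct cache *c)
{
	assert(nheld > 0 && held[nheld - 1] == c->lock_class);
	nheld--;
}

static void cache_flusharray(struct cache *c);

/* Stands in for __cache_free(); assumes the per-CPU array is full. */
static void do_cache_free(struct cache *c)
{
	cache_flusharray(c);
}

static void kmem_cache_free(struct cache *c)
{
	do_cache_free(c);
}

/* Assumed: destroying a slab frees its descriptor into another cache. */
static void slab_destroy(struct cache *c)
{
	if (c->mgmt_cache)
		kmem_cache_free(c->mgmt_cache);
}

static void free_block(struct cache *c)
{
	slab_destroy(c);
}

static void cache_flusharray(struct cache *c)
{
	lock_list(c);   /* spin_lock(&l3->list_lock) in the trace */
	free_block(c);  /* may re-enter cache_flusharray() for another cache */
	unlock_list(c);
}

/* Runs the scenario once and returns how many recursion reports were raised. */
static int simulate_flush(void)
{
	struct cache mgmt = { "mgmt", LIST_LOCK_CLASS, NULL };
	struct cache parent = { "parent", LIST_LOCK_CLASS, &mgmt };

	nheld = 0;
	recursion_reports = 0;
	cache_flusharray(&parent);
	return recursion_reports;
}
```

Calling simulate_flush() walks the same path as the backtrace: the outer cache_flusharray() holds one list_lock while the nested one takes a second lock of the same class. In this model the two locks are distinct objects, so the nested acquisition would not actually self-deadlock; whether that also holds for the 3.0.9-rt kernel in the report is not something this sketch can show.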