Re: 3.2-rc1 and nvidia drivers

linux-rt-users.vger.kernel.org archive mirror

From: John Kacur <jkacur@redhat.com>
To: Thomas Schauss <schauss@tum.de>
Cc: Thomas Gleixner <tglx@linutronix.de>,
	RT <linux-rt-users@vger.kernel.org>
Subject: Re: 3.2-rc1 and nvidia drivers
Date: Mon, 28 Nov 2011 12:31:48 +0100
Message-ID: <CAONaPpE7Jgz0GhFUB41ZKoa8dUKOqKVFKj6jHfi3EAnPX0SRbQ@mail.gmail.com>
In-Reply-To: <4ED35D9A.7090401@tum.de>

On Mon, Nov 28, 2011 at 11:08 AM, Thomas Schauss <schauss@tum.de> wrote:
> On 11/16/2011 04:06 PM, Thomas Gleixner wrote:
>>
>> On Wed, 16 Nov 2011, Thomas Schauss wrote:
>>>
>>> Unfortunately, with 3.0-rt and the nvidia-driver we get complete system
>>> freezes when starting X on several different hardware setups (a few
>>> systems work fine). This is certainly caused by this combination. When
>>> using the nouveau-driver everything works fine.
>>
>> Have you ever tried to run with CONFIG_PROVE_LOCKING=y ?
>>
>
> Hello,
>
> thank you for that tip. I have tried this now and have not found any
> warnings which seem related to the nvidia-driver. Further testing revealed
> that the driver works fine with CONFIG_PREEMPT_RTB and that the freezes when
> running startx occur as soon as we switch to CONFIG_PREEMPT_RT_FULL.
>
> Regarding lockdep, we do get some warnings in slab.c -> cache_flusharray
> that however seem unrelated to nvidia. As we could not find any other bugs
> with the same locking warning I attached one example below. You can find
> some complete bootlogs (all with deadlock-warnings, all with slightly
> different call-stack) and my kernel-config at
>
> http://www.lsr.ei.tum.de/team/schauss/lockdep/
>
> On rt-base I also get a lockdep-warning which however seems unrelated to the
> rt-full one (not in cache_flusharray). You can find that log on the same
> page.
>
> Best Regards,
> Thomas
>
>
>
> Nov 17 17:34:49 fix kernel: [   30.750925]
> =============================================
> Nov 17 17:34:49 fix kernel: [   30.750927] [ INFO: possible recursive
> locking detected ]
> Nov 17 17:34:49 fix kernel: [   30.750930] 3.0.9-25-rt #0
> Nov 17 17:34:49 fix kernel: [   30.750931]
> ---------------------------------------------
> Nov 17 17:34:49 fix kernel: [   30.750933] udevd/517 is trying to acquire
> lock:
> Nov 17 17:34:49 fix kernel: [   30.750935] (&parent->list_lock){+.+...}, at:
> [<ffffffff81613e63>] cache_flusharray+0x47/0xd6
> Nov 17 17:34:49 fix kernel: [   30.750944]
> Nov 17 17:34:49 fix kernel: [   30.750945] but task is already holding lock:
> Nov 17 17:34:49 fix kernel: [   30.750946] (&parent->list_lock){+.+...}, at:
> [<ffffffff81613e63>] cache_flusharray+0x47/0xd6
> Nov 17 17:34:49 fix kernel: [   30.750950]
> Nov 17 17:34:49 fix kernel: [   30.750951] other info that might help us
> debug this:
> Nov 17 17:34:49 fix kernel: [   30.750952]  Possible unsafe locking
> scenario:
> Nov 17 17:34:49 fix kernel: [   30.750953]
> Nov 17 17:34:49 fix kernel: [   30.750954]        CPU0
> Nov 17 17:34:49 fix kernel: [   30.750955]        ----
> Nov 17 17:34:49 fix kernel: [   30.750956]   lock(&parent->list_lock);
> Nov 17 17:34:49 fix kernel: [   30.750958]   lock(&parent->list_lock);
> Nov 17 17:34:49 fix kernel: [   30.750959]
> Nov 17 17:34:49 fix kernel: [   30.750960]  *** DEADLOCK ***
> Nov 17 17:34:49 fix kernel: [   30.750961]
> Nov 17 17:34:49 fix kernel: [   30.750962]  May be due to missing lock
> nesting notation
> Nov 17 17:34:49 fix kernel: [   30.750963]
> Nov 17 17:34:49 fix kernel: [   30.750964] 2 locks held by udevd/517:
> Nov 17 17:34:49 fix kernel: [   30.750966]  #0:  (&per_cpu(slab_lock,
> __cpu).lock){+.+...}, at: [<ffffffff8116a5c6>] kfree+0xd6/0x380
> Nov 17 17:34:49 fix kernel: [   30.750973]  #1:
> (&parent->list_lock){+.+...}, at: [<ffffffff81613e63>]
> cache_flusharray+0x47/0xd6
> Nov 17 17:34:49 fix kernel: [   30.750977]
> Nov 17 17:34:49 fix kernel: [   30.750977] stack backtrace:
> Nov 17 17:34:49 fix kernel: [   30.750980] Pid: 517, comm: udevd Not tainted
> 3.0.9-25-rt #0
> Nov 17 17:34:49 fix kernel: [   30.750982] Call Trace:
> Nov 17 17:34:49 fix kernel: [   30.750987]  [<ffffffff810a0097>]
> print_deadlock_bug+0xf7/0x100
> Nov 17 17:34:49 fix kernel: [   30.750991]  [<ffffffff810a1add>]
> validate_chain.isra.37+0x67d/0x720
> Nov 17 17:34:49 fix kernel: [   30.750995]  [<ffffffff810a2478>]
> __lock_acquire+0x478/0x9c0
> Nov 17 17:34:49 fix kernel: [   30.750999]  [<ffffffff8162ae19>] ?
> sub_preempt_count+0x29/0x60
> Nov 17 17:34:49 fix kernel: [   30.751003]  [<ffffffff81627475>] ?
> _raw_spin_unlock+0x35/0x60
> Nov 17 17:34:49 fix kernel: [   30.751007]  [<ffffffff81625f0b>] ?
> rt_spin_lock_slowlock+0x2eb/0x340
> Nov 17 17:34:49 fix kernel: [   30.751011]  [<ffffffff81056be1>] ?
> get_parent_ip+0x11/0x50
> Nov 17 17:34:49 fix kernel: [   30.751014]  [<ffffffff81613e63>] ?
> cache_flusharray+0x47/0xd6
> Nov 17 17:34:49 fix kernel: [   30.751015]  [<ffffffff810a2f64>]
> lock_acquire+0x94/0x160
> Nov 17 17:34:49 fix kernel: [   30.751015]  [<ffffffff81613e63>] ?
> cache_flusharray+0x47/0xd6
> Nov 17 17:34:49 fix kernel: [   30.751015]  [<ffffffff81626999>]
> rt_spin_lock+0x39/0x40
> Nov 17 17:34:49 fix kernel: [   30.751015]  [<ffffffff81613e63>] ?
> cache_flusharray+0x47/0xd6
> Nov 17 17:34:49 fix kernel: [   30.751015]  [<ffffffff8105a90b>] ?
> migrate_disable+0x6b/0xe0
> Nov 17 17:34:49 fix kernel: [   30.751015]  [<ffffffff81613e63>]
> cache_flusharray+0x47/0xd6
> Nov 17 17:34:49 fix kernel: [   30.751015]  [<ffffffff81167a41>]
> kmem_cache_free+0x221/0x300
> Nov 17 17:34:49 fix kernel: [   30.751015]  [<ffffffff81167b8f>]
> slab_destroy+0x6f/0xa0
> Nov 17 17:34:49 fix kernel: [   30.751015]  [<ffffffff81167d32>]
> free_block+0x172/0x190
> Nov 17 17:34:49 fix kernel: [   30.751015]  [<ffffffff81613eb4>]
> cache_flusharray+0x98/0xd6
> Nov 17 17:34:49 fix kernel: [   30.751015]  [<ffffffff814f1110>] ?
> __sk_free+0x130/0x160
> Nov 17 17:34:49 fix kernel: [   30.751015]  [<ffffffff814f1110>] ?
> __sk_free+0x130/0x160
> Nov 17 17:34:49 fix kernel: [   30.751015]  [<ffffffff8116a806>]
> kfree+0x316/0x380
> Nov 17 17:34:49 fix kernel: [   30.751015]  [<ffffffff814f5328>] ?
> skb_queue_purge+0x28/0x40
> Nov 17 17:34:49 fix kernel: [   30.751015]  [<ffffffff814f1110>]
> __sk_free+0x130/0x160
> Nov 17 17:34:49 fix kernel: [   30.751015]  [<ffffffff814f11d5>]
> sk_free+0x25/0x30
> Nov 17 17:34:49 fix kernel: [   30.751015]  [<ffffffff8152d908>]
> netlink_release+0x128/0x200
> Nov 17 17:34:49 fix kernel: [   30.751015]  [<ffffffff814ea388>]
> sock_release+0x28/0x90
> Nov 17 17:34:49 fix kernel: [   30.751015]  [<ffffffff814eaa57>]
> sock_close+0x17/0x30
> Nov 17 17:34:49 fix kernel: [   30.751015]  [<ffffffff8117b914>]
> __fput+0xb4/0x200
> Nov 17 17:34:49 fix kernel: [   30.751015]  [<ffffffff8117ba85>]
> fput+0x25/0x30
> Nov 17 17:34:49 fix kernel: [   30.751015]  [<ffffffff81177d0c>]
> filp_close+0x6c/0x90
> Nov 17 17:34:49 fix kernel: [   30.751015]  [<ffffffff81177df0>]
> sys_close+0xc0/0x130
> Nov 17 17:34:49 fix kernel: [   30.751015]  [<ffffffff8162ed02>]
> system_call_fastpath+0x16/0x1b
>

Hmm, I think I see how this can happen.

cache_flusharray()
        spin_lock(&l3->list_lock);
        free_block(cachep, ac->entry, batchcount, node);
                slab_destroy()
                        kmem_cache_free()
                                __cache_free()
                                        cache_flusharray()
                                                spin_lock(&l3->list_lock);
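
To make the chain above concrete, here is a small userspace toy model in C.
It is only a sketch and not kernel code: every toy_* name is made up for
illustration, and the model rests on an assumption that this message does not
confirm, namely that the inner cache_flusharray() runs on a different cache
(the one holding off-slab slab headers), so the two list_locks are different
instances that merely share one lock class. The checker below mimics, very
roughly, how a class-based validator would flag that as possible recursive
locking.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Toy model only: none of these types or functions exist in the kernel. */
#define MAX_HELD 8

struct toy_lock {
	int class_id;		/* the checker compares classes, not instances */
	const char *name;
};

struct toy_cache {
	struct toy_lock list_lock;
	struct toy_cache *slabp_cache;	/* assumed home of off-slab headers, or NULL */
	int empty_slabs;		/* slabs free_block() passes to slab_destroy() */
};

static const struct toy_lock *held[MAX_HELD];
static int nheld;
static int warnings;

static void toy_lock_acquire(const struct toy_lock *l)
{
	/* Warn if any lock already held belongs to the same class. */
	for (int i = 0; i < nheld; i++) {
		if (held[i]->class_id == l->class_id) {
			printf("possible recursive locking: %s while holding %s\n",
			       l->name, held[i]->name);
			warnings++;
			break;
		}
	}
	assert(nheld < MAX_HELD);
	held[nheld++] = l;
}

static void toy_lock_release(const struct toy_lock *l)
{
	/* Locks are released in LIFO order in this model. */
	assert(nheld > 0 && held[nheld - 1] == l);
	nheld--;
}

static void toy_cache_flusharray(struct toy_cache *c);

static void toy_slab_destroy(struct toy_cache *c)
{
	/* Stands in for kmem_cache_free() -> __cache_free() -> cache_flusharray(). */
	if (c->slabp_cache)
		toy_cache_flusharray(c->slabp_cache);
}

static void toy_free_block(struct toy_cache *c)
{
	while (c->empty_slabs > 0) {
		c->empty_slabs--;
		toy_slab_destroy(c);
	}
}

static void toy_cache_flusharray(struct toy_cache *c)
{
	toy_lock_acquire(&c->list_lock);
	toy_free_block(c);	/* may re-enter toy_cache_flusharray() */
	toy_lock_release(&c->list_lock);
}

/* Flush one cache with off-slab headers; returns the number of warnings. */
static int toy_run(int header_class)
{
	struct toy_cache headers = { { header_class, "headers.list_lock" }, NULL, 0 };
	struct toy_cache objects = { { 1, "objects.list_lock" }, &headers, 1 };

	nheld = 0;
	warnings = 0;
	toy_cache_flusharray(&objects);
	return warnings;
}
```

Calling toy_run(1) from a main() puts both list_locks in class 1 and returns
1, which corresponds to the single report in the log. Calling toy_run(2)
gives the header cache its own class and returns 0, which is one way to read
the "May be due to missing lock nesting notation" hint. Whether the real
report is such a false positive or a genuine deadlock on the same lock is not
something this sketch can decide.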