Re: 3.2-rc1 and nvidia drivers

From: Thomas Schauss <schauss@tum.de>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: RT <linux-rt-users@vger.kernel.org>
Subject: Re: 3.2-rc1 and nvidia drivers
Date: Mon, 28 Nov 2011 11:08:26 +0100
Message-ID: <4ED35D9A.7090401@tum.de>
In-Reply-To: <alpine.LFD.2.02.1111161605580.4902@ionos>

On 11/16/2011 04:06 PM, Thomas Gleixner wrote:
> On Wed, 16 Nov 2011, Thomas Schauss wrote:
>> Unfortunately, with 3.0-rt and the nvidia-driver we get complete system
>> freezes when starting X on several different hardware setups (a few systems
>> work fine). This is certainly caused by this combination. When using the
>> nouveau-driver everything works fine.
>
> Have you ever tried to run with CONFIG_PROVE_LOCKING=y ?
>

Hello,

thank you for that tip. I have now tried this and found no warnings
that seem related to the nvidia driver. Further testing revealed that
the driver works fine with CONFIG_PREEMPT_RTB, and that the freezes
when running startx occur as soon as we switch to
CONFIG_PREEMPT_RT_FULL.
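
For anyone comparing configurations, here is a minimal sketch of how the
three options mentioned in this thread could be read out of a kernel
.config text. The helper names and the sample fragment are hypothetical
and only for illustration; they are not taken from my actual config.

```python
import re

def parse_kconfig(text):
    """Return {option: value} for every option that is set in a .config text."""
    opts = {}
    for line in text.splitlines():
        # Lines such as "# CONFIG_FOO is not set" do not match and are skipped.
        m = re.match(r"^(CONFIG_[A-Za-z0-9_]+)=(.*)$", line.strip())
        if m:
            opts[m.group(1)] = m.group(2)
    return opts

def preemption_summary(text):
    """Report which of the options discussed in this thread are built in (=y)."""
    opts = parse_kconfig(text)
    names = ("CONFIG_PREEMPT_RTB", "CONFIG_PREEMPT_RT_FULL", "CONFIG_PROVE_LOCKING")
    return {name: opts.get(name) == "y" for name in names}

# Hypothetical fragment, standing in for a real .config file.
sample = """\
# CONFIG_PREEMPT_RTB is not set
CONFIG_PREEMPT_RT_FULL=y
CONFIG_PROVE_LOCKING=y
"""

summary = preemption_summary(sample)
print(summary)
```

On a real system the text would typically come from the .config in the
build tree or from /boot/config-<version>, depending on the distribution.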

Regarding lockdep, we do get some warnings in slab.c (in
cache_flusharray), which however seem unrelated to nvidia. As we could
not find any existing bug reports with the same locking warning, I have
attached one example below. Complete boot logs (all with deadlock
warnings, each with a slightly different call stack) and my kernel
config are at

http://www.lsr.ei.tum.de/team/schauss/lockdep/

On rt-base I also get a lockdep warning, which however seems unrelated
to the rt-full one (it is not in cache_flusharray). That log is on the
same page.
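
The report below says "May be due to missing lock nesting notation", and
the trace shows cache_flusharray entered a second time via free_block,
slab_destroy and kmem_cache_free. The following is a toy model, not the
kernel's lockdep code, of why a check keyed on lock class can fire even
when the two locks are different instances. It assumes (not verified
here) that the two list_locks belong to different caches; the class and
lock names are made up for illustration.

```python
class Lock:
    """A lock instance; the checker only looks at its (class, subclass) key."""
    def __init__(self, name, lock_class, subclass=0):
        self.name = name
        self.key = (lock_class, subclass)

class HeldLocks:
    """Toy per-task list of held locks with a class-based recursion check."""
    def __init__(self):
        self.held = []

    def acquire(self, lock):
        report = None
        for h in self.held:
            # Same key means "same lock" to the checker, even if the
            # instances differ.
            if h.key == lock.key:
                report = ("possible recursive locking detected: "
                          "%s while holding %s" % (lock.name, h.name))
                break
        self.held.append(lock)
        return report

    def release(self, lock):
        self.held.remove(lock)

# Two distinct instances sharing one class (hypothetical names).
outer = Lock("cache_a.list_lock", "parent->list_lock")
inner = Lock("cache_b.list_lock", "parent->list_lock")

task = HeldLocks()
task.acquire(outer)
unannotated = task.acquire(inner)    # flagged: same class, same subclass
task.release(inner)

# Giving the inner lock its own subclass models a nesting annotation.
inner_nested = Lock("cache_b.list_lock", "parent->list_lock", subclass=1)
annotated = task.acquire(inner_nested)   # not flagged
print(unannotated)
print(annotated)
```

In the kernel, the subclass idea corresponds to annotations such as
spin_lock_nested(); whether that is the right fix for this warning is a
separate question that this sketch does not answer.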

Best Regards,
Thomas



Nov 17 17:34:49 fix kernel: [   30.750925] =============================================
Nov 17 17:34:49 fix kernel: [   30.750927] [ INFO: possible recursive locking detected ]
Nov 17 17:34:49 fix kernel: [   30.750930] 3.0.9-25-rt #0
Nov 17 17:34:49 fix kernel: [   30.750931] ---------------------------------------------
Nov 17 17:34:49 fix kernel: [   30.750933] udevd/517 is trying to acquire lock:
Nov 17 17:34:49 fix kernel: [   30.750935]  (&parent->list_lock){+.+...}, at: [<ffffffff81613e63>] cache_flusharray+0x47/0xd6
Nov 17 17:34:49 fix kernel: [   30.750944]
Nov 17 17:34:49 fix kernel: [   30.750945] but task is already holding lock:
Nov 17 17:34:49 fix kernel: [   30.750946]  (&parent->list_lock){+.+...}, at: [<ffffffff81613e63>] cache_flusharray+0x47/0xd6
Nov 17 17:34:49 fix kernel: [   30.750950]
Nov 17 17:34:49 fix kernel: [   30.750951] other info that might help us debug this:
Nov 17 17:34:49 fix kernel: [   30.750952]  Possible unsafe locking scenario:
Nov 17 17:34:49 fix kernel: [   30.750953]
Nov 17 17:34:49 fix kernel: [   30.750954]        CPU0
Nov 17 17:34:49 fix kernel: [   30.750955]        ----
Nov 17 17:34:49 fix kernel: [   30.750956]   lock(&parent->list_lock);
Nov 17 17:34:49 fix kernel: [   30.750958]   lock(&parent->list_lock);
Nov 17 17:34:49 fix kernel: [   30.750959]
Nov 17 17:34:49 fix kernel: [   30.750960]  *** DEADLOCK ***
Nov 17 17:34:49 fix kernel: [   30.750961]
Nov 17 17:34:49 fix kernel: [   30.750962]  May be due to missing lock nesting notation
Nov 17 17:34:49 fix kernel: [   30.750963]
Nov 17 17:34:49 fix kernel: [   30.750964] 2 locks held by udevd/517:
Nov 17 17:34:49 fix kernel: [   30.750966]  #0:  (&per_cpu(slab_lock, __cpu).lock){+.+...}, at: [<ffffffff8116a5c6>] kfree+0xd6/0x380
Nov 17 17:34:49 fix kernel: [   30.750973]  #1:  (&parent->list_lock){+.+...}, at: [<ffffffff81613e63>] cache_flusharray+0x47/0xd6
Nov 17 17:34:49 fix kernel: [   30.750977]
Nov 17 17:34:49 fix kernel: [   30.750977] stack backtrace:
Nov 17 17:34:49 fix kernel: [   30.750980] Pid: 517, comm: udevd Not tainted 3.0.9-25-rt #0
Nov 17 17:34:49 fix kernel: [   30.750982] Call Trace:
Nov 17 17:34:49 fix kernel: [   30.750987]  [<ffffffff810a0097>] print_deadlock_bug+0xf7/0x100
Nov 17 17:34:49 fix kernel: [   30.750991]  [<ffffffff810a1add>] validate_chain.isra.37+0x67d/0x720
Nov 17 17:34:49 fix kernel: [   30.750995]  [<ffffffff810a2478>] __lock_acquire+0x478/0x9c0
Nov 17 17:34:49 fix kernel: [   30.750999]  [<ffffffff8162ae19>] ? sub_preempt_count+0x29/0x60
Nov 17 17:34:49 fix kernel: [   30.751003]  [<ffffffff81627475>] ? _raw_spin_unlock+0x35/0x60
Nov 17 17:34:49 fix kernel: [   30.751007]  [<ffffffff81625f0b>] ? rt_spin_lock_slowlock+0x2eb/0x340
Nov 17 17:34:49 fix kernel: [   30.751011]  [<ffffffff81056be1>] ? get_parent_ip+0x11/0x50
Nov 17 17:34:49 fix kernel: [   30.751014]  [<ffffffff81613e63>] ? cache_flusharray+0x47/0xd6
Nov 17 17:34:49 fix kernel: [   30.751015]  [<ffffffff810a2f64>] lock_acquire+0x94/0x160
Nov 17 17:34:49 fix kernel: [   30.751015]  [<ffffffff81613e63>] ? cache_flusharray+0x47/0xd6
Nov 17 17:34:49 fix kernel: [   30.751015]  [<ffffffff81626999>] rt_spin_lock+0x39/0x40
Nov 17 17:34:49 fix kernel: [   30.751015]  [<ffffffff81613e63>] ? cache_flusharray+0x47/0xd6
Nov 17 17:34:49 fix kernel: [   30.751015]  [<ffffffff8105a90b>] ? migrate_disable+0x6b/0xe0
Nov 17 17:34:49 fix kernel: [   30.751015]  [<ffffffff81613e63>] cache_flusharray+0x47/0xd6
Nov 17 17:34:49 fix kernel: [   30.751015]  [<ffffffff81167a41>] kmem_cache_free+0x221/0x300
Nov 17 17:34:49 fix kernel: [   30.751015]  [<ffffffff81167b8f>] slab_destroy+0x6f/0xa0
Nov 17 17:34:49 fix kernel: [   30.751015]  [<ffffffff81167d32>] free_block+0x172/0x190
Nov 17 17:34:49 fix kernel: [   30.751015]  [<ffffffff81613eb4>] cache_flusharray+0x98/0xd6
Nov 17 17:34:49 fix kernel: [   30.751015]  [<ffffffff814f1110>] ? __sk_free+0x130/0x160
Nov 17 17:34:49 fix kernel: [   30.751015]  [<ffffffff814f1110>] ? __sk_free+0x130/0x160
Nov 17 17:34:49 fix kernel: [   30.751015]  [<ffffffff8116a806>] kfree+0x316/0x380
Nov 17 17:34:49 fix kernel: [   30.751015]  [<ffffffff814f5328>] ? skb_queue_purge+0x28/0x40
Nov 17 17:34:49 fix kernel: [   30.751015]  [<ffffffff814f1110>] __sk_free+0x130/0x160
Nov 17 17:34:49 fix kernel: [   30.751015]  [<ffffffff814f11d5>] sk_free+0x25/0x30
Nov 17 17:34:49 fix kernel: [   30.751015]  [<ffffffff8152d908>] netlink_release+0x128/0x200
Nov 17 17:34:49 fix kernel: [   30.751015]  [<ffffffff814ea388>] sock_release+0x28/0x90
Nov 17 17:34:49 fix kernel: [   30.751015]  [<ffffffff814eaa57>] sock_close+0x17/0x30
Nov 17 17:34:49 fix kernel: [   30.751015]  [<ffffffff8117b914>] __fput+0xb4/0x200
Nov 17 17:34:49 fix kernel: [   30.751015]  [<ffffffff8117ba85>] fput+0x25/0x30
Nov 17 17:34:49 fix kernel: [   30.751015]  [<ffffffff81177d0c>] filp_close+0x6c/0x90
Nov 17 17:34:49 fix kernel: [   30.751015]  [<ffffffff81177df0>] sys_close+0xc0/0x130
Nov 17 17:34:49 fix kernel: [   30.751015]  [<ffffffff8162ed02>] system_call_fastpath+0x16/0x1b
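
Since the other boot logs have slightly different call stacks, comparing
them by hand is tedious. Here is a minimal sketch of how the frames of
such a trace could be pulled out, keeping only entries without the "?"
marker (which flags addresses the unwinder is not sure about). The regex
and the helper name are my own assumptions about this log format, and
the sample is an excerpt of the trace above, with mail-wrapped lines
left as they arrived.

```python
import re

# [<address>] optionally followed by "?", then symbol+0xoffset/0xsize.
# \s+ also matches newlines, so frames wrapped by the mail client still parse.
FRAME = re.compile(
    r"\[<([0-9a-f]+)>\]\s+(\?\s+)?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)

def reliable_frames(log_text):
    """Return function names of trace entries that are not marked with '?'."""
    return [m.group(3) for m in FRAME.finditer(log_text) if m.group(2) is None]

sample = """\
Nov 17 17:34:49 fix kernel: [   30.751015]  [<ffffffff81613e63>] ?
cache_flusharray+0x47/0xd6
Nov 17 17:34:49 fix kernel: [   30.751015]  [<ffffffff81613e63>]
cache_flusharray+0x47/0xd6
Nov 17 17:34:49 fix kernel: [   30.751015]  [<ffffffff81167a41>]
kmem_cache_free+0x221/0x300
Nov 17 17:34:49 fix kernel: [   30.751015]  [<ffffffff81167b8f>]
slab_destroy+0x6f/0xa0
Nov 17 17:34:49 fix kernel: [   30.751015]  [<ffffffff81167d32>]
free_block+0x172/0x190
Nov 17 17:34:49 fix kernel: [   30.751015]  [<ffffffff81613eb4>]
cache_flusharray+0x98/0xd6
"""

frames = reliable_frames(sample)
print(frames)
```

For this excerpt, cache_flusharray shows up twice among the reliable
frames, which matches the nested call reported by lockdep.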


