From mboxrd@z Thu Jan 1 00:00:00 1970
From: Thomas Schauss
Subject: Re: 3.2-rc1 and nvidia drivers
Date: Mon, 28 Nov 2011 11:08:26 +0100
Message-ID: <4ED35D9A.7090401@tum.de>
References: <4EC384FD.1040106@tum.de>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="------------050601020002090906090802"
Cc: RT
To: Thomas Gleixner
Return-path:
Received: from mailrelay1.lrz-muenchen.de ([129.187.254.106]:53932 "EHLO mailrelay1.lrz-muenchen.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752591Ab1K1KIe (ORCPT ); Mon, 28 Nov 2011 05:08:34 -0500
Received: from scan6.mail.lrz.de (scan6.mail.lrz.de [10.156.6.40]) by mailrelay1.lrz-muenchen.de with ESMTP for linux-rt-users@vger.kernel.org; Mon, 28 Nov 2011 11:08:27 +0100
Received: from mailrelay1.lrz-muenchen.de ([10.156.6.201]) by scan6.mail.lrz.de (scan6.mail.lrz.de [10.156.6.26]) (amavisd-new, port 10024) with ESMTP id OW14fMV82zMT for ; Mon, 28 Nov 2011 11:08:27 +0100 (CET)
Received: from robusta.lsr.ei.tum.de (robusta.lsr.ei.tum.de [129.187.147.176]) by mailrelay1.lrz-muenchen.de with ESMTP for linux-rt-users@vger.kernel.org; Mon, 28 Nov 2011 11:08:26 +0100
Received: from mail.lsr.ei.tum.de (mail.lsr.ei.tum.de [129.187.147.212]) by robusta.lsr.ei.tum.de (Postfix) with ESMTP id DB9FB150CAED for ; Mon, 28 Nov 2011 11:08:26 +0100 (CET)
In-Reply-To:
Sender: linux-rt-users-owner@vger.kernel.org
List-ID:

This is a multi-part message in MIME format.
--------------050601020002090906090802
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

On 11/16/2011 04:06 PM, Thomas Gleixner wrote:
> On Wed, 16 Nov 2011, Thomas Schauss wrote:
>> Unfortunately, with 3.0-rt and the nvidia-driver we get complete system
>> freezes when starting X on several different hardware setups (a few systems
>> work fine). This is certainly caused by this combination. When using the
>> nouveau-driver everything works fine.
>
> Have you ever tried to run with CONFIG_PROVE_LOCKING=y ?
>

Hello,

thank you for that tip. I have tried this now and have not found any warnings that seem related to the nvidia-driver.

Further testing revealed that the driver works fine with CONFIG_PREEMPT_RTB; the freezes when running startx occur as soon as we switch to CONFIG_PREEMPT_RT_FULL.

Regarding lockdep, we do get some warnings in slab.c -> cache_flusharray that, however, seem unrelated to nvidia. As we could not find any other reports with the same locking warning, I have attached one example below. You can find some complete bootlogs (all with deadlock warnings, all with slightly different call stacks) and my kernel config at
http://www.lsr.ei.tum.de/team/schauss/lockdep/

On rt-base I also get a lockdep warning, which however seems unrelated to the rt-full one (it is not in cache_flusharray). You can find that log on the same page.

Best Regards,

Thomas

Nov 17 17:34:49 fix kernel: [ 30.750925] =============================================
Nov 17 17:34:49 fix kernel: [ 30.750927] [ INFO: possible recursive locking detected ]
Nov 17 17:34:49 fix kernel: [ 30.750930] 3.0.9-25-rt #0
Nov 17 17:34:49 fix kernel: [ 30.750931] ---------------------------------------------
Nov 17 17:34:49 fix kernel: [ 30.750933] udevd/517 is trying to acquire lock:
Nov 17 17:34:49 fix kernel: [ 30.750935] (&parent->list_lock){+.+...}, at: [] cache_flusharray+0x47/0xd6
Nov 17 17:34:49 fix kernel: [ 30.750944]
Nov 17 17:34:49 fix kernel: [ 30.750945] but task is already holding lock:
Nov 17 17:34:49 fix kernel: [ 30.750946] (&parent->list_lock){+.+...}, at: [] cache_flusharray+0x47/0xd6
Nov 17 17:34:49 fix kernel: [ 30.750950]
Nov 17 17:34:49 fix kernel: [ 30.750951] other info that might help us debug this:
Nov 17 17:34:49 fix kernel: [ 30.750952] Possible unsafe locking scenario:
Nov 17 17:34:49 fix kernel: [ 30.750953]
Nov 17 17:34:49 fix kernel: [ 30.750954] CPU0
Nov 17 17:34:49 fix kernel: [ 30.750955] ----
Nov 17 17:34:49 fix kernel: [ 30.750956] lock(&parent->list_lock);
Nov 17 17:34:49 fix kernel: [ 30.750958] lock(&parent->list_lock);
Nov 17 17:34:49 fix kernel: [ 30.750959]
Nov 17 17:34:49 fix kernel: [ 30.750960] *** DEADLOCK ***
Nov 17 17:34:49 fix kernel: [ 30.750961]
Nov 17 17:34:49 fix kernel: [ 30.750962] May be due to missing lock nesting notation
Nov 17 17:34:49 fix kernel: [ 30.750963]
Nov 17 17:34:49 fix kernel: [ 30.750964] 2 locks held by udevd/517:
Nov 17 17:34:49 fix kernel: [ 30.750966] #0: (&per_cpu(slab_lock, __cpu).lock){+.+...}, at: [] kfree+0xd6/0x380
Nov 17 17:34:49 fix kernel: [ 30.750973] #1: (&parent->list_lock){+.+...}, at: [] cache_flusharray+0x47/0xd6
Nov 17 17:34:49 fix kernel: [ 30.750977]
Nov 17 17:34:49 fix kernel: [ 30.750977] stack backtrace:
Nov 17 17:34:49 fix kernel: [ 30.750980] Pid: 517, comm: udevd Not tainted 3.0.9-25-rt #0
Nov 17 17:34:49 fix kernel: [ 30.750982] Call Trace:
Nov 17 17:34:49 fix kernel: [ 30.750987] [] print_deadlock_bug+0xf7/0x100
Nov 17 17:34:49 fix kernel: [ 30.750991] [] validate_chain.isra.37+0x67d/0x720
Nov 17 17:34:49 fix kernel: [ 30.750995] [] __lock_acquire+0x478/0x9c0
Nov 17 17:34:49 fix kernel: [ 30.750999] [] ? sub_preempt_count+0x29/0x60
Nov 17 17:34:49 fix kernel: [ 30.751003] [] ? _raw_spin_unlock+0x35/0x60
Nov 17 17:34:49 fix kernel: [ 30.751007] [] ? rt_spin_lock_slowlock+0x2eb/0x340
Nov 17 17:34:49 fix kernel: [ 30.751011] [] ? get_parent_ip+0x11/0x50
Nov 17 17:34:49 fix kernel: [ 30.751014] [] ? cache_flusharray+0x47/0xd6
Nov 17 17:34:49 fix kernel: [ 30.751015] [] lock_acquire+0x94/0x160
Nov 17 17:34:49 fix kernel: [ 30.751015] [] ? cache_flusharray+0x47/0xd6
Nov 17 17:34:49 fix kernel: [ 30.751015] [] rt_spin_lock+0x39/0x40
Nov 17 17:34:49 fix kernel: [ 30.751015] [] ? cache_flusharray+0x47/0xd6
Nov 17 17:34:49 fix kernel: [ 30.751015] [] ? migrate_disable+0x6b/0xe0
Nov 17 17:34:49 fix kernel: [ 30.751015] [] cache_flusharray+0x47/0xd6
Nov 17 17:34:49 fix kernel: [ 30.751015] [] kmem_cache_free+0x221/0x300
Nov 17 17:34:49 fix kernel: [ 30.751015] [] slab_destroy+0x6f/0xa0
Nov 17 17:34:49 fix kernel: [ 30.751015] [] free_block+0x172/0x190
Nov 17 17:34:49 fix kernel: [ 30.751015] [] cache_flusharray+0x98/0xd6
Nov 17 17:34:49 fix kernel: [ 30.751015] [] ? __sk_free+0x130/0x160
Nov 17 17:34:49 fix kernel: [ 30.751015] [] ? __sk_free+0x130/0x160
Nov 17 17:34:49 fix kernel: [ 30.751015] [] kfree+0x316/0x380
Nov 17 17:34:49 fix kernel: [ 30.751015] [] ? skb_queue_purge+0x28/0x40
Nov 17 17:34:49 fix kernel: [ 30.751015] [] __sk_free+0x130/0x160
Nov 17 17:34:49 fix kernel: [ 30.751015] [] sk_free+0x25/0x30
Nov 17 17:34:49 fix kernel: [ 30.751015] [] netlink_release+0x128/0x200
Nov 17 17:34:49 fix kernel: [ 30.751015] [] sock_release+0x28/0x90
Nov 17 17:34:49 fix kernel: [ 30.751015] [] sock_close+0x17/0x30
Nov 17 17:34:49 fix kernel: [ 30.751015] [] __fput+0xb4/0x200
Nov 17 17:34:49 fix kernel: [ 30.751015] [] fput+0x25/0x30
Nov 17 17:34:49 fix kernel: [ 30.751015] [] filp_close+0x6c/0x90
Nov 17 17:34:49 fix kernel: [ 30.751015] [] sys_close+0xc0/0x130
Nov 17 17:34:49 fix kernel: [ 30.751015] [] system_call_fastpath+0x16/0x1b

--------------050601020002090906090802
Content-Type: text/x-vcard; charset=utf-8; name="schauss.vcf"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="schauss.vcf"

begin:vcard
fn:Thomas Schauss
n:Schauss;Thomas
org:Technische Universitaet Muenchen (TUM);Institute of Automatic Control Engineering (LSR)
adr:;;Theresienstr. 90;Munich;;80333;Germany
email;internet:schauss@tum.de
title:Dipl.-Ing. (Univ.)
tel;work:+49 89 289 23406
tel;fax:+49 89 289 28340
url:http://www.lsr.ei.tum.de
version:2.1
end:vcard

--------------050601020002090906090802--
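
The lockdep report above ends with the hint "May be due to missing lock nesting notation". A minimal sketch of what that hint can mean, as a toy model in Python rather than kernel code: the checker looks at lock classes, not lock instances, so taking a second lock of the same class while one is held is reported even if the two locks belong to different objects. Everything here is hypothetical and for illustration only: `LockClassTracker`, `flush`, and the two-cache setup are made-up names, and the assumption that the nested `cache_flusharray` call in the trace operates on a different cache than the outer one is a guess from the call chain, not something the log confirms.

```python
class LockClassTracker:
    """Toy model of a per-class held-lock check; not the kernel's lockdep code."""

    def __init__(self):
        self.held = []     # stack of (class_name, subclass) keys currently held
        self.reports = []  # warnings collected so far

    def acquire(self, class_name, subclass=0):
        key = (class_name, subclass)
        # Same class and subclass already held: report, as in the log above.
        if key in self.held:
            self.reports.append(
                "possible recursive locking detected: %s" % class_name)
        self.held.append(key)

    def release(self):
        # Locks are released in reverse order of acquisition in this model.
        self.held.pop()


def flush(tracker, cache, annotate, depth=0):
    """Hypothetical stand-in for the nested cache_flusharray path in the trace."""
    # With annotate=True each nesting level uses its own subclass, which is
    # the role a nesting annotation plays; with annotate=False all levels
    # share subclass 0.
    tracker.acquire("&parent->list_lock", depth if annotate else 0)
    if cache["mgmt_cache"] is not None:
        # Assumption: destroying a slab frees an object into a second cache,
        # whose own list_lock (same class, different instance) is then taken.
        flush(tracker, cache["mgmt_cache"], annotate, depth + 1)
    tracker.release()


inner = {"name": "mgmt", "mgmt_cache": None}
outer = {"name": "objects", "mgmt_cache": inner}

plain = LockClassTracker()
flush(plain, outer, annotate=False)

nested = LockClassTracker()
flush(nested, outer, annotate=True)

print(plain.reports)   # one report: both levels used the same class key
print(nested.reports)  # empty: the levels were told apart by subclass
```

In this model the unannotated run produces one report although no lock instance is taken twice, while the annotated run stays silent. Whether the warning in the log is such a false positive or a real recursion on the same lock cannot be decided from the sketch.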