* 3.2-rc1 and nvidia drivers
@ 2011-11-16  9:10 Javier Sanz
  2011-11-16  9:40 ` Thomas Schauss
  2011-11-16  9:52 ` Mike Galbraith
  0 siblings, 2 replies; 24+ messages in thread
From: Javier Sanz @ 2011-11-16  9:10 UTC (permalink / raw)
To: RT

Hello,

Congratulations to everyone involved in the 3.2-rc1-rt2 release, great job!

So, can I ask a favor? Could anyone publish a patch for the nvidia drivers
against the rt series that "resolves" the "GPL" string issue? I really tried
to get it running, but it doesn't work, and I think there are a lot of people
who need it. So, if someone has it running: please, take 3 minutes and
publish it.

Thank you

^ permalink raw reply	[flat|nested] 24+ messages in thread
* Re: 3.2-rc1 and nvidia drivers
  2011-11-16  9:10 3.2-rc1 and nvidia drivers Javier Sanz
@ 2011-11-16  9:40 ` Thomas Schauss
  2011-11-16 15:06   ` Thomas Gleixner
  2011-11-16  9:52 ` Mike Galbraith
  1 sibling, 1 reply; 24+ messages in thread
From: Thomas Schauss @ 2011-11-16  9:40 UTC (permalink / raw)
To: Javier Sanz; +Cc: RT

[-- Attachment #1: Type: text/plain, Size: 5761 bytes --]

Hello,

I have actually wanted to write to this list about the NVIDIA drivers for
some time and will use this opportunity.

Generally, it is of course preferable to use the nouveau driver where
possible, and it works just fine. At our institute, however, we need the
rt-patch in combination with the proprietary NVIDIA driver, as we need CUDA
support combined with low-latency control (e.g. for visual servoing). I am
sure we are not the only ones requiring this combination.

We have been using 2.6.33-rt29 for some time now without any issues and with
very satisfactory latency. This required a small patch to the NVIDIA driver
(there is a check for PREEMPT_RT), as atomic_spin* was renamed to raw_spin*
in one of the 2.6-rt patches. 3.0-rt requires an additional slight
modification: CONFIG_PREEMPT_RT is no longer defined, so we must check for
CONFIG_PREEMPT_RT_FULL instead to decide whether to use raw spinlocks. With
this patch it is not necessary to change any EXPORT_SYMBOL_GPL to
EXPORT_SYMBOL in the kernel, as was proposed in some other threads here. You
can find the patches for the nvidia driver for 2.6.33-rt and 3.0-rt below.

Unfortunately, with 3.0-rt and the nvidia driver we get complete system
freezes when starting X on several different hardware setups (a few systems
work fine). This is certainly caused by this combination: when using the
nouveau driver everything works fine. I tested this with rt13 through rt20
and will check with the current version tomorrow. If it still fails I will
put together a list of working vs. non-working hardware setups.
Unfortunately, apart from that I am not sure how to debug this issue, as the
complete system freezes (including the serial console) and I can't find any
suspicious messages in the logs. Any ideas?

Btw, changing EXPORT_SYMBOL_GPL to EXPORT_SYMBOL for migrate_enable/disable,
... and then using the non-raw spinlocks results in exactly the same behavior.

Best Regards,
Thomas Schauss

====================================================================
Patch for 3.0-rt:

--- a/nv-linux.h	2011-10-26 13:35:32.866579965 +0200
+++ b/nv-linux.h	2011-10-26 13:35:47.265117607 +0200
@@ -262,17 +262,17 @@
 #endif
 #endif
 
-#if defined(CONFIG_PREEMPT_RT)
-typedef atomic_spinlock_t nv_spinlock_t;
-#define NV_SPIN_LOCK_INIT(lock) atomic_spin_lock_init(lock)
-#define NV_SPIN_LOCK_IRQ(lock) atomic_spin_lock_irq(lock)
-#define NV_SPIN_UNLOCK_IRQ(lock) atomic_spin_unlock_irq(lock)
-#define NV_SPIN_LOCK_IRQSAVE(lock,flags) atomic_spin_lock_irqsave(lock,flags)
+#if defined(CONFIG_PREEMPT_RT_FULL)
+typedef raw_spinlock_t nv_spinlock_t;
+#define NV_SPIN_LOCK_INIT(lock) raw_spin_lock_init(lock)
+#define NV_SPIN_LOCK_IRQ(lock) raw_spin_lock_irq(lock)
+#define NV_SPIN_UNLOCK_IRQ(lock) raw_spin_unlock_irq(lock)
+#define NV_SPIN_LOCK_IRQSAVE(lock,flags) raw_spin_lock_irqsave(lock,flags)
 #define NV_SPIN_UNLOCK_IRQRESTORE(lock,flags) \
-        atomic_spin_unlock_irqrestore(lock,flags)
-#define NV_SPIN_LOCK(lock) atomic_spin_lock(lock)
-#define NV_SPIN_UNLOCK(lock) atomic_spin_unlock(lock)
-#define NV_SPIN_UNLOCK_WAIT(lock) atomic_spin_unlock_wait(lock)
+        raw_spin_unlock_irqrestore(lock,flags)
+#define NV_SPIN_LOCK(lock) raw_spin_lock(lock)
+#define NV_SPIN_UNLOCK(lock) raw_spin_unlock(lock)
+#define NV_SPIN_UNLOCK_WAIT(lock) raw_spin_unlock_wait(lock)
 #else
 typedef spinlock_t nv_spinlock_t;
 #define NV_SPIN_LOCK_INIT(lock) spin_lock_init(lock)
@@ -854,8 +854,8 @@
     return ret;
 }
 
-#if defined(CONFIG_PREEMPT_RT)
-#define NV_INIT_MUTEX(mutex) semaphore_init(mutex)
+#if defined(CONFIG_PREEMPT_RT_FULL)
+#define NV_INIT_MUTEX(mutex) sema_init(mutex,1)
 #else
 #if !defined(__SEMAPHORE_INITIALIZER) && defined(__COMPAT_SEMAPHORE_INITIALIZER)
 #define __SEMAPHORE_INITIALIZER __COMPAT_SEMAPHORE_INITIALIZER

====================================================================
Patch for 2.6.33-rt:

--- a/nv-linux.h	2011-10-28 10:31:47.416915958 +0200
+++ b/nv-linux.h	2011-10-28 10:32:48.592195509 +0200
@@ -263,16 +263,16 @@
 #endif
 
 #if defined(CONFIG_PREEMPT_RT)
-typedef atomic_spinlock_t nv_spinlock_t;
-#define NV_SPIN_LOCK_INIT(lock) atomic_spin_lock_init(lock)
-#define NV_SPIN_LOCK_IRQ(lock) atomic_spin_lock_irq(lock)
-#define NV_SPIN_UNLOCK_IRQ(lock) atomic_spin_unlock_irq(lock)
-#define NV_SPIN_LOCK_IRQSAVE(lock,flags) atomic_spin_lock_irqsave(lock,flags)
+typedef raw_spinlock_t nv_spinlock_t;
+#define NV_SPIN_LOCK_INIT(lock) raw_spin_lock_init(lock)
+#define NV_SPIN_LOCK_IRQ(lock) raw_spin_lock_irq(lock)
+#define NV_SPIN_UNLOCK_IRQ(lock) raw_spin_unlock_irq(lock)
+#define NV_SPIN_LOCK_IRQSAVE(lock,flags) raw_spin_lock_irqsave(lock,flags)
 #define NV_SPIN_UNLOCK_IRQRESTORE(lock,flags) \
-        atomic_spin_unlock_irqrestore(lock,flags)
-#define NV_SPIN_LOCK(lock) atomic_spin_lock(lock)
-#define NV_SPIN_UNLOCK(lock) atomic_spin_unlock(lock)
-#define NV_SPIN_UNLOCK_WAIT(lock) atomic_spin_unlock_wait(lock)
+        raw_spin_unlock_irqrestore(lock,flags)
+#define NV_SPIN_LOCK(lock) raw_spin_lock(lock)
+#define NV_SPIN_UNLOCK(lock) raw_spin_unlock(lock)
+#define NV_SPIN_UNLOCK_WAIT(lock) raw_spin_unlock_wait(lock)
 #else
 typedef spinlock_t nv_spinlock_t;
 #define NV_SPIN_LOCK_INIT(lock) spin_lock_init(lock)
@@ -855,7 +855,7 @@
 }
 
 #if defined(CONFIG_PREEMPT_RT)
-#define NV_INIT_MUTEX(mutex) semaphore_init(mutex)
+#define NV_INIT_MUTEX(mutex) sema_init(mutex,1)
 #else
 #if !defined(__SEMAPHORE_INITIALIZER) && defined(__COMPAT_SEMAPHORE_INITIALIZER)
 #define __SEMAPHORE_INITIALIZER __COMPAT_SEMAPHORE_INITIALIZER

[-- Attachment #2: schauss.vcf --]
[-- Type: text/x-vcard, Size: 342 bytes --]

begin:vcard
fn:Thomas Schauss
n:Schauss;Thomas
org:Technische Universitaet Muenchen (TUM);Institute of Automatic Control Engineering (LSR)
adr:;;Theresienstr. 90;Munich;;80333;Germany
email;internet:schauss@tum.de
title:Dipl.-Ing. (Univ.)
tel;work:+49 89 289 23406
tel;fax:+49 89 289 28340
url:http://www.lsr.ei.tum.de
version:2.1
end:vcard

^ permalink raw reply	[flat|nested] 24+ messages in thread
* Re: 3.2-rc1 and nvidia drivers
  2011-11-16  9:40 ` Thomas Schauss
@ 2011-11-16 15:06   ` Thomas Gleixner
  2011-11-28 10:08     ` Thomas Schauss
  0 siblings, 1 reply; 24+ messages in thread
From: Thomas Gleixner @ 2011-11-16 15:06 UTC (permalink / raw)
To: Thomas Schauss; +Cc: Javier Sanz, RT

On Wed, 16 Nov 2011, Thomas Schauss wrote:
> Unfortunately, with 3.0-rt and the nvidia-driver we get complete system
> freezes when starting X on several different hardware setups (a few systems
> work fine). This is certainly caused by this combination. When using the
> nouveau-driver everything works fine.

Have you ever tried to run with CONFIG_PROVE_LOCKING=y ?

^ permalink raw reply	[flat|nested] 24+ messages in thread
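[Editorial note: CONFIG_PROVE_LOCKING enables the lockdep validator, which reports locking-rule violations the first time a bad pattern occurs rather than waiting for an actual deadlock. A sketch of the debug options typically enabled together for this; the symbols are real kconfig options of that era, but the exact set to enable is a judgment call:]

```
CONFIG_DEBUG_KERNEL=y
CONFIG_PROVE_LOCKING=y
# PROVE_LOCKING selects LOCKDEP and the spinlock/mutex debugging it needs;
# CONFIG_LOCK_STAT=y can optionally be added for contention statistics.
```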
* Re: 3.2-rc1 and nvidia drivers
  2011-11-16 15:06 ` Thomas Gleixner
@ 2011-11-28 10:08   ` Thomas Schauss
  2011-11-28 11:31     ` John Kacur
  0 siblings, 1 reply; 24+ messages in thread
From: Thomas Schauss @ 2011-11-28 10:08 UTC (permalink / raw)
To: Thomas Gleixner; +Cc: RT

[-- Attachment #1: Type: text/plain, Size: 6379 bytes --]

On 11/16/2011 04:06 PM, Thomas Gleixner wrote:
> On Wed, 16 Nov 2011, Thomas Schauss wrote:
>> Unfortunately, with 3.0-rt and the nvidia-driver we get complete system
>> freezes when starting X on several different hardware setups (a few systems
>> work fine). This is certainly caused by this combination. When using the
>> nouveau-driver everything works fine.
>
> Have you ever tried to run with CONFIG_PROVE_LOCKING=y ?
>

Hello,

thank you for that tip. I have tried this now and have not found any warnings
which seem related to the nvidia driver. Further testing revealed that the
driver works fine with CONFIG_PREEMPT_RTB, and the freezes when running
startx occur as soon as we switch to CONFIG_PREEMPT_RT_FULL.

Regarding lockdep, we do get some warnings in slab.c -> cache_flusharray
which, however, seem unrelated to nvidia. As we could not find any other bug
reports with the same locking warning, I attached one example below. You can
find some complete boot logs (all with deadlock warnings, all with slightly
different call stacks) and my kernel config at

http://www.lsr.ei.tum.de/team/schauss/lockdep/

On rt-base I also get a lockdep warning which, however, seems unrelated to
the rt-full one (not in cache_flusharray). You can find that log on the same
page.
Best Regards,
Thomas


Nov 17 17:34:49 fix kernel: [   30.750925] =============================================
Nov 17 17:34:49 fix kernel: [   30.750927] [ INFO: possible recursive locking detected ]
Nov 17 17:34:49 fix kernel: [   30.750930] 3.0.9-25-rt #0
Nov 17 17:34:49 fix kernel: [   30.750931] ---------------------------------------------
Nov 17 17:34:49 fix kernel: [   30.750933] udevd/517 is trying to acquire lock:
Nov 17 17:34:49 fix kernel: [   30.750935]  (&parent->list_lock){+.+...}, at: [<ffffffff81613e63>] cache_flusharray+0x47/0xd6
Nov 17 17:34:49 fix kernel: [   30.750944]
Nov 17 17:34:49 fix kernel: [   30.750945] but task is already holding lock:
Nov 17 17:34:49 fix kernel: [   30.750946]  (&parent->list_lock){+.+...}, at: [<ffffffff81613e63>] cache_flusharray+0x47/0xd6
Nov 17 17:34:49 fix kernel: [   30.750950]
Nov 17 17:34:49 fix kernel: [   30.750951] other info that might help us debug this:
Nov 17 17:34:49 fix kernel: [   30.750952]  Possible unsafe locking scenario:
Nov 17 17:34:49 fix kernel: [   30.750953]
Nov 17 17:34:49 fix kernel: [   30.750954]        CPU0
Nov 17 17:34:49 fix kernel: [   30.750955]        ----
Nov 17 17:34:49 fix kernel: [   30.750956]   lock(&parent->list_lock);
Nov 17 17:34:49 fix kernel: [   30.750958]   lock(&parent->list_lock);
Nov 17 17:34:49 fix kernel: [   30.750959]
Nov 17 17:34:49 fix kernel: [   30.750960]  *** DEADLOCK ***
Nov 17 17:34:49 fix kernel: [   30.750961]
Nov 17 17:34:49 fix kernel: [   30.750962]  May be due to missing lock nesting notation
Nov 17 17:34:49 fix kernel: [   30.750963]
Nov 17 17:34:49 fix kernel: [   30.750964] 2 locks held by udevd/517:
Nov 17 17:34:49 fix kernel: [   30.750966]  #0:  (&per_cpu(slab_lock, __cpu).lock){+.+...}, at: [<ffffffff8116a5c6>] kfree+0xd6/0x380
Nov 17 17:34:49 fix kernel: [   30.750973]  #1:  (&parent->list_lock){+.+...}, at: [<ffffffff81613e63>] cache_flusharray+0x47/0xd6
Nov 17 17:34:49 fix kernel: [   30.750977]
Nov 17 17:34:49 fix kernel: [   30.750977] stack backtrace:
Nov 17 17:34:49 fix kernel: [   30.750980] Pid: 517, comm: udevd Not tainted 3.0.9-25-rt #0
Nov 17 17:34:49 fix kernel: [   30.750982] Call Trace:
Nov 17 17:34:49 fix kernel: [   30.750987]  [<ffffffff810a0097>] print_deadlock_bug+0xf7/0x100
Nov 17 17:34:49 fix kernel: [   30.750991]  [<ffffffff810a1add>] validate_chain.isra.37+0x67d/0x720
Nov 17 17:34:49 fix kernel: [   30.750995]  [<ffffffff810a2478>] __lock_acquire+0x478/0x9c0
Nov 17 17:34:49 fix kernel: [   30.750999]  [<ffffffff8162ae19>] ? sub_preempt_count+0x29/0x60
Nov 17 17:34:49 fix kernel: [   30.751003]  [<ffffffff81627475>] ? _raw_spin_unlock+0x35/0x60
Nov 17 17:34:49 fix kernel: [   30.751007]  [<ffffffff81625f0b>] ? rt_spin_lock_slowlock+0x2eb/0x340
Nov 17 17:34:49 fix kernel: [   30.751011]  [<ffffffff81056be1>] ? get_parent_ip+0x11/0x50
Nov 17 17:34:49 fix kernel: [   30.751014]  [<ffffffff81613e63>] ? cache_flusharray+0x47/0xd6
Nov 17 17:34:49 fix kernel: [   30.751015]  [<ffffffff810a2f64>] lock_acquire+0x94/0x160
Nov 17 17:34:49 fix kernel: [   30.751015]  [<ffffffff81613e63>] ? cache_flusharray+0x47/0xd6
Nov 17 17:34:49 fix kernel: [   30.751015]  [<ffffffff81626999>] rt_spin_lock+0x39/0x40
Nov 17 17:34:49 fix kernel: [   30.751015]  [<ffffffff81613e63>] ? cache_flusharray+0x47/0xd6
Nov 17 17:34:49 fix kernel: [   30.751015]  [<ffffffff8105a90b>] ? migrate_disable+0x6b/0xe0
Nov 17 17:34:49 fix kernel: [   30.751015]  [<ffffffff81613e63>] cache_flusharray+0x47/0xd6
Nov 17 17:34:49 fix kernel: [   30.751015]  [<ffffffff81167a41>] kmem_cache_free+0x221/0x300
Nov 17 17:34:49 fix kernel: [   30.751015]  [<ffffffff81167b8f>] slab_destroy+0x6f/0xa0
Nov 17 17:34:49 fix kernel: [   30.751015]  [<ffffffff81167d32>] free_block+0x172/0x190
Nov 17 17:34:49 fix kernel: [   30.751015]  [<ffffffff81613eb4>] cache_flusharray+0x98/0xd6
Nov 17 17:34:49 fix kernel: [   30.751015]  [<ffffffff814f1110>] ? __sk_free+0x130/0x160
Nov 17 17:34:49 fix kernel: [   30.751015]  [<ffffffff814f1110>] ? __sk_free+0x130/0x160
Nov 17 17:34:49 fix kernel: [   30.751015]  [<ffffffff8116a806>] kfree+0x316/0x380
Nov 17 17:34:49 fix kernel: [   30.751015]  [<ffffffff814f5328>] ? skb_queue_purge+0x28/0x40
Nov 17 17:34:49 fix kernel: [   30.751015]  [<ffffffff814f1110>] __sk_free+0x130/0x160
Nov 17 17:34:49 fix kernel: [   30.751015]  [<ffffffff814f11d5>] sk_free+0x25/0x30
Nov 17 17:34:49 fix kernel: [   30.751015]  [<ffffffff8152d908>] netlink_release+0x128/0x200
Nov 17 17:34:49 fix kernel: [   30.751015]  [<ffffffff814ea388>] sock_release+0x28/0x90
Nov 17 17:34:49 fix kernel: [   30.751015]  [<ffffffff814eaa57>] sock_close+0x17/0x30
Nov 17 17:34:49 fix kernel: [   30.751015]  [<ffffffff8117b914>] __fput+0xb4/0x200
Nov 17 17:34:49 fix kernel: [   30.751015]  [<ffffffff8117ba85>] fput+0x25/0x30
Nov 17 17:34:49 fix kernel: [   30.751015]  [<ffffffff81177d0c>] filp_close+0x6c/0x90
Nov 17 17:34:49 fix kernel: [   30.751015]  [<ffffffff81177df0>] sys_close+0xc0/0x130
Nov 17 17:34:49 fix kernel: [   30.751015]  [<ffffffff8162ed02>] system_call_fastpath+0x16/0x1b

[-- Attachment #2: schauss.vcf --]
[-- Type: text/x-vcard, Size: 342 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread
* Re: 3.2-rc1 and nvidia drivers
  2011-11-28 10:08 ` Thomas Schauss
@ 2011-11-28 11:31   ` John Kacur
  2011-11-29 14:31     ` John Kacur
  0 siblings, 1 reply; 24+ messages in thread
From: John Kacur @ 2011-11-28 11:31 UTC (permalink / raw)
To: Thomas Schauss; +Cc: Thomas Gleixner, RT

On Mon, Nov 28, 2011 at 11:08 AM, Thomas Schauss <schauss@tum.de> wrote:
> On 11/16/2011 04:06 PM, Thomas Gleixner wrote:
>>
>> On Wed, 16 Nov 2011, Thomas Schauss wrote:
>>>
>>> Unfortunately, with 3.0-rt and the nvidia-driver we get complete system
>>> freezes when starting X on several different hardware setups (a few
>>> systems work fine). This is certainly caused by this combination. When
>>> using the nouveau-driver everything works fine.
>>
>> Have you ever tried to run with CONFIG_PROVE_LOCKING=y ?
>>
>
> Hello,
>
> thank you for that tip. I have tried this now and have not found any
> warnings which seem related to the nvidia-driver. Further testing revealed,
> that the driver works fine with CONFIG_PREEMPT_RTB and the freezes when
> running startx occur as soon as we switch to CONFIG_PREEMPT_RT_FULL.
>
> Regarding lockdep, we do get some warnings in slab.c -> cache_flusharray
> that however seem unrelated to nvidia. As we could not find any other bugs
> with the same locking warning I attached one example below. You can find
> some complete bootlogs (all with deadlock-warnings, all with slightly
> different call-stack) and my kernel-config at
>
> http://www.lsr.ei.tum.de/team/schauss/lockdep/
>
> On rt-base I also get a lockdep-warning which however seems unrelated to
> the rt-full one (not in cache_flusharray). You can find that log on the
> same page.
> Best Regards,
> Thomas
>
> [lockdep trace snipped; quoted in full in the parent message]

Hmm, I think I see how this can happen.

cache_flusharray()
	spin_lock(&l3->list_lock);
	free_block(cachep, ac->entry, batchcount, node);
		slab_destroy()
			kmem_cache_free()
				__cache_free()
					cache_flusharray()

--
To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 24+ messages in thread
* Re: 3.2-rc1 and nvidia drivers
  2011-11-28 11:31 ` John Kacur
@ 2011-11-29 14:31   ` John Kacur
  2011-11-30  2:36     ` Steven Rostedt
  2011-11-30  9:06     ` Thomas Schauss
  0 siblings, 2 replies; 24+ messages in thread
From: John Kacur @ 2011-11-29 14:31 UTC (permalink / raw)
To: Thomas Schauss; +Cc: Thomas Gleixner, RT

[-- Attachment #1: Type: TEXT/PLAIN, Size: 8514 bytes --]

On Mon, 28 Nov 2011, John Kacur wrote:

> On Mon, Nov 28, 2011 at 11:08 AM, Thomas Schauss <schauss@tum.de> wrote:
> > On 11/16/2011 04:06 PM, Thomas Gleixner wrote:
> >> On Wed, 16 Nov 2011, Thomas Schauss wrote:
> >>> Unfortunately, with 3.0-rt and the nvidia-driver we get complete system
> >>> freezes when starting X on several different hardware setups (a few
> >>> systems work fine). This is certainly caused by this combination. When
> >>> using the nouveau-driver everything works fine.
> >>
> >> Have you ever tried to run with CONFIG_PROVE_LOCKING=y ?
> >
> > Hello,
> >
> > thank you for that tip. I have tried this now and have not found any
> > warnings which seem related to the nvidia-driver. Further testing
> > revealed, that the driver works fine with CONFIG_PREEMPT_RTB and the
> > freezes when running startx occur as soon as we switch to
> > CONFIG_PREEMPT_RT_FULL.
> >
> > Regarding lockdep, we do get some warnings in slab.c -> cache_flusharray
> > that however seem unrelated to nvidia. As we could not find any other
> > bugs with the same locking warning I attached one example below. You can
> > find some complete bootlogs (all with deadlock-warnings, all with
> > slightly different call-stack) and my kernel-config at
> >
> > http://www.lsr.ei.tum.de/team/schauss/lockdep/
> >
> > On rt-base I also get a lockdep-warning which however seems unrelated to
> > the rt-full one (not in cache_flusharray). You can find that log on the
> > same page.
> > Best Regards,
> > Thomas
> >
> > [lockdep trace snipped; quoted in full earlier in the thread]
>
> Hmm, I think I see how this can happen.
>
> cache_flusharray()
> 	spin_lock(&l3->list_lock);
> 	free_block(cachep, ac->entry, batchcount, node);
> 	slab_destroy()
> 	kmem_cache_free()
> 	__cache_free()
> 	cache_flusharray()
>

Could you try the following patch to see if it gets rid of your lockdep
splat? (plan to neaten it up and send it to lkml if it works for you.)

>From 29bf37fc62098bc87960e78f365083d9f52cf36a Mon Sep 17 00:00:00 2001
From: John Kacur <jkacur@redhat.com>
Date: Tue, 29 Nov 2011 15:17:54 +0100
Subject: [PATCH] Drop lock in free_block before calling slab_destroy to
 prevent lockdep splats

This prevents lockdep splats due to this call chain

cache_flusharray()
	spin_lock(&l3->list_lock);
	free_block(cachep, ac->entry, batchcount, node);
	slab_destroy()
	kmem_cache_free()
	__cache_free()
	cache_flusharray()

Signed-off-by: John Kacur <jkacur@redhat.com>
---
 mm/slab.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index b615658..635e16a 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3667,7 +3667,9 @@ static void free_block(struct kmem_cache *cachep, void **objpp, int nr_objects,
 			 * a different cache, refer to comments before
 			 * alloc_slabmgmt.
 			 */
+			spin_unlock(&l3->list_lock);
 			slab_destroy(cachep, slabp, true);
+			spin_lock(&l3->list_lock);
 		} else {
 			list_add(&slabp->list, &l3->slabs_free);
 		}
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 24+ messages in thread
* Re: 3.2-rc1 and nvidia drivers
  2011-11-29 14:31 ` John Kacur
@ 2011-11-30  2:36   ` Steven Rostedt
  2011-11-30  8:23     ` John Kacur
  1 sibling, 1 reply; 24+ messages in thread
From: Steven Rostedt @ 2011-11-30  2:36 UTC (permalink / raw)
To: John Kacur; +Cc: Thomas Schauss, Thomas Gleixner, RT, Peter Zijlstra

On Tue, 2011-11-29 at 15:31 +0100, John Kacur wrote:
> On Mon, 28 Nov 2011, John Kacur wrote:
> > Hmm, I think I see how this can happen.
> >
> > cache_flusharray()
> > 	spin_lock(&l3->list_lock);
> > 	free_block(cachep, ac->entry, batchcount, node);
> > 	slab_destroy()
> > 	kmem_cache_free()
> > 	__cache_free()
> > 	cache_flusharray()
>
> Could you try the following patch to see if it gets rid of your lockdep
> splat? (plan to neaten it up and send it to lkml if it works for you.)
>
> [patch snipped]

John,

No, this is a false positive, and the code is correct; lockdep just needs to
be tweaked. If this were a real bug, it would have locked up here and not
continued, as spinlocks are not recursive.

This was complained about in mainline too:

https://lkml.org/lkml/2011/10/3/364

There was a fix to a similar bug that Peter pointed out, but this bug doesn't
look like it was fixed.

Peter?
-- Steve > > Signed-off-by: John Kacur <jkacur@redhat.com> > --- > mm/slab.c | 2 ++ > 1 files changed, 2 insertions(+), 0 deletions(-) > > diff --git a/mm/slab.c b/mm/slab.c > index b615658..635e16a 100644 > --- a/mm/slab.c > +++ b/mm/slab.c > @@ -3667,7 +3667,9 @@ static void free_block(struct kmem_cache *cachep, void **objpp, int nr_objects, > * a different cache, refer to comments before > * alloc_slabmgmt. > */ > + spin_unlock(&l3->list_lock); > slab_destroy(cachep, slabp, true); > + spin_lock(&l3->list_lock); > } else { > list_add(&slabp->list, &l3->slabs_free); > } ^ permalink raw reply [flat|nested] 24+ messages in thread
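[Editor's note: Steven's point is that lockdep tracks lock *classes*, not lock instances, so re-acquiring a different lock of the same class looks identical to a genuine self-deadlock in its bookkeeping. The sketch below is a hedged toy model of that idea in plain userspace C — the names (`toy_lock`, `acquire`, `demo_false_positive`) are invented for illustration and are not kernel APIs.]

```c
#include <assert.h>
#include <string.h>

/* Toy model of lockdep's class-based checking: each lock carries a
 * class name, and a (single-threaded) checker records which classes
 * are currently held. Hypothetical code, for illustration only. */
#define MAX_HELD 8

struct toy_lock { const char *class_name; };

static const char *held[MAX_HELD];
static int nheld;

/* Flags the acquisition if a lock of the same CLASS is already held --
 * this is the "possible recursive locking" report, raised even when
 * the two locks are different instances (e.g. the l3 list_lock of two
 * different kmem caches), which is exactly the false-positive case. */
static int acquire(struct toy_lock *l)
{
    for (int i = 0; i < nheld; i++)
        if (strcmp(held[i], l->class_name) == 0)
            return 1;               /* splat: same class re-acquired */
    held[nheld++] = l->class_name;
    return 0;
}

static void release(struct toy_lock *l)
{
    for (int i = 0; i < nheld; i++)
        if (strcmp(held[i], l->class_name) == 0) {
            held[i] = held[--nheld];
            return;
        }
}

/* Two distinct locks sharing one class: safe at runtime (different
 * spinlocks), but the class-based checker cannot tell them apart. */
int demo_false_positive(void)
{
    struct toy_lock cache_a = { "l3->list_lock" };
    struct toy_lock cache_b = { "l3->list_lock" };

    assert(acquire(&cache_a) == 0);
    int splat = acquire(&cache_b);  /* flagged despite being safe */
    release(&cache_a);
    nheld = 0;
    return splat;
}
```

In the real kernel the usual remedies are either restructuring the code (as John's patch does) or teaching lockdep about the nesting with annotations such as `spin_lock_nested()` / distinct lock keys, which is what the later patches in this thread do.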
* Re: 3.2-rc1 and nvidia drivers 2011-11-30 2:36 ` Steven Rostedt @ 2011-11-30 8:23 ` John Kacur 2011-11-30 11:14 ` Peter Zijlstra 2011-11-30 13:34 ` Steven Rostedt 0 siblings, 2 replies; 24+ messages in thread From: John Kacur @ 2011-11-30 8:23 UTC (permalink / raw) To: Steven Rostedt; +Cc: Thomas Schauss, Thomas Gleixner, RT, Peter Zijlstra On Wed, Nov 30, 2011 at 3:36 AM, Steven Rostedt <rostedt@goodmis.org> wrote: > On Tue, 2011-11-29 at 15:31 +0100, John Kacur wrote: >> On Mon, 28 Nov 2011, John Kacur wrote: > >> > Hmm, I think I see how this can happen. >> > >> > cache_flusharray() >> > spin_lock(&l3->list_lock); >> > free_block(cachep, ac->entry, batchcount, node); >> > slab_destroy() >> > kmem_cache_free() >> > __cache_free() >> > cache_flusharray() >> > >> >> Could you try the following patch to see if it gets rid of your lockdep >> splat? (plan to neaten it up and send it to lkml if it works for you.) >> >> >From 29bf37fc62098bc87960e78f365083d9f52cf36a Mon Sep 17 00:00:00 2001 >> From: John Kacur <jkacur@redhat.com> >> Date: Tue, 29 Nov 2011 15:17:54 +0100 >> Subject: [PATCH] Drop lock in free_block before calling slab_destroy to prevent lockdep splats >> >> This prevents lockdep splats due to this call chain >> cache_flusharray() >> spin_lock(&l3->list_lock); >> free_block(cachep, ac->entry, batchcount, node); >> slab_destroy() >> kmem_cache_free() >> __cache_free() >> cache_flusharray() > > John, > > No, this is a false positive, and the code is correct, lockdep just > needs to be tweaked. If this was a real bug, then it would have locked > up here and not have continued, as spinlocks are not recursive. > > This was complained about in mainline too: > > https://lkml.org/lkml/2011/10/3/364 > > There was a fix to a similar bug that Peter pointed out, but this bug > doesn't look like it was fixed. > > Peter? > Steve - I'm aware that this is a false positive, I discussed this with Peter already. 
Normally I don't like the idea of changing code for a tool, but if you
see the comment that they wrote above where I put the unlock - it was an
extraordinary thing NOT to drop the lock here, as slab_destroy is
normally called without it. It doesn't seem like good form to me to hold
a lock longer than you need it, and it is a simple solution to getting
rid of the lockdep splat. (false positive, or false negative, depending
on how you see it.) That being said, I'm not averse to another solution
either, but this one should work and is simple.

John
--
To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 24+ messages in thread
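[Editor's note: the shape of John's fix — drop a lock before calling a helper that may itself take a lock of the same class, retake it afterwards — is a common pattern. A minimal userspace sketch with POSIX threads; the names `flush_list`/`destroy_one` are invented stand-ins for `cache_flusharray()`/`slab_destroy()`, not the actual slab code. Note that in this sketch both functions use the *same* mutex, so holding it across the call really would deadlock (default POSIX mutexes, like kernel spinlocks, are not recursive); in the kernel case the re-acquired lock is a different instance, which is why the report was a false positive.]

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
static int destroyed;

/* Stands in for slab_destroy(): normally called without list_lock
 * held, and may end up taking a list_lock of the same class via
 * kmem_cache_free() -> cache_flusharray(). */
static void destroy_one(void)
{
    pthread_mutex_lock(&list_lock);
    destroyed++;
    pthread_mutex_unlock(&list_lock);
}

/* Stands in for free_block(): walks a list under list_lock but drops
 * the lock around each destroy_one() call, as the patch does. */
static void flush_list(int n)
{
    pthread_mutex_lock(&list_lock);
    for (int i = 0; i < n; i++) {
        /* Holding list_lock across destroy_one() would self-deadlock
         * in this sketch, since the mutex is not recursive. */
        pthread_mutex_unlock(&list_lock);
        destroy_one();
        pthread_mutex_lock(&list_lock);
    }
    pthread_mutex_unlock(&list_lock);
}
```

The trade-off John mentions is visible here: dropping the lock mid-walk means the list state must be re-validated after each retake, which is why such changes usually need review in mainline rather than in -rt alone.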
* Re: 3.2-rc1 and nvidia drivers 2011-11-30 8:23 ` John Kacur @ 2011-11-30 11:14 ` Peter Zijlstra 2011-11-30 14:14 ` Steven Rostedt 2011-11-30 13:34 ` Steven Rostedt 1 sibling, 1 reply; 24+ messages in thread From: Peter Zijlstra @ 2011-11-30 11:14 UTC (permalink / raw) To: John Kacur; +Cc: Steven Rostedt, Thomas Schauss, Thomas Gleixner, RT On Wed, 2011-11-30 at 09:23 +0100, John Kacur wrote: > > This was complained about in mainline too: > > > > https://lkml.org/lkml/2011/10/3/364 > > > > There was a fix to a similar bug that Peter pointed out, but this bug > > doesn't look like it was fixed. > > > > Peter? Re to the subject, every borkage of the nvidiot binary driver is a personal victory, I try as hard as possible to increase their pain. As to the actual subject of the email, see: http://article.gmane.org/gmane.linux.kernel.mm/70863/match= ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: 3.2-rc1 and nvidia drivers
  2011-11-30 11:14 ` Peter Zijlstra
@ 2011-11-30 14:14 ` Steven Rostedt
  2011-11-30 14:16 ` Peter Zijlstra
  0 siblings, 1 reply; 24+ messages in thread
From: Steven Rostedt @ 2011-11-30 14:14 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: John Kacur, Thomas Schauss, Thomas Gleixner, RT

On Wed, 2011-11-30 at 12:14 +0100, Peter Zijlstra wrote:
> On Wed, 2011-11-30 at 09:23 +0100, John Kacur wrote:
> > > This was complained about in mainline too:
> > >
> > > https://lkml.org/lkml/2011/10/3/364
> > >
> > > There was a fix to a similar bug that Peter pointed out, but this bug
> > > doesn't look like it was fixed.
> > >
> > > Peter?
>
> Re to the subject, every borkage of the nvidiot binary driver is a
> personal victory, I try as hard as possible to increase their pain.

Well, this bug is not caused by nvidiot, but it prevents us from seeing
if there's locking issues in nvidiot. Because Thomas tripped over this
bug, lockdep shutdown before it could analyze anything further down,
including nvidiot too.

But then again, maybe the bug Thomas is seeing is in mainline, and
nvidiot is helping us find bugs :)

> As to the actual subject of the email, see:
>
> http://article.gmane.org/gmane.linux.kernel.mm/70863/match=

Thomas (Schauss),

Could you try this patch? I took Peter's patch and ported it to 3.0-rt.
Hopefully, I didn't screw it up.

-- Steve

diff --git a/mm/slab.c b/mm/slab.c
index 096bf0a..966a8c4 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -764,6 +764,7 @@ static enum {
 	PARTIAL_AC,
 	PARTIAL_L3,
 	EARLY,
+	LATE,
 	FULL
 } g_cpucache_up;

@@ -795,7 +796,7 @@ static void init_node_lock_keys(int q)
 {
 	struct cache_sizes *s = malloc_sizes;

-	if (g_cpucache_up != FULL)
+	if (g_cpucache_up < LATE)
 		return;

 	for (s = malloc_sizes; s->cs_size != ULONG_MAX; s++) {
@@ -1752,7 +1753,7 @@ void __init kmem_cache_init_late(void)
 	mutex_unlock(&cache_chain_mutex);

 	/* Done! */
-	g_cpucache_up = FULL;
+	g_cpucache_up = LATE;

 	/* Annotate slab for lockdep -- annotate the malloc caches */
 	init_lock_keys();

^ permalink raw reply related	[flat|nested] 24+ messages in thread
* Re: 3.2-rc1 and nvidia drivers 2011-11-30 14:14 ` Steven Rostedt @ 2011-11-30 14:16 ` Peter Zijlstra 2011-11-30 14:28 ` Steven Rostedt 2011-11-30 14:31 ` Steven Rostedt 0 siblings, 2 replies; 24+ messages in thread From: Peter Zijlstra @ 2011-11-30 14:16 UTC (permalink / raw) To: Steven Rostedt; +Cc: John Kacur, Thomas Schauss, Thomas Gleixner, RT > I took Peter's patch and ported it to 3.0-rt. Hopefully, I didn't screw > it up. > > -- Steve > > diff --git a/mm/slab.c b/mm/slab.c > index 096bf0a..966a8c4 100644 > --- a/mm/slab.c > +++ b/mm/slab.c > @@ -764,6 +764,7 @@ static enum { > PARTIAL_AC, > PARTIAL_L3, > EARLY, > + LATE, > FULL > } g_cpucache_up; > > @@ -795,7 +796,7 @@ static void init_node_lock_keys(int q) > { > struct cache_sizes *s = malloc_sizes; > > - if (g_cpucache_up != FULL) > + if (g_cpucache_up < LATE) > return; > > for (s = malloc_sizes; s->cs_size != ULONG_MAX; s++) { > @@ -1752,7 +1753,7 @@ void __init kmem_cache_init_late(void) > mutex_unlock(&cache_chain_mutex); > > /* Done! */ > - g_cpucache_up = FULL; > + g_cpucache_up = LATE; > > /* Annotate slab for lockdep -- annotate the malloc caches */ > init_lock_keys(); > > You did screw it up.. now g_cpucache_up will never be FULL. It works much better if you also apply 30765b92 to this tree. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: 3.2-rc1 and nvidia drivers 2011-11-30 14:16 ` Peter Zijlstra @ 2011-11-30 14:28 ` Steven Rostedt 2011-11-30 14:31 ` Steven Rostedt 1 sibling, 0 replies; 24+ messages in thread From: Steven Rostedt @ 2011-11-30 14:28 UTC (permalink / raw) To: Peter Zijlstra; +Cc: John Kacur, Thomas Schauss, Thomas Gleixner, RT On Wed, 2011-11-30 at 15:16 +0100, Peter Zijlstra wrote: > You did screw it up.. now g_cpucache_up will never be FULL. It works > much better if you also apply 30765b92 to this tree. I didn't like that fix I made. Thanks. Strange, I thought I tried applying that patch before and it didn't take. -- Steve ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: 3.2-rc1 and nvidia drivers
  2011-11-30 14:16 ` Peter Zijlstra
  2011-11-30 14:28 ` Steven Rostedt
@ 2011-11-30 14:31 ` Steven Rostedt
  2011-11-30 14:34 ` Peter Zijlstra
  2011-11-30 15:07 ` Thomas Schauss
  1 sibling, 2 replies; 24+ messages in thread
From: Steven Rostedt @ 2011-11-30 14:31 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: John Kacur, Thomas Schauss, Thomas Gleixner, RT

Thomas (Schauss),

Can you try this patch. It has both patches that Peter pointed to
applied.

Thanks,

-- Steve

diff --git a/mm/slab.c b/mm/slab.c
index 096bf0a..86a8dec 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -764,6 +764,7 @@ static enum {
 	PARTIAL_AC,
 	PARTIAL_L3,
 	EARLY,
+	LATE,
 	FULL
 } g_cpucache_up;

@@ -795,7 +796,7 @@ static void init_node_lock_keys(int q)
 {
 	struct cache_sizes *s = malloc_sizes;

-	if (g_cpucache_up != FULL)
+	if (g_cpucache_up < LATE)
 		return;

 	for (s = malloc_sizes; s->cs_size != ULONG_MAX; s++) {
@@ -1744,6 +1745,11 @@ void __init kmem_cache_init_late(void)
 {
 	struct kmem_cache *cachep;

+	g_cpucache_up = LATE;
+
+	/* Annotate slab for lockdep -- annotate the malloc caches */
+	init_lock_keys();
+
 	/* 6) resize the head arrays to their final sizes */
 	mutex_lock(&cache_chain_mutex);
 	list_for_each_entry(cachep, &cache_chain, next)
@@ -1754,9 +1760,6 @@ void __init kmem_cache_init_late(void)
 	/* Done! */
 	g_cpucache_up = FULL;

-	/* Annotate slab for lockdep -- annotate the malloc caches */
-	init_lock_keys();

^ permalink raw reply related	[flat|nested] 24+ messages in thread
* Re: 3.2-rc1 and nvidia drivers 2011-11-30 14:31 ` Steven Rostedt @ 2011-11-30 14:34 ` Peter Zijlstra 2011-11-30 15:07 ` Thomas Schauss 1 sibling, 0 replies; 24+ messages in thread From: Peter Zijlstra @ 2011-11-30 14:34 UTC (permalink / raw) To: Steven Rostedt; +Cc: John Kacur, Thomas Schauss, Thomas Gleixner, RT On Wed, 2011-11-30 at 09:31 -0500, Steven Rostedt wrote: > Can you try this patch. It has both patches that Peter pointed to > applied. Yep, that looks about right. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: 3.2-rc1 and nvidia drivers
  2011-11-30 14:31 ` Steven Rostedt
  2011-11-30 14:34 ` Peter Zijlstra
@ 2011-11-30 15:07 ` Thomas Schauss
  2011-11-30 15:20 ` Steven Rostedt
  1 sibling, 1 reply; 24+ messages in thread
From: Thomas Schauss @ 2011-11-30 15:07 UTC (permalink / raw)
To: Steven Rostedt; +Cc: Peter Zijlstra, John Kacur, Thomas Gleixner, RT

On 11/30/2011 03:31 PM, Steven Rostedt wrote:
> Thomas (Schauss),
>
> Can you try this patch. It has both patches that Peter pointed to
> applied.
>
> Thanks,
>
> -- Steve
>
> diff --git a/mm/slab.c b/mm/slab.c
> index 096bf0a..86a8dec 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -764,6 +764,7 @@ static enum {
>  	PARTIAL_AC,
>  	PARTIAL_L3,
>  	EARLY,
> +	LATE,
>  	FULL
>  } g_cpucache_up;
>
> @@ -795,7 +796,7 @@ static void init_node_lock_keys(int q)
>  {
>  	struct cache_sizes *s = malloc_sizes;
>
> -	if (g_cpucache_up != FULL)
> +	if (g_cpucache_up < LATE)
>  		return;
>
>  	for (s = malloc_sizes; s->cs_size != ULONG_MAX; s++) {
> @@ -1744,6 +1745,11 @@ void __init kmem_cache_init_late(void)
>  {
>  	struct kmem_cache *cachep;
>
> +	g_cpucache_up = LATE;
> +
> +	/* Annotate slab for lockdep -- annotate the malloc caches */
> +	init_lock_keys();
> +
>  	/* 6) resize the head arrays to their final sizes */
>  	mutex_lock(&cache_chain_mutex);
>  	list_for_each_entry(cachep, &cache_chain, next)
> @@ -1754,9 +1760,6 @@ void __init kmem_cache_init_late(void)
>  	/* Done! */
>  	g_cpucache_up = FULL;
>
> -	/* Annotate slab for lockdep -- annotate the malloc caches */
> -	init_lock_keys();
> -
>  	/*
>  	 * Register a cpu startup notifier callback that initializes
>  	 * cpu_cache_get for all new cpus

Hello,

I will test this but can only do so on Friday.

On 3.0.9-rt25 (which I have been using here) patch 30765b92 is already
applied. And it is also present in patch-3.0.10-rt27.patch.

So the original patch from Peter
(http://article.gmane.org/gmane.linux.kernel.mm/70863/match=) applies
cleanly here.

Regards,
Thomas

^ permalink raw reply	[flat|nested] 24+ messages in thread
* Re: 3.2-rc1 and nvidia drivers 2011-11-30 15:07 ` Thomas Schauss @ 2011-11-30 15:20 ` Steven Rostedt 2011-12-02 17:41 ` Thomas Schauss 0 siblings, 1 reply; 24+ messages in thread From: Steven Rostedt @ 2011-11-30 15:20 UTC (permalink / raw) To: Thomas Schauss; +Cc: Peter Zijlstra, John Kacur, Thomas Gleixner, RT On Wed, 2011-11-30 at 16:07 +0100, Thomas Schauss wrote: > On 3.0.9-rt25 (which I have been using here) patch 30765b92 is already > applied. And it is also present in patch-3.0.10-rt27.patch. I'm such a fscking idiot! I've been working on a port of a patch in 2.6.33-rt, and I didn't realize that I was still in that kernel, applying these patches. As I said, I thought the one Peter had was already applied. It was, I'm the idiot that didn't realize I was in the wrong kernel! > > So the original patch from Peter > (http://article.gmane.org/gmane.linux.kernel.mm/70863/match=) applies > cleanly here. Yeah, thanks. That's the patch you want. -- Steve ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: 3.2-rc1 and nvidia drivers 2011-11-30 15:20 ` Steven Rostedt @ 2011-12-02 17:41 ` Thomas Schauss 2011-12-02 19:37 ` Steven Rostedt 0 siblings, 1 reply; 24+ messages in thread From: Thomas Schauss @ 2011-12-02 17:41 UTC (permalink / raw) To: Steven Rostedt; +Cc: Peter Zijlstra, John Kacur, Thomas Gleixner, RT On 11/30/2011 04:20 PM, Steven Rostedt wrote: > On Wed, 2011-11-30 at 16:07 +0100, Thomas Schauss wrote: > >> On 3.0.9-rt25 (which I have been using here) patch 30765b92 is already >> applied. And it is also present in patch-3.0.10-rt27.patch. > > I'm such a fscking idiot! > > I've been working on a port of a patch in 2.6.33-rt, and I didn't > realize that I was still in that kernel, applying these patches. As I > said, I thought the one Peter had was already applied. It was, I'm the > idiot that didn't realize I was in the wrong kernel! > >> >> So the original patch from Peter >> (http://article.gmane.org/gmane.linux.kernel.mm/70863/match=) applies >> cleanly here. > > Yeah, thanks. That's the patch you want. > > -- Steve > > > -- > To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html Hello, this patch does indeed get rid of the lockdep-splat. Regarding the original subject of the post: Running startx with the nvidia binary driver fails on 3.0.9-rt25 and 3.2-rc2-rt3 when CONFIG_PREEMPT_RT_FULL=y and works fine for CONFIG_PREEMPT_RTB=y. There is no lockdep-warning, kernel oops/bug, etc., neither in any log-files nor on the serial console. This happens on several machines which ran fine with 2.6.33-rt29. I know many people here are no big fans of the nvidia driver (and rightly so). Unfortunately we really need this. If anyone has any further ideas on debugging this issue I would be really thankful. Best Regards, Thomas ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: 3.2-rc1 and nvidia drivers 2011-12-02 17:41 ` Thomas Schauss @ 2011-12-02 19:37 ` Steven Rostedt 0 siblings, 0 replies; 24+ messages in thread From: Steven Rostedt @ 2011-12-02 19:37 UTC (permalink / raw) To: Thomas Schauss; +Cc: Peter Zijlstra, John Kacur, Thomas Gleixner, RT On Fri, 2011-12-02 at 18:41 +0100, Thomas Schauss wrote: > this patch does indeed get rid of the lockdep-splat. > > Regarding the original subject of the post: > > Running startx with the nvidia binary driver fails on 3.0.9-rt25 and > 3.2-rc2-rt3 when CONFIG_PREEMPT_RT_FULL=y and works fine for > CONFIG_PREEMPT_RTB=y. There is no lockdep-warning, kernel oops/bug, > etc., neither in any log-files nor on the serial console. > > This happens on several machines which ran fine with 2.6.33-rt29. > > I know many people here are no big fans of the nvidia driver (and > rightly so). Unfortunately we really need this. The biggest problem with nvidia is that it's a black box for us. We have no idea what's happening behind the scenes. Once the nvidia drive takes over, we're in the dark and can't do much about it. If you depend on nvidia so much, perhaps you could contact them. Their engineers may be able to help you here. > > If anyone has any further ideas on debugging this issue I would be > really thankful. Perhaps kick off nmi_watchdog and hope that it produces something? -- Steve ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: 3.2-rc1 and nvidia drivers 2011-11-30 8:23 ` John Kacur 2011-11-30 11:14 ` Peter Zijlstra @ 2011-11-30 13:34 ` Steven Rostedt 2011-11-30 13:39 ` John Kacur 1 sibling, 1 reply; 24+ messages in thread From: Steven Rostedt @ 2011-11-30 13:34 UTC (permalink / raw) To: John Kacur; +Cc: Thomas Schauss, Thomas Gleixner, RT, Peter Zijlstra On Wed, 2011-11-30 at 09:23 +0100, John Kacur wrote: > Steve - I'm aware that this is a false positive, I discussed this with > Peter already. Normally I don't like the idea of changing code for a > tool, but if you see the comment that they wrote above where I put the > unlock - it was an extraordinary thing NOT to drop the lock here, as > slab_destroy is normally called without it. It doesn't seem like good > form to me to hold a lock longer than you need it, and it is a simple > solution to getting rid of the lockdep splat. (false positive, or > false negative, depending on how you see it.) That being said, I'm not > adverse to another solution either, but this one should work and is > simple. This is a mainline issue, and it should go there. If mainline accepts it, then fine. Otherwise, it's not going to go into -rt. -- Steve ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: 3.2-rc1 and nvidia drivers 2011-11-30 13:34 ` Steven Rostedt @ 2011-11-30 13:39 ` John Kacur 2011-11-30 13:49 ` Steven Rostedt 0 siblings, 1 reply; 24+ messages in thread From: John Kacur @ 2011-11-30 13:39 UTC (permalink / raw) To: Steven Rostedt; +Cc: Thomas Schauss, Thomas Gleixner, RT, Peter Zijlstra On Wed, Nov 30, 2011 at 2:34 PM, Steven Rostedt <rostedt@goodmis.org> wrote: > On Wed, 2011-11-30 at 09:23 +0100, John Kacur wrote: > >> Steve - I'm aware that this is a false positive, I discussed this with >> Peter already. Normally I don't like the idea of changing code for a >> tool, but if you see the comment that they wrote above where I put the >> unlock - it was an extraordinary thing NOT to drop the lock here, as >> slab_destroy is normally called without it. It doesn't seem like good >> form to me to hold a lock longer than you need it, and it is a simple >> solution to getting rid of the lockdep splat. (false positive, or >> false negative, depending on how you see it.) That being said, I'm not >> adverse to another solution either, but this one should work and is >> simple. > > This is a mainline issue, and it should go there. If mainline accepts > it, then fine. Otherwise, it's not going to go into -rt. > Quoting myself "Could you try the following patch to see if it gets rid of your lockdep splat? (plan to neaten it up and send it to lkml if it works for you.)" I never requested this go into -rt directly. John ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: 3.2-rc1 and nvidia drivers 2011-11-30 13:39 ` John Kacur @ 2011-11-30 13:49 ` Steven Rostedt 2011-11-30 13:53 ` John Kacur 0 siblings, 1 reply; 24+ messages in thread From: Steven Rostedt @ 2011-11-30 13:49 UTC (permalink / raw) To: John Kacur; +Cc: Thomas Schauss, Thomas Gleixner, RT, Peter Zijlstra On Wed, 2011-11-30 at 14:39 +0100, John Kacur wrote: > Quoting myself "Could you try the following patch to see if it gets > rid of your lockdep > splat? (plan to neaten it up and send it to lkml if it works for you.)" > > I never requested this go into -rt directly. Heh, OK, but it looks like there's already a fix to it going to mainline, as Peter pointed out. -- Steve ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: 3.2-rc1 and nvidia drivers 2011-11-30 13:49 ` Steven Rostedt @ 2011-11-30 13:53 ` John Kacur 0 siblings, 0 replies; 24+ messages in thread From: John Kacur @ 2011-11-30 13:53 UTC (permalink / raw) To: Steven Rostedt; +Cc: Thomas Schauss, Thomas Gleixner, RT, Peter Zijlstra On Wed, Nov 30, 2011 at 2:49 PM, Steven Rostedt <rostedt@goodmis.org> wrote: > On Wed, 2011-11-30 at 14:39 +0100, John Kacur wrote: > >> Quoting myself "Could you try the following patch to see if it gets >> rid of your lockdep >> splat? (plan to neaten it up and send it to lkml if it works for you.)" >> >> I never requested this go into -rt directly. > > > Heh, OK, but it looks like there's already a fix to it going to > mainline, as Peter pointed out. > Yeah cool. I'm checking those patches out too. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: 3.2-rc1 and nvidia drivers
  2011-11-29 14:31 ` John Kacur
  2011-11-30  2:36 ` Steven Rostedt
@ 2011-11-30  9:06 ` Thomas Schauss
  1 sibling, 0 replies; 24+ messages in thread
From: Thomas Schauss @ 2011-11-30 9:06 UTC (permalink / raw)
To: John Kacur; +Cc: RT

On 11/29/2011 03:31 PM, John Kacur wrote:
>
> On Mon, 28 Nov 2011, John Kacur wrote:
>
> Could you try the following patch to see if it gets rid of your lockdep
> splat? (plan to neaten it up and send it to lkml if it works for you.)
>
> From 29bf37fc62098bc87960e78f365083d9f52cf36a Mon Sep 17 00:00:00 2001
> From: John Kacur <jkacur@redhat.com>
> Date: Tue, 29 Nov 2011 15:17:54 +0100
> Subject: [PATCH] Drop lock in free_block before calling slab_destroy to prevent lockdep splats
>
> This prevents lockdep splats due to this call chain
> cache_flusharray()
> spin_lock(&l3->list_lock);
> free_block(cachep, ac->entry, batchcount, node);
> slab_destroy()
> kmem_cache_free()
> __cache_free()
> cache_flusharray()
>
> Signed-off-by: John Kacur <jkacur@redhat.com>
> ---
>  mm/slab.c | 2 ++
>  1 files changed, 2 insertions(+), 0 deletions(-)
>
> diff --git a/mm/slab.c b/mm/slab.c
> index b615658..635e16a 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -3667,7 +3667,9 @@ static void free_block(struct kmem_cache *cachep, void **objpp, int nr_objects,
>  		 * a different cache, refer to comments before
>  		 * alloc_slabmgmt.
>  		 */
> +		spin_unlock(&l3->list_lock);
>  		slab_destroy(cachep, slabp, true);
> +		spin_lock(&l3->list_lock);
>  	} else {
>  		list_add(&slabp->list, &l3->slabs_free);
>  	}

Yes, that seems like the path that causes the warning. I can test this
on friday if no other patch was proposed by then. It should also solve a
slightly different situation where I get the same warning, see below.

Btw., the subject of this thread is very misleading, sorry for that.
Should be something like "Lockdep-warning in slab.c on 3.0.9-rt25". I
guess it is a bad idea to change the subject of an existing thread?
Nov 17 17:18:17 fix kernel: [   11.170313] =============================================
Nov 17 17:18:17 fix kernel: [   11.170315] [ INFO: possible recursive locking detected ]
Nov 17 17:18:17 fix kernel: [   11.170317] 3.0.9-25-rt #0
Nov 17 17:18:17 fix kernel: [   11.170319] ---------------------------------------------
Nov 17 17:18:17 fix kernel: [   11.170321] kworker/0:1/20 is trying to acquire lock:
Nov 17 17:18:17 fix kernel: [   11.170323]  (&parent->list_lock){+.+...}, at: [<ffffffff81613e63>] cache_flusharray+0x47/0xd6
Nov 17 17:18:17 fix kernel: [   11.170331]
Nov 17 17:18:17 fix kernel: [   11.170332] but task is already holding lock:
Nov 17 17:18:17 fix kernel: [   11.170333]  (&parent->list_lock){+.+...}, at: [<ffffffff811682c2>] drain_array.part.43+0xc2/0x220
Nov 17 17:18:17 fix kernel: [   11.170339]
Nov 17 17:18:17 fix kernel: [   11.170340] other info that might help us debug this:
Nov 17 17:18:17 fix kernel: [   11.170342]  Possible unsafe locking scenario:
Nov 17 17:18:17 fix kernel: [   11.170342]
Nov 17 17:18:17 fix kernel: [   11.170343]        CPU0
Nov 17 17:18:17 fix kernel: [   11.170344]        ----
Nov 17 17:18:17 fix kernel: [   11.170345]   lock(&parent->list_lock);
Nov 17 17:18:17 fix kernel: [   11.170347]   lock(&parent->list_lock);
Nov 17 17:18:17 fix kernel: [   11.170349]
Nov 17 17:18:17 fix kernel: [   11.170349]  *** DEADLOCK ***
Nov 17 17:18:17 fix kernel: [   11.170350]
Nov 17 17:18:17 fix kernel: [   11.170351]  May be due to missing lock nesting notation
Nov 17 17:18:17 fix kernel: [   11.170352]
Nov 17 17:18:17 fix kernel: [   11.170354] 5 locks held by kworker/0:1/20:
Nov 17 17:18:17 fix kernel: [   11.170355]  #0:  (events){.+.+.+}, at: [<ffffffff810834ec>] process_one_work+0x12c/0x5a0
Nov 17 17:18:17 fix kernel: [   11.170360]  #1:  ((&(reap_work)->work)){+.+...}, at: [<ffffffff810834ec>] process_one_work+0x12c/0x5a0
Nov 17 17:18:17 fix kernel: [   11.170364]  #2:  (cache_chain_mutex){+.+.+.}, at: [<ffffffff811689ae>] cache_reap+0x2e/0x1b0
Nov 17 17:18:17 fix kernel: [   11.170369]  #3:  (&per_cpu(slab_lock, __cpu).lock){+.+...}, at: [<ffffffff81168277>] drain_array.part.43+0x77/0x220
Nov 17 17:18:17 fix kernel: [   11.170374]  #4:  (&parent->list_lock){+.+...}, at: [<ffffffff811682c2>] drain_array.part.43+0xc2/0x220
Nov 17 17:18:17 fix kernel: [   11.170378]
Nov 17 17:18:17 fix kernel: [   11.170379] stack backtrace:
Nov 17 17:18:17 fix kernel: [   11.170381] Pid: 20, comm: kworker/0:1 Not tainted 3.0.9-25-rt #0
Nov 17 17:18:17 fix kernel: [   11.170383] Call Trace:
Nov 17 17:18:17 fix kernel: [   11.170388]  [<ffffffff810a0097>] print_deadlock_bug+0xf7/0x100
Nov 17 17:18:17 fix kernel: [   11.170392]  [<ffffffff810a1add>] validate_chain.isra.37+0x67d/0x720
Nov 17 17:18:17 fix kernel: [   11.170396]  [<ffffffff810a2478>] __lock_acquire+0x478/0x9c0
Nov 17 17:18:17 fix kernel: [   11.170399]  [<ffffffff8162ae19>] ? sub_preempt_count+0x29/0x60
Nov 17 17:18:17 fix kernel: [   11.170404]  [<ffffffff81627475>] ? _raw_spin_unlock+0x35/0x60
Nov 17 17:18:17 fix kernel: [   11.170407]  [<ffffffff81625f0b>] ? rt_spin_lock_slowlock+0x2eb/0x340
Nov 17 17:18:17 fix kernel: [   11.170410]  [<ffffffff8162ae19>] ? sub_preempt_count+0x29/0x60
Nov 17 17:18:17 fix kernel: [   11.170413]  [<ffffffff81613e63>] ? cache_flusharray+0x47/0xd6
Nov 17 17:18:17 fix kernel: [   11.170416]  [<ffffffff810a2f64>] lock_acquire+0x94/0x160
Nov 17 17:18:17 fix kernel: [   11.170419]  [<ffffffff81613e63>] ? cache_flusharray+0x47/0xd6
Nov 17 17:18:17 fix kernel: [   11.170422]  [<ffffffff81626999>] rt_spin_lock+0x39/0x40
Nov 17 17:18:17 fix kernel: [   11.170425]  [<ffffffff81613e63>] ? cache_flusharray+0x47/0xd6
Nov 17 17:18:17 fix kernel: [   11.170428]  [<ffffffff810a3a4d>] ? trace_hardirqs_on_caller+0x13d/0x180
Nov 17 17:18:17 fix kernel: [   11.170431]  [<ffffffff81613e63>] cache_flusharray+0x47/0xd6
Nov 17 17:18:17 fix kernel: [   11.170435]  [<ffffffff81167a41>] kmem_cache_free+0x221/0x300
Nov 17 17:18:17 fix kernel: [   11.170438]  [<ffffffff81167b8f>] slab_destroy+0x6f/0xa0
Nov 17 17:18:17 fix kernel: [   11.170441]  [<ffffffff81167d32>] free_block+0x172/0x190
Nov 17 17:18:17 fix kernel: [   11.170444]  [<ffffffff81168313>] drain_array.part.43+0x113/0x220
Nov 17 17:18:17 fix kernel: [   11.170448]  [<ffffffff81168455>] drain_array+0x35/0x40
Nov 17 17:18:17 fix kernel: [   11.170451]  [<ffffffff81168a36>] cache_reap+0xb6/0x1b0
Nov 17 17:18:17 fix kernel: [   11.170454]  [<ffffffff81168980>] ? drain_freelist+0x2c0/0x2c0
Nov 17 17:18:17 fix kernel: [   11.170457]  [<ffffffff81083554>] process_one_work+0x194/0x5a0
Nov 17 17:18:17 fix kernel: [   11.170459]  [<ffffffff810834ec>] ? process_one_work+0x12c/0x5a0
Nov 17 17:18:17 fix kernel: [   11.170462]  [<ffffffff81083d72>] worker_thread+0x182/0x380
Nov 17 17:18:17 fix kernel: [   11.170465]  [<ffffffff81083bf0>] ? rescuer_thread+0x250/0x250
Nov 17 17:18:17 fix kernel: [   11.170469]  [<ffffffff81088b81>] kthread+0xa1/0xb0
Nov 17 17:18:17 fix kernel: [   11.170472]  [<ffffffff81627411>] ? _raw_spin_unlock_irq+0x41/0x70
Nov 17 17:18:17 fix kernel: [   11.170476]  [<ffffffff8104adec>] ? finish_task_switch+0x7c/0x130
Nov 17 17:18:17 fix kernel: [   11.170480]  [<ffffffff8162fea4>] kernel_thread_helper+0x4/0x10
Nov 17 17:18:17 fix kernel: [   11.170483]  [<ffffffff81627411>] ? _raw_spin_unlock_irq+0x41/0x70
Nov 17 17:18:17 fix kernel: [   11.170486]  [<ffffffff81627898>] ? retint_restore_args+0x13/0x13
Nov 17 17:18:17 fix kernel: [   11.170490]  [<ffffffff81088ae0>] ? __init_kthread_worker+0xa0/0xa0
Nov 17 17:18:17 fix kernel: [   11.170492]  [<ffffffff8162fea0>] ? gs_change+0x13/0x13

^ permalink raw reply	[flat|nested] 24+ messages in thread
* Re: 3.2-rc1 and nvidia drivers 2011-11-16 9:10 3.2-rc1 and nvidia drivers Javier Sanz 2011-11-16 9:40 ` Thomas Schauss @ 2011-11-16 9:52 ` Mike Galbraith 1 sibling, 0 replies; 24+ messages in thread From: Mike Galbraith @ 2011-11-16 9:52 UTC (permalink / raw) To: Javier Sanz; +Cc: RT On Wed, 2011-11-16 at 10:10 +0100, Javier Sanz wrote: > Hello, > > Congratulations all people involved in 3.2-rc1-rt2 release, great job ! > > So, can i ask a favor .. > > Can anyone publish a patch for nvidia drivers and rt series ... > "resolving" the "GPL" string issue? > > Really, i tried to get it run , but, it doen't work .. and i think > there are a lot of people that need it , but ,.. > why you don't ask ? > > So, if some has running it .,. please, 3 minutes ... publish it Nope, no GPL avoidance patch publishing here. -Mike ^ permalink raw reply [flat|nested] 24+ messages in thread
end of thread, other threads:[~2011-12-02 19:37 UTC | newest] Thread overview: 24+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2011-11-16 9:10 3.2-rc1 and nvidia drivers Javier Sanz 2011-11-16 9:40 ` Thomas Schauss 2011-11-16 15:06 ` Thomas Gleixner 2011-11-28 10:08 ` Thomas Schauss 2011-11-28 11:31 ` John Kacur 2011-11-29 14:31 ` John Kacur 2011-11-30 2:36 ` Steven Rostedt 2011-11-30 8:23 ` John Kacur 2011-11-30 11:14 ` Peter Zijlstra 2011-11-30 14:14 ` Steven Rostedt 2011-11-30 14:16 ` Peter Zijlstra 2011-11-30 14:28 ` Steven Rostedt 2011-11-30 14:31 ` Steven Rostedt 2011-11-30 14:34 ` Peter Zijlstra 2011-11-30 15:07 ` Thomas Schauss 2011-11-30 15:20 ` Steven Rostedt 2011-12-02 17:41 ` Thomas Schauss 2011-12-02 19:37 ` Steven Rostedt 2011-11-30 13:34 ` Steven Rostedt 2011-11-30 13:39 ` John Kacur 2011-11-30 13:49 ` Steven Rostedt 2011-11-30 13:53 ` John Kacur 2011-11-30 9:06 ` Thomas Schauss 2011-11-16 9:52 ` Mike Galbraith