Solving Nvidia (blob) latency spikes + convert (nvidia's) semaphores into mutexes patches for 331.20 [patches and links provided]

linux-rt-users.vger.kernel.org archive mirror

* Solving Nvidia (blob) latency spikes + convert (nvidia's) semaphores into mutexes patches for 331.20 [patches and links provided]
@ 2013-12-05 21:20 jordan
  2013-12-05 23:04 ` Pavel Vasilyev
  2013-12-17 18:10 ` Glenn Elliott
  0 siblings, 2 replies; 6+ messages in thread
From: jordan @ 2013-12-05 21:20 UTC (permalink / raw)
  To: linux-rt-users@vger.kernel.org

Hey list,

I have been experimenting with the nvidia blob, monitoring its behavior,
latencies, etc. After some experimentation, googling/research and
looking over previous nvidia-rt/compat patches from all over the
interwebs, I've put together a few patches for the latest
nvidia-331.20 driver (probably portable to the older 3xx.xx series too).
You can obtain the patches from my Archlinux package (Arch User Repo):
download the package and extract the patches from the tarball.

https://aur.archlinux.org/packages/nvidia-l-pa/

The PKGBUILD file (a build script for Arch/ABS packages, much like
BSD ports or Gentoo's Portage, for those unfamiliar) shows how I
patch/install the driver for Archlinux. Obviously, different distros
do this differently, so DISCLAIMER: I'm not the person to ask for
help :)
---

1. nvidia-rt_no_wbinvd.patch - It turns out that nvidia uses wbinvd()
(WBINVD, an x86 instruction that writes back and invalidates the CPU
caches), which, in a nutshell, ends up halting all CPUs while it runs.
That is the source of nearly all of the pain when using nvidia-rt. :\

WBINVD: http://www.jaist.ac.jp/iscenter-new/mpc/altix/altixdata/opt/intel/vtune/doc/users_guide/mergedProjects/analyzer_ec/mergedProjects/reference_olh/mergedProjects/instructions/instruct32_hh/vc323.htm

nvidia-rt_no_wbinvd.patch disables that behavior. The patch must be used
along with (2.) nvidia-rt_explicit.patch; without it the kernel module
will still load, but X will not start.

* Expect an nvidia-rt/linux-rt system to report VERY impressive
results in cyclictest. Any places where you used to see latency spikes
should be reduced to almost nothing (i.e. competitive with the OSS
drivers).
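
One way to check that claim on your own machine is to compare the
worst-case latency cyclictest reports before and after patching. Below
is a minimal Python sketch, assuming cyclictest's usual per-thread
summary lines ("T: ... Min: ... Act: ... Avg: ... Max: ...", values in
microseconds). The function name and the sample text are made up for
illustration; they are not measurements from this thread.

```python
import re

# One per-thread summary line as printed by cyclictest, for example:
# "T: 0 ( 1234) P:90 I:200 C: 100000 Min:      2 Act:    5 Avg:    4 Max:      38"
SUMMARY = re.compile(
    r"T:\s*(?P<thread>\d+).*?Min:\s*(?P<min>\d+)\s+Act:\s*(?P<act>\d+)"
    r"\s+Avg:\s*(?P<avg>\d+)\s+Max:\s*(?P<max>\d+)"
)


def worst_case_latency(output):
    """Return {thread number: largest Max value seen} for the given text."""
    worst = {}
    for line in output.splitlines():
        match = SUMMARY.search(line)
        if match:
            thread = int(match.group("thread"))
            # Without -q cyclictest reprints its lines, so keep the largest.
            worst[thread] = max(worst.get(thread, 0), int(match.group("max")))
    return worst


# Made-up sample output, only to exercise the parser (not measured data).
sample = """\
T: 0 ( 1234) P:90 I:200 C: 100000 Min:      2 Act:    5 Avg:    4 Max:      38
T: 1 ( 1235) P:90 I:200 C:  99980 Min:      2 Act:    4 Avg:    4 Max:     412
"""

print(worst_case_latency(sample))
```

Running this over the saved output of an unpatched and a patched run,
and comparing the per-thread Max values, gives a quick view of whether
the spikes are gone.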
----

2. nvidia-rt_explicit.patch - I wrote this patch for nvidia-325.xx
because skipping the PREEMPT_RT_FULL check resulted in a driver that
could compile/install but still deadlocked. So this patch just
explicitly sets CONFIG_PREEMPT_RT_FULL (1), which also allows the 1st
patch to work.

Lastly (not needed to test the latency spikes; for that, just use the above two):
----

3. nvidia-rt_mutexes.patch - As the name of the patch implies,
this patch converts all of the semaphores in the nvidia driver into
(regular) mutexes. Nvidia had moved some of the semaphore code around
in the 3xx.xx series, but I found them all. First off: this patch will
likely kill any scheduling-type bugs encountered when using the
semaphore version (I have yet to see any 'complaints' in the last
few days of uptime).
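
For readers unfamiliar with the distinction: a semaphore has no owner,
while a mutex does, and that ownership is what lets PREEMPT_RT apply
priority inheritance to mutexes. The following is only a userspace
Python analogy of that ownership difference. It is not kernel code and
not part of the patch, and the helper name is made up for illustration.

```python
import threading


def try_release_from_other_thread(lock):
    """Call lock.release() from a thread that never acquired the lock."""
    outcome = []

    def worker():
        try:
            lock.release()
            outcome.append("released")
        except RuntimeError:
            outcome.append("refused")

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()
    return outcome[0]


# A binary semaphore used as a lock: no owner is tracked.
sem = threading.Semaphore(1)
sem.acquire()

# An owner-tracked lock: only the acquiring thread may release it.
owned = threading.RLock()
owned.acquire()

sem_result = try_release_from_other_thread(sem)
owned_result = try_release_from_other_thread(owned)
print(sem_result, owned_result)

owned.release()  # the owning (main) thread releases it normally
```

The semaphore is released by a thread that never took it, while the
owner-tracked lock refuses. Code that relies on the first behavior
cannot simply be switched to a mutex, which is one thing a reviewer of
such a conversion would want to check.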

* I would especially like someone with more experience to look at this
last patch for me (please!), if at all possible. It would be
greatly appreciated: I hack a bit, but am not an accomplished
programmer, so making a rookie mistake is not out of the question.
That is why the feature is commented out and not enabled for users.
That being said, I have tortured/tested the hell out of my system
using nvidia with nvidia-rt_mutexes (all patches combined): very
deterministic, great cyclictest results, just fantastic.

Anyway, I thought I should share, as I know there are some nvidia-rt
users subbed to the list, and my hope is that someone can also review
these patches, help fix them up, etc. As I said, I'm not much of a
programmer, so expertise would be helpful ;)

cheerz

Jordan

PS: I've also taken the time to 'rattle nvidia's cage' on this issue.
Feel free to pipe in on my thread, if you like:
https://devtalk.nvidia.com/default/topic/654639/linux/nvidia-using-wbinvdt-lt-an-intel-instruction-gt-causes-huge-latency-spikes-/


* Re: Solving Nvidia (blob) latency spikes + convert (nvidia's) semaphores into mutexes patches for 331.20 [patches and links provided]
  2013-12-05 21:20 Solving Nvidia (blob) latency spikes + convert (nvidia's) semaphores into mutexes patches for 331.20 [patches and links provided] jordan
@ 2013-12-05 23:04 ` Pavel Vasilyev
  2013-12-05 23:43   ` jordan
  2013-12-17 18:10 ` Glenn Elliott
  1 sibling, 1 reply; 6+ messages in thread
From: Pavel Vasilyev @ 2013-12-05 23:04 UTC (permalink / raw)
  To: jordan; +Cc: linux-rt-users@vger.kernel.org


06.12.2013 01:20, jordan wrote:
> Hey list,
> 
> I have been experimenting with the nvidia blob, monitoring its behavior,
> latencies, etc. After some experimentation, googling/research and
> looking over previous nvidia-rt/compat patches from all over the
> interwebs, I've put together a few patches for the latest
> nvidia-331.20 driver.


1. The blob after 325.xx.xx does not compile and does not work with our RT patches!
2. Did the Nvidia developers insert WBINVD specifically to make things slow? :)
3. SFENCE is needed on SMP systems. Although a delay of 1000 nanoseconds is awful.
4. Do you really think that mutex_lock/unlock is more realtime than up/down?


-- 

                                                         Pavel.



* Re: Solving Nvidia (blob) latency spikes + convert (nvidia's) semaphores into mutexes patches for 331.20 [patches and links provided]
  2013-12-05 23:04 ` Pavel Vasilyev
@ 2013-12-05 23:43   ` jordan
  2013-12-06  1:01     ` Pavel Vasilyev
  0 siblings, 1 reply; 6+ messages in thread
From: jordan @ 2013-12-05 23:43 UTC (permalink / raw)
  To: pavel; +Cc: linux-rt-users@vger.kernel.org

> 1. The blob after 325.xx.xx does not compile and does not work with our RT patches!

Sure it does.

make IGNORE_PREEMPT_RT_PRESENCE=1 \
     SYSSRC=/usr/lib/modules/"${_kernver}/build" module

I've used every nvidia stable (or beta) driver post-325xx just fine on
-rt. The last remaining issues (for me) are getting nvidia to
remove/replace wbinvd() and to fix up the scheduling bugs (with
semaphores)... or I will use mutexes instead, I guess.

> 2. Did the Nvidia developers insert WBINVD specifically to make things slow? :)

Well, to find out the answer, I started a thread on nvidia's
developer forums and inquired about this. But I can tell you that the
Intel OSS driver was using wbinvd() at some point in the 3.10 cycle and
it was problematic there too, from what I understand.

> 3. SFENCE is needed on SMP systems. Although a delay of 1000 nanoseconds is awful.

Would you care to go into more detail on this? I have seen no
negative impacts on my systems (I've been watching all system
resources, H/W-related info, etc.), but I am curious about this. As
I said, wbinvd() is the source of the problems in the nvidia blob, and
if the Intel OSS driver was able to work around it, I don't see why
Nvidia couldn't do the same.

> 4. Do you really think that mutex_lock/unlock is more realtime than up/down?

Did I say that, Pavel? ... What I can tell you is:

1. nvidia's driver on -rt, when using semaphores, periodically causes
scheduling bugs (i.e. "scheduling while atomic", that kind of thing).
They are non-fatal but happen 5-6 times a day on my machines. I've yet
to see even one "scheduling while atomic" bug when using mutexes. So
whether or not semaphores are more realtime is sort of a moot point
for me: I use linux-rt for pro audio, i.e. GFX is NOT as important,
but what is important is that the (nvidia) driver works properly
(safely).

2. With semaphores I seem to get more variance/jitter in my cyclictest
results, or slightly higher latency spikes, than when I am using
mutexes (and I feel a little safer without seeing a bunch of
"scheduling while atomic" bugs).
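
To put a number on how often those "scheduling while atomic" reports
show up, one could count them in a saved kernel log. Below is a minimal
Python sketch, assuming the kernel's usual report line of the form
"BUG: scheduling while atomic: <comm>/<pid>/<preempt_count>". The
function name and the sample log are made up for illustration; they are
not taken from my machines.

```python
import re
from collections import Counter

# Kernel report line: "BUG: scheduling while atomic: <comm>/<pid>/0x<preempt_count>"
# The task name (comm) may itself contain '/', so match it greedily.
PATTERN = re.compile(
    r"BUG: scheduling while atomic: (?P<comm>.+)/(?P<pid>\d+)/0x[0-9a-fA-F]+"
)


def count_atomic_bugs(log_text):
    """Count 'scheduling while atomic' reports per task name."""
    counts = Counter()
    for line in log_text.splitlines():
        match = PATTERN.search(line)
        if match:
            counts[match.group("comm")] += 1
    return counts


# Made-up sample log, only to exercise the parser (not real output).
sample_log = """\
[ 1021.402113] BUG: scheduling while atomic: Xorg/812/0x00000002
[ 1021.402120] Modules linked in: nvidia(PO) snd_hda_intel
[ 9377.118530] BUG: scheduling while atomic: Xorg/812/0x00000002
[12044.730021] BUG: scheduling while atomic: irq/42-nvidia/301/0x00000002
"""

print(count_atomic_bugs(sample_log))
```

Feeding it a day's worth of log from the semaphore build and from the
mutex build would make the "5-6 times a day" versus "none" comparison
easy to repeat.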

Jordan


* Re: Solving Nvidia (blob) latency spikes + convert (nvidia's) semaphores into mutexes patches for 331.20 [patches and links provided]
  2013-12-05 23:43   ` jordan
@ 2013-12-06  1:01     ` Pavel Vasilyev
  2013-12-06  1:16       ` jordan
  0 siblings, 1 reply; 6+ messages in thread
From: Pavel Vasilyev @ 2013-12-06  1:01 UTC (permalink / raw)
  To: jordan; +Cc: linux-rt-users@vger.kernel.org


06.12.2013 03:43, jordan wrote:
>> 1. The blob after 325.xx.xx does not compile and does not work with our RT patches!
> 
> Sure it does.
> 
> make IGNORE_PREEMPT_RT_PRESENCE=1

Wow, I did not know, thanks. I am still on 319.xx

> 
>> 3. SFENCE is needed on SMP systems. Although a delay of 1000 nanoseconds is awful.
> Would you care to go into more detail on this?

https://www.kernel.org/doc/Documentation/memory-barriers.txt

-- 

                                                         Pavel.



* Re: Solving Nvidia (blob) latency spikes + convert (nvidia's) semaphores into mutexes patches for 331.20 [patches and links provided]
  2013-12-06  1:01     ` Pavel Vasilyev
@ 2013-12-06  1:16       ` jordan
  0 siblings, 0 replies; 6+ messages in thread
From: jordan @ 2013-12-06  1:16 UTC (permalink / raw)
  To: pavel; +Cc: linux-rt-users@vger.kernel.org

Hey Pavel

> Wow, I did not know, thanks. I am still on 319.xx

I had wondered. lol. Telling someone who is using nvidia-rt with
drivers newer than 325xx that it doesn't work on rt is a bit odd. ;)
... I thought you must have been mistaken :)

Take a look in kernel/conftest.sh (in the nvidia installer/sources). You
were mistaken because nvidia has changed some of the internals of
how their installer works. It actually appears they are making it
more RT friendly... which is great, because I (personally) have been
bugging them every couple of months about PREEMPT_RT_FULL :)

>>> 3. SFENCE is needed on SMP systems. Although a delay of 1000 nanoseconds is awful.
>> Would you care to go into more detail on this?
>
> https://www.kernel.org/doc/Documentation/memory-barriers.txt

Thanks, I'll have a read through and wait for some feedback from nvidia
in my thread.

jordan


* Re: Solving Nvidia (blob) latency spikes + convert (nvidia's) semaphores into mutexes patches for 331.20 [patches and links provided]
  2013-12-05 21:20 Solving Nvidia (blob) latency spikes + convert (nvidia's) semaphores into mutexes patches for 331.20 [patches and links provided] jordan
  2013-12-05 23:04 ` Pavel Vasilyev
@ 2013-12-17 18:10 ` Glenn Elliott
  1 sibling, 0 replies; 6+ messages in thread
From: Glenn Elliott @ 2013-12-17 18:10 UTC (permalink / raw)
  To: linux-rt-users

jordan <triplesquarednine <at> gmail.com> writes:

> 
> Hey list,
> 
> I have been experimenting with the nvidia blob, monitoring its behavior,
> latencies, etc. After some experimentation, googling/research and
> looking over previous nvidia-rt/compat patches from all over the
> interwebs, I've put together a few patches for the latest
> nvidia-331.20 driver (probably portable to the older 3xx.xx series too).
> You can obtain the patches from my Archlinux package (Arch User Repo):
> download the package and extract the patches from the tarball.
> 
> https://aur.archlinux.org/packages/nvidia-l-pa/
> 
> The PKGBUILD file (a build script for Arch/ABS packages, much like
> BSD ports or Gentoo's Portage, for those unfamiliar) shows how I
> patch/install the driver for Archlinux. Obviously, different distros
> do this differently, so DISCLAIMER: I'm not the person to ask for
> help :)
> 
> cheerz
> 
> Jordan
> 

Have you taken a look at the SteamOS beta release?  Apparently it's built on
top of PREEMPT_RT.

Link: http://www.phoronix.com/scan.php?page=news_item&px=MTU0MzY.

I believe NVIDIA has been involved in its development.  Hopefully this will
mean NVIDIA will actively develop/test their driver on PREEMPT_RT.

Anyway, I was wondering if you had taken a peek at any NVIDIA GPL'ed code
in SteamOS to see if they include similar changes.  (The official driver version
number is 331.20.)

Link: http://www.phoronix.com/scan.php?page=article&item=steamos_linux_benchmarks&num=1

