* Solving Nvidia (blob) latency spikes + convert (nvidia's) semaphores into mutexes patches for 331.20 [patches and links provided]
@ 2013-12-05 21:20 jordan
2013-12-05 23:04 ` Pavel Vasilyev
2013-12-17 18:10 ` Glenn Elliott
0 siblings, 2 replies; 6+ messages in thread
From: jordan @ 2013-12-05 21:20 UTC (permalink / raw)
To: linux-rt-users@vger.kernel.org
Hey list,
I have been experimenting with the nvidia blob, monitoring behavior /
latencies, etc. Through some experimentation, googling/research and
looking over previous nvidia-rt/compat patches from all over the
interwebs, I've put together a few patches for the latest
nvidia-331.20 driver (probably portable to the older 3xx.xx series too).
You can obtain the patches from my Archlinux package (Arch User Repo)
- download the package and extract the patches from the tarball.
https://aur.archlinux.org/packages/nvidia-l-pa/
The PKGBUILD file (a build script for Arch/ABS packages, much like
BSD ports or Gentoo Portage, for those unfamiliar) shows how I
patch/install the driver for Archlinux. Obviously, different
distros do this differently, so DISCLAIMER: I'm not the person to ask
for help :)
---
1. nvidia-rt_no_wbinvd.patch - It turns out that nvidia uses wbinvd()
(an Intel instruction) that through its operation, in a nutshell,
ends up halting all CPUs - which is the source of nearly all pain when
using nvidia-rt. :\ ...
WBINVD: http://www.jaist.ac.jp/iscenter-new/mpc/altix/altixdata/opt/intel/vtune/doc/users_guide/mergedProjects/analyzer_ec/mergedProjects/reference_olh/mergedProjects/instructions/instruct32_hh/vc323.htm
nvidia-rt_no_wbinvd.patch disables that behavior. The patch must be used
along with (2.) nvidia-rt_explicit.patch; failing to do so will result
in X not starting, although the kernel module will still load.
* Expect an nvidia-rt/linux-rt system to report VERY impressive
results in Cyclictest (any places you used to see latency spikes
should be reduced to almost nothing, i.e. competitive with the OSS
drivers).
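For anyone who wants to sanity-check wake-up latency without cyclictest at hand, here is a minimal sketch of the same basic idea: sleep until an absolute deadline and record how late the wake-up was. This is not cyclictest's source; the names `lat_stats` and `measure_wakeup_latency` are made up for illustration, and the sketch assumes a Linux/POSIX system with `clock_nanosleep()`.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Hypothetical result type: wake-up lateness in nanoseconds. */
struct lat_stats {
    int64_t min_ns;
    int64_t avg_ns;
    int64_t max_ns;
};

int64_t ts_to_ns(const struct timespec *ts)
{
    return (int64_t)ts->tv_sec * NSEC_PER_SEC + ts->tv_nsec;
}

/*
 * Sleep until an absolute deadline `loops` times, `interval_ns` apart
 * (interval must be below one second), and record how late each
 * wake-up was. Returns 0 on success, -1 on bad arguments or error.
 */
int measure_wakeup_latency(int loops, long interval_ns, struct lat_stats *out)
{
    struct timespec next, now;
    int64_t sum = 0;

    if (loops <= 0 || interval_ns <= 0 || interval_ns >= NSEC_PER_SEC || out == NULL)
        return -1;
    if (clock_gettime(CLOCK_MONOTONIC, &next) != 0)
        return -1;

    out->min_ns = INT64_MAX;
    out->max_ns = 0;

    for (int i = 0; i < loops; i++) {
        /* Advance the absolute deadline by one interval. */
        next.tv_nsec += interval_ns;
        if (next.tv_nsec >= NSEC_PER_SEC) {
            next.tv_nsec -= NSEC_PER_SEC;
            next.tv_sec++;
        }
        if (clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL) != 0)
            return -1;
        if (clock_gettime(CLOCK_MONOTONIC, &now) != 0)
            return -1;

        /* Lateness = actual wake-up time minus the requested deadline. */
        int64_t late = ts_to_ns(&now) - ts_to_ns(&next);
        if (late < 0)
            late = 0;
        if (late < out->min_ns)
            out->min_ns = late;
        if (late > out->max_ns)
            out->max_ns = late;
        sum += late;
    }
    out->avg_ns = sum / loops;
    return 0;
}
```

Calling `measure_wakeup_latency(10000, 1000000L, &stats)` and comparing `stats.max_ns` with and without the patches gives a rough picture of the spikes. Cyclictest itself normally runs at a realtime priority with locked memory, which this sketch leaves out, so the absolute numbers will not match.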
----
2. nvidia-rt_explicit.patch - I wrote this patch for nvidia-325.xx
because skipping the PREEMPT_RT_FULL check resulted in a driver that
could compile/install but still deadlocked. So this patch
just explicitly sets CONFIG_PREEMPT_RT_FULL (1), which also allows
the 1st patch to work.
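To make the relationship between the first two patches concrete, here is a rough user-space sketch of the idea, under the assumption that the cache flush is compiled out whenever the RT symbol is defined. It is not NVIDIA's code and not the contents of either patch; `fake_cache_flush_all`, `driver_cache_flush` and `flush_count` are hypothetical names, and the real wbinvd is privileged, so a counter stands in for it.

```c
/* Hypothetical sketch only: not NVIDIA's code and not the actual patches. */

/* Patch 2 idea: define the RT symbol explicitly instead of relying on a
 * build-time check that may be skipped. */
#ifndef CONFIG_PREEMPT_RT_FULL
#define CONFIG_PREEMPT_RT_FULL 1
#endif

static unsigned long flush_calls;

/* Stand-in for the real, privileged cache flush (wbinvd cannot be
 * executed from user space), so it only counts invocations. */
void fake_cache_flush_all(void)
{
    flush_calls++;
}

/* Patch 1 idea: on an RT build, skip the global cache flush entirely. */
void driver_cache_flush(void)
{
#if defined(CONFIG_PREEMPT_RT_FULL)
    /* RT build: do nothing here, avoiding the system-wide stall. */
#else
    fake_cache_flush_all();
#endif
}

unsigned long flush_count(void)
{
    return flush_calls;
}
```

With the symbol defined, `driver_cache_flush()` never reaches the flush, which mirrors why the first patch only takes effect once the second one has set CONFIG_PREEMPT_RT_FULL.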
Lastly (not needed to test the latency spikes; for that, just use the above two):
----
3. nvidia-rt_mutexes.patch - As the name of the patch implies,
this patch converts all of the semaphores in the nvidia driver into
(regular) mutexes. Nvidia had moved some of the semaphore
code around in the 3xx.xx series, but I found them all. First off: * this
patch will likely kill any scheduling-type bugs encountered when using
the semaphore version (I have yet to see any 'complaints' in the last
few days of uptime).
* (Especially) this last patch I would (please!) like someone with
more experience to look at for me, if at all possible. It would be
greatly appreciated. I hack a bit, but am not an accomplished
programmer, so making a rookie mistake is not out of the question.
That is why I do not have the feature enabled for users (it is
commented out). That being said, I have tortured/tested the hell out
of my system using nvidia-rt_mutexes (all patches combined) and
nvidia (very deterministic, great cyclictest results, just
fantastic).
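As a rough illustration of what "semaphore into mutex" means, here is a user-space analogy with POSIX threads, assuming the semaphores are only ever used as plain locks (count of 1). In the kernel the corresponding calls would be down()/up() versus mutex_lock()/mutex_unlock(); this sketch is not the driver code, and every name in it is made up.

```c
#include <pthread.h>
#include <semaphore.h>
#include <stddef.h>

#define NTHREADS 4
#define NITER 50000

static long shared_counter;
static sem_t sem_lock;                                  /* "before": binary semaphore */
static pthread_mutex_t mtx_lock = PTHREAD_MUTEX_INITIALIZER; /* "after": mutex */

/* Critical section guarded by a semaphore initialised to 1. */
void *worker_sem(void *arg)
{
    (void)arg;
    for (int i = 0; i < NITER; i++) {
        while (sem_wait(&sem_lock) != 0)
            ; /* retry if interrupted by a signal */
        shared_counter++;
        sem_post(&sem_lock);
    }
    return NULL;
}

/* The same critical section guarded by a mutex. */
void *worker_mtx(void *arg)
{
    (void)arg;
    for (int i = 0; i < NITER; i++) {
        pthread_mutex_lock(&mtx_lock);
        shared_counter++;
        pthread_mutex_unlock(&mtx_lock);
    }
    return NULL;
}

/* Run NTHREADS copies of fn; return the final counter, or -1 on error. */
long run_workers(void *(*fn)(void *))
{
    pthread_t tid[NTHREADS];
    int started = 0;

    shared_counter = 0;
    while (started < NTHREADS &&
           pthread_create(&tid[started], NULL, fn, NULL) == 0)
        started++;
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);
    return started == NTHREADS ? shared_counter : -1;
}

long run_semaphore_version(void)
{
    if (sem_init(&sem_lock, 0, 1) != 0)
        return -1;
    long total = run_workers(worker_sem);
    sem_destroy(&sem_lock);
    return total;
}

long run_mutex_version(void)
{
    return run_workers(worker_mtx);
}
```

Both versions protect the counter equally well. The practical difference is that a mutex has an owner and must be released by the thread that took it, while a semaphore can be posted from anywhere, so the conversion is only safe where each lock is taken and released in the same context.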
Anyway, I thought I should share, as I know there are some nvidia-rt
users subbed to the list, and my hope is someone can also review these
patches, help fix them up, etc. As I said, I'm not much of a
programmer, so expertise would be helpful ;)
cheerz
Jordan
PS: I've also taken the time to 'rattle nvidia's cage' on this issue;
feel free to pipe in on my thread, if you like:
https://devtalk.nvidia.com/default/topic/654639/linux/nvidia-using-wbinvdt-lt-an-intel-instruction-gt-causes-huge-latency-spikes-/
* Re: Solving Nvidia (blob) latency spikes + convert (nvidia's) semaphores into mutexes patches for 331.20 [patches and links provided]
2013-12-05 21:20 Solving Nvidia (blob) latency spikes + convert (nvidia's) semaphores into mutexes patches for 331.20 [patches and links provided] jordan
@ 2013-12-05 23:04 ` Pavel Vasilyev
2013-12-05 23:43 ` jordan
2013-12-17 18:10 ` Glenn Elliott
1 sibling, 1 reply; 6+ messages in thread
From: Pavel Vasilyev @ 2013-12-05 23:04 UTC (permalink / raw)
To: jordan; +Cc: linux-rt-users@vger.kernel.org
06.12.2013 01:20, jordan wrote:
> Hey list,
>
> I have been experimenting with the nvidia blob, monitoring behavior /
> latencies, etc. Through some experimentation, googling/research and
> looking over previous nvidia-rt/compat patches from all over the
> interwebs; I've put together a few patches for the latest
> nvidia-331.20 driver.
1. The blob after 325.xx.xx does not compile and does not work with our RT patches!
2. Did the Nvidia developers deliberately insert WBINVD to make it run slowly? :)
3. SFENCE is needed on SMP systems. Although a delay of 1000 nanoseconds is awful.
4. Do you really think that mutex_lock/unlock is more realtime than up/down?
--
Pavel.
* Re: Solving Nvidia (blob) latency spikes + convert (nvidia's) semaphores into mutexes patches for 331.20 [patches and links provided]
2013-12-05 23:04 ` Pavel Vasilyev
@ 2013-12-05 23:43 ` jordan
2013-12-06 1:01 ` Pavel Vasilyev
0 siblings, 1 reply; 6+ messages in thread
From: jordan @ 2013-12-05 23:43 UTC (permalink / raw)
To: pavel; +Cc: linux-rt-users@vger.kernel.org
> 1. The blob after 325.xx.xx does not compile and does not work with our RT patches!
Sure it does.
    make IGNORE_PREEMPT_RT_PRESENCE=1 \
        SYSSRC=/usr/lib/modules/"${_kernver}/build" module
I've used every nvidia stable (or beta) driver post-325xx just fine on
-rt. The last remaining issues (for me) are getting nvidia to
remove/replace wbinvd() and fixing up the scheduling bugs (with
semaphores)... or I will use mutexes instead, I guess.
> 2. Did the Nvidia developers deliberately insert WBINVD to make it run slowly? :)
Well, to find out the answer, I started a thread at nvidia's
developer forums and inquired about this. But I can tell you that the
Intel OSS driver was using wbinvd() at some point in the 3.10 cycle and
it was problematic there too, from what I understand.
> 3. SFENCE is needed on SMP systems. Although a delay of 1000 nanoseconds is awful.
Would you care to go into more detail on this? I have seen no
negative impacts on my systems (I've been watching all system
resources and H/W-related info, etc.), but am curious about this. As
I said, wbinvd() is the source of problems in the nvidia blob, and if
the Intel OSS driver was able to work around it, I don't see why
Nvidia couldn't do the same.
> 4. Do you really think that mutex_lock/unlock is more realtime than up/down?
Did I say that, Pavel? What I can tell you is:
1. On -rt, nvidia's driver causes scheduling bugs when using
semaphores (i.e. "scheduling while atomic", that kind of thing),
periodically. It's non-fatal but happens 5-6 times a day on my
machines. I've yet to see even one "scheduling while atomic" bug when
using mutexes. So whether or not semaphores are more realtime is
sort of a moot point for me. I use linux-rt for pro audio, i.e. GFX is
NOT as important; what is important is that the (nvidia) driver
works properly (safely).
2. With semaphores I seem to get more variance/jitter in my cyclictest
results, or slightly higher latency spikes, than when I am using
mutexes (and I feel a little safer without seeing a bunch of
"scheduling while atomic" bugs).
Jordan
* Re: Solving Nvidia (blob) latency spikes + convert (nvidia's) semaphores into mutexes patches for 331.20 [patches and links provided]
2013-12-05 23:43 ` jordan
@ 2013-12-06 1:01 ` Pavel Vasilyev
2013-12-06 1:16 ` jordan
0 siblings, 1 reply; 6+ messages in thread
From: Pavel Vasilyev @ 2013-12-06 1:01 UTC (permalink / raw)
To: jordan; +Cc: linux-rt-users@vger.kernel.org
06.12.2013 03:43, jordan wrote:
>> 1. The blob after 325.xx.xx does not compile and does not work with our RT patches!
>
> Sure it does.
>
> make IGNORE_PREEMPT_RT_PRESENCE=1
Wow, I did not know, thanks. I am still on 319.xx.
>
>> 3. SFENCE is needed on SMP systems. Although a delay of 1000 nanoseconds is awful.
> Would you care to go into more detail on this?
https://www.kernel.org/doc/Documentation/memory-barriers.txt
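A small user-space sketch of why store ordering matters on SMP, in the spirit of the write-barrier/read-barrier pairing that document describes. It uses portable C11 release/acquire atomics rather than an explicit SFENCE, and it makes no claim about what the nvidia driver does; `payload`, `ready` and `publish_and_read` are hypothetical names.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static int payload;       /* ordinary data written by the producer */
static atomic_int ready;  /* flag that publishes the data */

void *producer(void *arg)
{
    (void)arg;
    payload = 42;
    /* Release store: the payload write cannot be reordered after it. */
    atomic_store_explicit(&ready, 1, memory_order_release);
    return NULL;
}

void *consumer(void *arg)
{
    int *seen = arg;
    /* Acquire load: pairs with the release store in producer(). */
    while (atomic_load_explicit(&ready, memory_order_acquire) == 0)
        ; /* spin until the flag is published */
    *seen = payload;
    return NULL;
}

/* Returns the value the consumer observed, or -1 on thread errors. */
int publish_and_read(void)
{
    pthread_t prod, cons;
    int seen = -1;

    payload = 0;
    atomic_store(&ready, 0);

    if (pthread_create(&cons, NULL, consumer, &seen) != 0)
        return -1;
    if (pthread_create(&prod, NULL, producer, NULL) != 0) {
        atomic_store(&ready, 1); /* unblock the consumer before bailing out */
        pthread_join(cons, NULL);
        return -1;
    }
    pthread_join(prod, NULL);
    pthread_join(cons, NULL);
    return seen;
}
```

Without the release/acquire pair, another CPU could in principle see the flag set before the payload is visible. That is the kind of problem a store fence guards against, and it is a separate concern from a full cache flush such as WBINVD.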
--
Pavel.
* Re: Solving Nvidia (blob) latency spikes + convert (nvidia's) semaphores into mutexes patches for 331.20 [patches and links provided]
2013-12-06 1:01 ` Pavel Vasilyev
@ 2013-12-06 1:16 ` jordan
0 siblings, 0 replies; 6+ messages in thread
From: jordan @ 2013-12-06 1:16 UTC (permalink / raw)
To: pavel; +Cc: linux-rt-users@vger.kernel.org
Hey Pavel
> Wow, I did not know, thanks. I am still on 319.xx.
I had wondered, lol, because telling someone who is using an nvidia-rt
driver newer than 325xx that it doesn't work on rt is a bit odd ;) ...
I thought you must have been mistaken :)
Take a look in kernel/conftest.sh (of the nvidia installer/sources). You
were mistaken because nvidia has changed some of the internals of
how their installer works. It actually appears they are making it
more RT friendly, which is great because I (personally) have been
bugging them every couple of months about PREEMPT_RT_FULL :)
>>> 3. SFENCE is needed on SMP systems. Although a delay of 1000 nanoseconds is awful.
>> Would you care to go into more detail on this?
>
> https://www.kernel.org/doc/Documentation/memory-barriers.txt
Thanks, I'll have a read through and wait for some feedback from nvidia
in my thread.
jordan
* Re: Solving Nvidia (blob) latency spikes + convert (nvidia's) semaphores into mutexes patches for 331.20 [patches and links provided]
2013-12-05 21:20 Solving Nvidia (blob) latency spikes + convert (nvidia's) semaphores into mutexes patches for 331.20 [patches and links provided] jordan
2013-12-05 23:04 ` Pavel Vasilyev
@ 2013-12-17 18:10 ` Glenn Elliott
1 sibling, 0 replies; 6+ messages in thread
From: Glenn Elliott @ 2013-12-17 18:10 UTC (permalink / raw)
To: linux-rt-users
jordan <triplesquarednine <at> gmail.com> writes:
>
> Hey list,
>
> I have been experimenting with the nvidia blob, monitoring behavior /
> latencies, etc. Through some experimentation, googling/research and
> looking over previous nvidia-rt/compat patches from all over the
> interwebs; I've put together a few patches for the latest
> nvidia-331.20 driver. (probably portable to older 3xx.xx series too).
> You can obtain the patches from my Archlinux package (Arch User Repo)
> - download the package and extract the patches from the tarball.
>
> https://aur.archlinux.org/packages/nvidia-l-pa/
>
> The PKGBUILD file (a build script for Arch/ABS packages, much like
> BSD ports or Gentoo Portage, for those unfamiliar) shows how I
> patch/install the driver for Archlinux. Obviously, different
> distros do this differently, so DISCLAIMER: I'm not the person to ask
> for help :)
>
> cheerz
>
> Jordan
>
Have you taken a look at the SteamOS beta release? Apparently it's built on
top of PREEMPT_RT.
Link: http://www.phoronix.com/scan.php?page=news_item&px=MTU0MzY
I believe NVIDIA has been involved in its development. Hopefully this will
mean NVIDIA will actively develop/test their driver on PREEMPT_RT.
Anyway, I was wondering if you had taken a peek at any NVIDIA GPL'ed code
in SteamOS to see if they include similar changes. (The official driver version
number is 331.20.)
Link: http://www.phoronix.com/scan.php?page=article&item=steamos_linux_benchmarks&num=1