* Second copy engine on GF116
@ 2014-11-20 19:18 Ilia Mirkin
From: Ilia Mirkin @ 2014-11-20 19:18 UTC
To: gpu-public-documentation
Cc: nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org
Hello,
There's a long-standing bug on nouveau (this is a sample bug, but the
issue has been around for a while:
https://bugs.freedesktop.org/show_bug.cgi?id=85465) whereby we attempt
to use the second PCOPY engine on GF116, and it sometimes does
nothing, despite mmio register 22500 saying that it's not disabled
(0x22500 == 0 for this user). In the bug you can see a dump from
22400..22600, and all values after 22440 are read as 0. The issue
appears to be more common on mobile GF116's, but I don't know that the
correlation is 100%. No errors are reported by the FIFO or invalid
mmio reads, but the data transfer just does not happen. Switching to
using the first copy engine resolves things, so it's unlikely to be a
more systemic issue in nouveau's usage of the copy engine.
To be clear, when I'm talking about the second PCOPY engine, I'm
talking about the engine at mmio 0x105000, and whose fifo class id is
0x90b8.
Any information on properly detecting that the engine is, in fact,
missing, would be greatly appreciated. Or, conversely, an assurance
that the engine _is_ there on all GF116's and we're just not
initializing something properly, along with perhaps some suggestions
as to what we might be missing.
Thanks,
Ilia Mirkin
imirkin@alum.mit.edu
_______________________________________________
Nouveau mailing list
Nouveau@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/nouveau
* Re: Second copy engine on GF116
@ 2014-11-21 6:16 Andy Ritger
From: Andy Ritger @ 2014-11-21 6:16 UTC
To: Ilia Mirkin
Cc: nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org, gpu-public-documentation
Hi Ilia,
Actually 0x90b8 is different than copy engine. I'm not very familiar
with it, but 0x90b8 is an engine for performing LZO decompression as
part of performing the copy. It has a variety of limitations (e.g.,
cannot handle blocklinear format), and was only in a few Fermi chips,
as I understand it.
It is probably easiest to just ignore it. You can distinguish this
decompress engine from normal copy engine by looking at the CE capability
register on falcon (0x00000650). If bit 2 is '1', then the falcon is
a decompress engine.
I hope that helps,
- Andy
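Andy's test reduces to checking a single bit. A minimal C sketch, assuming `caps` already holds the 32-bit value read from a falcon's CE capability register at offset 0x650; the macro and function names are hypothetical and not taken from nouveau or NVIDIA code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 2 of the falcon CE capability register (+0x650): when set, the
 * falcon is a decompress engine rather than a normal copy engine.
 * The macro name is hypothetical. */
#define CE_CAP_DECOMPRESS (1u << 2)

/* Returns true if a caps value read from <engine base> + 0x650
 * identifies a decompress engine. */
bool ce_is_decompress_engine(uint32_t caps)
{
    return (caps & CE_CAP_DECOMPRESS) != 0;
}
```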
* Re: Second copy engine on GF116
@ 2014-11-21 6:39 Ilia Mirkin
From: Ilia Mirkin @ 2014-11-21 6:39 UTC
To: Andy Ritger
Cc: nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org, gpu-public-documentation
On Fri, Nov 21, 2014 at 1:16 AM, Andy Ritger <aritger@nvidia.com> wrote:
> Hi Ilia,
>
> Actually 0x90b8 is different than copy engine. I'm not very familiar
> with it, but 0x90b8 is an engine for performing LZO decompression as
> part of performing the copy. It has a variety of limitations (e.g.,
> cannot handle blocklinear format), and was only in a few Fermi chips,
> as I understand it.
According to our driver source, GF100, GF104, GF110, GF114, and GF116
all have it. [So GF106, GF108, GF117, GF119 don't have it.] We've only
had problems reported against GF116... and only for some people.
>
> It is probably easiest to just ignore it. You can distinguish this
> decompress engine from normal copy engine by looking at the CE capability
> register on falcon (0x00000650). If bit 2 is '1', then the falcon is
> a decompress engine.
I presume you mean a +0x650 register on the pcopy engines (0x104000
and 0x105000). I only have access to the GF108 right now, which
returns 3 for 0x104650 and 4 for 0x105650. We're using the engine at
0x104000 for copy on the GF108...
From my admittedly limited understanding, both 0x104000 and 0x105000
appear to be falcon engines, where the fuc is presumably able to drive
some underlying hardware. The actual fifo methods are implemented in
the fuc, which in turn does iowr/etc commands.
Are you saying that the "decompress" engine (at 0x105000 right?) has a
different piece of hardware behind it than the copy engine at
0x104000, or does NVIDIA simply provide different fuc for it that
exposes somewhat different functionality via FIFO methods?
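The registers Ilia reads are each engine's MMIO base plus the 0x650 offset. A small C sketch of that address arithmetic, using the two PCOPY bases named in the thread; the identifiers are illustrative only, and the sketch computes addresses without performing any MMIO access.

```c
#include <stdint.h>

/* MMIO bases of the two PCOPY falcons discussed in the thread. */
#define PCOPY0_BASE 0x104000u
#define PCOPY1_BASE 0x105000u

/* Offset of the CE capability register inside each falcon's window. */
#define CE_CAPS_OFFSET 0x650u

/* Absolute MMIO address of an engine's capability register,
 * e.g. 0x104000 + 0x650 = 0x104650. */
uint32_t ce_caps_addr(uint32_t engine_base)
{
    return engine_base + CE_CAPS_OFFSET;
}
```

With these definitions `ce_caps_addr(PCOPY0_BASE)` is 0x104650 and `ce_caps_addr(PCOPY1_BASE)` is 0x105650. On the GF108 those read back 3 and 4; only 4 has bit 2 (0x4) set, which marks the engine at 0x105000 as the decompress engine on that chip.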
* Re: Second copy engine on GF116
@ 2014-11-25 1:33 Andy Ritger
From: Andy Ritger @ 2014-11-25 1:33 UTC
To: Ilia Mirkin
Cc: nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org, gpu-public-documentation
On Fri, Nov 21, 2014 at 01:39:55AM -0500, Ilia Mirkin wrote:
> On Fri, Nov 21, 2014 at 1:16 AM, Andy Ritger <aritger@nvidia.com> wrote:
> > Hi Ilia,
> >
> > Actually 0x90b8 is different than copy engine. I'm not very familiar
> > with it, but 0x90b8 is an engine for performing LZO decompression as
> > part of performing the copy. It has a variety of limitations (e.g.,
> > cannot handle blocklinear format), and was only in a few Fermi chips,
> > as I understand it.
>
> According to our driver source, GF100, GF104, GF110, GF114, and GF116
> all have it. [So GF106, GF108, GF117, GF119 don't have it.] We've only
> had problems reported against GF116... and only for some people.
Hmm, some of our internal documentation is inconsistent about whether it
applies to GF100, but otherwise what I see matches your list. I guess
"few" was not entirely accurate.
> > It is probably easiest to just ignore it. You can distinguish this
> > decompress engine from normal copy engine by looking at the CE capability
> > register on falcon (0x00000650). If bit 2 is '1', then the falcon is
> > a decompress engine.
>
> I presume you mean a +0x650 register on the pcopy engines (0x104000
> and 0x105000). I only have access to the GF108 right now, which
> returns 3 for 0x104650 and 4 for 0x105650. We're using the engine at
> 0x104000 for copy on the GF108...
Yes, 0x104650 and 0x105650 are the right addresses, from what I can tell.
FWIW, the other capability bits are:
bit 0: "DMACOPY_SUPPORTED"
bit 1: "PIXREMAP_SUPPORTED"
(I think PIXREMAP_SUPPORTED is in reference to the component remapping
controlled by methods 0x00000700, 0x00000704, and 0x00000708 in the
copy engine class).
> From my admittedly limited understanding, both 0x104000 and 0x105000
> appear to be falcon engines, where the fuc is presumably able to drive
> some underlying hardware. The actual fifo methods are implemented in
> the fuc, which in turn does iowr/etc commands.
>
> Are you saying that the "decompress" engine (at 0x105000 right?) has a
> different piece of hardware behind it than the copy engine at
> 0x104000, or does NVIDIA simply provide different fuc for it that
> exposes somewhat different functionality via FIFO methods?
There is definitely a falcon at the frontend, and there is different
falcon ucode for "normal" copy engine versus the "decompress" engine.
But, I don't know off hand what dedicated hardware, if any, is behind it.
- Andy
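With bits 0 and 1 named as well, a caps value can be decoded in full. A hedged C sketch that turns a reading into a readable list of flags; the short flag names are abbreviations chosen here, bit 2 is the decompress bit from Andy's first reply, and higher bits are ignored because the thread does not define them.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Capability bits of the falcon CE register at <engine base> + 0x650.
 * Macro names are hypothetical; the quoted names are NVIDIA's. */
#define CE_CAP_DMACOPY    (1u << 0) /* "DMACOPY_SUPPORTED" */
#define CE_CAP_PIXREMAP   (1u << 1) /* "PIXREMAP_SUPPORTED" (methods 0x700-0x708) */
#define CE_CAP_DECOMPRESS (1u << 2) /* falcon is a decompress engine */

/* Writes a space-separated list of the set capability names into buf,
 * always NUL-terminated when len > 0. Undefined bits are ignored. */
void ce_caps_describe(uint32_t caps, char *buf, size_t len)
{
    static const struct {
        uint32_t bit;
        const char *name;
    } caps_names[] = {
        { CE_CAP_DMACOPY,    "DMACOPY" },
        { CE_CAP_PIXREMAP,   "PIXREMAP" },
        { CE_CAP_DECOMPRESS, "DECOMPRESS" },
    };

    if (len == 0)
        return;
    buf[0] = '\0';
    for (size_t i = 0; i < sizeof(caps_names) / sizeof(caps_names[0]); i++) {
        if (!(caps & caps_names[i].bit))
            continue;
        /* Separate names with a space; never write past len bytes. */
        if (buf[0] != '\0')
            strncat(buf, " ", len - strlen(buf) - 1);
        strncat(buf, caps_names[i].name, len - strlen(buf) - 1);
    }
}
```

For example, the GF108 readings from earlier in the thread decode as "DMACOPY PIXREMAP" for 3 and "DECOMPRESS" for 4.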
* Re: Second copy engine on GF116
@ 2014-11-25 15:57 Ilia Mirkin
From: Ilia Mirkin @ 2014-11-25 15:57 UTC
To: Andy Ritger
Cc: nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org, gpu-public-documentation
On Mon, Nov 24, 2014 at 8:33 PM, Andy Ritger <aritger@nvidia.com> wrote:
[...]
> Yes, 0x104650 and 0x105650 are the right addresses, from what I can tell.
>
> FWIW, the other capability bits are:
> bit 0: "DMACOPY_SUPPORTED"
> bit 1: "PIXREMAP_SUPPORTED"
>
> (I think PIXREMAP_SUPPORTED is in reference to the component remapping
> controlled by methods 0x00000700, 0x00000704, and 0x00000708 in the
> copy engine class).
Neat. We went around and grabbed that 0x650 register on a bunch of
GPUs, see the CE* columns at:
http://envytools.readthedocs.org/en/latest/hw/gpu.html#fermi-kepler-maxwell-family
It looks like it's actually returning 0 on both "copy" engines for a
bunch of those cards -- GF100, GF104, GF114, probably GF110. But other
cards have them as either 3 or 4. I'm guessing that '0' should be
treated as if it were a '3' (or a '7')?
Curiously, a GF116 card that I thought was working fine on nouveau
actually has 3 for the first engine and 4 for the second. Perhaps it
just had enough VRAM that I never triggered the conditions required
for nouveau to use that second copy engine (we use it, when available,
for drm-initiated buffer moves).
[...]
> There is definitely a falcon at the frontend, and there is different
> falcon ucode for "normal" copy engine versus the "decompress" engine.
> But, I don't know off hand what dedicated hardware, if any, is behind it.
Seems likely that the HW is different, since it'd be madness to try to
do decompression in the falcon code itself. (Not to say that the ISA
isn't suited to it, just they have relatively slow clocks.) mwk is in
the process of working it all out.
-ilia
* Re: Second copy engine on GF116
@ 2014-11-25 21:05 Andy Ritger
From: Andy Ritger @ 2014-11-25 21:05 UTC
To: Ilia Mirkin
Cc: nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org, gpu-public-documentation
On Tue, Nov 25, 2014 at 10:57:44AM -0500, Ilia Mirkin wrote:
[...]
> Neat. We went around and grabbed that 0x650 register on a bunch of
> GPUs, see the CE* columns at:
>
> http://envytools.readthedocs.org/en/latest/hw/gpu.html#fermi-kepler-maxwell-family
I don't see the 0x650 register values on that page. Maybe I'm not
looking at the right place?
> It looks like it's actually returning 0 on both "copy" engines for a
> bunch of those cards -- GF100, GF104, GF114, probably GF110. But other
> cards have them as either 3 or 4. I'm guessing that '0' should be
> treated as if it were a '3' (or a '7')?
That's curious. If I can get the table of where that reads zero, I can
try to investigate how to interpret that.
> Curiously, a GF116 card that I thought was working fine on nouveau
> actually has 3 for the first engine and 4 for the second. Perhaps it
> just had enough VRAM that I never triggered the conditions required
> for nouveau to use that second copy engine (we use it, when available,
> for drm-initiated buffer moves).
Interesting. Would that explain why this hasn't manifested on configs
other than the GF116 user reports?
Thanks,
- Andy
* Re: Second copy engine on GF116
@ 2014-11-25 21:12 Ilia Mirkin
From: Ilia Mirkin @ 2014-11-25 21:12 UTC
To: Andy Ritger
Cc: nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org, gpu-public-documentation
On Tue, Nov 25, 2014 at 4:05 PM, Andy Ritger <aritger@nvidia.com> wrote:
[...]
> I don't see the 0x650 register values on that page. Maybe I'm not
> looking at the right place?
No, you're looking in the right place. Someone who shall remain nameless
killed something in the formatting... hopefully it'll get fixed shortly,
but in the meanwhile:
https://github.com/envytools/envytools/commit/5344c92108227ab7138d5130afc0203fa79b4f3c
Look at the CE0/CE1 columns.
[...]
> Interesting. Would that explain why this hasn't manifested on configs
> other than the GF116 user reports?
Well, all the other GPUs where we try to use the secondary copy engine
report 0 for both +0x650 registers.
-ilia
* Re: Second copy engine on GF116 [not found] ` <20141125210520.GF32262-4K9zQNqW3/fFT5IIyIEb6QC/G2K4zDHf@public.gmane.org> 2014-11-25 21:12 ` Ilia Mirkin @ 2014-11-26 1:18 ` Marcin Kościelnicki [not found] ` <54752A61.4070303-mP9o5jsk0RY@public.gmane.org> 1 sibling, 1 reply; 10+ messages in thread From: Marcin Kościelnicki @ 2014-11-26 1:18 UTC (permalink / raw) To: Andy Ritger, Ilia Mirkin Cc: nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org, gpu-public-documentation On 25/11/14 22:05, Andy Ritger wrote: > On Tue, Nov 25, 2014 at 10:57:44AM -0500, Ilia Mirkin wrote: >> On Mon, Nov 24, 2014 at 8:33 PM, Andy Ritger <aritger@nvidia.com> wrote: >>> On Fri, Nov 21, 2014 at 01:39:55AM -0500, Ilia Mirkin wrote: >>>> On Fri, Nov 21, 2014 at 1:16 AM, Andy Ritger <aritger@nvidia.com> wrote: >>>>> Hi Ilia, >>>>> >>>>> Actually 0x90b8 is different than copy engine. I'm not very familiar >>>>> with it, but 0x90b8 is an engine for performing LZO decompression as >>>>> part of performing the copy. It has a variety of limitations (e.g., >>>>> cannot handle blocklinear format), and was only in a few Fermi chips, >>>>> as I understand it. >>>> >>>> According to our driver source, GF100, GF104, GF110, GF114, and GF116 >>>> all have it. [So GF106, GF108, GF117, GF119 don't have it.] We've only >>>> had problems reported against GF116... and only for some people. >>> >>> Hmm, some of our internal documentation is inconsistent about whether it >>> applies to GF100, but otherwise what I see matches your list. I guess >>> "few" was not entirely accurate. >>> >>>>> It is probably easiest to just ignore it. You can distinguish this >>>>> decompress engine from normal copy engine by looking at the CE capability >>>>> register on falcon (0x00000650). If bit 2 is '1', then the falcon is >>>>> a decompress engine. >>>> >>>> I presume you mean a +0x650 register on the pcopy engines (0x104000 >>>> and 0x105000). 
I only have access to the GF108 right now, which
>>>> returns 3 for 0x104650 and 4 for 0x105650. We're using the engine at
>>>> 0x104000 for copy on the GF108...
>>>
>>> Yes, 0x104650 and 0x105650 are the right addresses, from what I can tell.
>>>
>>> FWIW, the other capability bits are:
>>> bit 0: "DMACOPY_SUPPORTED"
>>> bit 1: "PIXREMAP_SUPPORTED"
>>>
>>> (I think PIXREMAP_SUPPORTED is in reference to the component remapping
>>> controlled by methods 0x00000700, 0x00000704, and 0x00000708 in the
>>> copy engine class).
>>
>> Neat. We went around and grabbed that 0x650 register on a bunch of
>> GPUs, see the CE* columns at:
>>
>> http://envytools.readthedocs.org/en/latest/hw/gpu.html#fermi-kepler-maxwell-family
>
> I don't see the 0x650 register values on that page. Maybe I'm not
> looking at the right place?

The table at the bottom, CE0-CE2 columns.

>
>> It looks like it's actually returning 0 on both "copy" engines for a
>> bunch of those cards -- GF100, GF104, GF114, probably GF110. But other
>> cards have them as either 3 or 4. I'm guessing that '0' should be
>> treated as if it were a '3' (or a '7')?
>
> That's curious. If I can get the table of where that reads zero, I can
> try to investigate how to interpret that.

GF100, GF110, GF104, GF114.

Sounds obvious to me - the caps register wasn't needed before GF106
and thus didn't exist.

I don't think there's any more need for information here - we know
how to tell apart a decompression engine by the caps register, *and*
we know which cards have it (GF106, GF116, GF108 - unless someone
resurrected it on GKsomething or GMsomething). We also know the
difference between a normal copy engine and a decompression engine
(basically: all dedicated copy hw is missing and replaced by
dedicated decompression hw - effectively a completely different
engine). In fact, given the decomp engine's simplicity, it shouldn't
be hard at all to write firmware for it.

We are, however, quite curious about the purpose of an LZO1X
decompression engine on a GPU...

Fun fact, I knew of the existence of decompression engines for some
time, but never managed to locate them - I guess I didn't consider
copy engines to warrant a second look on all possible GPUs...

Which brings me to ask: are there any more FIFO engines we somehow
missed on Fermi+? There's apparently a new VIC class (0xa0b6), but
I've never seen a VIC other than the MCP89 one (0x86b6).

AFAICS there's also one unknown enum value in NVRM's FIFO engine
enum... (I know of GRAPH, CE0, CE1, CE2, VP1/VP2/MSPDEC, MSRCH/ME,
MSPPP, BSP/MSVLD/MSDEC, MPEG, SOFTWARE, CIPHER/SEC, VIC, MSENC).

>
>> Curiously, a GF116 card that I thought was working fine on nouveau
>> actually has 3 for the first engine and 4 for the second. Perhaps it
>> just had enough VRAM that I never triggered the conditions required
>> for nouveau to use that second copy engine (we use it, when available,
>> for drm-initiated buffer moves).
>
> Interesting. Would that explain why this hasn't manifested on configs
> other than the GF116 user reports?
>
> Thanks,
> - Andy
>
>>>> From my admittedly limited understanding, both 0x104000 and 0x105000
>>>> appear to be falcon engines, where the fuc is presumably able to drive
>>>> some underlying hardware. The actual fifo methods are implemented in
>>>> the fuc, which in turn does iowr/etc commands.
>>>>
>>>> Are you saying that the "decompress" engine (at 0x105000 right?) has a
>>>> different piece of hardware behind it than the copy engine at
>>>> 0x104000, or does NVIDIA simply provide different fuc for it that
>>>> exposes somewhat different functionality via FIFO methods?
>>>
>>> There is definitely a falcon at the frontend, and there is different
>>> falcon ucode for "normal" copy engine versus the "decompress" engine.
>>> But, I don't know off hand what dedicated hardware, if any, is behind it.
>>
>> Seems likely that the HW is different, since it'd be madness to try to
>> do decompression in the falcon code itself. (Not to say that the ISA
>> isn't suited to it, just they have relatively slow clocks.) mwk is in
>> the process of working it all out.
>>
>> -ilia
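The caps-register rule discussed above (bit 2 of the value at engine base + 0x650 marks a decompress engine, and the register reads 0 on pre-GF106 parts) can be sketched in C roughly as follows. This is a minimal illustration under stated assumptions, not nouveau code: the macro, enum, and function names are hypothetical, the register value is passed in rather than read from hardware, and the preference for the second engine in buffer moves simply mirrors the policy Ilia describes.

```c
#include <assert.h>
#include <stdint.h>

/* Capability bits at engine_base + 0x650, as listed by Andy in this thread.
 * The macro names are hypothetical; only the bit positions come from the thread. */
#define CE_CAP_DMACOPY_SUPPORTED  (1u << 0)
#define CE_CAP_PIXREMAP_SUPPORTED (1u << 1)
#define CE_CAP_DECOMPRESS         (1u << 2) /* falcon is a decompress engine */

enum ce_kind {
	CE_KIND_COPY,       /* ordinary copy engine */
	CE_KIND_DECOMPRESS, /* LZO decompress engine, unsuitable for plain copies */
};

/* Test a single capability bit in a raw caps value. */
int ce_has_cap(uint32_t caps, uint32_t bit)
{
	assert(bit != 0 && (bit & (bit - 1)) == 0); /* exactly one bit set */
	return (caps & bit) != 0;
}

/* Classify an engine from the value read at base + 0x650
 * (0x104650 for the first engine, 0x105650 for the second).
 * GF100/GF104/GF110/GF114 read 0 because the register did not exist
 * before GF106; bit 2 is clear there, so those engines come out as
 * ordinary copy engines. */
enum ce_kind ce_classify(uint32_t caps)
{
	return ce_has_cap(caps, CE_CAP_DECOMPRESS) ? CE_KIND_DECOMPRESS
						   : CE_KIND_COPY;
}

/* Hypothetical policy helper: pick the engine for buffer moves.
 * Returns 1 (0x105000) if the second engine is a real copy engine,
 * 0 (0x104000) if only the first one is, and -1 if neither is. */
int pick_move_engine(uint32_t caps0, uint32_t caps1)
{
	if (ce_classify(caps1) == CE_KIND_COPY)
		return 1;
	if (ce_classify(caps0) == CE_KIND_COPY)
		return 0;
	return -1;
}
```

With the GF108 readings from this thread (3 at 0x104650, 4 at 0x105650), pick_move_engine(3, 4) returns 0, i.e. the engine at 0x104000 that nouveau already uses there; with the all-zero readings reported for GF100, GF104, GF110 and GF114 it returns 1.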
* Re: Second copy engine on GF116
@ 2014-11-27 1:05 Andy Ritger
From: Andy Ritger @ 2014-11-27 1:05 UTC
To: Marcin Kościelnicki
Cc: nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org, gpu-public-documentation

On Wed, Nov 26, 2014 at 02:18:25AM +0100, Marcin Kościelnicki wrote:
[...]
> >>http://envytools.readthedocs.org/en/latest/hw/gpu.html#fermi-kepler-maxwell-family
> >
> >I don't see the 0x650 register values on that page. Maybe I'm not
> >looking at the right place?
>
> The table at the bottom, CE0-CE2 columns.

Thanks.

> >>It looks like it's actually returning 0 on both "copy" engines for a
> >>bunch of those cards -- GF100, GF104, GF114, probably GF110. But other
> >>cards have them as either 3 or 4. I'm guessing that '0' should be
> >>treated as if it were a '3' (or a '7')?
> >
> >That's curious. If I can get the table of where that reads zero, I can
> >try to investigate how to interpret that.
>
> GF100, GF110, GF104, GF114.
>
> Sounds obvious to me - the caps register wasn't needed before GF106
> and thus didn't exist.
>
> I don't think there's any more need for information here - we know
> how to tell apart a decompression engine by the caps register, *and*
> we know which cards have it (GF106, GF116, GF108 - unless someone
> resurrected it on GKsomething or GMsomething).

I cannot find any information to suggest that the decompress engine
exists on anything >= Kepler. I think your list of GPUs that had the
decompress engine is accurate, and the capability register wasn't
added until GF106 when the decompress engine was added.

> We also know the
> difference between a normal copy engine and a decompression engine
> (basically: all dedicated copy hw is missing and replaced by
> dedicated decompression hw - effectively a completely different
> engine). In fact, given the decomp engine's simplicity, it shouldn't
> be hard at all to write firmware for it.

Enjoy :)

> We are, however, quite curious about the purpose of an LZO1X
> decompression engine on a GPU...

From what I can tell, the motivation was to better utilize bandwidth
across limited PCIe buses (e.g., PCIe 1x configurations, like on some
notebooks). I don't believe we ever attempted to use it in the OpenGL
driver, but the DX driver tried it. I think the intent was for the
driver to compress content in sysmem using the CPU, then use the
decompress engine to transfer to vidmem and decompress in flight.
I don't know LZO compression performance characteristics, but I'm a
little suspicious of that CPU/bandwidth tradeoff.

Anyway, it seems like it was somewhat of a failed experiment and we
eventually gave up on the decompress engine.

> Fun fact, I knew of the existence of decompression engines for some
> time, but never managed to locate them - I guess I didn't consider
> copy engines to warrant a second look on all possible GPUs...

:)

> Which brings me to ask: are there any more FIFO engines we somehow
> missed on Fermi+? There's apparently a new VIC class (0xa0b6), but
> I've never seen a VIC other than the MCP89 one (0x86b6).

VIC is supposed to be pretty good for reducing power usage. I don't
have first-hand experience programming it. I don't know why it wasn't
included in any other subsequent GPU, but it made a rebirth for Tegra
(I'm pretty sure it is in Tegra K1). Offhand, I'm not sure if it saw
any method interface changes between MCP and Tegra.

We haven't implemented anything to take advantage of it in the
proprietary X driver (yet? -- we'll probably need to eventually), but
I'm pretty sure it is used somewhere in the Android stack.

> AFAICS there's also one unknown enum value in NVRM's FIFO engine
> enum... (I know of GRAPH, CE0, CE1, CE2, VP1/VP2/MSPDEC, MSRCH/ME,
> MSPPP, BSP/MSVLD/MSDEC, MPEG, SOFTWARE, CIPHER/SEC, VIC, MSENC).

I'll see what information I can dig up.

Thanks,
- Andy

> >>Curiously, a GF116 card that I thought was working fine on nouveau
> >>actually has 3 for the first engine and 4 for the second. Perhaps it
> >>just had enough VRAM that I never triggered the conditions required
> >>for nouveau to use that second copy engine (we use it, when available,
> >>for drm-initiated buffer moves).
> >
> >Interesting. Would that explain why this hasn't manifested on configs
> >other than the GF116 user reports?
> >
> >Thanks,
> >- Andy
> >
> >>>> From my admittedly limited understanding, both 0x104000 and 0x105000
> >>>>appear to be falcon engines, where the fuc is presumably able to drive
> >>>>some underlying hardware. The actual fifo methods are implemented in
> >>>>the fuc, which in turn does iowr/etc commands.
> >>>>
> >>>>Are you saying that the "decompress" engine (at 0x105000 right?) has a
> >>>>different piece of hardware behind it than the copy engine at
> >>>>0x104000, or does NVIDIA simply provide different fuc for it that
> >>>>exposes somewhat different functionality via FIFO methods?
> >>>
> >>>There is definitely a falcon at the frontend, and there is different
> >>>falcon ucode for "normal" copy engine versus the "decompress" engine.
> >>>But, I don't know off hand what dedicated hardware, if any, is behind it.
> >>
> >>Seems likely that the HW is different, since it'd be madness to try to
> >>do decompression in the falcon code itself. (Not to say that the ISA
> >>isn't suited to it, just they have relatively slow clocks.) mwk is in
> >>the process of working it all out.
> >>
> >> -ilia
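Andy's doubt about the CPU/bandwidth tradeoff can be made concrete with a back-of-the-envelope model. The symbols and assumptions here are illustrative and not from the thread: S bytes to upload, compression ratio r > 1, bus bandwidth B, CPU compression throughput C, GPU-side decompression that keeps pace with the bus, and compression and transfer that run back to back rather than overlapped.

```latex
% Illustrative model only; S, r, B, C are assumed symbols, not thread data.
\begin{aligned}
T_{\text{raw}} &= \frac{S}{B}, \qquad
T_{\text{comp}} = \frac{S}{C} + \frac{S}{rB}, \\
T_{\text{comp}} < T_{\text{raw}}
  &\iff \frac{1}{C} < \frac{1}{B}\left(1 - \frac{1}{r}\right)
  \iff C > \frac{r}{r-1}\,B .
\end{aligned}
```

Under this model, a 2:1 ratio requires the CPU to compress at more than twice the bus bandwidth just to break even, and even an arbitrarily good ratio still requires C > B. If compression and upload were pipelined, the transfer time would be roughly max(S/C, S/(rB)), and the break-even condition relaxes to C > B.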
* Re: Second copy engine on GF116
@ 2014-11-25 18:28 Marcin Kościelnicki
From: Marcin Kościelnicki @ 2014-11-25 18:28 UTC
To: Andy Ritger, Ilia Mirkin
Cc: nouveau-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org, gpu-public-documentation

For what it's worth, I managed to get the engine to work in the
simplest mode (i.e. decompressing an LZO1X bytestream). Triggering the
operation is dead simple, and the whole thing is done in hw:

1. Destination and source have to be 0x100-byte aligned
2. Destination buffer length is in bytes, but it's rounded up to a
   multiple of 0x100
3. Poke source address >> 8 to base+0xa00
4. Poke source length, in bytes, to base+0xa04
5. Poke destination address >> 8 to base+0xa20
6. Poke destination buffer length, in bytes, to base+0xa24
7. Poke 1 to base+0xa1c

However, I haven't figured out error handling, or other operation
modes (there is at least one, judging by nv hardware - raw copy
without decompression, perhaps?). The whole thing has a grand total
of 17 MMIO registers, 9 of them writable. Shouldn't be that hard to
figure it out...

Marcin Kościelnicki

On 25/11/14 02:33, Andy Ritger wrote:
> On Fri, Nov 21, 2014 at 01:39:55AM -0500, Ilia Mirkin wrote:
>> On Fri, Nov 21, 2014 at 1:16 AM, Andy Ritger <aritger@nvidia.com> wrote:
>>> Hi Ilia,
>>>
>>> Actually 0x90b8 is different than copy engine. I'm not very familiar
>>> with it, but 0x90b8 is an engine for performing LZO decompression as
>>> part of performing the copy. It has a variety of limitations (e.g.,
>>> cannot handle blocklinear format), and was only in a few Fermi chips,
>>> as I understand it.
>>
>> According to our driver source, GF100, GF104, GF110, GF114, and GF116
>> all have it. [So GF106, GF108, GF117, GF119 don't have it.] We've only
>> had problems reported against GF116... and only for some people.
>
> Hmm, some of our internal documentation is inconsistent about whether it
> applies to GF100, but otherwise what I see matches your list. I guess
> "few" was not entirely accurate.
>
>>> It is probably easiest to just ignore it. You can distinguish this
>>> decompress engine from normal copy engine by looking at the CE capability
>>> register on falcon (0x00000650). If bit 2 is '1', then the falcon is
>>> a decompress engine.
>>
>> I presume you mean a +0x650 register on the pcopy engines (0x104000
>> and 0x105000). I only have access to the GF108 right now, which
>> returns 3 for 0x104650 and 4 for 0x105650. We're using the engine at
>> 0x104000 for copy on the GF108...
>
> Yes, 0x104650 and 0x105650 are the right addresses, from what I can tell.
>
> FWIW, the other capability bits are:
> bit 0: "DMACOPY_SUPPORTED"
> bit 1: "PIXREMAP_SUPPORTED"
>
> (I think PIXREMAP_SUPPORTED is in reference to the component remapping
> controlled by methods 0x00000700, 0x00000704, and 0x00000708 in the
> copy engine class).
>
>> From my admittedly limited understanding, both 0x104000 and 0x105000
>> appear to be falcon engines, where the fuc is presumably able to drive
>> some underlying hardware. The actual fifo methods are implemented in
>> the fuc, which in turn does iowr/etc commands.
>>
>> Are you saying that the "decompress" engine (at 0x105000 right?) has a
>> different piece of hardware behind it than the copy engine at
>> 0x104000, or does NVIDIA simply provide different fuc for it that
>> exposes somewhat different functionality via FIFO methods?
>
> There is definitely a falcon at the frontend, and there is different
> falcon ucode for "normal" copy engine versus the "decompress" engine.
> But, I don't know off hand what dedicated hardware, if any, is behind it.
>
> - Andy
>
>
>>>
>>> I hope that helps,
>>> - Andy
>>>
>>>
>>> On Thu, Nov 20, 2014 at 02:18:02PM -0500, Ilia Mirkin wrote:
>>>> Hello,
>>>>
>>>> There's a long-standing bug on nouveau (this is a sample bug, but the
>>>> issue has been around for a while:
>>>> https://bugs.freedesktop.org/show_bug.cgi?id=85465) whereby we attempt
>>>> to use the second PCOPY engine on GF116, and it is sometimes does
>>>> nothing, despite mmio register 22500 saying that it's not disabled
>>>> (0x22500 == 0 for this user). In the bug you can see a dump from
>>>> 22400..22600, and all values after 22440 are read as 0. The issue
>>>> appears to be more common on mobile GF116's, but I don't know that the
>>>> correlation is 100%. No errors are reported by the FIFO or invalid
>>>> mmio reads, but the data transfer just does not happen. Switching to
>>>> using the first copy engine resolves things, so it's unlikely to be a
>>>> more systemic issue in nouveau's usage of the copy engine.
>>>>
>>>> To be clear, when I'm talking about the second PCOPY engine, I'm
>>>> talking about the engine at mmio 0x105000, and whose fifo class id is
>>>> 0x90b8.
>>>>
>>>> Any information on properly detecting that the engine is, in fact,
>>>> missing, would be greatly appreciated. Or, conversely, an assurance
>>>> that the engine _is_ there on all GF116's and we're just not
>>>> initializing something properly, along with perhaps some suggestions
>>>> as to what we might be missing.
>>>>
>>>> Thanks,
>>>>
>>>> Ilia Mirkin
>>>> imirkin@alum.mit.edu
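Marcin's seven steps translate into an ordered list of 32-bit register writes. The sketch below only builds that list; it assumes that "base" is the engine's MMIO base (0x105000 on the GF108/GF116 boards discussed here) and that the shifted addresses fit in 32 bits, and every name in it is hypothetical. Error handling and the other operation mode(s) are still unknown, so nothing here checks for completion or failure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Register offsets relative to the engine base, per Marcin's notes.
 * The macro names are hypothetical labels for this sketch. */
#define LZO_REG_SRC_ADDR 0xa00u /* source address >> 8 */
#define LZO_REG_SRC_LEN  0xa04u /* source length in bytes */
#define LZO_REG_TRIGGER  0xa1cu /* write 1 to start */
#define LZO_REG_DST_ADDR 0xa20u /* destination address >> 8 */
#define LZO_REG_DST_LEN  0xa24u /* destination buffer length in bytes */

#define LZO_DECOMP_NWRITES 5

struct mmio_write {
	uint32_t reg; /* absolute MMIO offset */
	uint32_t val;
};

/* Step 2: the engine rounds the destination length up to a multiple of
 * 0x100, so the destination allocation presumably needs to cover this much. */
uint64_t lzo_decomp_effective_dst_len(uint32_t dst_len)
{
	return ((uint64_t)dst_len + 0xff) & ~(uint64_t)0xff;
}

/* Steps 1 and 3-7: validate alignment, then emit the writes in order.
 * Returns 0 on success, -1 if src or dst is not 0x100-byte aligned.
 * Assumes src >> 8 and dst >> 8 fit in 32 bits. */
int lzo_decomp_build(uint32_t base, uint64_t src, uint32_t src_len,
		     uint64_t dst, uint32_t dst_len,
		     struct mmio_write out[LZO_DECOMP_NWRITES])
{
	assert(out != NULL);
	if ((src & 0xff) != 0 || (dst & 0xff) != 0)
		return -1;

	out[0] = (struct mmio_write){ base + LZO_REG_SRC_ADDR, (uint32_t)(src >> 8) };
	out[1] = (struct mmio_write){ base + LZO_REG_SRC_LEN, src_len };
	out[2] = (struct mmio_write){ base + LZO_REG_DST_ADDR, (uint32_t)(dst >> 8) };
	out[3] = (struct mmio_write){ base + LZO_REG_DST_LEN, dst_len };
	out[4] = (struct mmio_write){ base + LZO_REG_TRIGGER, 1 }; /* kick goes last */
	return 0;
}
```

A driver would replay out[0] through out[4] in order with its own 32-bit register-write helper; keeping the write of 1 to base+0xa1c last matches step 7, so the engine only starts once source and destination are fully described.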
Thread overview: 10+ messages
2014-11-20 19:18 Second copy engine on GF116 Ilia Mirkin
2014-11-21 6:16 Andy Ritger
2014-11-21 6:39 Ilia Mirkin
2014-11-25 1:33 Andy Ritger
2014-11-25 15:57 Ilia Mirkin
2014-11-25 21:05 Andy Ritger
2014-11-25 21:12 Ilia Mirkin
2014-11-26 1:18 Marcin Kościelnicki
2014-11-27 1:05 Andy Ritger
2014-11-25 18:28 Marcin Kościelnicki