public inbox for linux-media@vger.kernel.org
* [PATCH 0/2] drm/nvdla: Add driver support for NVDLA
@ 2022-04-19 13:58 Cai Huoqing
  2022-04-19 13:58 ` [PATCH 1/2] MAINTAINERS: Add the driver info of the NVDLA Cai Huoqing
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Cai Huoqing @ 2022-04-19 13:58 UTC (permalink / raw)
  To: cai.huoqing
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Daniel Vetter, Sumit Semwal, Christian König, linux-kernel,
	dri-devel, linux-media, linaro-mm-sig

The NVIDIA Deep Learning Accelerator (NVDLA) is an open source IP
which is integrated into NVIDIA Jetson AGX Xavier,
so add driver support for this accelerator.

An introduction to the NVDLA:
http://nvdla.org/primer.html

User mode driver:
https://github.com/caihuoq/nvdla/tree/main/sw/umd


Cai Huoqing (2):
  MAINTAINERS: Add the driver info of the NVDLA
  drm/nvdla: Add driver support for NVDLA

 MAINTAINERS                             |    7 +
 drivers/gpu/drm/Kconfig                 |    2 +
 drivers/gpu/drm/Makefile                |    1 +
 drivers/gpu/drm/nvdla/Kconfig           |    8 +
 drivers/gpu/drm/nvdla/Makefile          |   19 +
 drivers/gpu/drm/nvdla/nvdla_bdma.c      |  200 +
 drivers/gpu/drm/nvdla/nvdla_cache.c     |  215 +
 drivers/gpu/drm/nvdla/nvdla_cdp.c       |  300 ++
 drivers/gpu/drm/nvdla/nvdla_common.c    |  295 ++
 drivers/gpu/drm/nvdla/nvdla_common.h    |  835 +++
 drivers/gpu/drm/nvdla/nvdla_conv.c      |  683 +++
 drivers/gpu/drm/nvdla/nvdla_drm.c       |  695 +++
 drivers/gpu/drm/nvdla/nvdla_drm.h       |  127 +
 drivers/gpu/drm/nvdla/nvdla_engine.c    |  233 +
 drivers/gpu/drm/nvdla/nvdla_engine.h    |  272 +
 drivers/gpu/drm/nvdla/nvdla_gem.c       |  393 ++
 drivers/gpu/drm/nvdla/nvdla_ioctl.h     |   99 +
 drivers/gpu/drm/nvdla/nvdla_pdp.c       |  446 ++
 drivers/gpu/drm/nvdla/nvdla_reg.h       | 6411 +++++++++++++++++++++++
 drivers/gpu/drm/nvdla/nvdla_rubik.c     |  217 +
 drivers/gpu/drm/nvdla/nvdla_sched.h     |   52 +
 drivers/gpu/drm/nvdla/nvdla_scheduler.c | 1005 ++++
 drivers/gpu/drm/nvdla/nvdla_sdp.c       |  728 +++
 23 files changed, 13243 insertions(+)
 create mode 100644 drivers/gpu/drm/nvdla/Kconfig
 create mode 100644 drivers/gpu/drm/nvdla/Makefile
 create mode 100644 drivers/gpu/drm/nvdla/nvdla_bdma.c
 create mode 100644 drivers/gpu/drm/nvdla/nvdla_cache.c
 create mode 100644 drivers/gpu/drm/nvdla/nvdla_cdp.c
 create mode 100644 drivers/gpu/drm/nvdla/nvdla_common.c
 create mode 100644 drivers/gpu/drm/nvdla/nvdla_common.h
 create mode 100644 drivers/gpu/drm/nvdla/nvdla_conv.c
 create mode 100644 drivers/gpu/drm/nvdla/nvdla_drm.c
 create mode 100644 drivers/gpu/drm/nvdla/nvdla_drm.h
 create mode 100644 drivers/gpu/drm/nvdla/nvdla_engine.c
 create mode 100644 drivers/gpu/drm/nvdla/nvdla_engine.h
 create mode 100644 drivers/gpu/drm/nvdla/nvdla_gem.c
 create mode 100644 drivers/gpu/drm/nvdla/nvdla_ioctl.h
 create mode 100644 drivers/gpu/drm/nvdla/nvdla_pdp.c
 create mode 100644 drivers/gpu/drm/nvdla/nvdla_reg.h
 create mode 100644 drivers/gpu/drm/nvdla/nvdla_rubik.c
 create mode 100644 drivers/gpu/drm/nvdla/nvdla_sched.h
 create mode 100644 drivers/gpu/drm/nvdla/nvdla_scheduler.c
 create mode 100644 drivers/gpu/drm/nvdla/nvdla_sdp.c

-- 
2.25.1


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1/2] MAINTAINERS: Add the driver info of the NVDLA
  2022-04-19 13:58 [PATCH 0/2] drm/nvdla: Add driver support for NVDLA Cai Huoqing
@ 2022-04-19 13:58 ` Cai Huoqing
       [not found] ` <20220419135908.39606-3-cai.huoqing@linux.dev>
  2022-04-26  5:20 ` [PATCH 0/2] " Peter Robinson
  2 siblings, 0 replies; 15+ messages in thread
From: Cai Huoqing @ 2022-04-19 13:58 UTC (permalink / raw)
  To: cai.huoqing
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Daniel Vetter, Sumit Semwal, Christian König, linux-kernel,
	dri-devel, linux-media, linaro-mm-sig

The NVIDIA Deep Learning Accelerator (NVDLA) is an open source IP
which is integrated into Jetson AGX Xavier. Alongside the driver
support for it, add the driver info of the NVDLA to the MAINTAINERS file.

Signed-off-by: Cai Huoqing <cai.huoqing@linux.dev>
---
 MAINTAINERS | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 452f3662e5ac..0e5464d61a6d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -6485,6 +6485,13 @@ T:	git git://anongit.freedesktop.org/drm/drm-misc
 F:	Documentation/devicetree/bindings/display/panel/samsung,lms380kf01.yaml
 F:	drivers/gpu/drm/panel/panel-widechips-ws2401.c
 
+DRM DRIVER FOR NVDLA
+M:	Cai Huoqing <cai.huoqing@linux.dev>
+L:	dri-devel@lists.freedesktop.org
+S:	Maintained
+T:	git git://anongit.freedesktop.org/drm/drm-misc
+F:	drivers/gpu/drm/nvdla/
+
 DRM DRIVERS
 M:	David Airlie <airlied@linux.ie>
 M:	Daniel Vetter <daniel@ffwll.ch>
-- 
2.25.1



* Re: [PATCH 2/2] drm/nvdla: Add driver support for NVDLA
       [not found] ` <20220419135908.39606-3-cai.huoqing@linux.dev>
@ 2022-04-19 14:07   ` Christian König
  2022-04-19 14:35     ` Cai Huoqing
  2022-04-20  7:53   ` kernel test robot
                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 15+ messages in thread
From: Christian König @ 2022-04-19 14:07 UTC (permalink / raw)
  To: Cai Huoqing
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Daniel Vetter, Sumit Semwal, linux-kernel, dri-devel, linux-media,
	linaro-mm-sig

On 19.04.22 at 15:59, Cai Huoqing wrote:
> The NVIDIA Deep Learning Accelerator (NVDLA) is an open source IP
> which is integrated into NVIDIA Jetson AGX Xavier,
> so add driver support for this accelerator.
>
> Signed-off-by: Cai Huoqing <cai.huoqing@linux.dev>
>

Well, it doesn't look so bad at first glance (regarding coding style etc.).

But am I blind or isn't there any UAPI for the driver? I mean adding a 
DRM driver without any change to include/uapi/drm is really odd.

Regards,
Christian.


* Re: [PATCH 2/2] drm/nvdla: Add driver support for NVDLA
  2022-04-19 14:07   ` [PATCH 2/2] drm/nvdla: Add driver support for NVDLA Christian König
@ 2022-04-19 14:35     ` Cai Huoqing
  0 siblings, 0 replies; 15+ messages in thread
From: Cai Huoqing @ 2022-04-19 14:35 UTC (permalink / raw)
  To: Christian König
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Daniel Vetter, Sumit Semwal, linux-kernel, dri-devel, linux-media,
	linaro-mm-sig

On 19 Apr 22 16:07:44, Christian König wrote:
> On 19.04.22 at 15:59, Cai Huoqing wrote:
> > The NVIDIA Deep Learning Accelerator (NVDLA) is an open source IP
> > which is integrated into NVIDIA Jetson AGX Xavier,
> > so add driver support for this accelerator.
> > 
> > Signed-off-by: Cai Huoqing <cai.huoqing@linux.dev>
> > 
> 
> Well, it doesn't look so bad at first glance (regarding coding style etc.).
> 
> But am I blind or isn't there any UAPI for the driver? I mean adding a DRM
> driver without any change to include/uapi/drm is really odd.
Thanks for your reply. I will rename nvdla_ioctl.h, which is the UAPI
header file, and move it to include/uapi/drm.

thanks,
Cai
> 
> Regards,
> Christian.


* Re: [PATCH 2/2] drm/nvdla: Add driver support for NVDLA
       [not found] ` <20220419135908.39606-3-cai.huoqing@linux.dev>
  2022-04-19 14:07   ` [PATCH 2/2] drm/nvdla: Add driver support for NVDLA Christian König
@ 2022-04-20  7:53   ` kernel test robot
  2022-04-20  9:57   ` kernel test robot
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 15+ messages in thread
From: kernel test robot @ 2022-04-20  7:53 UTC (permalink / raw)
  To: Cai Huoqing
  Cc: kbuild-all, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Daniel Vetter, Sumit Semwal, Christian König,
	linux-kernel, dri-devel, linux-media, linaro-mm-sig

Hi Cai,

I love your patch! Yet something to improve:

[auto build test ERROR on drm/drm-next]
[also build test ERROR on drm-intel/for-linux-next drm-tip/drm-tip linus/master v5.18-rc3]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patches, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/intel-lab-lkp/linux/commits/Cai-Huoqing/drm-nvdla-Add-driver-support-for-NVDLA/20220419-220255
base:   git://anongit.freedesktop.org/drm/drm drm-next
config: h8300-randconfig-r014-20220420 (https://download.01.org/0day-ci/archive/20220420/202204201512.pp20MXT5-lkp@intel.com/config)
compiler: h8300-linux-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/7539e5487eb7d0c6f13c03bba596e51a2238106d
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Cai-Huoqing/drm-nvdla-Add-driver-support-for-NVDLA/20220419-220255
        git checkout 7539e5487eb7d0c6f13c03bba596e51a2238106d
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross W=1 O=build_dir ARCH=h8300 SHELL=/bin/bash drivers/gpu/

If you fix the issue, kindly add the following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All error/warnings (new ones prefixed by >>):

>> make[5]: *** No rule to make target 'drivers/gpu/drm/nvdla/nvdla_engine_data.o', needed by 'drivers/gpu/drm/nvdla/built-in.a'.
>> make[5]: *** No rule to make target 'drivers/gpu/drm/nvdla/nvdla_engine_debug.o', needed by 'drivers/gpu/drm/nvdla/built-in.a'.
   make[5]: Target '__build' not remade because of errors.
--
>> drivers/gpu/drm/nvdla/nvdla_drm.c:45:9: warning: no previous prototype for 'dla_get_time_us' [-Wmissing-prototypes]
      45 | int64_t dla_get_time_us(void)
         |         ^~~~~~~~~~~~~~~
   drivers/gpu/drm/nvdla/nvdla_drm.c: In function 'nvdla_engine_isr':
>> drivers/gpu/drm/nvdla/nvdla_drm.c:75:18: warning: variable 'mask' set but not used [-Wunused-but-set-variable]
      75 |         uint32_t mask;
         |                  ^~~~
--
   drivers/gpu/drm/nvdla/nvdla_gem.c: In function 'nvdla_fill_task_desc':
>> drivers/gpu/drm/nvdla/nvdla_gem.c:39:17: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
      39 |                 (void __user *)local_task->address_list,
         |                 ^
--
   drivers/gpu/drm/nvdla/nvdla_scheduler.c: In function 'dla_op_completion':
>> drivers/gpu/drm/nvdla/nvdla_scheduler.c:513:26: warning: variable 'task' set but not used [-Wunused-but-set-variable]
     513 |         struct dla_task *task;
         |                          ^~~~
--
>> drivers/gpu/drm/nvdla/nvdla_scheduler.c:363: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
    * Dequeue next operation of same type from list of operations
   drivers/gpu/drm/nvdla/nvdla_scheduler.c:505: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
    * Handle operation completion notification
   drivers/gpu/drm/nvdla/nvdla_scheduler.c:610: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
    * Read network configuration from DRAM, network descriptor address
   drivers/gpu/drm/nvdla/nvdla_scheduler.c:920: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
    * Execute task selected by task scheduler
--
   drivers/gpu/drm/nvdla/nvdla_bdma.c: In function 'dla_bdma_dump_config':
>> drivers/gpu/drm/nvdla/nvdla_bdma.c:157:39: warning: variable 'bdma_surface' set but not used [-Wunused-but-set-variable]
     157 |         struct dla_bdma_surface_desc *bdma_surface;
         |                                       ^~~~~~~~~~~~
>> drivers/gpu/drm/nvdla/nvdla_bdma.c:156:34: warning: variable 'bdma_op' set but not used [-Wunused-but-set-variable]
     156 |         struct dla_bdma_op_desc *bdma_op;
         |                                  ^~~~~~~
--
   drivers/gpu/drm/nvdla/nvdla_conv.c: In function 'dla_conv_dump_config':
>> drivers/gpu/drm/nvdla/nvdla_conv.c:666:39: warning: variable 'conv_surface' set but not used [-Wunused-but-set-variable]
     666 |         struct dla_conv_surface_desc *conv_surface;
         |                                       ^~~~~~~~~~~~
>> drivers/gpu/drm/nvdla/nvdla_conv.c:665:34: warning: variable 'conv_op' set but not used [-Wunused-but-set-variable]
     665 |         struct dla_conv_op_desc *conv_op;
         |                                  ^~~~~~~
--
>> drivers/gpu/drm/nvdla/nvdla_engine.c:67: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
    * Get DMA data cube address
   drivers/gpu/drm/nvdla/nvdla_engine.c:88: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
    * Read input buffer address
--
>> drivers/gpu/drm/nvdla/nvdla_bdma.c:56: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
    * Program BDMA slot for transfer
--
   drivers/gpu/drm/nvdla/nvdla_sdp.c: In function 'processor_sdp_program':
>> drivers/gpu/drm/nvdla/nvdla_sdp.c:190:18: warning: variable 'atom_size' set but not used [-Wunused-but-set-variable]
     190 |         uint32_t atom_size;
         |                  ^~~~~~~~~
   drivers/gpu/drm/nvdla/nvdla_sdp.c: In function 'dla_sdp_dump_config':
>> drivers/gpu/drm/nvdla/nvdla_sdp.c:708:38: warning: variable 'sdp_surface' set but not used [-Wunused-but-set-variable]
     708 |         struct dla_sdp_surface_desc *sdp_surface;
         |                                      ^~~~~~~~~~~
>> drivers/gpu/drm/nvdla/nvdla_sdp.c:707:33: warning: variable 'sdp_op' set but not used [-Wunused-but-set-variable]
     707 |         struct dla_sdp_op_desc *sdp_op;
         |                                 ^~~~~~
   At top level:
   drivers/gpu/drm/nvdla/nvdla_sdp.c:118:22: warning: 'map_perf_nan_inf' defined but not used [-Wunused-const-variable=]
     118 | static const uint8_t map_perf_nan_inf[] = {
         |                      ^~~~~~~~~~~~~~~~
   drivers/gpu/drm/nvdla/nvdla_sdp.c:113:22: warning: 'map_perf_sat' defined but not used [-Wunused-const-variable=]
     113 | static const uint8_t map_perf_sat[] = {
         |                      ^~~~~~~~~~~~
   drivers/gpu/drm/nvdla/nvdla_sdp.c:108:22: warning: 'map_perf_lut' defined but not used [-Wunused-const-variable=]
     108 | static const uint8_t map_perf_lut[] = {
         |                      ^~~~~~~~~~~~
   drivers/gpu/drm/nvdla/nvdla_sdp.c:103:22: warning: 'map_perf_dma' defined but not used [-Wunused-const-variable=]
     103 | static const uint8_t map_perf_dma[] = {
         |                      ^~~~~~~~~~~~
--
   drivers/gpu/drm/nvdla/nvdla_cdp.c: In function 'dla_cdp_dump_config':
>> drivers/gpu/drm/nvdla/nvdla_cdp.c:280:38: warning: variable 'cdp_surface' set but not used [-Wunused-but-set-variable]
     280 |         struct dla_cdp_surface_desc *cdp_surface;
         |                                      ^~~~~~~~~~~
>> drivers/gpu/drm/nvdla/nvdla_cdp.c:279:33: warning: variable 'cdp_op' set but not used [-Wunused-but-set-variable]
     279 |         struct dla_cdp_op_desc *cdp_op;
         |                                 ^~~~~~
   At top level:
   drivers/gpu/drm/nvdla/nvdla_cdp.c:28:22: warning: 'map_perf_lut' defined but not used [-Wunused-const-variable=]
      28 | static const uint8_t map_perf_lut[] = {
         |                      ^~~~~~~~~~~~
   drivers/gpu/drm/nvdla/nvdla_cdp.c:23:22: warning: 'map_perf_dma' defined but not used [-Wunused-const-variable=]
      23 | static const uint8_t map_perf_dma[] = {
         |                      ^~~~~~~~~~~~
..


vim +/dla_get_time_us +45 drivers/gpu/drm/nvdla/nvdla_drm.c

    44	
  > 45	int64_t dla_get_time_us(void)
    46	{
    47		return ktime_get_ns() / NSEC_PER_USEC;
    48	}
    49	
    50	void dla_reg_write(void *driver_context, uint32_t addr, uint32_t reg)
    51	{
    52		struct nvdla_device *nvdla_dev =
    53				(struct nvdla_device *)driver_context;
    54	
    55		if (!nvdla_dev)
    56			return;
    57	
    58		writel(reg, nvdla_dev->base + addr);
    59	}
    60	
    61	uint32_t dla_reg_read(void *driver_context, uint32_t addr)
    62	{
    63		struct nvdla_device *nvdla_dev =
    64				(struct nvdla_device *)driver_context;
    65	
    66		if (!nvdla_dev)
    67			return 0;
    68	
    69		return readl(nvdla_dev->base + addr);
    70	}
    71	
    72	static irqreturn_t nvdla_engine_isr(int32_t irq, void *data)
    73	{
    74		unsigned long flags;
  > 75		uint32_t mask;
    76		uint32_t reg;
    77		struct dla_processor *processor = NULL;
    78		struct dla_processor_group *group;
    79		struct dla_engine *engine;
    80		struct nvdla_device *nvdla_dev = (struct nvdla_device *)data;
    81	
    82		if (!nvdla_dev)
    83			return IRQ_NONE;
    84	
    85		engine = nvdla_dev->engine_context;
    86		spin_lock_irqsave(&nvdla_dev->nvdla_lock, flags);
    87	
    88		mask = glb_reg_read(engine, S_INTR_MASK);
    89		reg = glb_reg_read(engine, S_INTR_STATUS);
    90	
    91		if (reg & MASK(GLB_S_INTR_STATUS_0, CACC_DONE_STATUS0)) {
    92			processor = &engine->processors[DLA_OP_CONV];
    93			group = &processor->groups[0];
    94			group->events |= (1 << DLA_EVENT_OP_COMPLETED);
    95		}
    96		if (reg & MASK(GLB_S_INTR_STATUS_0, CACC_DONE_STATUS1)) {
    97			processor = &engine->processors[DLA_OP_CONV];
    98			group = &processor->groups[1];
    99			group->events |= (1 << DLA_EVENT_OP_COMPLETED);
   100		}
   101		if (reg & MASK(GLB_S_INTR_STATUS_0, SDP_DONE_STATUS0)) {
   102			processor = &engine->processors[DLA_OP_SDP];
   103			group = &processor->groups[0];
   104			group->events |= (1 << DLA_EVENT_OP_COMPLETED);
   105		}
   106		if (reg & MASK(GLB_S_INTR_STATUS_0, SDP_DONE_STATUS1)) {
   107			processor = &engine->processors[DLA_OP_SDP];
   108			group = &processor->groups[1];
   109			group->events |= (1 << DLA_EVENT_OP_COMPLETED);
   110		}
   111		if (reg & MASK(GLB_S_INTR_STATUS_0, CDP_DONE_STATUS0)) {
   112			processor = &engine->processors[DLA_OP_CDP];
   113			group = &processor->groups[0];
   114			group->events |= (1 << DLA_EVENT_OP_COMPLETED);
   115		}
   116		if (reg & MASK(GLB_S_INTR_STATUS_0, CDP_DONE_STATUS1)) {
   117			processor = &engine->processors[DLA_OP_CDP];
   118			group = &processor->groups[1];
   119			group->events |= (1 << DLA_EVENT_OP_COMPLETED);
   120		}
   121		if (reg & MASK(GLB_S_INTR_STATUS_0, RUBIK_DONE_STATUS0)) {
   122			processor = &engine->processors[DLA_OP_RUBIK];
   123			group = &processor->groups[0];
   124			group->events |= (1 << DLA_EVENT_OP_COMPLETED);
   125		}
   126		if (reg & MASK(GLB_S_INTR_STATUS_0, RUBIK_DONE_STATUS1)) {
   127			processor = &engine->processors[DLA_OP_RUBIK];
   128			group = &processor->groups[1];
   129			group->events |= (1 << DLA_EVENT_OP_COMPLETED);
   130		}
   131		if (reg & MASK(GLB_S_INTR_STATUS_0, PDP_DONE_STATUS0)) {
   132			processor = &engine->processors[DLA_OP_PDP];
   133			group = &processor->groups[0];
   134			group->events |= (1 << DLA_EVENT_OP_COMPLETED);
   135		}
   136		if (reg & MASK(GLB_S_INTR_STATUS_0, PDP_DONE_STATUS1)) {
   137			processor = &engine->processors[DLA_OP_PDP];
   138			group = &processor->groups[1];
   139			group->events |= (1 << DLA_EVENT_OP_COMPLETED);
   140		}
   141		if (reg & MASK(GLB_S_INTR_STATUS_0, BDMA_DONE_STATUS0)) {
   142			processor = &engine->processors[DLA_OP_BDMA];
   143			group = &processor->groups[0];
   144			group->events |= (1 << DLA_EVENT_OP_COMPLETED);
   145		}
   146		if (reg & MASK(GLB_S_INTR_STATUS_0, BDMA_DONE_STATUS1)) {
   147			processor = &engine->processors[DLA_OP_BDMA];
   148			group = &processor->groups[1];
   149			group->events |= (1 << DLA_EVENT_OP_COMPLETED);
   150		}
   151		if (reg & MASK(GLB_S_INTR_STATUS_0, CDMA_DAT_DONE_STATUS0)) {
   152			processor = &engine->processors[DLA_OP_CONV];
   153			group = &processor->groups[0];
   154			group->events |= (1 << DLA_EVENT_CDMA_DT_DONE);
   155		}
   156		if (reg & MASK(GLB_S_INTR_STATUS_0, CDMA_DAT_DONE_STATUS1)) {
   157			processor = &engine->processors[DLA_OP_CONV];
   158			group = &processor->groups[1];
   159			group->events |= (1 << DLA_EVENT_CDMA_DT_DONE);
   160		}
   161		if (reg & MASK(GLB_S_INTR_STATUS_0, CDMA_WT_DONE_STATUS0)) {
   162			processor = &engine->processors[DLA_OP_CONV];
   163			group = &processor->groups[0];
   164			group->events |= (1 << DLA_EVENT_CDMA_WT_DONE);
   165		}
   166		if (reg & MASK(GLB_S_INTR_STATUS_0, CDMA_WT_DONE_STATUS1)) {
   167			processor = &engine->processors[DLA_OP_CONV];
   168			group = &processor->groups[1];
   169			group->events |= (1 << DLA_EVENT_CDMA_WT_DONE);
   170		}
   171	
   172		glb_reg_write(engine, S_INTR_STATUS, reg);
   173		mask = glb_reg_read(engine, S_INTR_MASK);
   174		reg = glb_reg_read(engine, S_INTR_STATUS);
   175	
   176		complete(&nvdla_dev->event_notifier);
   177		spin_unlock_irqrestore(&nvdla_dev->nvdla_lock, flags);
   178	
   179		return IRQ_HANDLED;
   180	}
   181	

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp


* Re: [PATCH 2/2] drm/nvdla: Add driver support for NVDLA
       [not found] ` <20220419135908.39606-3-cai.huoqing@linux.dev>
  2022-04-19 14:07   ` [PATCH 2/2] drm/nvdla: Add driver support for NVDLA Christian König
  2022-04-20  7:53   ` kernel test robot
@ 2022-04-20  9:57   ` kernel test robot
  2022-04-21  8:30   ` Thomas Zimmermann
  2022-04-21 22:01   ` Kari Argillander
  4 siblings, 0 replies; 15+ messages in thread
From: kernel test robot @ 2022-04-20  9:57 UTC (permalink / raw)
  To: Cai Huoqing
  Cc: kbuild-all, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Daniel Vetter, Sumit Semwal, Christian König,
	linux-kernel, dri-devel, linux-media, linaro-mm-sig

Hi Cai,

I love your patch! Yet something to improve:

[auto build test ERROR on drm/drm-next]
[also build test ERROR on drm-intel/for-linux-next drm-tip/drm-tip linus/master v5.18-rc3 next-20220419]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patches, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/intel-lab-lkp/linux/commits/Cai-Huoqing/drm-nvdla-Add-driver-support-for-NVDLA/20220419-220255
base:   git://anongit.freedesktop.org/drm/drm drm-next
config: m68k-allmodconfig (https://download.01.org/0day-ci/archive/20220420/202204201710.5Gcg1PUu-lkp@intel.com/config)
compiler: m68k-linux-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/7539e5487eb7d0c6f13c03bba596e51a2238106d
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Cai-Huoqing/drm-nvdla-Add-driver-support-for-NVDLA/20220419-220255
        git checkout 7539e5487eb7d0c6f13c03bba596e51a2238106d
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross W=1 O=build_dir ARCH=m68k SHELL=/bin/bash drivers/gpu/

If you fix the issue, kindly add the following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

>> make[5]: *** No rule to make target 'drivers/gpu/drm/nvdla/nvdla_engine_data.o', needed by 'drivers/gpu/drm/nvdla/nvdla-drm.o'.
>> make[5]: *** No rule to make target 'drivers/gpu/drm/nvdla/nvdla_engine_debug.o', needed by 'drivers/gpu/drm/nvdla/nvdla-drm.o'.
   make[5]: Target '__build' not remade because of errors.

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp


* Re: [PATCH 2/2] drm/nvdla: Add driver support for NVDLA
       [not found] ` <20220419135908.39606-3-cai.huoqing@linux.dev>
                     ` (2 preceding siblings ...)
  2022-04-20  9:57   ` kernel test robot
@ 2022-04-21  8:30   ` Thomas Zimmermann
  2022-04-21  8:34     ` Christian König
  2022-04-21 22:01   ` Kari Argillander
  4 siblings, 1 reply; 15+ messages in thread
From: Thomas Zimmermann @ 2022-04-21  8:30 UTC (permalink / raw)
  To: Cai Huoqing
  Cc: Maarten Lankhorst, Maxime Ripard, David Airlie, Daniel Vetter,
	Sumit Semwal, Christian König, linux-kernel, dri-devel,
	linux-media, linaro-mm-sig


[-- Attachment #1.1: Type: text/plain, Size: 26456 bytes --]

(Resending, as some MLs didn't like the size of the original mail.)

Hi,

thanks for your submission. Some general comments:

   * some functions are prefixed with dla_, others use nvdla_. It seems 
arbitrary to me. Please use nvdla_ consistently throughout the source code.

   * For reporting errors, please use drm_err(), drm_warn(), etc. I 
suggest rearranging the error messages so that they are not located in 
the innermost functions.

   * Could you please split this patch into smaller pieces? It currently 
hits size limits of some mailing lists. Maybe add the register constants 
separately.

Please find more review comments below. It's not a full review, but at 
least something to start with.

Best regards
Thomas

On 19.04.22 at 15:59, Cai Huoqing wrote:
> The NVIDIA Deep Learning Accelerator (NVDLA) is an open source IP
> which is integrated into NVIDIA Jetson AGX Xavier,
> so add driver support for this accelerator.
> 
> Signed-off-by: Cai Huoqing <cai.huoqing@linux.dev>
> ---
>   drivers/gpu/drm/Kconfig                 |    2 +
>   drivers/gpu/drm/Makefile                |    1 +
>   drivers/gpu/drm/nvdla/Kconfig           |    8 +
>   drivers/gpu/drm/nvdla/Makefile          |   19 +
>   drivers/gpu/drm/nvdla/nvdla_bdma.c      |  200 +
>   drivers/gpu/drm/nvdla/nvdla_cache.c     |  215 +
>   drivers/gpu/drm/nvdla/nvdla_cdp.c       |  300 ++
>   drivers/gpu/drm/nvdla/nvdla_common.c    |  295 ++
>   drivers/gpu/drm/nvdla/nvdla_common.h    |  835 +++
>   drivers/gpu/drm/nvdla/nvdla_conv.c      |  683 +++
>   drivers/gpu/drm/nvdla/nvdla_drm.c       |  695 +++
>   drivers/gpu/drm/nvdla/nvdla_drm.h       |  127 +
>   drivers/gpu/drm/nvdla/nvdla_engine.c    |  233 +
>   drivers/gpu/drm/nvdla/nvdla_engine.h    |  272 +
>   drivers/gpu/drm/nvdla/nvdla_gem.c       |  393 ++
>   drivers/gpu/drm/nvdla/nvdla_ioctl.h     |   99 +
>   drivers/gpu/drm/nvdla/nvdla_pdp.c       |  446 ++
>   drivers/gpu/drm/nvdla/nvdla_reg.h       | 6411 +++++++++++++++++++++++
>   drivers/gpu/drm/nvdla/nvdla_rubik.c     |  217 +
>   drivers/gpu/drm/nvdla/nvdla_sched.h     |   52 +
>   drivers/gpu/drm/nvdla/nvdla_scheduler.c | 1005 ++++
>   drivers/gpu/drm/nvdla/nvdla_sdp.c       |  728 +++
>   22 files changed, 13236 insertions(+)
>   create mode 100644 drivers/gpu/drm/nvdla/Kconfig
>   create mode 100644 drivers/gpu/drm/nvdla/Makefile
>   create mode 100644 drivers/gpu/drm/nvdla/nvdla_bdma.c
>   create mode 100644 drivers/gpu/drm/nvdla/nvdla_cache.c
>   create mode 100644 drivers/gpu/drm/nvdla/nvdla_cdp.c
>   create mode 100644 drivers/gpu/drm/nvdla/nvdla_common.c
>   create mode 100644 drivers/gpu/drm/nvdla/nvdla_common.h
>   create mode 100644 drivers/gpu/drm/nvdla/nvdla_conv.c
>   create mode 100644 drivers/gpu/drm/nvdla/nvdla_drm.c
>   create mode 100644 drivers/gpu/drm/nvdla/nvdla_drm.h
>   create mode 100644 drivers/gpu/drm/nvdla/nvdla_engine.c
>   create mode 100644 drivers/gpu/drm/nvdla/nvdla_engine.h
>   create mode 100644 drivers/gpu/drm/nvdla/nvdla_gem.c
>   create mode 100644 drivers/gpu/drm/nvdla/nvdla_ioctl.h
>   create mode 100644 drivers/gpu/drm/nvdla/nvdla_pdp.c
>   create mode 100644 drivers/gpu/drm/nvdla/nvdla_reg.h
>   create mode 100644 drivers/gpu/drm/nvdla/nvdla_rubik.c
>   create mode 100644 drivers/gpu/drm/nvdla/nvdla_sched.h
>   create mode 100644 drivers/gpu/drm/nvdla/nvdla_scheduler.c
>   create mode 100644 drivers/gpu/drm/nvdla/nvdla_sdp.c
> 
> diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
> index 5133c3f028ab..a55cff374abd 100644
> --- a/drivers/gpu/drm/Kconfig
> +++ b/drivers/gpu/drm/Kconfig
> @@ -409,6 +409,8 @@ source "drivers/gpu/drm/solomon/Kconfig"
>   
>   source "drivers/gpu/drm/sprd/Kconfig"
>   
> +source "drivers/gpu/drm/nvdla/Kconfig"
> +
>   config DRM_HYPERV
>   	tristate "DRM Support for Hyper-V synthetic video device"
>   	depends on DRM && PCI && MMU && HYPERV
> diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile
> index c2ef5f9fce54..8fa3537f308a 100644
> --- a/drivers/gpu/drm/Makefile
> +++ b/drivers/gpu/drm/Makefile
> @@ -134,3 +134,4 @@ obj-y			+= gud/
>   obj-$(CONFIG_DRM_HYPERV) += hyperv/
>   obj-y			+= solomon/
>   obj-$(CONFIG_DRM_SPRD) += sprd/
> +obj-$(CONFIG_DRM_NVDLA) += nvdla/
> diff --git a/drivers/gpu/drm/nvdla/Kconfig b/drivers/gpu/drm/nvdla/Kconfig
> new file mode 100644
> index 000000000000..11c04f5da877
> --- /dev/null
> +++ b/drivers/gpu/drm/nvdla/Kconfig
> @@ -0,0 +1,8 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +config DRM_NVDLA
> +	tristate "NVDLA DRM"
> +	depends on DRM
> +	select DRM_GEM_CMA_HELPER
> +	help
> +	  Choose this option for open-source NVIDIA DLA support.
> +	  If M is selected the module will be called nvdla-drm.
> diff --git a/drivers/gpu/drm/nvdla/Makefile b/drivers/gpu/drm/nvdla/Makefile
> new file mode 100644
> index 000000000000..74f37d258f8d
> --- /dev/null
> +++ b/drivers/gpu/drm/nvdla/Makefile
> @@ -0,0 +1,19 @@
> +
> +# SPDX-License-Identifier: GPL-2.0
> +nvdla-drm-y := \
> +	nvdla_drm.o \
> +	nvdla_gem.o \
> +	nvdla_scheduler.o \
> +	nvdla_engine.o \
> +	nvdla_bdma.o \
> +	nvdla_conv.o \
> +	nvdla_sdp.o \
> +	nvdla_cdp.o \
> +	nvdla_pdp.o \
> +	nvdla_rubik.o \
> +	nvdla_cache.o \
> +	nvdla_common.o \
> +	nvdla_engine_data.o \
> +	nvdla_engine_debug.o \

File names should be sorted alphabetically here.

> +
> +obj-$(CONFIG_DRM_NVDLA) += nvdla-drm.o
> diff --git a/drivers/gpu/drm/nvdla/nvdla_bdma.c b/drivers/gpu/drm/nvdla/nvdla_bdma.c
> new file mode 100644
> index 000000000000..225613f27acf
> --- /dev/null


> diff --git a/drivers/gpu/drm/nvdla/nvdla_drm.c b/drivers/gpu/drm/nvdla/nvdla_drm.c
> new file mode 100644
> index 000000000000..9217eee1de3b
> --- /dev/null
> +++ b/drivers/gpu/drm/nvdla/nvdla_drm.c
> @@ -0,0 +1,695 @@
> +// SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
> +/*
> + * Copyright (C) 2017-2018 NVIDIA CORPORATION
> + * Copyright (C) 2022 Cai Huoqing
> + */
> +
> +#include <linux/dma-buf.h>
> +#include <linux/dma-mapping.h>
> +#include <linux/fs.h>
> +#include <linux/interrupt.h>
> +#include <linux/irq.h>
> +#include <linux/irqdomain.h>
> +#include <linux/module.h>
> +#include <linux/of.h>
> +#include <linux/of_device.h>
> +#include <linux/of_irq.h>
> +#include <linux/of_platform.h>
> +#include <linux/platform_device.h>
> +#include <linux/printk.h>
> +#include <linux/slab.h>
> +#include <linux/spinlock.h>
> +#include <linux/time.h>
> +#include <linux/uaccess.h>
> +#include <linux/types.h>
> +
> +#include "nvdla_drm.h"
> +#include "nvdla_ioctl.h"
> +#include "nvdla_engine.h"
> +
> +static struct nvdla_config nvdla_config_os_initial = {
> +	.atom_size = 32,
> +	.bdma_enable = true,
> +	.rubik_enable = true,
> +	.weight_compress_support = true,
> +};
> +
> +static struct nvdla_config nvdla_config_small = {
> +	//.atom_size = 8,
> +	.atom_size = 32,  // nv_large config
> +	.bdma_enable = false,
> +	.rubik_enable = false,
> +	.weight_compress_support = false,
> +};
> +
> +int64_t dla_get_time_us(void)
> +{
> +	return ktime_get_ns() / NSEC_PER_USEC;
> +}
> +
> +void dla_reg_write(void *driver_context, uint32_t addr, uint32_t reg)
> +{
> +	struct nvdla_device *nvdla_dev =
> +			(struct nvdla_device *)driver_context;
> +
> +	if (!nvdla_dev)
> +		return;
> +
> +	writel(reg, nvdla_dev->base + addr);
> +}
> +
> +uint32_t dla_reg_read(void *driver_context, uint32_t addr)
> +{
> +	struct nvdla_device *nvdla_dev =
> +			(struct nvdla_device *)driver_context;
> +
> +	if (!nvdla_dev)
> +		return 0;
> +
> +	return readl(nvdla_dev->base + addr);
> +}
> +
> +static irqreturn_t nvdla_engine_isr(int32_t irq, void *data)
> +{
> +	unsigned long flags;
> +	uint32_t mask;
> +	uint32_t reg;
> +	struct dla_processor *processor = NULL;
> +	struct dla_processor_group *group;
> +	struct dla_engine *engine;
> +	struct nvdla_device *nvdla_dev = (struct nvdla_device *)data;
> +
> +	if (!nvdla_dev)
> +		return IRQ_NONE;
> +
> +	engine = nvdla_dev->engine_context;
> +	spin_lock_irqsave(&nvdla_dev->nvdla_lock, flags);
> +
> +	mask = glb_reg_read(engine, S_INTR_MASK);
> +	reg = glb_reg_read(engine, S_INTR_STATUS);
> +
> +	if (reg & MASK(GLB_S_INTR_STATUS_0, CACC_DONE_STATUS0)) {
> +		processor = &engine->processors[DLA_OP_CONV];
> +		group = &processor->groups[0];
> +		group->events |= (1 << DLA_EVENT_OP_COMPLETED);
> +	}
> +	if (reg & MASK(GLB_S_INTR_STATUS_0, CACC_DONE_STATUS1)) {
> +		processor = &engine->processors[DLA_OP_CONV];
> +		group = &processor->groups[1];
> +		group->events |= (1 << DLA_EVENT_OP_COMPLETED);
> +	}
> +	if (reg & MASK(GLB_S_INTR_STATUS_0, SDP_DONE_STATUS0)) {
> +		processor = &engine->processors[DLA_OP_SDP];
> +		group = &processor->groups[0];
> +		group->events |= (1 << DLA_EVENT_OP_COMPLETED);
> +	}
> +	if (reg & MASK(GLB_S_INTR_STATUS_0, SDP_DONE_STATUS1)) {
> +		processor = &engine->processors[DLA_OP_SDP];
> +		group = &processor->groups[1];
> +		group->events |= (1 << DLA_EVENT_OP_COMPLETED);
> +	}
> +	if (reg & MASK(GLB_S_INTR_STATUS_0, CDP_DONE_STATUS0)) {
> +		processor = &engine->processors[DLA_OP_CDP];
> +		group = &processor->groups[0];
> +		group->events |= (1 << DLA_EVENT_OP_COMPLETED);
> +	}
> +	if (reg & MASK(GLB_S_INTR_STATUS_0, CDP_DONE_STATUS1)) {
> +		processor = &engine->processors[DLA_OP_CDP];
> +		group = &processor->groups[1];
> +		group->events |= (1 << DLA_EVENT_OP_COMPLETED);
> +	}
> +	if (reg & MASK(GLB_S_INTR_STATUS_0, RUBIK_DONE_STATUS0)) {
> +		processor = &engine->processors[DLA_OP_RUBIK];
> +		group = &processor->groups[0];
> +		group->events |= (1 << DLA_EVENT_OP_COMPLETED);
> +	}
> +	if (reg & MASK(GLB_S_INTR_STATUS_0, RUBIK_DONE_STATUS1)) {
> +		processor = &engine->processors[DLA_OP_RUBIK];
> +		group = &processor->groups[1];
> +		group->events |= (1 << DLA_EVENT_OP_COMPLETED);
> +	}
> +	if (reg & MASK(GLB_S_INTR_STATUS_0, PDP_DONE_STATUS0)) {
> +		processor = &engine->processors[DLA_OP_PDP];
> +		group = &processor->groups[0];
> +		group->events |= (1 << DLA_EVENT_OP_COMPLETED);
> +	}
> +	if (reg & MASK(GLB_S_INTR_STATUS_0, PDP_DONE_STATUS1)) {
> +		processor = &engine->processors[DLA_OP_PDP];
> +		group = &processor->groups[1];
> +		group->events |= (1 << DLA_EVENT_OP_COMPLETED);
> +	}
> +	if (reg & MASK(GLB_S_INTR_STATUS_0, BDMA_DONE_STATUS0)) {
> +		processor = &engine->processors[DLA_OP_BDMA];
> +		group = &processor->groups[0];
> +		group->events |= (1 << DLA_EVENT_OP_COMPLETED);
> +	}
> +	if (reg & MASK(GLB_S_INTR_STATUS_0, BDMA_DONE_STATUS1)) {
> +		processor = &engine->processors[DLA_OP_BDMA];
> +		group = &processor->groups[1];
> +		group->events |= (1 << DLA_EVENT_OP_COMPLETED);
> +	}
> +	if (reg & MASK(GLB_S_INTR_STATUS_0, CDMA_DAT_DONE_STATUS0)) {
> +		processor = &engine->processors[DLA_OP_CONV];
> +		group = &processor->groups[0];
> +		group->events |= (1 << DLA_EVENT_CDMA_DT_DONE);
> +	}
> +	if (reg & MASK(GLB_S_INTR_STATUS_0, CDMA_DAT_DONE_STATUS1)) {
> +		processor = &engine->processors[DLA_OP_CONV];
> +		group = &processor->groups[1];
> +		group->events |= (1 << DLA_EVENT_CDMA_DT_DONE);
> +	}
> +	if (reg & MASK(GLB_S_INTR_STATUS_0, CDMA_WT_DONE_STATUS0)) {
> +		processor = &engine->processors[DLA_OP_CONV];
> +		group = &processor->groups[0];
> +		group->events |= (1 << DLA_EVENT_CDMA_WT_DONE);
> +	}
> +	if (reg & MASK(GLB_S_INTR_STATUS_0, CDMA_WT_DONE_STATUS1)) {
> +		processor = &engine->processors[DLA_OP_CONV];
> +		group = &processor->groups[1];
> +		group->events |= (1 << DLA_EVENT_CDMA_WT_DONE);
> +	}
> +
> +	glb_reg_write(engine, S_INTR_STATUS, reg);
> +	mask = glb_reg_read(engine, S_INTR_MASK);
> +	reg = glb_reg_read(engine, S_INTR_STATUS);
> +
> +	complete(&nvdla_dev->event_notifier);
> +	spin_unlock_irqrestore(&nvdla_dev->nvdla_lock, flags);
> +
> +	return IRQ_HANDLED;
> +}
> +
> +static int32_t dla_read_dma_address(void *driver_context, void *task_data,
> +						int16_t index, void *dst)
> +{
> +	int32_t ret = 0;
> +	struct nvdla_mem_handle *handles;
> +	dma_addr_t *phys_addr = (dma_addr_t *)(dst);
> +	struct nvdla_device *nvdla_dev =
> +			(struct nvdla_device *)driver_context;
> +	struct nvdla_task *task = (struct nvdla_task *)task_data;
> +
> +	if (index == -1 || index > task->num_addresses)
> +		return -EINVAL;
> +
> +	handles = (struct nvdla_mem_handle *)task->address_list;
> +	ret = nvdla_gem_dma_addr(nvdla_dev->drm, task->file,
> +					handles[index].handle,
> +					phys_addr);
> +
> +	/* Add offset to IOVA address */
> +	*phys_addr = *phys_addr + handles[index].offset;
> +
> +	return ret;
> +}
> +
> +static int32_t dla_read_cpu_address(void *driver_context, void *task_data,
> +						int16_t index, void *dst)
> +{
> +	uint64_t *temp = (uint64_t *)dst;
> +	struct nvdla_task *task = (struct nvdla_task *)task_data;
> +
> +	if (index == -1 || index > task->num_addresses)
> +		return -EINVAL;
> +
> +	*temp = (uint64_t)index;
> +	return 0;
> +}
> +
> +int32_t dla_get_dma_address(void *driver_context, void *task_data,
> +					int16_t index, void *dst_ptr,
> +					uint32_t destination)
> +{
> +	int32_t ret = 0;
> +
> +	if (destination == DESTINATION_PROCESSOR) {
> +		ret = dla_read_cpu_address(driver_context, task_data,
> +						index, dst_ptr);
> +	} else if (destination == DESTINATION_DMA) {
> +		ret = dla_read_dma_address(driver_context, task_data,
> +						index, dst_ptr);
> +	} else {
> +		ret = -EINVAL;
> +	}
> +
> +	return ret;
> +}
> +
> +int32_t dla_data_write(void *driver_context, void *task_data,
> +				void *src, uint64_t dst,
> +				uint32_t size, uint64_t offset)
> +{
> +	int32_t ret;
> +	void *ptr = NULL;
> +	struct dma_buf *buf;
> +	struct iosys_map map;
> +	struct nvdla_mem_handle *handles;
> +	struct nvdla_task *task = (struct nvdla_task *)task_data;
> +
> +	handles = task->address_list;
> +	buf = dma_buf_get(handles[dst].handle);
> +	if (IS_ERR(buf)) {
> +		pr_err("%s: Failed get dma_buf for handle=%d\n", __func__,
> +						handles[dst].handle);
> +		return -EFAULT;
> +	}
> +
> +	ret = dma_buf_begin_cpu_access(buf, DMA_BIDIRECTIONAL);
> +	if (ret)
> +		goto put_dma_buf;
> +
> +	ret = dma_buf_vmap(buf, &map);
> +	ptr = ret ? NULL : map.vaddr;

Never extract the raw pointer from the iosys_map without good reason. You 
don't know whether it points to a location in I/O memory, where a plain 
pointer access is not valid.

> +	if (!ptr) {

Simply test ret here.

> +		pr_err("%s: Failed to vmap dma_buf for handle=%d\n", __func__,
> +						handles[dst].handle);
> +		ret = -ENOMEM;

You already got an errno code. Don't override it.

> +		goto end_cpu_access;
> +	}
> +
> +
> +	memcpy((void *)((uint8_t *)ptr + offset), src, size);

Use iosys_map_memcpy_to() here. It does the right thing for both system 
and I/O memory.

> +
> +	dma_buf_vunmap(buf, ptr);

You have to pass map as the second argument.

> +
> +end_cpu_access:
> +	dma_buf_end_cpu_access(buf, DMA_BIDIRECTIONAL);
> +
> +put_dma_buf:
> +	dma_buf_put(buf);
> +
> +	return ret;
> +}
> +
> +int32_t dla_data_read(void *driver_context, void *task_data,
> +				uint64_t src, void *dst,
> +				uint32_t size, uint64_t offset)
> +{
> +	int32_t ret;
> +	void *ptr = NULL;
> +	struct dma_buf *buf;
> +	struct iosys_map map;
> +	struct nvdla_mem_handle *handles;
> +	struct nvdla_task *task = (struct nvdla_task *)task_data;
> +
> +	handles = task->address_list;
> +
> +	buf = dma_buf_get(handles[src].handle);
> +	if (IS_ERR(buf)) {
> +		pr_err("%s: Failed get dma_buf for handle=%d\n", __func__,
> +						handles[src].handle);
> +		return -EFAULT;
> +	}
> +
> +	ret = dma_buf_begin_cpu_access(buf, DMA_BIDIRECTIONAL);
> +	if (ret)
> +		goto put_dma_buf;
> +
> +	ret = dma_buf_vmap(buf, &map);
> +	ptr = ret ? NULL : map.vaddr;
> +	if (!ptr) {
> +		pr_err("%s: Failed to vmap dma_buf for handle=%d\n", __func__,
> +						handles[src].handle);
> +		ret = -ENOMEM;
> +		goto end_cpu_access;
> +	}

All the same problems as in dla_data_write().

> +
> +	memcpy(dst, (void *)(((uint8_t *)ptr) + offset), size);

Use iosys_map_memcpy_from() here.

> +
> +	dma_buf_vunmap(buf, ptr);

Same problem here: pass 'map' instead of 'ptr'.

> +
> +end_cpu_access:
> +	dma_buf_end_cpu_access(buf, DMA_BIDIRECTIONAL);
> +
> +put_dma_buf:
> +	dma_buf_put(buf);
> +
> +	return ret;
> +}
> +

> +
> +static int32_t nvdla_submit(struct drm_device *drm, void *arg,
> +					struct drm_file *file)
> +{
> +	int32_t err = 0;
> +	struct nvdla_task *task;
> +	struct nvdla_ioctl_submit_task local_task;
> +	struct nvdla_ioctl_submit_task __user *user_task;
> +	struct nvdla_device *nvdla_dev = dev_get_drvdata(drm->dev);
> +	struct nvdla_submit_args *args =
> +			(struct nvdla_submit_args *)arg;
> +
> +	user_task = (struct nvdla_ioctl_submit_task __user *)
> +			(uintptr_t)args->tasks;
> +	if (!user_task)
> +		return -EINVAL;
> +
> +	/* IOCTL copy descriptors */
> +	if (copy_from_user(&local_task, (void __user *)user_task,
> +			(sizeof(*user_task))))
> +		return -EFAULT;
> +
> +	task = kzalloc(sizeof(*task), GFP_KERNEL);
> +	if (task == NULL)
> +		return -EFAULT;
> +
> +	nvdla_dev->task = task;
> +	kref_init(&task->ref);
> +	task->nvdla_dev = nvdla_dev;
> +	task->file = file;
> +
> +	/* update task desc fields */
> +	err = nvdla_fill_task_desc(&local_task, task);
> +	if (err)
> +		goto free_task_desc;
> +
> +	err = nvdla_task_submit(nvdla_dev, task);
> +
> +	kfree(task->address_list);
> +
> +free_task_desc:
> +	kfree(task);
> +	return err;
> +}
> +
> +static int32_t nvdla_gem_alloc(struct nvdla_gem_object *nobj)
> +{
> +	struct drm_gem_object *dobj = &nobj->object;
> +	struct drm_device *drm = dobj->dev;
> +
> +	nobj->dma_attrs = DMA_ATTR_WRITE_COMBINE;
> +
> +	nobj->kvaddr = dma_alloc_attrs(drm->dev, dobj->size, &nobj->dma_addr,
> +						GFP_KERNEL, nobj->dma_attrs);
> +

Store an iosys-map address in nobj and initialize it with 
iosys_map_set_vaddr(); or iosys_map_set_vaddr_iomem() if you're working 
with I/O memory.
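A rough sketch of what that could look like (field name and placement are 
illustrative, not compile-tested):

```c
/* in struct nvdla_gem_object, instead of (or next to) kvaddr: */
struct iosys_map map;

/* in nvdla_gem_alloc(), after dma_alloc_attrs() succeeds: */
iosys_map_set_vaddr(&nobj->map, nobj->kvaddr);

/* nvdla_drm_gem_prime_vmap() then becomes a simple copy: */
*map = nobj->map;
```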

> +	if (!nobj->kvaddr)
> +		return -ENOMEM;
> +
> +	return 0;
> +}
> +
> +static void nvdla_gem_free(struct nvdla_gem_object *nobj)
> +{
> +	struct drm_gem_object *dobj = &nobj->object;
> +	struct drm_device *drm = dobj->dev;
> +
> +	dma_free_attrs(drm->dev, dobj->size, nobj->kvaddr, nobj->dma_addr,
> +				nobj->dma_attrs);
> +}
> +
> +static void nvdla_gem_free_object(struct drm_gem_object *dobj)
> +{
> +	struct nvdla_gem_object *nobj;
> +
> +	drm_gem_free_mmap_offset(dobj);
> +
> +	nobj = to_nvdla_obj(dobj);
> +
> +	nvdla_gem_free(nobj);
> +
> +	kfree(nobj);
> +}
> +
> +static struct nvdla_gem_object *
> +nvdla_gem_create_object(struct drm_device *drm, uint32_t size)
> +{
> +	int32_t ret;
> +	struct drm_gem_object *dobj;
> +	struct nvdla_gem_object *nobj;
> +
> +	size = round_up(size, PAGE_SIZE);
> +
> +	nobj = kzalloc(sizeof(*nobj), GFP_KERNEL);
> +	if (!nobj)
> +		return ERR_PTR(-ENOMEM);
> +
> +	dobj = &nobj->object;
> +
> +	drm_gem_private_object_init(drm, dobj, size);
> +
> +	ret = nvdla_gem_alloc(nobj);
> +	if (ret)
> +		goto free_nvdla_obj;
> +
> +	return nobj;
> +
> +free_nvdla_obj:
> +	kfree(nobj);
> +	return ERR_PTR(ret);
> +}
> +
> +static struct sg_table*
> +nvdla_drm_gem_prime_get_sg_table(struct drm_gem_object *dobj)
> +{
> +	int32_t ret;
> +	struct sg_table *sgt;
> +	struct drm_device *drm = dobj->dev;
> +	struct nvdla_gem_object *nobj = to_nvdla_obj(dobj);
> +
> +	sgt = kzalloc(sizeof(*sgt), GFP_KERNEL);
> +	if (!sgt)
> +		return ERR_PTR(-ENOMEM);
> +
> +	ret = dma_get_sgtable_attrs(drm->dev, sgt, nobj->kvaddr,
> +				    nobj->dma_addr, dobj->size,
> +				    nobj->dma_attrs);
> +	if (ret) {
> +		DRM_ERROR("failed to allocate sgt, %d\n", ret);
> +		kfree(sgt);
> +		return ERR_PTR(ret);
> +	}
> +
> +	return sgt;
> +}
> +
> +static int nvdla_drm_gem_prime_vmap(struct drm_gem_object *obj, struct iosys_map *map)
> +{
> +	struct nvdla_gem_object *nobj = to_nvdla_obj(obj);
> +
> +	map->vaddr = nobj->kvaddr;

Instead of kvaddr, store the pointer as struct iosys_map. Then simply 
copy it here, as in

    *map = nobj->map;

> +
> +	return 0;
> +}
> +
> +static void nvdla_drm_gem_prime_vunmap(struct drm_gem_object *obj, struct iosys_map *map)
> +{
> +	/* Nothing to do */
> +}
> +
> +static int32_t nvdla_drm_gem_object_mmap(struct drm_gem_object *dobj,
> +					struct vm_area_struct *vma)
> +{
> +	int32_t ret;
> +	struct nvdla_gem_object *nobj = to_nvdla_obj(dobj);
> +	struct drm_device *drm = dobj->dev;
> +
> +	vma->vm_flags &= ~VM_PFNMAP;
> +	vma->vm_pgoff = 0;

It's cleaner to do this as

    vma->vm_pgoff -= drm_vma_node_start(&obj->vma_node)

> +
> +	ret = dma_mmap_attrs(drm->dev, vma, nobj->kvaddr, nobj->dma_addr,
> +			     dobj->size, nobj->dma_attrs);
> +	if (ret)
> +		drm_gem_vm_close(vma);
> +
> +	return ret;
> +}
> +
> +static const struct drm_gem_object_funcs nvdla_gem_object_funcs = {
> +	.free = nvdla_gem_free_object,
> +	.get_sg_table = nvdla_drm_gem_prime_get_sg_table,
> +	.vmap = nvdla_drm_gem_prime_vmap,
> +	.vunmap = nvdla_drm_gem_prime_vunmap,
> +	.mmap = nvdla_drm_gem_object_mmap,
> +};
> +
> +static struct nvdla_gem_object*
> +nvdla_gem_create_with_handle(struct drm_file *file_priv,
> +							 struct drm_device *drm, uint32_t size,
> +							 uint32_t *handle)
> +{
> +	int32_t ret;
> +	struct drm_gem_object *dobj;
> +	struct nvdla_gem_object *nobj;
> +
> +	nobj = nvdla_gem_create_object(drm, size);
> +	if (IS_ERR(nobj))
> +		return ERR_CAST(nobj);
> +
> +	dobj = &nobj->object;
> +	dobj->funcs = &nvdla_gem_object_funcs;
> +	ret = drm_gem_handle_create(file_priv, dobj, handle);
> +	if (ret)
> +		goto free_drm_object;
> +
> +	drm_gem_object_put(dobj);
> +
> +	return nobj;
> +
> +free_drm_object:
> +	nvdla_gem_free_object(dobj);
> +
> +	return ERR_PTR(ret);
> +}
> +
> +static int32_t nvdla_gem_create(struct drm_device *drm, void *data,
> +								struct drm_file *file)
> +{
> +	struct nvdla_gem_object *nobj;
> +	struct nvdla_gem_create_args *args = data;
> +
> +	nobj = nvdla_gem_create_with_handle(file, drm, args->size,
> +					 &args->handle);
> +	if (IS_ERR(nobj))
> +		return PTR_ERR(nobj);
> +
> +	return 0;
> +}
> +
> +static int32_t nvdla_drm_gem_mmap_buf(struct drm_gem_object *obj,
> +									  struct vm_area_struct *vma)
> +{
> +	int32_t ret;
> +
> +	ret = drm_gem_mmap_obj(obj, obj->size, vma);
> +	if (ret)
> +		return ret;
> +
> +	return nvdla_drm_gem_object_mmap(obj, vma);
> +}
> +
> +static int32_t nvdla_drm_gem_mmap(struct file *filp, struct vm_area_struct *vma)
> +{
> +	int32_t ret;
> +	struct drm_gem_object *obj;
> +
> +	ret = drm_gem_mmap(filp, vma);
> +	if (ret)
> +		return ret;
> +
> +	obj = vma->vm_private_data;
> +
> +	return nvdla_drm_gem_object_mmap(obj, vma);

I don't understand these two lines. This is part of what drm_gem_mmap() 
does. It shouldn't be necessary here.

> +}
> +
> +int32_t nvdla_gem_dma_addr(struct drm_device *dev, struct drm_file *file,
> +						   uint32_t fd, dma_addr_t *addr)
> +{
> +	int32_t ret;
> +	uint32_t handle;
> +	struct nvdla_gem_object *nobj;
> +	struct drm_gem_object *dobj;
> +
> +	ret = drm_gem_prime_fd_to_handle(dev, file, fd, &handle);
> +	if (ret)
> +		return ret;
> +
> +	dobj = drm_gem_object_lookup(file, handle);
> +	if (!dobj)
> +		return -EINVAL;
> +
> +	nobj = to_nvdla_obj(dobj);
> +
> +	*addr = nobj->dma_addr;
> +
> +	drm_gem_object_put(dobj);
> +
> +	return 0;
> +}
> +
> +static int32_t nvdla_gem_map_offset(struct drm_device *drm, void *data,
> +									struct drm_file *file)
> +{
> +	struct nvdla_gem_map_offset_args *args = data;
> +
> +	return drm_gem_dumb_map_offset(file, drm, args->handle,
> +								   &args->offset);
> +}
> +
> +static const struct file_operations nvdla_drm_fops = {
> +	.owner = THIS_MODULE,
> +	.open = drm_open,
> +	.release = drm_release,
> +	.unlocked_ioctl = drm_ioctl,
> +	.mmap = nvdla_drm_gem_mmap,

It should be fine to use drm_gem_mmap here. Then you should use 
DEFINE_DRM_GEM_FOPS() to define nvdla_drm_fops.
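For reference, with drm_gem_mmap in place the whole definition could 
collapse to the standard helper macro from <drm/drm_gem.h>, which fills 
in drm_open, drm_release, drm_ioctl, drm_poll, drm_read and friends:

```c
DEFINE_DRM_GEM_FOPS(nvdla_drm_fops);
```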

> +	.poll = drm_poll,
> +	.read = drm_read,
> +#ifdef CONFIG_COMPAT
> +	.compat_ioctl = drm_compat_ioctl,
> +#endif
> +	.llseek = noop_llseek,
> +};
> +
> +static const struct drm_ioctl_desc nvdla_drm_ioctls[] = {
> +	DRM_IOCTL_DEF_DRV(NVDLA_SUBMIT, nvdla_submit, DRM_RENDER_ALLOW),
> +	DRM_IOCTL_DEF_DRV(NVDLA_GEM_CREATE, nvdla_gem_create, DRM_RENDER_ALLOW),
> +	DRM_IOCTL_DEF_DRV(NVDLA_GEM_MMAP, nvdla_gem_map_offset, DRM_RENDER_ALLOW),
> +	/* use DRM_IOCTL_MODE_DESTROY_DUMB to destory */
> +};
> +
> +static struct drm_driver nvdla_drm_driver = {
> +	.driver_features = DRIVER_GEM | DRIVER_RENDER,
> +
> +	.ioctls = nvdla_drm_ioctls,
> +	.num_ioctls = ARRAY_SIZE(nvdla_drm_ioctls),
> +	.fops = &nvdla_drm_fops,
> +	.gem_prime_mmap		= nvdla_drm_gem_mmap_buf,

Use drm_gem_prime_mmap() here.

Some context: the situation with these mmap functions has been confusing 
and inconsistent among DRM drivers. But we cleaned it up so that you 
only have to provide a minimal implementation of struct 
drm_gem_object_funcs.mmap.  All other mmap callbacks can then be filled 
with standard DRM helpers.

> +
> +	.name = "nvdla",
> +	.desc = "NVDLA driver",
> +	.date = "20171017",
> +	.major = 0,
> +	.minor = 0,
> +	.patchlevel = 0,
> +};
> +
> +int32_t nvdla_drm_probe(struct nvdla_device *nvdla_dev)
> +{
> +	int32_t err;
> +	struct drm_device *drm;
> +	struct drm_driver *driver = &nvdla_drm_driver;
> +
> +	drm = drm_dev_alloc(driver, &nvdla_dev->pdev->dev);
> +	if (IS_ERR(drm))
> +		return PTR_ERR(drm);
> +
> +	nvdla_dev->drm = drm;
> +
> +	err = drm_dev_register(drm, 0);
> +	if (err < 0)
> +		goto unref;
> +
> +	return 0;
> +
> +unref:
> +	drm_dev_put(drm);
> +	return err;
> +}
> +
> +void nvdla_drm_remove(struct nvdla_device *nvdla_dev)
> +{
> +	drm_dev_unregister(nvdla_dev->drm);
> +	drm_dev_put(nvdla_dev->drm);
> +}

-- 
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Ivo Totev

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 840 bytes --]


* Re: [PATCH 2/2] drm/nvdla: Add driver support for NVDLA
  2022-04-21  8:30   ` Thomas Zimmermann
@ 2022-04-21  8:34     ` Christian König
  2022-04-21  8:57       ` Thomas Zimmermann
                         ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Christian König @ 2022-04-21  8:34 UTC (permalink / raw)
  To: Thomas Zimmermann, Cai Huoqing
  Cc: Maarten Lankhorst, Maxime Ripard, David Airlie, Daniel Vetter,
	Sumit Semwal, linux-kernel, dri-devel, linux-media, linaro-mm-sig

Am 21.04.22 um 10:30 schrieb Thomas Zimmermann:
> (Resending, as some MLs didn't like the size of the original mail.)
>
> Hi,
>
> thanks for your submission. Some general comments:
>
>   * some functions are prefixed with dla_, others use nvdla_. It seems 
> arbitrary to me. Please use nvdla_ consistently throughout the source 
> code.
>
>   * For reporting errors, please use drm_err(), drm_warn(), etc. I 
> suggest rearranging the error messages so that they are not located in 
> the innermost functions.

If you plan to have multiple instances of the driver loaded at the same 
time, using drm_dev_err(), drm_dev_warn(), etc. would be even better.

BTW: I'm still absolutely not keen on enforcing the drm_* log functions. So 
if you prefer to stick with pr_err() and dev_err(), we could discuss that 
once more.

Regards,
Christian.

>
>   * Could you please split this patch into smaller pieces? It 
> currently hits size limits of some mailing lists. Maybe add the 
> register constants separately.
>
> Please find more review comments below. It's not a full review, but at 
> least something to start with.
>
> Best regards
> Thomas



* Re: [PATCH 2/2] drm/nvdla: Add driver support for NVDLA
  2022-04-21  8:34     ` Christian König
@ 2022-04-21  8:57       ` Thomas Zimmermann
  2022-04-21  9:07       ` Thomas Zimmermann
  2022-04-21  9:13       ` Thomas Zimmermann
  2 siblings, 0 replies; 15+ messages in thread
From: Thomas Zimmermann @ 2022-04-21  8:57 UTC (permalink / raw)
  To: Christian König, Cai Huoqing
  Cc: Maarten Lankhorst, Maxime Ripard, David Airlie, Daniel Vetter,
	Sumit Semwal, linux-kernel, dri-devel, linux-media, linaro-mm-sig


[-- Attachment #1.1: Type: text/plain, Size: 1548 bytes --]

Hi

Am 21.04.22 um 10:34 schrieb Christian König:
> Am 21.04.22 um 10:30 schrieb Thomas Zimmermann:
>> (Resending, as some MLs didn't like the size of the original mail.)
>>
>> Hi,
>>
>> thanks for your submission. Some general comments:
>>
>>   * some functions are prefixed with dla_, others use nvdla_. It seems 
>> arbitrary to me. Please use nvdla_ consistently throughout the source 
>> code.
>>
>>   * For reporting errors, please use drm_err(), drm_warn(), etc. I 
>> suggest rearranging the error messages so that they are not located in 
>> the innermost functions.
> 
> If you plan to have multiple instances of the driver loaded at the same 
> time, using drm_dev_err(), drm_dev_warn(), etc. would be even better.

I think that's what I mean. Thanks for pointing this out.

Best regards
Thomas

> 
> BTW: I'm still absolutely not keen on enforcing the drm_* log functions. So 
> if you prefer to stick with pr_err() and dev_err(), we could discuss that 
> once more.
> 
> Regards,
> Christian.
> 
>>
>>   * Could you please split this patch into smaller pieces? It 
>> currently hits size limits of some mailing lists. Maybe add the 
>> register constants separately.
>>
>> Please find more review comments below. It's not a full review, but at 
>> least something to start with.
>>
>> Best regards
>> Thomas
> 

-- 
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Ivo Totev

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 840 bytes --]


* Re: [PATCH 2/2] drm/nvdla: Add driver support for NVDLA
  2022-04-21  8:34     ` Christian König
  2022-04-21  8:57       ` Thomas Zimmermann
@ 2022-04-21  9:07       ` Thomas Zimmermann
  2022-04-21  9:13       ` Thomas Zimmermann
  2 siblings, 0 replies; 15+ messages in thread
From: Thomas Zimmermann @ 2022-04-21  9:07 UTC (permalink / raw)
  To: Christian König, Cai Huoqing
  Cc: Maarten Lankhorst, Maxime Ripard, David Airlie, Daniel Vetter,
	Sumit Semwal, linux-kernel, dri-devel, linux-media, linaro-mm-sig


[-- Attachment #1.1: Type: text/plain, Size: 1602 bytes --]



Am 21.04.22 um 10:34 schrieb Christian König:
> Am 21.04.22 um 10:30 schrieb Thomas Zimmermann:
>> (Resending, as some MLs didn't like the size of the original mail.)
>>
>> Hi,
>>
>> thanks for your submission. Some general comments:
>>
>>   * some functions are prefixed with dla_, others use nvdla_. It seems 
>> arbitrary to me. Please use nvdla_ consistently throughout the source 
>> code.
>>
>>   * For reporting errors, please use drm_err(), drm_warn(), etc. I 
>> suggest rearranging the error messages so that they are not located in 
>> the innermost functions.
> 
> If you plan to have multiple instances of the driver loaded at the same 
> time, using drm_dev_err(), drm_dev_warn(), etc. would be even better.
> 
> BTW: I'm still absolutely not keen on enforcing the drm_* log functions. So 
> if you prefer to stick with pr_err() and dev_err(), we could discuss that 
> once more.

I often do 'dmesg | grep drm' to quickly look for errors. Not using drm 
logging helpers makes this less useful.

Best regards
Thomas

> 
> Regards,
> Christian.
> 
>>
>>   * Could you please split this patch into smaller pieces? It 
>> currently hits size limits of some mailing lists. Maybe add the 
>> register constants separately.
>>
>> Please find more review comments below. It's not a full review, but at 
>> least something to start with.
>>
>> Best regards
>> Thomas
> 

-- 
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Ivo Totev

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 840 bytes --]


* Re: [PATCH 2/2] drm/nvdla: Add driver support for NVDLA
  2022-04-21  8:34     ` Christian König
  2022-04-21  8:57       ` Thomas Zimmermann
  2022-04-21  9:07       ` Thomas Zimmermann
@ 2022-04-21  9:13       ` Thomas Zimmermann
  2022-04-21  9:23         ` Christian König
  2 siblings, 1 reply; 15+ messages in thread
From: Thomas Zimmermann @ 2022-04-21  9:13 UTC (permalink / raw)
  To: Christian König, Cai Huoqing
  Cc: Maarten Lankhorst, Maxime Ripard, David Airlie, Daniel Vetter,
	Sumit Semwal, linux-kernel, dri-devel, linux-media, linaro-mm-sig


[-- Attachment #1.1: Type: text/plain, Size: 1597 bytes --]

Hi

Am 21.04.22 um 10:34 schrieb Christian König:
> Am 21.04.22 um 10:30 schrieb Thomas Zimmermann:
>> (Resending, as some MLs didn't like the size of the original mail.)
>>
>> Hi,
>>
>> thanks for your submission. Some general comments:
>>
>>   * some functions are prefixed with dla_, others use nvdla_. It seems 
>> arbitrary to me. Please use nvdla_ consistently throughout the source 
>> code.
>>
>>   * For reporting errors, please use drm_err(), drm_warn(), etc. I 
>> suggest rearranging the error messages so that they are not located in 
>> the innermost functions.
> 
> If you plan to have multiple instances of the driver loaded at the same 
> time, using drm_dev_err(), drm_dev_warn(), etc. would be even better.

I thought that these functions existed, but looking for them now I cannot 
find them. The macros DRM_DEV_ERR(), etc. are deprecated.

> 
> BTW: I'm still absolutely not keen on enforcing the drm_* log functions. So 
> if you prefer to stick with pr_err() and dev_err(), we could discuss that 
> once more.
> 
> Regards,
> Christian.
> 
>>
>>   * Could you please split this patch into smaller pieces? It 
>> currently hits size limits of some mailing lists. Maybe add the 
>> register constants separately.
>>
>> Please find more review comments below. It's not a full review, but at 
>> least something to start with.
>>
>> Best regards
>> Thomas
> 

-- 
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Ivo Totev

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 840 bytes --]


* Re: [PATCH 2/2] drm/nvdla: Add driver support for NVDLA
  2022-04-21  9:13       ` Thomas Zimmermann
@ 2022-04-21  9:23         ` Christian König
  0 siblings, 0 replies; 15+ messages in thread
From: Christian König @ 2022-04-21  9:23 UTC (permalink / raw)
  To: Thomas Zimmermann, Cai Huoqing
  Cc: Maarten Lankhorst, Maxime Ripard, David Airlie, Daniel Vetter,
	Sumit Semwal, linux-kernel, dri-devel, linux-media, linaro-mm-sig

Am 21.04.22 um 11:13 schrieb Thomas Zimmermann:
> Hi
>
> Am 21.04.22 um 10:34 schrieb Christian König:
>> Am 21.04.22 um 10:30 schrieb Thomas Zimmermann:
>>> (Resending, as some MLs didn't like the size of the original mail.)
>>>
>>> Hi,
>>>
>>> thanks for your submission. Some general comments:
>>>
>>>   * some functions are prefixed with dla_, others use nvdla_. It 
>>> seems arbitrary to me. Please use nvdla_ consistently throughout the 
>>> source code.
>>>
>>>   * For reporting errors, please use drm_err(), drm_warn(), etc. I 
>>> suggest rearranging the error messages so that they are not located in 
>>> the innermost functions.
>>
>> If you plan to have multiple instances of the driver loaded at the 
>> same time, using drm_dev_err(), drm_dev_warn(), etc. would be even 
>> better.
>
> I thought that these functions existed, but looking for them now I 
> cannot find them. The macros DRM_DEV_ERR(), etc. are deprecated.

That's what I meant by the comment below.

I wasn't 100% sure, but dev_err() etc. now seems to be the preferred form, 
since it allows filtering log messages by device.

Regards,
Christian.

>
>>
>> BTW: I'm still absolutely not keen on enforcing the drm_* log functions. 
>> So if you prefer to stick with pr_err() and dev_err(), we could 
>> discuss that once more.
>>
>> Regards,
>> Christian.
>>
>>>
>>>   * Could you please split this patch into smaller pieces? It 
>>> currently hits size limits of some mailing lists. Maybe add the 
>>> register constants separately.
>>>
>>> Please find more review comments below. It's not a full review, but 
>>> at least something to start with.
>>>
>>> Best regards
>>> Thomas
>>
>



* Re: [PATCH 2/2] drm/nvdla: Add driver support for NVDLA
       [not found] ` <20220419135908.39606-3-cai.huoqing@linux.dev>
                     ` (3 preceding siblings ...)
  2022-04-21  8:30   ` Thomas Zimmermann
@ 2022-04-21 22:01   ` Kari Argillander
  2022-04-25 14:28     ` Cai Huoqing
  4 siblings, 1 reply; 15+ messages in thread
From: Kari Argillander @ 2022-04-21 22:01 UTC (permalink / raw)
  To: Cai Huoqing
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Daniel Vetter, Sumit Semwal, Christian König, linux-kernel,
	dri-devel, linux-media, linaro-mm-sig

This is just a quick look. I basically checked some style issues and did
some basic static analysis.

I have run:
  - cppcheck (which found a couple of mistakes)
  - flawfinder (did not find anything, to my eyes)
  - codespell (found a couple of typos)

You can run these yourself also or check below.

A couple of common things, which you can ignore or not:
- Usually in this code there is a goto exit that just returns. Maybe
   just return straight away; no need to jump.
- Some comments start with a capital letter, others don't. Maybe all
   should start with a capital. A very small nit, but it makes a nice
   touch to the code.
- A lot of one-line comments are unnecessarily spread over three lines.

On 19.4.2022 16.59, Cai Huoqing wrote:
> The NVIDIA Deep Learning Accelerator (NVDLA) is an open source IP
> which is integrated into NVIDIA Jetson AGX Xavier,
> so add driver support for this accelerator.
> 
> Signed-off-by: Cai Huoqing <cai.huoqing@linux.dev>

... snip

> diff --git a/drivers/gpu/drm/nvdla/nvdla_bdma.c b/drivers/gpu/drm/nvdla/nvdla_bdma.c
> new file mode 100644
> index 000000000000..225613f27acf
> --- /dev/null
> +++ b/drivers/gpu/drm/nvdla/nvdla_bdma.c

... snip

> +static int32_t
> +processor_bdma_program_slot(struct dla_engine *engine,
> +							struct dla_bdma_surface_desc *bdma_surface,
> +							struct dla_bdma_transfer_desc *transfer)
> +{
> +	int32_t ret = 0;
> +	uint64_t source_addr = 0;
> +	uint64_t destination_addr = 0;
> +	uint32_t high, low, reg;
> +	uint8_t  bdma_free_slots = 0;
> +
> +	/* make sure there're enough free slots */
> +	if (bdma_free_slots <= 0) {

bdma_free_slots is initialized to 0 just above, so this condition is
always true right now.

> +		do {
> +			reg = bdma_reg_read(engine, STATUS);
> +			reg = (reg & MASK(BDMA_STATUS_0, FREE_SLOT)) >>
> +					SHIFT(BDMA_STATUS_0, FREE_SLOT);
> +		} while (reg == 0);
> +		bdma_free_slots = (uint8_t)reg;
> +	}
> +
> +	dla_get_dma_address(engine->driver_context, engine->task->task_data,
> +						transfer->source_address,
> +						(void *)&source_addr,
> +						DESTINATION_DMA);
> +	dla_get_dma_address(engine->driver_context, engine->task->task_data,
> +						transfer->destination_address,
> +						(void *)&destination_addr,
> +						DESTINATION_DMA);
> +
> +	ASSERT_GOTO((transfer->line_repeat <= 8192),
> +				ret, -EINVAL, exit);
> +	ASSERT_GOTO((transfer->surface_repeat <= 8192),
> +				ret, -EINVAL, exit);
> +	ASSERT_GOTO((transfer->line_size % 32) == 0,
> +				ret, -EINVAL, exit);
> +	ASSERT_GOTO(transfer->source_line >= transfer->line_size,
> +				ret, -EINVAL, exit);
> +	ASSERT_GOTO(transfer->destination_line >= transfer->line_size,
> +				ret, -EINVAL, exit);
> +	ASSERT_GOTO(transfer->source_surface >=
> +			(transfer->source_line * transfer->line_repeat),
> +				ret, -EINVAL, exit);
> +	ASSERT_GOTO(transfer->destination_surface >=
> +			(transfer->destination_line * transfer->line_repeat),
> +				ret, -EINVAL, exit);
> +
> +	/* config registers */
> +	high = upper_32_bits(source_addr);
> +	low = lower_32_bits(source_addr);
> +	bdma_reg_write(engine, CFG_SRC_ADDR_LOW, low);
> +	bdma_reg_write(engine, CFG_SRC_ADDR_HIGH, high);
> +	high = upper_32_bits(destination_addr);
> +	low = lower_32_bits(destination_addr);
> +	bdma_reg_write(engine, CFG_DST_ADDR_LOW, low);
> +	bdma_reg_write(engine, CFG_DST_ADDR_HIGH, high);
> +	bdma_reg_write(engine, CFG_LINE, (transfer->line_size >> 5) - 1);
> +	reg = (map_mem[bdma_surface->source_type] <<
> +				SHIFT(BDMA_CFG_CMD_0, SRC_RAM_TYPE)) |
> +		(map_mem[bdma_surface->destination_type] <<
> +				SHIFT(BDMA_CFG_CMD_0, DST_RAM_TYPE));
> +	bdma_reg_write(engine, CFG_CMD, reg);
> +	bdma_reg_write(engine, CFG_LINE_REPEAT, transfer->line_repeat - 1);
> +	bdma_reg_write(engine, CFG_SRC_LINE, transfer->source_line);
> +	bdma_reg_write(engine, CFG_DST_LINE, transfer->destination_line);
> +	bdma_reg_write(engine, CFG_SURF_REPEAT, transfer->surface_repeat - 1);
> +	bdma_reg_write(engine, CFG_SRC_SURF, transfer->source_surface);
> +	bdma_reg_write(engine, CFG_DST_SURF, transfer->destination_surface);
> +	bdma_reg_write(engine, CFG_OP, FIELD_ENUM(BDMA_CFG_OP_0, EN, ENABLE));
> +
> +exit:
> +	return ret;
> +}

... snip

> diff --git a/drivers/gpu/drm/nvdla/nvdla_cache.c b/drivers/gpu/drm/nvdla/nvdla_cache.c
> new file mode 100644
> index 000000000000..f8bd7b514aab
> --- /dev/null
> +++ b/drivers/gpu/drm/nvdla/nvdla_cache.c
> @@ -0,0 +1,215 @@
> +// SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
> +/*
> + * Copyright (C) 2017-2018 NVIDIA CORPORATION
> + * Copyright (C) 2022 Cai Huoqing
> + */
> +
> +#include "nvdla_common.h"
> +#include "nvdla_drm.h"
> +#include "nvdla_reg.h"
> +#include "nvdla_engine.h"
> +
> +#define DLA_OP_CACHE_SIZE (DLA_NUM_GROUPS * ((DLA_OP_NUM + 2) * 2))
> +
> +static struct dla_common_op_desc desc_cache[DLA_OP_NUM][DLA_OP_CACHE_SIZE];
> +static int32_t desc_refcount[DLA_OP_NUM][DLA_OP_CACHE_SIZE];
> +
> +void
> +dla_get_refcount(struct dla_common_op_desc *op_desc)
> +{
> +	int32_t i;
> +	struct dla_common_op_desc *desc = NULL;
> +
> +	if (op_desc == NULL)
> +		return;
> +
> +	if (op_desc->index == -1)
> +		return;
> +
> +	desc = &desc_cache[op_desc->op_type][0];
> +
> +	for (i = 0; i < DLA_OP_CACHE_SIZE; i++, desc++) {
> +		if (desc->index == op_desc->index &&
> +				desc->roi_index == op_desc->roi_index) {

Reverse the if:

		if (desc->index != op_desc->index)
			continue;
		if (desc->roi_index != op_desc->roi_index)
			continue;

> +			desc_refcount[op_desc->op_type][i]++;
> +			return;
> +		}
> +	}
> +}
> +
> +struct dla_common_op_desc *
> +dla_get_op_desc(struct dla_engine *engine,
> +				struct dla_task *task, int16_t index,
> +				uint8_t op_type, uint8_t roi_index)
> +{
> +	int32_t i;
> +	int32_t ret;
> +	uint64_t op_base;
> +	uint64_t dep_graph_addr;
> +	struct dla_common_op_desc *desc = NULL;
> +
> +	if (index == -1) {
> +		pr_debug("no desc get due to index==-1\n");
> +		goto exit;
> +	}
> +
> +	dep_graph_addr = (sizeof(struct dla_common_op_desc) *
> +				engine->network->num_operations * roi_index);
> +
> +	desc = &desc_cache[op_type][0];
> +
> +	for (i = 0; i < DLA_OP_CACHE_SIZE; i++, desc++) {
> +		if (desc->index == index && desc->roi_index == roi_index) {
> +			if (desc->op_type != op_type) {
> +				pr_err("op_cache[op=%u] contains incorrect entry of op[%u]\n",
> +					   op_type, desc->op_type);
> +				continue;
> +			}

Reverse the ifs so this will be pretty clean:

		if (desc->index != index)
			continue;
		if (desc->roi_index != roi_index)
			continue;
		if (desc->op_type != op_type) {
			pr_err("op_cache[op=%u] contains incorrect entry of op[%u]\n",
					op_type, desc->op_type);
			continue;
		}


> +			desc_refcount[op_type][i]++;
> +			goto exit;
> +		}
> +	}
> +
> +	desc = &desc_cache[op_type][0];
> +
> +	for (i = 0; i < DLA_OP_CACHE_SIZE; i++, desc++) {
> +		if (desc->index == -1) {

Reverse the if:
		if (desc->index != -1)
			continue;

> +			op_base = dep_graph_addr +
> +					(sizeof(struct dla_common_op_desc) *
> +					(uint64_t)index);
> +			ret = dla_data_read(engine->driver_context,
> +					task->task_data,
> +					task->dependency_graph_addr,
> +					(void *)(desc),
> +					sizeof(struct dla_common_op_desc),
> +					op_base);
> +			if (ret) {
> +				desc = NULL;
> +				goto exit;
> +			}
> +
> +			if (op_type != desc->op_type) {
> +				/*
> +				 * op_type of entry read from DRAM should not
> +				 * mismatch with given op_type. If they
> +				 * mismatches, then wrong entry is fetched, so
> +				 * report this issue by throwing error.
> +				 */
> +				pr_err("Fetched [op_type=%u] from DRAM doesn't match with op_type[%u]\n",
> +					   desc->op_type, op_type);
> +				desc->op_type = op_type;
> +				desc->index = -1;
> +				desc->roi_index = -1;
> +				desc = NULL;
> +				goto exit;
> +			}
> +
> +			desc->index = index;
> +			desc->roi_index = roi_index;
> +
> +			desc_refcount[op_type][i]++;
> +			goto exit;
> +		}
> +	}
> +
> +exit:
> +	return desc;
> +}
> +
> +static void
> +dla_free_op_desc(struct dla_engine *engine, struct dla_common_op_desc *op_desc)
> +{
> +	uint64_t op_base;
> +	uint64_t dep_graph_addr;
> +	struct dla_task *task;
> +
> +	pr_debug("Enter: %s op desc index %u ROI %d\n", __func__,
> +				op_desc->index, op_desc->roi_index);

Possible NULL pointer dereference.

> +	task = engine->task;
> +	dep_graph_addr = (sizeof(struct dla_common_op_desc) *
> +				engine->network->num_operations *
> +				op_desc->roi_index);
> +
> +	if (op_desc->index == -1)
> +		goto exit;

Possible NULL pointer dereference.

> +	if (op_desc == NULL)
> +		goto exit;

Or else this check is unnecessary.
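
The general shape of the problem, as a hypothetical sketch (not the
driver's actual code): when a NULL argument is possible, the pointer
check must come before the first dereference, otherwise it is dead code.

```c
#include <assert.h>
#include <stddef.h>

struct desc {
	int index;
};

/* Buggy ordering: desc is dereferenced before the NULL check,
 * so the check can never help. */
static int release_desc_buggy(struct desc *desc)
{
	if (desc->index == -1)		/* crashes here if desc == NULL */
		return 0;
	if (desc == NULL)		/* dead code: already dereferenced */
		return 0;
	desc->index = -1;
	return 1;
}

/* Fixed ordering: validate the pointer first. */
static int release_desc_fixed(struct desc *desc)
{
	if (desc == NULL)
		return 0;
	if (desc->index == -1)
		return 0;
	desc->index = -1;
	return 1;
}
```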

> +
> +	/**
> +	 * TODO: keeping the depth value hardcoded as 0 for now,
> +	 * need to replace it once corresponding implementation is done.
> +	 */
> +	op_base = (dep_graph_addr +
> +			(sizeof(struct dla_common_op_desc) *
> +			(uint64_t)op_desc->index));
> +
> +	/**
> +	 * Flush descriptor to DRAM
> +	 */
> +	dla_data_write(engine->driver_context,
> +			task->task_data,
> +			(void *)op_desc,
> +			task->dependency_graph_addr,
> +			sizeof(struct dla_common_op_desc),
> +			op_base);
> +
> +	/**
> +	 * Release it
> +	 */
> +	op_desc->index = -1;
> +	op_desc->roi_index = -1;
> +exit:
> +	return;
> +}
> +
> +void
> +dla_put_op_desc(struct dla_engine *engine, struct dla_common_op_desc *op_desc)
> +{
> +	int32_t i;
> +	struct dla_common_op_desc *desc;
> +
> +	if (op_desc == NULL)
> +		return;
> +
> +	if (op_desc->index == -1)
> +		return;
> +
> +	desc = &desc_cache[op_desc->op_type][0];
> +
> +	for (i = 0; i < DLA_OP_CACHE_SIZE; i++, desc++) {
> +		if (desc->index == op_desc->index &&
> +				desc->roi_index == op_desc->roi_index) {

Reverse the if:

		if (desc->index != op_desc->index)
			continue;
		if (desc->roi_index != op_desc->roi_index)
			continue;

> +
> +			desc_refcount[op_desc->op_type][i]--;
> +
> +			/**
> +			 * Free desc if refcount is 0
> +			 */
A pretty useless comment, and it certainly does not need three lines.

> +			if (desc_refcount[op_desc->op_type][i] == 0)
> +				dla_free_op_desc(engine, op_desc);
> +
> +			return;
> +		}
> +	}
> +}
> +
> +void
> +dla_init_op_cache(struct dla_engine *engine)
> +{
> +	int32_t i, j;
> +	struct dla_common_op_desc *desc = &desc_cache[0][0];
> +
> +	memset((uint8_t *)&desc_cache[0][0], 0, sizeof(desc_cache));
> +	memset((uint8_t *)&desc_refcount[0][0], 0, sizeof(desc_refcount));
> +
> +	for (i = 0; i < DLA_OP_NUM; i++) {
> +		for (j = 0; j < DLA_OP_CACHE_SIZE; j++) {
> +			desc->index = -1;
> +			desc->roi_index = -1;
> +			desc->op_type = (uint8_t)i;
> +			desc++;
> +		}
> +	}
> +}

... snip

> diff --git a/drivers/gpu/drm/nvdla/nvdla_common.h b/drivers/gpu/drm/nvdla/nvdla_common.h
> new file mode 100644
> index 000000000000..38cf43246890
> --- /dev/null
> +++ b/drivers/gpu/drm/nvdla/nvdla_common.h
> @@ -0,0 +1,835 @@

... snip

> +struct dla_conv_op_desc {
> +	/* Performance parameters */
> +
> +	/* dla_conv_mode */
> +	uint8_t conv_mode;
> +	uint8_t data_reuse;
> +	uint8_t weight_reuse;
> +	uint8_t skip_data_rls;
> +
> +	uint8_t skip_weight_rls;
> +	uint8_t reserved0;
> +	uint16_t entry_per_slice;
> +
> +	/* dla_data_format */
> +	uint8_t data_format;
> +	/* dla_pixel_mapping */
> +	uint8_t pixel_mapping;
> +	/* number of free slices before fetch */
> +	uint16_t fetch_grain;
> +
> +	uint8_t reserved_b[8];
> +
> +	/* batch_num */
> +	uint8_t batch;
> +	/* dla_weight_format */
> +	uint8_t weight_format;
> +	uint8_t data_bank;
> +	uint8_t weight_bank;
> +
> +	/* the offset in bytes of each data cube in a batch */
> +	uint32_t batch_stride;
> +
> +	uint8_t post_extension;
> +	uint8_t pixel_override;
> +	/* number of slices need to be released */
> +	uint16_t release;
> +
> +	 /* The input cube dimension for CSC */
> +	uint16_t input_width_csc;
> +	uint16_t input_height_csc;
> +
> +	uint16_t input_channel_csc;
> +	uint16_t kernel_width_csc;
> +
> +	uint16_t kernel_height_csc;
> +	uint16_t kernel_channel_csc;
> +
> +	/* The input cube dimension for CMAC */
> +	uint16_t input_width_cmac;
> +	uint16_t input_height_cmac;
> +
> +	/* actual size in bytes */
> +	uint32_t bytes_per_kernel;
> +
> +	/* Algorithm parameters */
> +
> +	int16_t mean_ry; /* mean value for red in RGB or Y in YUV */
> +	int16_t mean_gu; /* mean value for green in RGB or U in YUV */
> +
> +	int16_t mean_bv; /* mean value for blue in RGB or V in YUV */
> +	int16_t mean_ax;
> +
> +	uint8_t mean_format; /* dla_mean_format */
> +	uint8_t conv_stride_x;
> +	uint8_t conv_stride_y;
> +	uint8_t pad_x_left;
> +
> +	uint8_t pad_x_right;
> +	uint8_t pad_y_top;
> +	uint8_t pad_y_bottom;
> +	uint8_t dilation_x;
> +
> +	uint8_t dilation_y;
> +	uint8_t reserved2[2];
> +
> +	/* Precision parameters */
> +	uint8_t pra_truncate;
> +
> +	uint8_t in_precision;
> +	/* The output precision from CONV, it's the MAC processing precison */

./nvdla_common.h:428: precison ==> precision

> +	uint8_t out_precision;
> +	int16_t pad_val;
> +
> +	/* input converter parameters */
> +	struct dla_cvt_param in_cvt;
> +	/* output converter parameters, support truncate only */
> +	struct dla_cvt_param out_cvt;
> +
> +} __packed __aligned(4);
> +
> +struct dla_conv_stat_desc {
> +	uint32_t data_read_stall;
> +	uint32_t weight_read_stall;
> +	uint32_t data_read_latency;
> +	uint32_t weight_read_latency;
> +	uint32_t saturation_count;
> +	uint32_t nan_data_num;
> +	uint32_t nan_weight_num;
> +	uint32_t inf_data_num;
> +	uint32_t inf_weight_num;
> +} __packed __aligned(4);
> +
> +/**
> + * @ingroup SDP
> + * @name Activation functions
> + * @brief Activation functions supported in SDP
> + * @{
> + */
> +#define ACTIVATION_NONE		0
> +#define ACTIVATION_RELU		1
> +#define ACTIVATION_LUT		2
> +#define ACTIVATION_PRELU	3
> +/** @} */
> +
> +/**
> + * @ingroup LUT
> + * @name LUT size
> + * @brief LUT sizes for linear and exponentila LUT
> + * @{
> + */
> +#define LUT_LINEAR_EXP_TABLE_ENTRY_LOG2		6
> +#define LUT_LINEAR_ONLY_TABLE_ENTRY_LOG2	8
> +/** @} */
> +
> +/**
> + * @ingroup LUT
> + * @name LUT types
> + * @brief DLA supports two types of LUT, linear and exonential
> + * @{
> + */
> +#define LUT_LINEAR_EXP_TABLE		0
> +#define LUT_LINEAR_ONLY_TABLE		1
> +/** @} */
> +
> +/**
> + * @ingroup LUT
> + * @name LUT methods
> + * @brief DLA supports two types of LUT, linear and exonential
> + * @{
> + */
> +#define LUT_METHOD_EXPONENTIAL		0
> +#define LUT_METHOD_LINEAR		1
> +/** @} */
> +
> +/**
> + * @ingroup LUT
> + * @name LUT
> + * @brief DLA supports two types of LUT, linear and exonential
> + * @{
> + */
> +#define LUT_PRI_LINEAR_EXP		0
> +#define LUT_PRI_LINEAR_ONLY		1
> +/** @} */
> +
> +union dla_lut_offset {
> +	/**
> +	 * Number should be substracted on log domain before look up

./nvdla_common.h:505: substracted ==> subtracted

> +	 * exponetial table it has the same definition as hardware

./nvdla_common.h:506: exponetial ==> exponential

> +	 * thus input scaling should also take into account when
> +	 * set this field.
> +	 */
> +	int8_t exp_offset;
> +	/**
> +	 * Number of bits should be right shift before looking
> +	 * up linear table
> +	 */
> +	int8_t frac_bits;
> +	uint16_t reserved0;
> +};

... snip

> diff --git a/drivers/gpu/drm/nvdla/nvdla_drm.c b/drivers/gpu/drm/nvdla/nvdla_drm.c
> new file mode 100644
> index 000000000000..9217eee1de3b
> --- /dev/null
> +++ b/drivers/gpu/drm/nvdla/nvdla_drm.c
> @@ -0,0 +1,695 @@
> +// SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
> +/*
> + * Copyright (C) 2017-2018 NVIDIA CORPORATION
> + * Copyright (C) 2022 Cai Huoqing
> + */
> +
> +#include <linux/dma-buf.h>
> +#include <linux/dma-mapping.h>
> +#include <linux/fs.h>
> +#include <linux/interrupt.h>
> +#include <linux/irq.h>
> +#include <linux/irqdomain.h>
> +#include <linux/module.h>
> +#include <linux/of.h>
> +#include <linux/of_device.h>
> +#include <linux/of_irq.h>
> +#include <linux/of_platform.h>
> +#include <linux/platform_device.h>
> +#include <linux/printk.h>
> +#include <linux/slab.h>
> +#include <linux/spinlock.h>
> +#include <linux/time.h>
> +#include <linux/uaccess.h>
> +#include <linux/types.h>
> +
> +#include "nvdla_drm.h"
> +#include "nvdla_ioctl.h"
> +#include "nvdla_engine.h"
> +
> +static struct nvdla_config nvdla_config_os_initial = {
> +	.atom_size = 32,
> +	.bdma_enable = true,
> +	.rubik_enable = true,
> +	.weight_compress_support = true,
> +};
> +
> +static struct nvdla_config nvdla_config_small = {
> +	//.atom_size = 8,
> +	.atom_size = 32,  // nv_large config
> +	.bdma_enable = false,
> +	.rubik_enable = false,
> +	.weight_compress_support = false,
> +};
> +
> +int64_t dla_get_time_us(void)

This function is never used.

> +{
> +	return ktime_get_ns() / NSEC_PER_USEC;
> +}
> +
> +void dla_reg_write(void *driver_context, uint32_t addr, uint32_t reg)
> +{
> +	struct nvdla_device *nvdla_dev =
> +			(struct nvdla_device *)driver_context;
> +
> +	if (!nvdla_dev)
> +		return;
> +
> +	writel(reg, nvdla_dev->base + addr);
> +}
> +
> +uint32_t dla_reg_read(void *driver_context, uint32_t addr)
> +{
> +	struct nvdla_device *nvdla_dev =
> +			(struct nvdla_device *)driver_context;
> +
> +	if (!nvdla_dev)
> +		return 0;
> +
> +	return readl(nvdla_dev->base + addr);
> +}
> +
> +static irqreturn_t nvdla_engine_isr(int32_t irq, void *data)
> +{
> +	unsigned long flags;
> +	uint32_t mask;
> +	uint32_t reg;
> +	struct dla_processor *processor = NULL;
> +	struct dla_processor_group *group;
> +	struct dla_engine *engine;
> +	struct nvdla_device *nvdla_dev = (struct nvdla_device *)data;
> +
> +	if (!nvdla_dev)
> +		return IRQ_NONE;
> +
> +	engine = nvdla_dev->engine_context;
> +	spin_lock_irqsave(&nvdla_dev->nvdla_lock, flags);
> +
> +	mask = glb_reg_read(engine, S_INTR_MASK);

The value read here is never used. It would be nice to clean these up so
that static analyzers no longer complain, but it is your choice what to do.

> +	reg = glb_reg_read(engine, S_INTR_STATUS);
> +
> +	if (reg & MASK(GLB_S_INTR_STATUS_0, CACC_DONE_STATUS0)) {
> +		processor = &engine->processors[DLA_OP_CONV];
> +		group = &processor->groups[0];
> +		group->events |= (1 << DLA_EVENT_OP_COMPLETED);
> +	}
> +	if (reg & MASK(GLB_S_INTR_STATUS_0, CACC_DONE_STATUS1)) {
> +		processor = &engine->processors[DLA_OP_CONV];
> +		group = &processor->groups[1];
> +		group->events |= (1 << DLA_EVENT_OP_COMPLETED);
> +	}
> +	if (reg & MASK(GLB_S_INTR_STATUS_0, SDP_DONE_STATUS0)) {
> +		processor = &engine->processors[DLA_OP_SDP];
> +		group = &processor->groups[0];
> +		group->events |= (1 << DLA_EVENT_OP_COMPLETED);
> +	}
> +	if (reg & MASK(GLB_S_INTR_STATUS_0, SDP_DONE_STATUS1)) {
> +		processor = &engine->processors[DLA_OP_SDP];
> +		group = &processor->groups[1];
> +		group->events |= (1 << DLA_EVENT_OP_COMPLETED);
> +	}
> +	if (reg & MASK(GLB_S_INTR_STATUS_0, CDP_DONE_STATUS0)) {
> +		processor = &engine->processors[DLA_OP_CDP];
> +		group = &processor->groups[0];
> +		group->events |= (1 << DLA_EVENT_OP_COMPLETED);
> +	}
> +	if (reg & MASK(GLB_S_INTR_STATUS_0, CDP_DONE_STATUS1)) {
> +		processor = &engine->processors[DLA_OP_CDP];
> +		group = &processor->groups[1];
> +		group->events |= (1 << DLA_EVENT_OP_COMPLETED);
> +	}
> +	if (reg & MASK(GLB_S_INTR_STATUS_0, RUBIK_DONE_STATUS0)) {
> +		processor = &engine->processors[DLA_OP_RUBIK];
> +		group = &processor->groups[0];
> +		group->events |= (1 << DLA_EVENT_OP_COMPLETED);
> +	}
> +	if (reg & MASK(GLB_S_INTR_STATUS_0, RUBIK_DONE_STATUS1)) {
> +		processor = &engine->processors[DLA_OP_RUBIK];
> +		group = &processor->groups[1];
> +		group->events |= (1 << DLA_EVENT_OP_COMPLETED);
> +	}
> +	if (reg & MASK(GLB_S_INTR_STATUS_0, PDP_DONE_STATUS0)) {
> +		processor = &engine->processors[DLA_OP_PDP];
> +		group = &processor->groups[0];
> +		group->events |= (1 << DLA_EVENT_OP_COMPLETED);
> +	}
> +	if (reg & MASK(GLB_S_INTR_STATUS_0, PDP_DONE_STATUS1)) {
> +		processor = &engine->processors[DLA_OP_PDP];
> +		group = &processor->groups[1];
> +		group->events |= (1 << DLA_EVENT_OP_COMPLETED);
> +	}
> +	if (reg & MASK(GLB_S_INTR_STATUS_0, BDMA_DONE_STATUS0)) {
> +		processor = &engine->processors[DLA_OP_BDMA];
> +		group = &processor->groups[0];
> +		group->events |= (1 << DLA_EVENT_OP_COMPLETED);
> +	}
> +	if (reg & MASK(GLB_S_INTR_STATUS_0, BDMA_DONE_STATUS1)) {
> +		processor = &engine->processors[DLA_OP_BDMA];
> +		group = &processor->groups[1];
> +		group->events |= (1 << DLA_EVENT_OP_COMPLETED);
> +	}
> +	if (reg & MASK(GLB_S_INTR_STATUS_0, CDMA_DAT_DONE_STATUS0)) {
> +		processor = &engine->processors[DLA_OP_CONV];
> +		group = &processor->groups[0];
> +		group->events |= (1 << DLA_EVENT_CDMA_DT_DONE);
> +	}
> +	if (reg & MASK(GLB_S_INTR_STATUS_0, CDMA_DAT_DONE_STATUS1)) {
> +		processor = &engine->processors[DLA_OP_CONV];
> +		group = &processor->groups[1];
> +		group->events |= (1 << DLA_EVENT_CDMA_DT_DONE);
> +	}
> +	if (reg & MASK(GLB_S_INTR_STATUS_0, CDMA_WT_DONE_STATUS0)) {
> +		processor = &engine->processors[DLA_OP_CONV];
> +		group = &processor->groups[0];
> +		group->events |= (1 << DLA_EVENT_CDMA_WT_DONE);
> +	}
> +	if (reg & MASK(GLB_S_INTR_STATUS_0, CDMA_WT_DONE_STATUS1)) {
> +		processor = &engine->processors[DLA_OP_CONV];
> +		group = &processor->groups[1];
> +		group->events |= (1 << DLA_EVENT_CDMA_WT_DONE);
> +	}
> +
> +	glb_reg_write(engine, S_INTR_STATUS, reg);
> +	mask = glb_reg_read(engine, S_INTR_MASK);

Never used.

> +	reg = glb_reg_read(engine, S_INTR_STATUS);

Never used.

> +
> +	complete(&nvdla_dev->event_notifier);
> +	spin_unlock_irqrestore(&nvdla_dev->nvdla_lock, flags);
> +
> +	return IRQ_HANDLED;
> +}

... snip

> diff --git a/drivers/gpu/drm/nvdla/nvdla_gem.c b/drivers/gpu/drm/nvdla/nvdla_gem.c
> new file mode 100644
> index 000000000000..cccf6d01a564
> --- /dev/null
> +++ b/drivers/gpu/drm/nvdla/nvdla_gem.c

... snip

> +static const struct drm_ioctl_desc nvdla_drm_ioctls[] = {
> +	DRM_IOCTL_DEF_DRV(NVDLA_SUBMIT, nvdla_submit, DRM_RENDER_ALLOW),
> +	DRM_IOCTL_DEF_DRV(NVDLA_GEM_CREATE, nvdla_gem_create, DRM_RENDER_ALLOW),
> +	DRM_IOCTL_DEF_DRV(NVDLA_GEM_MMAP, nvdla_gem_map_offset, DRM_RENDER_ALLOW),
> +	/* use DRM_IOCTL_MODE_DESTROY_DUMB to destory */

./nvdla_gem.c:347: destory ==> destroy

> +};

... snip

> diff --git a/drivers/gpu/drm/nvdla/nvdla_scheduler.c b/drivers/gpu/drm/nvdla/nvdla_scheduler.c
> new file mode 100644
> index 000000000000..b814077478c6
> --- /dev/null
> +++ b/drivers/gpu/drm/nvdla/nvdla_scheduler.c

... snip

> +static int
> +dla_update_dependency(struct dla_engine *engine,
> +					  struct dla_consumer *consumer,
> +					  struct dla_common_op_desc *op_desc,
> +					  uint8_t event, uint8_t roi_index)
> +{
> +	int32_t ret = 0;
> +	struct dla_processor *processor;
> +
> +	if (consumer->index == -1)
> +		goto exit;
> +
> +	/* Update dependency only if event matches */
> +	if (event != consumer->event)
> +		goto exit;
> +
> +	/**
> +	 * If consumer index is valid but op desc is NULL means
> +	 * op desc for consumer was not pre-fetched
> +	 */
> +	if (op_desc == NULL) {
> +		ret = -EINVAL;
> +		pr_err("Operation descriptor is NULL, consumer index %d",
> +				consumer->index);
> +		goto exit;
> +	}
> +
> +	pr_debug("Update dependency operation index %d ROI %d DEP_COUNT=%d\n",
> +					op_desc->index, op_desc->roi_index,
> +					op_desc->dependency_count);
> +	op_desc->dependency_count--;
> +
> +	if (op_desc->dependency_count == 0) {
> +		processor = &engine->processors[op_desc->op_type];
> +		pr_debug("enable %s in %s as depdency are resolved\n",

./nvdla_scheduler.c:455: depdency ==> dependency

> +			processor->name, __func__);
> +
> +		ret = dla_enable_operation(engine, processor, op_desc);
> +		if (ret)
> +			goto exit;
> +	}
> +exit:
> +	return ret;
> +}

... snip

> +int
> +dla_process_events(struct dla_engine *engine, uint32_t *task_complete)
> +{
> +	int32_t i;
> +	int32_t ret = 0;
> +
> +	for (i = 0; i < DLA_OP_NUM; i++) {
> +		struct dla_processor *processor;
> +
> +		processor = &engine->processors[i];
> +		ret = dla_handle_events(engine, processor);
> +		/**
> +		 * Incase engine status is non-zero, then don't

./nvdla_scheduler.c:905: Incase ==> In case

> +		 * update the engine status. We should keep its
> +		 * status for later cleaning of engine.
> +		 */
> +		if (!engine->status)
> +			engine->status = ret;
> +	}
> +
> +	if (engine->network->num_operations == engine->num_proc_hwl)
> +		*task_complete = 1;
> +
> +	return ret;
> +}

... snip

   Argillander

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/2] drm/nvdla: Add driver support for NVDLA
  2022-04-21 22:01   ` Kari Argillander
@ 2022-04-25 14:28     ` Cai Huoqing
  0 siblings, 0 replies; 15+ messages in thread
From: Cai Huoqing @ 2022-04-25 14:28 UTC (permalink / raw)
  To: Kari Argillander
  Cc: Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Daniel Vetter, Sumit Semwal, Christian König, linux-kernel,
	dri-devel, linux-media, linaro-mm-sig

On 22 Apr 22 01:01:14, Kari Argillander wrote:
> This is just a quick look. I basically checked some style issues and did
> some basic static analysis.
> 
> I have run:
>  - cppcheck (which found a couple of mistakes)
>  - flawfinder (which did not find anything, to my eyes)
>  - codespell (which did find a couple of typos)
> 
> You can run these yourself as well, or check the findings below.
> 
> A couple of common things, which you can take or leave:
> - This code often has a goto exit that is just a return. Maybe
>   use a direct return instead; there is no need to jump.
> - Some comments start with a capital letter, others do not. Maybe all
>   should start with a capital. A very small nit, but it gives the code
>   a nice touch.
> - A lot of one-line comments are written as unnecessary three-line
>   comments.
> 
> On 19.4.2022 16.59, Cai Huoqing wrote:
> > The NVIDIA Deep Learning Accelerator (NVDLA) is an open source IP
> > which is integrated into NVIDIA Jetson AGX Xavier,
> > so add driver support for this accelerator.
> > 
> > Signed-off-by: Cai Huoqing <cai.huoqing@linux.dev>
> 
> ... snip
> 
> > diff --git a/drivers/gpu/drm/nvdla/nvdla_bdma.c b/drivers/gpu/drm/nvdla/nvdla_bdma.c
> > new file mode 100644
> > index 000000000000..225613f27acf
> > --- /dev/null
> > +++ b/drivers/gpu/drm/nvdla/nvdla_bdma.c
> 
> ... snip
> 
> > +static int32_t
> > +processor_bdma_program_slot(struct dla_engine *engine,
> > +							struct dla_bdma_surface_desc *bdma_surface,
> > +							struct dla_bdma_transfer_desc *transfer)
> > +{
> > +	int32_t ret = 0;
> > +	uint64_t source_addr = 0;
> > +	uint64_t destination_addr = 0;
> > +	uint32_t high, low, reg;
> > +	uint8_t  bdma_free_slots = 0;
> > +
> > +	/* make sure there're enough free slots */
> > +	if (bdma_free_slots <= 0) {
> 
> This is always true right now.
> 
> > +		do {
> > +			reg = bdma_reg_read(engine, STATUS);
> > +			reg = (reg & MASK(BDMA_STATUS_0, FREE_SLOT)) >>
> > +					SHIFT(BDMA_STATUS_0, FREE_SLOT);
> > +		} while (reg == 0);
> > +		bdma_free_slots = (uint8_t)reg;
> > +	}
> > +
> > +	dla_get_dma_address(engine->driver_context, engine->task->task_data,
> > +						transfer->source_address,
> > +						(void *)&source_addr,
> > +						DESTINATION_DMA);
> > +	dla_get_dma_address(engine->driver_context, engine->task->task_data,
> > +						transfer->destination_address,
> > +						(void *)&destination_addr,
> > +						DESTINATION_DMA);
> > +
> > +	ASSERT_GOTO((transfer->line_repeat <= 8192),
> > +				ret, -EINVAL, exit);
> > +	ASSERT_GOTO((transfer->surface_repeat <= 8192),
> > +				ret, -EINVAL, exit);
> > +	ASSERT_GOTO((transfer->line_size % 32) == 0,
> > +				ret, -EINVAL, exit);
> > +	ASSERT_GOTO(transfer->source_line >= transfer->line_size,
> > +				ret, -EINVAL, exit);
> > +	ASSERT_GOTO(transfer->destination_line >= transfer->line_size,
> > +				ret, -EINVAL, exit);
> > +	ASSERT_GOTO(transfer->source_surface >=
> > +			(transfer->source_line * transfer->line_repeat),
> > +				ret, -EINVAL, exit);
> > +	ASSERT_GOTO(transfer->destination_surface >=
> > +			(transfer->destination_line * transfer->line_repeat),
> > +				ret, -EINVAL, exit);
> > +
> > +	/* config registers */
> > +	high = upper_32_bits(source_addr);
> > +	low = lower_32_bits(source_addr);
> > +	bdma_reg_write(engine, CFG_SRC_ADDR_LOW, low);
> > +	bdma_reg_write(engine, CFG_SRC_ADDR_HIGH, high);
> > +	high = upper_32_bits(destination_addr);
> > +	low = lower_32_bits(destination_addr);
> > +	bdma_reg_write(engine, CFG_DST_ADDR_LOW, low);
> > +	bdma_reg_write(engine, CFG_DST_ADDR_HIGH, high);
> > +	bdma_reg_write(engine, CFG_LINE, (transfer->line_size >> 5) - 1);
> > +	reg = (map_mem[bdma_surface->source_type] <<
> > +				SHIFT(BDMA_CFG_CMD_0, SRC_RAM_TYPE)) |
> > +		(map_mem[bdma_surface->destination_type] <<
> > +				SHIFT(BDMA_CFG_CMD_0, DST_RAM_TYPE));
> > +	bdma_reg_write(engine, CFG_CMD, reg);
> > +	bdma_reg_write(engine, CFG_LINE_REPEAT, transfer->line_repeat - 1);
> > +	bdma_reg_write(engine, CFG_SRC_LINE, transfer->source_line);
> > +	bdma_reg_write(engine, CFG_DST_LINE, transfer->destination_line);
> > +	bdma_reg_write(engine, CFG_SURF_REPEAT, transfer->surface_repeat - 1);
> > +	bdma_reg_write(engine, CFG_SRC_SURF, transfer->source_surface);
> > +	bdma_reg_write(engine, CFG_DST_SURF, transfer->destination_surface);
> > +	bdma_reg_write(engine, CFG_OP, FIELD_ENUM(BDMA_CFG_OP_0, EN, ENABLE));
> > +
> > +exit:
> > +	return ret;
> > +}
> 
> ... snip
> 
> > diff --git a/drivers/gpu/drm/nvdla/nvdla_cache.c b/drivers/gpu/drm/nvdla/nvdla_cache.c
> > new file mode 100644
> > index 000000000000..f8bd7b514aab
> > --- /dev/null
> > +++ b/drivers/gpu/drm/nvdla/nvdla_cache.c
> > @@ -0,0 +1,215 @@
> > +// SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
> > +/*
> > + * Copyright (C) 2017-2018 NVIDIA CORPORATION
> > + * Copyright (C) 2022 Cai Huoqing
> > + */
> > +
> > +#include "nvdla_common.h"
> > +#include "nvdla_drm.h"
> > +#include "nvdla_reg.h"
> > +#include "nvdla_engine.h"
> > +
> > +#define DLA_OP_CACHE_SIZE (DLA_NUM_GROUPS * ((DLA_OP_NUM + 2) * 2))
> > +
> > +static struct dla_common_op_desc desc_cache[DLA_OP_NUM][DLA_OP_CACHE_SIZE];
> > +static int32_t desc_refcount[DLA_OP_NUM][DLA_OP_CACHE_SIZE];
> > +
> > +void
> > +dla_get_refcount(struct dla_common_op_desc *op_desc)
> > +{
> > +	int32_t i;
> > +	struct dla_common_op_desc *desc = NULL;
> > +
> > +	if (op_desc == NULL)
> > +		return;
> > +
> > +	if (op_desc->index == -1)
> > +		return;
> > +
> > +	desc = &desc_cache[op_desc->op_type][0];
> > +
> > +	for (i = 0; i < DLA_OP_CACHE_SIZE; i++, desc++) {
> > +		if (desc->index == op_desc->index &&
> > +				desc->roi_index == op_desc->roi_index) {
> 
> Reverse the if:
> 
> 		if (desc->index != op_desc->index)
> 			continue;
> 		if (desc->roi_index != op_desc->roi_index)
> 			continue;
> 
> > +			desc_refcount[op_desc->op_type][i]++;
> > +			return;
> > +		}
> > +	}
> > +}
> > +
> > +struct dla_common_op_desc *
> > +dla_get_op_desc(struct dla_engine *engine,
> > +				struct dla_task *task, int16_t index,
> > +				uint8_t op_type, uint8_t roi_index)
> > +{
> > +	int32_t i;
> > +	int32_t ret;
> > +	uint64_t op_base;
> > +	uint64_t dep_graph_addr;
> > +	struct dla_common_op_desc *desc = NULL;
> > +
> > +	if (index == -1) {
> > +		pr_debug("no desc get due to index==-1\n");
> > +		goto exit;
> > +	}
> > +
> > +	dep_graph_addr = (sizeof(struct dla_common_op_desc) *
> > +				engine->network->num_operations * roi_index);
> > +
> > +	desc = &desc_cache[op_type][0];
> > +
> > +	for (i = 0; i < DLA_OP_CACHE_SIZE; i++, desc++) {
> > +		if (desc->index == index && desc->roi_index == roi_index) {
> > +			if (desc->op_type != op_type) {
> > +				pr_err("op_cache[op=%u] contains incorrect entry of op[%u]\n",
> > +					   op_type, desc->op_type);
> > +				continue;
> > +			}
> 
> Reverse the ifs so this will be pretty clean:
> 
> 		if (desc->index != index)
> 			continue;
> 		if (desc->roi_index != roi_index)
> 			continue;
> 		if (desc->op_type != op_type) {
> 			pr_err("op_cache[op=%u] contains incorrect entry of op[%u]\n",
> 					op_type, desc->op_type);
> 			continue;
> 		}
> 
> 
> > +			desc_refcount[op_type][i]++;
> > +			goto exit;
> > +		}
> > +	}
> > +
> > +	desc = &desc_cache[op_type][0];
> > +
> > +	for (i = 0; i < DLA_OP_CACHE_SIZE; i++, desc++) {
> > +		if (desc->index == -1) {
> 
> Reverse the if:
> 		if (desc->index != -1)
> 			continue;
> 
> > +			op_base = dep_graph_addr +
> > +					(sizeof(struct dla_common_op_desc) *
> > +					(uint64_t)index);
> > +			ret = dla_data_read(engine->driver_context,
> > +					task->task_data,
> > +					task->dependency_graph_addr,
> > +					(void *)(desc),
> > +					sizeof(struct dla_common_op_desc),
> > +					op_base);
> > +			if (ret) {
> > +				desc = NULL;
> > +				goto exit;
> > +			}
> > +
> > +			if (op_type != desc->op_type) {
> > +				/*
> > +				 * op_type of entry read from DRAM should not
> > +				 * mismatch with given op_type. If they
> > +				 * mismatches, then wrong entry is fetched, so
> > +				 * report this issue by throwing error.
> > +				 */
> > +				pr_err("Fetched [op_type=%u] from DRAM doesn't match with op_type[%u]\n",
> > +					   desc->op_type, op_type);
> > +				desc->op_type = op_type;
> > +				desc->index = -1;
> > +				desc->roi_index = -1;
> > +				desc = NULL;
> > +				goto exit;
> > +			}
> > +
> > +			desc->index = index;
> > +			desc->roi_index = roi_index;
> > +
> > +			desc_refcount[op_type][i]++;
> > +			goto exit;
> > +		}
> > +	}
> > +
> > +exit:
> > +	return desc;
> > +}
> > +
> > +static void
> > +dla_free_op_desc(struct dla_engine *engine, struct dla_common_op_desc *op_desc)
> > +{
> > +	uint64_t op_base;
> > +	uint64_t dep_graph_addr;
> > +	struct dla_task *task;
> > +
> > +	pr_debug("Enter: %s op desc index %u ROI %d\n", __func__,
> > +				op_desc->index, op_desc->roi_index);
> 
> Possible NULL pointer dereference.
> 
> > +	task = engine->task;
> > +	dep_graph_addr = (sizeof(struct dla_common_op_desc) *
> > +				engine->network->num_operations *
> > +				op_desc->roi_index);
> > +
> > +	if (op_desc->index == -1)
> > +		goto exit;
> 
> Possible NULL pointer dereference.
> 
> > +	if (op_desc == NULL)
> > +		goto exit;
> 
> Or else this check is unnecessary.
> 
> > +
> > +	/**
> > +	 * TODO: keeping the depth value hardcoded as 0 for now,
> > +	 * need to replace it once corresponding implementation is done.
> > +	 */
> > +	op_base = (dep_graph_addr +
> > +			(sizeof(struct dla_common_op_desc) *
> > +			(uint64_t)op_desc->index));
> > +
> > +	/**
> > +	 * Flush descriptor to DRAM
> > +	 */
> > +	dla_data_write(engine->driver_context,
> > +			task->task_data,
> > +			(void *)op_desc,
> > +			task->dependency_graph_addr,
> > +			sizeof(struct dla_common_op_desc),
> > +			op_base);
> > +
> > +	/**
> > +	 * Release it
> > +	 */
> > +	op_desc->index = -1;
> > +	op_desc->roi_index = -1;
> > +exit:
> > +	return;
> > +}
> > +
> > +void
> > +dla_put_op_desc(struct dla_engine *engine, struct dla_common_op_desc *op_desc)
> > +{
> > +	int32_t i;
> > +	struct dla_common_op_desc *desc;
> > +
> > +	if (op_desc == NULL)
> > +		return;
> > +
> > +	if (op_desc->index == -1)
> > +		return;
> > +
> > +	desc = &desc_cache[op_desc->op_type][0];
> > +
> > +	for (i = 0; i < DLA_OP_CACHE_SIZE; i++, desc++) {
> > +		if (desc->index == op_desc->index &&
> > +				desc->roi_index == op_desc->roi_index) {
> 
> Reverse the conditions and continue early to reduce nesting:
> 
> 		if (desc->index != op_desc->index)
> 			continue;
> 		if (desc->roi_index != op_desc->roi_index)
> 			continue;
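> As a rough, compilable sketch of that shape (stand-in types and a
> hypothetical helper, not the driver's structs):

```c
#include <assert.h>

#define CACHE_SIZE 4

struct desc {
	int index;
	int roi_index;
};

/*
 * Decrement the refcount of the cache entry matching `key`; return the
 * new refcount, or -1 when no entry matches.  The early `continue`s
 * keep the interesting work at a single indentation level.
 */
static int put_desc(struct desc *cache, int *refcount,
		    const struct desc *key)
{
	int i;

	for (i = 0; i < CACHE_SIZE; i++) {
		if (cache[i].index != key->index)
			continue;
		if (cache[i].roi_index != key->roi_index)
			continue;
		return --refcount[i];
	}
	return -1;
}
```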
> 
> > +
> > +			desc_refcount[op_desc->op_type][i]--;
> > +
> > +			/**
> > +			 * Free desc if refcount is 0
> > +			 */
> This comment just restates the code; it doesn't need three lines.
> 
> > +			if (desc_refcount[op_desc->op_type][i] == 0)
> > +				dla_free_op_desc(engine, op_desc);
> > +
> > +			return;
> > +		}
> > +	}
> > +}
> > +
> > +void
> > +dla_init_op_cache(struct dla_engine *engine)
> > +{
> > +	int32_t i, j;
> > +	struct dla_common_op_desc *desc = &desc_cache[0][0];
> > +
> > +	memset((uint8_t *)&desc_cache[0][0], 0, sizeof(desc_cache));
> > +	memset((uint8_t *)&desc_refcount[0][0], 0, sizeof(desc_refcount));
> > +
> > +	for (i = 0; i < DLA_OP_NUM; i++) {
> > +		for (j = 0; j < DLA_OP_CACHE_SIZE; j++) {
> > +			desc->index = -1;
> > +			desc->roi_index = -1;
> > +			desc->op_type = (uint8_t)i;
> > +			desc++;
> > +		}
> > +	}
> > +}
> 
> ... snip
> 
> > diff --git a/drivers/gpu/drm/nvdla/nvdla_common.h b/drivers/gpu/drm/nvdla/nvdla_common.h
> > new file mode 100644
> > index 000000000000..38cf43246890
> > --- /dev/null
> > +++ b/drivers/gpu/drm/nvdla/nvdla_common.h
> > @@ -0,0 +1,835 @@
> 
> ... snip
> 
> > +struct dla_conv_op_desc {
> > +	/* Performance parameters */
> > +
> > +	/* dla_conv_mode */
> > +	uint8_t conv_mode;
> > +	uint8_t data_reuse;
> > +	uint8_t weight_reuse;
> > +	uint8_t skip_data_rls;
> > +
> > +	uint8_t skip_weight_rls;
> > +	uint8_t reserved0;
> > +	uint16_t entry_per_slice;
> > +
> > +	/* dla_data_format */
> > +	uint8_t data_format;
> > +	/* dla_pixel_mapping */
> > +	uint8_t pixel_mapping;
> > +	/* number of free slices before fetch */
> > +	uint16_t fetch_grain;
> > +
> > +	uint8_t reserved_b[8];
> > +
> > +	/* batch_num */
> > +	uint8_t batch;
> > +	/* dla_weight_format */
> > +	uint8_t weight_format;
> > +	uint8_t data_bank;
> > +	uint8_t weight_bank;
> > +
> > +	/* the offset in bytes of each data cube in a batch */
> > +	uint32_t batch_stride;
> > +
> > +	uint8_t post_extension;
> > +	uint8_t pixel_override;
> > +	/* number of slices need to be released */
> > +	uint16_t release;
> > +
> > +	 /* The input cube dimension for CSC */
> > +	uint16_t input_width_csc;
> > +	uint16_t input_height_csc;
> > +
> > +	uint16_t input_channel_csc;
> > +	uint16_t kernel_width_csc;
> > +
> > +	uint16_t kernel_height_csc;
> > +	uint16_t kernel_channel_csc;
> > +
> > +	/* The input cube dimension for CMAC */
> > +	uint16_t input_width_cmac;
> > +	uint16_t input_height_cmac;
> > +
> > +	/* actual size in bytes */
> > +	uint32_t bytes_per_kernel;
> > +
> > +	/* Algorithm parameters */
> > +
> > +	int16_t mean_ry; /* mean value for red in RGB or Y in YUV */
> > +	int16_t mean_gu; /* mean value for green in RGB or U in YUV */
> > +
> > +	int16_t mean_bv; /* mean value for blue in RGB or V in YUV */
> > +	int16_t mean_ax;
> > +
> > +	uint8_t mean_format; /* dla_mean_format */
> > +	uint8_t conv_stride_x;
> > +	uint8_t conv_stride_y;
> > +	uint8_t pad_x_left;
> > +
> > +	uint8_t pad_x_right;
> > +	uint8_t pad_y_top;
> > +	uint8_t pad_y_bottom;
> > +	uint8_t dilation_x;
> > +
> > +	uint8_t dilation_y;
> > +	uint8_t reserved2[2];
> > +
> > +	/* Precision parameters */
> > +	uint8_t pra_truncate;
> > +
> > +	uint8_t in_precision;
> > +	/* The output precision from CONV, it's the MAC processing precison */
> 
> ./nvdla_common.h:428: precison ==> precision
> 
> > +	uint8_t out_precision;
> > +	int16_t pad_val;
> > +
> > +	/* input converter parameters */
> > +	struct dla_cvt_param in_cvt;
> > +	/* output converter parameters, support truncate only */
> > +	struct dla_cvt_param out_cvt;
> > +
> > +} __packed __aligned(4);
> > +
> > +struct dla_conv_stat_desc {
> > +	uint32_t data_read_stall;
> > +	uint32_t weight_read_stall;
> > +	uint32_t data_read_latency;
> > +	uint32_t weight_read_latency;
> > +	uint32_t saturation_count;
> > +	uint32_t nan_data_num;
> > +	uint32_t nan_weight_num;
> > +	uint32_t inf_data_num;
> > +	uint32_t inf_weight_num;
> > +} __packed __aligned(4);
> > +
> > +/**
> > + * @ingroup SDP
> > + * @name Activation functions
> > + * @brief Activation functions supported in SDP
> > + * @{
> > + */
> > +#define ACTIVATION_NONE		0
> > +#define ACTIVATION_RELU		1
> > +#define ACTIVATION_LUT		2
> > +#define ACTIVATION_PRELU	3
> > +/** @} */
> > +
> > +/**
> > + * @ingroup LUT
> > + * @name LUT size
> > + * @brief LUT sizes for linear and exponentila LUT
> > + * @{
> > + */
> > +#define LUT_LINEAR_EXP_TABLE_ENTRY_LOG2		6
> > +#define LUT_LINEAR_ONLY_TABLE_ENTRY_LOG2	8
> > +/** @} */
> > +
> > +/**
> > + * @ingroup LUT
> > + * @name LUT types
> > + * @brief DLA supports two types of LUT, linear and exonential
> > + * @{
> > + */
> > +#define LUT_LINEAR_EXP_TABLE		0
> > +#define LUT_LINEAR_ONLY_TABLE		1
> > +/** @} */
> > +
> > +/**
> > + * @ingroup LUT
> > + * @name LUT methods
> > + * @brief DLA supports two types of LUT, linear and exonential
> > + * @{
> > + */
> > +#define LUT_METHOD_EXPONENTIAL		0
> > +#define LUT_METHOD_LINEAR		1
> > +/** @} */
> > +
> > +/**
> > + * @ingroup LUT
> > + * @name LUT
> > + * @brief DLA supports two types of LUT, linear and exonential
> > + * @{
> > + */
> > +#define LUT_PRI_LINEAR_EXP		0
> > +#define LUT_PRI_LINEAR_ONLY		1
> > +/** @} */
> > +
> > +union dla_lut_offset {
> > +	/**
> > +	 * Number should be substracted on log domain before look up
> 
> ./nvdla_common.h:505: substracted ==> subtracted
> 
> > +	 * exponetial table it has the same definition as hardware
> 
> ./nvdla_common.h:506: exponetial ==> exponential
> 
> > +	 * thus input scaling should also take into account when
> > +	 * set this field.
> > +	 */
> > +	int8_t exp_offset;
> > +	/**
> > +	 * Number of bits should be right shift before looking
> > +	 * up linear table
> > +	 */
> > +	int8_t frac_bits;
> > +	uint16_t reserved0;
> > +};
> 
> ... snip
> 
> > diff --git a/drivers/gpu/drm/nvdla/nvdla_drm.c b/drivers/gpu/drm/nvdla/nvdla_drm.c
> > new file mode 100644
> > index 000000000000..9217eee1de3b
> > --- /dev/null
> > +++ b/drivers/gpu/drm/nvdla/nvdla_drm.c
> > @@ -0,0 +1,695 @@
> > +// SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
> > +/*
> > + * Copyright (C) 2017-2018 NVIDIA CORPORATION
> > + * Copyright (C) 2022 Cai Huoqing
> > + */
> > +
> > +#include <linux/dma-buf.h>
> > +#include <linux/dma-mapping.h>
> > +#include <linux/fs.h>
> > +#include <linux/interrupt.h>
> > +#include <linux/irq.h>
> > +#include <linux/irqdomain.h>
> > +#include <linux/module.h>
> > +#include <linux/of.h>
> > +#include <linux/of_device.h>
> > +#include <linux/of_irq.h>
> > +#include <linux/of_platform.h>
> > +#include <linux/platform_device.h>
> > +#include <linux/printk.h>
> > +#include <linux/slab.h>
> > +#include <linux/spinlock.h>
> > +#include <linux/time.h>
> > +#include <linux/uaccess.h>
> > +#include <linux/types.h>
> > +
> > +#include "nvdla_drm.h"
> > +#include "nvdla_ioctl.h"
> > +#include "nvdla_engine.h"
> > +
> > +static struct nvdla_config nvdla_config_os_initial = {
> > +	.atom_size = 32,
> > +	.bdma_enable = true,
> > +	.rubik_enable = true,
> > +	.weight_compress_support = true,
> > +};
> > +
> > +static struct nvdla_config nvdla_config_small = {
> > +	//.atom_size = 8,
> > +	.atom_size = 32,  // nv_large config
> > +	.bdma_enable = false,
> > +	.rubik_enable = false,
> > +	.weight_compress_support = false,
> > +};
> > +
> > +int64_t dla_get_time_us(void)
> 
> Function is never used.
> 
> > +{
> > +	return ktime_get_ns() / NSEC_PER_USEC;
> > +}
> > +
> > +void dla_reg_write(void *driver_context, uint32_t addr, uint32_t reg)
> > +{
> > +	struct nvdla_device *nvdla_dev =
> > +			(struct nvdla_device *)driver_context;
> > +
> > +	if (!nvdla_dev)
> > +		return;
> > +
> > +	writel(reg, nvdla_dev->base + addr);
> > +}
> > +
> > +uint32_t dla_reg_read(void *driver_context, uint32_t addr)
> > +{
> > +	struct nvdla_device *nvdla_dev =
> > +			(struct nvdla_device *)driver_context;
> > +
> > +	if (!nvdla_dev)
> > +		return 0;
> > +
> > +	return readl(nvdla_dev->base + addr);
> > +}
> > +
> > +static irqreturn_t nvdla_engine_isr(int32_t irq, void *data)
> > +{
> > +	unsigned long flags;
> > +	uint32_t mask;
> > +	uint32_t reg;
> > +	struct dla_processor *processor = NULL;
> > +	struct dla_processor_group *group;
> > +	struct dla_engine *engine;
> > +	struct nvdla_device *nvdla_dev = (struct nvdla_device *)data;
> > +
> > +	if (!nvdla_dev)
> > +		return IRQ_NONE;
> > +
> > +	engine = nvdla_dev->engine_context;
> > +	spin_lock_irqsave(&nvdla_dev->nvdla_lock, flags);
> > +
> > +	mask = glb_reg_read(engine, S_INTR_MASK);
> 
> Never used. It would be nice to fix these so that static analyzers no
> longer complain about them, but it's your choice what to do.
Thanks for your check. This line reads a read-to-clear register to
acknowledge the interrupt, so it's OK to leave it here.
For the others (code style and typos), I will try to fix them.

Thanks,
Cai
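
If the unused-value warnings are a concern, the intent could be made
explicit with a (void) cast. A small standalone sketch of the
read-to-clear pattern (a fake register here, not the real glb_reg_read):

```c
#include <assert.h>
#include <stdint.h>

/* Fake stand-in for a read-to-clear interrupt status register. */
static uint32_t fake_intr_status = 0xffu;

static uint32_t fake_reg_read(void)
{
	uint32_t val = fake_intr_status;

	fake_intr_status = 0;	/* the read itself clears the register */
	return val;
}

/* The (void) cast documents that only the read's side effect matters. */
static void ack_pending_interrupts(void)
{
	(void)fake_reg_read();
}
```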
> 
> > +	reg = glb_reg_read(engine, S_INTR_STATUS);
> > +
> > +	if (reg & MASK(GLB_S_INTR_STATUS_0, CACC_DONE_STATUS0)) {
> > +		processor = &engine->processors[DLA_OP_CONV];
> > +		group = &processor->groups[0];
> > +		group->events |= (1 << DLA_EVENT_OP_COMPLETED);
> > +	}
> > +	if (reg & MASK(GLB_S_INTR_STATUS_0, CACC_DONE_STATUS1)) {
> > +		processor = &engine->processors[DLA_OP_CONV];
> > +		group = &processor->groups[1];
> > +		group->events |= (1 << DLA_EVENT_OP_COMPLETED);
> > +	}
> > +	if (reg & MASK(GLB_S_INTR_STATUS_0, SDP_DONE_STATUS0)) {
> > +		processor = &engine->processors[DLA_OP_SDP];
> > +		group = &processor->groups[0];
> > +		group->events |= (1 << DLA_EVENT_OP_COMPLETED);
> > +	}
> > +	if (reg & MASK(GLB_S_INTR_STATUS_0, SDP_DONE_STATUS1)) {
> > +		processor = &engine->processors[DLA_OP_SDP];
> > +		group = &processor->groups[1];
> > +		group->events |= (1 << DLA_EVENT_OP_COMPLETED);
> > +	}
> > +	if (reg & MASK(GLB_S_INTR_STATUS_0, CDP_DONE_STATUS0)) {
> > +		processor = &engine->processors[DLA_OP_CDP];
> > +		group = &processor->groups[0];
> > +		group->events |= (1 << DLA_EVENT_OP_COMPLETED);
> > +	}
> > +	if (reg & MASK(GLB_S_INTR_STATUS_0, CDP_DONE_STATUS1)) {
> > +		processor = &engine->processors[DLA_OP_CDP];
> > +		group = &processor->groups[1];
> > +		group->events |= (1 << DLA_EVENT_OP_COMPLETED);
> > +	}
> > +	if (reg & MASK(GLB_S_INTR_STATUS_0, RUBIK_DONE_STATUS0)) {
> > +		processor = &engine->processors[DLA_OP_RUBIK];
> > +		group = &processor->groups[0];
> > +		group->events |= (1 << DLA_EVENT_OP_COMPLETED);
> > +	}
> > +	if (reg & MASK(GLB_S_INTR_STATUS_0, RUBIK_DONE_STATUS1)) {
> > +		processor = &engine->processors[DLA_OP_RUBIK];
> > +		group = &processor->groups[1];
> > +		group->events |= (1 << DLA_EVENT_OP_COMPLETED);
> > +	}
> > +	if (reg & MASK(GLB_S_INTR_STATUS_0, PDP_DONE_STATUS0)) {
> > +		processor = &engine->processors[DLA_OP_PDP];
> > +		group = &processor->groups[0];
> > +		group->events |= (1 << DLA_EVENT_OP_COMPLETED);
> > +	}
> > +	if (reg & MASK(GLB_S_INTR_STATUS_0, PDP_DONE_STATUS1)) {
> > +		processor = &engine->processors[DLA_OP_PDP];
> > +		group = &processor->groups[1];
> > +		group->events |= (1 << DLA_EVENT_OP_COMPLETED);
> > +	}
> > +	if (reg & MASK(GLB_S_INTR_STATUS_0, BDMA_DONE_STATUS0)) {
> > +		processor = &engine->processors[DLA_OP_BDMA];
> > +		group = &processor->groups[0];
> > +		group->events |= (1 << DLA_EVENT_OP_COMPLETED);
> > +	}
> > +	if (reg & MASK(GLB_S_INTR_STATUS_0, BDMA_DONE_STATUS1)) {
> > +		processor = &engine->processors[DLA_OP_BDMA];
> > +		group = &processor->groups[1];
> > +		group->events |= (1 << DLA_EVENT_OP_COMPLETED);
> > +	}
> > +	if (reg & MASK(GLB_S_INTR_STATUS_0, CDMA_DAT_DONE_STATUS0)) {
> > +		processor = &engine->processors[DLA_OP_CONV];
> > +		group = &processor->groups[0];
> > +		group->events |= (1 << DLA_EVENT_CDMA_DT_DONE);
> > +	}
> > +	if (reg & MASK(GLB_S_INTR_STATUS_0, CDMA_DAT_DONE_STATUS1)) {
> > +		processor = &engine->processors[DLA_OP_CONV];
> > +		group = &processor->groups[1];
> > +		group->events |= (1 << DLA_EVENT_CDMA_DT_DONE);
> > +	}
> > +	if (reg & MASK(GLB_S_INTR_STATUS_0, CDMA_WT_DONE_STATUS0)) {
> > +		processor = &engine->processors[DLA_OP_CONV];
> > +		group = &processor->groups[0];
> > +		group->events |= (1 << DLA_EVENT_CDMA_WT_DONE);
> > +	}
> > +	if (reg & MASK(GLB_S_INTR_STATUS_0, CDMA_WT_DONE_STATUS1)) {
> > +		processor = &engine->processors[DLA_OP_CONV];
> > +		group = &processor->groups[1];
> > +		group->events |= (1 << DLA_EVENT_CDMA_WT_DONE);
> > +	}
> > +
> > +	glb_reg_write(engine, S_INTR_STATUS, reg);
> > +	mask = glb_reg_read(engine, S_INTR_MASK);
> 
> Never used.
> 
> > +	reg = glb_reg_read(engine, S_INTR_STATUS);
> 
> Never used.
> 
> > +
> > +	complete(&nvdla_dev->event_notifier);
> > +	spin_unlock_irqrestore(&nvdla_dev->nvdla_lock, flags);
> > +
> > +	return IRQ_HANDLED;
> > +}
> 
> ... snip
> 
> > diff --git a/drivers/gpu/drm/nvdla/nvdla_gem.c b/drivers/gpu/drm/nvdla/nvdla_gem.c
> > new file mode 100644
> > index 000000000000..cccf6d01a564
> > --- /dev/null
> > +++ b/drivers/gpu/drm/nvdla/nvdla_gem.c
> 
> ... snip
> 
> > +static const struct drm_ioctl_desc nvdla_drm_ioctls[] = {
> > +	DRM_IOCTL_DEF_DRV(NVDLA_SUBMIT, nvdla_submit, DRM_RENDER_ALLOW),
> > +	DRM_IOCTL_DEF_DRV(NVDLA_GEM_CREATE, nvdla_gem_create, DRM_RENDER_ALLOW),
> > +	DRM_IOCTL_DEF_DRV(NVDLA_GEM_MMAP, nvdla_gem_map_offset, DRM_RENDER_ALLOW),
> > +	/* use DRM_IOCTL_MODE_DESTROY_DUMB to destory */
> 
> ./nvdla_gem.c:347: destory ==> destroy
> 
> > +};
> 
> ... snip
> 
> > diff --git a/drivers/gpu/drm/nvdla/nvdla_scheduler.c b/drivers/gpu/drm/nvdla/nvdla_scheduler.c
> > new file mode 100644
> > index 000000000000..b814077478c6
> > --- /dev/null
> > +++ b/drivers/gpu/drm/nvdla/nvdla_scheduler.c
> 
> ... snip
> 
> > +static int
> > +dla_update_dependency(struct dla_engine *engine,
> > +					  struct dla_consumer *consumer,
> > +					  struct dla_common_op_desc *op_desc,
> > +					  uint8_t event, uint8_t roi_index)
> > +{
> > +	int32_t ret = 0;
> > +	struct dla_processor *processor;
> > +
> > +	if (consumer->index == -1)
> > +		goto exit;
> > +
> > +	/* Update dependency only if event matches */
> > +	if (event != consumer->event)
> > +		goto exit;
> > +
> > +	/**
> > +	 * If consumer index is valid but op desc is NULL means
> > +	 * op desc for consumer was not pre-fetched
> > +	 */
> > +	if (op_desc == NULL) {
> > +		ret = -EINVAL;
> > +		pr_err("Operation descriptor is NULL, consumer index %d",
> > +				consumer->index);
> > +		goto exit;
> > +	}
> > +
> > +	pr_debug("Update dependency operation index %d ROI %d DEP_COUNT=%d\n",
> > +					op_desc->index, op_desc->roi_index,
> > +					op_desc->dependency_count);
> > +	op_desc->dependency_count--;
> > +
> > +	if (op_desc->dependency_count == 0) {
> > +		processor = &engine->processors[op_desc->op_type];
> > +		pr_debug("enable %s in %s as depdency are resolved\n",
> 
> ./nvdla_scheduler.c:455: depdency ==> dependency
> 
> > +			processor->name, __func__);
> > +
> > +		ret = dla_enable_operation(engine, processor, op_desc);
> > +		if (ret)
> > +			goto exit;
> > +	}
> > +exit:
> > +	return ret;
> > +}
> 
> ... snip
> 
> > +int
> > +dla_process_events(struct dla_engine *engine, uint32_t *task_complete)
> > +{
> > +	int32_t i;
> > +	int32_t ret = 0;
> > +
> > +	for (i = 0; i < DLA_OP_NUM; i++) {
> > +		struct dla_processor *processor;
> > +
> > +		processor = &engine->processors[i];
> > +		ret = dla_handle_events(engine, processor);
> > +		/**
> > +		 * Incase engine status is non-zero, then don't
> 
> ./nvdla_scheduler.c:905: Incase ==> In case
> 
> > +		 * update the engine status. We should keep its
> > +		 * status for later cleaning of engine.
> > +		 */
> > +		if (!engine->status)
> > +			engine->status = ret;
> > +	}
> > +
> > +	if (engine->network->num_operations == engine->num_proc_hwl)
> > +		*task_complete = 1;
> > +
> > +	return ret;
> > +}
> 
> ... snip
> 
>   Argillander


* Re: [PATCH 0/2] drm/nvdla: Add driver support for NVDLA
  2022-04-19 13:58 [PATCH 0/2] drm/nvdla: Add driver support for NVDLA Cai Huoqing
  2022-04-19 13:58 ` [PATCH 1/2] MAINTAINERS: Add the driver info of the NVDLA Cai Huoqing
       [not found] ` <20220419135908.39606-3-cai.huoqing@linux.dev>
@ 2022-04-26  5:20 ` Peter Robinson
  2 siblings, 0 replies; 15+ messages in thread
From: Peter Robinson @ 2022-04-26  5:20 UTC (permalink / raw)
  To: Cai Huoqing
  Cc: Thomas Zimmermann, David Airlie, linux-kernel,
	Christian König, linaro-mm-sig, dri-devel, Sumit Semwal,
	linux-media

On Tue, Apr 19, 2022 at 3:08 PM Cai Huoqing <cai.huoqing@linux.dev> wrote:
>
> The NVIDIA Deep Learning Accelerator (NVDLA) is an open source IP
> which is integrated into NVIDIA Jetson AGX Xavier,
> so add driver support for this accelerator.
>
> NVDLA introduce:
> http://nvdla.org/primer.html
>
> User mode driver:
> https://github.com/caihuoq/nvdla/tree/main/sw/umd
>
>
> Cai Huoqing (2):
>   MAINTAINERS: Add the driver info of the NVDLA
>   drm/nvdla: Add driver support for NVDLA

Are there device tree bindings that are required to test this IP, are
there additions for the Xavier SoCs to test them? They should also be
published as patches as part of this series.

>  MAINTAINERS                             |    7 +
>  drivers/gpu/drm/Kconfig                 |    2 +
>  drivers/gpu/drm/Makefile                |    1 +
>  drivers/gpu/drm/nvdla/Kconfig           |    8 +
>  drivers/gpu/drm/nvdla/Makefile          |   19 +
>  drivers/gpu/drm/nvdla/nvdla_bdma.c      |  200 +
>  drivers/gpu/drm/nvdla/nvdla_cache.c     |  215 +
>  drivers/gpu/drm/nvdla/nvdla_cdp.c       |  300 ++
>  drivers/gpu/drm/nvdla/nvdla_common.c    |  295 ++
>  drivers/gpu/drm/nvdla/nvdla_common.h    |  835 +++
>  drivers/gpu/drm/nvdla/nvdla_conv.c      |  683 +++
>  drivers/gpu/drm/nvdla/nvdla_drm.c       |  695 +++
>  drivers/gpu/drm/nvdla/nvdla_drm.h       |  127 +
>  drivers/gpu/drm/nvdla/nvdla_engine.c    |  233 +
>  drivers/gpu/drm/nvdla/nvdla_engine.h    |  272 +
>  drivers/gpu/drm/nvdla/nvdla_gem.c       |  393 ++
>  drivers/gpu/drm/nvdla/nvdla_ioctl.h     |   99 +
>  drivers/gpu/drm/nvdla/nvdla_pdp.c       |  446 ++
>  drivers/gpu/drm/nvdla/nvdla_reg.h       | 6411 +++++++++++++++++++++++
>  drivers/gpu/drm/nvdla/nvdla_rubik.c     |  217 +
>  drivers/gpu/drm/nvdla/nvdla_sched.h     |   52 +
>  drivers/gpu/drm/nvdla/nvdla_scheduler.c | 1005 ++++
>  drivers/gpu/drm/nvdla/nvdla_sdp.c       |  728 +++
>  23 files changed, 13243 insertions(+)
>  create mode 100644 drivers/gpu/drm/nvdla/Kconfig
>  create mode 100644 drivers/gpu/drm/nvdla/Makefile
>  create mode 100644 drivers/gpu/drm/nvdla/nvdla_bdma.c
>  create mode 100644 drivers/gpu/drm/nvdla/nvdla_cache.c
>  create mode 100644 drivers/gpu/drm/nvdla/nvdla_cdp.c
>  create mode 100644 drivers/gpu/drm/nvdla/nvdla_common.c
>  create mode 100644 drivers/gpu/drm/nvdla/nvdla_common.h
>  create mode 100644 drivers/gpu/drm/nvdla/nvdla_conv.c
>  create mode 100644 drivers/gpu/drm/nvdla/nvdla_drm.c
>  create mode 100644 drivers/gpu/drm/nvdla/nvdla_drm.h
>  create mode 100644 drivers/gpu/drm/nvdla/nvdla_engine.c
>  create mode 100644 drivers/gpu/drm/nvdla/nvdla_engine.h
>  create mode 100644 drivers/gpu/drm/nvdla/nvdla_gem.c
>  create mode 100644 drivers/gpu/drm/nvdla/nvdla_ioctl.h
>  create mode 100644 drivers/gpu/drm/nvdla/nvdla_pdp.c
>  create mode 100644 drivers/gpu/drm/nvdla/nvdla_reg.h
>  create mode 100644 drivers/gpu/drm/nvdla/nvdla_rubik.c
>  create mode 100644 drivers/gpu/drm/nvdla/nvdla_sched.h
>  create mode 100644 drivers/gpu/drm/nvdla/nvdla_scheduler.c
>  create mode 100644 drivers/gpu/drm/nvdla/nvdla_sdp.c
>
> --
> 2.25.1
>


end of thread, other threads:[~2022-04-26  5:20 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-04-19 13:58 [PATCH 0/2] drm/nvdla: Add driver support for NVDLA Cai Huoqing
2022-04-19 13:58 ` [PATCH 1/2] MAINTAINERS: Add the driver info of the NVDLA Cai Huoqing
     [not found] ` <20220419135908.39606-3-cai.huoqing@linux.dev>
2022-04-19 14:07   ` [PATCH 2/2] drm/nvdla: Add driver support for NVDLA Christian König
2022-04-19 14:35     ` Cai Huoqing
2022-04-20  7:53   ` kernel test robot
2022-04-20  9:57   ` kernel test robot
2022-04-21  8:30   ` Thomas Zimmermann
2022-04-21  8:34     ` Christian König
2022-04-21  8:57       ` Thomas Zimmermann
2022-04-21  9:07       ` Thomas Zimmermann
2022-04-21  9:13       ` Thomas Zimmermann
2022-04-21  9:23         ` Christian König
2022-04-21 22:01   ` Kari Argillander
2022-04-25 14:28     ` Cai Huoqing
2022-04-26  5:20 ` [PATCH 0/2] " Peter Robinson
