* [RFC PATCH 0/4] drm/mgag200: Use DMA to copy the framebuffer to the VRAM
@ 2023-05-05 12:43 Jocelyn Falempe
  2023-05-05 12:43 ` [PATCH 1/4] drm/mgag200: Rename constant MGAREG_Status to MGAREG_STATUS Jocelyn Falempe
                   ` (3 more replies)
  0 siblings, 4 replies; 19+ messages in thread
From: Jocelyn Falempe @ 2023-05-05 12:43 UTC (permalink / raw)
  To: dri-devel, tzimmermann, airlied, javierm, lyude; +Cc: Jocelyn Falempe


This series adds DMA and IRQ support to the mgag200 driver.
Unfortunately, DMA doesn't make the driver faster,
but it is still a big improvement in CPU usage and latency.

CPU usage goes from 100% of one CPU to 3% (measured with top while refreshing the screen continuously).

top without DMA, with a bash script refreshing the screen continuously:
    PID S  %CPU     TIME+ COMMAND
   1536 R 100.0   4:02.78 kworker/1:0+events
   1612 S   3.0   0:03.82 bash
     16 I   0.3   0:01.56 rcu_preempt
   1467 I   0.3   0:00.11 kworker/u64:1-events_unbound
   3650 R   0.3   0:00.02 top

top with DMA, and the same bash script:
    PID S  %CPU     TIME+ COMMAND
   1335 D   3.0   0:01.26 kworker/2:0+events
   1486 S   0.3   0:00.14 bash
   1846 R   0.3   0:00.03 top
      1 S   0.0   0:01.87 systemd
      2 S   0.0   0:00.00 kthreadd

Latency, measured with cyclictest -s -l 10000:
Without DMA:
# /dev/cpu_dma_latency set to 0us
policy: other/other: loadavg: 1.52 0.52 0.33 3/358 2025          
T: 0 ( 1977) P: 0 I:1000 C:  10000 Min:      7 Act:   56 Avg:   85 Max:    2542

With DMA:
# /dev/cpu_dma_latency set to 0us
policy: other/other: loadavg: 1.27 0.48 0.18 2/363 2498          
T: 0 ( 2403) P: 0 I:1000 C:  10000 Min:      8 Act:   62 Avg:   59 Max:     339

The last benchmark is glxgears. It's still software rendering, but on my 2-core CPU,
freeing the core that was constantly doing memcpy() allows it to draw more frames.
Without DMA:
415 frames in 5.0 seconds = 82.973 FPS
356 frames in 5.0 seconds = 71.167 FPS
With DMA:
717 frames in 5.0 seconds = 143.343 FPS
720 frames in 5.0 seconds = 143.993 FPS

Regarding the implementation:
The driver uses primary DMA to send drawing engine commands, and secondary DMA to send the pixels to an ILOAD command.
It is also possible to program the ILOAD command directly and use primary DMA to send the pixels, but in that case you can't use the softrap interrupt to wait for DMA completion.
The pixels are first copied from the GEM framebuffer to the DMA buffer, but as system memory is much faster than VRAM, this extra copy has a negligible impact.

DMA buffer size:
On my test machine, I can allocate only 4MB of DMA-coherent memory, and the framebuffer is 5MB.
So the driver has to split it into smaller chunks when the full framebuffer is refreshed.
My implementation tries to allocate 4MB, then retries with smaller allocations until one succeeds.
If every allocation fails, DMA is disabled. That's probably not perfect, but at least it's simple.
It's also possible to do some kind of scatter-gather DMA by sending multiple ILOAD/SECDMA commands, but that increases the complexity a bit.
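
To make the fallback and chunking described above concrete, here is a minimal userspace sketch, not the code from the patch. It assumes the size is halved on each retry and that there is a 64 KiB floor; neither is stated above. try_alloc_fn, dma_pick_buffer_size(), dma_chunk_count() and alloc_up_to_limit() are hypothetical names, and the callback stands in for the real allocator so the logic can be exercised outside the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the real allocator: it only reports whether a
 * request of `size` bytes would succeed. */
typedef bool (*try_alloc_fn)(size_t size, void *ctx);

#define DMA_BUF_MAX (4u << 20)  /* first attempt: 4 MiB, as in the cover letter */
#define DMA_BUF_MIN (64u << 10) /* assumed floor, not stated in the source */

/* Try DMA_BUF_MAX first, then halve until an allocation succeeds.
 * Returns the size obtained, or 0 if DMA should be disabled. */
static size_t dma_pick_buffer_size(try_alloc_fn try_alloc, void *ctx)
{
	size_t size;

	assert(try_alloc != NULL);
	for (size = DMA_BUF_MAX; size >= DMA_BUF_MIN; size /= 2) {
		if (try_alloc(size, ctx))
			return size;
	}
	return 0;
}

/* Number of DMA transfers needed to push fb_size bytes through a buffer
 * of buf_size bytes (rounded up). */
static size_t dma_chunk_count(size_t fb_size, size_t buf_size)
{
	assert(buf_size > 0);
	return (fb_size + buf_size - 1) / buf_size;
}

/* Example allocator model: succeeds only up to the limit stored in ctx. */
static bool alloc_up_to_limit(size_t size, void *ctx)
{
	return size <= *(const size_t *)ctx;
}
```

With a 4MB limit, dma_pick_buffer_size() keeps the first attempt, and a 5MB framebuffer then needs two transfers. A real driver would probably also align chunk boundaries to whole scanlines, which this sketch ignores.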

Adding a module parameter to disable DMA:
I think before merging this work, I will add a module parameter to disable DMA, so that if
something goes wrong it's easy to turn it off.

Pixel width:
I tested this with 16 bits per pixel (RGB565) and 32 bits per pixel (XRGB8888).
I didn't find a userspace able to use 24 bits per pixel (RGB888); Xorg uses XRGB8888 when
"DefaultDepth" is set to 24.

Big endian:
The DMA engine can be configured to handle the BE->LE conversion, but I can't test it, so it's not done yet.
As I don't know if there are still big endian systems with mgag200, maybe disabling DMA on big endian is the safest option?
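
One possible way to combine the opt-out parameter and the big endian restriction is a single policy check. This is only a userspace sketch under my own assumptions: host_is_big_endian() and dma_should_be_used() are hypothetical helpers, not functions from this series, and a kernel driver would more likely test the byte order at build time than at run time.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Runtime byte-order check: true when the most significant byte of a
 * 32-bit value is stored at the lowest address. */
static bool host_is_big_endian(void)
{
	const uint32_t probe = 0x01020304u;
	uint8_t first;

	memcpy(&first, &probe, sizeof(first));
	return first == 0x01;
}

/* Hypothetical policy helper: use DMA only when the user has not disabled
 * it, a DMA buffer was obtained, and the host is little endian. */
static bool dma_should_be_used(bool user_enabled, bool have_buffer,
			       bool big_endian)
{
	return user_enabled && have_buffer && !big_endian;
}
```

Keeping the decision in one place means the memcpy() path stays the fallback for every case where DMA is unavailable or untested.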

I think the complexity is low, as it only adds ~350 lines, less than 10% of the whole mgag200 driver (~5000 lines).

 drivers/gpu/drm/mgag200/Makefile       |   3 +-
 drivers/gpu/drm/mgag200/mgag200_dma.c  | 114 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 drivers/gpu/drm/mgag200/mgag200_drv.c  |  43 +++++++++++++++++++++++++++
 drivers/gpu/drm/mgag200/mgag200_drv.h  |  28 ++++++++++++++++++
 drivers/gpu/drm/mgag200/mgag200_mode.c | 200 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++---------------------------------
 drivers/gpu/drm/mgag200/mgag200_reg.h  |  30 ++++++++++++++++++-
 6 files changed, 362 insertions(+), 56 deletions(-)

Signed-off-by: Jocelyn Falempe <jfalempe@redhat.com>

base-commit: 457391b0380335d5 (tag: v6.3)



* Re: [PATCH 4/4] drm/mgag200: Use DMA to copy the framebuffer to the VRAM
@ 2023-05-09 21:54 kernel test robot
  0 siblings, 0 replies; 19+ messages in thread
From: kernel test robot @ 2023-05-09 21:54 UTC (permalink / raw)
  To: oe-kbuild; +Cc: lkp, Dan Carpenter

BCC: lkp@intel.com
CC: oe-kbuild-all@lists.linux.dev
In-Reply-To: <20230505124337.854845-5-jfalempe@redhat.com>
References: <20230505124337.854845-5-jfalempe@redhat.com>
TO: Jocelyn Falempe <jfalempe@redhat.com>
TO: dri-devel@lists.freedesktop.org
TO: tzimmermann@suse.de
TO: airlied@redhat.com
TO: javierm@redhat.com
TO: lyude@redhat.com
CC: Jocelyn Falempe <jfalempe@redhat.com>

Hi Jocelyn,

kernel test robot noticed the following build warnings:

[auto build test WARNING on 457391b0380335d5e9a5babdec90ac53928b23b4]

url:    https://github.com/intel-lab-lkp/linux/commits/Jocelyn-Falempe/drm-mgag200-Rename-constant-MGAREG_Status-to-MGAREG_STATUS/20230505-204705
base:   457391b0380335d5e9a5babdec90ac53928b23b4
patch link:    https://lore.kernel.org/r/20230505124337.854845-5-jfalempe%40redhat.com
patch subject: [PATCH 4/4] drm/mgag200: Use DMA to copy the framebuffer to the VRAM
:::::: branch date: 4 days ago
:::::: commit date: 4 days ago
config: i386-randconfig-m021 (https://download.01.org/0day-ci/archive/20230510/202305100554.Q6f2fDlx-lkp@intel.com/config)
compiler: gcc-11 (Debian 11.3.0-12) 11.3.0

If you fix the issue, kindly add the following tags where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Reported-by: Dan Carpenter <error27@gmail.com>
| Link: https://lore.kernel.org/r/202305100554.Q6f2fDlx-lkp@intel.com/

smatch warnings:
drivers/gpu/drm/mgag200/mgag200_mode.c:419 mgag200_dwg_setup() error: uninitialized symbol 'maccess'.

vim +/maccess +419 drivers/gpu/drm/mgag200/mgag200_mode.c

414c453106255b Dave Airlie     2012-04-17  400  
b3cab0043427ad Jocelyn Falempe 2023-05-05  401  static void mgag200_dwg_setup(struct mga_device *mdev, struct drm_framebuffer *fb)
b3cab0043427ad Jocelyn Falempe 2023-05-05  402  {
b3cab0043427ad Jocelyn Falempe 2023-05-05  403  	u32 maccess;
b3cab0043427ad Jocelyn Falempe 2023-05-05  404  
b3cab0043427ad Jocelyn Falempe 2023-05-05  405  	drm_dbg(&mdev->base, "Setup DWG with %dx%d %p4cc\n",
b3cab0043427ad Jocelyn Falempe 2023-05-05  406  		fb->width, fb->height, &fb->format->format);
b3cab0043427ad Jocelyn Falempe 2023-05-05  407  
b3cab0043427ad Jocelyn Falempe 2023-05-05  408  	switch (fb->format->format) {
b3cab0043427ad Jocelyn Falempe 2023-05-05  409  	case DRM_FORMAT_RGB565:
b3cab0043427ad Jocelyn Falempe 2023-05-05  410  		maccess = MGAMAC_PW16;
b3cab0043427ad Jocelyn Falempe 2023-05-05  411  		break;
b3cab0043427ad Jocelyn Falempe 2023-05-05  412  	case DRM_FORMAT_RGB888:
b3cab0043427ad Jocelyn Falempe 2023-05-05  413  		maccess = MGAMAC_PW24;
b3cab0043427ad Jocelyn Falempe 2023-05-05  414  		break;
b3cab0043427ad Jocelyn Falempe 2023-05-05  415  	case DRM_FORMAT_XRGB8888:
b3cab0043427ad Jocelyn Falempe 2023-05-05  416  		maccess = MGAMAC_PW32;
b3cab0043427ad Jocelyn Falempe 2023-05-05  417  		break;
b3cab0043427ad Jocelyn Falempe 2023-05-05  418  	}
b3cab0043427ad Jocelyn Falempe 2023-05-05 @419  	WREG32(MGAREG_MACCESS, maccess);
b3cab0043427ad Jocelyn Falempe 2023-05-05  420  
b3cab0043427ad Jocelyn Falempe 2023-05-05  421  	/* Framebuffer width in pixel */
b3cab0043427ad Jocelyn Falempe 2023-05-05  422  	WREG32(MGAREG_PITCH, fb->width);
b3cab0043427ad Jocelyn Falempe 2023-05-05  423  
b3cab0043427ad Jocelyn Falempe 2023-05-05  424  	/* Sane default value for the drawing engine registers */
b3cab0043427ad Jocelyn Falempe 2023-05-05  425  	WREG32(MGAREG_DSTORG, 0);
b3cab0043427ad Jocelyn Falempe 2023-05-05  426  	WREG32(MGAREG_YDSTORG, 0);
b3cab0043427ad Jocelyn Falempe 2023-05-05  427  	WREG32(MGAREG_SRCORG, 0);
b3cab0043427ad Jocelyn Falempe 2023-05-05  428  	WREG32(MGAREG_CXBNDRY, 0x0FFF0000);
b3cab0043427ad Jocelyn Falempe 2023-05-05  429  	WREG32(MGAREG_YTOP, 0);
b3cab0043427ad Jocelyn Falempe 2023-05-05  430  	WREG32(MGAREG_YBOT, 0x00FFFFFF);
b3cab0043427ad Jocelyn Falempe 2023-05-05  431  
b3cab0043427ad Jocelyn Falempe 2023-05-05  432  	/* Activate blit mode DMA, only write the low part of the register */
b3cab0043427ad Jocelyn Falempe 2023-05-05  433  	WREG8(MGAREG_OPMODE, MGAOPM_DMA_BLIT);
b3cab0043427ad Jocelyn Falempe 2023-05-05  434  }
b3cab0043427ad Jocelyn Falempe 2023-05-05  435  

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests


Thread overview: 19+ messages
2023-05-05 12:43 [RFC PATCH 0/4] drm/mgag200: Use DMA to copy the framebuffer to the VRAM Jocelyn Falempe
2023-05-05 12:43 ` [PATCH 1/4] drm/mgag200: Rename constant MGAREG_Status to MGAREG_STATUS Jocelyn Falempe
2023-05-05 12:43 ` [PATCH 2/4] drm/mgag200: Simplify offset and scale computation Jocelyn Falempe
2023-05-08  7:44   ` Thomas Zimmermann
2023-05-09  7:25     ` Jocelyn Falempe
2023-05-05 12:43 ` [PATCH 3/4] drm/mgag200: Add IRQ support Jocelyn Falempe
2023-05-05 14:49   ` kernel test robot
2023-05-05 14:49     ` kernel test robot
2023-05-05 15:01   ` kernel test robot
2023-05-05 15:01     ` kernel test robot
2023-05-05 12:43 ` [PATCH 4/4] drm/mgag200: Use DMA to copy the framebuffer to the VRAM Jocelyn Falempe
2023-05-05 15:01   ` kernel test robot
2023-05-05 15:01     ` kernel test robot
2023-05-05 15:43   ` kernel test robot
2023-05-05 15:43     ` kernel test robot
2023-05-08  8:04   ` Thomas Zimmermann
2023-05-09  9:49     ` Jocelyn Falempe
2023-05-23  6:55       ` Jocelyn Falempe
  -- strict thread matches above, loose matches on Subject: below --
2023-05-09 21:54 kernel test robot
