All of lore.kernel.org
 help / color / mirror / Atom feed
From: Pekka Paalanen <ppaalanen@gmail.com>
To: Igor Torrente <igormtorrente@gmail.com>
Cc: hamohammed.sa@gmail.com, Thomas Zimmermann <tzimmermann@suse.de>,
	rodrigosiqueiramelo@gmail.com, airlied@linux.ie,
	Leandro Ribeiro <leandro.ribeiro@collabora.com>,
	melissa.srw@gmail.com, dri-devel@lists.freedesktop.org,
	kernel test robot <lkp@intel.com>
Subject: Re: [PATCH v2 6/8] drm: vkms: Refactor the plane composer to accept new formats
Date: Thu, 11 Nov 2021 16:37:34 +0200	[thread overview]
Message-ID: <20211111163734.0a6aefa6@eldfell> (raw)
In-Reply-To: <CAOA8r4H-=NAxuMzJumDDHxgq2_opg6509sJN-W7EM3+LNsey2g@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 7655 bytes --]

On Thu, 11 Nov 2021 11:07:21 -0300
Igor Torrente <igormtorrente@gmail.com> wrote:

> Hi Pekka,
> 
> On Thu, Nov 11, 2021 at 6:33 AM Pekka Paalanen <ppaalanen@gmail.com> wrote:
> >
> > On Wed, 10 Nov 2021 13:56:54 -0300
> > Igor Torrente <igormtorrente@gmail.com> wrote:
> >  
> > > On Tue, Nov 9, 2021 at 8:40 AM Pekka Paalanen <ppaalanen@gmail.com> wrote:  
> > > >
> > > > Hi Igor,
> > > >
> > > > again, that is a really nice speed-up. Unfortunately, I find the code
> > > > rather messy and hard to follow. I hope my comments below help with
> > > > re-designing it to be easier to understand.
> > > >
> > > >
> > > > On Tue, 26 Oct 2021 08:34:06 -0300
> > > > Igor Torrente <igormtorrente@gmail.com> wrote:
> > > >  
> > > > > Currently the blend function only accepts XRGB_8888 and ARGB_8888
> > > > > as a color input.
> > > > >
> > > > > This patch refactors all the functions related to the plane composition
> > > > > to overcome this limitation.
> > > > >
> > > > > Now the blend function receives a struct `vkms_pixel_composition_functions`
> > > > > containing two handlers.
> > > > >
> > > > > One will generate a buffer of each line of the frame with the pixels
> > > > > converted to ARGB16161616. And the other will take this line buffer,
> > > > > do some computation on it, and store the pixels in the destination.
> > > > >
> > > > > Both the handlers have the same signature. They receive a pointer to
> > > > > the pixels that will be processed(`pixels_addr`), the number of pixels
> > > > > that will be treated(`length`), and the intermediate buffer of the size
> > > > > of a frame line (`line_buffer`).
> > > > >
> > > > > The first function has been totally described previously.  
> > > >
> > > > What does this sentence mean?  
> > >
> > > In the sentence "One will generate...", I give an overview of the two types of
> > > handlers. And the overview of the first handler describes the full behavior of
> > > it.
> > >
> > > But it doesn't look clear enough, I will improve it in the future.
> > >  
> > > >  
> > > > >
> > > > > The second is more interesting, as it has to perform two roles depending
> > > > > on where it is called in the code.
> > > > >
> > > > > The first is to convert(if necessary) the data received in the
> > > > > `line_buffer` and write in the memory pointed by `pixels_addr`.
> > > > >
> > > > > The second role is to perform the `alpha_blend`. So, it takes the pixels
> > > > > in the `line_buffer` and `pixels_addr`, executes the blend, and stores
> > > > > the result back to the `pixels_addr`.
> > > > >
> > > > > The per-line implementation was chosen for performance reasons.
> > > > > The per-pixel functions were having performance issues due to indirect
> > > > > function call overhead.
> > > > >
> > > > > The per-line code trades off memory for execution time. The `line_buffer`
> > > > > allows us to diminish the number of function calls.
> > > > >
> > > > > Results in the IGT test `kms_cursor_crc`:
> > > > >
> > > > > |                     Frametime                       |
> > > > > |:---------------:|:---------:|:----------:|:--------:|
> > > > > |  implmentation  |  Current  |  Per-pixel | Per-line |
> > > > > | frametime range |  8~22 ms  |  32~56 ms  |  6~19 ms |
> > > > > |     Average     |  10.0 ms  |   35.8 ms  |  8.6 ms  |
> > > > >
> > > > > Reported-by: kernel test robot <lkp@intel.com>
> > > > > Signed-off-by: Igor Torrente <igormtorrente@gmail.com>
> > > > > ---
> > > > > V2: Improves the performance drastically, by perfoming the operations
> > > > >     per-line and not per-pixel(Pekka Paalanen).
> > > > >     Minor improvements(Pekka Paalanen).
> > > > > ---
> > > > >  drivers/gpu/drm/vkms/vkms_composer.c | 321 ++++++++++++++++-----------
> > > > >  drivers/gpu/drm/vkms/vkms_formats.h  | 155 +++++++++++++
> > > > >  2 files changed, 342 insertions(+), 134 deletions(-)
> > > > >  create mode 100644 drivers/gpu/drm/vkms/vkms_formats.h
> > > > >
> > > > > diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c
> > > > > index 383ca657ddf7..69fe3a89bdc9 100644
> > > > > --- a/drivers/gpu/drm/vkms/vkms_composer.c
> > > > > +++ b/drivers/gpu/drm/vkms/vkms_composer.c

...

> > > > > +struct vkms_pixel_composition_functions {
> > > > > +     void (*get_src_line)(void *pixels_addr, int length, u64 *line_buffer);
> > > > > +     void (*set_output_line)(void *pixels_addr, int length, u64 *line_buffer);  
> > > >
> > > > I would be a little more comfortable if instead of u64 *line_buffer you
> > > > would have something like
> > > >
> > > > struct line_buffer {
> > > >         u16 *row;
> > > >         size_t nelem;
> > > > }
> > > >
> > > > so that the functions to be plugged into these function pointers could
> > > > assert that you do not accidentally overflow the array (which would
> > > > imply a code bug in kernel).
> > > >
> > > > One could perhaps go even for:
> > > >
> > > > struct line_pixel {
> > > >         u16 r, g, b, a;
> > > > };
> > > >
> > > > struct line_buffer {
> > > >         struct line_pixel *row;
> > > >         size_t npixels;
> > > > };  
> > >
> > > If we decide to follow this representation, would it be possible
> > > to calculate the crc in the similar way that is being done currently?
> > >
> > > Something like that:
> > >
> > > crc = crc32_le(crc, line_buffer.row, w * sizeof(line_pixel));  
> >
> > Hi Igor,
> >
> > yes. I think the CRC calculated does not need to be reproducible in
> > userspace, so you can very well compute it from the internal
> > intermediate representation. It also does not need to be portable
> > between architectures, AFAIU.  
> 
> Great! This will make things easier.
> 
> >  
> > > I mean, If the compiler can decide to put a padding somewhere, it
> > > would mess with the crc value. Right?  
> >
> > Padding could mess it up, yes. However, I think in kernel it is a
> > convention to define structs (especially UAPI structs but this is not
> > one) such that there is no implicit padding. So there must be some
> > recommended practises on how to achieve and ensure that.
> >
> > The size of struct line_pixel as defined above is 8 bytes which is a
> > "very round" number, and every field has the same type, so there won't
> > be gaps between fields either. So I think the struct should already be
> > fine and have no padding, but how to make sure it is, I'm not sure what
> > you would do in kernel land.
> >
> > In userspace I would put a static assert to ensure that
> > sizeof(struct line_pixel) = 8. That would be enough, because sizeof
> > counts not just internal implicit padding but also the needed size
> > extension for alignment in an array of those. The accumulated size of
> > the fields individually is 8 bytes, so if the struct size is 8, there
> > cannot be padding.
> >  
> 
> Apparently the kernel uses a compiler extension in a macro to do this
> kind of struct packing.
> 
> include/linux/compiler_attributes.h
> 265:#define __packed                        __attribute__((__packed__))

Hi Igor,

we do not actually want to force packing, though.

If there would be padding without packing, then packing may incur a
noticeable speed penalty in accessing the fields. We don't want to risk
that.

So I think it's better to just assert that no padding exists instead.
There would be something quite strange going on if there was padding in
this case, but better safe than sorry, because debugging that would be
awful.


Thanks,
pq

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

  reply	other threads:[~2021-11-11 14:37 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-10-26 11:34 [PATCH v2 0/8] Add new formats support to vkms Igor Torrente
2021-10-26 11:34 ` [PATCH v2 1/8] drm: vkms: Replace the deprecated drm_mode_config_init Igor Torrente
2021-10-26 11:34 ` [PATCH v2 2/8] drm: vkms: Alloc the compose frame using vzalloc Igor Torrente
2021-10-26 11:34 ` [PATCH v2 3/8] drm: vkms: Replace hardcoded value of `vkms_composer.map` to DRM_FORMAT_MAX_PLANES Igor Torrente
2021-11-03 15:40   ` Thomas Zimmermann
2021-10-26 11:34 ` [PATCH v2 4/8] drm: vkms: Add fb information to `vkms_writeback_job` Igor Torrente
2021-11-03 15:45   ` Thomas Zimmermann
2021-11-03 19:18     ` Igor Torrente
2021-11-04  7:21       ` Thomas Zimmermann
2021-10-26 11:34 ` [PATCH v2 5/8] drm: drm_atomic_helper: Add a new helper to deal with the writeback connector validation Igor Torrente
2021-10-28 21:38   ` Leandro Ribeiro
2021-11-03 15:03     ` Igor Torrente
2021-11-03 15:11       ` Leandro Ribeiro
2021-11-03 15:37         ` Thomas Zimmermann
2021-11-03 18:41           ` Igor Torrente
2021-10-26 11:34 ` [PATCH v2 6/8] drm: vkms: Refactor the plane composer to accept new formats Igor Torrente
2021-11-09 11:40   ` Pekka Paalanen
2021-11-10 16:56     ` Igor Torrente
2021-11-11  9:33       ` Pekka Paalanen
2021-11-11 14:07         ` Igor Torrente
2021-11-11 14:37           ` Pekka Paalanen [this message]
2021-11-12 12:50             ` Igor Torrente
2021-10-26 11:34 ` [PATCH v2 7/8] drm: vkms: Exposes ARGB_1616161616 and adds XRGB_16161616 formats Igor Torrente
2021-10-26 11:34 ` [PATCH v2 8/8] drm: vkms: Add support the RGB565 format Igor Torrente
2021-10-26 11:34 ` [PATCH v2 8/8] drm: vkms: Add support to " Igor Torrente
2021-11-09  9:32 ` [PATCH v2 0/8] Add new formats support to vkms Pekka Paalanen
2021-11-10 17:32   ` Igor Torrente
2021-11-11  8:32     ` Pekka Paalanen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20211111163734.0a6aefa6@eldfell \
    --to=ppaalanen@gmail.com \
    --cc=airlied@linux.ie \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=hamohammed.sa@gmail.com \
    --cc=igormtorrente@gmail.com \
    --cc=leandro.ribeiro@collabora.com \
    --cc=lkp@intel.com \
    --cc=melissa.srw@gmail.com \
    --cc=rodrigosiqueiramelo@gmail.com \
    --cc=tzimmermann@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.