Re: A few questions about the best way to implement RandR 1.4 / PRIME buffer sharing

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Aaron Plattner <aplattner@nvidia.com>
To: Dave Airlie <airlied@gmail.com>
Cc: "linaro-mm-sig@lists.linaro.org" <linaro-mm-sig@lists.linaro.org>,
	"dri-devel@lists.freedesktop.org"
	<dri-devel@lists.freedesktop.org>
Subject: Re: A few questions about the best way to implement RandR 1.4 / PRIME buffer sharing
Date: Tue, 4 Sep 2012 13:57:32 -0700	[thread overview]
Message-ID: <50466B3C.9050204@nvidia.com> (raw)
In-Reply-To: <CAPM=9tyJzVFOQfrFScL6987PXqnrj0Te69eUR5L90MwTwMdtLQ@mail.gmail.com>

On 08/31/2012 08:00 PM, Dave Airlie wrote:
>> object interface, so that Optimus-based laptops can use our driver to drive
>> the discrete GPU and display on the integrated GPU.  The good news is that
>> I've got a proof of concept working.
>
> Don't suppose you'll be interested in adding the other method at some point as
> well? since saving power is probably important to a lot of people

That's milestone 2.  I'm focusing on display offload to start because it's
easier to implement and lays the groundwork for the kernel pieces.  I have to
emphasize that I'm just doing a feasibility study right now and I can't promise
that we're going to officially support this stuff.

>> During a review of the current code, we came up with a few concerns:
>>
>> 1. The output source is responsible for allocating the shared memory
>>
>> Right now, the X server calls CreatePixmap on the output source screen and
>> then expects the output sink screen to be able to display from whatever memory
>> the source allocates.  Right now, the source has no mechanism for asking the
>> sink what its requirements are for the surface.  I'm using our own internal
>> pitch alignment requirements and that seems to be good enough for the Intel
>> device to scan out, but that could be pure luck.
>
> Well in theory it might be nice but it would have been premature since so far
> the only interactions for prime are combination of intel, nvidia and AMD, and
> I think everyone has fairly similar pitch alignment requirements, I'd be
> interested in adding such an interface but I don't think its some I personally
> would be working on.

Okay.  Hopefully that won't be too painful to add if we ever need it in the
future.

>> other, or is it sufficient to just define a lowest common denominator format
>> and if your hardware can't deal with that format, you just don't get to share
>> buffers?
>
> At the moment I'm happy to just go with linear, minimum pitch alignment 64 or

256, for us.

> something as a base standard, but yeah I'm happy for it to work either way,
> just don't have enough evidence it's worth it yet. I've not looked at ARM
> stuff, so patches welcome if people consider they need to use this stuff for
> SoC devices.

We can always hack it to whatever is necessary if we see that the sink side
driver is Tegra, but I was hoping for something more general.

>> 2. There's no fallback mechanism if sharing can't be negotiated
>>
>> If RandR fails to share a pixmap with the output sink screen, the whole
>> modeset fails.  This means you'll end up not seeing anything on the screen and
>> you'll probably think your computer locked up.  Should there be some sort of
>> software copy fallback to ensure that something at least shows up on the
>> display?
>
> Uggh, it would be fairly slow and unuseable, I'd rather they saw nothing, but
> again open to suggestions on how to make this work, since it might fail for
> other reasons and in that case there is still nothing a sw copy can do. What
> happens if the slave intel device just fails to allocate a pixmap, but yeah
> I'm willing to think about it a bit more when we have some reference
> implementations.

Just rolling back the modeset operation to whatever was working before would be
a good start.

It's worse than that on my current laptop, though, since our driver sees a
phantom CRT output and we happily start driving pixels to it that end up going
nowhere.  I'll need to think about what the right behavior is there since I
don't know if we want to rely on an X client to make that configuration work.

>> 3. How should the memory be allocated?
>>
>> In the prototype I threw together, I'm allocating the shared memory using
>> shm_open and then exporting that as a dma-buf file descriptor using an ioctl I
>> added to the kernel, and then importing that memory back into our driver
>> through dma_buf_attach & dma_buf_map_attachment.  Does it make sense for
>> user-space programs to be able to export shmfs files like that?  Should that
>> interface go in DRM / GEM / PRIME instead?  Something else?  I'm pretty
>> unfamiliar with this kernel code so any suggestions would be appreciated.
>
> Your kernel driver should in theory be doing it all, if you allocate shared
> pixmaps in GTT accessible memory, then you need an ioctl to tell your kernel
> driver to export the dma buf to the fd handle.  (assuming we get rid of the
> _GPL, which people have mentioned they are open to doing). We have handle->fd
> and fd->handle interfaces on DRM, you'd need something similiar on the nvidia
> kernel driver interface.

Okay, I can do that.  We already have a mechanism for importing buffers
allocated elsewhere so reusing that for shmfs and/or dma-buf seemed like a
natural extension.  I don't think adding a separate ioctl for exporting our own
allocations will add too much extra code.

> Yes for 4 some sort of fencing is being worked on by Maarten for other stuff
> but would be a pre-req for doing this, and also some devices don't want
> fullscreen updates, like USB, so doing flipped updates would have to be
> optional or negoitated. It makes sense for us as well since things like
> gnome-shell can do full screen pageflips and we have to do full screen dirty
> updates.

Right now my implementation has two sources of tearing:

1. The dGPU reads the vidmem primary surface asynchronously from its own
    rendering to it.

2. The iGPU fetches the shared surface for display asynchronously from the dGPU
    writing into it.

#1 I can fix within our driver.  For #2, I don't want to rely on the dGPU being
able to push complete frames over the bus during vblank in response to an iGPU
fence trigger so I was thinking we would want double-buffering all the time.
Also, I was hoping to set up a proper flip chain between the dGPU, the dGPU's
DMA engine, and the Intel display engine so that for full-screen applications,
glXSwapBuffers is stalled properly without relying on the CPU to schedule
things.  Maybe that's overly ambitious for now?

-- Aaron

next prev parent reply	other threads:[~2012-09-04 20:57 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-08-30 17:31 A few questions about the best way to implement RandR 1.4 / PRIME buffer sharing Aaron Plattner
2012-08-30 17:34 ` [Linaro-mm-sig] " Aaron Plattner
2012-09-01 11:28   ` Daniel Vetter
2012-09-01  3:00 ` Dave Airlie
2012-09-04 20:57   ` Aaron Plattner [this message]
2012-09-04 21:22     ` [Linaro-mm-sig] " Daniel Vetter

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=50466B3C.9050204@nvidia.com \
    --to=aplattner@nvidia.com \
    --cc=airlied@gmail.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=linaro-mm-sig@lists.linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.