From mboxrd@z Thu Jan  1 00:00:00 1970
From: Thomas Hellstrom <thellstrom@vmware.com>
Subject: Re: [PATCH 10/13] drm/ttm: provide dma aware ttm page pool code V7
Date: Fri, 11 Nov 2011 09:06:37 +0100
Message-ID: <4EBCD78D.2010304@vmware.com>
References: <1320975417-13871-1-git-send-email-j.glisse@gmail.com>
	<1320975417-13871-11-git-send-email-j.glisse@gmail.com>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="===============0781228640=="
Return-path: <dri-devel-bounces+sf-dri-devel=m.gmane.org@lists.freedesktop.org>
Received: from smtp-outbound-2.vmware.com (smtp-outbound-2.vmware.com
	[65.115.85.73])
	by gabe.freedesktop.org (Postfix) with ESMTP id F2DE2A0D58
	for <dri-devel@lists.freedesktop.org>;
	Fri, 11 Nov 2011 00:09:14 -0800 (PST)
In-Reply-To: <1320975417-13871-11-git-send-email-j.glisse@gmail.com>
List-Unsubscribe: <http://lists.freedesktop.org/mailman/options/dri-devel>,
	<mailto:dri-devel-request@lists.freedesktop.org?subject=unsubscribe>
List-Archive: <http://lists.freedesktop.org/archives/dri-devel>
List-Post: <mailto:dri-devel@lists.freedesktop.org>
List-Help: <mailto:dri-devel-request@lists.freedesktop.org?subject=help>
List-Subscribe: <http://lists.freedesktop.org/mailman/listinfo/dri-devel>,
	<mailto:dri-devel-request@lists.freedesktop.org?subject=subscribe>
Sender: dri-devel-bounces+sf-dri-devel=m.gmane.org@lists.freedesktop.org
Errors-To: dri-devel-bounces+sf-dri-devel=m.gmane.org@lists.freedesktop.org
To: j.glisse@gmail.com
Cc: dri-devel@lists.freedesktop.org
List-Id: dri-devel@lists.freedesktop.org

This is a multi-part message in MIME format.
--===============0781228640==
Content-Type: multipart/alternative;
 boundary="------------000009050809080403000708"

This is a multi-part message in MIME format.
--------------000009050809080403000708
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: quoted-printable

On 11/11/2011 02:36 AM, j.glisse@gmail.com wrote:
> From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>
> In the TTM world the pages for the graphics drivers are kept in three
> different pools: write-combined, uncached, and cached (write-back).
> When the pages are used by the graphics driver, the graphics adapter
> programs them in via its built-in MMU (or AGP). The programming
> requires the virtual address (from the graphics adapter's perspective)
> and the physical address (either system RAM or the memory on the
> card), which is obtained using the pci_map_* calls (which do the
> virtual-to-physical, or bus address, translation). During the graphics
> application's "life" those pages can be shuffled around, swapped out
> to disk, or moved from VRAM to system RAM or vice versa. This all works
> with the existing TTM pool code, except when we want to use the
> software IOTLB (SWIOTLB) code to "map" the physical addresses to the
> graphics adapter MMU. We end up programming the bounce buffer's
> physical address instead of the TTM pool memory's and get a
> non-working driver.
> There are two solutions:
> 1) using the DMA API to allocate pages that are screened by the DMA
>    API, or
> 2) using the pci_sync_* calls to copy the pages from the bounce buffer
>    and back.
>
> This patch fixes the issue by allocating pages using the DMA API. The
> second is a viable option, but it has performance drawbacks and
> potential correctness issues: think of a write-combined page being
> bounced (SWIOTLB->TTM), WC being set on the TTM page, and the copy
> from SWIOTLB not making it to the TTM page until the page has been
> recycled in the pool (and used by another application).
>
> The bounce buffer does not get activated often: only in cases where we
> have a 32-bit capable card and we want to use a page that is allocated
> above the 4GB limit. The bounce buffer offers the solution of copying
> the contents of that page to a location below 4GB and then back when
> the operation has been completed (or vice versa). This is done by
> using the 'pci_sync_*' calls.
> Note: If you look carefully enough in the existing TTM page pool code
> you will notice the GFP_DMA32 flag is used, which should guarantee
> that the provided page is under 4GB. That is certainly the case,
> except that this gets ignored in two cases:
>   - If the user specifies 'swiotlb=3Dforce', which bounces _every_ page.
>   - If the user is running a Xen PV Linux guest (which uses the SWIOTLB,
>     and the underlying PFNs aren't necessarily under 4GB).
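
A minimal userspace sketch of the bounce condition described above, in
C. The function name, the PAGE_BYTES constant and the parameters are
hypothetical; they are not taken from the patch or from the SWIOTLB
code. It assumes the caller passes the bus address the device would
see, which also covers the Xen PV case where the underlying frame may
sit above 4GB:

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed page size for the sketch. */
#define PAGE_BYTES 4096ULL

/*
 * Hypothetical helper: would a streaming mapping of this page go
 * through the bounce buffer? True when bouncing is forced, or when
 * the last byte of the page lies beyond what the device can address.
 */
static bool page_would_bounce(uint64_t bus_addr, uint64_t dma_mask,
                              bool swiotlb_force)
{
    if (swiotlb_force)
        return true;
    return bus_addr + PAGE_BYTES - 1 > dma_mask;
}
```

With a 32-bit mask (0xffffffff), a page at bus address 0x100000000
would bounce, while one at 0x1000 would not unless the force flag is
set.
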
>
> To avoid this extra copying, the other option is to allocate the pages
> using the DMA API so that there is no need to map the page and perform
> the expensive 'pci_sync_*' calls.
>
> For this, the DMA API capable TTM pool requires the 'struct device' so
> that it can properly call the DMA API. It also has to track the
> virtual and bus address of the page being handed out in case it ends
> up being swapped out or de-allocated, to make sure it is de-allocated
> using the proper 'struct device'.
>
> Implementation-wise the code keeps two lists: one that is attached to
> the 'struct device' (via the dev->dma_pools list) and a global one to
> be used when the 'struct device' is unavailable (think shrinker code).
> The global list can iterate over all of the 'struct device' entries
> and their associated dma_pools. The list in dev->dma_pools can only
> iterate over that device's dma_pools.
>                                                    /[struct device_pool]\
>          /-----------------------------------------| dev                |
>         /                                        +-| dma_pool           |
> /------+------\                                 /  \--------------------/
> |struct device|   /-->[struct dma_pool for WC]</   /[struct device_pool]\
> | dma_pools   +--+                               /-| dev                |
> |  ...        |   \-->[struct dma_pool for UC]<-/--| dma_pool           |
> \------+------/                                /   \--------------------/
>         \-------------------------------------/
> [Two pools associated with the device (WC and UC), and the parallel
> list containing the 'struct dev' and 'struct dma_pool' entries]
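
To make the two-list layout above concrete, here is a small userspace
sketch in C. It is an illustration under assumptions, not the patch's
code: the struct layouts, the next_in_dev link, global_pools and
pool_create() are all hypothetical, and plain singly linked lists
stand in for the kernel's list_head.

```c
#include <stddef.h>
#include <stdlib.h>

struct device;

/* One pool of pages with a given caching attribute (WC, UC, ...). */
struct dma_pool {
    struct device *dev;
    const char *name;
    struct dma_pool *next_in_dev;   /* link in dev->dma_pools */
};

/* Stand-in for the kernel's struct device. */
struct device {
    struct dma_pool *dma_pools;     /* pools owned by this device */
};

/* Global list entry: pairs a device with one of its pools. */
struct device_pool {
    struct device *dev;
    struct dma_pool *pool;
    struct device_pool *next;
};

static struct device_pool *global_pools;

/* Create a pool and link it into both lists. */
static struct dma_pool *pool_create(struct device *dev, const char *name)
{
    struct dma_pool *pool = malloc(sizeof(*pool));
    struct device_pool *entry = malloc(sizeof(*entry));

    if (!pool || !entry) {
        free(pool);
        free(entry);
        return NULL;
    }
    pool->dev = dev;
    pool->name = name;
    pool->next_in_dev = dev->dma_pools;
    dev->dma_pools = pool;

    entry->dev = dev;
    entry->pool = pool;
    entry->next = global_pools;
    global_pools = entry;
    return pool;
}

/* Walk only this device's pools (needs the struct device). */
static size_t count_dev_pools(const struct device *dev)
{
    const struct dma_pool *p;
    size_t n = 0;

    for (p = dev->dma_pools; p; p = p->next_in_dev)
        n++;
    return n;
}

/* Walk every pool of every device (shrinker-style, no device needed). */
static size_t count_all_pools(void)
{
    const struct device_pool *e;
    size_t n = 0;

    for (e = global_pools; e; e = e->next)
        n++;
    return n;
}
```

The point of the second list is that a caller with no 'struct device'
in hand can still reach every pool through global_pools, and each
entry carries the device needed to free pages correctly.
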
>
> The maximum number of DMA pools a device can have is six:
> write-combined, uncached, and cached; then there are the DMA32
> variants, which are: write-combined dma32, uncached dma32, and cached
> dma32.
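
One way to picture the six pool types is as a caching attribute plus
an optional DMA32 bit. The sketch below is only an illustration: the
flag names and the slot numbering are hypothetical and may well differ
from what the patch uses.

```c
#include <stdbool.h>

/* Hypothetical flag names; the patch may use different ones. */
enum pool_flag {
    POOL_WC    = 1 << 0,  /* write-combined */
    POOL_UC    = 1 << 1,  /* uncached */
    POOL_DMA32 = 1 << 2   /* pages must sit below 4GB */
};

#define NUM_POOL_TYPES 6

/* WC and UC exclude each other; neither one set means cached. */
static bool pool_flags_valid(unsigned int flags)
{
    if (flags & ~(unsigned int)(POOL_WC | POOL_UC | POOL_DMA32))
        return false;
    return !((flags & POOL_WC) && (flags & POOL_UC));
}

/* Map a valid flag set to a slot 0..5, or -1 if the set is invalid. */
static int pool_index(unsigned int flags)
{
    int caching;

    if (!pool_flags_valid(flags))
        return -1;
    if (flags & POOL_WC)
        caching = 1;
    else if (flags & POOL_UC)
        caching = 2;
    else
        caching = 0;
    return caching + ((flags & POOL_DMA32) ? 3 : 0);
}
```

Three caching choices times two addressing choices gives the six
slots; a flag set with both WC and UC is rejected.
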
>
> Currently this code only gets activated when any variant of the SWIOTLB
> IOMMU code is running (Intel without VT-d, AMD without GART, IBM Calgary
> and Xen PV with PCI devices).
>
> Tested-by: Michel D=C3=A4nzer <michel@daenzer.net>
> [v1: Using swiotlb_nr_tbl instead of swiotlb_enabled]
> [v2: Major overhaul - added 'inuse_list' to separate used from inuse
> and reorder the order of lists to get better performance.]
> [v3: Added comments/and some logic based on review, Added Jerome tag]
> [v4: rebase on top of ttm_tt & ttm_backend merge]
> [v5: rebase on top of ttm memory accounting overhaul]
> [v6: New rebase on top of more memory accounting changes]
> [v7: well rebase on top of no memory accounting changes]
> Reviewed-by: Jerome Glisse <jglisse@redhat.com>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> ---
>   =20
Acked-by: Thomas Hellstrom <thellstrom@vmware.com>

--------------000009050809080403000708--

--===============0781228640==
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

--===============0781228640==--