From: Mario Kleiner <mario.kleiner.de@gmail.com>
To: "dri-devel@lists.freedesktop.org" <dri-devel@lists.freedesktop.org>
Cc: "Ben Skeggs" <skeggsb@gmail.com>,
"Alex Deucher" <alexdeucher@gmail.com>,
"Christian König" <deathsimple@vodafone.de>,
"Thomas Hellstrom" <thellstrom@vmware.com>,
m.szyprowski@samsung.com, LKML <linux-kernel@vger.kernel.org>,
kamal@canonical.com, ben@decadent.org.uk,
"Mario Kleiner" <mario.kleiner.de@gmail.com>
Subject: CONFIG_DMA_CMA causes ttm performance problems/hangs.
Date: Fri, 08 Aug 2014 19:42:51 +0200
Message-ID: <53E50C1B.9080507@gmail.com>
Hi all,
There is a rather severe performance problem I accidentally found when
trying to give Linux 3.16.0 a final test on an x86_64 MacBookPro under
Ubuntu 14.04 LTS with nouveau as the graphics driver.
I was lazy and just installed the Ubuntu precompiled mainline kernel.
That kernel happens to have CONFIG_DMA_CMA=y set, with a default CMA
(contiguous memory allocator) size of 64 MB. Older Ubuntu kernels
weren't compiled with CMA, so I only observed this on 3.16, but earlier
kernels built with CMA would likely be affected too.
After a few minutes of regular desktop use, like switching workspaces,
scrolling text in a terminal window, Firefox with multiple tabs open,
Thunderbird etc. (tested with KDE/KWin, with and without desktop
composition), I get choppy desktop updates, then multi-second freezes,
and after a few minutes the desktop hangs for over a minute on almost any
GUI action like switching windows. The desktop becomes unusable.
ftrace'ing shows the culprit to be this call chain (typical good/bad
example ftrace snippets are at the end of this mail):
  ... ttm DMA coherent memory allocations, e.g. from __ttm_dma_alloc_page()
  ... --> dma_alloc_coherent()
  --> platform-specific hooks ...
  --> dma_generic_alloc_coherent()   [on x86_64]
  --> dma_alloc_from_contiguous()
dma_alloc_from_contiguous() is a no-op without CONFIG_DMA_CMA, or when
the machine is booted with the kernel command line parameter "cma=0", so
the fast alloc_pages_node() fallback is taken right away, at least on
x86_64.
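To make that ordering concrete, here is a small userspace C model of it.
It is not kernel code: struct model_config, model_cma_alloc() and
model_alloc_coherent() are hypothetical stand-ins for the build/boot
setup, dma_alloc_from_contiguous() and the x86_64
dma_generic_alloc_coherent() path, and the fallback is simply assumed to
always succeed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Where a simulated coherent allocation ended up coming from. */
enum alloc_source { FROM_CMA, FROM_PAGE_ALLOCATOR };

/* Hypothetical knobs modelling the setups described above. */
struct model_config {
    bool cma_built_in;  /* kernel built with CONFIG_DMA_CMA=y */
    bool cma_disabled;  /* booted with "cma=0" */
    bool cma_has_room;  /* whether the CMA search would find a free range */
};

/* Stand-in for dma_alloc_from_contiguous(): without CMA (or with cma=0)
 * it reports failure immediately, which is the cheap "no-op" case. */
static bool model_cma_alloc(const struct model_config *cfg)
{
    assert(cfg != NULL);
    if (!cfg->cma_built_in || cfg->cma_disabled)
        return false;
    return cfg->cma_has_room;
}

/* Stand-in for the ordering on x86_64: try CMA first, and use the
 * alloc_pages_node() fallback only if CMA produced nothing. */
static enum alloc_source model_alloc_coherent(const struct model_config *cfg)
{
    if (model_cma_alloc(cfg))
        return FROM_CMA;
    return FROM_PAGE_ALLOCATOR; /* fallback, assumed to always succeed */
}
```

So with CMA built in and enabled, every allocation goes through the CMA
search first, and the fast path is only reached after that search has
failed; in the bad trace below, that search is where the time is spent.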
With CMA, this function becomes progressively slower with every minute
of desktop use: its runtime goes up from < 0.3 usecs to hundreds or
thousands of microseconds (before it gives up and the alloc_pages_node()
fallback is used). As it is called once for each new page ttm allocates,
these delays add up to the multi-second/minute hangs of the desktop.
So it seems ttm memory allocations quickly fragment and/or exhaust the
CMA memory area, and dma_alloc_from_contiguous() tries very hard to find
a hole big enough to satisfy each allocation, using a retry loop (see
http://lxr.free-electrons.com/source/drivers/base/dma-contiguous.c#L339)
that takes forever.
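For readers unfamiliar with that code, here is a simplified userspace C
model of a first-fit search with retry, loosely modelled on the linked
loop and not a copy of it. All names (struct cma_model, find_free_range(),
try_claim_range(), model_alloc_from_contiguous()) are made up; busy[]
stands in for whatever makes a candidate range unusable (in the kernel,
alloc_contig_range() failing with -EBUSY), and each counted attempt
represents the expensive migration step.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define AREA_PAGES 64 /* hypothetical, tiny CMA area size in pages */

/* Hypothetical model of one CMA area (not the kernel's struct cma). */
struct cma_model {
    bool allocated[AREA_PAGES]; /* pages already handed out via CMA */
    bool busy[AREA_PAGES];      /* pages that cannot be migrated away now */
};

/* First-fit search for `count` unallocated pages at or after `start`.
 * `mask` must be 2^k - 1; candidate starts are aligned to mask + 1.
 * Returns AREA_PAGES if no free range is left. */
static size_t find_free_range(const struct cma_model *c, size_t start,
                              size_t count, size_t mask)
{
    for (size_t i = (start + mask) & ~mask; i + count <= AREA_PAGES;
         i = (i + 1 + mask) & ~mask) {
        size_t j = 0;
        while (j < count && !c->allocated[i + j])
            j++;
        if (j == count)
            return i;
    }
    return AREA_PAGES;
}

/* Models the costly "migrate everything out of the range" step:
 * it fails if any page in [first, first + count) is busy. */
static bool try_claim_range(const struct cma_model *c, size_t first,
                            size_t count, unsigned *attempts)
{
    (*attempts)++;
    for (size_t k = first; k < first + count; k++)
        if (c->busy[k])
            return false;
    return true;
}

/* Returns the first page index of the allocation, or -1 once every
 * candidate range has been tried and failed (the caller then falls back). */
static long model_alloc_from_contiguous(struct cma_model *c, size_t count,
                                        size_t mask, unsigned *attempts)
{
    size_t start = 0;

    assert(attempts != NULL);
    *attempts = 0;
    if (count == 0 || count > AREA_PAGES)
        return -1;

    for (;;) {
        size_t pageno = find_free_range(c, start, count, mask);
        if (pageno >= AREA_PAGES)
            return -1; /* no candidates left: give up */
        if (try_claim_range(c, pageno, count, attempts)) {
            for (size_t k = pageno; k < pageno + count; k++)
                c->allocated[k] = true;
            return (long)pageno;
        }
        start = pageno + mask + 1; /* retry a bit further along */
    }
}
```

With a clean area the first candidate succeeds after one attempt. When
most candidates are busy, the loop pays for one expensive attempt per
candidate before giving up, which fits the pattern in the bad trace.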
This is bad, and not only for ttm: other devices which actually need a
non-fragmented CMA area for DMA suffer as well. So what to do? I doubt
most current GPUs still need physically contiguous DMA memory, except
maybe some embedded GPUs?
My naive approach would be to add a new gfp_t flag à la ___GFP_AVOIDCMA,
and make callers of dma_alloc_from_contiguous() skip that call when the
flag is set and they have some other fallback for getting memory. Then
add that flag to the gfp_flags in ttm's ttm_dma_populate(), e.g., around
here:
http://lxr.free-electrons.com/source/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c#L884
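A minimal sketch of what that proposal amounts to, again as a userspace C
model: MODEL_GFP_AVOIDCMA, model_should_try_cma() and model_ttm_gfp_flags()
are hypothetical illustrations of the suggested ___GFP_AVOIDCMA idea, the
bit value is arbitrary, and none of this is an existing kernel API.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int model_gfp_t;

/* Arbitrary bit chosen for illustration; NOT the kernel's gfp_t layout. */
#define MODEL_GFP_AVOIDCMA 0x100u /* hypothetical "skip CMA" request */

/* The hypothetical flag must occupy exactly one bit. */
static_assert((MODEL_GFP_AVOIDCMA & (MODEL_GFP_AVOIDCMA - 1u)) == 0,
              "MODEL_GFP_AVOIDCMA must be a single bit");

/* Check an allocator with its own fallback could make before calling
 * dma_alloc_from_contiguous(): only try CMA if the caller did not opt out. */
static bool model_should_try_cma(model_gfp_t gfp)
{
    return (gfp & MODEL_GFP_AVOIDCMA) == 0;
}

/* Models ttm_dma_populate() adding the flag to the gfp mask it passes
 * down, leaving all other bits untouched. */
static model_gfp_t model_ttm_gfp_flags(model_gfp_t base)
{
    return base | MODEL_GFP_AVOIDCMA;
}
```

A caller without a fallback (a device that really needs contiguous
memory) would simply not set the bit and keep today's behaviour.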
However, I'm not familiar enough with memory management, so greater
minds here likely have much better ideas on how to deal with this.
thanks,
-mario
Typical snippet from an example trace of a badly stalling desktop with
CMA (the alloc_pages_node() fallback may have been missing from the
ftrace_filter settings for these traces):
1)               |  ttm_dma_pool_get_pages [ttm]() {
1)               |    ttm_dma_page_pool_fill_locked [ttm]() {
1)               |      ttm_dma_pool_alloc_new_pages [ttm]() {
1)               |        __ttm_dma_alloc_page [ttm]() {
1)               |          dma_generic_alloc_coherent() {
1) ! 1873.071 us |            dma_alloc_from_contiguous();
1) ! 1874.292 us |          }
1) ! 1875.400 us |        }
1)               |        __ttm_dma_alloc_page [ttm]() {
1)               |          dma_generic_alloc_coherent() {
1) ! 1868.372 us |            dma_alloc_from_contiguous();
1) ! 1869.586 us |          }
1) ! 1870.053 us |        }
1)               |        __ttm_dma_alloc_page [ttm]() {
1)               |          dma_generic_alloc_coherent() {
1) ! 1871.085 us |            dma_alloc_from_contiguous();
1) ! 1872.240 us |          }
1) ! 1872.669 us |        }
1)               |        __ttm_dma_alloc_page [ttm]() {
1)               |          dma_generic_alloc_coherent() {
1) ! 1888.934 us |            dma_alloc_from_contiguous();
1) ! 1890.179 us |          }
1) ! 1890.608 us |        }
1)    0.048 us   |        ttm_set_pages_caching [ttm]();
1) ! 7511.000 us |      }
1) ! 7511.306 us |    }
1) ! 7511.623 us |  }
The good case (booted with cma=0 on the kernel command line, so
dma_alloc_from_contiguous() is a no-op):
0)               |  ttm_dma_pool_get_pages [ttm]() {
0)               |    ttm_dma_page_pool_fill_locked [ttm]() {
0)               |      ttm_dma_pool_alloc_new_pages [ttm]() {
0)               |        __ttm_dma_alloc_page [ttm]() {
0)               |          dma_generic_alloc_coherent() {
0)    0.171 us   |            dma_alloc_from_contiguous();
0)    0.849 us   |            __alloc_pages_nodemask();
0)    3.029 us   |          }
0)    3.882 us   |        }
0)               |        __ttm_dma_alloc_page [ttm]() {
0)               |          dma_generic_alloc_coherent() {
0)    0.037 us   |            dma_alloc_from_contiguous();
0)    0.163 us   |            __alloc_pages_nodemask();
0)    1.408 us   |          }
0)    1.719 us   |        }
0)               |        __ttm_dma_alloc_page [ttm]() {
0)               |          dma_generic_alloc_coherent() {
0)    0.035 us   |            dma_alloc_from_contiguous();
0)    0.153 us   |            __alloc_pages_nodemask();
0)    1.454 us   |          }
0)    1.720 us   |        }
0)               |        __ttm_dma_alloc_page [ttm]() {
0)               |          dma_generic_alloc_coherent() {
0)    0.036 us   |            dma_alloc_from_contiguous();
0)    0.112 us   |            __alloc_pages_nodemask();
0)    1.211 us   |          }
0)    1.541 us   |        }
0)    0.035 us   |        ttm_set_pages_caching [ttm]();
0) + 10.902 us   |      }
0) + 11.577 us   |    }
0) + 11.988 us   |  }