Re: [PATCH v2 1/2] drm/ttm: Only allocate huge pages with new flag TTM_PAGE_FLAG_TRANSHUGE

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: "Christian König" <christian.koenig@amd.com>
To: "Michel Dänzer" <michel@daenzer.net>,
	"Ilia Mirkin" <imirkin@alum.mit.edu>
Cc: dri-devel <dri-devel@lists.freedesktop.org>,
	amd-gfx mailing list <amd-gfx@lists.freedesktop.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v2 1/2] drm/ttm: Only allocate huge pages with new flag TTM_PAGE_FLAG_TRANSHUGE
Date: Sun, 29 Apr 2018 09:02:43 +0200	[thread overview]
Message-ID: <d5f9459e-801c-e592-d06d-c9052ef3ad4d@amd.com> (raw)
In-Reply-To: <e8166443-14f0-1eb3-a6a2-d425792b0643@daenzer.net>

Am 29.04.2018 um 01:02 schrieb Michel Dänzer:
> On 2018-04-28 06:30 PM, Ilia Mirkin wrote:
>> On Fri, Apr 27, 2018 at 9:08 AM, Michel Dänzer <michel@daenzer.net> wrote:
>>> From: Michel Dänzer <michel.daenzer@amd.com>
>>>
>>> Previously, TTM would always (with CONFIG_TRANSPARENT_HUGEPAGE enabled)
>>> try to allocate huge pages. However, not all drivers can take advantage
>>> of huge pages, but they would incur the overhead for allocating and
>>> freeing them anyway.
>>>
>>> Now, drivers which can take advantage of huge pages need to set the new
>>> flag TTM_PAGE_FLAG_TRANSHUGE to get them. Drivers not setting this flag
>>> no longer incur any overhead for allocating or freeing huge pages.
>>>
>>> v2:
>>> * Also guard swapping of consecutive pages in ttm_get_pages
>>> * Reword commit log, hopefully clearer now
>>>
>>> Cc: stable@vger.kernel.org
>>> Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
>> Both I and lots of other people, based on reports, are still seeing
>> plenty of issues with this as late as 4.16.4.
> "lots of other people", "plenty of issues" sounds a bit exaggerated from
> what I've seen. FWIW, while I did see the original messages myself, I
> haven't seen any since Christian's original fix (see below), neither
> with amdgpu nor radeon, even before this patch you followed up to.
>
>
>> Admittedly I'm on nouveau, but others have reported issues with
>> radeon/amdgpu as well. It's been going on since the feature was merged
>> in v4.15, with what seems like little investigation from the authors
>> introducing the feature.
> That's not a fair assessment. See
> https://bugs.freedesktop.org/show_bug.cgi?id=104082#c40 and following
> comments.
>
> Christian fixed the original issue in
> d0bc0c2a31c95002d37c3cc511ffdcab851b3256 "swiotlb: suppress warning when
> __GFP_NOWARN is set". Christian did his best to try and get the fix in
> before 4.15 final, but for reasons beyond his control, it was delayed
> until 4.16-rc1 and then backported to 4.15.5.
>
> Unfortunately, there was an swiotlb regression (not directly related to
> Christian's work) shortly after this fix, also in 4.16-rc1, which is now
> fixed in 4.17-rc1 and will be backported to 4.16.y.

And that's exactly the reason why I intentionally kept this enabled for 
all users of the TTM DMA page pool and not put it behind a flag.

This change has surfaced quite a number of bugs in the swiotlb code 
which could have caused issues before. It's just that those code path 
where never exercised massively before.

Additional to that using huge pages is beneficial for the MM and CPU TLB 
(not implemented yet) even when the GPU driver can't make much use of it.

> It looks like there's at least one more bug left, but it's not clear yet
> when that was introduced, whether it's directly related to Christian's
> work, or indeed what the impact is. Let's not get ahead of ourselves.

Well my patches surfaced the problems, but the underlying issues where 
present even before those changes and I'm very well involved in fixing 
the underlying issues.

I even considered to just revert the huge page path for the DMA pool 
allocator, but it's just that the TTM patches seem to work exactly as 
they are intended. So that doesn't feel like doing the right thing here.

>> We now have *two* broken releases, v4.15 and v4.16 (anything that
>> spews error messages and stack traces ad-infinitum in dmesg is, by
>> definition, broken).
> I haven't seen any evidence that there's still an issue in 4.15, is
> there any?

Not that I know of, the fix was backported as far as I know.

>> You're putting this behind a flag now (finally),
> I wrote this patch because I realized due to some remark I happened to
> see you make this week on IRC that the huge page support in TTM was
> enabled for all drivers. Instead of making that kind of remark on IRC,
> it would have been more constructive, and more conducive to quick
> implementation, to suggest making the feature not active for drivers
> which don't need it in a mailing list post.

I have to admit that I'm lacking behind taking care of the amdgpu/radeon 
user space issues just because of more important stuff to do, but the 
issues affecting other drivers should be fixed by now.

BTW: The user space problems for amdgpu/radeon seems to come from either 
the DDX or Glamour.

For example try playing a video user firefox with Glamour enabled and 
take a look at how much memory we free/allocate.

It's multiple gigabytes for just a few seconds playback, that strongly 
indicates that we allocate/free a texture for each displayed frame which 
is quite far from optimal.

Regards,
Christian.

>
>
> At least, please do more research before making this kind of negative
> post.
>
> P.S. You might also want to look into whether nouveau really should be
> hitting swiotlb in these cases.
>

next prev parent reply	other threads:[~2018-04-29  7:03 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-04-26 15:06 [PATCH 1/2] drm/ttm: Add TTM_PAGE_FLAG_TRANSHUGE Michel Dänzer
2018-04-26 15:06 ` [PATCH 2/2] drm/ttm: Use GFP_TRANSHUGE_LIGHT for allocating huge pages Michel Dänzer
2018-04-29  7:04   ` Christian König
2018-04-27  2:51 ` [PATCH 1/2] drm/ttm: Add TTM_PAGE_FLAG_TRANSHUGE zhoucm1
2018-04-27  8:41   ` Michel Dänzer
2018-04-27 13:08 ` [PATCH v2 1/2] drm/ttm: Only allocate huge pages with new flag TTM_PAGE_FLAG_TRANSHUGE Michel Dänzer
2018-04-28 16:30   ` Ilia Mirkin
2018-04-28 23:02     ` Michel Dänzer
2018-04-28 23:56       ` Ilia Mirkin
2018-05-02  8:08         ` Michel Dänzer
2018-04-29  7:02       ` Christian König [this message]
2018-04-30 16:33         ` Michel Dänzer
2018-04-30 18:22           ` Christian König
2018-04-30 23:15             ` Dave Airlie
2018-05-01 13:59               ` Michel Dänzer

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=d5f9459e-801c-e592-d06d-c9052ef3ad4d@amd.com \
    --to=christian.koenig@amd.com \
    --cc=amd-gfx@lists.freedesktop.org \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=imirkin@alum.mit.edu \
    --cc=linux-kernel@vger.kernel.org \
    --cc=michel@daenzer.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox