From: Felix Kuehling <felix.kuehling@amd.com>
To: James Zhu <James.Zhu@amd.com>,
amd-gfx@lists.freedesktop.org, christian.koenig@amd.com,
philip.yang@amd.com
Cc: jamesz@amd.com
Subject: Re: [PATCH v3] drm/amdgpu: fix stall on CPU when allocate large system memory
Date: Tue, 22 Nov 2022 11:21:16 -0500
Message-ID: <a09596f0-e44a-29f9-db02-e9d649140270@amd.com>
In-Reply-To: <20221121145312.125272-1-James.Zhu@amd.com>
On 2022-11-21 at 09:53, James Zhu wrote:
> -v2: 1. rename variable to reduce confusion
> 2. optimize the code
> -v3: move the new define out of the middle of the code
>
> When applications try to allocate a large amount of system memory
> (more than 128GB), an RCU "stall on CPU" warning is reported.
>
> For such large allocations, walk_page_range usually takes more than 20s.
> The warning can be avoided by splitting the hmm range into smaller
> ranges of no more than 64GB per walk_page_range call.
>
> [ 164.437617] amdgpu:amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu:1753: amdgpu: create BO VA 0x7f63c7a00000 size 0x2f16000000 domain CPU
> [ 164.488847] amdgpu:amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu:1785: amdgpu: creating userptr BO for user_addr = 7f63c7a00000
> [ 185.439116] rcu: INFO: rcu_sched self-detected stall on CPU
> [ 185.439125] rcu: 8-....: (20999 ticks this GP) idle=e22/1/0x4000000000000000 softirq=2242/2242 fqs=5249
> [ 185.439137] (t=21000 jiffies g=6325 q=1215)
> [ 185.439141] NMI backtrace for cpu 8
> [ 185.439143] CPU: 8 PID: 3470 Comm: kfdtest Kdump: loaded Tainted: G O 5.12.0-0_fbk5_zion_rc1_5697_g2c723fb88626 #1
> [ 185.439147] Hardware name: HPE ProLiant XL675d Gen10 Plus/ProLiant XL675d Gen10 Plus, BIOS A47 11/06/2020
> [ 185.439150] Call Trace:
> [ 185.439153] <IRQ>
> [ 185.439157] dump_stack+0x64/0x7c
> [ 185.439163] nmi_cpu_backtrace.cold.7+0x30/0x65
> [ 185.439165] ? lapic_can_unplug_cpu+0x70/0x70
> [ 185.439170] nmi_trigger_cpumask_backtrace+0xf9/0x100
> [ 185.439174] rcu_dump_cpu_stacks+0xc5/0xf5
> [ 185.439178] rcu_sched_clock_irq.cold.97+0x112/0x38c
> [ 185.439182] ? tick_sched_handle.isra.21+0x50/0x50
> [ 185.439185] update_process_times+0x8c/0xc0
> [ 185.439189] tick_sched_timer+0x63/0x70
> [ 185.439192] __hrtimer_run_queues+0xff/0x250
> [ 185.439195] hrtimer_interrupt+0xf4/0x200
> [ 185.439199] __sysvec_apic_timer_interrupt+0x51/0xd0
> [ 185.439201] sysvec_apic_timer_interrupt+0x69/0x90
> [ 185.439206] </IRQ>
> [ 185.439207] asm_sysvec_apic_timer_interrupt+0x12/0x20
> [ 185.439211] RIP: 0010:clear_page_rep+0x7/0x10
> [ 185.439214] Code: e8 fe 7c 51 00 44 89 e2 48 89 ee 48 89 df e8 60 ff ff ff c6 03 00 5b 5d 41 5c c3 cc cc cc cc cc cc cc cc b9 00 02 00 00 31 c0 <f3> 48 ab c3 0f 1f 44 00 00 31 c0 b9 40 00 00 00 66 0f 1f 84 00 00
> [ 185.439218] RSP: 0018:ffffc9000f58f818 EFLAGS: 00000246
> [ 185.439220] RAX: 0000000000000000 RBX: 0000000000000881 RCX: 000000000000005c
> [ 185.439223] RDX: 0000000000100dca RSI: 0000000000000000 RDI: ffff88a59e0e5d20
> [ 185.439225] RBP: ffffea0096783940 R08: ffff888118c35280 R09: ffffea0096783940
> [ 185.439227] R10: ffff888000000000 R11: 0000160000000000 R12: ffffea0096783980
> [ 185.439228] R13: ffffea0096783940 R14: ffff88b07fdfdd00 R15: 0000000000000000
> [ 185.439232] prep_new_page+0x81/0xc0
> [ 185.439236] get_page_from_freelist+0x13be/0x16f0
> [ 185.439240] ? release_pages+0x16a/0x4a0
> [ 185.439244] __alloc_pages_nodemask+0x1ae/0x340
> [ 185.439247] alloc_pages_vma+0x74/0x1e0
> [ 185.439251] __handle_mm_fault+0xafe/0x1360
> [ 185.439255] handle_mm_fault+0xc3/0x280
> [ 185.439257] hmm_vma_fault.isra.22+0x49/0x90
> [ 185.439261] __walk_page_range+0x692/0x9b0
> [ 185.439265] walk_page_range+0x9b/0x120
> [ 185.439269] hmm_range_fault+0x4f/0x90
> [ 185.439274] amdgpu_hmm_range_get_pages+0x24f/0x260 [amdgpu]
> [ 185.439463] amdgpu_ttm_tt_get_user_pages+0xc2/0x190 [amdgpu]
> [ 185.439603] amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu+0x49f/0x7a0 [amdgpu]
> [ 185.439774] kfd_ioctl_alloc_memory_of_gpu+0xfb/0x410 [amdgpu]
>
> Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c | 50 +++++++++++++++++--------
> 1 file changed, 35 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
> index a48ea62b12b0..8a2e5716d8db 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
> @@ -51,6 +51,8 @@
> #include "amdgpu_amdkfd.h"
> #include "amdgpu_hmm.h"
>
> +#define MAX_WALK_BYTE (64ULL<<30)
> +
> /**
> * amdgpu_hmm_invalidate_gfx - callback to notify about mm change
> *
> @@ -163,6 +165,7 @@ int amdgpu_hmm_range_get_pages(struct mmu_interval_notifier *notifier,
> struct hmm_range **phmm_range)
> {
> struct hmm_range *hmm_range;
> + unsigned long end;
> unsigned long timeout;
> unsigned long i;
> unsigned long *pfns;
> @@ -184,25 +187,42 @@ int amdgpu_hmm_range_get_pages(struct mmu_interval_notifier *notifier,
> hmm_range->default_flags |= HMM_PFN_REQ_WRITE;
> hmm_range->hmm_pfns = pfns;
> hmm_range->start = start;
> - hmm_range->end = start + npages * PAGE_SIZE;
> + end = start + npages * PAGE_SIZE;
> hmm_range->dev_private_owner = owner;
>
> - /* Assuming 512MB takes maxmium 1 second to fault page address */
> - timeout = max(npages >> 17, 1ULL) * HMM_RANGE_DEFAULT_TIMEOUT;
> - timeout = jiffies + msecs_to_jiffies(timeout);
> + do {
> + hmm_range->end = min(hmm_range->start + MAX_WALK_BYTE, end);
> +
> + pr_debug("hmm range: start = 0x%lx, end = 0x%lx",
> + hmm_range->start, hmm_range->end);
> +
> + /* Assuming 512MB takes maxmium 1 second to fault page address */
> + timeout = max((hmm_range->end - hmm_range->start) >> 29, 1ULL) *
> + HMM_RANGE_DEFAULT_TIMEOUT;
> + timeout = jiffies + msecs_to_jiffies(timeout);
>
> retry:
> - hmm_range->notifier_seq = mmu_interval_read_begin(notifier);
> - r = hmm_range_fault(hmm_range);
> - if (unlikely(r)) {
> - /*
> - * FIXME: This timeout should encompass the retry from
> - * mmu_interval_read_retry() as well.
> - */
> - if (r == -EBUSY && !time_after(jiffies, timeout))
> - goto retry;
> - goto out_free_pfns;
> - }
> + hmm_range->notifier_seq = mmu_interval_read_begin(notifier);
> + r = hmm_range_fault(hmm_range);
> + if (unlikely(r)) {
> + /*
> + * FIXME: This timeout should encompass the retry from
> + * mmu_interval_read_retry() as well.
> + */
> + if (r == -EBUSY && !time_after(jiffies, timeout))
> + goto retry;
> + goto out_free_pfns;
> + }
> +
> + if (hmm_range->end == end)
> + break;
> + hmm_range->hmm_pfns += MAX_WALK_BYTE >> PAGE_SHIFT;
> + hmm_range->start = hmm_range->end;
> + schedule();
> + } while (hmm_range->end < end);
> +
> + hmm_range->start = start;
> + hmm_range->hmm_pfns = pfns;
>
> /*
> * Due to default_flags, all pages are HMM_PFN_VALID or
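
For readers following the arithmetic, here is a minimal user-space sketch of
how the loop in the patch splits a range into walks of at most 64GB and
scales the timeout per walk. It is an illustration under stated assumptions,
not the driver code: every sketch_* name is hypothetical, 4KB pages are
assumed, and 1000 ms is used as a stand-in for HMM_RANGE_DEFAULT_TIMEOUT.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12                 /* assumption: 4KB pages */
#define SKETCH_MAX_WALK_BYTE (64ULL << 30)   /* mirrors MAX_WALK_BYTE in the patch */
#define SKETCH_DEFAULT_TIMEOUT_MS 1000ULL    /* assumption: stand-in for HMM_RANGE_DEFAULT_TIMEOUT */

/* One walk of at most SKETCH_MAX_WALK_BYTE (hypothetical helper type). */
struct sketch_chunk {
	uint64_t start;       /* first byte of this walk */
	uint64_t end;         /* one past the last byte of this walk */
	uint64_t pfn_offset;  /* index into the pfn array where this walk begins */
	uint64_t timeout_ms;  /* timeout budget for this walk */
};

/*
 * Split [start, start + npages pages) into walks of at most 64GB.
 * Writes up to max_out chunks and returns how many were written.
 */
static size_t sketch_split_range(uint64_t start, uint64_t npages,
				 struct sketch_chunk *out, size_t max_out)
{
	uint64_t end = start + (npages << SKETCH_PAGE_SHIFT);
	uint64_t cur = start;
	size_t n = 0;

	assert(out != NULL);

	while (cur < end && n < max_out) {
		uint64_t chunk_end = cur + SKETCH_MAX_WALK_BYTE;
		uint64_t units;

		if (chunk_end > end)
			chunk_end = end;

		/* One timeout unit per 512MB (2^29 bytes), at least one unit. */
		units = (chunk_end - cur) >> 29;
		if (units < 1)
			units = 1;

		out[n].start = cur;
		out[n].end = chunk_end;
		out[n].pfn_offset = (cur - start) >> SKETCH_PAGE_SHIFT;
		out[n].timeout_ms = units * SKETCH_DEFAULT_TIMEOUT_MS;

		cur = chunk_end;
		n++;
	}
	return n;
}
```

With the values from the log above (user_addr 0x7f63c7a00000, size
0x2f16000000, about 188GB), the sketch yields three walks: two of 64GB and a
final one of roughly 60GB. Under the 4KB page assumption, shifting bytes by 29
gives the same unit count as the old npages >> 17.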