* (no subject)
@ 2025-08-20 14:33 Christian König
  2025-08-20 14:33 ` [PATCH 1/3] drm/ttm: use apply_page_range instead of vmf_insert_pfn_prot Christian König
  ` (3 more replies)
  0 siblings, 4 replies; 50+ messages in thread
From: Christian König @ 2025-08-20 14:33 UTC (permalink / raw)
  To: intel-xe, intel-gfx, dri-devel, amd-gfx, x86
  Cc: airlied, thomas.hellstrom, matthew.brost, david, dave.hansen, luto, peterz

Hi everyone,

sorry for CCing so many people, but that rabbit hole turned out to be
deeper than originally thought.

TTM has always had problems with UC/WC mappings on 32bit systems, and
drivers often had to resort to hacks like using GFP_DMA32 to get things
working, with no rational explanation for why that helped (see the TTM
AGP, radeon and nouveau driver code for that).

It turned out that the PAT implementation we use on x86 not only enforces
the same caching attributes for pages in the linear kernel mapping, but
also for highmem pages through a separate R/B tree.

That was unexpected, and TTM never updated that R/B tree for highmem
pages, so the function pgprot_set_cachemode() just overwrote the caching
attributes drivers passed in to vmf_insert_pfn_prot(), and that
essentially caused all kinds of random trouble.

An R/B tree is potentially not a good data structure to hold thousands if
not millions of different attributes, one for each page, so updating that
tree is probably not the way to solve this issue.

Thomas pointed out that the i915 driver is using apply_page_range()
instead of vmf_insert_pfn_prot() to circumvent the PAT implementation and
just fill in the page tables with what the driver thinks is the right
caching attribute.

This patch set implements that approach, and it turns out to be much
*faster* than the old implementation. Together with another change,
mapping 1GiB of memory through TTM on my test system improved by nearly
a factor of 10 (197ms -> 20ms)!

Please review the general idea and/or comment on the patches.

Thanks,
Christian.

^ permalink raw reply	[flat|nested] 50+ messages in thread
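To make the failure mode above concrete, here is a minimal sketch of the
path the cover letter describes, using the function names that appear
later in this thread. The exact call chain varies by kernel version, so
treat this as an illustration rather than the literal code:

	/* Driver side: request a write-combined userspace mapping */
	pgprot_t prot = pgprot_writecombine(vma->vm_page_prot);
	ret = vmf_insert_pfn_prot(vma, vmf->address, pfn, prot);

	/*
	 * x86 PAT side: vmf_insert_pfn_prot() ends up consulting the
	 * memtype R/B tree.  A highmem page that TTM never registered
	 * there defaults to write-back, so pgprot_set_cachemode()
	 * silently replaces the WC bits requested above:
	 */
	static inline void pgprot_set_cachemode(pgprot_t *prot,
						enum page_cache_mode pcm)
	{
		*prot = __pgprot((pgprot_val(*prot) & ~_PAGE_CACHE_MASK) |
				 cachemode2protval(pcm));  /* WB wins */
	}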
* [PATCH 1/3] drm/ttm: use apply_page_range instead of vmf_insert_pfn_prot
  2025-08-20 14:33 Christian König
@ 2025-08-20 14:33 ` Christian König
  2025-08-20 14:33 ` [PATCH 2/3] drm/ttm: reapply "increase ttm pre-fault value to PMD size" Christian König
  ` (2 subsequent siblings)
  3 siblings, 0 replies; 50+ messages in thread
From: Christian König @ 2025-08-20 14:33 UTC (permalink / raw)
  To: intel-xe, intel-gfx, dri-devel, amd-gfx, x86
  Cc: airlied, thomas.hellstrom, matthew.brost, david, dave.hansen, luto, peterz

Thomas pointed out that i915 is using apply_page_range instead of
vmf_insert_pfn_prot to circumvent the PAT lookup and generally speed up
the page fault handling.

I thought I'd give it a try and measure how much this can improve things,
and it turned out that mapping a 1GiB buffer is now more than 4x faster
than before.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/ttm/ttm_bo_vm.c | 130 ++++++++++++++++----------------
 1 file changed, 64 insertions(+), 66 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c
index a194db83421d..93764b166678 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
@@ -160,6 +160,38 @@ vm_fault_t ttm_bo_vm_reserve(struct ttm_buffer_object *bo,
 }
 EXPORT_SYMBOL(ttm_bo_vm_reserve);
 
+/* State bag for calls to ttm_bo_vm_apply_cb */
+struct ttm_bo_vm_bag {
+	struct mm_struct *mm;
+	struct ttm_buffer_object *bo;
+	struct ttm_tt *ttm;
+	unsigned long page_offset;
+	pgprot_t prot;
+};
+
+/* Callback to fill in a specific PTE */
+static int ttm_bo_vm_apply_cb(pte_t *pte, unsigned long addr, void *data)
+{
+	struct ttm_bo_vm_bag *bag = data;
+	struct ttm_buffer_object *bo = bag->bo;
+	unsigned long pfn;
+
+	if (bo->resource->bus.is_iomem) {
+		pfn = ttm_bo_io_mem_pfn(bo, bag->page_offset);
+	} else {
+		struct page *page = bag->ttm->pages[bag->page_offset];
+
+		if (unlikely(!page))
+			return -ENOMEM;
+		pfn = page_to_pfn(page);
+	}
+
+	/* Special PTEs are not associated with any struct page */
+	set_pte_at(bag->mm, addr, pte, pte_mkspecial(pfn_pte(pfn, bag->prot)));
+	bag->page_offset++;
+	return 0;
+}
+
 /**
  * ttm_bo_vm_fault_reserved - TTM fault helper
  * @vmf: The struct vm_fault given as argument to the fault callback
@@ -183,101 +215,67 @@ vm_fault_t ttm_bo_vm_fault_reserved(struct vm_fault *vmf,
 				    pgoff_t num_prefault)
 {
 	struct vm_area_struct *vma = vmf->vma;
-	struct ttm_buffer_object *bo = vma->vm_private_data;
-	struct ttm_device *bdev = bo->bdev;
-	unsigned long page_offset;
-	unsigned long page_last;
-	unsigned long pfn;
-	struct ttm_tt *ttm = NULL;
-	struct page *page;
+	struct ttm_bo_vm_bag bag = {
+		.mm = vma->vm_mm,
+		.bo = vma->vm_private_data
+	};
+	unsigned long size;
+	vm_fault_t ret;
 	int err;
-	pgoff_t i;
-	vm_fault_t ret = VM_FAULT_NOPAGE;
-	unsigned long address = vmf->address;
 
 	/*
 	 * Wait for buffer data in transit, due to a pipelined
 	 * move.
*/ - ret = ttm_bo_vm_fault_idle(bo, vmf); + ret = ttm_bo_vm_fault_idle(bag.bo, vmf); if (unlikely(ret != 0)) return ret; - err = ttm_mem_io_reserve(bdev, bo->resource); + err = ttm_mem_io_reserve(bag.bo->bdev, bag.bo->resource); if (unlikely(err != 0)) return VM_FAULT_SIGBUS; - page_offset = ((address - vma->vm_start) >> PAGE_SHIFT) + - vma->vm_pgoff - drm_vma_node_start(&bo->base.vma_node); - page_last = vma_pages(vma) + vma->vm_pgoff - - drm_vma_node_start(&bo->base.vma_node); - - if (unlikely(page_offset >= PFN_UP(bo->base.size))) + bag.page_offset = ((vmf->address - vma->vm_start) >> PAGE_SHIFT) + + vma->vm_pgoff - drm_vma_node_start(&bag.bo->base.vma_node); + if (unlikely(bag.page_offset >= PFN_UP(bag.bo->base.size))) return VM_FAULT_SIGBUS; - prot = ttm_io_prot(bo, bo->resource, prot); - if (!bo->resource->bus.is_iomem) { + prot = ttm_io_prot(bag.bo, bag.bo->resource, prot); + if (!bag.bo->resource->bus.is_iomem) { struct ttm_operation_ctx ctx = { .interruptible = true, .no_wait_gpu = false, .force_alloc = true }; - ttm = bo->ttm; - err = ttm_bo_populate(bo, &ctx); - if (err) { - if (err == -EINTR || err == -ERESTARTSYS || - err == -EAGAIN) - return VM_FAULT_NOPAGE; - - pr_debug("TTM fault hit %pe.\n", ERR_PTR(err)); - return VM_FAULT_SIGBUS; - } + bag.ttm = bag.bo->ttm; + err = ttm_bo_populate(bag.bo, &ctx); + if (err) + goto error; } else { /* Iomem should not be marked encrypted */ prot = pgprot_decrypted(prot); } + bag.prot = prot; - /* - * Speculatively prefault a number of pages. Only error on - * first page. - */ - for (i = 0; i < num_prefault; ++i) { - if (bo->resource->bus.is_iomem) { - pfn = ttm_bo_io_mem_pfn(bo, page_offset); - } else { - page = ttm->pages[page_offset]; - if (unlikely(!page && i == 0)) { - return VM_FAULT_OOM; - } else if (unlikely(!page)) { - break; - } - pfn = page_to_pfn(page); - } + /* Speculatively prefault a number of pages. */ + size = min(num_prefault << PAGE_SHIFT, vma->vm_end - vmf->address); + err = apply_to_page_range(vma->vm_mm, vmf->address, size, + ttm_bo_vm_apply_cb, &bag); - /* - * Note that the value of @prot at this point may differ from - * the value of @vma->vm_page_prot in the caching- and - * encryption bits. This is because the exact location of the - * data may not be known at mmap() time and may also change - * at arbitrary times while the data is mmap'ed. - * See vmf_insert_pfn_prot() for a discussion. - */ - ret = vmf_insert_pfn_prot(vma, address, pfn, prot); +error: + if (err == -EINTR || err == -ERESTARTSYS || err == -EAGAIN) + return VM_FAULT_NOPAGE; - /* Never error on prefaulted PTEs */ - if (unlikely((ret & VM_FAULT_ERROR))) { - if (i == 0) - return VM_FAULT_NOPAGE; - else - break; - } + if (err == -ENOMEM) + return VM_FAULT_OOM; - address += PAGE_SIZE; - if (unlikely(++page_offset >= page_last)) - break; + if (err) { + pr_debug("TTM fault hit %pe.\n", ERR_PTR(err)); + return VM_FAULT_SIGBUS; } - return ret; + + return VM_FAULT_NOPAGE; } EXPORT_SYMBOL(ttm_bo_vm_fault_reserved); -- 2.43.0 ^ permalink raw reply related [flat|nested] 50+ messages in thread
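For readers unfamiliar with the interface the patch switches to: the
callback contract of apply_to_page_range() is a single pte_fn_t invoked
once per PTE slot, with missing page-table levels allocated on demand.
A minimal standalone sketch of that contract (not driver code, just an
illustration against the current include/linux/mm.h signature):

	#include <linux/mm.h>

	/* pte_fn_t: int (*)(pte_t *pte, unsigned long addr, void *data) */
	static int fill_one_pte(pte_t *pte, unsigned long addr, void *data)
	{
		unsigned long *next_pfn = data;

		/* pte_mkspecial(): no struct page refcounting on this PTE */
		set_pte_at(current->mm, addr, pte,
			   pte_mkspecial(pfn_pte((*next_pfn)++, PAGE_READONLY)));
		return 0;	/* any non-zero return aborts the walk */
	}

	static int map_pfn_window(unsigned long addr, unsigned long size,
				  unsigned long first_pfn)
	{
		return apply_to_page_range(current->mm, addr, size,
					   fill_one_pte, &first_pfn);
	}

This is essentially what ttm_bo_vm_apply_cb() above does, except that the
PFN is looked up per page instead of running sequentially.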
* [PATCH 2/3] drm/ttm: reapply "increase ttm pre-fault value to PMD size"
  2025-08-20 14:33 Christian König
  2025-08-20 14:33 ` [PATCH 1/3] drm/ttm: use apply_page_range instead of vmf_insert_pfn_prot Christian König
@ 2025-08-20 14:33 ` Christian König
  2025-08-20 14:33 ` [PATCH 3/3] drm/ttm: disable changing the global caching flags on newer AMD CPUs v2 Christian König
  2025-08-20 15:23 ` David Hildenbrand
  3 siblings, 0 replies; 50+ messages in thread
From: Christian König @ 2025-08-20 14:33 UTC (permalink / raw)
  To: intel-xe, intel-gfx, dri-devel, amd-gfx, x86
  Cc: airlied, thomas.hellstrom, matthew.brost, david, dave.hansen, luto, peterz

Now that we have improved the handling, faulting in a full PMD only
increases the overhead on my test system from 21us to 29us when a single
page is requested, but massively improves performance for all other use
cases.

So re-apply that change to improve the fault handling for large
allocations, bringing us close to an overall improvement by a factor
of 10.

This reverts commit c358a809cb58af944d496944391a240e02f5837a.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 include/drm/ttm/ttm_bo.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/include/drm/ttm/ttm_bo.h b/include/drm/ttm/ttm_bo.h
index 903cd1030110..e96477606207 100644
--- a/include/drm/ttm/ttm_bo.h
+++ b/include/drm/ttm/ttm_bo.h
@@ -39,7 +39,11 @@
 #include "ttm_device.h"
 
 /* Default number of pre-faulted pages in the TTM fault handler */
+#if CONFIG_PGTABLE_LEVELS > 2
+#define TTM_BO_VM_NUM_PREFAULT (1 << (PMD_SHIFT - PAGE_SHIFT))
+#else
 #define TTM_BO_VM_NUM_PREFAULT 16
+#endif
 
 struct iosys_map;
-- 
2.43.0

^ permalink raw reply related	[flat|nested] 50+ messages in thread
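For scale, assuming x86-64 with 4 KiB pages (PMD_SHIFT == 21,
PAGE_SHIFT == 12), the new default works out to:

	TTM_BO_VM_NUM_PREFAULT = 1 << (21 - 12) = 512 pages = 2 MiB

i.e. one fault now fills a whole PMD-sized window instead of the previous
16 pages (64 KiB), which is what makes the single apply_to_page_range()
call in patch 1 pay off.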
* [PATCH 3/3] drm/ttm: disable changing the global caching flags on newer AMD CPUs v2
  2025-08-20 14:33 Christian König
  2025-08-20 14:33 ` [PATCH 1/3] drm/ttm: use apply_page_range instead of vmf_insert_pfn_prot Christian König
  2025-08-20 14:33 ` [PATCH 2/3] drm/ttm: reapply "increase ttm pre-fault value to PMD size" Christian König
@ 2025-08-20 14:33 ` Christian König
  2025-08-20 15:12 ` Borislav Petkov
  2025-08-20 15:23 ` David Hildenbrand
  3 siblings, 1 reply; 50+ messages in thread
From: Christian König @ 2025-08-20 14:33 UTC (permalink / raw)
  To: intel-xe, intel-gfx, dri-devel, amd-gfx, x86
  Cc: airlied, thomas.hellstrom, matthew.brost, david, dave.hansen, luto, peterz

On some x86 systems (old AMD Athlons, Intel Lunar Lake) we have the
problem that changing the caching flags of system memory requires
changing the global MTRR/PAT tables since those CPUs can't handle
aliasing caching attributes.

But on most modern x86 systems (e.g. AMD CPUs after 2004) we actually
don't need that any more and can update the caching flags directly in
the PTEs of the userspace and kernel mappings.

We have already been doing this with encryption on x86 64bit for quite a
while, and all other supported platforms (SPARC, PowerPC, ARM, MIPS,
LoongArch) as well as the i915 driver have never done anything different
either.

So stop changing the global caching flags on CPUs which don't need it and
just insert a clflush to be on the safe side, so that we never return
memory with dirty cache lines.

Testing on a Ryzen 5 and 7 shows that the clflush has absolutely no
performance impact, but I'm still waiting for CI systems to confirm
functional correctness.

v2: drop the pool only on AMD CPUs for now

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/ttm/ttm_pool.c | 37 +++++++++++++++++++++++++---------
 1 file changed, 28 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c
index 83b10706ba89..3f830fb2aea5 100644
--- a/drivers/gpu/drm/ttm/ttm_pool.c
+++ b/drivers/gpu/drm/ttm/ttm_pool.c
@@ -45,6 +45,7 @@
 #include <drm/ttm/ttm_pool.h>
 #include <drm/ttm/ttm_tt.h>
 #include <drm/ttm/ttm_bo.h>
+#include <drm/drm_cache.h>
 
 #include "ttm_module.h"
 
@@ -119,6 +120,8 @@ module_param(page_pool_size, ulong, 0644);
 
 static atomic_long_t allocated_pages;
 
+static bool skip_caching_adjustment;
+
 static struct ttm_pool_type global_write_combined[NR_PAGE_ORDERS];
 static struct ttm_pool_type global_uncached[NR_PAGE_ORDERS];
 
@@ -195,7 +198,8 @@ static void ttm_pool_free_page(struct ttm_pool *pool, enum ttm_caching caching,
 	/* We don't care that set_pages_wb is inefficient here. This is only
	 * used when we have to shrink and CPU overhead is irrelevant then.
*/ - if (caching != ttm_cached && !PageHighMem(p)) + if (!skip_caching_adjustment && + caching != ttm_cached && !PageHighMem(p)) set_pages_wb(p, 1 << order); #endif @@ -223,13 +227,19 @@ static int ttm_pool_apply_caching(struct ttm_pool_alloc_state *alloc) if (!num_pages) return 0; - switch (alloc->tt_caching) { - case ttm_cached: - break; - case ttm_write_combined: - return set_pages_array_wc(alloc->caching_divide, num_pages); - case ttm_uncached: - return set_pages_array_uc(alloc->caching_divide, num_pages); + if (skip_caching_adjustment) { + drm_clflush_pages(alloc->caching_divide, num_pages); + } else { + switch (alloc->tt_caching) { + case ttm_cached: + break; + case ttm_write_combined: + return set_pages_array_wc(alloc->caching_divide, + num_pages); + case ttm_uncached: + return set_pages_array_uc(alloc->caching_divide, + num_pages); + } } #endif alloc->caching_divide = alloc->pages; @@ -342,6 +352,9 @@ static struct ttm_pool_type *ttm_pool_select_type(struct ttm_pool *pool, return &pool->caching[caching].orders[order]; #ifdef CONFIG_X86 + if (skip_caching_adjustment) + return NULL; + switch (caching) { case ttm_write_combined: if (pool->nid != NUMA_NO_NODE) @@ -981,7 +994,7 @@ long ttm_pool_backup(struct ttm_pool *pool, struct ttm_tt *tt, #ifdef CONFIG_X86 /* Anything returned to the system needs to be cached. */ - if (tt->caching != ttm_cached) + if (!skip_caching_adjustment && tt->caching != ttm_cached) set_pages_array_wb(tt->pages, tt->num_pages); #endif @@ -1296,6 +1309,12 @@ int ttm_pool_mgr_init(unsigned long num_pages) spin_lock_init(&shrinker_lock); INIT_LIST_HEAD(&shrinker_list); +#ifdef CONFIG_X86 + skip_caching_adjustment = + (boot_cpu_data.x86_vendor == X86_VENDOR_AMD) && + static_cpu_has(X86_FEATURE_CLFLUSH); +#endif + for (i = 0; i < NR_PAGE_ORDERS; ++i) { ttm_pool_type_init(&global_write_combined[i], NULL, ttm_write_combined, i); -- 2.43.0 ^ permalink raw reply related [flat|nested] 50+ messages in thread
* Re: [PATCH 3/3] drm/ttm: disable changing the global caching flags on newer AMD CPUs v2
  2025-08-20 14:33 ` [PATCH 3/3] drm/ttm: disable changing the global caching flags on newer AMD CPUs v2 Christian König
@ 2025-08-20 15:12 ` Borislav Petkov
  0 siblings, 0 replies; 50+ messages in thread
From: Borislav Petkov @ 2025-08-20 15:12 UTC (permalink / raw)
  To: Christian König
  Cc: intel-xe, intel-gfx, dri-devel, amd-gfx, x86, airlied, thomas.hellstrom, matthew.brost, david, dave.hansen, luto, peterz

On Wed, Aug 20, 2025 at 04:33:13PM +0200, Christian König wrote:
> +#ifdef CONFIG_X86
> +	skip_caching_adjustment =
> +		(boot_cpu_data.x86_vendor == X86_VENDOR_AMD) &&
> +		static_cpu_has(X86_FEATURE_CLFLUSH);

cpu_feature_enabled()

> +#endif

I'd prefer it if this called a function in arch/x86/ that tests those,
instead of using x86 artifacts in drivers.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 50+ messages in thread
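Presumably something along these lines is meant; this is only a sketch,
and the function name and file placement are invented here, not an
existing kernel API:

	/* arch/x86/kernel/cpu/amd.c (hypothetical) */
	bool x86_can_alias_caching_attrs(void)
	{
		/*
		 * Same condition as the driver-side check above, but kept
		 * inside arch/x86/ and using cpu_feature_enabled() as
		 * suggested.
		 */
		return boot_cpu_data.x86_vendor == X86_VENDOR_AMD &&
		       cpu_feature_enabled(X86_FEATURE_CLFLUSH);
	}
	EXPORT_SYMBOL_GPL(x86_can_alias_caching_attrs);

TTM would then call such a helper instead of poking at boot_cpu_data
directly.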
* Re:
  2025-08-20 14:33 Christian König
  ` (2 preceding siblings ...)
  2025-08-20 14:33 ` [PATCH 3/3] drm/ttm: disable changing the global caching flags on newer AMD CPUs v2 Christian König
@ 2025-08-20 15:23 ` David Hildenbrand
  2025-08-21 8:10   ` Re: Christian König
  2025-08-21 9:16   ` your mail Lorenzo Stoakes
  3 siblings, 2 replies; 50+ messages in thread
From: David Hildenbrand @ 2025-08-20 15:23 UTC (permalink / raw)
  To: Christian König, intel-xe, intel-gfx, dri-devel, amd-gfx, x86
  Cc: airlied, thomas.hellstrom, matthew.brost, dave.hansen, luto, peterz, Lorenzo Stoakes

CCing Lorenzo

On 20.08.25 16:33, Christian König wrote:
> Hi everyone,
>
> sorry for CCing so many people, but that rabbit hole turned out to be
> deeper than originally thought.
>
> TTM always had problems with UC/WC mappings on 32bit systems and drivers
> often had to revert to hacks like using GFP_DMA32 to get things working
> while having no rational explanation why that helped (see the TTM AGP,
> radeon and nouveau driver code for that).
>
> It turned out that the PAT implementation we use on x86 not only enforces
> the same caching attributes for pages in the linear kernel mapping, but
> also for highmem pages through a separate R/B tree.
>
> That was unexpected and TTM never updated that R/B tree for highmem pages,
> so the function pgprot_set_cachemode() just overwrote the caching
> attributes drivers passed in to vmf_insert_pfn_prot() and that essentially
> caused all kind of random trouble.
>
> An R/B tree is potentially not a good data structure to hold thousands if
> not millions of different attributes for each page, so updating that is
> probably not the way to solve this issue.
>
> Thomas pointed out that the i915 driver is using apply_page_range()
> instead of vmf_insert_pfn_prot() to circumvent the PAT implementation and
> just fill in the page tables with what the driver things is the right
> caching attribute.

I assume you mean apply_to_page_range() -- same issue in patch subjects.

Oh this sounds horrible. Why oh why do we have these hacks in core-mm and have drivers abuse them :(

Honestly, apply_to_pte_range() is just the entry point for doing all kinds of weird crap to page tables because "you know better".

All the sanity checks from vmf_insert_pfn(), gone.

Can we please fix the underlying issue properly?

-- 
Cheers

David / dhildenb

^ permalink raw reply	[flat|nested] 50+ messages in thread
* Re:
  2025-08-20 15:23 ` David Hildenbrand
@ 2025-08-21 8:10   ` Christian König
  2025-08-25 19:10     ` Re: David Hildenbrand
  2025-08-21 9:16   ` your mail Lorenzo Stoakes
  1 sibling, 1 reply; 50+ messages in thread
From: Christian König @ 2025-08-21 8:10 UTC (permalink / raw)
  To: David Hildenbrand, intel-xe, intel-gfx, dri-devel, amd-gfx, x86
  Cc: airlied, thomas.hellstrom, matthew.brost, dave.hansen, luto, peterz, Lorenzo Stoakes

On 20.08.25 17:23, David Hildenbrand wrote:
> CCing Lorenzo
>
> On 20.08.25 16:33, Christian König wrote:
>> Hi everyone,
>>
>> sorry for CCing so many people, but that rabbit hole turned out to be
>> deeper than originally thought.
>>
>> TTM always had problems with UC/WC mappings on 32bit systems and drivers
>> often had to revert to hacks like using GFP_DMA32 to get things working
>> while having no rational explanation why that helped (see the TTM AGP,
>> radeon and nouveau driver code for that).
>>
>> It turned out that the PAT implementation we use on x86 not only enforces
>> the same caching attributes for pages in the linear kernel mapping, but
>> also for highmem pages through a separate R/B tree.
>>
>> That was unexpected and TTM never updated that R/B tree for highmem pages,
>> so the function pgprot_set_cachemode() just overwrote the caching
>> attributes drivers passed in to vmf_insert_pfn_prot() and that essentially
>> caused all kind of random trouble.
>>
>> An R/B tree is potentially not a good data structure to hold thousands if
>> not millions of different attributes for each page, so updating that is
>> probably not the way to solve this issue.
>>
>> Thomas pointed out that the i915 driver is using apply_page_range()
>> instead of vmf_insert_pfn_prot() to circumvent the PAT implementation and
>> just fill in the page tables with what the driver things is the right
>> caching attribute.
>
> I assume you mean apply_to_page_range() -- same issue in patch subjects.

Oh yes, of course. Sorry.

> Oh this sounds horrible. Why oh why do we have these hacks in core-mm and have drivers abuse them :(

Yeah, I was also a bit hesitant to use that, but the performance advantage is so high that we probably can't avoid the general approach.

> Honestly, apply_to_pte_range() is just the entry in doing all kinds of weird crap to page tables because "you know better".

That's exactly the problem I'm pointing out: drivers *do* know it better. The core memory management has applied incorrect values, which caused all kinds of trouble.

The problem is not a bug in PAT nor TTM/drivers but rather how they interact with each other.

What I don't understand is why we have the PAT in the first place? No other architecture does it this way.

Is that because of the x86 CPUs which have problems when different page tables contain different caching attributes for the same physical memory?

> All the sanity checks from vmf_insert_pfn(), gone.
>
> Can we please fix the underlying issue properly?

I'm happy to implement anything advised; my question is how should we solve this issue?

Regards,
Christian.

^ permalink raw reply	[flat|nested] 50+ messages in thread
* Re: 2025-08-21 8:10 ` Re: Christian König @ 2025-08-25 19:10 ` David Hildenbrand 2025-08-26 8:38 ` Re: Christian König 0 siblings, 1 reply; 50+ messages in thread From: David Hildenbrand @ 2025-08-25 19:10 UTC (permalink / raw) To: Christian König, intel-xe, intel-gfx, dri-devel, amd-gfx, x86 Cc: airlied, thomas.hellstrom, matthew.brost, dave.hansen, luto, peterz, Lorenzo Stoakes On 21.08.25 10:10, Christian König wrote: > On 20.08.25 17:23, David Hildenbrand wrote: >> CCing Lorenzo >> >> On 20.08.25 16:33, Christian König wrote: >>> Hi everyone, >>> >>> sorry for CCing so many people, but that rabbit hole turned out to be >>> deeper than originally thought. >>> >>> TTM always had problems with UC/WC mappings on 32bit systems and drivers >>> often had to revert to hacks like using GFP_DMA32 to get things working >>> while having no rational explanation why that helped (see the TTM AGP, >>> radeon and nouveau driver code for that). >>> >>> It turned out that the PAT implementation we use on x86 not only enforces >>> the same caching attributes for pages in the linear kernel mapping, but >>> also for highmem pages through a separate R/B tree. >>> >>> That was unexpected and TTM never updated that R/B tree for highmem pages, >>> so the function pgprot_set_cachemode() just overwrote the caching >>> attributes drivers passed in to vmf_insert_pfn_prot() and that essentially >>> caused all kind of random trouble. >>> >>> An R/B tree is potentially not a good data structure to hold thousands if >>> not millions of different attributes for each page, so updating that is >>> probably not the way to solve this issue. >>> >>> Thomas pointed out that the i915 driver is using apply_page_range() >>> instead of vmf_insert_pfn_prot() to circumvent the PAT implementation and >>> just fill in the page tables with what the driver things is the right >>> caching attribute. >> >> I assume you mean apply_to_page_range() -- same issue in patch subjects. > > Oh yes, of course. Sorry. > >> Oh this sounds horrible. Why oh why do we have these hacks in core-mm and have drivers abuse them :( > > Yeah I was also a bit hesitated to use that, but the performance advantage is so high that we probably can't avoid the general approach. > >> Honestly, apply_to_pte_range() is just the entry in doing all kinds of weird crap to page tables because "you know better". > > Exactly that's the problem I'm pointing out, drivers *do* know it better. The core memory management has applied incorrect values which caused all kind of the trouble. > > The problem is not a bug in PAT nor TTM/drivers but rather how they interact with each other. > > What I don't understand is why do we have the PAT in the first place? No other architecture does it this way. Probably because no other architecture has these weird glitches I assume ... skimming over memtype_reserve() and friends there are quite some corner cases the code is handling (BIOS, ACPI, low ISA, system RAM, ...) I did a lot of work on the higher PAT level functions, but I am no expert on the lower level management functions, and in particular all the special cases with different memory types. IIRC, the goal of the PAT subsystem is to make sure that no two page tables map the same PFN with different caching attributes. It treats ordinary system RAM (IORESOURCE_SYSTEM_RAM) usually in a special way: no special caching mode. For everything else, it expects that someone first reserves a memory range for a specific caching mode. 
For example, remap_pfn_range()...->pfnmap_track()->memtype_reserve() will make sure that there are no conflicts, and then call memtype_kernel_map_sync() to make sure the identity mapping is updated to the new type.

In case someone ends up calling pfnmap_setup_cachemode(), the expectation is that there was a previous call to memtype_reserve_io() or similar, such that pfnmap_setup_cachemode() will find that caching mode.

So my assumption would be that that is missing for the drivers here?

Last time I asked where this reservation is done, Peter Xu explained [1] it at least for VFIO:

vfio_pci_core_mmap
 pci_iomap
  pci_iomap_range
   ...
    __ioremap_caller
     memtype_reserve

Now, could it be that something like that is missing in these drivers (ioremap etc)?

[1] https://lkml.kernel.org/r/aBDXr-Qp4z0tS50P@x1.local

>
> Is that because of the of x86 CPUs which have problems when different page tables contain different caching attributes for the same physical memory?

Yes, but I don't think x86 is special here.

-- 
Cheers

David / dhildenb

^ permalink raw reply	[flat|nested] 50+ messages in thread
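For the iomem side there is an exported interface that does this kind of
up-front reservation, and some DRM drivers already call it for their VRAM
BARs. A minimal sketch, assuming a PCI BAR that should be mapped
write-combined (error handling trimmed):

	#include <linux/io.h>
	#include <linux/pci.h>

	static int reserve_vram_wc(struct pci_dev *pdev, int bar)
	{
		resource_size_t base = pci_resource_start(pdev, bar);
		resource_size_t size = pci_resource_len(pdev, bar);

		/*
		 * Registers [base, base+size) as WC in PAT's memtype tree
		 * (memtype_reserve() under the hood on x86), so a later
		 * pfnmap_setup_cachemode() lookup finds WC instead of the
		 * default.  Undo with arch_io_free_memtype_wc().
		 */
		return arch_io_reserve_memtype_wc(base, size);
	}

The open question in this thread is what the equivalent would be for
scattered system memory pages, where a range-based reservation like this
does not fit.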
* Re: 2025-08-25 19:10 ` Re: David Hildenbrand @ 2025-08-26 8:38 ` Christian König 2025-08-26 8:46 ` Re: David Hildenbrand 2025-08-26 12:37 ` David Hildenbrand 0 siblings, 2 replies; 50+ messages in thread From: Christian König @ 2025-08-26 8:38 UTC (permalink / raw) To: David Hildenbrand, intel-xe, intel-gfx, dri-devel, amd-gfx, x86 Cc: airlied, thomas.hellstrom, matthew.brost, dave.hansen, luto, peterz, Lorenzo Stoakes On 25.08.25 21:10, David Hildenbrand wrote: > On 21.08.25 10:10, Christian König wrote: >> On 20.08.25 17:23, David Hildenbrand wrote: >>> CCing Lorenzo >>> >>> On 20.08.25 16:33, Christian König wrote: >>>> Hi everyone, >>>> >>>> sorry for CCing so many people, but that rabbit hole turned out to be >>>> deeper than originally thought. >>>> >>>> TTM always had problems with UC/WC mappings on 32bit systems and drivers >>>> often had to revert to hacks like using GFP_DMA32 to get things working >>>> while having no rational explanation why that helped (see the TTM AGP, >>>> radeon and nouveau driver code for that). >>>> >>>> It turned out that the PAT implementation we use on x86 not only enforces >>>> the same caching attributes for pages in the linear kernel mapping, but >>>> also for highmem pages through a separate R/B tree. >>>> >>>> That was unexpected and TTM never updated that R/B tree for highmem pages, >>>> so the function pgprot_set_cachemode() just overwrote the caching >>>> attributes drivers passed in to vmf_insert_pfn_prot() and that essentially >>>> caused all kind of random trouble. >>>> >>>> An R/B tree is potentially not a good data structure to hold thousands if >>>> not millions of different attributes for each page, so updating that is >>>> probably not the way to solve this issue. >>>> >>>> Thomas pointed out that the i915 driver is using apply_page_range() >>>> instead of vmf_insert_pfn_prot() to circumvent the PAT implementation and >>>> just fill in the page tables with what the driver things is the right >>>> caching attribute. >>> >>> I assume you mean apply_to_page_range() -- same issue in patch subjects. >> >> Oh yes, of course. Sorry. >> >>> Oh this sounds horrible. Why oh why do we have these hacks in core-mm and have drivers abuse them :( >> >> Yeah I was also a bit hesitated to use that, but the performance advantage is so high that we probably can't avoid the general approach. >> >>> Honestly, apply_to_pte_range() is just the entry in doing all kinds of weird crap to page tables because "you know better". >> >> Exactly that's the problem I'm pointing out, drivers *do* know it better. The core memory management has applied incorrect values which caused all kind of the trouble. >> >> The problem is not a bug in PAT nor TTM/drivers but rather how they interact with each other. >> >> What I don't understand is why do we have the PAT in the first place? No other architecture does it this way. > > Probably because no other architecture has these weird glitches I assume ... skimming over memtype_reserve() and friends there are quite some corner cases the code is handling (BIOS, ACPI, low ISA, system RAM, ...) > > > I did a lot of work on the higher PAT level functions, but I am no expert on the lower level management functions, and in particular all the special cases with different memory types. > > IIRC, the goal of the PAT subsystem is to make sure that no two page tables map the same PFN with different caching attributes. Yeah, that actually makes sense. 
Thomas from Intel recently explained the technical background to me:

Some x86 CPUs write back cache lines even if they aren't dirty, and what can happen is that, because of the linear mapping, the CPU speculatively loads a cache line which is elsewhere mapped uncached.

So the end result is that the writeback of non-dirty cache lines potentially corrupts the data in the otherwise uncached system memory.

But that a) only applies to memory in the linear mapping and b) only to a handful of x86 CPU types (e.g. recently Intel's Lunar Lake, AMD Athlons produced before 2004, maybe others).

> It treats ordinary system RAM (IORESOURCE_SYSTEM_RAM) usually in a special way: no special caching mode.
>
> For everything else, it expects that someone first reserves a memory range for a specific caching mode.
>
> For example, remap_pfn_range()...->pfnmap_track()->memtype_reserve() will make sure that there are no conflicts, to the call memtype_kernel_map_sync() to make sure the identity mapping is updated to the new type.
>
> In case someone ends up calling pfnmap_setup_cachemode(), the expectation is that there was a previous call to memtype_reserve_io() or similar, such that pfnmap_setup_cachemode() will find that caching mode.
>
> So my assumption would be that that is missing for the drivers here?

Well yes and no.

See, the PAT is optimized for applying specific caching attributes to ranges [A..B] (e.g. it uses an R/B tree). But what drivers do here is that they have single pages (usually from get_free_page() or similar) and want to apply a certain caching attribute to them.

So what would happen is that we completely clutter the R/B tree used by the PAT with thousands if not millions of entries.

> Last time I asked where this reservation is done, Peter Xu explained [1] it at least for VFIO:
>
> vfio_pci_core_mmap
>  pci_iomap
>   pci_iomap_range
>    ...
>     __ioremap_caller
>      memtype_reserve
>
> Now, could it be that something like that is missing in these drivers (ioremap etc)?

Well, that would solve the issue temporarily, but I'm pretty sure it will just go boom in a different place then :(

One possibility would be to say that the PAT only overrides the attributes if they aren't normal cached (WB) and leaves everything else alone.

What do you think?

Thanks,
Christian.

> [1] https://lkml.kernel.org/r/aBDXr-Qp4z0tS50P@x1.local
>
>> Is that because of the of x86 CPUs which have problems when different page tables contain different caching attributes for the same physical memory?
>
> Yes, but I don't think x86 is special here.

^ permalink raw reply	[flat|nested] 50+ messages in thread
* Re: 2025-08-26 8:38 ` Re: Christian König @ 2025-08-26 8:46 ` David Hildenbrand 2025-08-26 9:00 ` Re: Christian König 2025-08-26 12:37 ` David Hildenbrand 1 sibling, 1 reply; 50+ messages in thread From: David Hildenbrand @ 2025-08-26 8:46 UTC (permalink / raw) To: Christian König, intel-xe, intel-gfx, dri-devel, amd-gfx, x86 Cc: airlied, thomas.hellstrom, matthew.brost, dave.hansen, luto, peterz, Lorenzo Stoakes On 26.08.25 10:38, Christian König wrote: > On 25.08.25 21:10, David Hildenbrand wrote: >> On 21.08.25 10:10, Christian König wrote: >>> On 20.08.25 17:23, David Hildenbrand wrote: >>>> CCing Lorenzo >>>> >>>> On 20.08.25 16:33, Christian König wrote: >>>>> Hi everyone, >>>>> >>>>> sorry for CCing so many people, but that rabbit hole turned out to be >>>>> deeper than originally thought. >>>>> >>>>> TTM always had problems with UC/WC mappings on 32bit systems and drivers >>>>> often had to revert to hacks like using GFP_DMA32 to get things working >>>>> while having no rational explanation why that helped (see the TTM AGP, >>>>> radeon and nouveau driver code for that). >>>>> >>>>> It turned out that the PAT implementation we use on x86 not only enforces >>>>> the same caching attributes for pages in the linear kernel mapping, but >>>>> also for highmem pages through a separate R/B tree. >>>>> >>>>> That was unexpected and TTM never updated that R/B tree for highmem pages, >>>>> so the function pgprot_set_cachemode() just overwrote the caching >>>>> attributes drivers passed in to vmf_insert_pfn_prot() and that essentially >>>>> caused all kind of random trouble. >>>>> >>>>> An R/B tree is potentially not a good data structure to hold thousands if >>>>> not millions of different attributes for each page, so updating that is >>>>> probably not the way to solve this issue. >>>>> >>>>> Thomas pointed out that the i915 driver is using apply_page_range() >>>>> instead of vmf_insert_pfn_prot() to circumvent the PAT implementation and >>>>> just fill in the page tables with what the driver things is the right >>>>> caching attribute. >>>> >>>> I assume you mean apply_to_page_range() -- same issue in patch subjects. >>> >>> Oh yes, of course. Sorry. >>> >>>> Oh this sounds horrible. Why oh why do we have these hacks in core-mm and have drivers abuse them :( >>> >>> Yeah I was also a bit hesitated to use that, but the performance advantage is so high that we probably can't avoid the general approach. >>> >>>> Honestly, apply_to_pte_range() is just the entry in doing all kinds of weird crap to page tables because "you know better". >>> >>> Exactly that's the problem I'm pointing out, drivers *do* know it better. The core memory management has applied incorrect values which caused all kind of the trouble. >>> >>> The problem is not a bug in PAT nor TTM/drivers but rather how they interact with each other. >>> >>> What I don't understand is why do we have the PAT in the first place? No other architecture does it this way. >> >> Probably because no other architecture has these weird glitches I assume ... skimming over memtype_reserve() and friends there are quite some corner cases the code is handling (BIOS, ACPI, low ISA, system RAM, ...) >> >> >> I did a lot of work on the higher PAT level functions, but I am no expert on the lower level management functions, and in particular all the special cases with different memory types. >> >> IIRC, the goal of the PAT subsystem is to make sure that no two page tables map the same PFN with different caching attributes. 
> Yeah, that actually makes sense. Thomas from Intel recently explained the technical background to me:
>
> Some x86 CPUs write back cache lines even if they aren't dirty and what can happen is that because of the linear mapping the CPU speculatively loads a cache line which is elsewhere mapped uncached.
>
> So the end result is that the writeback of not dirty cache lines potentially corrupts the data in the otherwise uncached system memory.
>
> But that a) only applies to memory in the linear mapping and b) only to a handful of x86 CPU types (e.g. recently Intels Luna Lake, AMD Athlons produced before 2004, maybe others).
>
>> It treats ordinary system RAM (IORESOURCE_SYSTEM_RAM) usually in a special way: no special caching mode.
>>
>> For everything else, it expects that someone first reserves a memory range for a specific caching mode.
>>
>> For example, remap_pfn_range()...->pfnmap_track()->memtype_reserve() will make sure that there are no conflicts, to the call memtype_kernel_map_sync() to make sure the identity mapping is updated to the new type.
>>
>> In case someone ends up calling pfnmap_setup_cachemode(), the expectation is that there was a previous call to memtype_reserve_io() or similar, such that pfnmap_setup_cachemode() will find that caching mode.
>>
>> So my assumption would be that that is missing for the drivers here?
>
> Well yes and no.
>
> See the PAT is optimized for applying specific caching attributes to ranges [A..B] (e.g. it uses an R/B tree). But what drivers do here is that they have single pages (usually for get_free_page or similar) and want to apply a certain caching attribute to it.
>
> So what would happen is that we completely clutter the R/B tree used by the PAT with thousands if not millions of entries.

Hm, above you're saying that there is no direct map, but now you are saying that the pages were obtained through get_free_page()?

I agree that what you describe here sounds suboptimal. But if the pages were obtained from the buddy, there surely is a direct map -- unless we explicitly remove it :(

If we're talking about individual pages without a directmap, I would wonder if they are actually part of a bigger memory region that can just be reserved in one go (similar to how remap_pfn_range() would handle it).

Can you briefly describe how your use case obtains these PFNs, and how scattered they + their caching attributes might be?

-- 
Cheers

David / dhildenb

^ permalink raw reply	[flat|nested] 50+ messages in thread
* Re:
  2025-08-26 8:46 ` Re: David Hildenbrand
@ 2025-08-26 9:00   ` Christian König
  2025-08-26 9:17     ` Re: David Hildenbrand
  0 siblings, 1 reply; 50+ messages in thread
From: Christian König @ 2025-08-26 9:00 UTC (permalink / raw)
  To: David Hildenbrand, intel-xe, intel-gfx, dri-devel, amd-gfx, x86
  Cc: airlied, thomas.hellstrom, matthew.brost, dave.hansen, luto, peterz, Lorenzo Stoakes

On 26.08.25 10:46, David Hildenbrand wrote:
>>>> So my assumption would be that that is missing for the drivers here?
>>>
>>> Well yes and no.
>>>
>>> See the PAT is optimized for applying specific caching attributes to ranges [A..B] (e.g. it uses an R/B tree). But what drivers do here is that they have single pages (usually for get_free_page or similar) and want to apply a certain caching attribute to it.
>>>
>>> So what would happen is that we completely clutter the R/B tree used by the PAT with thousands if not millions of entries.
>
> Hm, above you're saying that there is no direct map, but now you are saying that the pages were obtained through get_free_page()?

The problem only happens with highmem pages on 32bit kernels. Those pages are not in the linear mapping.

> I agree that what you describe here sounds suboptimal. But if the pages where obtained from the buddy, there surely is a direct map -- unless we explicitly remove it :(
>
> If we're talking about individual pages without a directmap, I would wonder if they are actually part of a bigger memory region that can just be reserved in one go (similar to how remap_pfn_range()) would handle it.
>
> Can you briefly describe how your use case obtains these PFNs, and how scattered tehy + their caching attributes might be?

What drivers do is to call get_free_page() or alloc_pages_node() with the GFP_HIGHUSER flag set.

For non-highmem pages drivers then call set_pages_wc/uc(), which changes the caching of the linear mapping, but for highmem pages there is no linear mapping, so set_pages_wc() or set_pages_uc() don't work and drivers avoid calling them.

Those are basically just random system memory pages, so they are potentially scattered over the whole memory address space.

Regards,
Christian.

^ permalink raw reply	[flat|nested] 50+ messages in thread
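A minimal sketch of that allocation pattern, roughly what the TTM page
pool does per page (simplified here, error paths omitted):

	#include <linux/gfp.h>
	#include <linux/highmem.h>
	#include <asm/set_memory.h>

	static struct page *alloc_wc_page(void)
	{
		struct page *p = alloc_page(GFP_HIGHUSER);

		if (!p)
			return NULL;

		/*
		 * Only pages with a kernel linear mapping can be fixed up;
		 * highmem pages are skipped -- exactly the case where PAT's
		 * R/B tree and the driver's view of the caching mode start
		 * to disagree.
		 */
		if (!PageHighMem(p))
			set_pages_array_wc(&p, 1);	/* or set_pages_uc(p, 1) */

		return p;
	}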
* Re: 2025-08-26 9:00 ` Re: Christian König @ 2025-08-26 9:17 ` David Hildenbrand 2025-08-26 9:56 ` Re: Christian König 0 siblings, 1 reply; 50+ messages in thread From: David Hildenbrand @ 2025-08-26 9:17 UTC (permalink / raw) To: Christian König, intel-xe, intel-gfx, dri-devel, amd-gfx, x86 Cc: airlied, thomas.hellstrom, matthew.brost, dave.hansen, luto, peterz, Lorenzo Stoakes On 26.08.25 11:00, Christian König wrote: > On 26.08.25 10:46, David Hildenbrand wrote: >>>> So my assumption would be that that is missing for the drivers here? >>> >>> Well yes and no. >>> >>> See the PAT is optimized for applying specific caching attributes to ranges [A..B] (e.g. it uses an R/B tree). But what drivers do here is that they have single pages (usually for get_free_page or similar) and want to apply a certain caching attribute to it. >>> >>> So what would happen is that we completely clutter the R/B tree used by the PAT with thousands if not millions of entries. >>> >> >> Hm, above you're saying that there is no direct map, but now you are saying that the pages were obtained through get_free_page()? > > The problem only happens with highmem pages on 32bit kernels. Those pages are not in the linear mapping. Right, in the common case there is a direct map. > >> I agree that what you describe here sounds suboptimal. But if the pages where obtained from the buddy, there surely is a direct map -- unless we explicitly remove it :( >> >> If we're talking about individual pages without a directmap, I would wonder if they are actually part of a bigger memory region that can just be reserved in one go (similar to how remap_pfn_range()) would handle it. >> >> Can you briefly describe how your use case obtains these PFNs, and how scattered tehy + their caching attributes might be? > > What drivers do is to call get_free_page() or alloc_pages_node() with the GFP_HIGHUSER flag set. > > For non highmem pages drivers then calls set_pages_wc/uc() which changes the caching of the linear mapping, but for highmem pages there is no linear mapping so set_pages_wc() or set_pages_uc() doesn't work and drivers avoid calling it. > > Those are basically just random system memory pages. So they are potentially scattered over the whole memory address space. Thanks, that's valuable information. So essentially these drivers maintain their own consistency and PAT is not aware of that. And the real problem is ordinary system RAM. There are various ways forward. 1) We use another interface that consumes pages instead of PFNs, like a vm_insert_pages_pgprot() we would be adding. Is there any strong requirement for inserting non-refcounted PFNs? 2) We add another interface that consumes PFNs, but explicitly states that it is only for ordinary system RAM, and that the user is required for updating the direct map. We could sanity-check the direct map in debug kernels. 3) We teach PAT code in pfnmap_setup_cachemode_pfn() about treating this system RAM differently. There is also the option for a mixture between 1 and 2, where we get pages, but we map them non-refcounted in a VM_PFNMAP. In general, having pages makes it easier to assert that they are likely ordinary system ram pages, and that the interface is not getting abused for something else. We could also perform the set_pages_wc/uc() from inside that function, but maybe it depends on the use case whether we want to do that whenever we map them into a process? -- Cheers David / dhildenb ^ permalink raw reply [flat|nested] 50+ messages in thread
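For option 1, the interface could look roughly like the sketch below;
this is purely an illustration of the proposal, no such function exists
today:

	/*
	 * Hypothetical: insert already-allocated pages with an explicit
	 * pgprot.  Taking struct page (instead of bare PFNs) would let the
	 * core assert these are ordinary system RAM pages and, on debug
	 * kernels, verify the direct map was fixed up; mapping them
	 * non-refcounted under VM_PFNMAP would give the 1+2 mixture
	 * mentioned above.
	 */
	int vm_insert_pages_pgprot(struct vm_area_struct *vma,
				   unsigned long addr, struct page **pages,
				   unsigned long num, pgprot_t prot);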
* Re: 2025-08-26 9:17 ` Re: David Hildenbrand @ 2025-08-26 9:56 ` Christian König 2025-08-26 12:07 ` Re: David Hildenbrand 2025-08-26 14:27 ` Thomas Hellström 0 siblings, 2 replies; 50+ messages in thread From: Christian König @ 2025-08-26 9:56 UTC (permalink / raw) To: David Hildenbrand, intel-xe, intel-gfx, dri-devel, amd-gfx, x86 Cc: airlied, thomas.hellstrom, matthew.brost, dave.hansen, luto, peterz, Lorenzo Stoakes On 26.08.25 11:17, David Hildenbrand wrote: > On 26.08.25 11:00, Christian König wrote: >> On 26.08.25 10:46, David Hildenbrand wrote: >>>>> So my assumption would be that that is missing for the drivers here? >>>> >>>> Well yes and no. >>>> >>>> See the PAT is optimized for applying specific caching attributes to ranges [A..B] (e.g. it uses an R/B tree). But what drivers do here is that they have single pages (usually for get_free_page or similar) and want to apply a certain caching attribute to it. >>>> >>>> So what would happen is that we completely clutter the R/B tree used by the PAT with thousands if not millions of entries. >>>> >>> >>> Hm, above you're saying that there is no direct map, but now you are saying that the pages were obtained through get_free_page()? >> >> The problem only happens with highmem pages on 32bit kernels. Those pages are not in the linear mapping. > > Right, in the common case there is a direct map. > >> >>> I agree that what you describe here sounds suboptimal. But if the pages where obtained from the buddy, there surely is a direct map -- unless we explicitly remove it :( >>> >>> If we're talking about individual pages without a directmap, I would wonder if they are actually part of a bigger memory region that can just be reserved in one go (similar to how remap_pfn_range()) would handle it. >>> >>> Can you briefly describe how your use case obtains these PFNs, and how scattered tehy + their caching attributes might be? >> >> What drivers do is to call get_free_page() or alloc_pages_node() with the GFP_HIGHUSER flag set. >> >> For non highmem pages drivers then calls set_pages_wc/uc() which changes the caching of the linear mapping, but for highmem pages there is no linear mapping so set_pages_wc() or set_pages_uc() doesn't work and drivers avoid calling it. >> >> Those are basically just random system memory pages. So they are potentially scattered over the whole memory address space. > > Thanks, that's valuable information. > > So essentially these drivers maintain their own consistency and PAT is not aware of that. > > And the real problem is ordinary system RAM. > > There are various ways forward. > > 1) We use another interface that consumes pages instead of PFNs, like a > vm_insert_pages_pgprot() we would be adding. > > Is there any strong requirement for inserting non-refcounted PFNs? Yes, there is a strong requirement to insert non-refcounted PFNs. We had a lot of trouble with KVM people trying to grab a reference to those pages even if the VMA had the VM_PFNMAP flag set. > 2) We add another interface that consumes PFNs, but explicitly states > that it is only for ordinary system RAM, and that the user is > required for updating the direct map. > > We could sanity-check the direct map in debug kernels. I would rather like to see vmf_insert_pfn_prot() fixed instead. That function was explicitly added to insert the PFN with the given attributes and as far as I can see all users of that function expect exactly that. > > 3) We teach PAT code in pfnmap_setup_cachemode_pfn() about treating this > system RAM differently. 
>
> There is also the option for a mixture between 1 and 2, where we get pages, but we map them non-refcounted in a VM_PFNMAP.
>
> In general, having pages makes it easier to assert that they are likely ordinary system ram pages, and that the interface is not getting abused for something else.

Well, that's exactly the use case here, and it is not abusive at all as far as I can see.

What drivers want is to insert a PFN with a certain set of caching attributes, regardless of whether it's system memory or iomem. That's why vmf_insert_pfn_prot() was created in the first place.

That drivers need to manually call set_pages_wc/uc() for the linear mapping on x86 is correct, and checking that is clearly a good idea for debug kernels.

> We could also perform the set_pages_wc/uc() from inside that function, but maybe it depends on the use case whether we want to do that whenever we map them into a process?

It sounds like a good idea in theory, but I think it is potentially too much overhead to be practical.

Thanks,
Christian.

^ permalink raw reply	[flat|nested] 50+ messages in thread
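One way such a debug-kernel check could look on x86; a rough sketch only,
the helper name is invented here and the details would need care:

	#include <linux/mm.h>
	#include <linux/highmem.h>

	/* Hypothetical: warn if the linear-map PTE disagrees with the
	 * caching bits a driver is about to insert for this page. */
	static void ttm_check_linear_map_caching(struct page *p, pgprot_t prot)
	{
		unsigned int level;
		pte_t *kpte;

		if (PageHighMem(p))
			return;	/* no linear mapping to compare against */

		kpte = lookup_address((unsigned long)page_address(p), &level);
		if (kpte)
			WARN_ON_ONCE((pte_val(*kpte) & _PAGE_CACHE_MASK) !=
				     (pgprot_val(prot) & _PAGE_CACHE_MASK));
	}

(This only covers 4 KiB kernel mappings cleanly; large linear-map pages
would need the page-table level taken into account.)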
* Re: 2025-08-26 9:56 ` Re: Christian König @ 2025-08-26 12:07 ` David Hildenbrand 2025-08-26 16:09 ` Re: Christian König 2025-08-26 14:27 ` Thomas Hellström 1 sibling, 1 reply; 50+ messages in thread From: David Hildenbrand @ 2025-08-26 12:07 UTC (permalink / raw) To: Christian König, intel-xe, intel-gfx, dri-devel, amd-gfx, x86 Cc: airlied, thomas.hellstrom, matthew.brost, dave.hansen, luto, peterz, Lorenzo Stoakes >> >> 1) We use another interface that consumes pages instead of PFNs, like a >> vm_insert_pages_pgprot() we would be adding. >> >> Is there any strong requirement for inserting non-refcounted PFNs? > > Yes, there is a strong requirement to insert non-refcounted PFNs. > > We had a lot of trouble with KVM people trying to grab a reference to those pages even if the VMA had the VM_PFNMAP flag set. Yes, KVM ignored (and maybe still does) VM_PFNMAP to some degree, which is rather nasty. > >> 2) We add another interface that consumes PFNs, but explicitly states >> that it is only for ordinary system RAM, and that the user is >> required for updating the direct map. >> >> We could sanity-check the direct map in debug kernels. > > I would rather like to see vmf_insert_pfn_prot() fixed instead. > > That function was explicitly added to insert the PFN with the given attributes and as far as I can see all users of that function expect exactly that. It's all a bit tricky :( > >> >> 3) We teach PAT code in pfnmap_setup_cachemode_pfn() about treating this >> system RAM differently. >> >> >> There is also the option for a mixture between 1 and 2, where we get pages, but we map them non-refcounted in a VM_PFNMAP. >> >> In general, having pages makes it easier to assert that they are likely ordinary system ram pages, and that the interface is not getting abused for something else. > > Well, exactly that's the use case here and that is not abusive at all as far as I can see. > > What drivers want is to insert a PFN with a certain set of caching attributes regardless if it's system memory or iomem. That's why vmf_insert_pfn_prot() was created in the first place. I mean, the use case of "allocate pages from the buddy and fixup the linear map" sounds perfectly reasonable to me. Absolutely no reason to get PAT involved. Nobody else should be messing with that memory after all. As soon as we are talking about other memory ranges (iomem) that are not from the buddy, it gets weird to bypass PAT, and the question I am asking myself is, when is it okay, and when not. > > That drivers need to call set_pages_wc/uc() for the linear mapping on x86 manually is correct and checking that is clearly a good idea for debug kernels. I'll have to think about this a bit: assuming only vmf_insert_pfn() calls pfnmap_setup_cachemode_pfn() but vmf_insert_pfn_prot() doesn't, how could we sanity check that somebody is doing something against the will of PAT. -- Cheers David / dhildenb ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re:
  2025-08-26 12:07 ` Re: David Hildenbrand
@ 2025-08-26 16:09   ` Christian König
  2025-08-27 9:13     ` [PATCH 0/3] drm/ttm: Michel Dänzer
  2025-08-28 21:18     ` stupid and complicated PAT :) David Hildenbrand
  0 siblings, 2 replies; 50+ messages in thread
From: Christian König @ 2025-08-26 16:09 UTC (permalink / raw)
  To: David Hildenbrand, intel-xe, intel-gfx, dri-devel, amd-gfx, x86
  Cc: airlied, thomas.hellstrom, matthew.brost, dave.hansen, luto, peterz, Lorenzo Stoakes

On 26.08.25 14:07, David Hildenbrand wrote:
>>
>>> 2) We add another interface that consumes PFNs, but explicitly states
>>> that it is only for ordinary system RAM, and that the user is
>>> required for updating the direct map.
>>>
>>> We could sanity-check the direct map in debug kernels.
>>
>> I would rather like to see vmf_insert_pfn_prot() fixed instead.
>>
>> That function was explicitly added to insert the PFN with the given attributes and as far as I can see all users of that function expect exactly that.
>
> It's all a bit tricky :(

I would rather say horribly complicated :(

>>> 3) We teach PAT code in pfnmap_setup_cachemode_pfn() about treating this
>>> system RAM differently.
>>>
>>> There is also the option for a mixture between 1 and 2, where we get pages, but we map them non-refcounted in a VM_PFNMAP.
>>>
>>> In general, having pages makes it easier to assert that they are likely ordinary system ram pages, and that the interface is not getting abused for something else.
>>
>> Well, exactly that's the use case here and that is not abusive at all as far as I can see.
>>
>> What drivers want is to insert a PFN with a certain set of caching attributes regardless if it's system memory or iomem. That's why vmf_insert_pfn_prot() was created in the first place.
>
> I mean, the use case of "allocate pages from the buddy and fixup the linear map" sounds perfectly reasonable to me. Absolutely no reason to get PAT involved. Nobody else should be messing with that memory after all.
>
> As soon as we are talking about other memory ranges (iomem) that are not from the buddy, it gets weird to bypass PAT, and the question I am asking myself is, when is it okay, and when not.

Ok, let me try to explain part of the history and the big picture for at least the graphics use case on x86.

In 1996/97 Intel came up with the idea of AGP: https://en.wikipedia.org/wiki/Accelerated_Graphics_Port

At that time the CPUs, PCI bus and system memory were all connected together through the north bridge: https://en.wikipedia.org/wiki/Northbridge_(computing)

The problem was that AGP also introduced the concept of putting large amounts of data for the video controller (PCI device) into system memory when you don't have enough local device memory (VRAM).

But that meant that when this memory was cached, the north bridge always had to snoop the CPU cache over the front side bus for every access the video controller made. This was a huge performance bottleneck, so the idea was born to access that data uncached.

Well, that was nearly 30 years ago. PCI, AGP and the front side bus are long gone, but the concept of putting video controller (GPU) data into uncached system memory has prevailed.

So, for example, even modern AMD CPU-based laptops need uncached system memory if their local memory is not large enough to contain the picture to display on the monitor. And with modern 8k monitors that can actually happen quite fast...

What drivers do today is to call vmf_insert_pfn_prot() either with the PFN of their local memory (iomem) or uncached/wc system memory.
To summarize: that we have an interface to fill in the page tables with either iomem or system memory is actually part of the design. That's how the HW driver is expected to work.

>> That drivers need to call set_pages_wc/uc() for the linear mapping on x86 manually is correct and checking that is clearly a good idea for debug kernels.
>
> I'll have to think about this a bit: assuming only vmf_insert_pfn() calls pfnmap_setup_cachemode_pfn() but vmf_insert_pfn_prot() doesn't, how could we sanity check that somebody is doing something against the will of PAT.

I think the most defensive approach for a quick fix is this change here:

static inline void pgprot_set_cachemode(pgprot_t *prot, enum page_cache_mode pcm)
{
-	*prot = __pgprot((pgprot_val(*prot) & ~_PAGE_CACHE_MASK) |
-			 cachemode2protval(pcm));
+	if (pcm != _PAGE_CACHE_MODE_WB)
+		*prot = __pgprot((pgprot_val(*prot) & ~_PAGE_CACHE_MASK) |
+				 cachemode2protval(pcm));
}

This applies the PAT value if it's anything other than _PAGE_CACHE_MODE_WB, but still allows callers to use something different on normal WB system memory.

What do you think?

Regards,
Christian

^ permalink raw reply	[flat|nested] 50+ messages in thread
* Re: [PATCH 0/3] drm/ttm: ... 2025-08-26 16:09 ` Re: Christian König @ 2025-08-27 9:13 ` Michel Dänzer 2025-08-28 21:18 ` stupid and complicated PAT :) David Hildenbrand 1 sibling, 0 replies; 50+ messages in thread From: Michel Dänzer @ 2025-08-27 9:13 UTC (permalink / raw) To: Christian König, David Hildenbrand, intel-xe, intel-gfx, dri-devel, amd-gfx, x86 Cc: airlied, thomas.hellstrom, matthew.brost, dave.hansen, luto, peterz, Lorenzo Stoakes Public Service Announcement from your friendly neighbourhood mailing list moderation queue processor: I recommend setting a non-empty subject in follow-ups to this thread, or I might accidentally discard them from the moderation queue (I just almost did). -- Earthling Michel Dänzer \ GNOME / Xwayland / Mesa developer https://redhat.com \ Libre software enthusiast ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: stupid and complicated PAT :) 2025-08-26 16:09 ` Re: Christian König 2025-08-27 9:13 ` [PATCH 0/3] drm/ttm: Michel Dänzer @ 2025-08-28 21:18 ` David Hildenbrand 2025-08-28 21:28 ` David Hildenbrand 1 sibling, 1 reply; 50+ messages in thread From: David Hildenbrand @ 2025-08-28 21:18 UTC (permalink / raw) To: Christian König, intel-xe, intel-gfx, dri-devel, amd-gfx, x86 Cc: airlied, thomas.hellstrom, matthew.brost, dave.hansen, luto, peterz, Lorenzo Stoakes On 26.08.25 18:09, Christian König wrote: > On 26.08.25 14:07, David Hildenbrand wrote: >>> >>>> 2) We add another interface that consumes PFNs, but explicitly states >>>> that it is only for ordinary system RAM, and that the user is >>>> required for updating the direct map. >>>> >>>> We could sanity-check the direct map in debug kernels. >>> >>> I would rather like to see vmf_insert_pfn_prot() fixed instead. >>> >>> That function was explicitly added to insert the PFN with the given attributes and as far as I can see all users of that function expect exactly that. >> >> It's all a bit tricky :( > > I would rather say horrible complicated :( > >>>> >>>> 3) We teach PAT code in pfnmap_setup_cachemode_pfn() about treating this >>>> system RAM differently. >>>> >>>> >>>> There is also the option for a mixture between 1 and 2, where we get pages, but we map them non-refcounted in a VM_PFNMAP. >>>> >>>> In general, having pages makes it easier to assert that they are likely ordinary system ram pages, and that the interface is not getting abused for something else. >>> >>> Well, exactly that's the use case here and that is not abusive at all as far as I can see. >>> >>> What drivers want is to insert a PFN with a certain set of caching attributes regardless if it's system memory or iomem. That's why vmf_insert_pfn_prot() was created in the first place. >> >> I mean, the use case of "allocate pages from the buddy and fixup the linear map" sounds perfectly reasonable to me. Absolutely no reason to get PAT involved. Nobody else should be messing with that memory after all. >> >> As soon as we are talking about other memory ranges (iomem) that are not from the buddy, it gets weird to bypass PAT, and the question I am asking myself is, when is it okay, and when not. > > Ok let me try to explain parts of the history and the big picture for at least the graphics use case on x86. > > In 1996/97 Intel came up with the idea of AGP: https://en.wikipedia.org/wiki/Accelerated_Graphics_Port > > At that time the CPUs, PCI bus and system memory were all connected together through the north bridge: https://en.wikipedia.org/wiki/Northbridge_(computing) > > The problem was that AGP also introduced the concept of putting large amounts of data for the video controller (PCI device) into system memory when you don't have enough local device memory (VRAM). > > But that meant when that memory is cached that the north bridge always had to snoop the CPU cache over the front side bus for every access the video controller made. This meant a huge performance bottleneck, so the idea was born to access that data uncached. Ack. > > > Well that was nearly 30years ago, PCI, AGP and front side bus are long gone, but the concept of putting video controller (GPU) stuff into uncached system memory has prevailed. > > So for example even modern AMD CPU based laptops need uncached system memory if their local memory is not large enough to contain the picture to display on the monitor. And with modern 8k monitors that can actually happen quite fast... 
> > What drivers do today is to call vmf_insert_pfn_prot() either with the PFN of their local memory (iomem) or uncached/wc system memory. That makes perfect sense. I assume we might or might not have "struct page" (pfn_valid) for the iomem, depending on where these areas reside, correct? > > > To summarize that we have an interface to fill in the page tables with either iomem or system memory is actually part of the design. That's how the HW driver is expected to work. > >>> That drivers need to call set_pages_wc/uc() for the linear mapping on x86 manually is correct and checking that is clearly a good idea for debug kernels. >> >> I'll have to think about this a bit: assuming only vmf_insert_pfn() calls pfnmap_setup_cachemode_pfn() but vmf_insert_pfn_prot() doesn't, how could we sanity check that somebody is doing something against the will of PAT. > > I think the most defensive approach for a quick fix is this change here: > > static inline void pgprot_set_cachemode(pgprot_t *prot, enum page_cache_mode pcm) > { > - *prot = __pgprot((pgprot_val(*prot) & ~_PAGE_CACHE_MASK) | > - cachemode2protval(pcm)); > + if (pcm != _PAGE_CACHE_MODE_WB) > + *prot = __pgprot((pgprot_val(*prot) & ~_PAGE_CACHE_MASK) | > + cachemode2protval(pcm)); > } > > This applies the PAT value if it's anything else than _PAGE_CACHE_MODE_WB but still allows callers to use something different on normal WB system memory. > > What do you think? This feels like too big of a hammer. In particular, it changes things like phys_mem_access_prot_allowed(), which requires more care. First, I thought we should limit what we do to vmf_insert_pfn_prot() only. But then I realized that we have stuff like vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot); I'm still trying to find out the easy way out that is not a complete hack. Will the iomem ever be mapped by the driver again with a different cache mode? (e.g., WB -> UC -> WB) -- Cheers David / dhildenb ^ permalink raw reply [flat|nested] 50+ messages in thread
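[Note: a minimal sketch of the mmap-time pattern David points at with "vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);". The driver name and the pfn helper are illustrative, not from the thread.]

	static int mydrv_mmap(struct file *file, struct vm_area_struct *vma)
	{
		unsigned long pfn = mydrv_bar_pfn(file); /* hypothetical helper */

		/* one caching mode is chosen for the whole VMA up front */
		vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
		return remap_pfn_range(vma, vma->vm_start, pfn,
				       vma->vm_end - vma->vm_start,
				       vma->vm_page_prot);
	}

Unlike a raw fault-time insertion, remap_pfn_range() itself goes through the PAT tracking path, which is one reason this variant stays consistent with the reservation tree.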
* Re: stupid and complicated PAT :) 2025-08-28 21:18 ` stupid and complicated PAT :) David Hildenbrand @ 2025-08-28 21:28 ` David Hildenbrand 2025-08-28 21:32 ` David Hildenbrand 0 siblings, 1 reply; 50+ messages in thread From: David Hildenbrand @ 2025-08-28 21:28 UTC (permalink / raw) To: Christian König, intel-xe, intel-gfx, dri-devel, amd-gfx, x86 Cc: airlied, thomas.hellstrom, matthew.brost, dave.hansen, luto, peterz, Lorenzo Stoakes On 28.08.25 23:18, David Hildenbrand wrote: > On 26.08.25 18:09, Christian König wrote: >> On 26.08.25 14:07, David Hildenbrand wrote: >>>> >>>>> 2) We add another interface that consumes PFNs, but explicitly states >>>>> that it is only for ordinary system RAM, and that the user is >>>>> required for updating the direct map. >>>>> >>>>> We could sanity-check the direct map in debug kernels. >>>> >>>> I would rather like to see vmf_insert_pfn_prot() fixed instead. >>>> >>>> That function was explicitly added to insert the PFN with the given attributes and as far as I can see all users of that function expect exactly that. >>> >>> It's all a bit tricky :( >> >> I would rather say horrible complicated :( >> >>>>> >>>>> 3) We teach PAT code in pfnmap_setup_cachemode_pfn() about treating this >>>>> system RAM differently. >>>>> >>>>> >>>>> There is also the option for a mixture between 1 and 2, where we get pages, but we map them non-refcounted in a VM_PFNMAP. >>>>> >>>>> In general, having pages makes it easier to assert that they are likely ordinary system ram pages, and that the interface is not getting abused for something else. >>>> >>>> Well, exactly that's the use case here and that is not abusive at all as far as I can see. >>>> >>>> What drivers want is to insert a PFN with a certain set of caching attributes regardless if it's system memory or iomem. That's why vmf_insert_pfn_prot() was created in the first place. >>> >>> I mean, the use case of "allocate pages from the buddy and fixup the linear map" sounds perfectly reasonable to me. Absolutely no reason to get PAT involved. Nobody else should be messing with that memory after all. >>> >>> As soon as we are talking about other memory ranges (iomem) that are not from the buddy, it gets weird to bypass PAT, and the question I am asking myself is, when is it okay, and when not. >> >> Ok let me try to explain parts of the history and the big picture for at least the graphics use case on x86. >> >> In 1996/97 Intel came up with the idea of AGP: https://en.wikipedia.org/wiki/Accelerated_Graphics_Port >> >> At that time the CPUs, PCI bus and system memory were all connected together through the north bridge: https://en.wikipedia.org/wiki/Northbridge_(computing) >> >> The problem was that AGP also introduced the concept of putting large amounts of data for the video controller (PCI device) into system memory when you don't have enough local device memory (VRAM). >> >> But that meant when that memory is cached that the north bridge always had to snoop the CPU cache over the front side bus for every access the video controller made. This meant a huge performance bottleneck, so the idea was born to access that data uncached. > > Ack. > >> >> >> Well that was nearly 30years ago, PCI, AGP and front side bus are long gone, but the concept of putting video controller (GPU) stuff into uncached system memory has prevailed. >> >> So for example even modern AMD CPU based laptops need uncached system memory if their local memory is not large enough to contain the picture to display on the monitor. 
And with modern 8k monitors that can actually happen quite fast... >> >> What drivers do today is to call vmf_insert_pfn_prot() either with the PFN of their local memory (iomem) or uncached/wc system memory. > > That makes perfect sense. I assume we might or might not have "struct > page" (pfn_valid) for the iomem, depending on where these areas reside, > correct? > >> >> >> To summarize that we have an interface to fill in the page tables with either iomem or system memory is actually part of the design. That's how the HW driver is expected to work. >> >>>> That drivers need to call set_pages_wc/uc() for the linear mapping on x86 manually is correct and checking that is clearly a good idea for debug kernels. >>> >>> I'll have to think about this a bit: assuming only vmf_insert_pfn() calls pfnmap_setup_cachemode_pfn() but vmf_insert_pfn_prot() doesn't, how could we sanity check that somebody is doing something against the will of PAT. >> >> I think the most defensive approach for a quick fix is this change here: >> >> static inline void pgprot_set_cachemode(pgprot_t *prot, enum page_cache_mode pcm) >> { >> - *prot = __pgprot((pgprot_val(*prot) & ~_PAGE_CACHE_MASK) | >> - cachemode2protval(pcm)); >> + if (pcm != _PAGE_CACHE_MODE_WB) >> + *prot = __pgprot((pgprot_val(*prot) & ~_PAGE_CACHE_MASK) | >> + cachemode2protval(pcm)); >> } >> >> This applies the PAT value if it's anything else than _PAGE_CACHE_MODE_WB but still allows callers to use something different on normal WB system memory. >> >> What do you think? > > This feels like too big of a hammer. In particular, it changes things > like phys_mem_access_prot_allowed(), which requires more care. > > First, I thought we should limit what we do to vmf_insert_pfn_prot() > only. But then I realized that we have stuff like > > vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot); > > I'm still trying to find out the easy way out that is not a complete hack. > > Will the iomem ever be mapped by the driver again with a different cache > mode? (e.g., WB -> UC -> WB) What I am currently wondering is: assume we get a pfnmap_setup_cachemode_pfn() call and we could reliably identify whether there was a previous registration, then we could do (a) No previous registration: don't modify pgprot. Hopefully the driver knows what it is doing. Maybe we can add sanity checks that the direct map was already updated etc. (b) A previous registration: modify pgprot like we do today. System RAM is the problem. I wonder how many of these registrations we really get and if we could just store them in the same tree as !system RAM instead of abusing page flags. -- Cheers David / dhildenb ^ permalink raw reply [flat|nested] 50+ messages in thread
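[Note: a rough sketch of the (a)/(b) idea above, assuming a previous registration can be looked up reliably; memtype_lookup_registration() is a hypothetical helper, not an existing kernel function.]

	static void pfnmap_setup_cachemode_sketch(unsigned long pfn, pgprot_t *prot)
	{
		enum page_cache_mode pcm;

		/* (a) no previous registration: leave the caller's pgprot alone
		 * and trust that the driver fixed up the direct map itself */
		if (!memtype_lookup_registration(pfn, &pcm)) /* hypothetical */
			return;

		/* (b) a previous registration exists: enforce it, as done today */
		pgprot_set_cachemode(prot, pcm);
	}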
* Re: stupid and complicated PAT :) 2025-08-28 21:28 ` David Hildenbrand @ 2025-08-28 21:32 ` David Hildenbrand 2025-08-29 10:50 ` Christian König 0 siblings, 1 reply; 50+ messages in thread From: David Hildenbrand @ 2025-08-28 21:32 UTC (permalink / raw) To: Christian König, intel-xe, intel-gfx, dri-devel, amd-gfx, x86 Cc: airlied, thomas.hellstrom, matthew.brost, dave.hansen, luto, peterz, Lorenzo Stoakes On 28.08.25 23:28, David Hildenbrand wrote: > On 28.08.25 23:18, David Hildenbrand wrote: >> On 26.08.25 18:09, Christian König wrote: >>> On 26.08.25 14:07, David Hildenbrand wrote: >>>>> >>>>>> 2) We add another interface that consumes PFNs, but explicitly states >>>>>> that it is only for ordinary system RAM, and that the user is >>>>>> required for updating the direct map. >>>>>> >>>>>> We could sanity-check the direct map in debug kernels. >>>>> >>>>> I would rather like to see vmf_insert_pfn_prot() fixed instead. >>>>> >>>>> That function was explicitly added to insert the PFN with the given attributes and as far as I can see all users of that function expect exactly that. >>>> >>>> It's all a bit tricky :( >>> >>> I would rather say horrible complicated :( >>> >>>>>> >>>>>> 3) We teach PAT code in pfnmap_setup_cachemode_pfn() about treating this >>>>>> system RAM differently. >>>>>> >>>>>> >>>>>> There is also the option for a mixture between 1 and 2, where we get pages, but we map them non-refcounted in a VM_PFNMAP. >>>>>> >>>>>> In general, having pages makes it easier to assert that they are likely ordinary system ram pages, and that the interface is not getting abused for something else. >>>>> >>>>> Well, exactly that's the use case here and that is not abusive at all as far as I can see. >>>>> >>>>> What drivers want is to insert a PFN with a certain set of caching attributes regardless if it's system memory or iomem. That's why vmf_insert_pfn_prot() was created in the first place. >>>> >>>> I mean, the use case of "allocate pages from the buddy and fixup the linear map" sounds perfectly reasonable to me. Absolutely no reason to get PAT involved. Nobody else should be messing with that memory after all. >>>> >>>> As soon as we are talking about other memory ranges (iomem) that are not from the buddy, it gets weird to bypass PAT, and the question I am asking myself is, when is it okay, and when not. >>> >>> Ok let me try to explain parts of the history and the big picture for at least the graphics use case on x86. >>> >>> In 1996/97 Intel came up with the idea of AGP: https://en.wikipedia.org/wiki/Accelerated_Graphics_Port >>> >>> At that time the CPUs, PCI bus and system memory were all connected together through the north bridge: https://en.wikipedia.org/wiki/Northbridge_(computing) >>> >>> The problem was that AGP also introduced the concept of putting large amounts of data for the video controller (PCI device) into system memory when you don't have enough local device memory (VRAM). >>> >>> But that meant when that memory is cached that the north bridge always had to snoop the CPU cache over the front side bus for every access the video controller made. This meant a huge performance bottleneck, so the idea was born to access that data uncached. >> >> Ack. >> >>> >>> >>> Well that was nearly 30years ago, PCI, AGP and front side bus are long gone, but the concept of putting video controller (GPU) stuff into uncached system memory has prevailed. 
>>> >>> So for example even modern AMD CPU based laptops need uncached system memory if their local memory is not large enough to contain the picture to display on the monitor. And with modern 8k monitors that can actually happen quite fast... >>> >>> What drivers do today is to call vmf_insert_pfn_prot() either with the PFN of their local memory (iomem) or uncached/wc system memory. >> >> That makes perfect sense. I assume we might or might not have "struct >> page" (pfn_valid) for the iomem, depending on where these areas reside, >> correct? >> >>> >>> >>> To summarize that we have an interface to fill in the page tables with either iomem or system memory is actually part of the design. That's how the HW driver is expected to work. >>> >>>>> That drivers need to call set_pages_wc/uc() for the linear mapping on x86 manually is correct and checking that is clearly a good idea for debug kernels. >>>> >>>> I'll have to think about this a bit: assuming only vmf_insert_pfn() calls pfnmap_setup_cachemode_pfn() but vmf_insert_pfn_prot() doesn't, how could we sanity check that somebody is doing something against the will of PAT. >>> >>> I think the most defensive approach for a quick fix is this change here: >>> >>> static inline void pgprot_set_cachemode(pgprot_t *prot, enum page_cache_mode pcm) >>> { >>> - *prot = __pgprot((pgprot_val(*prot) & ~_PAGE_CACHE_MASK) | >>> - cachemode2protval(pcm)); >>> + if (pcm != _PAGE_CACHE_MODE_WB) >>> + *prot = __pgprot((pgprot_val(*prot) & ~_PAGE_CACHE_MASK) | >>> + cachemode2protval(pcm)); >>> } >>> >>> This applies the PAT value if it's anything else than _PAGE_CACHE_MODE_WB but still allows callers to use something different on normal WB system memory. >>> >>> What do you think? >> >> This feels like too big of a hammer. In particular, it changes things >> like phys_mem_access_prot_allowed(), which requires more care. >> >> First, I thought we should limit what we do to vmf_insert_pfn_prot() >> only. But then I realized that we have stuff like >> >> vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot); >> >> I'm still trying to find out the easy way out that is not a complete hack. >> >> Will the iomem ever be mapped by the driver again with a different cache >> mode? (e.g., WB -> UC -> WB) > > What I am currently wondering is: assume we get a > pfnmap_setup_cachemode_pfn() call and we could reliably identify whether > there was a previous registration, then we could do > > (a) No previous registration: don't modify pgprot. Hopefully the driver > knows what it is doing. Maybe we can add sanity checks that the > direct map was already updated etc. > > (b) A previous registration: modify pgprot like we do today. > > System RAM is the problem. I wonder how many of these registrations we > really get and if we could just store them in the same tree as !system > RAM instead of abusing page flags. commit 9542ada803198e6eba29d3289abb39ea82047b92 Author: Suresh Siddha <suresh.b.siddha@intel.com> Date: Wed Sep 24 08:53:33 2008 -0700 x86: track memtype for RAM in page struct Track the memtype for RAM pages in page struct instead of using the memtype list. This avoids the explosion in the number of entries in memtype list (of the order of 20,000 with AGP) and makes the PAT tracking simpler. We are using PG_arch_1 bit in page->flags. We still use the memtype list for non RAM pages. I do wonder if that explosion is still an issue today. -- Cheers David / dhildenb ^ permalink raw reply [flat|nested] 50+ messages in thread
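[Note: a sketch of the per-page encoding that commit 9542ada8 introduced, reconstructed from its commit message plus memory of arch/x86/mm/pat/memtype.c; treat the exact macro names as approximate.]

	/* two page flags encode four memtype states per RAM page, so system
	 * RAM needs no R/B tree entries -- avoiding the "explosion" above */
	#define _PGMT_WB	0
	#define _PGMT_WC	(1UL << PG_arch_1)
	#define _PGMT_UC_MINUS	(1UL << PG_uncached)
	#define _PGMT_WT	((1UL << PG_uncached) | (1UL << PG_arch_1))
	#define _PGMT_MASK	((1UL << PG_uncached) | (1UL << PG_arch_1))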
* Re: stupid and complicated PAT :) 2025-08-28 21:32 ` David Hildenbrand @ 2025-08-29 10:50 ` Christian König 2025-08-29 19:52 ` David Hildenbrand 0 siblings, 1 reply; 50+ messages in thread From: Christian König @ 2025-08-29 10:50 UTC (permalink / raw) To: David Hildenbrand, intel-xe, intel-gfx, dri-devel, amd-gfx, x86 Cc: airlied, thomas.hellstrom, matthew.brost, dave.hansen, luto, peterz, Lorenzo Stoakes You write mails faster than I can answer :) I will try to answer all questions and comments here, but please ping me if I miss something. On 28.08.25 23:32, David Hildenbrand wrote: > On 28.08.25 23:28, David Hildenbrand wrote: >> On 28.08.25 23:18, David Hildenbrand wrote: >>> On 26.08.25 18:09, Christian König wrote: >>>> On 26.08.25 14:07, David Hildenbrand wrote: [SNIP] >>>> Well that was nearly 30years ago, PCI, AGP and front side bus are long gone, but the concept of putting video controller (GPU) stuff into uncached system memory has prevailed. >>>> >>>> So for example even modern AMD CPU based laptops need uncached system memory if their local memory is not large enough to contain the picture to display on the monitor. And with modern 8k monitors that can actually happen quite fast... >>>> >>>> What drivers do today is to call vmf_insert_pfn_prot() either with the PFN of their local memory (iomem) or uncached/wc system memory. >>> >>> That makes perfect sense. I assume we might or might not have "struct >>> page" (pfn_valid) for the iomem, depending on where these areas reside, >>> correct? Exactly that, yes. >>>> To summarize that we have an interface to fill in the page tables with either iomem or system memory is actually part of the design. That's how the HW driver is expected to work. >>>> >>>>>> That drivers need to call set_pages_wc/uc() for the linear mapping on x86 manually is correct and checking that is clearly a good idea for debug kernels. >>>>> >>>>> I'll have to think about this a bit: assuming only vmf_insert_pfn() calls pfnmap_setup_cachemode_pfn() but vmf_insert_pfn_prot() doesn't, how could we sanity check that somebody is doing something against the will of PAT. >>>> >>>> I think the most defensive approach for a quick fix is this change here: >>>> >>>> static inline void pgprot_set_cachemode(pgprot_t *prot, enum page_cache_mode pcm) >>>> { >>>> - *prot = __pgprot((pgprot_val(*prot) & ~_PAGE_CACHE_MASK) | >>>> - cachemode2protval(pcm)); >>>> + if (pcm != _PAGE_CACHE_MODE_WB) >>>> + *prot = __pgprot((pgprot_val(*prot) & ~_PAGE_CACHE_MASK) | >>>> + cachemode2protval(pcm)); >>>> } >>>> >>>> This applies the PAT value if it's anything else than _PAGE_CACHE_MODE_WB but still allows callers to use something different on normal WB system memory. >>>> >>>> What do you think? >>> >>> This feels like too big of a hammer. In particular, it changes things >>> like phys_mem_access_prot_allowed(), which requires more care. >>> >>> First, I thought we should limit what we do to vmf_insert_pfn_prot() >>> only. But then I realized that we have stuff like >>> >>> vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot); >>> >>> I'm still trying to find out the easy way out that is not a complete hack. Well, I think if the change is limited to only vmf_insert_pfn_prot() for now, we can limit the risk quite a bit as well. Background is that only a handful of callers are using vmf_insert_pfn_prot() and it looks like all of those actually do know what they are doing and use the right flags.
>>> Will the iomem ever be mapped by the driver again with a different cache >>> mode? (e.g., WB -> UC -> WB) Yes, that can absolutely happen. But for iomem we would have an explicit call to ioremap(), ioremap_wc(), ioremap_cache() for that before anybody would map anything into userspace page tables. But thinking more about it I just had an OMFG moment! Is it possible that the PAT currently already has a problem with that? We had customer projects where BARs of different PCIe devices ended up on different physical addresses after a hot remove/re-add. Is it possible that the PAT keeps enforcing certain caching attributes for a physical address? E.g. because a driver doesn't clean up properly on hot remove? If yes, then that would explain a massive number of problems we had with hot add/remove. >> What I am currently wondering is: assume we get a >> pfnmap_setup_cachemode_pfn() call and we could reliably identify whether >> there was a previous registration, then we could do >> >> (a) No previous registration: don't modify pgprot. Hopefully the driver >> knows what it is doing. Maybe we can add sanity checks that the >> direct map was already updated etc. >> (b) A previous registration: modify pgprot like we do today. That would work for me. >> System RAM is the problem. I wonder how many of these registrations we >> really get and if we could just store them in the same tree as !system >> RAM instead of abusing page flags. > commit 9542ada803198e6eba29d3289abb39ea82047b92 > Author: Suresh Siddha <suresh.b.siddha@intel.com> > Date: Wed Sep 24 08:53:33 2008 -0700 > > x86: track memtype for RAM in page struct > Track the memtype for RAM pages in page struct instead of using the > memtype list. This avoids the explosion in the number of entries in > memtype list (of the order of 20,000 with AGP) and makes the PAT > tracking simpler. > We are using PG_arch_1 bit in page->flags. > We still use the memtype list for non RAM pages. > > > I do wonder if that explosion is still an issue today. Yes it is. That is exactly the issue I'm working on here. It's just that AGP was replaced by internal GPU MMUs over time and so we don't use the old AGP code any more but just call get_free_pages() (or similar) directly. Thanks a lot for the help, Christian. ^ permalink raw reply [flat|nested] 50+ messages in thread
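[Note: a condensed sketch of the allocation pattern Christian describes -- random system pages, with the direct map switched to WC where one exists. Error handling and driver plumbing are omitted; set_pages_array_wc() is the x86 helper this kind of code uses for the direct map.]

	struct page *page = alloc_pages(GFP_HIGHUSER, 0);

	if (page && !PageHighMem(page)) {
		/* rewrite the kernel linear mapping of this page to WC so no
		 * cached alias of the later uncached user mapping remains */
		set_pages_array_wc(&page, 1);
	}
	/* a highmem page has no linear mapping to fix up -- and, as discussed
	 * above, PAT's per-page tracking never learns about the WC intent */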
* Re: stupid and complicated PAT :) 2025-08-29 10:50 ` Christian König @ 2025-08-29 19:52 ` David Hildenbrand 2025-08-29 19:58 ` David Hildenbrand 0 siblings, 1 reply; 50+ messages in thread From: David Hildenbrand @ 2025-08-29 19:52 UTC (permalink / raw) To: Christian König, intel-xe, intel-gfx, dri-devel, amd-gfx, x86 Cc: airlied, thomas.hellstrom, matthew.brost, dave.hansen, luto, peterz, Lorenzo Stoakes > > Yes, that can absolutely happen. But for iomem we would have an explicit call to ioremap(), ioremap_wc(), ioremap_cache() for that before anybody would map anything into userspace page tables. > > But thinking more about it I just had an OMFG moment! Is it possible that the PAT currently already has a problem with that? > > We had customer projects where BARs of different PCIe devices ended up on different physical addresses after a hot remove/re-add. > > Is it possible that the PAT keeps enforcing certain caching attributes for a physical address? E.g. for example because a driver doesn't clean up properly on hot remove? > > If yes than that would explain a massive number of problems we had with hot add/remove. The code is a mess, so if a driver messed up, likely everything is possible. TBH, the more I look at this all, the more WTF moments I am having. > >>> What I am currently wondering is: assume we get a >>> pfnmap_setup_cachemode_pfn() call and we could reliably identify whether >>> there was a previous registration, then we could do >>> >>> (a) No previous registration: don't modify pgprot. Hopefully the driver >>> knows what it is doing. Maybe we can add sanity checks that the >>> direct map was already updated etc. >>> (b) A previous registration: modify pgprot like we do today. > > That would work for me. > >>> System RAM is the problem. I wonder how many of these registrations we >>> really get and if we could just store them in the same tree as !system >>> RAM instead of abusing page flags. >> >> commit 9542ada803198e6eba29d3289abb39ea82047b92 >> Author: Suresh Siddha <suresh.b.siddha@intel.com> >> Date: Wed Sep 24 08:53:33 2008 -0700 >> >> x86: track memtype for RAM in page struct >> Track the memtype for RAM pages in page struct instead of using the >> memtype list. This avoids the explosion in the number of entries in >> memtype list (of the order of 20,000 with AGP) and makes the PAT >> tracking simpler. >> We are using PG_arch_1 bit in page->flags. >> We still use the memtype list for non RAM pages. >> >> >> I do wonder if that explosion is still an issue today. > > Yes it is. That is exactly the issue I'm working on here. > > It's just that AGP was replaced by internal GPU MMUs over time and so we don't use the old AGP code any more but just call get_free_pages() (or similar) directly. Okay, I thought I slowly understood how it works, then I stumbled over the set_memory_uc / set_memory_wc implementation and now I am *all confused*. I mean, that does perform a PAT reservation. But when is that reservation ever freed again? :/ How can set_memory_wc() followed by set_memory_uc() possibly work? I am pretty sure I am missing a piece of the puzzle. I think you mentioned that set_memory_uc() is avoided by drivers because of highmem mess, but what are drivers then using to modify the direct map? -- Cheers David / dhildenb ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: stupid and complicated PAT :) 2025-08-29 19:52 ` David Hildenbrand @ 2025-08-29 19:58 ` David Hildenbrand 0 siblings, 0 replies; 50+ messages in thread From: David Hildenbrand @ 2025-08-29 19:58 UTC (permalink / raw) To: Christian König, intel-xe, intel-gfx, dri-devel, amd-gfx, x86 Cc: airlied, thomas.hellstrom, matthew.brost, dave.hansen, luto, peterz, Lorenzo Stoakes On 29.08.25 21:52, David Hildenbrand wrote: > >> >> Yes, that can absolutely happen. But for iomem we would have an explicit call to ioremap(), ioremap_wc(), ioremap_cache() for that before anybody would map anything into userspace page tables. >> >> But thinking more about it I just had an OMFG moment! Is it possible that the PAT currently already has a problem with that? >> >> We had customer projects where BARs of different PCIe devices ended up on different physical addresses after a hot remove/re-add. >> >> Is it possible that the PAT keeps enforcing certain caching attributes for a physical address? E.g. for example because a driver doesn't clean up properly on hot remove? >> >> If yes than that would explain a massive number of problems we had with hot add/remove. > > The code is a mess, so if a driver messed up, likely everything is possible. > > TBH, the more I look at this all, the more WTF moments I am having. > >> >>>> What I am currently wondering is: assume we get a >>>> pfnmap_setup_cachemode_pfn() call and we could reliably identify whether >>>> there was a previous registration, then we could do >>>> >>>> (a) No previous registration: don't modify pgprot. Hopefully the driver >>>> knows what it is doing. Maybe we can add sanity checks that the >>>> direct map was already updated etc. >>>> (b) A previous registration: modify pgprot like we do today. >> >> That would work for me. >> >>>> System RAM is the problem. I wonder how many of these registrations we >>>> really get and if we could just store them in the same tree as !system >>>> RAM instead of abusing page flags. >>> >>> commit 9542ada803198e6eba29d3289abb39ea82047b92 >>> Author: Suresh Siddha <suresh.b.siddha@intel.com> >>> Date: Wed Sep 24 08:53:33 2008 -0700 >>> >>> x86: track memtype for RAM in page struct >>> Track the memtype for RAM pages in page struct instead of using the >>> memtype list. This avoids the explosion in the number of entries in >>> memtype list (of the order of 20,000 with AGP) and makes the PAT >>> tracking simpler. >>> We are using PG_arch_1 bit in page->flags. >>> We still use the memtype list for non RAM pages. >>> >>> >>> I do wonder if that explosion is still an issue today. >> >> Yes it is. That is exactly the issue I'm working on here. >> >> It's just that AGP was replaced by internal GPU MMUs over time and so we don't use the old AGP code any more but just call get_free_pages() (or similar) directly. > > Okay, I thought I slowly understood how it works, then I stumbled over > the set_memory_uc / set_memory_wc implementation and now I am *all > confused*. > > I mean, that does perform a PAT reservation. > > But when is that reservation ever freed again? :/ Ah, set_memory_wb() does that. It just frees stuff. It should have been called something like "reset", probably. -- Cheers David / dhildenb ^ permalink raw reply [flat|nested] 50+ messages in thread
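[Note: the do/undo pairing David works out above, shown as a sketch; the error handling is illustrative. set_memory_wc() performs the PAT reservation and rewrites the direct map, set_memory_wb() releases ("resets") both again.]

	unsigned long addr = (unsigned long)page_address(page);

	if (set_memory_wc(addr, 1))	/* reserve memtype + direct map to WC */
		return -EIO;		/* illustrative */

	/* ... hand the page out via WC mappings ... */

	set_memory_wb(addr, 1);		/* frees the reservation, back to WB */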
* Re: 2025-08-26 9:56 ` Re: Christian König 2025-08-26 12:07 ` Re: David Hildenbrand @ 2025-08-26 14:27 ` Thomas Hellström 2025-08-28 21:01 ` stupid PAT :) David Hildenbrand 1 sibling, 1 reply; 50+ messages in thread From: Thomas Hellström @ 2025-08-26 14:27 UTC (permalink / raw) To: Christian König, David Hildenbrand, intel-xe, intel-gfx, dri-devel, amd-gfx, x86 Cc: airlied, matthew.brost, dave.hansen, luto, peterz, Lorenzo Stoakes Hi, Christian, On Tue, 2025-08-26 at 11:56 +0200, Christian König wrote: > On 26.08.25 11:17, David Hildenbrand wrote: > > On 26.08.25 11:00, Christian König wrote: > > > On 26.08.25 10:46, David Hildenbrand wrote: > > > > > > So my assumption would be that that is missing for the > > > > > > drivers here? > > > > > > > > > > Well yes and no. > > > > > > > > > > See the PAT is optimized for applying specific caching > > > > > attributes to ranges [A..B] (e.g. it uses an R/B tree). But > > > > > what drivers do here is that they have single pages (usually > > > > > for get_free_page or similar) and want to apply a certain > > > > > caching attribute to it. > > > > > > > > > > So what would happen is that we completely clutter the R/B > > > > > tree used by the PAT with thousands if not millions of > > > > > entries. > > > > > > > > > > > > > Hm, above you're saying that there is no direct map, but now > > > > you are saying that the pages were obtained through > > > > get_free_page()? > > > > > > The problem only happens with highmem pages on 32bit kernels. > > > Those pages are not in the linear mapping. > > > > Right, in the common case there is a direct map. > > > > > > > > > I agree that what you describe here sounds suboptimal. But if > > > > the pages where obtained from the buddy, there surely is a > > > > direct map -- unless we explicitly remove it :( > > > > > > > > If we're talking about individual pages without a directmap, I > > > > would wonder if they are actually part of a bigger memory > > > > region that can just be reserved in one go (similar to how > > > > remap_pfn_range()) would handle it. > > > > > > > > Can you briefly describe how your use case obtains these PFNs, > > > > and how scattered tehy + their caching attributes might be? > > > > > > What drivers do is to call get_free_page() or alloc_pages_node() > > > with the GFP_HIGHUSER flag set. > > > > > > For non highmem pages drivers then calls set_pages_wc/uc() which > > > changes the caching of the linear mapping, but for highmem pages > > > there is no linear mapping so set_pages_wc() or set_pages_uc() > > > doesn't work and drivers avoid calling it. > > > > > > Those are basically just random system memory pages. So they are > > > potentially scattered over the whole memory address space. > > > > Thanks, that's valuable information. > > > > So essentially these drivers maintain their own consistency and PAT > > is not aware of that. > > > > And the real problem is ordinary system RAM. > > > > There are various ways forward. > > > > 1) We use another interface that consumes pages instead of PFNs, > > like a > > vm_insert_pages_pgprot() we would be adding. > > > > Is there any strong requirement for inserting non-refcounted > > PFNs? > > Yes, there is a strong requirement to insert non-refcounted PFNs. > > We had a lot of trouble with KVM people trying to grab a reference to > those pages even if the VMA had the VM_PFNMAP flag set. 
> > > 2) We add another interface that consumes PFNs, but explicitly > > states > > that it is only for ordinary system RAM, and that the user is > > required for updating the direct map. > > > > We could sanity-check the direct map in debug kernels. > > I would rather like to see vmf_insert_pfn_prot() fixed instead. > > That function was explicitly added to insert the PFN with the given > attributes and as far as I can see all users of that function expect > exactly that. > > > > > 3) We teach PAT code in pfnmap_setup_cachemode_pfn() about treating > > this > > system RAM differently. > > > > > > There is also the option for a mixture between 1 and 2, where we > > get pages, but we map them non-refcounted in a VM_PFNMAP. > > > > In general, having pages makes it easier to assert that they are > > likely ordinary system ram pages, and that the interface is not > > getting abused for something else. > > Well, exactly that's the use case here and that is not abusive at all > as far as I can see. > > What drivers want is to insert a PFN with a certain set of caching > attributes regardless if it's system memory or iomem. That's why > vmf_insert_pfn_prot() was created in the first place. > > That drivers need to call set_pages_wc/uc() for the linear mapping on > x86 manually is correct and checking that is clearly a good idea for > debug kernels. So where is this trending? Is the current suggestion to continue disallowing aliased mappings with conflicting caching modes and enforce checks in debug kernels? /Thomas > > > We could also perform the set_pages_wc/uc() from inside that > > function, but maybe it depends on the use case whether we want to > > do that whenever we map them into a process? > > It sounds like a good idea in theory, but I think it is potentially > to much overhead to be applicable. > > Thanks, > Christian. ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: stupid PAT :) 2025-08-26 14:27 ` Thomas Hellström @ 2025-08-28 21:01 ` David Hildenbrand 0 siblings, 0 replies; 50+ messages in thread From: David Hildenbrand @ 2025-08-28 21:01 UTC (permalink / raw) To: Thomas Hellström, Christian König, intel-xe, intel-gfx, dri-devel, amd-gfx, x86 Cc: airlied, matthew.brost, dave.hansen, luto, peterz, Lorenzo Stoakes On 26.08.25 16:27, Thomas Hellström wrote: > Hi, Christian, > > On Tue, 2025-08-26 at 11:56 +0200, Christian König wrote: >> On 26.08.25 11:17, David Hildenbrand wrote: >>> On 26.08.25 11:00, Christian König wrote: >>>> On 26.08.25 10:46, David Hildenbrand wrote: >>>>>>> So my assumption would be that that is missing for the >>>>>>> drivers here? >>>>>> >>>>>> Well yes and no. >>>>>> >>>>>> See the PAT is optimized for applying specific caching >>>>>> attributes to ranges [A..B] (e.g. it uses an R/B tree). But >>>>>> what drivers do here is that they have single pages (usually >>>>>> for get_free_page or similar) and want to apply a certain >>>>>> caching attribute to it. >>>>>> >>>>>> So what would happen is that we completely clutter the R/B >>>>>> tree used by the PAT with thousands if not millions of >>>>>> entries. >>>>>> >>>>> >>>>> Hm, above you're saying that there is no direct map, but now >>>>> you are saying that the pages were obtained through >>>>> get_free_page()? >>>> >>>> The problem only happens with highmem pages on 32bit kernels. >>>> Those pages are not in the linear mapping. >>> >>> Right, in the common case there is a direct map. >>> >>>> >>>>> I agree that what you describe here sounds suboptimal. But if >>>>> the pages where obtained from the buddy, there surely is a >>>>> direct map -- unless we explicitly remove it :( >>>>> >>>>> If we're talking about individual pages without a directmap, I >>>>> would wonder if they are actually part of a bigger memory >>>>> region that can just be reserved in one go (similar to how >>>>> remap_pfn_range()) would handle it. >>>>> >>>>> Can you briefly describe how your use case obtains these PFNs, >>>>> and how scattered tehy + their caching attributes might be? >>>> >>>> What drivers do is to call get_free_page() or alloc_pages_node() >>>> with the GFP_HIGHUSER flag set. >>>> >>>> For non highmem pages drivers then calls set_pages_wc/uc() which >>>> changes the caching of the linear mapping, but for highmem pages >>>> there is no linear mapping so set_pages_wc() or set_pages_uc() >>>> doesn't work and drivers avoid calling it. >>>> >>>> Those are basically just random system memory pages. So they are >>>> potentially scattered over the whole memory address space. >>> >>> Thanks, that's valuable information. >>> >>> So essentially these drivers maintain their own consistency and PAT >>> is not aware of that. >>> >>> And the real problem is ordinary system RAM. >>> >>> There are various ways forward. >>> >>> 1) We use another interface that consumes pages instead of PFNs, >>> like a >>> vm_insert_pages_pgprot() we would be adding. >>> >>> Is there any strong requirement for inserting non-refcounted >>> PFNs? >> >> Yes, there is a strong requirement to insert non-refcounted PFNs. >> >> We had a lot of trouble with KVM people trying to grab a reference to >> those pages even if the VMA had the VM_PFNMAP flag set. >> >>> 2) We add another interface that consumes PFNs, but explicitly >>> states >>> that it is only for ordinary system RAM, and that the user is >>> required for updating the direct map. >>> >>> We could sanity-check the direct map in debug kernels. 
>> I would rather like to see vmf_insert_pfn_prot() fixed instead. >> >> That function was explicitly added to insert the PFN with the given >> attributes and as far as I can see all users of that function expect >> exactly that. >> >>> >>> 3) We teach PAT code in pfnmap_setup_cachemode_pfn() about treating >>> this >>> system RAM differently. >>> >>> >>> There is also the option for a mixture between 1 and 2, where we >>> get pages, but we map them non-refcounted in a VM_PFNMAP. >>> >>> In general, having pages makes it easier to assert that they are >>> likely ordinary system ram pages, and that the interface is not >>> getting abused for something else. >> >> Well, exactly that's the use case here and that is not abusive at all >> as far as I can see. >> >> What drivers want is to insert a PFN with a certain set of caching >> attributes regardless if it's system memory or iomem. That's why >> vmf_insert_pfn_prot() was created in the first place. >> >> That drivers need to call set_pages_wc/uc() for the linear mapping on >> x86 manually is correct and checking that is clearly a good idea for >> debug kernels. > > So where is this trending? Is the current suggestion to continue > disallowing aliased mappings with conflicting caching modes and enforce > checks in debug kernels? Not sure, it's a mess. The big question is to find out when it is really ok to bypass PAT and when to better let it have a say. -- Cheers David / dhildenb ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: 2025-08-26 8:38 ` Re: Christian König 2025-08-26 8:46 ` Re: David Hildenbrand @ 2025-08-26 12:37 ` David Hildenbrand 1 sibling, 0 replies; 50+ messages in thread From: David Hildenbrand @ 2025-08-26 12:37 UTC (permalink / raw) To: Christian König, intel-xe, intel-gfx, dri-devel, amd-gfx, x86 Cc: airlied, thomas.hellstrom, matthew.brost, dave.hansen, luto, peterz, Lorenzo Stoakes On 26.08.25 10:38, Christian König wrote: > On 25.08.25 21:10, David Hildenbrand wrote: >> On 21.08.25 10:10, Christian König wrote: >>> On 20.08.25 17:23, David Hildenbrand wrote: >>>> CCing Lorenzo >>>> >>>> On 20.08.25 16:33, Christian König wrote: >>>>> Hi everyone, >>>>> >>>>> sorry for CCing so many people, but that rabbit hole turned out to be >>>>> deeper than originally thought. >>>>> >>>>> TTM always had problems with UC/WC mappings on 32bit systems and drivers >>>>> often had to revert to hacks like using GFP_DMA32 to get things working >>>>> while having no rational explanation why that helped (see the TTM AGP, >>>>> radeon and nouveau driver code for that). >>>>> >>>>> It turned out that the PAT implementation we use on x86 not only enforces >>>>> the same caching attributes for pages in the linear kernel mapping, but >>>>> also for highmem pages through a separate R/B tree. >>>>> >>>>> That was unexpected and TTM never updated that R/B tree for highmem pages, >>>>> so the function pgprot_set_cachemode() just overwrote the caching >>>>> attributes drivers passed in to vmf_insert_pfn_prot() and that essentially >>>>> caused all kind of random trouble. >>>>> >>>>> An R/B tree is potentially not a good data structure to hold thousands if >>>>> not millions of different attributes for each page, so updating that is >>>>> probably not the way to solve this issue. >>>>> >>>>> Thomas pointed out that the i915 driver is using apply_page_range() >>>>> instead of vmf_insert_pfn_prot() to circumvent the PAT implementation and >>>>> just fill in the page tables with what the driver things is the right >>>>> caching attribute. >>>> >>>> I assume you mean apply_to_page_range() -- same issue in patch subjects. >>> >>> Oh yes, of course. Sorry. >>> >>>> Oh this sounds horrible. Why oh why do we have these hacks in core-mm and have drivers abuse them :( >>> >>> Yeah I was also a bit hesitated to use that, but the performance advantage is so high that we probably can't avoid the general approach. >>> >>>> Honestly, apply_to_pte_range() is just the entry in doing all kinds of weird crap to page tables because "you know better". >>> >>> Exactly that's the problem I'm pointing out, drivers *do* know it better. The core memory management has applied incorrect values which caused all kind of the trouble. >>> >>> The problem is not a bug in PAT nor TTM/drivers but rather how they interact with each other. >>> >>> What I don't understand is why do we have the PAT in the first place? No other architecture does it this way. >> >> Probably because no other architecture has these weird glitches I assume ... skimming over memtype_reserve() and friends there are quite some corner cases the code is handling (BIOS, ACPI, low ISA, system RAM, ...) >> >> >> I did a lot of work on the higher PAT level functions, but I am no expert on the lower level management functions, and in particular all the special cases with different memory types. >> >> IIRC, the goal of the PAT subsystem is to make sure that no two page tables map the same PFN with different caching attributes. > > Yeah, that actually makes sense. 
> Thomas from Intel recently explained the technical background to me: > > Some x86 CPUs write back cache lines even if they aren't dirty and what can happen is that because of the linear mapping the CPU speculatively loads a cache line which is elsewhere mapped uncached. > > So the end result is that the writeback of not dirty cache lines potentially corrupts the data in the otherwise uncached system memory. > > But that a) only applies to memory in the linear mapping and b) only to a handful of x86 CPU types (e.g. recently Intels Luna Lake, AMD Athlons produced before 2004, maybe others). > >> It treats ordinary system RAM (IORESOURCE_SYSTEM_RAM) usually in a special way: no special caching mode. >> >> For everything else, it expects that someone first reserves a memory range for a specific caching mode. >> >> For example, remap_pfn_range()...->pfnmap_track()->memtype_reserve() will make sure that there are no conflicts, to then call memtype_kernel_map_sync() to make sure the identity mapping is updated to the new type. >> >> In case someone ends up calling pfnmap_setup_cachemode(), the expectation is that there was a previous call to memtype_reserve_io() or similar, such that pfnmap_setup_cachemode() will find that caching mode. >> >> >> So my assumption would be that that is missing for the drivers here? > > Well yes and no. > > See the PAT is optimized for applying specific caching attributes to ranges [A..B] (e.g. it uses an R/B tree). But what drivers do here is that they have single pages (usually for get_free_page or similar) and want to apply a certain caching attribute to it. One clarification after staring at PAT code once again: for pages (RAM), the caching attribute is stored in the page flags, not in the R/B tree. If nothing was set, it defaults to _PAGE_CACHE_MODE_WB, AFAICS. -- Cheers David / dhildenb ^ permalink raw reply [flat|nested] 50+ messages in thread
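[Note: a small illustration of what "defaults to _PAGE_CACHE_MODE_WB" means for the fault path: a WC request for an untracked RAM pfn is silently rewritten, which is exactly the TTM behaviour this thread started with.]

	pgprot_t prot = pgprot_writecombine(vma->vm_page_prot); /* driver asks for WC */

	/* for a RAM pfn whose page flags were never set up (e.g. the highmem
	 * pages TTM never registered), the lookup yields WB and the cache
	 * bits in prot are overwritten before the PTE is built */
	pfnmap_setup_cachemode_pfn(pfn, &prot);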
* Re: your mail 2025-08-20 15:23 ` David Hildenbrand 2025-08-21 8:10 ` Re: Christian König @ 2025-08-21 9:16 ` Lorenzo Stoakes 2025-08-21 9:30 ` David Hildenbrand 1 sibling, 1 reply; 50+ messages in thread From: Lorenzo Stoakes @ 2025-08-21 9:16 UTC (permalink / raw) To: David Hildenbrand Cc: Christian König, intel-xe, intel-gfx, dri-devel, amd-gfx, x86, airlied, thomas.hellstrom, matthew.brost, dave.hansen, luto, peterz, Liam Howlett +cc Liam as he's also had some fun with PAT in the past. On Wed, Aug 20, 2025 at 05:23:07PM +0200, David Hildenbrand wrote: > CCing Lorenzo > > On 20.08.25 16:33, Christian König wrote: > > Hi everyone, > > > > sorry for CCing so many people, but that rabbit hole turned out to be > > deeper than originally thought. > > > > TTM always had problems with UC/WC mappings on 32bit systems and drivers > > often had to revert to hacks like using GFP_DMA32 to get things working > > while having no rational explanation why that helped (see the TTM AGP, > > radeon and nouveau driver code for that). > > > > It turned out that the PAT implementation we use on x86 not only enforces > > the same caching attributes for pages in the linear kernel mapping, but > > also for highmem pages through a separate R/B tree. Obviously this aspect is on the PAT guys. PAT has caused some concerns for us in mm before, cf. David's series at [0]. [0]:https://lore.kernel.org/linux-mm/20250512123424.637989-1-david@redhat.com/ > > > > That was unexpected and TTM never updated that R/B tree for highmem pages, > > so the function pgprot_set_cachemode() just overwrote the caching > > attributes drivers passed in to vmf_insert_pfn_prot() and that essentially > > caused all kind of random trouble. > > > > An R/B tree is potentially not a good data structure to hold thousands if > > not millions of different attributes for each page, so updating that is > > probably not the way to solve this issue. > > > > Thomas pointed out that the i915 driver is using apply_page_range() > > instead of vmf_insert_pfn_prot() to circumvent the PAT implementation and > > just fill in the page tables with what the driver things is the right > > caching attribute. > > I assume you mean apply_to_page_range() -- same issue in patch subjects. > > Oh this sounds horrible. Why oh why do we have these hacks in core-mm and > have drivers abuse them :( Yeah this is not intended behaviour and I actually think we should not permit this at all. In fact I think we should un-export this. I think the hold up with it is xen, as the only other users are arch code. Probably we need to find a new interface just for xen and provide that just for them... > > Honestly, apply_to_pte_range() is just the entry in doing all kinds of weird > crap to page tables because "you know better". Yes. This is just not permitted for general driver usage and is an abuse of the mm API really. Esp. when the underlying issue is not to do with core mm... > > All the sanity checks from vmf_insert_pfn(), gone. > > Can we please fix the underlying issue properly? Yes, PLEASE. > > -- > Cheers > > David / dhildenb > I will add this xen/apply_to_page_range() thing to my TODOs, which atm would involve changing these drivers to use vmf_insert_pfn_prot() instead. So ideally we'll have addressed the underlying issue before I get to this, because this really really shouldn't be something we allow drivers to use generally. Cheers, Lorenzo ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: your mail 2025-08-21 9:16 ` your mail Lorenzo Stoakes @ 2025-08-21 9:30 ` David Hildenbrand 2025-08-21 10:05 ` Lorenzo Stoakes 0 siblings, 1 reply; 50+ messages in thread From: David Hildenbrand @ 2025-08-21 9:30 UTC (permalink / raw) To: Lorenzo Stoakes Cc: Christian König, intel-xe, intel-gfx, dri-devel, amd-gfx, x86, airlied, thomas.hellstrom, matthew.brost, dave.hansen, luto, peterz, Liam Howlett > I will add this xen/apply_to_page_range() thing to my TODOs, which atm > would invovle changing these drivers to use vmf_insert_pfn_prot() instead. > Busy today (want to reply to Christian) but a) Re: performance, we would want something like vmf_insert_pfns_prot(), similar to vm_insert_pages(), to bulk-insert multiple PFNs. b) Re: PAT, we'll have to figure out why PAT information is wrong here (was there no previous PAT reservation from the driver?), but IF we really have to override, we'd want a way to tell vmf_insert_pfn_prot() to force the selected caching mode. -- Cheers David / dhildenb ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: your mail 2025-08-21 9:30 ` David Hildenbrand @ 2025-08-21 10:05 ` Lorenzo Stoakes 2025-08-21 10:16 ` David Hildenbrand 2025-08-25 18:35 ` Christian König 0 siblings, 2 replies; 50+ messages in thread From: Lorenzo Stoakes @ 2025-08-21 10:05 UTC (permalink / raw) To: David Hildenbrand Cc: Christian König, intel-xe, intel-gfx, dri-devel, amd-gfx, x86, airlied, thomas.hellstrom, matthew.brost, dave.hansen, luto, peterz, Liam Howlett On Thu, Aug 21, 2025 at 11:30:43AM +0200, David Hildenbrand wrote: > > I will add this xen/apply_to_page_range() thing to my TODOs, which atm > > would invovle changing these drivers to use vmf_insert_pfn_prot() instead. > > > > Busy today (want to reply to Christian) but > > a) Re: performance, we would want something like > vmf_insert_pfns_prot(), similar to vm_insert_pages(), to bulk-insert > multiple PFNs. > > b) Re: PAT, we'll have to figure out why PAT information is wrong here > (was there no previous PAT reservation from the driver?), but IF we > really have to override, we'd want a way to tell > vmf_insert_pfn_prot() to force the selected caching mode. > Ack, ok good that we have a feasible way forward. FYI, spoke to Peter off-list and he mentioned he had a more general series to get rid of this kind of [ab]use of apply_to_page_range() (see [0]), I gather he hasn't the time to resurrect but perhaps one of us can at some point? Perhaps we need a shorter term fix to _this_ issue (which involves not using this interface), and then follow it up with an adaptation of the below? Cheers, Lorenzo [0]:https://lore.kernel.org/all/20210412080012.357146277@infradead.org/ > -- > Cheers > > David / dhildenb ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: your mail 2025-08-21 10:05 ` Lorenzo Stoakes @ 2025-08-21 10:16 ` David Hildenbrand 2025-08-25 18:35 ` Christian König 1 sibling, 0 replies; 50+ messages in thread From: David Hildenbrand @ 2025-08-21 10:16 UTC (permalink / raw) To: Lorenzo Stoakes Cc: Christian König, intel-xe, intel-gfx, dri-devel, amd-gfx, x86, airlied, thomas.hellstrom, matthew.brost, dave.hansen, luto, peterz, Liam Howlett On 21.08.25 12:05, Lorenzo Stoakes wrote: > On Thu, Aug 21, 2025 at 11:30:43AM +0200, David Hildenbrand wrote: >>> I will add this xen/apply_to_page_range() thing to my TODOs, which atm >>> would invovle changing these drivers to use vmf_insert_pfn_prot() instead. >>> >> >> Busy today (want to reply to Christian) but >> >> a) Re: performance, we would want something like >> vmf_insert_pfns_prot(), similar to vm_insert_pages(), to bulk-insert >> multiple PFNs. >> >> b) Re: PAT, we'll have to figure out why PAT information is wrong here >> (was there no previous PAT reservation from the driver?), but IF we >> really have to override, we'd want a way to tell >> vmf_insert_pfn_prot() to force the selected caching mode. >> > > Ack, ok good that we have a feasible way forward. > > FYI, spoke to Peter off-list and he mentioned he had a more general series > to get rid of this kind of [ab]use of apply_to_page_range() (see [0]), I > gather he hasn't the time to resurrect but perhaps one of us can at some > point? > > Perhaps we need a shorter term fix to _this_ issue (which involves not > using this interface), and then follow it up with an adaptation of the > below? We need to understand why PAT would be wrong and why it would even be ok to ignore it. Not hacking around it. FWIW, I just recently documented: +/** + * pfnmap_setup_cachemode - setup the cachemode in the pgprot for a pfn range + * @pfn: the start of the pfn range + * @size: the size of the pfn range in bytes + * @prot: the pgprot to modify + * + * Lookup the cachemode for the pfn range starting at @pfn with the size + * @size and store it in @prot, leaving other data in @prot unchanged. + * + * This allows for a hardware implementation to have fine-grained control of + * memory cache behavior at page level granularity. Without a hardware + * implementation, this function does nothing. + * + * Currently there is only one implementation for this - x86 Page Attribute + * Table (PAT). See Documentation/arch/x86/pat.rst for more details. + * + * This function can fail if the pfn range spans pfns that require differing + * cachemodes. If the pfn range was previously verified to have a single + * cachemode, it is sufficient to query only a single pfn. The assumption is + * that this is the case for drivers using the vmf_insert_pfn*() interface. + * + * Returns 0 on success and -EINVAL on error. + */ +int pfnmap_setup_cachemode(unsigned long pfn, unsigned long size, + pgprot_t *prot); extern int track_pfn_copy(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma, unsigned long *pfn); extern void untrack_pfn_copy(struct vm_area_struct *dst_vma, @@ -1563,6 +1584,21 @@ extern void untrack_pfn(struct vm_area_struct *vma, unsigned long pfn, extern void untrack_pfn_clear(struct vm_area_struct *vma); #endif +/** + * pfnmap_setup_cachemode_pfn - setup the cachemode in the pgprot for a pfn + * @pfn: the pfn + * @prot: the pgprot to modify + * + * Lookup the cachemode for @pfn and store it in @prot, leaving other + * data in @prot unchanged. + * + * See pfnmap_setup_cachemode() for details. 
+ */ +static inline void pfnmap_setup_cachemode_pfn(unsigned long pfn, pgprot_t *prot) +{ + pfnmap_setup_cachemode(pfn, PAGE_SIZE, prot); +} There is certainly something missing here: the assumption is that the driver has previously verified that the cachemode is as expected. -- Cheers David / dhildenb ^ permalink raw reply [flat|nested] 50+ messages in thread
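[Note: simplified control flow of the insertion path as described in this thread, not verbatim mm/memory.c; the point is that the PAT lookup runs before the PTE is built, so the prot the driver passed in can be rewritten.]

	vm_fault_t vmf_insert_pfn_prot(struct vm_area_struct *vma,
				       unsigned long addr, unsigned long pfn,
				       pgprot_t pgprot)
	{
		/* PAT consultation: may replace the caching bits in pgprot,
		 * per the pfnmap_setup_cachemode() documentation quoted above */
		pfnmap_setup_cachemode_pfn(pfn, &pgprot);

		/* ... sanity checks and the actual PTE insertion follow ... */
		return insert_pfn(vma, addr, pfn, pgprot); /* heavily simplified */
	}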
* Re: your mail 2025-08-21 10:05 ` Lorenzo Stoakes 2025-08-21 10:16 ` David Hildenbrand @ 2025-08-25 18:35 ` Christian König 2025-08-25 19:20 ` David Hildenbrand 1 sibling, 1 reply; 50+ messages in thread From: Christian König @ 2025-08-25 18:35 UTC (permalink / raw) To: Lorenzo Stoakes, David Hildenbrand Cc: intel-xe, intel-gfx, dri-devel, amd-gfx, x86, airlied, thomas.hellstrom, matthew.brost, dave.hansen, luto, peterz, Liam Howlett On 21.08.25 12:05, Lorenzo Stoakes wrote: > On Thu, Aug 21, 2025 at 11:30:43AM +0200, David Hildenbrand wrote: >>> I will add this xen/apply_to_page_range() thing to my TODOs, which atm >>> would invovle changing these drivers to use vmf_insert_pfn_prot() instead. >>> >> >> Busy today (want to reply to Christian) but >> >> a) Re: performance, we would want something like >> vmf_insert_pfns_prot(), similar to vm_insert_pages(), to bulk-insert >> multiple PFNs. Yes, exactly that. Ideally something like an iterator/callback-like interface. I've seen at least four or five different representations of the PFNs in drivers. >> b) Re: PAT, we'll have to figure out why PAT information is wrong here >> (was there no previous PAT reservation from the driver?), but IF we >> really have to override, we'd want a way to tell >> vmf_insert_pfn_prot() to force the selected caching mode. >> Well the difference between vmf_insert_pfn() and vmf_insert_pfn_prot() is that the driver actually wants to specify the caching modes. That this is overridden by the PAT even for pages which are not part of the linear mapping is really surprising. As far as I can see there is no technical necessity for that. Even for pages in the linear mapping only a handful of x86 CPUs actually need that. See Intel's i915 GPU driver for reference. Intel has used that approach for ages and for AMD CPUs the only reference I could find where the kernel needs it is Athlons produced between 1996 and 2004. Maybe we should disable the PAT on CPUs which actually don't need it? > Ack, ok good that we have a feasible way forward. > > FYI, spoke to Peter off-list and he mentioned he had a more general series > to get rid of this kind of [ab]use of apply_to_page_range() (see [0]), I > gather he hasn't the time to resurrect but perhaps one of us can at some > point? > > Perhaps we need a shorter term fix to _this_ issue (which involves not > using this interface), and then follow it up with an adaptation of the > below? Sounds like a plan to me. Regards, Christian. > > Cheers, Lorenzo > > [0]:https://lore.kernel.org/all/20210412080012.357146277@infradead.org/ > > > >> -- >> Cheers >> >> David / dhildenb ^ permalink raw reply [flat|nested] 50+ messages in thread
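[Note: a hypothetical shape for the iterator/callback-style bulk interface discussed here -- none of these names exist in the kernel; they only illustrate how drivers with different PFN representations could feed one insertion loop.]

	/* driver supplies the next PFN for index i, whatever its internal
	 * representation (page array, sg table, resource cursor, ...) */
	typedef unsigned long (*vmf_pfn_next_t)(void *cookie, unsigned long i);

	vm_fault_t vmf_insert_pfns_prot(struct vm_area_struct *vma,
					unsigned long addr, unsigned long nr,
					vmf_pfn_next_t next, void *cookie,
					pgprot_t prot);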
* Re: your mail 2025-08-25 18:35 ` Christian König @ 2025-08-25 19:20 ` David Hildenbrand 0 siblings, 0 replies; 50+ messages in thread From: David Hildenbrand @ 2025-08-25 19:20 UTC (permalink / raw) To: Christian König, Lorenzo Stoakes Cc: intel-xe, intel-gfx, dri-devel, amd-gfx, x86, airlied, thomas.hellstrom, matthew.brost, dave.hansen, luto, peterz, Liam Howlett On 25.08.25 20:35, Christian König wrote: > On 21.08.25 12:05, Lorenzo Stoakes wrote: >> On Thu, Aug 21, 2025 at 11:30:43AM +0200, David Hildenbrand wrote: >>>> I will add this xen/apply_to_page_range() thing to my TODOs, which atm >>>> would invovle changing these drivers to use vmf_insert_pfn_prot() instead. >>>> >>> >>> Busy today (want to reply to Christian) but >>> >>> a) Re: performance, we would want something like >>> vmf_insert_pfns_prot(), similar to vm_insert_pages(), to bulk-insert >>> multiple PFNs. > > Yes, exactly that. Ideally something like an iterator/callback like interface. > > I've seen at least four or five different representations of the PFNs in drivers. > >>> b) Re: PAT, we'll have to figure out why PAT information is wrong here >>> (was there no previous PAT reservation from the driver?), but IF we >>> really have to override, we'd want a way to tell >>> vmf_insert_pfn_prot() to force the selected caching mode. >>> > > Well the difference between vmf_insert_pfn() and vmf_insert_pfn_prot() is that the driver actually want to specify the caching modes. Yes, it's all a mess. x86/PAT doesn't want inconsistencies, so it expects that a previous reservation would make sure that that caching mode is actually valid. > > That this is overridden by the PAT even for pages which are not part of the linear mapping is really surprising. Yes, IIUC, it expects an earlier reservation on PAT systems. > > As far as I can see there is no technical necessity for that. Even for pages in the linear mapping only a handful of x86 CPUs actually need that. See Intels i915 GPU driver for reference. > > Intel has used that approach for ages and for AMD CPUs the only reference I could find where the kernel needs it are Athlons produced between 1996 and 2004. > > Maybe we should disable the PAT on CPUs which actually don't need it? Not sure if that will solve our problems on systems that need it because of some devices. I guess the problem of pfnmap_setup_cachemode_pfn() is that there is no interface to undo it: pfnmap_track() is paired with pfnmap_untrack() such that it can simply do/undo the reservation itself. That's why pfnmap_setup_cachemode_pfn() leaves it up to the caller to ensure that a reservation was triggered earlier by other means -- one that can properly be undone. -- Cheers David / dhildenb ^ permalink raw reply [flat|nested] 50+ messages in thread
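[Note: the range-based do/undo pairing David contrasts with the single-pfn lookup, sketched from his description; the signatures are from memory and hedged, the error handling is illustrative.]

	pgprot_t prot = vma->vm_page_prot;

	if (pfnmap_track(pfn, size, &prot))	/* reserve memtype, adjust prot */
		return VM_FAULT_SIGBUS;		/* illustrative */

	/* ... map the range ... */

	pfnmap_untrack(pfn, size);		/* the explicit undo that the
						 * single-pfn lookup path lacks */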
* (no subject) @ 2025-01-08 13:59 Jiang Liu 2025-01-08 14:10 ` Christian König 2025-01-08 16:33 ` Re: Mario Limonciello 0 siblings, 2 replies; 50+ messages in thread From: Jiang Liu @ 2025-01-08 13:59 UTC (permalink / raw) To: alexander.deucher, christian.koenig, Xinhui.Pan, airlied, simona, sunil.khatri, lijo.lazar, Hawking.Zhang, mario.limonciello, Jun.Ma2, xiaogang.chen, Kent.Russell, shuox.liu, amd-gfx Cc: Jiang Liu Subject: [RFC PATCH 00/13] Enhance device state machine to better support suspend/resume While recently testing suspend/resume functionality with AMD GPUs, we encountered several resource tracking related bugs, such as double buffer free, use after free and unbalanced irq reference count. We have tried to solve these issues case by case, but found that this may not be the right way. Especially with the unbalanced irq reference count, new issues kept appearing once we had fixed the currently known ones. After analyzing related source code, we found that there may be some fundamental implementaion flaws behind these resource tracking issues. The amdgpu driver has two major state machines to drive the device management flow, one is for ip blocks, the other is for ras blocks. The hook points defined in struct amd_ip_funcs for device setup/teardown are symmetric, but the implementation is asymmetric, sometimes even ambiguous. The most obvious two issues we noticed are: 1) amdgpu_irq_get() is called from .late_init() but amdgpu_irq_put() is called from .hw_fini() instead of .early_fini(). 2) the way to reset ip_block.status.valid/sw/hw/late_initialized doesn't match the way to set those flags. When taking device suspend/resume into account, in addition to device probe/remove, things get much more complex. Some issues arise because many suspend/resume implementations directly reuse .hw_init/.hw_fini/ .late_init hook points. So we try to fix those issues by two enhancements/refinements to the current device management state machines. The first change is to make the ip block state machine and associated status flags work in a stack-like way as below: Callback Status Flags early_init: valid = true sw_init: sw = true hw_init: hw = true late_init: late_initialized = true early_fini: late_initialized = false hw_fini: hw = false sw_fini: sw = false late_fini: valid = false Also do the same thing for the ras block state machine, though it's much simpler. The second change is to fine-tune the overall device management workflow as below: 1. amdgpu_driver_load_kms() amdgpu_device_init() amdgpu_device_ip_early_init() ip_blocks[i].early_init() ip_blocks[i].status.valid = true amdgpu_device_ip_init() amdgpu_ras_init() ip_blocks[i].sw_init() ip_blocks[i].status.sw = true ip_blocks[i].hw_init() ip_blocks[i].status.hw = true amdgpu_device_ip_late_init() ip_blocks[i].late_init() ip_blocks[i].status.late_initialized = true amdgpu_ras_late_init() ras_blocks[i].ras_late_init() amdgpu_ras_feature_enable_on_boot() 2. amdgpu_pmops_suspend()/amdgpu_pmops_freeze()/amdgpu_pmops_poweroff() amdgpu_device_suspend() amdgpu_ras_early_fini() ras_blocks[i].ras_early_fini() amdgpu_ras_feature_disable() amdgpu_ras_suspend() amdgpu_ras_disable_all_features() +++ ip_blocks[i].early_fini() +++ ip_blocks[i].status.late_initialized = false ip_blocks[i].suspend() 3. 
amdgpu_pmops_resume()/amdgpu_pmops_thaw()/amdgpu_pmops_restore() amdgpu_device_resume() amdgpu_device_ip_resume() ip_blocks[i].resume() amdgpu_device_ip_late_init() ip_blocks[i].late_init() ip_blocks[i].status.late_initialized = true amdgpu_ras_late_init() ras_blocks[i].ras_late_init() amdgpu_ras_feature_enable_on_boot() amdgpu_ras_resume() amdgpu_ras_enable_all_features() 4. amdgpu_driver_unload_kms() amdgpu_device_fini_hw() amdgpu_ras_early_fini() ras_blocks[i].ras_early_fini() +++ ip_blocks[i].early_fini() +++ ip_blocks[i].status.late_initialized = false ip_blocks[i].hw_fini() ip_blocks[i].status.hw = false 5. amdgpu_driver_release_kms() amdgpu_device_fini_sw() amdgpu_device_ip_fini() ip_blocks[i].sw_fini() ip_blocks[i].status.sw = false --- ip_blocks[i].status.valid = false +++ amdgpu_ras_fini() ip_blocks[i].late_fini() +++ ip_blocks[i].status.valid = false --- ip_blocks[i].status.late_initialized = false --- amdgpu_ras_fini() The main changes include: 1) invoke ip_blocks[i].early_fini in amdgpu_pmops_suspend(). Currently there's only one ip block which provides an `early_fini` callback. We have added a check of `in_s3` to keep the current behavior in function amdgpu_dm_early_fini(), so there should be no functional changes. 2) set ip_blocks[i].status.late_initialized to false after calling callback `early_fini`. We have audited all usages of the late_initialized flag and found no functional changes. 3) only set ip_blocks[i].status.valid = false after calling the `late_fini` callback. 4) call amdgpu_ras_fini() before invoking ip_blocks[i].late_fini. Then we try to refine each subsystem, such as nbio, asic, gfx, gmc, ras etc, to follow the new design. Currently we have only taken the nbio and asic as examples to show the proposed changes. Once we have confirmed that's the right way to go, we will handle the remaining subsystems. This is at an early stage and we are requesting comments; any comments and suggestions are welcome! 
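As an illustration of the stack-like ordering described above, a minimal kernel-style sketch; the names below are invented for the example and are not the actual amdgpu structures:

```c
/*
 * Illustrative only: a stack-like block state machine. Each setup
 * stage pushes one state; teardown must pop in exactly the reverse
 * order, which makes unbalanced setup/teardown easy to detect.
 */
enum ip_block_state {
	IP_STATE_INVALID = 0,
	IP_STATE_EARLY_INITED,	/* early_init() done */
	IP_STATE_SW_INITED,	/* sw_init() done */
	IP_STATE_HW_INITED,	/* hw_init() done */
	IP_STATE_LATE_INITED,	/* late_init() done */
};

struct ip_block_sm {
	enum ip_block_state state;
};

/* Setup direction: only one step up at a time. */
static int ip_block_advance(struct ip_block_sm *blk,
			    enum ip_block_state to)
{
	if (to != blk->state + 1)
		return -EINVAL;
	blk->state = to;
	return 0;
}

/* Teardown direction: mirrors setup, one step down at a time. */
static int ip_block_retreat(struct ip_block_sm *blk,
			    enum ip_block_state to)
{
	if (to != blk->state - 1)
		return -EINVAL;
	blk->state = to;
	return 0;
}
```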
Jiang Liu (13): amdgpu: wrong array index to get ip block for PSP drm/admgpu: add helper functions to track status for ras manager drm/amdgpu: add a flag to track ras debugfs creation status drm/amdgpu: free all resources on error recovery path of amdgpu_ras_init() drm/amdgpu: introduce a flag to track refcount held for features drm/amdgpu: enhance amdgpu_ras_block_late_fini() drm/amdgpu: enhance amdgpu_ras_pre_fini() to better support SR drm/admgpu: rename amdgpu_ras_pre_fini() to amdgpu_ras_early_fini() drm/amdgpu: make IP block state machine works in stack like way drm/admgpu: make device state machine work in stack like way drm/amdgpu/sdma: improve the way to manage irq reference count drm/amdgpu/nbio: improve the way to manage irq reference count drm/amdgpu/asic: make ip block operations symmetric by .early_fini() drivers/gpu/drm/amd/amdgpu/amdgpu.h | 40 +++++ drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 37 ++++- drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_jpeg.c | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_nbio.c | 16 +- drivers/gpu/drm/amd/amdgpu/amdgpu_nbio.h | 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 8 +- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 144 +++++++++++++----- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 16 +- drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c | 26 +++- drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.h | 2 + drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 2 +- drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c | 2 +- drivers/gpu/drm/amd/amdgpu/mmhub_v1_8.c | 2 +- drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c | 1 + drivers/gpu/drm/amd/amdgpu/nbio_v7_9.c | 1 + drivers/gpu/drm/amd/amdgpu/nv.c | 14 +- drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 8 - drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c | 23 +-- drivers/gpu/drm/amd/amdgpu/soc15.c | 38 ++--- drivers/gpu/drm/amd/amdgpu/soc21.c | 35 +++-- drivers/gpu/drm/amd/amdgpu/soc24.c | 17 ++- .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 3 + 25 files changed, 326 insertions(+), 118 deletions(-) -- 2.43.5 ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: 2025-01-08 13:59 Jiang Liu @ 2025-01-08 14:10 ` Christian König 2025-01-08 16:33 ` Re: Mario Limonciello 1 sibling, 0 replies; 50+ messages in thread From: Christian König @ 2025-01-08 14:10 UTC (permalink / raw) To: Jiang Liu, alexander.deucher, Xinhui.Pan, airlied, simona, sunil.khatri, lijo.lazar, Hawking.Zhang, mario.limonciello, Jun.Ma2, xiaogang.chen, Kent.Russell, shuox.liu, amd-gfx On 08.01.25 at 14:59, Jiang Liu wrote: > Subject: [RFC PATCH 00/13] Enhance device state machine to better support suspend/resume > > Recently we were testing suspend/resume functionality with AMD GPUs, > we have encountered several resource tracking related bugs, such as > double buffer free, use after free and unbalanced irq reference count. > > We have tried to solve these issues case by case, but found that may > not be the right way. Especially about the unbalanced irq reference > count, there will be new issues appear once we fixed the current known > issues. After analyzing related source code, we found that there may be > some fundamental implementaion flaws behind these resource tracking > issues. In general please run your patches through checkpatch.pl. There are quite a number of style issues with those code changes. > > The amdgpu driver has two major state machines to driver the device > management flow, one is for ip blocks, the other is for ras blocks. > The hook points defined in struct amd_ip_funcs for device setup/teardown > are symmetric, but the implementation is asymmetric, sometime even > ambiguous. The most obvious two issues we noticed are: > 1) amdgpu_irq_get() are called from .late_init() but amdgpu_irq_put() > are called from .hw_fini() instead of .early_fini(). Yes, and if I remember correctly that is absolutely intentional. IRQs can't be enabled unless all IP blocks are up and running because otherwise the IRQ handler sometimes doesn't have the necessary functionality at hand. But for HW fini we only disable IRQs before we actually tear down the HW state because we need them for operation feedback. E.g. ring buffer completion interrupts for teardown commands. Regards, Christian. > 2) the way to reset ip_bloc.status.valid/sw/hw/late_initialized doesn't > match the way to set those flags. > > When taking device suspend/resume into account, in addition to device > probe/remove, things get much more complex. Some issues arise because > many suspend/resume implementations directly reuse .hw_init/.hw_fini/ > .late_init hook points. > > So we try to fix those issues by two enhancements/refinements to current > device management state machines. > > The first change is to make the ip block state machine and associated > status flags work in stack-like way as below: > Callback Status Flags > early_init: valid = true > sw_init: sw = true > hw_init: hw = true > late_init: late_initialized = true > early_fini: late_initialized = false > hw_fini: hw = false > sw_fini: sw = false > late_fini: valid = false > > Also do the same thing for ras block state machine, though it's much > more simpler. > > The second change is fine tune the overall device management work > flow as below: > 1. 
amdgpu_driver_load_kms() > amdgpu_device_init() > amdgpu_device_ip_early_init() > ip_blocks[i].early_init() > ip_blocks[i].status.valid = true > amdgpu_device_ip_init() > amdgpu_ras_init() > ip_blocks[i].sw_init() > ip_blocks[i].status.sw = true > ip_blocks[i].hw_init() > ip_blocks[i].status.hw = true > amdgpu_device_ip_late_init() > ip_blocks[i].late_init() > ip_blocks[i].status.late_initialized = true > amdgpu_ras_late_init() > ras_blocks[i].ras_late_init() > amdgpu_ras_feature_enable_on_boot() > > 2. amdgpu_pmops_suspend()/amdgpu_pmops_freeze()/amdgpu_pmops_poweroff() > amdgpu_device_suspend() > amdgpu_ras_early_fini() > ras_blocks[i].ras_early_fini() > amdgpu_ras_feature_disable() > amdgpu_ras_suspend() > amdgpu_ras_disable_all_features() > +++ ip_blocks[i].early_fini() > +++ ip_blocks[i].status.late_initialized = false > ip_blocks[i].suspend() > > 3. amdgpu_pmops_resume()/amdgpu_pmops_thaw()/amdgpu_pmops_restore() > amdgpu_device_resume() > amdgpu_device_ip_resume() > ip_blocks[i].resume() > amdgpu_device_ip_late_init() > ip_blocks[i].late_init() > ip_blocks[i].status.late_initialized = true > amdgpu_ras_late_init() > ras_blocks[i].ras_late_init() > amdgpu_ras_feature_enable_on_boot() > amdgpu_ras_resume() > amdgpu_ras_enable_all_features() > > 4. amdgpu_driver_unload_kms() > amdgpu_device_fini_hw() > amdgpu_ras_early_fini() > ras_blocks[i].ras_early_fini() > +++ ip_blocks[i].early_fini() > +++ ip_blocks[i].status.late_initialized = false > ip_blocks[i].hw_fini() > ip_blocks[i].status.hw = false > > 5. amdgpu_driver_release_kms() > amdgpu_device_fini_sw() > amdgpu_device_ip_fini() > ip_blocks[i].sw_fini() > ip_blocks[i].status.sw = false > --- ip_blocks[i].status.valid = false > +++ amdgpu_ras_fini() > ip_blocks[i].late_fini() > +++ ip_blocks[i].status.valid = false > --- ip_blocks[i].status.late_initialized = false > --- amdgpu_ras_fini() > > The main changes include: > 1) invoke ip_blocks[i].early_fini in amdgpu_pmops_suspend(). > Currently there's only one ip block which provides `early_fini` > callback. We have add a check of `in_s3` to keep current behavior in > function amdgpu_dm_early_fini(). So there should be no functional > changes. > 2) set ip_blocks[i].status.late_initialized to false after calling > callback `early_fini`. We have auditted all usages of the > late_initialized flag and no functional changes found. > 3) only set ip_blocks[i].status.valid = false after calling the > `late_fini` callback. > 4) call amdgpu_ras_fini() before invoking ip_blocks[i].late_fini. > > Then we try to refine each subsystem, such as nbio, asic, gfx, gmc, > ras etc, to follow the new design. Currently we have only taken the > nbio and asic as examples to show the proposed changes. Once we have > confirmed that's the right way to go, we will handle the lefting > subsystems. > > This is in early stage and requesting for comments, any comments and > suggestions are welcomed! 
> Jiang Liu (13): > amdgpu: wrong array index to get ip block for PSP > drm/admgpu: add helper functions to track status for ras manager > drm/amdgpu: add a flag to track ras debugfs creation status > drm/amdgpu: free all resources on error recovery path of > amdgpu_ras_init() > drm/amdgpu: introduce a flag to track refcount held for features > drm/amdgpu: enhance amdgpu_ras_block_late_fini() > drm/amdgpu: enhance amdgpu_ras_pre_fini() to better support SR > drm/admgpu: rename amdgpu_ras_pre_fini() to amdgpu_ras_early_fini() > drm/amdgpu: make IP block state machine works in stack like way > drm/admgpu: make device state machine work in stack like way > drm/amdgpu/sdma: improve the way to manage irq reference count > drm/amdgpu/nbio: improve the way to manage irq reference count > drm/amdgpu/asic: make ip block operations symmetric by .early_fini() > > drivers/gpu/drm/amd/amdgpu/amdgpu.h | 40 +++++ > drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 37 ++++- > drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 2 +- > drivers/gpu/drm/amd/amdgpu/amdgpu_jpeg.c | 2 +- > drivers/gpu/drm/amd/amdgpu/amdgpu_nbio.c | 16 +- > drivers/gpu/drm/amd/amdgpu/amdgpu_nbio.h | 1 + > drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 8 +- > drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 144 +++++++++++++----- > drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 16 +- > drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c | 26 +++- > drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.h | 2 + > drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c | 2 +- > drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 2 +- > drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 2 +- > drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c | 2 +- > drivers/gpu/drm/amd/amdgpu/mmhub_v1_8.c | 2 +- > drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c | 1 + > drivers/gpu/drm/amd/amdgpu/nbio_v7_9.c | 1 + > drivers/gpu/drm/amd/amdgpu/nv.c | 14 +- > drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 8 - > drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c | 23 +-- > drivers/gpu/drm/amd/amdgpu/soc15.c | 38 ++--- > drivers/gpu/drm/amd/amdgpu/soc21.c | 35 +++-- > drivers/gpu/drm/amd/amdgpu/soc24.c | 17 ++- > .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 3 + > 25 files changed, 326 insertions(+), 118 deletions(-) > ^ permalink raw reply [flat|nested] 50+ messages in thread
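A minimal sketch of the IRQ refcount balance Christian describes in his reply above; the names are illustrative (`foo_irq` is not a real amdgpu field, and the hook signatures have changed across kernel versions), so this is a pattern sketch rather than actual driver code:

```c
/*
 * Illustrative only: "foo" is a made-up IP block using the classic
 * void *handle hook style. The point is the balance: the reference
 * taken in .late_init() is dropped exactly once in .hw_fini().
 */
static int foo_late_init(void *handle)
{
	struct amdgpu_device *adev = handle;

	/* Taken last on the way up: every block is initialized, so the
	 * IRQ handler has all the functionality it may need at hand. */
	return amdgpu_irq_get(adev, &adev->foo_irq, 0);
}

static int foo_hw_fini(void *handle)
{
	struct amdgpu_device *adev = handle;

	/* Dropped in .hw_fini() rather than .early_fini(): interrupts
	 * stay enabled until HW teardown actually starts, so teardown
	 * commands can still receive completion interrupts. */
	amdgpu_irq_put(adev, &adev->foo_irq, 0);
	return 0;
}
```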
* Re: 2025-01-08 13:59 Jiang Liu 2025-01-08 14:10 ` Christian König @ 2025-01-08 16:33 ` Mario Limonciello 2025-01-09 5:34 ` Re: Gerry Liu 1 sibling, 1 reply; 50+ messages in thread From: Mario Limonciello @ 2025-01-08 16:33 UTC (permalink / raw) To: Jiang Liu, alexander.deucher, christian.koenig, Xinhui.Pan, airlied, simona, sunil.khatri, lijo.lazar, Hawking.Zhang, Jun.Ma2, xiaogang.chen, Kent.Russell, shuox.liu, amd-gfx On 1/8/2025 07:59, Jiang Liu wrote: > Subject: [RFC PATCH 00/13] Enhance device state machine to better support suspend/resume I'm not sure how this happened, but your subject didn't end up in the subject of the thread on patch 0, so the thread just looks like an unsubjected thread. > > Recently we were testing suspend/resume functionality with AMD GPUs, > we have encountered several resource tracking related bugs, such as > double buffer free, use after free and unbalanced irq reference count. Can you share more about how you were hitting these issues? Are they specific to S3 or to s2idle flows? dGPU or APU? Are they only with SRIOV? Is there anything to do with the host influencing the failures to happen, or are you contriving the failures to find the bugs? I know we've had some reports about resource tracking warnings on the reset flows, but I haven't heard much about suspend/resume. > > We have tried to solve these issues case by case, but found that may > not be the right way. Especially about the unbalanced irq reference > count, there will be new issues appear once we fixed the current known > issues. After analyzing related source code, we found that there may be > some fundamental implementaion flaws behind these resource tracking implementation > issues. > > The amdgpu driver has two major state machines to driver the device > management flow, one is for ip blocks, the other is for ras blocks. > The hook points defined in struct amd_ip_funcs for device setup/teardown > are symmetric, but the implementation is asymmetric, sometime even > ambiguous. The most obvious two issues we noticed are: > 1) amdgpu_irq_get() are called from .late_init() but amdgpu_irq_put() > are called from .hw_fini() instead of .early_fini(). > 2) the way to reset ip_bloc.status.valid/sw/hw/late_initialized doesn't > match the way to set those flags. > > When taking device suspend/resume into account, in addition to device > probe/remove, things get much more complex. Some issues arise because > many suspend/resume implementations directly reuse .hw_init/.hw_fini/ > .late_init hook points. > > So we try to fix those issues by two enhancements/refinements to current > device management state machines. > > The first change is to make the ip block state machine and associated > status flags work in stack-like way as below: > Callback Status Flags > early_init: valid = true > sw_init: sw = true > hw_init: hw = true > late_init: late_initialized = true > early_fini: late_initialized = false > hw_fini: hw = false > sw_fini: sw = false > late_fini: valid = false At a high level this makes sense to me, but I'd just call 'late' or 'late_init'. Another idea, if you make it stack-like, is to do it as a true enum for the state machine and store it all in one variable. > > Also do the same thing for ras block state machine, though it's much > more simpler. > > The second change is fine tune the overall device management work > flow as below: > 1. 
amdgpu_driver_load_kms() > amdgpu_device_init() > amdgpu_device_ip_early_init() > ip_blocks[i].early_init() > ip_blocks[i].status.valid = true > amdgpu_device_ip_init() > amdgpu_ras_init() > ip_blocks[i].sw_init() > ip_blocks[i].status.sw = true > ip_blocks[i].hw_init() > ip_blocks[i].status.hw = true > amdgpu_device_ip_late_init() > ip_blocks[i].late_init() > ip_blocks[i].status.late_initialized = true > amdgpu_ras_late_init() > ras_blocks[i].ras_late_init() > amdgpu_ras_feature_enable_on_boot() > > 2. amdgpu_pmops_suspend()/amdgpu_pmops_freeze()/amdgpu_pmops_poweroff() > amdgpu_device_suspend() > amdgpu_ras_early_fini() > ras_blocks[i].ras_early_fini() > amdgpu_ras_feature_disable() > amdgpu_ras_suspend() > amdgpu_ras_disable_all_features() > +++ ip_blocks[i].early_fini() > +++ ip_blocks[i].status.late_initialized = false > ip_blocks[i].suspend() > > 3. amdgpu_pmops_resume()/amdgpu_pmops_thaw()/amdgpu_pmops_restore() > amdgpu_device_resume() > amdgpu_device_ip_resume() > ip_blocks[i].resume() > amdgpu_device_ip_late_init() > ip_blocks[i].late_init() > ip_blocks[i].status.late_initialized = true > amdgpu_ras_late_init() > ras_blocks[i].ras_late_init() > amdgpu_ras_feature_enable_on_boot() > amdgpu_ras_resume() > amdgpu_ras_enable_all_features() > > 4. amdgpu_driver_unload_kms() > amdgpu_device_fini_hw() > amdgpu_ras_early_fini() > ras_blocks[i].ras_early_fini() > +++ ip_blocks[i].early_fini() > +++ ip_blocks[i].status.late_initialized = false > ip_blocks[i].hw_fini() > ip_blocks[i].status.hw = false > > 5. amdgpu_driver_release_kms() > amdgpu_device_fini_sw() > amdgpu_device_ip_fini() > ip_blocks[i].sw_fini() > ip_blocks[i].status.sw = false > --- ip_blocks[i].status.valid = false > +++ amdgpu_ras_fini() > ip_blocks[i].late_fini() > +++ ip_blocks[i].status.valid = false > --- ip_blocks[i].status.late_initialized = false > --- amdgpu_ras_fini() > > The main changes include: > 1) invoke ip_blocks[i].early_fini in amdgpu_pmops_suspend(). > Currently there's only one ip block which provides `early_fini` > callback. We have add a check of `in_s3` to keep current behavior in > function amdgpu_dm_early_fini(). So there should be no functional > changes. > 2) set ip_blocks[i].status.late_initialized to false after calling > callback `early_fini`. We have auditted all usages of the > late_initialized flag and no functional changes found. > 3) only set ip_blocks[i].status.valid = false after calling the > `late_fini` callback. > 4) call amdgpu_ras_fini() before invoking ip_blocks[i].late_fini. > > Then we try to refine each subsystem, such as nbio, asic, gfx, gmc, > ras etc, to follow the new design. Currently we have only taken the > nbio and asic as examples to show the proposed changes. Once we have > confirmed that's the right way to go, we will handle the lefting > subsystems. > > This is in early stage and requesting for comments, any comments and > suggestions are welcomed! 
> Jiang Liu (13): > amdgpu: wrong array index to get ip block for PSP > drm/admgpu: add helper functions to track status for ras manager > drm/amdgpu: add a flag to track ras debugfs creation status > drm/amdgpu: free all resources on error recovery path of > amdgpu_ras_init() > drm/amdgpu: introduce a flag to track refcount held for features > drm/amdgpu: enhance amdgpu_ras_block_late_fini() > drm/amdgpu: enhance amdgpu_ras_pre_fini() to better support SR > drm/admgpu: rename amdgpu_ras_pre_fini() to amdgpu_ras_early_fini() > drm/amdgpu: make IP block state machine works in stack like way > drm/admgpu: make device state machine work in stack like way > drm/amdgpu/sdma: improve the way to manage irq reference count > drm/amdgpu/nbio: improve the way to manage irq reference count > drm/amdgpu/asic: make ip block operations symmetric by .early_fini() > > drivers/gpu/drm/amd/amdgpu/amdgpu.h | 40 +++++ > drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 37 ++++- > drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 2 +- > drivers/gpu/drm/amd/amdgpu/amdgpu_jpeg.c | 2 +- > drivers/gpu/drm/amd/amdgpu/amdgpu_nbio.c | 16 +- > drivers/gpu/drm/amd/amdgpu/amdgpu_nbio.h | 1 + > drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 8 +- > drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 144 +++++++++++++----- > drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 16 +- > drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c | 26 +++- > drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.h | 2 + > drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c | 2 +- > drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 2 +- > drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 2 +- > drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c | 2 +- > drivers/gpu/drm/amd/amdgpu/mmhub_v1_8.c | 2 +- > drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c | 1 + > drivers/gpu/drm/amd/amdgpu/nbio_v7_9.c | 1 + > drivers/gpu/drm/amd/amdgpu/nv.c | 14 +- > drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 8 - > drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c | 23 +-- > drivers/gpu/drm/amd/amdgpu/soc15.c | 38 ++--- > drivers/gpu/drm/amd/amdgpu/soc21.c | 35 +++-- > drivers/gpu/drm/amd/amdgpu/soc24.c | 17 ++- > .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 3 + > 25 files changed, 326 insertions(+), 118 deletions(-) > ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: 2025-01-08 16:33 ` Re: Mario Limonciello @ 2025-01-09 5:34 ` Gerry Liu 2025-01-09 17:10 ` Re: Mario Limonciello 0 siblings, 1 reply; 50+ messages in thread From: Gerry Liu @ 2025-01-09 5:34 UTC (permalink / raw) To: Mario Limonciello Cc: alexander.deucher, christian.koenig, Xinhui.Pan, airlied, simona, sunil.khatri, Lazar, Lijo, Hawking.Zhang, Chen, Xiaogang, Kent.Russell, Shuo Liu, amd-gfx [-- Attachment #1: Type: text/plain, Size: 10505 bytes --] > On 2025-01-09 00:33, Mario Limonciello <mario.limonciello@amd.com> wrote: > > On 1/8/2025 07:59, Jiang Liu wrote: >> Subject: [RFC PATCH 00/13] Enhance device state machine to better support suspend/resume > > I'm not sure how this happened, but your subject didn't end up in the subject of the thread on patch 0 so the thread just looks like an unsubjected thread. Maybe it’s caused by one extra blank line in the header. > >> Recently we were testing suspend/resume functionality with AMD GPUs, >> we have encountered several resource tracking related bugs, such as >> double buffer free, use after free and unbalanced irq reference count. > > Can you share more aobut how you were hitting these issues? Are they specific to S3 or to s2idle flows? dGPU or APU? > Are they only with SRIOV? > > Is there anything to do with the host influencing the failures to happen, or are you contriving the failures to find the bugs? > > I know we've had some reports about resource tracking warnings on the reset flows, but I haven't heard much about suspend/resume. We are investigating the development of some advanced product features based on amdgpu suspend/resume. So we started by testing the suspend/resume functionality of AMD 308x GPUs with the following simple script: ``` echo platform > /sys/power/pm_test i=0 while true; do echo mem > /sys/power/state let i=i+1 echo $i sleep 1 done ``` It succeeds with the first and second iteration but always fails on the following iterations on a bare metal server with eight MI308X GPUs. With some investigation we found that the gpu asic should be reset during the test, so we submitted a patch to fix the failure (https://github.com/ROCm/ROCK-Kernel-Driver/pull/181) While analyzing and root-causing the failure, we encountered several crashes, resource leaks and false alarms. So I have worked out patch sets to solve the issues we encountered. The other patch set is https://lists.freedesktop.org/archives/amd-gfx/2025-January/118484.html With sriov in single VF mode, resume always fails. It seems some contexts/vram buffers get lost during suspend and haven’t been restored on resume, which causes the failure. We haven’t tested sriov in multiple VFs mode yet. We need more help from the AMD side to make SR work for SRIOV:) > >> We have tried to solve these issues case by case, but found that may >> not be the right way. Especially about the unbalanced irq reference >> count, there will be new issues appear once we fixed the current known >> issues. After analyzing related source code, we found that there may be >> some fundamental implementaion flaws behind these resource tracking > > implementation > >> issues. >> The amdgpu driver has two major state machines to driver the device >> management flow, one is for ip blocks, the other is for ras blocks. >> The hook points defined in struct amd_ip_funcs for device setup/teardown >> are symmetric, but the implementation is asymmetric, sometime even >> ambiguous. 
The most obvious two issues we noticed are: >> 1) amdgpu_irq_get() are called from .late_init() but amdgpu_irq_put() >> are called from .hw_fini() instead of .early_fini(). >> 2) the way to reset ip_bloc.status.valid/sw/hw/late_initialized doesn't >> match the way to set those flags. >> When taking device suspend/resume into account, in addition to device >> probe/remove, things get much more complex. Some issues arise because >> many suspend/resume implementations directly reuse .hw_init/.hw_fini/ >> .late_init hook points. >> >> So we try to fix those issues by two enhancements/refinements to current >> device management state machines. >> The first change is to make the ip block state machine and associated >> status flags work in stack-like way as below: >> Callback Status Flags >> early_init: valid = true >> sw_init: sw = true >> hw_init: hw = true >> late_init: late_initialized = true >> early_fini: late_initialized = false >> hw_fini: hw = false >> sw_fini: sw = false >> late_fini: valid = false > > At a high level this makes sense to me, but I'd just call 'late' or 'late_init'. > > Another idea if you make it stack like is to do it as a true enum for the state machine and store it all in one variable. I will add a patch to convert those bool flags into an enum. Thanks, Gerry > >> Also do the same thing for ras block state machine, though it's much >> more simpler. >> The second change is fine tune the overall device management work >> flow as below: >> 1. amdgpu_driver_load_kms() >> amdgpu_device_init() >> amdgpu_device_ip_early_init() >> ip_blocks[i].early_init() >> ip_blocks[i].status.valid = true >> amdgpu_device_ip_init() >> amdgpu_ras_init() >> ip_blocks[i].sw_init() >> ip_blocks[i].status.sw = true >> ip_blocks[i].hw_init() >> ip_blocks[i].status.hw = true >> amdgpu_device_ip_late_init() >> ip_blocks[i].late_init() >> ip_blocks[i].status.late_initialized = true >> amdgpu_ras_late_init() >> ras_blocks[i].ras_late_init() >> amdgpu_ras_feature_enable_on_boot() >> 2. amdgpu_pmops_suspend()/amdgpu_pmops_freeze()/amdgpu_pmops_poweroff() >> amdgpu_device_suspend() >> amdgpu_ras_early_fini() >> ras_blocks[i].ras_early_fini() >> amdgpu_ras_feature_disable() >> amdgpu_ras_suspend() >> amdgpu_ras_disable_all_features() >> +++ ip_blocks[i].early_fini() >> +++ ip_blocks[i].status.late_initialized = false >> ip_blocks[i].suspend() >> 3. amdgpu_pmops_resume()/amdgpu_pmops_thaw()/amdgpu_pmops_restore() >> amdgpu_device_resume() >> amdgpu_device_ip_resume() >> ip_blocks[i].resume() >> amdgpu_device_ip_late_init() >> ip_blocks[i].late_init() >> ip_blocks[i].status.late_initialized = true >> amdgpu_ras_late_init() >> ras_blocks[i].ras_late_init() >> amdgpu_ras_feature_enable_on_boot() >> amdgpu_ras_resume() >> amdgpu_ras_enable_all_features() >> 4. amdgpu_driver_unload_kms() >> amdgpu_device_fini_hw() >> amdgpu_ras_early_fini() >> ras_blocks[i].ras_early_fini() >> +++ ip_blocks[i].early_fini() >> +++ ip_blocks[i].status.late_initialized = false >> ip_blocks[i].hw_fini() >> ip_blocks[i].status.hw = false >> 5. amdgpu_driver_release_kms() >> amdgpu_device_fini_sw() >> amdgpu_device_ip_fini() >> ip_blocks[i].sw_fini() >> ip_blocks[i].status.sw = false >> --- ip_blocks[i].status.valid = false >> +++ amdgpu_ras_fini() >> ip_blocks[i].late_fini() >> +++ ip_blocks[i].status.valid = false >> --- ip_blocks[i].status.late_initialized = false >> --- amdgpu_ras_fini() >> The main changes include: >> 1) invoke ip_blocks[i].early_fini in amdgpu_pmops_suspend(). 
>> Currently there's only one ip block which provides `early_fini` >> callback. We have add a check of `in_s3` to keep current behavior in >> function amdgpu_dm_early_fini(). So there should be no functional >> changes. >> 2) set ip_blocks[i].status.late_initialized to false after calling >> callback `early_fini`. We have auditted all usages of the >> late_initialized flag and no functional changes found. >> 3) only set ip_blocks[i].status.valid = false after calling the >> `late_fini` callback. >> 4) call amdgpu_ras_fini() before invoking ip_blocks[i].late_fini. >> Then we try to refine each subsystem, such as nbio, asic, gfx, gmc, >> ras etc, to follow the new design. Currently we have only taken the >> nbio and asic as examples to show the proposed changes. Once we have >> confirmed that's the right way to go, we will handle the lefting >> subsystems. >> This is in early stage and requesting for comments, any comments and >> suggestions are welcomed! >> Jiang Liu (13): >> amdgpu: wrong array index to get ip block for PSP >> drm/admgpu: add helper functions to track status for ras manager >> drm/amdgpu: add a flag to track ras debugfs creation status >> drm/amdgpu: free all resources on error recovery path of >> amdgpu_ras_init() >> drm/amdgpu: introduce a flag to track refcount held for features >> drm/amdgpu: enhance amdgpu_ras_block_late_fini() >> drm/amdgpu: enhance amdgpu_ras_pre_fini() to better support SR >> drm/admgpu: rename amdgpu_ras_pre_fini() to amdgpu_ras_early_fini() >> drm/amdgpu: make IP block state machine works in stack like way >> drm/admgpu: make device state machine work in stack like way >> drm/amdgpu/sdma: improve the way to manage irq reference count >> drm/amdgpu/nbio: improve the way to manage irq reference count >> drm/amdgpu/asic: make ip block operations symmetric by .early_fini() >> drivers/gpu/drm/amd/amdgpu/amdgpu.h | 40 +++++ >> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 37 ++++- >> drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 2 +- >> drivers/gpu/drm/amd/amdgpu/amdgpu_jpeg.c | 2 +- >> drivers/gpu/drm/amd/amdgpu/amdgpu_nbio.c | 16 +- >> drivers/gpu/drm/amd/amdgpu/amdgpu_nbio.h | 1 + >> drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 8 +- >> drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 144 +++++++++++++----- >> drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 16 +- >> drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c | 26 +++- >> drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.h | 2 + >> drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c | 2 +- >> drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 2 +- >> drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 2 +- >> drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c | 2 +- >> drivers/gpu/drm/amd/amdgpu/mmhub_v1_8.c | 2 +- >> drivers/gpu/drm/amd/amdgpu/nbio_v7_4.c | 1 + >> drivers/gpu/drm/amd/amdgpu/nbio_v7_9.c | 1 + >> drivers/gpu/drm/amd/amdgpu/nv.c | 14 +- >> drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 8 - >> drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c | 23 +-- >> drivers/gpu/drm/amd/amdgpu/soc15.c | 38 ++--- >> drivers/gpu/drm/amd/amdgpu/soc21.c | 35 +++-- >> drivers/gpu/drm/amd/amdgpu/soc24.c | 17 ++- >> .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 3 + >> 25 files changed, 326 insertions(+), 118 deletions(-) [-- Attachment #2: Type: text/html, Size: 38668 bytes --] ^ permalink raw reply [flat|nested] 50+ messages in thread
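For readers unfamiliar with the test loop quoted above: `/sys/power/pm_test` selects how far the suspend sequence runs before the kernel backs out and resumes, per Documentation/power/basic-pm-debugging.rst. A short sketch of its use follows; the example output is illustrative and varies by system:

```
# Show the available test levels; the active one is bracketed.
cat /sys/power/pm_test
# e.g.: [none] core processors platform devices freezer

# 'platform' exercises the whole suspend path, including the platform
# firmware callbacks, then resumes without entering the sleep state
# for real.
echo platform > /sys/power/pm_test
echo mem > /sys/power/state

# Switch back to real suspend afterwards.
echo none > /sys/power/pm_test
```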
* Re: 2025-01-09 5:34 ` Re: Gerry Liu @ 2025-01-09 17:10 ` Mario Limonciello 2025-01-13 1:19 ` Re: Gerry Liu 0 siblings, 1 reply; 50+ messages in thread From: Mario Limonciello @ 2025-01-09 17:10 UTC (permalink / raw) To: Gerry Liu Cc: alexander.deucher, christian.koenig, Xinhui.Pan, airlied, simona, sunil.khatri, Lazar, Lijo, Hawking.Zhang, Chen, Xiaogang, Kent.Russell, Shuo Liu, amd-gfx General note - don't use HTML for mailing list communication. I'm not sure if Apple Mail lets you switch this around. If not, you might try using Thunderbird instead. You can pick to reply in plain text or HTML by holding shift when you hit "reply all" For my reply I'll convert my reply to plain text, please see inline below. On 1/8/2025 23:34, Gerry Liu wrote: > > >> On 2025-01-09 00:33, Mario Limonciello <mario.limonciello@amd.com> wrote: >> >> On 1/8/2025 07:59, Jiang Liu wrote: >>> Subject: [RFC PATCH 00/13] Enhance device state machine to better >>> support suspend/resume >> >> I'm not sure how this happened, but your subject didn't end up in the >> subject of the thread on patch 0 so the thread just looks like an >> unsubjected thread. > Maybe it’s caused by one extra blank line at the header. Yeah that might be it. Hopefully it doesn't happen on v2. > >> >>> Recently we were testing suspend/resume functionality with AMD GPUs, >>> we have encountered several resource tracking related bugs, such as >>> double buffer free, use after free and unbalanced irq reference count. >> >> Can you share more aobut how you were hitting these issues? Are they >> specific to S3 or to s2idle flows? dGPU or APU? >> Are they only with SRIOV? >> >> Is there anything to do with the host influencing the failures to >> happen, or are you contriving the failures to find the bugs? >> >> I know we've had some reports about resource tracking warnings on the >> reset flows, but I haven't heard much about suspend/resume. > We are investigating to develop some advanced product features based on > amdgpu suspend/resume. > So we started by tested the suspend/resume functionality of AMD 308x > GPUs with the following simple script: > ``` > echo platform > /sys/power/pm_test > i=0 > while true; do > echo mem > /sys/power/state > let i=i+1 > echo $i > sleep 1 > done > ``` > > It succeeds with the first and second iteration but always fails on > following iterations on a bare metal servers with eight MI308X GPUs. Can you share more about this server? Does it support suspend to ram or a hardware backed suspend to idle? If you don't know, you can check like this: ❯ cat /sys/power/mem_sleep s2idle [deep] If it's suspend to idle, what does the FACP indicate? You can do this check to find out if you don't know. ❯ sudo cp /sys/firmware/acpi/tables/FACP /tmp ❯ sudo iasl -d /tmp/FACP ❯ grep "idle" -i /tmp/FACP.dsl Low Power S0 Idle (V5) : 0 > With some investigation we found that the gpu asic should be reset > during the test, Yeah; but this comes back to my above questions. Typically there is an assumption that the power rails are going to be cut in system suspend. If that doesn't hold true, then you're doing a pure software suspend and have found a series of issues in the driver with how that's handled. > so we submitted a patch to fix the failure (https://github.com/ROCm/ROCK-Kernel-Driver/pull/181) Typically kernel patches don't go through that repo, they're discussed on the mailing lists. Can you bring this patch for discussion on amd-gfx? 
> > During analyze and root-cause the failure, we have encountered several > crashes, resource leakages and false alarms. Yeah; I think you found some real issues. > So I have worked out patch sets to solve issues we encountered. The > other patch set is https://lists.freedesktop.org/archives/amd-gfx/2025-January/118484.html Thanks! > > With sriov in single VF mode, resume always fails. Seems some contexts/ > vram buffers get lost during suspend and haven’t be restored on resume, > so cause failure. > We haven’t tested sriov in multiple VFs mode yet. We need more help from > AMD side to make SR work for SRIOV:) > >> >>> We have tried to solve these issues case by case, but found that may >>> not be the right way. Especially about the unbalanced irq reference >>> count, there will be new issues appear once we fixed the current known >>> issues. After analyzing related source code, we found that there may be >>> some fundamental implementaion flaws behind these resource tracking >> >> implementation >> >>> issues. >>> The amdgpu driver has two major state machines to driver the device >>> management flow, one is for ip blocks, the other is for ras blocks. >>> The hook points defined in struct amd_ip_funcs for device setup/teardown >>> are symmetric, but the implementation is asymmetric, sometime even >>> ambiguous. The most obvious two issues we noticed are: >>> 1) amdgpu_irq_get() are called from .late_init() but amdgpu_irq_put() >>> are called from .hw_fini() instead of .early_fini(). >>> 2) the way to reset ip_bloc.status.valid/sw/hw/late_initialized doesn't >>> match the way to set those flags. >>> When taking device suspend/resume into account, in addition to device >>> probe/remove, things get much more complex. Some issues arise because >>> many suspend/resume implementations directly reuse .hw_init/.hw_fini/ >>> .late_init hook points. >>> >>> So we try to fix those issues by two enhancements/refinements to current >>> device management state machines. >>> The first change is to make the ip block state machine and associated >>> status flags work in stack-like way as below: >>> Callback Status Flags >>> early_init: valid = true >>> sw_init: sw = true >>> hw_init: hw = true >>> late_init: late_initialized = true >>> early_fini: late_initialized = false >>> hw_fini: hw = false >>> sw_fini: sw = false >>> late_fini: valid = false >> >> At a high level this makes sense to me, but I'd just call 'late' or 'late_init'. >> >> Another idea if you make it stack like is to do it as a true enum for the state machine and store it all in one variable. > I will add a patch to convert those bool flags into an enum. Thanks! ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: 2025-01-09 17:10 ` Re: Mario Limonciello @ 2025-01-13 1:19 ` Gerry Liu 2025-01-13 21:59 ` Re: Mario Limonciello 0 siblings, 1 reply; 50+ messages in thread From: Gerry Liu @ 2025-01-13 1:19 UTC (permalink / raw) To: Mario Limonciello Cc: alexander.deucher, christian.koenig, Xinhui.Pan, airlied, simona, sunil.khatri, Lazar, Lijo, Hawking.Zhang, Chen, Xiaogang, Kent.Russell, Shuo Liu, amd-gfx > On 2025-01-10 01:10, Mario Limonciello <mario.limonciello@amd.com> wrote: > > General note - don't use HTML for mailing list communication. > > I'm not sure if Apple Mail lets you switch this around. > > If not, you might try using Thunderbird instead. You can pick to reply in plain text or HTML by holding shift when you hit "reply all" > > For my reply I'll convert my reply to plain text, please see inline below. > > On 1/8/2025 23:34, Gerry Liu wrote: >>> On 2025-01-09 00:33, Mario Limonciello <mario.limonciello@amd.com> wrote: >>> >>> On 1/8/2025 07:59, Jiang Liu wrote: >>>> Subject: [RFC PATCH 00/13] Enhance device state machine to better support suspend/resume >>> >>> I'm not sure how this happened, but your subject didn't end up in the subject of the thread on patch 0 so the thread just looks like an unsubjected thread. >> Maybe it’s caused by one extra blank line at the header. > > Yeah that might be it. Hopefully it doesn't happen on v2. > >>> >>>> Recently we were testing suspend/resume functionality with AMD GPUs, >>>> we have encountered several resource tracking related bugs, such as >>>> double buffer free, use after free and unbalanced irq reference count. >>> >>> Can you share more aobut how you were hitting these issues? Are they specific to S3 or to s2idle flows? dGPU or APU? >>> Are they only with SRIOV? >>> >>> Is there anything to do with the host influencing the failures to happen, or are you contriving the failures to find the bugs? >>> >>> I know we've had some reports about resource tracking warnings on the reset flows, but I haven't heard much about suspend/resume. >> We are investigating to develop some advanced product features based on amdgpu suspend/resume. >> So we started by tested the suspend/resume functionality of AMD 308x GPUs with the following simple script: >> ``` >> echo platform > /sys/power/pm_test >> i=0 >> while true; do >> echo mem > /sys/power/state >> let i=i+1 >> echo $i >> sleep 1 >> done >> ``` >> It succeeds with the first and second iteration but always fails on following iterations on a bare metal servers with eight MI308X GPUs. > > Can you share more about this server? Does it support suspend to ram or a hardware backed suspend to idle? If you don't know, you can check like this: > > ❯ cat /sys/power/mem_sleep > s2idle [deep] # cat /sys/power/mem_sleep [s2idle] > > If it's suspend to idle, what does the FACP indicate? You can do this check to find out if you don't know. 
> > ❯ sudo cp /sys/firmware/acpi/tables/FACP /tmp > ❯ sudo iasl -d /tmp/FACP > ❯ grep "idle" -i /tmp/FACP.dsl > Low Power S0 Idle (V5) : 0 > With acpidump and `iasl -d facp.data`, we got: [070h 0112 4] Flags (decoded below) : 000084A5 WBINVD instruction is operational (V1) : 1 WBINVD flushes all caches (V1) : 0 All CPUs support C1 (V1) : 1 C2 works on MP system (V1) : 0 Control Method Power Button (V1) : 0 Control Method Sleep Button (V1) : 1 RTC wake not in fixed reg space (V1) : 0 RTC can wake system from S4 (V1) : 1 32-bit PM Timer (V1) : 0 Docking Supported (V1) : 0 Reset Register Supported (V2) : 1 Sealed Case (V3) : 0 Headless - No Video (V3) : 0 Use native instr after SLP_TYPx (V3) : 0 PCIEXP_WAK Bits Supported (V4) : 0 Use Platform Timer (V4) : 1 RTC_STS valid on S4 wake (V4) : 0 Remote Power-on capable (V4) : 0 Use APIC Cluster Model (V4) : 0 Use APIC Physical Destination Mode (V4) : 0 Hardware Reduced (V5) : 0 Low Power S0 Idle (V5) : 0 >> With some investigation we found that the gpu asic should be reset during the test, > > Yeah; but this comes back to my above questions. Typically there is an assumption that the power rails are going to be cut in system suspend. > > If that doesn't hold true, then you're doing a pure software suspend and have found a series of issues in the driver with how that's handled. Yeah, we are trying to do a `pure software suspend`, letting the hypervisor save/restore system images instead of the guest OS. During the suspend process, we hope we can cancel the suspend request at any late stage. When we cancel suspend at a late stage, it does behave like a pure software suspend. > >> so we submitted a patch to fix the failure (https://github.com/ROCm/ROCK-Kernel-Driver/pull/181) > > Typically kernel patches don't go through that repo, they're discussed on the mailing lists. Can you bring this patch for discussion on amd-gfx? Will post to amd-gfx after solving the conflicts. Regards, Gerry > >> During analyze and root-cause the failure, we have encountered several crashes, resource leakages and false alarms. > > Yeah; I think you found some real issues. > >> So I have worked out patch sets to solve issues we encountered. The other patch set is https://lists.freedesktop.org/archives/amd-gfx/2025-January/118484.html > > Thanks! > >> With sriov in single VF mode, resume always fails. Seems some contexts/ vram buffers get lost during suspend and haven’t be restored on resume, so cause failure. >> We haven’t tested sriov in multiple VFs mode yet. We need more help from AMD side to make SR work for SRIOV:) >>> >>>> We have tried to solve these issues case by case, but found that may >>>> not be the right way. Especially about the unbalanced irq reference >>>> count, there will be new issues appear once we fixed the current known >>>> issues. After analyzing related source code, we found that there may be >>>> some fundamental implementaion flaws behind these resource tracking >>> >>> implementation >>> >>>> issues. 
The most obvious two issues we noticed are: >>>> 1) amdgpu_irq_get() are called from .late_init() but amdgpu_irq_put() >>>> are called from .hw_fini() instead of .early_fini(). >>>> 2) the way to reset ip_bloc.status.valid/sw/hw/late_initialized doesn't >>>> match the way to set those flags. >>>> When taking device suspend/resume into account, in addition to device >>>> probe/remove, things get much more complex. Some issues arise because >>>> many suspend/resume implementations directly reuse .hw_init/.hw_fini/ >>>> .late_init hook points. >>>> >>>> So we try to fix those issues by two enhancements/refinements to current >>>> device management state machines. >>>> The first change is to make the ip block state machine and associated >>>> status flags work in stack-like way as below: >>>> Callback Status Flags >>>> early_init: valid = true >>>> sw_init: sw = true >>>> hw_init: hw = true >>>> late_init: late_initialized = true >>>> early_fini: late_initialized = false >>>> hw_fini: hw = false >>>> sw_fini: sw = false >>>> late_fini: valid = false >>> >>> At a high level this makes sense to me, but I'd just call 'late' or 'late_init'. >>> >>> Another idea if you make it stack like is to do it as a true enum for the state machine and store it all in one variable. >> I will add a patch to convert those bool flags into an enum. > > Thanks! ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: 2025-01-13 1:19 ` Re: Gerry Liu @ 2025-01-13 21:59 ` Mario Limonciello 0 siblings, 0 replies; 50+ messages in thread From: Mario Limonciello @ 2025-01-13 21:59 UTC (permalink / raw) To: Gerry Liu Cc: alexander.deucher, christian.koenig, Xinhui.Pan, airlied, simona, sunil.khatri, Lazar, Lijo, Hawking.Zhang, Chen, Xiaogang, Kent.Russell, Shuo Liu, amd-gfx On 1/12/2025 19:19, Gerry Liu wrote: > > >> On 2025-01-10 01:10, Mario Limonciello <mario.limonciello@amd.com> wrote: >> >> General note - don't use HTML for mailing list communication. >> >> I'm not sure if Apple Mail lets you switch this around. >> >> If not, you might try using Thunderbird instead. You can pick to reply in plain text or HTML by holding shift when you hit "reply all" >> >> For my reply I'll convert my reply to plain text, please see inline below. >> >> On 1/8/2025 23:34, Gerry Liu wrote: >>>> On 2025-01-09 00:33, Mario Limonciello <mario.limonciello@amd.com> wrote: >>>> >>>> On 1/8/2025 07:59, Jiang Liu wrote: >>>>> Subject: [RFC PATCH 00/13] Enhance device state machine to better support suspend/resume >>>> >>>> I'm not sure how this happened, but your subject didn't end up in the subject of the thread on patch 0 so the thread just looks like an unsubjected thread. >>> Maybe it’s caused by one extra blank line at the header. >> >> Yeah that might be it. Hopefully it doesn't happen on v2. >> >>>> >>>>> Recently we were testing suspend/resume functionality with AMD GPUs, >>>>> we have encountered several resource tracking related bugs, such as >>>>> double buffer free, use after free and unbalanced irq reference count. >>>> >>>> Can you share more aobut how you were hitting these issues? Are they specific to S3 or to s2idle flows? dGPU or APU? >>>> Are they only with SRIOV? >>>> >>>> Is there anything to do with the host influencing the failures to happen, or are you contriving the failures to find the bugs? >>>> >>>> I know we've had some reports about resource tracking warnings on the reset flows, but I haven't heard much about suspend/resume. >>> We are investigating to develop some advanced product features based on amdgpu suspend/resume. >>> So we started by tested the suspend/resume functionality of AMD 308x GPUs with the following simple script: >>> ``` >>> echo platform > /sys/power/pm_test >>> i=0 >>> while true; do >>> echo mem > /sys/power/state >>> let i=i+1 >>> echo $i >>> sleep 1 >>> done >>> ``` >>> It succeeds with the first and second iteration but always fails on following iterations on a bare metal servers with eight MI308X GPUs. >> >> Can you share more about this server? Does it support suspend to ram or a hardware backed suspend to idle? If you don't know, you can check like this: >> >> ❯ cat /sys/power/mem_sleep >> s2idle [deep] > # cat /sys/power/mem_sleep > [s2idle] > >> >> If it's suspend to idle, what does the FACP indicate? You can do this check to find out if you don't know. 
>> ❯ sudo cp /sys/firmware/acpi/tables/FACP /tmp >> ❯ sudo iasl -d /tmp/FACP >> ❯ grep "idle" -i /tmp/FACP.dsl >> Low Power S0 Idle (V5) : 0 >> > With acpidump and `iasl -d facp.data`, we got: > [070h 0112 4] Flags (decoded below) : 000084A5 > WBINVD instruction is operational (V1) : 1 > WBINVD flushes all caches (V1) : 0 > All CPUs support C1 (V1) : 1 > C2 works on MP system (V1) : 0 > Control Method Power Button (V1) : 0 > Control Method Sleep Button (V1) : 1 > RTC wake not in fixed reg space (V1) : 0 > RTC can wake system from S4 (V1) : 1 > 32-bit PM Timer (V1) : 0 > Docking Supported (V1) : 0 > Reset Register Supported (V2) : 1 > Sealed Case (V3) : 0 > Headless - No Video (V3) : 0 > Use native instr after SLP_TYPx (V3) : 0 > PCIEXP_WAK Bits Supported (V4) : 0 > Use Platform Timer (V4) : 1 > RTC_STS valid on S4 wake (V4) : 0 > Remote Power-on capable (V4) : 0 > Use APIC Cluster Model (V4) : 0 > Use APIC Physical Destination Mode (V4) : 0 > Hardware Reduced (V5) : 0 > Low Power S0 Idle (V5) : 0 > >>> With some investigation we found that the gpu asic should be reset during the test, >> >> Yeah; but this comes back to my above questions. Typically there is an assumption that the power rails are going to be cut in system suspend. >> >> If that doesn't hold true, then you're doing a pure software suspend and have found a series of issues in the driver with how that's handled. > Yeah, we are trying to do a `pure software suspend`, letting hypervisor to save/restore system images instead of guest OS. > And during the suspend process, we hope we can cancel the suspend request at any later stage. > We cancel suspend at late stages, it does behave like a pure software suspend. > Thanks; this all makes a lot more sense now. This isn't an area that has a lot of coverage right now. Most suspend testing happens with the power being cut and coming back fresh. Will keep this in mind when reviewing future iterations of your patches. >> >>> so we submitted a patch to fix the failure (https://github.com/ROCm/ROCK-Kernel-Driver/pull/181) >> >> Typically kernel patches don't go through that repo, they're discussed on the mailing lists. Can you bring this patch for discussion on amd-gfx? > Will post to amd-gfx after solving the conflicts. Thx! > > Regards, > Gerry > >> >>> During analyze and root-cause the failure, we have encountered several crashes, resource leakages and false alarms. >> >> Yeah; I think you found some real issues. >> >>> So I have worked out patch sets to solve issues we encountered. The other patch set is https://lists.freedesktop.org/archives/amd-gfx/2025-January/118484.html >> >> Thanks! >> >>> With sriov in single VF mode, resume always fails. Seems some contexts/ vram buffers get lost during suspend and haven’t be restored on resume, so cause failure. >>> We haven’t tested sriov in multiple VFs mode yet. We need more help from AMD side to make SR work for SRIOV:) >>>> >>>>> We have tried to solve these issues case by case, but found that may >>>>> not be the right way. Especially about the unbalanced irq reference >>>>> count, there will be new issues appear once we fixed the current known >>>>> issues. After analyzing related source code, we found that there may be >>>>> some fundamental implementaion flaws behind these resource tracking >>>> >>>> implementation >>>> >>>>> issues. 
>>>>> The amdgpu driver has two major state machines to driver the device >>>>> management flow, one is for ip blocks, the other is for ras blocks. >>>>> The hook points defined in struct amd_ip_funcs for device setup/teardown >>>>> are symmetric, but the implementation is asymmetric, sometime even >>>>> ambiguous. The most obvious two issues we noticed are: >>>>> 1) amdgpu_irq_get() are called from .late_init() but amdgpu_irq_put() >>>>> are called from .hw_fini() instead of .early_fini(). >>>>> 2) the way to reset ip_bloc.status.valid/sw/hw/late_initialized doesn't >>>>> match the way to set those flags. >>>>> When taking device suspend/resume into account, in addition to device >>>>> probe/remove, things get much more complex. Some issues arise because >>>>> many suspend/resume implementations directly reuse .hw_init/.hw_fini/ >>>>> .late_init hook points. >>>>> >>>>> So we try to fix those issues by two enhancements/refinements to current >>>>> device management state machines. >>>>> The first change is to make the ip block state machine and associated >>>>> status flags work in stack-like way as below: >>>>> Callback Status Flags >>>>> early_init: valid = true >>>>> sw_init: sw = true >>>>> hw_init: hw = true >>>>> late_init: late_initialized = true >>>>> early_fini: late_initialized = false >>>>> hw_fini: hw = false >>>>> sw_fini: sw = false >>>>> late_fini: valid = false >>>> >>>> At a high level this makes sense to me, but I'd just call 'late' or 'late_init'. >>>> >>>> Another idea if you make it stack like is to do it as a true enum for the state machine and store it all in one variable. >>> I will add a patch to convert those bool flags into an enum. >> >> Thanks! > ^ permalink raw reply [flat|nested] 50+ messages in thread
* (no subject) @ 2022-09-12 12:36 Christian König 2022-09-13 2:04 ` Alex Deucher 0 siblings, 1 reply; 50+ messages in thread From: Christian König @ 2022-09-12 12:36 UTC (permalink / raw) To: alexander.deucher, amd-gfx Hey Alex, I've decided to split this patch set into two because we still can't figure out where the VCN regressions come from. Ruijing tested them and confirmed that they don't regress VCN. Can you and maybe Felix take a look and review them? Thanks, Christian. ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: 2022-09-12 12:36 Christian König @ 2022-09-13 2:04 ` Alex Deucher 0 siblings, 0 replies; 50+ messages in thread From: Alex Deucher @ 2022-09-13 2:04 UTC (permalink / raw) To: Christian König; +Cc: alexander.deucher, amd-gfx On Mon, Sep 12, 2022 at 8:36 AM Christian König <ckoenig.leichtzumerken@gmail.com> wrote: > > Hey Alex, > > I've decided to split this patch set into two because we still can't > figure out where the VCN regressions come from. > > Ruijing tested them and confirmed that they don't regress VCN. > > Can you and maybe Felix take a look and review them? Looks good to me. Series is: Reviewed-by: Alex Deucher <alexander.deucher@amd.com> > > Thanks, > Christian. > > ^ permalink raw reply [flat|nested] 50+ messages in thread
* (no subject) @ 2020-07-16 21:22 Mauro Rossi 2020-07-20 9:00 ` Christian König 0 siblings, 1 reply; 50+ messages in thread From: Mauro Rossi @ 2020-07-16 21:22 UTC (permalink / raw) To: amd-gfx; +Cc: alexander.deucher, Mauro Rossi, harry.wentland

The series adds SI support to AMD DC

Changelog:

[RFC]
Preliminary Proof Of Concept, with DCE8 headers still used in dce60_resources.c

[PATCH v2]
Rebase on amd-staging-drm-next dated 17-Oct-2018

[PATCH v3]
Add support for DCE6 specific headers, ad hoc DCE6 macros, functions and fixes, rebase on current amd-staging-drm-next

Commits [01/27]..[08/27] SI support added in various DC components

[PATCH v3 01/27] drm/amdgpu: add some required DCE6 registers (v6)
[PATCH v3 02/27] drm/amd/display: add asics info for SI parts
[PATCH v3 03/27] drm/amd/display: dc/dce: add initial DCE6 support (v9b)
[PATCH v3 04/27] drm/amd/display: dc/core: add SI/DCE6 support (v2)
[PATCH v3 05/27] drm/amd/display: dc/bios: add support for DCE6
[PATCH v3 06/27] drm/amd/display: dc/gpio: add support for DCE6 (v2)
[PATCH v3 07/27] drm/amd/display: dc/irq: add support for DCE6 (v4)
[PATCH v3 08/27] drm/amd/display: amdgpu_dm: add SI support (v4)

Commits [09/27]..[24/27] DCE6 specific code adaptations

[PATCH v3 09/27] drm/amd/display: dc/clk_mgr: add support for SI parts (v2)
[PATCH v3 10/27] drm/amd/display: dc/dce60: set max_cursor_size to 64
[PATCH v3 11/27] drm/amd/display: dce_audio: add DCE6 specific macros,functions
[PATCH v3 12/27] drm/amd/display: dce_dmcu: add DCE6 specific macros
[PATCH v3 13/27] drm/amd/display: dce_hwseq: add DCE6 specific macros,functions
[PATCH v3 14/27] drm/amd/display: dce_ipp: add DCE6 specific macros,functions
[PATCH v3 15/27] drm/amd/display: dce_link_encoder: add DCE6 specific macros,functions
[PATCH v3 16/27] drm/amd/display: dce_mem_input: add DCE6 specific macros,functions
[PATCH v3 17/27] drm/amd/display: dce_opp: add DCE6 specific macros,functions
[PATCH v3 18/27] drm/amd/display: dce_transform: add DCE6 specific macros,functions
[PATCH v3 19/27] drm/amdgpu: add some required DCE6 registers (v7)
[PATCH v3 20/27] drm/amd/display: dce_transform: DCE6 Scaling Horizontal Filter Init
[PATCH v3 21/27] drm/amd/display: dce60_hw_sequencer: add DCE6 macros,functions
[PATCH v3 22/27] drm/amd/display: dce60_hw_sequencer: add DCE6 specific .cursor_lock
[PATCH v3 23/27] drm/amd/display: dce60_timing_generator: add DCE6 specific functions
[PATCH v3 24/27] drm/amd/display: dc/dce60: use DCE6 headers (v6)

Commits [25/27]..[27/27] SI support final enablements

[PATCH v3 25/27] drm/amd/display: create plane rotation property for Bonaire and later
[PATCH v3 26/27] drm/amdgpu: enable DC support for SI parts (v2)
[PATCH v3 27/27] drm/amd/display: enable SI support in the Kconfig (v2)

Signed-off-by: Mauro Rossi <issor.oruam@gmail.com>

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 50+ messages in thread
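The per-component commits above all follow one mechanical pattern: each display block gains a DCE6-specific function table plus register/mask lists, and resource construction selects them by ASIC generation. A compilable sketch of that wiring; all names here (dce_block_funcs, dce60_funcs, dce80_funcs, pick_funcs) are invented for illustration and are not the actual DC symbols:

#include <stdio.h>

/* Per-generation function tables: DCE6 gets its own table instead of
 * reusing the DCE8 one, so DCE6-only register layouts stay isolated. */
struct dce_block_funcs {
	void (*program_scaler)(void);
};

static void dce60_program_scaler(void) { puts("DCE6 scaler setup"); }
static void dce80_program_scaler(void) { puts("DCE8 scaler setup"); }

static const struct dce_block_funcs dce60_funcs = {
	.program_scaler = dce60_program_scaler,
};

static const struct dce_block_funcs dce80_funcs = {
	.program_scaler = dce80_program_scaler,
};

/* Resource construction picks a table once per ASIC, which is how the
 * series can add a dc/dce60 path next to the existing dc/dce80 one. */
static const struct dce_block_funcs *pick_funcs(int dce_version)
{
	return dce_version == 60 ? &dce60_funcs : &dce80_funcs;
}

int main(void)
{
	pick_funcs(60)->program_scaler();	/* prints "DCE6 scaler setup" */
	return 0;
}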
* Re: 2020-07-16 21:22 Mauro Rossi @ 2020-07-20 9:00 ` Christian König 2020-07-20 9:59 ` Re: Mauro Rossi 0 siblings, 1 reply; 50+ messages in thread From: Christian König @ 2020-07-20 9:00 UTC (permalink / raw) To: Mauro Rossi, amd-gfx; +Cc: alexander.deucher, harry.wentland Hi Mauro, I'm not deep into the whole DC design, so just some general high level comments on the cover letter: 1. Please add a subject line to the cover letter, my spam filter thinks that this is suspicious otherwise. 2. Then you should probably note how well (badly?) is that tested. Since you noted proof of concept it might not even work. 3. How feature complete (HDMI audio?, Freesync?) is it? Apart from that it looks like a rather impressive piece of work :) Cheers, Christian. Am 16.07.20 um 23:22 schrieb Mauro Rossi: > The series adds SI support to AMD DC > > Changelog: > > [RFC] > Preliminar Proof Of Concept, with DCE8 headers still used in dce60_resources.c > > [PATCH v2] > Rebase on amd-staging-drm-next dated 17-Oct-2018 > > [PATCH v3] > Add support for DCE6 specific headers, > ad hoc DCE6 macros, funtions and fixes, > rebase on current amd-staging-drm-next > > > Commits [01/27]..[08/27] SI support added in various DC components > > [PATCH v3 01/27] drm/amdgpu: add some required DCE6 registers (v6) > [PATCH v3 02/27] drm/amd/display: add asics info for SI parts > [PATCH v3 03/27] drm/amd/display: dc/dce: add initial DCE6 support (v9b) > [PATCH v3 04/27] drm/amd/display: dc/core: add SI/DCE6 support (v2) > [PATCH v3 05/27] drm/amd/display: dc/bios: add support for DCE6 > [PATCH v3 06/27] drm/amd/display: dc/gpio: add support for DCE6 (v2) > [PATCH v3 07/27] drm/amd/display: dc/irq: add support for DCE6 (v4) > [PATCH v3 08/27] drm/amd/display: amdgpu_dm: add SI support (v4) > > Commits [09/27]..[24/27] DCE6 specific code adaptions > > [PATCH v3 09/27] drm/amd/display: dc/clk_mgr: add support for SI parts (v2) > [PATCH v3 10/27] drm/amd/display: dc/dce60: set max_cursor_size to 64 > [PATCH v3 11/27] drm/amd/display: dce_audio: add DCE6 specific macros,functions > [PATCH v3 12/27] drm/amd/display: dce_dmcu: add DCE6 specific macros > [PATCH v3 13/27] drm/amd/display: dce_hwseq: add DCE6 specific macros,functions > [PATCH v3 14/27] drm/amd/display: dce_ipp: add DCE6 specific macros,functions > [PATCH v3 15/27] drm/amd/display: dce_link_encoder: add DCE6 specific macros,functions > [PATCH v3 16/27] drm/amd/display: dce_mem_input: add DCE6 specific macros,functions > [PATCH v3 17/27] drm/amd/display: dce_opp: add DCE6 specific macros,functions > [PATCH v3 18/27] drm/amd/display: dce_transform: add DCE6 specific macros,functions > [PATCH v3 19/27] drm/amdgpu: add some required DCE6 registers (v7) > [PATCH v3 20/27] drm/amd/display: dce_transform: DCE6 Scaling Horizontal Filter Init > [PATCH v3 21/27] drm/amd/display: dce60_hw_sequencer: add DCE6 macros,functions > [PATCH v3 22/27] drm/amd/display: dce60_hw_sequencer: add DCE6 specific .cursor_lock > [PATCH v3 23/27] drm/amd/display: dce60_timing_generator: add DCE6 specific functions > [PATCH v3 24/27] drm/amd/display: dc/dce60: use DCE6 headers (v6) > > > Commits [25/27]..[27/27] SI support final enablements > > [PATCH v3 25/27] drm/amd/display: create plane rotation property for Bonarie and later > [PATCH v3 26/27] drm/amdgpu: enable DC support for SI parts (v2) > [PATCH v3 27/27] drm/amd/display: enable SI support in the Kconfig (v2) > > > Signed-off-by: Mauro Rossi <issor.oruam@gmail.com> > > _______________________________________________ > amd-gfx mailing list > 
amd-gfx@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/amd-gfx _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: 2020-07-20 9:00 ` Christian König @ 2020-07-20 9:59 ` Mauro Rossi 2020-07-22 2:51 ` Re: Alex Deucher 0 siblings, 1 reply; 50+ messages in thread From: Mauro Rossi @ 2020-07-20 9:59 UTC (permalink / raw) To: christian.koenig; +Cc: Deucher, Alexander, Harry Wentland, amd-gfx

Hi Christian,

On Mon, Jul 20, 2020 at 11:00 AM Christian König <ckoenig.leichtzumerken@gmail.com> wrote:
>
> Hi Mauro,
>
> I'm not deep into the whole DC design, so just some general high level
> comments on the cover letter:
>
> 1. Please add a subject line to the cover letter, my spam filter thinks
> that this is suspicious otherwise.

My mistake in the editing of the cover letter with git send-email; I may have forgotten to keep the Subject at the top.

>
> 2. Then you should probably note how well (badly?) is that tested. Since
> you noted proof of concept it might not even work.

The Changelog is to be read as: [RFC] was the initial proof of concept, and [PATCH v2] was just a rebase onto amd-staging-drm-next.

This series [PATCH v3] has all the known changes required for DCE6 specificity, and is based on a long offline thread with Alexander Deucher and past dri-devel chats with Harry Wentland.

It was tested within my possibilities of testing, with an HD7750 and an HD7950, checking the dmesg output for the absence of "missing registers/masks" kernel WARNINGs, and with kernel builds on Ubuntu 20.04 and with android-x86.

The proposal I made to Alex is that AMD testing systems will be used for further regression testing, as part of review and validation for eligibility for amd-staging-drm-next.

>
> 3. How feature complete (HDMI audio?, Freesync?) is it?

All the changes in DC impacting DCE8 (dc/dce80 path) were ported to DCE6 (dc/dce60 path) in the two years since the initial submission.

>
> Apart from that it looks like a rather impressive piece of work :)
>
> Cheers,
> Christian.

Thanks, please consider that most of the latest DCE6 specific parts were possible thanks to Alex's recent support in getting the correct DCE6 headers, and to his suggestions and continuous feedback.

I would suggest that Alex comment on the proposed next steps to follow.
Mauro > > Am 16.07.20 um 23:22 schrieb Mauro Rossi: > > The series adds SI support to AMD DC > > > > Changelog: > > > > [RFC] > > Preliminar Proof Of Concept, with DCE8 headers still used in dce60_resources.c > > > > [PATCH v2] > > Rebase on amd-staging-drm-next dated 17-Oct-2018 > > > > [PATCH v3] > > Add support for DCE6 specific headers, > > ad hoc DCE6 macros, funtions and fixes, > > rebase on current amd-staging-drm-next > > > > > > Commits [01/27]..[08/27] SI support added in various DC components > > > > [PATCH v3 01/27] drm/amdgpu: add some required DCE6 registers (v6) > > [PATCH v3 02/27] drm/amd/display: add asics info for SI parts > > [PATCH v3 03/27] drm/amd/display: dc/dce: add initial DCE6 support (v9b) > > [PATCH v3 04/27] drm/amd/display: dc/core: add SI/DCE6 support (v2) > > [PATCH v3 05/27] drm/amd/display: dc/bios: add support for DCE6 > > [PATCH v3 06/27] drm/amd/display: dc/gpio: add support for DCE6 (v2) > > [PATCH v3 07/27] drm/amd/display: dc/irq: add support for DCE6 (v4) > > [PATCH v3 08/27] drm/amd/display: amdgpu_dm: add SI support (v4) > > > > Commits [09/27]..[24/27] DCE6 specific code adaptions > > > > [PATCH v3 09/27] drm/amd/display: dc/clk_mgr: add support for SI parts (v2) > > [PATCH v3 10/27] drm/amd/display: dc/dce60: set max_cursor_size to 64 > > [PATCH v3 11/27] drm/amd/display: dce_audio: add DCE6 specific macros,functions > > [PATCH v3 12/27] drm/amd/display: dce_dmcu: add DCE6 specific macros > > [PATCH v3 13/27] drm/amd/display: dce_hwseq: add DCE6 specific macros,functions > > [PATCH v3 14/27] drm/amd/display: dce_ipp: add DCE6 specific macros,functions > > [PATCH v3 15/27] drm/amd/display: dce_link_encoder: add DCE6 specific macros,functions > > [PATCH v3 16/27] drm/amd/display: dce_mem_input: add DCE6 specific macros,functions > > [PATCH v3 17/27] drm/amd/display: dce_opp: add DCE6 specific macros,functions > > [PATCH v3 18/27] drm/amd/display: dce_transform: add DCE6 specific macros,functions > > [PATCH v3 19/27] drm/amdgpu: add some required DCE6 registers (v7) > > [PATCH v3 20/27] drm/amd/display: dce_transform: DCE6 Scaling Horizontal Filter Init > > [PATCH v3 21/27] drm/amd/display: dce60_hw_sequencer: add DCE6 macros,functions > > [PATCH v3 22/27] drm/amd/display: dce60_hw_sequencer: add DCE6 specific .cursor_lock > > [PATCH v3 23/27] drm/amd/display: dce60_timing_generator: add DCE6 specific functions > > [PATCH v3 24/27] drm/amd/display: dc/dce60: use DCE6 headers (v6) > > > > > > Commits [25/27]..[27/27] SI support final enablements > > > > [PATCH v3 25/27] drm/amd/display: create plane rotation property for Bonarie and later > > [PATCH v3 26/27] drm/amdgpu: enable DC support for SI parts (v2) > > [PATCH v3 27/27] drm/amd/display: enable SI support in the Kconfig (v2) > > > > > > Signed-off-by: Mauro Rossi <issor.oruam@gmail.com> > > > > _______________________________________________ > > amd-gfx mailing list > > amd-gfx@lists.freedesktop.org > > https://lists.freedesktop.org/mailman/listinfo/amd-gfx > _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: 2020-07-20 9:59 ` Re: Mauro Rossi @ 2020-07-22 2:51 ` Alex Deucher 2020-07-22 7:56 ` Re: Mauro Rossi 0 siblings, 1 reply; 50+ messages in thread From: Alex Deucher @ 2020-07-22 2:51 UTC (permalink / raw) To: Mauro Rossi Cc: Deucher, Alexander, Harry Wentland, Christian Koenig, amd-gfx list On Mon, Jul 20, 2020 at 6:00 AM Mauro Rossi <issor.oruam@gmail.com> wrote: > > Hi Christian, > > On Mon, Jul 20, 2020 at 11:00 AM Christian König > <ckoenig.leichtzumerken@gmail.com> wrote: > > > > Hi Mauro, > > > > I'm not deep into the whole DC design, so just some general high level > > comments on the cover letter: > > > > 1. Please add a subject line to the cover letter, my spam filter thinks > > that this is suspicious otherwise. > > My mistake in the editing of covert letter with git send-email, > I may have forgot to keep the Subject at the top > > > > > 2. Then you should probably note how well (badly?) is that tested. Since > > you noted proof of concept it might not even work. > > The Changelog is to be read as: > > [RFC] was the initial Proof of concept was the RFC and [PATCH v2] was > just a rebase onto amd-staging-drm-next > > this series [PATCH v3] has all the known changes required for DCE6 specificity > and based on a long offline thread with Alexander Deutcher and past > dri-devel chats with Harry Wentland. > > It was tested for my possibilities of testing with HD7750 and HD7950, > with checks in dmesg output for not getting "missing registers/masks" > kernel WARNING > and with kernel build on Ubuntu 20.04 and with android-x86 > > The proposal I made to Alex is that AMD testing systems will be used > for further regression testing, > as part of review and validation for eligibility to amd-staging-drm-next > We will certainly test it once it lands, but presumably this is working on the SI cards you have access to? > > > > 3. How feature complete (HDMI audio?, Freesync?) is it? > > All the changes in DC impacting DCE8 (dc/dce80 path) were ported to > DCE6 (dc/dce60 path) in the last two years from initial submission > > > > > Apart from that it looks like a rather impressive piece of work :) > > > > Cheers, > > Christian. > > Thanks, > please consider that most of the latest DCE6 specific parts were > possible due to recent Alex support in getting the correct DCE6 > headers, > his suggestions and continuous feedback. > > I would suggest that Alex comments on the proposed next steps to follow. The code looks pretty good to me. I'd like to get some feedback from the display team to see if they have any concerns, but beyond that I think we can pull it into the tree and continue improving it there. Do you have a link to a git tree I can pull directly that contains these patches? Is this the right branch? https://github.com/maurossi/linux/commits/kernel-5.8rc4_si_next Thanks! 
Alex > > Mauro > > > > > Am 16.07.20 um 23:22 schrieb Mauro Rossi: > > > The series adds SI support to AMD DC > > > > > > Changelog: > > > > > > [RFC] > > > Preliminar Proof Of Concept, with DCE8 headers still used in dce60_resources.c > > > > > > [PATCH v2] > > > Rebase on amd-staging-drm-next dated 17-Oct-2018 > > > > > > [PATCH v3] > > > Add support for DCE6 specific headers, > > > ad hoc DCE6 macros, funtions and fixes, > > > rebase on current amd-staging-drm-next > > > > > > > > > Commits [01/27]..[08/27] SI support added in various DC components > > > > > > [PATCH v3 01/27] drm/amdgpu: add some required DCE6 registers (v6) > > > [PATCH v3 02/27] drm/amd/display: add asics info for SI parts > > > [PATCH v3 03/27] drm/amd/display: dc/dce: add initial DCE6 support (v9b) > > > [PATCH v3 04/27] drm/amd/display: dc/core: add SI/DCE6 support (v2) > > > [PATCH v3 05/27] drm/amd/display: dc/bios: add support for DCE6 > > > [PATCH v3 06/27] drm/amd/display: dc/gpio: add support for DCE6 (v2) > > > [PATCH v3 07/27] drm/amd/display: dc/irq: add support for DCE6 (v4) > > > [PATCH v3 08/27] drm/amd/display: amdgpu_dm: add SI support (v4) > > > > > > Commits [09/27]..[24/27] DCE6 specific code adaptions > > > > > > [PATCH v3 09/27] drm/amd/display: dc/clk_mgr: add support for SI parts (v2) > > > [PATCH v3 10/27] drm/amd/display: dc/dce60: set max_cursor_size to 64 > > > [PATCH v3 11/27] drm/amd/display: dce_audio: add DCE6 specific macros,functions > > > [PATCH v3 12/27] drm/amd/display: dce_dmcu: add DCE6 specific macros > > > [PATCH v3 13/27] drm/amd/display: dce_hwseq: add DCE6 specific macros,functions > > > [PATCH v3 14/27] drm/amd/display: dce_ipp: add DCE6 specific macros,functions > > > [PATCH v3 15/27] drm/amd/display: dce_link_encoder: add DCE6 specific macros,functions > > > [PATCH v3 16/27] drm/amd/display: dce_mem_input: add DCE6 specific macros,functions > > > [PATCH v3 17/27] drm/amd/display: dce_opp: add DCE6 specific macros,functions > > > [PATCH v3 18/27] drm/amd/display: dce_transform: add DCE6 specific macros,functions > > > [PATCH v3 19/27] drm/amdgpu: add some required DCE6 registers (v7) > > > [PATCH v3 20/27] drm/amd/display: dce_transform: DCE6 Scaling Horizontal Filter Init > > > [PATCH v3 21/27] drm/amd/display: dce60_hw_sequencer: add DCE6 macros,functions > > > [PATCH v3 22/27] drm/amd/display: dce60_hw_sequencer: add DCE6 specific .cursor_lock > > > [PATCH v3 23/27] drm/amd/display: dce60_timing_generator: add DCE6 specific functions > > > [PATCH v3 24/27] drm/amd/display: dc/dce60: use DCE6 headers (v6) > > > > > > > > > Commits [25/27]..[27/27] SI support final enablements > > > > > > [PATCH v3 25/27] drm/amd/display: create plane rotation property for Bonarie and later > > > [PATCH v3 26/27] drm/amdgpu: enable DC support for SI parts (v2) > > > [PATCH v3 27/27] drm/amd/display: enable SI support in the Kconfig (v2) > > > > > > > > > Signed-off-by: Mauro Rossi <issor.oruam@gmail.com> > > > > > > _______________________________________________ > > > amd-gfx mailing list > > > amd-gfx@lists.freedesktop.org > > > https://lists.freedesktop.org/mailman/listinfo/amd-gfx > > > _______________________________________________ > amd-gfx mailing list > amd-gfx@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/amd-gfx _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: 2020-07-22 2:51 ` Re: Alex Deucher @ 2020-07-22 7:56 ` Mauro Rossi 2020-07-24 18:31 ` Re: Alex Deucher 0 siblings, 1 reply; 50+ messages in thread From: Mauro Rossi @ 2020-07-22 7:56 UTC (permalink / raw) To: Alex Deucher Cc: Deucher, Alexander, Harry Wentland, Christian Koenig, amd-gfx list [-- Attachment #1.1: Type: text/plain, Size: 6915 bytes --] Hello, re-sending and copying full DL On Wed, Jul 22, 2020 at 4:51 AM Alex Deucher <alexdeucher@gmail.com> wrote: > On Mon, Jul 20, 2020 at 6:00 AM Mauro Rossi <issor.oruam@gmail.com> wrote: > > > > Hi Christian, > > > > On Mon, Jul 20, 2020 at 11:00 AM Christian König > > <ckoenig.leichtzumerken@gmail.com> wrote: > > > > > > Hi Mauro, > > > > > > I'm not deep into the whole DC design, so just some general high level > > > comments on the cover letter: > > > > > > 1. Please add a subject line to the cover letter, my spam filter thinks > > > that this is suspicious otherwise. > > > > My mistake in the editing of covert letter with git send-email, > > I may have forgot to keep the Subject at the top > > > > > > > > 2. Then you should probably note how well (badly?) is that tested. > Since > > > you noted proof of concept it might not even work. > > > > The Changelog is to be read as: > > > > [RFC] was the initial Proof of concept was the RFC and [PATCH v2] was > > just a rebase onto amd-staging-drm-next > > > > this series [PATCH v3] has all the known changes required for DCE6 > specificity > > and based on a long offline thread with Alexander Deutcher and past > > dri-devel chats with Harry Wentland. > > > > It was tested for my possibilities of testing with HD7750 and HD7950, > > with checks in dmesg output for not getting "missing registers/masks" > > kernel WARNING > > and with kernel build on Ubuntu 20.04 and with android-x86 > > > > The proposal I made to Alex is that AMD testing systems will be used > > for further regression testing, > > as part of review and validation for eligibility to amd-staging-drm-next > > > > We will certainly test it once it lands, but presumably this is > working on the SI cards you have access to? > Yes, most of my testing was done with android-x86 Android CTS (EGL, GLES2, GLES3, VK) I am also in contact with a person with Firepro W5130M who is running a piglit session I had bought an HD7850 to test with Pitcairn, but it arrived as defective so I could not test with Pitcair > > > > > > 3. How feature complete (HDMI audio?, Freesync?) is it? > > > > All the changes in DC impacting DCE8 (dc/dce80 path) were ported to > > DCE6 (dc/dce60 path) in the last two years from initial submission > > > > > > > > Apart from that it looks like a rather impressive piece of work :) > > > > > > Cheers, > > > Christian. > > > > Thanks, > > please consider that most of the latest DCE6 specific parts were > > possible due to recent Alex support in getting the correct DCE6 > > headers, > > his suggestions and continuous feedback. > > > > I would suggest that Alex comments on the proposed next steps to follow. > > The code looks pretty good to me. I'd like to get some feedback from > the display team to see if they have any concerns, but beyond that I > think we can pull it into the tree and continue improving it there. > Do you have a link to a git tree I can pull directly that contains > these patches? Is this the right branch? > https://github.com/maurossi/linux/commits/kernel-5.8rc4_si_next > > Thanks! 
> > Alex > The following branch was pushed with the series on top of amd-staging-drm-next https://github.com/maurossi/linux/commits/kernel-5.6_si_drm-next > > > > > Mauro > > > > > > > > Am 16.07.20 um 23:22 schrieb Mauro Rossi: > > > > The series adds SI support to AMD DC > > > > > > > > Changelog: > > > > > > > > [RFC] > > > > Preliminar Proof Of Concept, with DCE8 headers still used in > dce60_resources.c > > > > > > > > [PATCH v2] > > > > Rebase on amd-staging-drm-next dated 17-Oct-2018 > > > > > > > > [PATCH v3] > > > > Add support for DCE6 specific headers, > > > > ad hoc DCE6 macros, funtions and fixes, > > > > rebase on current amd-staging-drm-next > > > > > > > > > > > > Commits [01/27]..[08/27] SI support added in various DC components > > > > > > > > [PATCH v3 01/27] drm/amdgpu: add some required DCE6 registers (v6) > > > > [PATCH v3 02/27] drm/amd/display: add asics info for SI parts > > > > [PATCH v3 03/27] drm/amd/display: dc/dce: add initial DCE6 support > (v9b) > > > > [PATCH v3 04/27] drm/amd/display: dc/core: add SI/DCE6 support (v2) > > > > [PATCH v3 05/27] drm/amd/display: dc/bios: add support for DCE6 > > > > [PATCH v3 06/27] drm/amd/display: dc/gpio: add support for DCE6 (v2) > > > > [PATCH v3 07/27] drm/amd/display: dc/irq: add support for DCE6 (v4) > > > > [PATCH v3 08/27] drm/amd/display: amdgpu_dm: add SI support (v4) > > > > > > > > Commits [09/27]..[24/27] DCE6 specific code adaptions > > > > > > > > [PATCH v3 09/27] drm/amd/display: dc/clk_mgr: add support for SI > parts (v2) > > > > [PATCH v3 10/27] drm/amd/display: dc/dce60: set max_cursor_size to 64 > > > > [PATCH v3 11/27] drm/amd/display: dce_audio: add DCE6 specific > macros,functions > > > > [PATCH v3 12/27] drm/amd/display: dce_dmcu: add DCE6 specific macros > > > > [PATCH v3 13/27] drm/amd/display: dce_hwseq: add DCE6 specific > macros,functions > > > > [PATCH v3 14/27] drm/amd/display: dce_ipp: add DCE6 specific > macros,functions > > > > [PATCH v3 15/27] drm/amd/display: dce_link_encoder: add DCE6 > specific macros,functions > > > > [PATCH v3 16/27] drm/amd/display: dce_mem_input: add DCE6 specific > macros,functions > > > > [PATCH v3 17/27] drm/amd/display: dce_opp: add DCE6 specific > macros,functions > > > > [PATCH v3 18/27] drm/amd/display: dce_transform: add DCE6 specific > macros,functions > > > > [PATCH v3 19/27] drm/amdgpu: add some required DCE6 registers (v7) > > > > [PATCH v3 20/27] drm/amd/display: dce_transform: DCE6 Scaling > Horizontal Filter Init > > > > [PATCH v3 21/27] drm/amd/display: dce60_hw_sequencer: add DCE6 > macros,functions > > > > [PATCH v3 22/27] drm/amd/display: dce60_hw_sequencer: add DCE6 > specific .cursor_lock > > > > [PATCH v3 23/27] drm/amd/display: dce60_timing_generator: add DCE6 > specific functions > > > > [PATCH v3 24/27] drm/amd/display: dc/dce60: use DCE6 headers (v6) > > > > > > > > > > > > Commits [25/27]..[27/27] SI support final enablements > > > > > > > > [PATCH v3 25/27] drm/amd/display: create plane rotation property for > Bonarie and later > > > > [PATCH v3 26/27] drm/amdgpu: enable DC support for SI parts (v2) > > > > [PATCH v3 27/27] drm/amd/display: enable SI support in the Kconfig > (v2) > > > > > > > > > > > > Signed-off-by: Mauro Rossi <issor.oruam@gmail.com> > > > > > > > > _______________________________________________ > > > > amd-gfx mailing list > > > > amd-gfx@lists.freedesktop.org > > > > https://lists.freedesktop.org/mailman/listinfo/amd-gfx > > > > > _______________________________________________ > > amd-gfx mailing list > > 
amd-gfx@lists.freedesktop.org > > https://lists.freedesktop.org/mailman/listinfo/amd-gfx > [-- Attachment #1.2: Type: text/html, Size: 9472 bytes --] [-- Attachment #2: Type: text/plain, Size: 154 bytes --] _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: 2020-07-22 7:56 ` Re: Mauro Rossi @ 2020-07-24 18:31 ` Alex Deucher 2020-07-26 15:31 ` Re: Mauro Rossi 0 siblings, 1 reply; 50+ messages in thread From: Alex Deucher @ 2020-07-24 18:31 UTC (permalink / raw) To: Mauro Rossi Cc: Deucher, Alexander, Harry Wentland, Christian Koenig, amd-gfx list [-- Attachment #1: Type: text/plain, Size: 7470 bytes --] On Wed, Jul 22, 2020 at 3:57 AM Mauro Rossi <issor.oruam@gmail.com> wrote: > > Hello, > re-sending and copying full DL > > On Wed, Jul 22, 2020 at 4:51 AM Alex Deucher <alexdeucher@gmail.com> wrote: >> >> On Mon, Jul 20, 2020 at 6:00 AM Mauro Rossi <issor.oruam@gmail.com> wrote: >> > >> > Hi Christian, >> > >> > On Mon, Jul 20, 2020 at 11:00 AM Christian König >> > <ckoenig.leichtzumerken@gmail.com> wrote: >> > > >> > > Hi Mauro, >> > > >> > > I'm not deep into the whole DC design, so just some general high level >> > > comments on the cover letter: >> > > >> > > 1. Please add a subject line to the cover letter, my spam filter thinks >> > > that this is suspicious otherwise. >> > >> > My mistake in the editing of covert letter with git send-email, >> > I may have forgot to keep the Subject at the top >> > >> > > >> > > 2. Then you should probably note how well (badly?) is that tested. Since >> > > you noted proof of concept it might not even work. >> > >> > The Changelog is to be read as: >> > >> > [RFC] was the initial Proof of concept was the RFC and [PATCH v2] was >> > just a rebase onto amd-staging-drm-next >> > >> > this series [PATCH v3] has all the known changes required for DCE6 specificity >> > and based on a long offline thread with Alexander Deutcher and past >> > dri-devel chats with Harry Wentland. >> > >> > It was tested for my possibilities of testing with HD7750 and HD7950, >> > with checks in dmesg output for not getting "missing registers/masks" >> > kernel WARNING >> > and with kernel build on Ubuntu 20.04 and with android-x86 >> > >> > The proposal I made to Alex is that AMD testing systems will be used >> > for further regression testing, >> > as part of review and validation for eligibility to amd-staging-drm-next >> > >> >> We will certainly test it once it lands, but presumably this is >> working on the SI cards you have access to? > > > Yes, most of my testing was done with android-x86 Android CTS (EGL, GLES2, GLES3, VK) > > I am also in contact with a person with Firepro W5130M who is running a piglit session > > I had bought an HD7850 to test with Pitcairn, but it arrived as defective so I could not test with Pitcair > > >> >> > > >> > > 3. How feature complete (HDMI audio?, Freesync?) is it? >> > >> > All the changes in DC impacting DCE8 (dc/dce80 path) were ported to >> > DCE6 (dc/dce60 path) in the last two years from initial submission >> > >> > > >> > > Apart from that it looks like a rather impressive piece of work :) >> > > >> > > Cheers, >> > > Christian. >> > >> > Thanks, >> > please consider that most of the latest DCE6 specific parts were >> > possible due to recent Alex support in getting the correct DCE6 >> > headers, >> > his suggestions and continuous feedback. >> > >> > I would suggest that Alex comments on the proposed next steps to follow. >> >> The code looks pretty good to me. I'd like to get some feedback from >> the display team to see if they have any concerns, but beyond that I >> think we can pull it into the tree and continue improving it there. >> Do you have a link to a git tree I can pull directly that contains >> these patches? Is this the right branch? 
>> https://github.com/maurossi/linux/commits/kernel-5.8rc4_si_next >> >> Thanks! >> >> Alex > > > The following branch was pushed with the series on top of amd-staging-drm-next > > https://github.com/maurossi/linux/commits/kernel-5.6_si_drm-next I gave this a quick test on all of the SI asics and the various monitors I had available and it looks good. A few minor patches I noticed are attached. If they look good to you, I'll squash them into the series when I commit it. I've pushed it to my fdo tree as well: https://cgit.freedesktop.org/~agd5f/linux/log/?h=si_dc_support Thanks! Alex > >> >> >> > >> > Mauro >> > >> > > >> > > Am 16.07.20 um 23:22 schrieb Mauro Rossi: >> > > > The series adds SI support to AMD DC >> > > > >> > > > Changelog: >> > > > >> > > > [RFC] >> > > > Preliminar Proof Of Concept, with DCE8 headers still used in dce60_resources.c >> > > > >> > > > [PATCH v2] >> > > > Rebase on amd-staging-drm-next dated 17-Oct-2018 >> > > > >> > > > [PATCH v3] >> > > > Add support for DCE6 specific headers, >> > > > ad hoc DCE6 macros, funtions and fixes, >> > > > rebase on current amd-staging-drm-next >> > > > >> > > > >> > > > Commits [01/27]..[08/27] SI support added in various DC components >> > > > >> > > > [PATCH v3 01/27] drm/amdgpu: add some required DCE6 registers (v6) >> > > > [PATCH v3 02/27] drm/amd/display: add asics info for SI parts >> > > > [PATCH v3 03/27] drm/amd/display: dc/dce: add initial DCE6 support (v9b) >> > > > [PATCH v3 04/27] drm/amd/display: dc/core: add SI/DCE6 support (v2) >> > > > [PATCH v3 05/27] drm/amd/display: dc/bios: add support for DCE6 >> > > > [PATCH v3 06/27] drm/amd/display: dc/gpio: add support for DCE6 (v2) >> > > > [PATCH v3 07/27] drm/amd/display: dc/irq: add support for DCE6 (v4) >> > > > [PATCH v3 08/27] drm/amd/display: amdgpu_dm: add SI support (v4) >> > > > >> > > > Commits [09/27]..[24/27] DCE6 specific code adaptions >> > > > >> > > > [PATCH v3 09/27] drm/amd/display: dc/clk_mgr: add support for SI parts (v2) >> > > > [PATCH v3 10/27] drm/amd/display: dc/dce60: set max_cursor_size to 64 >> > > > [PATCH v3 11/27] drm/amd/display: dce_audio: add DCE6 specific macros,functions >> > > > [PATCH v3 12/27] drm/amd/display: dce_dmcu: add DCE6 specific macros >> > > > [PATCH v3 13/27] drm/amd/display: dce_hwseq: add DCE6 specific macros,functions >> > > > [PATCH v3 14/27] drm/amd/display: dce_ipp: add DCE6 specific macros,functions >> > > > [PATCH v3 15/27] drm/amd/display: dce_link_encoder: add DCE6 specific macros,functions >> > > > [PATCH v3 16/27] drm/amd/display: dce_mem_input: add DCE6 specific macros,functions >> > > > [PATCH v3 17/27] drm/amd/display: dce_opp: add DCE6 specific macros,functions >> > > > [PATCH v3 18/27] drm/amd/display: dce_transform: add DCE6 specific macros,functions >> > > > [PATCH v3 19/27] drm/amdgpu: add some required DCE6 registers (v7) >> > > > [PATCH v3 20/27] drm/amd/display: dce_transform: DCE6 Scaling Horizontal Filter Init >> > > > [PATCH v3 21/27] drm/amd/display: dce60_hw_sequencer: add DCE6 macros,functions >> > > > [PATCH v3 22/27] drm/amd/display: dce60_hw_sequencer: add DCE6 specific .cursor_lock >> > > > [PATCH v3 23/27] drm/amd/display: dce60_timing_generator: add DCE6 specific functions >> > > > [PATCH v3 24/27] drm/amd/display: dc/dce60: use DCE6 headers (v6) >> > > > >> > > > >> > > > Commits [25/27]..[27/27] SI support final enablements >> > > > >> > > > [PATCH v3 25/27] drm/amd/display: create plane rotation property for Bonarie and later >> > > > [PATCH v3 26/27] drm/amdgpu: enable DC 
support for SI parts (v2) >> > > > [PATCH v3 27/27] drm/amd/display: enable SI support in the Kconfig (v2) >> > > > >> > > > >> > > > Signed-off-by: Mauro Rossi <issor.oruam@gmail.com> >> > > > >> > > > _______________________________________________ >> > > > amd-gfx mailing list >> > > > amd-gfx@lists.freedesktop.org >> > > > https://lists.freedesktop.org/mailman/listinfo/amd-gfx >> > > >> > _______________________________________________ >> > amd-gfx mailing list >> > amd-gfx@lists.freedesktop.org >> > https://lists.freedesktop.org/mailman/listinfo/amd-gfx [-- Attachment #2: 0002-drm-amdgpu-display-addming-return-type-for-dce60_pro.patch --] [-- Type: text/x-patch, Size: 982 bytes --] From 782fea4387d22686856c87b8ac0491a43a4d944c Mon Sep 17 00:00:00 2001 From: Alex Deucher <alexander.deucher@amd.com> Date: Thu, 23 Jul 2020 21:05:41 -0400 Subject: [PATCH 2/3] drm/amdgpu/display: addming return type for dce60_program_front_end_for_pipe Probably a copy/paste typo. Signed-off-by: Alex Deucher <alexander.deucher@amd.com> --- drivers/gpu/drm/amd/display/dc/dce60/dce60_hw_sequencer.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/display/dc/dce60/dce60_hw_sequencer.c b/drivers/gpu/drm/amd/display/dc/dce60/dce60_hw_sequencer.c index 66e5a1ba2a58..920c7ae29d53 100644 --- a/drivers/gpu/drm/amd/display/dc/dce60/dce60_hw_sequencer.c +++ b/drivers/gpu/drm/amd/display/dc/dce60/dce60_hw_sequencer.c @@ -266,7 +266,7 @@ static void dce60_program_scaler(const struct dc *dc, &pipe_ctx->plane_res.scl_data); } - +static void dce60_program_front_end_for_pipe( struct dc *dc, struct pipe_ctx *pipe_ctx) { -- 2.25.4 [-- Attachment #3: 0003-drm-amdgpu-display-Fix-up-PLL-handling-for-DCE6.patch --] [-- Type: text/x-patch, Size: 1855 bytes --] From 2b18098918717d9ee4c69a47be3527d1cc812b7f Mon Sep 17 00:00:00 2001 From: Alex Deucher <alexander.deucher@amd.com> Date: Fri, 24 Jul 2020 11:41:31 -0400 Subject: [PATCH 3/3] drm/amdgpu/display: Fix up PLL handling for DCE6 DCE6.0 supports 2 PLLs. DCE6.1 supports 3 PLLs. 
Signed-off-by: Alex Deucher <alexander.deucher@amd.com> --- drivers/gpu/drm/amd/display/dc/dce60/dce60_resource.c | 10 +++------- 1 file changed, 3 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/amd/display/dc/dce60/dce60_resource.c b/drivers/gpu/drm/amd/display/dc/dce60/dce60_resource.c index 261333afc936..5a5a9cb77acb 100644 --- a/drivers/gpu/drm/amd/display/dc/dce60/dce60_resource.c +++ b/drivers/gpu/drm/amd/display/dc/dce60/dce60_resource.c @@ -379,7 +379,7 @@ static const struct resource_caps res_cap_61 = { .num_timing_generator = 4, .num_audio = 6, .num_stream_encoder = 6, - .num_pll = 2, + .num_pll = 3, .num_ddc = 6, }; @@ -983,9 +983,7 @@ static bool dce60_construct( dce60_clock_source_create(ctx, bp, CLOCK_SOURCE_ID_PLL0, &clk_src_regs[0], false); pool->base.clock_sources[1] = dce60_clock_source_create(ctx, bp, CLOCK_SOURCE_ID_PLL1, &clk_src_regs[1], false); - pool->base.clock_sources[2] = - dce60_clock_source_create(ctx, bp, CLOCK_SOURCE_ID_PLL2, &clk_src_regs[2], false); - pool->base.clk_src_count = 3; + pool->base.clk_src_count = 2; } else { pool->base.dp_clock_source = @@ -993,9 +991,7 @@ static bool dce60_construct( pool->base.clock_sources[0] = dce60_clock_source_create(ctx, bp, CLOCK_SOURCE_ID_PLL1, &clk_src_regs[1], false); - pool->base.clock_sources[1] = - dce60_clock_source_create(ctx, bp, CLOCK_SOURCE_ID_PLL2, &clk_src_regs[2], false); - pool->base.clk_src_count = 2; + pool->base.clk_src_count = 1; } if (pool->base.dp_clock_source == NULL) { -- 2.25.4 [-- Attachment #4: 0001-drm-amdgpu-display-remove-unused-variable-in-dce60_c.patch --] [-- Type: text/x-patch, Size: 1084 bytes --] From 2ced8e528937051e4d8536718c6dc776e0b46314 Mon Sep 17 00:00:00 2001 From: Alex Deucher <alexander.deucher@amd.com> Date: Thu, 23 Jul 2020 21:02:14 -0400 Subject: [PATCH 1/3] drm/amdgpu/display: remove unused variable in dce60_configure_crc Signed-off-by: Alex Deucher <alexander.deucher@amd.com> --- drivers/gpu/drm/amd/display/dc/dce60/dce60_timing_generator.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/drivers/gpu/drm/amd/display/dc/dce60/dce60_timing_generator.c b/drivers/gpu/drm/amd/display/dc/dce60/dce60_timing_generator.c index 4a5b7a0940c6..fc1af0ff0ca4 100644 --- a/drivers/gpu/drm/amd/display/dc/dce60/dce60_timing_generator.c +++ b/drivers/gpu/drm/amd/display/dc/dce60/dce60_timing_generator.c @@ -192,8 +192,6 @@ static bool dce60_is_tg_enabled(struct timing_generator *tg) bool dce60_configure_crc(struct timing_generator *tg, const struct crc_params *params) { - struct dce110_timing_generator *tg110 = DCE110TG_FROM_TG(tg); - /* Cannot configure crc on a CRTC that is disabled */ if (!dce60_is_tg_enabled(tg)) return false; -- 2.25.4 [-- Attachment #5: Type: text/plain, Size: 154 bytes --] _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ^ permalink raw reply related [flat|nested] 50+ messages in thread
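For context on the PLL fix in the attached patch 3: DCE6.0 hardware has only two PLLs while DCE6.1 has three, so the old code was registering a CLOCK_SOURCE_ID_PLL2 source that does not exist on DCE6.0. Below is a minimal sketch of the resulting accounting, assuming (as the truncated diff context suggests) that the else branch reserves one PLL as the dedicated DP clock source; the helper is illustrative only, not the actual DC code, which builds pool->base.clock_sources from clk_src_regs as shown in the diff:

#include <stdio.h>

/* DCE6.0 exposes PLL0/PLL1 only; DCE6.1 adds PLL2 (num_pll 2 vs 3).
 * A PLL reserved as the dedicated DP clock source is not counted in
 * clk_src_count, which is why the patch ends up with 2 or 1 general
 * clock sources on DCE6.0 depending on how DP is clocked. */
static int dce6_clk_src_count(int is_dce61, int pll_reserved_for_dp)
{
	int plls = is_dce61 ? 3 : 2;

	return pll_reserved_for_dp ? plls - 1 : plls;
}

int main(void)
{
	/* DCE6.0, DP on its own reserved PLL: 1 general clock source. */
	printf("%d\n", dce6_clk_src_count(0, 1));
	/* DCE6.0, DP clocked externally: both PLLs usable, count 2.   */
	printf("%d\n", dce6_clk_src_count(0, 0));
	return 0;
}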
* Re: 2020-07-24 18:31 ` Re: Alex Deucher @ 2020-07-26 15:31 ` Mauro Rossi 2020-07-27 18:31 ` Re: Alex Deucher 0 siblings, 1 reply; 50+ messages in thread From: Mauro Rossi @ 2020-07-26 15:31 UTC (permalink / raw) To: Alex Deucher Cc: Deucher, Alexander, Harry Wentland, Christian Koenig, amd-gfx list [-- Attachment #1.1: Type: text/plain, Size: 9396 bytes --] Hello, On Fri, Jul 24, 2020 at 8:31 PM Alex Deucher <alexdeucher@gmail.com> wrote: > On Wed, Jul 22, 2020 at 3:57 AM Mauro Rossi <issor.oruam@gmail.com> wrote: > > > > Hello, > > re-sending and copying full DL > > > > On Wed, Jul 22, 2020 at 4:51 AM Alex Deucher <alexdeucher@gmail.com> > wrote: > >> > >> On Mon, Jul 20, 2020 at 6:00 AM Mauro Rossi <issor.oruam@gmail.com> > wrote: > >> > > >> > Hi Christian, > >> > > >> > On Mon, Jul 20, 2020 at 11:00 AM Christian König > >> > <ckoenig.leichtzumerken@gmail.com> wrote: > >> > > > >> > > Hi Mauro, > >> > > > >> > > I'm not deep into the whole DC design, so just some general high > level > >> > > comments on the cover letter: > >> > > > >> > > 1. Please add a subject line to the cover letter, my spam filter > thinks > >> > > that this is suspicious otherwise. > >> > > >> > My mistake in the editing of covert letter with git send-email, > >> > I may have forgot to keep the Subject at the top > >> > > >> > > > >> > > 2. Then you should probably note how well (badly?) is that tested. > Since > >> > > you noted proof of concept it might not even work. > >> > > >> > The Changelog is to be read as: > >> > > >> > [RFC] was the initial Proof of concept was the RFC and [PATCH v2] was > >> > just a rebase onto amd-staging-drm-next > >> > > >> > this series [PATCH v3] has all the known changes required for DCE6 > specificity > >> > and based on a long offline thread with Alexander Deutcher and past > >> > dri-devel chats with Harry Wentland. > >> > > >> > It was tested for my possibilities of testing with HD7750 and HD7950, > >> > with checks in dmesg output for not getting "missing registers/masks" > >> > kernel WARNING > >> > and with kernel build on Ubuntu 20.04 and with android-x86 > >> > > >> > The proposal I made to Alex is that AMD testing systems will be used > >> > for further regression testing, > >> > as part of review and validation for eligibility to > amd-staging-drm-next > >> > > >> > >> We will certainly test it once it lands, but presumably this is > >> working on the SI cards you have access to? > > > > > > Yes, most of my testing was done with android-x86 Android CTS (EGL, > GLES2, GLES3, VK) > > > > I am also in contact with a person with Firepro W5130M who is running a > piglit session > > > > I had bought an HD7850 to test with Pitcairn, but it arrived as > defective so I could not test with Pitcair > > > > > >> > >> > > > >> > > 3. How feature complete (HDMI audio?, Freesync?) is it? > >> > > >> > All the changes in DC impacting DCE8 (dc/dce80 path) were ported to > >> > DCE6 (dc/dce60 path) in the last two years from initial submission > >> > > >> > > > >> > > Apart from that it looks like a rather impressive piece of work :) > >> > > > >> > > Cheers, > >> > > Christian. > >> > > >> > Thanks, > >> > please consider that most of the latest DCE6 specific parts were > >> > possible due to recent Alex support in getting the correct DCE6 > >> > headers, > >> > his suggestions and continuous feedback. > >> > > >> > I would suggest that Alex comments on the proposed next steps to > follow. > >> > >> The code looks pretty good to me. 
I'd like to get some feedback from
> >> the display team to see if they have any concerns, but beyond that I
> >> think we can pull it into the tree and continue improving it there.
> >> Do you have a link to a git tree I can pull directly that contains
> >> these patches? Is this the right branch?
> >> https://github.com/maurossi/linux/commits/kernel-5.8rc4_si_next
> >>
> >> Thanks!
> >>
> >> Alex
> >
> >
> > The following branch was pushed with the series on top of
> > amd-staging-drm-next
> >
> > https://github.com/maurossi/linux/commits/kernel-5.6_si_drm-next
>
> I gave this a quick test on all of the SI asics and the various
> monitors I had available and it looks good. A few minor patches I
> noticed are attached. If they look good to you, I'll squash them into
> the series when I commit it. I've pushed it to my fdo tree as well:
> https://cgit.freedesktop.org/~agd5f/linux/log/?h=si_dc_support
>
> Thanks!
>
> Alex
>

The new patches are ok, and with the following information about the piglit tests, the series may be good to go.

I have performed piglit tests on a Tahiti HD7950 on kernel 5.8.0-rc6 with AMD DC support for SI, and a comparison with vanilla kernel 5.8.0-rc6. Results are the following:

[piglit gpu tests with kernel 5.8.0-rc6-amddcsi]
utente@utente-desktop:~/piglit$ ./piglit run gpu .
[26714/26714] skip: 1731, pass: 24669, warn: 15, fail: 288, crash: 11
Thank you for running Piglit! Results have been written to /home/utente/piglit

[piglit gpu tests with vanilla 5.8.0-rc6]
utente@utente-desktop:~/piglit$ ./piglit run gpu .
[26714/26714] skip: 1731, pass: 24673, warn: 13, fail: 283, crash: 14
Thank you for running Piglit! Results have been written to /home/utente/piglit

In the attachment is the comparison of "5.8.0-rc6-amddcsi" vs "5.8.0-rc6" vanilla and vice versa; I see no significant regression, and in the delta of failed tests I don't recognize DC related test cases, but you may also have a look.

dmesg for "5.8.0-rc6-amddcsi" is also provided, to check the crashes.

Regarding the other user testing the series with a Firepro W5130M: he found a pre-existing issue with amdgpu si_support=1 which is independent of my series and matches a problem already reported.
[1] Mauro [1] https://bbs.archlinux.org/viewtopic.php?id=249097 > > > > >> > >> > >> > > >> > Mauro > >> > > >> > > > >> > > Am 16.07.20 um 23:22 schrieb Mauro Rossi: > >> > > > The series adds SI support to AMD DC > >> > > > > >> > > > Changelog: > >> > > > > >> > > > [RFC] > >> > > > Preliminar Proof Of Concept, with DCE8 headers still used in > dce60_resources.c > >> > > > > >> > > > [PATCH v2] > >> > > > Rebase on amd-staging-drm-next dated 17-Oct-2018 > >> > > > > >> > > > [PATCH v3] > >> > > > Add support for DCE6 specific headers, > >> > > > ad hoc DCE6 macros, funtions and fixes, > >> > > > rebase on current amd-staging-drm-next > >> > > > > >> > > > > >> > > > Commits [01/27]..[08/27] SI support added in various DC components > >> > > > > >> > > > [PATCH v3 01/27] drm/amdgpu: add some required DCE6 registers (v6) > >> > > > [PATCH v3 02/27] drm/amd/display: add asics info for SI parts > >> > > > [PATCH v3 03/27] drm/amd/display: dc/dce: add initial DCE6 > support (v9b) > >> > > > [PATCH v3 04/27] drm/amd/display: dc/core: add SI/DCE6 support > (v2) > >> > > > [PATCH v3 05/27] drm/amd/display: dc/bios: add support for DCE6 > >> > > > [PATCH v3 06/27] drm/amd/display: dc/gpio: add support for DCE6 > (v2) > >> > > > [PATCH v3 07/27] drm/amd/display: dc/irq: add support for DCE6 > (v4) > >> > > > [PATCH v3 08/27] drm/amd/display: amdgpu_dm: add SI support (v4) > >> > > > > >> > > > Commits [09/27]..[24/27] DCE6 specific code adaptions > >> > > > > >> > > > [PATCH v3 09/27] drm/amd/display: dc/clk_mgr: add support for SI > parts (v2) > >> > > > [PATCH v3 10/27] drm/amd/display: dc/dce60: set max_cursor_size > to 64 > >> > > > [PATCH v3 11/27] drm/amd/display: dce_audio: add DCE6 specific > macros,functions > >> > > > [PATCH v3 12/27] drm/amd/display: dce_dmcu: add DCE6 specific > macros > >> > > > [PATCH v3 13/27] drm/amd/display: dce_hwseq: add DCE6 specific > macros,functions > >> > > > [PATCH v3 14/27] drm/amd/display: dce_ipp: add DCE6 specific > macros,functions > >> > > > [PATCH v3 15/27] drm/amd/display: dce_link_encoder: add DCE6 > specific macros,functions > >> > > > [PATCH v3 16/27] drm/amd/display: dce_mem_input: add DCE6 > specific macros,functions > >> > > > [PATCH v3 17/27] drm/amd/display: dce_opp: add DCE6 specific > macros,functions > >> > > > [PATCH v3 18/27] drm/amd/display: dce_transform: add DCE6 > specific macros,functions > >> > > > [PATCH v3 19/27] drm/amdgpu: add some required DCE6 registers (v7) > >> > > > [PATCH v3 20/27] drm/amd/display: dce_transform: DCE6 Scaling > Horizontal Filter Init > >> > > > [PATCH v3 21/27] drm/amd/display: dce60_hw_sequencer: add DCE6 > macros,functions > >> > > > [PATCH v3 22/27] drm/amd/display: dce60_hw_sequencer: add DCE6 > specific .cursor_lock > >> > > > [PATCH v3 23/27] drm/amd/display: dce60_timing_generator: add > DCE6 specific functions > >> > > > [PATCH v3 24/27] drm/amd/display: dc/dce60: use DCE6 headers (v6) > >> > > > > >> > > > > >> > > > Commits [25/27]..[27/27] SI support final enablements > >> > > > > >> > > > [PATCH v3 25/27] drm/amd/display: create plane rotation property > for Bonarie and later > >> > > > [PATCH v3 26/27] drm/amdgpu: enable DC support for SI parts (v2) > >> > > > [PATCH v3 27/27] drm/amd/display: enable SI support in the > Kconfig (v2) > >> > > > > >> > > > > >> > > > Signed-off-by: Mauro Rossi <issor.oruam@gmail.com> > >> > > > > >> > > > _______________________________________________ > >> > > > amd-gfx mailing list > >> > > > amd-gfx@lists.freedesktop.org > >> > > > 
https://lists.freedesktop.org/mailman/listinfo/amd-gfx > >> > > > >> > _______________________________________________ > >> > amd-gfx mailing list > >> > amd-gfx@lists.freedesktop.org > >> > https://lists.freedesktop.org/mailman/listinfo/amd-gfx > [-- Attachment #1.2: Type: text/html, Size: 13512 bytes --] [-- Attachment #2: dmesg_kernel-5.8.0-rc6_amddcsi.txt --] [-- Type: text/plain, Size: 87504 bytes --] [ 0.000000] microcode: microcode updated early to revision 0x21, date = 2019-02-13 [ 0.000000] Linux version 5.8.0-050800rc6-generic (kernel@kathleen) (gcc (Ubuntu 9.3.0-13ubuntu1) 9.3.0, GNU ld (GNU Binutils for Ubuntu) 2.34.90.20200716) #202007192331 SMP Sun Jul 19 23:33:45 UTC 2020 [ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-5.8.0-050800rc6-generic root=UUID=833ac3c7-4d08-47b5-807f-9a8ddeb3a8d2 ro quiet splash radeon.si_support=0 amdgpu.si_support=1 vt.handoff=7 [ 0.000000] KERNEL supported cpus: [ 0.000000] Intel GenuineIntel [ 0.000000] AMD AuthenticAMD [ 0.000000] Hygon HygonGenuine [ 0.000000] Centaur CentaurHauls [ 0.000000] zhaoxin Shanghai [ 0.000000] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers' [ 0.000000] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers' [ 0.000000] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers' [ 0.000000] x86/fpu: xstate_offset[2]: 576, xstate_sizes[2]: 256 [ 0.000000] x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'standard' format. [ 0.000000] BIOS-provided physical RAM map: [ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009d7ff] usable [ 0.000000] BIOS-e820: [mem 0x000000000009d800-0x000000000009ffff] reserved [ 0.000000] BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] reserved [ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000dd907fff] usable [ 0.000000] BIOS-e820: [mem 0x00000000dd908000-0x00000000de08cfff] reserved [ 0.000000] BIOS-e820: [mem 0x00000000de08d000-0x00000000de116fff] usable [ 0.000000] BIOS-e820: [mem 0x00000000de117000-0x00000000de1b6fff] ACPI NVS [ 0.000000] BIOS-e820: [mem 0x00000000de1b7000-0x00000000de9a5fff] reserved [ 0.000000] BIOS-e820: [mem 0x00000000de9a6000-0x00000000de9a6fff] usable [ 0.000000] BIOS-e820: [mem 0x00000000de9a7000-0x00000000de9e9fff] ACPI NVS [ 0.000000] BIOS-e820: [mem 0x00000000de9ea000-0x00000000df407fff] usable [ 0.000000] BIOS-e820: [mem 0x00000000df408000-0x00000000df7f0fff] reserved [ 0.000000] BIOS-e820: [mem 0x00000000df7f1000-0x00000000df7fffff] usable [ 0.000000] BIOS-e820: [mem 0x00000000f8000000-0x00000000fbffffff] reserved [ 0.000000] BIOS-e820: [mem 0x00000000fec00000-0x00000000fec00fff] reserved [ 0.000000] BIOS-e820: [mem 0x00000000fed00000-0x00000000fed03fff] reserved [ 0.000000] BIOS-e820: [mem 0x00000000fed1c000-0x00000000fed1ffff] reserved [ 0.000000] BIOS-e820: [mem 0x00000000fee00000-0x00000000fee00fff] reserved [ 0.000000] BIOS-e820: [mem 0x00000000ff000000-0x00000000ffffffff] reserved [ 0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000021effffff] usable [ 0.000000] NX (Execute Disable) protection: active [ 0.000000] SMBIOS 2.7 present. [ 0.000000] DMI: To Be Filled By O.E.M. 
To Be Filled By O.E.M./H77 Pro4/MVP, BIOS P1.70 08/07/2013 [ 0.000000] tsc: Fast TSC calibration using PIT [ 0.000000] tsc: Detected 3392.425 MHz processor [ 0.000891] e820: update [mem 0x00000000-0x00000fff] usable ==> reserved [ 0.000892] e820: remove [mem 0x000a0000-0x000fffff] usable [ 0.000897] last_pfn = 0x21f000 max_arch_pfn = 0x400000000 [ 0.000901] MTRR default type: uncachable [ 0.000901] MTRR fixed ranges enabled: [ 0.000902] 00000-9FFFF write-back [ 0.000903] A0000-BFFFF uncachable [ 0.000903] C0000-CFFFF write-protect [ 0.000904] D0000-E7FFF uncachable [ 0.000904] E8000-FFFFF write-protect [ 0.000905] MTRR variable ranges enabled: [ 0.000906] 0 base 000000000 mask E00000000 write-back [ 0.000907] 1 base 200000000 mask FF0000000 write-back [ 0.000907] 2 base 210000000 mask FF8000000 write-back [ 0.000908] 3 base 218000000 mask FFC000000 write-back [ 0.000908] 4 base 21C000000 mask FFE000000 write-back [ 0.000909] 5 base 21E000000 mask FFF000000 write-back [ 0.000910] 6 base 0E0000000 mask FE0000000 uncachable [ 0.000910] 7 disabled [ 0.000910] 8 disabled [ 0.000911] 9 disabled [ 0.001158] x86/PAT: Configuration [0-7]: WB WC UC- UC WB WP UC- WT [ 0.001265] total RAM covered: 8176M [ 0.001638] Found optimal setting for mtrr clean up [ 0.001639] gran_size: 64K chunk_size: 32M num_reg: 6 lose cover RAM: 0G [ 0.001863] e820: update [mem 0xe0000000-0xffffffff] usable ==> reserved [ 0.001866] last_pfn = 0xdf800 max_arch_pfn = 0x400000000 [ 0.008892] found SMP MP-table at [mem 0x000fd8d0-0x000fd8df] [ 0.030371] check: Scanning 1 areas for low memory corruption [ 0.030771] RAMDISK: [mem 0x3172d000-0x34b8dfff] [ 0.030777] ACPI: Early table checksum verification disabled [ 0.030780] ACPI: RSDP 0x00000000000F0490 000024 (v02 ALASKA) [ 0.030782] ACPI: XSDT 0x00000000DE19B080 00007C (v01 ALASKA A M I 01072009 AMI 00010013) [ 0.030787] ACPI: FACP 0x00000000DE1A4DC0 00010C (v05 ALASKA A M I 01072009 AMI 00010013) [ 0.030791] ACPI: DSDT 0x00000000DE19B190 009C2D (v02 ALASKA A M I 00000022 INTL 20051117) [ 0.030794] ACPI: FACS 0x00000000DE1B5080 000040 [ 0.030796] ACPI: APIC 0x00000000DE1A4ED0 000072 (v03 ALASKA A M I 01072009 AMI 00010013) [ 0.030798] ACPI: FPDT 0x00000000DE1A4F48 000044 (v01 ALASKA A M I 01072009 AMI 00010013) [ 0.030800] ACPI: MCFG 0x00000000DE1A4F90 00003C (v01 ALASKA A M I 01072009 MSFT 00000097) [ 0.030802] ACPI: SSDT 0x00000000DE1A4FD0 0007E1 (v01 Intel_ AoacTabl 00001000 INTL 20091112) [ 0.030804] ACPI: AAFT 0x00000000DE1A57B8 000112 (v01 ALASKA OEMAAFT 01072009 MSFT 00000097) [ 0.030806] ACPI: HPET 0x00000000DE1A58D0 000038 (v01 ALASKA A M I 01072009 AMI. 
00000005) [ 0.030808] ACPI: SSDT 0x00000000DE1A5908 00036D (v01 SataRe SataTabl 00001000 INTL 20091112) [ 0.030811] ACPI: SSDT 0x00000000DE1A5C78 0009AA (v01 PmRef Cpu0Ist 00003000 INTL 20051117) [ 0.030813] ACPI: SSDT 0x00000000DE1A6628 000A92 (v01 PmRef CpuPm 00003000 INTL 20051117) [ 0.030815] ACPI: BGRT 0x00000000DE1A70C0 000038 (v00 ALASKA A M I 01072009 AMI 00010013) [ 0.030822] ACPI: Local APIC address 0xfee00000 [ 0.030892] No NUMA configuration found [ 0.030893] Faking a node at [mem 0x0000000000000000-0x000000021effffff] [ 0.030901] NODE_DATA(0) allocated [mem 0x21efd1000-0x21effafff] [ 0.031211] Zone ranges: [ 0.031211] DMA [mem 0x0000000000001000-0x0000000000ffffff] [ 0.031212] DMA32 [mem 0x0000000001000000-0x00000000ffffffff] [ 0.031213] Normal [mem 0x0000000100000000-0x000000021effffff] [ 0.031214] Device empty [ 0.031214] Movable zone start for each node [ 0.031217] Early memory node ranges [ 0.031217] node 0: [mem 0x0000000000001000-0x000000000009cfff] [ 0.031218] node 0: [mem 0x0000000000100000-0x00000000dd907fff] [ 0.031219] node 0: [mem 0x00000000de08d000-0x00000000de116fff] [ 0.031219] node 0: [mem 0x00000000de9a6000-0x00000000de9a6fff] [ 0.031220] node 0: [mem 0x00000000de9ea000-0x00000000df407fff] [ 0.031220] node 0: [mem 0x00000000df7f1000-0x00000000df7fffff] [ 0.031221] node 0: [mem 0x0000000100000000-0x000000021effffff] [ 0.031310] Zeroed struct page in unavailable ranges: 11428 pages [ 0.031311] Initmem setup node 0 [mem 0x0000000000001000-0x000000021effffff] [ 0.031312] On node 0 totalpages: 2085724 [ 0.031313] DMA zone: 64 pages used for memmap [ 0.031313] DMA zone: 21 pages reserved [ 0.031314] DMA zone: 3996 pages, LIFO batch:0 [ 0.031341] DMA32 zone: 14159 pages used for memmap [ 0.031341] DMA32 zone: 906176 pages, LIFO batch:63 [ 0.042718] Normal zone: 18368 pages used for memmap [ 0.042720] Normal zone: 1175552 pages, LIFO batch:63 [ 0.058107] ACPI: PM-Timer IO Port: 0x408 [ 0.058109] ACPI: Local APIC address 0xfee00000 [ 0.058116] ACPI: LAPIC_NMI (acpi_id[0xff] high edge lint[0x1]) [ 0.058126] IOAPIC[0]: apic_id 2, version 32, address 0xfec00000, GSI 0-23 [ 0.058128] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl) [ 0.058129] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level) [ 0.058130] ACPI: IRQ0 used by override. [ 0.058131] ACPI: IRQ9 used by override. 
[ 0.058133] Using ACPI (MADT) for SMP configuration information
[ 0.058134] ACPI: HPET id: 0x8086a701 base: 0xfed00000
[ 0.058139] TSC deadline timer available
[ 0.058140] smpboot: Allowing 4 CPUs, 0 hotplug CPUs
[ 0.058155] PM: hibernation: Registered nosave memory: [mem 0x00000000-0x00000fff]
[ 0.058157] PM: hibernation: Registered nosave memory: [mem 0x0009d000-0x0009dfff]
[ 0.058157] PM: hibernation: Registered nosave memory: [mem 0x0009e000-0x0009ffff]
[ 0.058157] PM: hibernation: Registered nosave memory: [mem 0x000a0000-0x000dffff]
[ 0.058158] PM: hibernation: Registered nosave memory: [mem 0x000e0000-0x000fffff]
[ 0.058159] PM: hibernation: Registered nosave memory: [mem 0xdd908000-0xde08cfff]
[ 0.058160] PM: hibernation: Registered nosave memory: [mem 0xde117000-0xde1b6fff]
[ 0.058161] PM: hibernation: Registered nosave memory: [mem 0xde1b7000-0xde9a5fff]
[ 0.058162] PM: hibernation: Registered nosave memory: [mem 0xde9a7000-0xde9e9fff]
[ 0.058163] PM: hibernation: Registered nosave memory: [mem 0xdf408000-0xdf7f0fff]
[ 0.058164] PM: hibernation: Registered nosave memory: [mem 0xdf800000-0xf7ffffff]
[ 0.058164] PM: hibernation: Registered nosave memory: [mem 0xf8000000-0xfbffffff]
[ 0.058165] PM: hibernation: Registered nosave memory: [mem 0xfc000000-0xfebfffff]
[ 0.058165] PM: hibernation: Registered nosave memory: [mem 0xfec00000-0xfec00fff]
[ 0.058166] PM: hibernation: Registered nosave memory: [mem 0xfec01000-0xfecfffff]
[ 0.058166] PM: hibernation: Registered nosave memory: [mem 0xfed00000-0xfed03fff]
[ 0.058166] PM: hibernation: Registered nosave memory: [mem 0xfed04000-0xfed1bfff]
[ 0.058167] PM: hibernation: Registered nosave memory: [mem 0xfed1c000-0xfed1ffff]
[ 0.058167] PM: hibernation: Registered nosave memory: [mem 0xfed20000-0xfedfffff]
[ 0.058168] PM: hibernation: Registered nosave memory: [mem 0xfee00000-0xfee00fff]
[ 0.058168] PM: hibernation: Registered nosave memory: [mem 0xfee01000-0xfeffffff]
[ 0.058168] PM: hibernation: Registered nosave memory: [mem 0xff000000-0xffffffff]
[ 0.058170] [mem 0xdf800000-0xf7ffffff] available for PCI devices
[ 0.058171] Booting paravirtualized kernel on bare hardware
[ 0.058173] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645519600211568 ns
[ 0.058179] setup_percpu: NR_CPUS:8192 nr_cpumask_bits:4 nr_cpu_ids:4 nr_node_ids:1
[ 0.058466] percpu: Embedded 56 pages/cpu s192512 r8192 d28672 u524288
[ 0.058471] pcpu-alloc: s192512 r8192 d28672 u524288 alloc=1*2097152
[ 0.058471] pcpu-alloc: [0] 0 1 2 3
[ 0.058495] Built 1 zonelists, mobility grouping on. Total pages: 2053112
[ 0.058495] Policy zone: Normal
[ 0.058497] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-5.8.0-050800rc6-generic root=UUID=833ac3c7-4d08-47b5-807f-9a8ddeb3a8d2 ro quiet splash radeon.si_support=0 amdgpu.si_support=1 vt.handoff=7
[ 0.059394] Dentry cache hash table entries: 1048576 (order: 11, 8388608 bytes, linear)
[ 0.059799] Inode-cache hash table entries: 524288 (order: 10, 4194304 bytes, linear)
[ 0.059841] mem auto-init: stack:off, heap alloc:on, heap free:off
[ 0.098003] Memory: 8041840K/8342896K available (14339K kernel code, 2555K rwdata, 8736K rodata, 2632K init, 4912K bss, 301056K reserved, 0K cma-reserved)
[ 0.098010] random: get_random_u64 called from kmem_cache_open+0x2d/0x410 with crng_init=0
[ 0.098116] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
[ 0.098128] Kernel/User page tables isolation: enabled
[ 0.098144] ftrace: allocating 46071 entries in 180 pages
[ 0.111578] ftrace: allocated 180 pages with 4 groups
[ 0.111684] rcu: Hierarchical RCU implementation.
[ 0.111685] rcu: RCU restricting CPUs from NR_CPUS=8192 to nr_cpu_ids=4.
[ 0.111686] Trampoline variant of Tasks RCU enabled.
[ 0.111686] Rude variant of Tasks RCU enabled.
[ 0.111687] Tracing variant of Tasks RCU enabled.
[ 0.111687] rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies.
[ 0.111688] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=4
[ 0.114400] NR_IRQS: 524544, nr_irqs: 456, preallocated irqs: 16
[ 0.114611] random: crng done (trusting CPU's manufacturer)
[ 0.114630] Console: colour dummy device 80x25
[ 0.114634] printk: console [tty0] enabled
[ 0.114648] ACPI: Core revision 20200528
[ 0.114745] clocksource: hpet: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 133484882848 ns
[ 0.114755] APIC: Switch to symmetric I/O mode setup
[ 0.114825] x2apic: IRQ remapping doesn't support X2APIC mode
[ 0.115236] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
[ 0.134755] clocksource: tsc-early: mask: 0xffffffffffffffff max_cycles: 0x30e65a81c66, max_idle_ns: 440795263477 ns
[ 0.134758] Calibrating delay loop (skipped), value calculated using timer frequency.. 6784.85 BogoMIPS (lpj=13569700)
[ 0.134760] pid_max: default: 32768 minimum: 301
[ 0.134781] LSM: Security Framework initializing
[ 0.134788] Yama: becoming mindful.
[ 0.134810] AppArmor: AppArmor initialized
[ 0.134854] Mount-cache hash table entries: 16384 (order: 5, 131072 bytes, linear)
[ 0.134874] Mountpoint-cache hash table entries: 16384 (order: 5, 131072 bytes, linear)
[ 0.135089] mce: CPU0: Thermal monitoring enabled (TM1)
[ 0.135099] process: using mwait in idle threads
[ 0.135101] Last level iTLB entries: 4KB 512, 2MB 8, 4MB 8
[ 0.135101] Last level dTLB entries: 4KB 512, 2MB 32, 4MB 32, 1GB 0
[ 0.135103] Spectre V1 : Mitigation: usercopy/swapgs barriers and __user pointer sanitization
[ 0.135104] Spectre V2 : Mitigation: Full generic retpoline
[ 0.135105] Spectre V2 : Spectre v2 / SpectreRSB mitigation: Filling RSB on context switch
[ 0.135105] Spectre V2 : Enabling Restricted Speculation for firmware calls
[ 0.135106] Spectre V2 : mitigation: Enabling conditional Indirect Branch Prediction Barrier
[ 0.135107] Speculative Store Bypass: Mitigation: Speculative Store Bypass disabled via prctl and seccomp
[ 0.135109] SRBDS: Vulnerable: No microcode
[ 0.135110] MDS: Mitigation: Clear CPU buffers
[ 0.135279] Freeing SMP alternatives memory: 40K
[ 0.138821] smpboot: CPU0: Intel(R) Core(TM) i5-3570 CPU @ 3.40GHz (family: 0x6, model: 0x3a, stepping: 0x9)
[ 0.138915] Performance Events: PEBS fmt1+, IvyBridge events, 16-deep LBR, full-width counters, Intel PMU driver.
[ 0.138921] ... version: 3
[ 0.138921] ... bit width: 48
[ 0.138922] ... generic registers: 8
[ 0.138922] ... value mask: 0000ffffffffffff
[ 0.138922] ... max period: 00007fffffffffff
[ 0.138923] ... fixed-purpose events: 3
[ 0.138923] ... event mask: 00000007000000ff
[ 0.138953] rcu: Hierarchical SRCU implementation.
[ 0.139601] NMI watchdog: Enabled. Permanently consumes one hw-PMU counter.
[ 0.139647] smp: Bringing up secondary CPUs ...
[ 0.139724] x86: Booting SMP configuration:
[ 0.139725] .... node #0, CPUs: #1 #2 #3
[ 0.146807] smp: Brought up 1 node, 4 CPUs
[ 0.146807] smpboot: Max logical packages: 1
[ 0.146807] smpboot: Total of 4 processors activated (27139.40 BogoMIPS)
[ 0.147882] devtmpfs: initialized
[ 0.147882] x86/mm: Memory block size: 128MB
[ 0.147882] PM: Registering ACPI NVS region [mem 0xde117000-0xde1b6fff] (655360 bytes)
[ 0.147882] PM: Registering ACPI NVS region [mem 0xde9a7000-0xde9e9fff] (274432 bytes)
[ 0.147882] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns
[ 0.147882] futex hash table entries: 1024 (order: 4, 65536 bytes, linear)
[ 0.147882] pinctrl core: initialized pinctrl subsystem
[ 0.147882] PM: RTC time: 12:11:09, date: 2020-07-26
[ 0.147882] thermal_sys: Registered thermal governor 'fair_share'
[ 0.147882] thermal_sys: Registered thermal governor 'bang_bang'
[ 0.147882] thermal_sys: Registered thermal governor 'step_wise'
[ 0.147882] thermal_sys: Registered thermal governor 'user_space'
[ 0.147882] thermal_sys: Registered thermal governor 'power_allocator'
[ 0.147882] NET: Registered protocol family 16
[ 0.147882] DMA: preallocated 1024 KiB GFP_KERNEL pool for atomic allocations
[ 0.147882] DMA: preallocated 1024 KiB GFP_KERNEL|GFP_DMA pool for atomic allocations
[ 0.147882] DMA: preallocated 1024 KiB GFP_KERNEL|GFP_DMA32 pool for atomic allocations
[ 0.147882] audit: initializing netlink subsys (disabled)
[ 0.147882] audit: type=2000 audit(1595765468.032:1): state=initialized audit_enabled=0 res=1
[ 0.147882] EISA bus registered
[ 0.147882] cpuidle: using governor ladder
[ 0.147882] cpuidle: using governor menu
[ 0.147882] ACPI FADT declares the system doesn't support PCIe ASPM, so disable it
[ 0.147882] ACPI: bus type PCI registered
[ 0.147882] acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
[ 0.147882] PCI: MMCONFIG for domain 0000 [bus 00-3f] at [mem 0xf8000000-0xfbffffff] (base 0xf8000000)
[ 0.147882] PCI: MMCONFIG at [mem 0xf8000000-0xfbffffff] reserved in E820
[ 0.147882] PCI: Using configuration type 1 for base access
[ 0.147882] core: PMU erratum BJ122, BV98, HSD29 workaround disabled, HT off
[ 0.147882] ENERGY_PERF_BIAS: Set to 'normal', was 'performance'
[ 0.150780] HugeTLB registered 2.00 MiB page size, pre-allocated 0 pages
[ 0.150835] ACPI: Added _OSI(Module Device)
[ 0.150835] ACPI: Added _OSI(Processor Device)
[ 0.150836] ACPI: Added _OSI(3.0 _SCP Extensions)
[ 0.150837] ACPI: Added _OSI(Processor Aggregator Device)
[ 0.150837] ACPI: Added _OSI(Linux-Dell-Video)
[ 0.150838] ACPI: Added _OSI(Linux-Lenovo-NV-HDMI-Audio)
[ 0.150839] ACPI: Added _OSI(Linux-HPI-Hybrid-Graphics)
[ 0.157639] ACPI: 5 ACPI AML tables successfully acquired and loaded
[ 0.159283] ACPI: Dynamic OEM Table Load:
[ 0.159288] ACPI: SSDT 0xFFFF9176154C7000 00083B (v01 PmRef Cpu0Cst 00003001 INTL 20051117)
[ 0.160055] ACPI: Dynamic OEM Table Load:
[ 0.160059] ACPI: SSDT 0xFFFF9176154BE000 000303 (v01 PmRef ApIst 00003000 INTL 20051117)
[ 0.160633] ACPI: Dynamic OEM Table Load:
[ 0.160636] ACPI: SSDT 0xFFFF917615082400 000119 (v01 PmRef ApCst 00003000 INTL 20051117)
[ 0.161917] ACPI: Interpreter enabled
[ 0.161936] ACPI: (supports S0 S3 S4 S5)
[ 0.161937] ACPI: Using IOAPIC for interrupt routing
[ 0.162004] PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug
[ 0.162243] ACPI: Enabled 16 GPEs in block 00 to 3F
[ 0.167082] ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-3e])
[ 0.167087] acpi PNP0A08:00: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI HPX-Type3]
[ 0.167379] acpi PNP0A08:00: _OSC: platform does not support [PCIeHotplug SHPCHotplug PME LTR]
[ 0.167574] acpi PNP0A08:00: _OSC: OS now controls [AER PCIeCapability]
[ 0.167574] acpi PNP0A08:00: FADT indicates ASPM is unsupported, using BIOS configuration
[ 0.168078] PCI host bridge to bus 0000:00
[ 0.168080] pci_bus 0000:00: root bus resource [io 0x0000-0x0cf7 window]
[ 0.168081] pci_bus 0000:00: root bus resource [io 0x0d00-0xffff window]
[ 0.168082] pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff window]
[ 0.168083] pci_bus 0000:00: root bus resource [mem 0x000d0000-0x000d3fff window]
[ 0.168083] pci_bus 0000:00: root bus resource [mem 0x000d4000-0x000d7fff window]
[ 0.168084] pci_bus 0000:00: root bus resource [mem 0x000d8000-0x000dbfff window]
[ 0.168085] pci_bus 0000:00: root bus resource [mem 0x000dc000-0x000dffff window]
[ 0.168086] pci_bus 0000:00: root bus resource [mem 0x000e0000-0x000e3fff window]
[ 0.168087] pci_bus 0000:00: root bus resource [mem 0x000e4000-0x000e7fff window]
[ 0.168087] pci_bus 0000:00: root bus resource [mem 0xe0000000-0xfeafffff window]
[ 0.168088] pci_bus 0000:00: root bus resource [bus 00-3e]
[ 0.168096] pci 0000:00:00.0: [8086:0150] type 00 class 0x060000
[ 0.168183] pci 0000:00:01.0: [8086:0151] type 01 class 0x060400
[ 0.168215] pci 0000:00:01.0: PME# supported from D0 D3hot D3cold
[ 0.168321] pci 0000:00:14.0: [8086:1e31] type 00 class 0x0c0330
[ 0.168343] pci 0000:00:14.0: reg 0x10: [mem 0xf7f00000-0xf7f0ffff 64bit]
[ 0.168408] pci 0000:00:14.0: PME# supported from D3hot D3cold
[ 0.168494] pci 0000:00:16.0: [8086:1e3a] type 00 class 0x078000
[ 0.168517] pci 0000:00:16.0: reg 0x10: [mem 0xf7f1a000-0xf7f1a00f 64bit]
[ 0.168585] pci 0000:00:16.0: PME# supported from D0 D3hot D3cold
[ 0.168668] pci 0000:00:1a.0: [8086:1e2d] type 00 class 0x0c0320
[ 0.168688] pci 0000:00:1a.0: reg 0x10: [mem 0xf7f18000-0xf7f183ff]
[ 0.168767] pci 0000:00:1a.0: PME# supported from D0 D3hot D3cold
[ 0.168853] pci 0000:00:1b.0: [8086:1e20] type 00 class 0x040300
[ 0.168872] pci 0000:00:1b.0: reg 0x10: [mem 0xf7f10000-0xf7f13fff 64bit]
[ 0.168944] pci 0000:00:1b.0: PME# supported from D0 D3hot D3cold
[ 0.169037] pci 0000:00:1c.0: [8086:1e10] type 01 class 0x060400
[ 0.169192] pci 0000:00:1c.0: PME# supported from D0 D3hot D3cold
[ 0.169312] pci 0000:00:1c.4: [8086:244e] type 01 class 0x060401
[ 0.169401] pci 0000:00:1c.4: PME# supported from D0 D3hot D3cold
[ 0.169496] pci 0000:00:1c.5: [8086:1e1a] type 01 class 0x060400
[ 0.169586] pci 0000:00:1c.5: PME# supported from D0 D3hot D3cold
[ 0.169683] pci 0000:00:1c.7: [8086:1e1e] type 01 class 0x060400
[ 0.169773] pci 0000:00:1c.7: PME# supported from D0 D3hot D3cold
[ 0.169870] pci 0000:00:1d.0: [8086:1e26] type 00 class 0x0c0320
[ 0.169890] pci 0000:00:1d.0: reg 0x10: [mem 0xf7f17000-0xf7f173ff]
[ 0.169969] pci 0000:00:1d.0: PME# supported from D0 D3hot D3cold
[ 0.170057] pci 0000:00:1f.0: [8086:1e4a] type 00 class 0x060100
[ 0.170234] pci 0000:00:1f.2: [8086:1e02] type 00 class 0x010601
[ 0.170250] pci 0000:00:1f.2: reg 0x10: [io 0xf070-0xf077]
[ 0.170256] pci 0000:00:1f.2: reg 0x14: [io 0xf060-0xf063]
[ 0.170263] pci 0000:00:1f.2: reg 0x18: [io 0xf050-0xf057]
[ 0.170269] pci 0000:00:1f.2: reg 0x1c: [io 0xf040-0xf043]
[ 0.170275] pci 0000:00:1f.2: reg 0x20: [io 0xf020-0xf03f]
[ 0.170281] pci 0000:00:1f.2: reg 0x24: [mem 0xf7f16000-0xf7f167ff]
[ 0.170318] pci 0000:00:1f.2: PME# supported from D3hot
[ 0.170397] pci 0000:00:1f.3: [8086:1e22] type 00 class 0x0c0500
[ 0.170413] pci 0000:00:1f.3: reg 0x10: [mem 0xf7f15000-0xf7f150ff 64bit]
[ 0.170431] pci 0000:00:1f.3: reg 0x20: [io 0xf000-0xf01f]
[ 0.170547] pci 0000:01:00.0: [1002:679a] type 00 class 0x030000
[ 0.170558] pci 0000:01:00.0: reg 0x10: [mem 0xe0000000-0xefffffff 64bit pref]
[ 0.170563] pci 0000:01:00.0: reg 0x18: [mem 0xf7e00000-0xf7e3ffff 64bit]
[ 0.170567] pci 0000:01:00.0: reg 0x20: [io 0xe000-0xe0ff]
[ 0.170573] pci 0000:01:00.0: reg 0x30: [mem 0xf7e40000-0xf7e5ffff pref]
[ 0.170576] pci 0000:01:00.0: enabling Extended Tags
[ 0.170602] pci 0000:01:00.0: supports D1 D2
[ 0.170603] pci 0000:01:00.0: PME# supported from D1 D2 D3hot
[ 0.170651] pci 0000:01:00.1: [1002:aaa0] type 00 class 0x040300
[ 0.170661] pci 0000:01:00.1: reg 0x10: [mem 0xf7e60000-0xf7e63fff 64bit]
[ 0.170677] pci 0000:01:00.1: enabling Extended Tags
[ 0.170699] pci 0000:01:00.1: supports D1 D2
[ 0.170743] pci 0000:00:01.0: PCI bridge to [bus 01]
[ 0.170744] pci 0000:00:01.0: bridge window [io 0xe000-0xefff]
[ 0.170746] pci 0000:00:01.0: bridge window [mem 0xf7e00000-0xf7efffff]
[ 0.170748] pci 0000:00:01.0: bridge window [mem 0xe0000000-0xefffffff 64bit pref]
[ 0.174800] pci 0000:00:1c.0: PCI bridge to [bus 02]
[ 0.174868] pci 0000:03:00.0: [1b21:1080] type 01 class 0x060401
[ 0.175050] pci 0000:00:1c.4: PCI bridge to [bus 03-04] (subtractive decode)
[ 0.175059] pci 0000:00:1c.4: bridge window [io 0x0000-0x0cf7 window] (subtractive decode)
[ 0.175060] pci 0000:00:1c.4: bridge window [io 0x0d00-0xffff window] (subtractive decode)
[ 0.175061] pci 0000:00:1c.4: bridge window [mem 0x000a0000-0x000bffff window] (subtractive decode)
[ 0.175062] pci 0000:00:1c.4: bridge window [mem 0x000d0000-0x000d3fff window] (subtractive decode)
[ 0.175063] pci 0000:00:1c.4: bridge window [mem 0x000d4000-0x000d7fff window] (subtractive decode)
[ 0.175063] pci 0000:00:1c.4: bridge window [mem 0x000d8000-0x000dbfff window] (subtractive decode)
[ 0.175065] pci 0000:00:1c.4: bridge window [mem 0x000dc000-0x000dffff window] (subtractive decode)
[ 0.175066] pci 0000:00:1c.4: bridge window [mem 0x000e0000-0x000e3fff window] (subtractive decode)
[ 0.175067] pci 0000:00:1c.4: bridge window [mem 0x000e4000-0x000e7fff window] (subtractive decode)
[ 0.175068] pci 0000:00:1c.4: bridge window [mem 0xe0000000-0xfeafffff window] (subtractive decode)
[ 0.175102] pci_bus 0000:04: extended config space not accessible
[ 0.175181] pci 0000:03:00.0: PCI bridge to [bus 04] (subtractive decode)
[ 0.175201] pci 0000:03:00.0: bridge window [io 0x0000-0x0cf7 window] (subtractive decode)
[ 0.175202] pci 0000:03:00.0: bridge window [io 0x0d00-0xffff window] (subtractive decode)
[ 0.175203] pci 0000:03:00.0: bridge window [mem 0x000a0000-0x000bffff window] (subtractive decode)
[ 0.175204] pci 0000:03:00.0: bridge window [mem 0x000d0000-0x000d3fff window] (subtractive decode)
[ 0.175204] pci 0000:03:00.0: bridge window [mem 0x000d4000-0x000d7fff window] (subtractive decode)
[ 0.175205] pci 0000:03:00.0: bridge window [mem 0x000d8000-0x000dbfff window] (subtractive decode)
[ 0.175206] pci 0000:03:00.0: bridge window [mem 0x000dc000-0x000dffff window] (subtractive decode)
[ 0.175207] pci 0000:03:00.0: bridge window [mem 0x000e0000-0x000e3fff window] (subtractive decode)
[ 0.175208] pci 0000:03:00.0: bridge window [mem 0x000e4000-0x000e7fff window] (subtractive decode)
[ 0.175208] pci 0000:03:00.0: bridge window [mem 0xe0000000-0xfeafffff window] (subtractive decode)
[ 0.175270] pci 0000:05:00.0: [10ec:8168] type 00 class 0x020000
[ 0.175303] pci 0000:05:00.0: reg 0x10: [io 0xd000-0xd0ff]
[ 0.175335] pci 0000:05:00.0: reg 0x18: [mem 0xf0004000-0xf0004fff 64bit pref]
[ 0.175354] pci 0000:05:00.0: reg 0x20: [mem 0xf0000000-0xf0003fff 64bit pref]
[ 0.175479] pci 0000:05:00.0: supports D1 D2
[ 0.175480] pci 0000:05:00.0: PME# supported from D0 D1 D2 D3hot D3cold
[ 0.175606] pci 0000:00:1c.5: PCI bridge to [bus 05]
[ 0.175609] pci 0000:00:1c.5: bridge window [io 0xd000-0xdfff]
[ 0.175616] pci 0000:00:1c.5: bridge window [mem 0xf0000000-0xf00fffff 64bit pref]
[ 0.175666] pci 0000:06:00.0: [1b21:0612] type 00 class 0x010601
[ 0.175694] pci 0000:06:00.0: reg 0x10: [io 0xc050-0xc057]
[ 0.175707] pci 0000:06:00.0: reg 0x14: [io 0xc040-0xc043]
[ 0.175719] pci 0000:06:00.0: reg 0x18: [io 0xc030-0xc037]
[ 0.175731] pci 0000:06:00.0: reg 0x1c: [io 0xc020-0xc023]
[ 0.175743] pci 0000:06:00.0: reg 0x20: [io 0xc000-0xc01f]
[ 0.175756] pci 0000:06:00.0: reg 0x24: [mem 0xf7d00000-0xf7d001ff]
[ 0.175931] pci 0000:00:1c.7: PCI bridge to [bus 06]
[ 0.175933] pci 0000:00:1c.7: bridge window [io 0xc000-0xcfff]
[ 0.175936] pci 0000:00:1c.7: bridge window [mem 0xf7d00000-0xf7dfffff]
[ 0.176585] ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 5 6 10 *11 12 14 15)
[ 0.176646] ACPI: PCI Interrupt Link [LNKB] (IRQs 3 4 5 6 *10 11 12 14 15)
[ 0.176705] ACPI: PCI Interrupt Link [LNKC] (IRQs 3 4 5 6 *10 11 12 14 15)
[ 0.176763] ACPI: PCI Interrupt Link [LNKD] (IRQs 3 4 *5 6 10 11 12 14 15)
[ 0.176822] ACPI: PCI Interrupt Link [LNKE] (IRQs 3 4 5 6 10 11 12 14 15) *0, disabled.
[ 0.176880] ACPI: PCI Interrupt Link [LNKF] (IRQs 3 4 5 6 10 11 12 14 15) *0, disabled.
[ 0.176938] ACPI: PCI Interrupt Link [LNKG] (IRQs *3 4 5 6 10 11 12 14 15)
[ 0.176998] ACPI: PCI Interrupt Link [LNKH] (IRQs 3 4 5 6 10 *11 12 14 15)
[ 0.177169] iommu: Default domain type: Translated
[ 0.177169] pci 0000:01:00.0: vgaarb: setting as boot VGA device
[ 0.177169] pci 0000:01:00.0: vgaarb: VGA device added: decodes=io+mem,owns=io+mem,locks=none
[ 0.177169] pci 0000:01:00.0: vgaarb: bridge control possible
[ 0.177169] vgaarb: loaded
[ 0.177169] SCSI subsystem initialized
[ 0.177169] libata version 3.00 loaded.
[ 0.177169] ACPI: bus type USB registered
[ 0.177169] usbcore: registered new interface driver usbfs
[ 0.177169] usbcore: registered new interface driver hub
[ 0.177169] usbcore: registered new device driver usb
[ 0.177169] pps_core: LinuxPPS API ver. 1 registered
[ 0.177169] pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti <giometti@linux.it>
[ 0.177169] PTP clock support registered
[ 0.177169] EDAC MC: Ver: 3.0.0
[ 0.177169] NetLabel: Initializing
[ 0.177169] NetLabel: domain hash size = 128
[ 0.177169] NetLabel: protocols = UNLABELED CIPSOv4 CALIPSO
[ 0.177169] NetLabel: unlabeled traffic allowed by default
[ 0.177169] PCI: Using ACPI for IRQ routing
[ 0.179063] PCI: pci_cache_line_size set to 64 bytes
[ 0.179109] e820: reserve RAM buffer [mem 0x0009d800-0x0009ffff]
[ 0.179109] e820: reserve RAM buffer [mem 0xdd908000-0xdfffffff]
[ 0.179110] e820: reserve RAM buffer [mem 0xde117000-0xdfffffff]
[ 0.179111] e820: reserve RAM buffer [mem 0xde9a7000-0xdfffffff]
[ 0.179112] e820: reserve RAM buffer [mem 0xdf408000-0xdfffffff]
[ 0.179112] e820: reserve RAM buffer [mem 0xdf800000-0xdfffffff]
[ 0.179113] e820: reserve RAM buffer [mem 0x21f000000-0x21fffffff]
[ 0.179338] hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0, 0, 0, 0, 0, 0
[ 0.179340] hpet0: 8 comparators, 64-bit 14.318180 MHz counter
[ 0.181360] clocksource: Switched to clocksource tsc-early
[ 0.190211] VFS: Disk quotas dquot_6.6.0
[ 0.190224] VFS: Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
[ 0.190309] AppArmor: AppArmor Filesystem Enabled
[ 0.190331] pnp: PnP ACPI init
[ 0.190448] system 00:00: [mem 0xfed40000-0xfed44fff] has been reserved
[ 0.190452] system 00:00: Plug and Play ACPI device, IDs PNP0c01 (active)
[ 0.190536] system 00:01: [io 0x0680-0x069f] has been reserved
[ 0.190537] system 00:01: [io 0x1000-0x100f] has been reserved
[ 0.190538] system 00:01: [io 0xffff] has been reserved
[ 0.190539] system 00:01: [io 0xffff] has been reserved
[ 0.190540] system 00:01: [io 0x0400-0x0453] has been reserved
[ 0.190541] system 00:01: [io 0x0458-0x047f] has been reserved
[ 0.190543] system 00:01: [io 0x0500-0x057f] has been reserved
[ 0.190544] system 00:01: [io 0x164e-0x164f] has been reserved
[ 0.190546] system 00:01: Plug and Play ACPI device, IDs PNP0c02 (active)
[ 0.190567] pnp 00:02: Plug and Play ACPI device, IDs PNP0b00 (active)
[ 0.190613] system 00:03: [io 0x0454-0x0457] has been reserved
[ 0.190616] system 00:03: Plug and Play ACPI device, IDs INT3f0d PNP0c02 (active)
[ 0.190699] system 00:04: [io 0x0290-0x029f] has been reserved
[ 0.190701] system 00:04: Plug and Play ACPI device, IDs PNP0c02 (active)
[ 0.190931] system 00:05: [io 0x04d0-0x04d1] has been reserved
[ 0.190934] system 00:05: Plug and Play ACPI device, IDs PNP0c02 (active)
[ 0.190972] pnp 00:06: Plug and Play ACPI device, IDs PNP0303 PNP030b (active)
[ 0.191125] pnp 00:07: [dma 0 disabled]
[ 0.191160] pnp 00:07: Plug and Play ACPI device, IDs PNP0501 (active)
[ 0.191391] system 00:08: [mem 0xfed1c000-0xfed1ffff] has been reserved
[ 0.191392] system 00:08: [mem 0xfed10000-0xfed17fff] has been reserved
[ 0.191393] system 00:08: [mem 0xfed18000-0xfed18fff] has been reserved
[ 0.191394] system 00:08: [mem 0xfed19000-0xfed19fff] has been reserved
[ 0.191395] system 00:08: [mem 0xf8000000-0xfbffffff] has been reserved
[ 0.191396] system 00:08: [mem 0xfed20000-0xfed3ffff] has been reserved
[ 0.191397] system 00:08: [mem 0xfed90000-0xfed93fff] has been reserved
[ 0.191398] system 00:08: [mem 0xfed45000-0xfed8ffff] has been reserved
[ 0.191399] system 00:08: [mem 0xff000000-0xffffffff] has been reserved
[ 0.191400] system 00:08: [mem 0xfee00000-0xfeefffff] could not be reserved
[ 0.191401] system 00:08: [mem 0xf0100000-0xf0100fff] has been reserved
[ 0.191403] system 00:08: Plug and Play ACPI device, IDs PNP0c02 (active)
[ 0.191561] pnp: PnP ACPI: found 9 devices
[ 0.197011] clocksource: acpi_pm: mask: 0xffffff max_cycles: 0xffffff, max_idle_ns: 2085701024 ns
[ 0.197057] NET: Registered protocol family 2
[ 0.197198] tcp_listen_portaddr_hash hash table entries: 4096 (order: 4, 65536 bytes, linear)
[ 0.197256] TCP established hash table entries: 65536 (order: 7, 524288 bytes, linear)
[ 0.197413] TCP bind hash table entries: 65536 (order: 8, 1048576 bytes, linear)
[ 0.197478] TCP: Hash tables configured (established 65536 bind 65536)
[ 0.197550] UDP hash table entries: 4096 (order: 5, 131072 bytes, linear)
[ 0.197573] UDP-Lite hash table entries: 4096 (order: 5, 131072 bytes, linear)
[ 0.197615] NET: Registered protocol family 1
[ 0.197619] NET: Registered protocol family 44
[ 0.197629] pci 0000:00:01.0: PCI bridge to [bus 01]
[ 0.197631] pci 0000:00:01.0: bridge window [io 0xe000-0xefff]
[ 0.197633] pci 0000:00:01.0: bridge window [mem 0xf7e00000-0xf7efffff]
[ 0.197635] pci 0000:00:01.0: bridge window [mem 0xe0000000-0xefffffff 64bit pref]
[ 0.197637] pci 0000:00:1c.0: PCI bridge to [bus 02]
[ 0.197654] pci 0000:03:00.0: PCI bridge to [bus 04]
[ 0.197672] pci 0000:00:1c.4: PCI bridge to [bus 03-04]
[ 0.197682] pci 0000:00:1c.5: PCI bridge to [bus 05]
[ 0.197683] pci 0000:00:1c.5: bridge window [io 0xd000-0xdfff]
[ 0.197689] pci 0000:00:1c.5: bridge window [mem 0xf0000000-0xf00fffff 64bit pref]
[ 0.197694] pci 0000:00:1c.7: PCI bridge to [bus 06]
[ 0.197695] pci 0000:00:1c.7: bridge window [io 0xc000-0xcfff]
[ 0.197699] pci 0000:00:1c.7: bridge window [mem 0xf7d00000-0xf7dfffff]
[ 0.197706] pci_bus 0000:00: resource 4 [io 0x0000-0x0cf7 window]
[ 0.197707] pci_bus 0000:00: resource 5 [io 0x0d00-0xffff window]
[ 0.197708] pci_bus 0000:00: resource 6 [mem 0x000a0000-0x000bffff window]
[ 0.197709] pci_bus 0000:00: resource 7 [mem 0x000d0000-0x000d3fff window]
[ 0.197710] pci_bus 0000:00: resource 8 [mem 0x000d4000-0x000d7fff window]
[ 0.197711] pci_bus 0000:00: resource 9 [mem 0x000d8000-0x000dbfff window]
[ 0.197712] pci_bus 0000:00: resource 10 [mem 0x000dc000-0x000dffff window]
[ 0.197712] pci_bus 0000:00: resource 11 [mem 0x000e0000-0x000e3fff window]
[ 0.197713] pci_bus 0000:00: resource 12 [mem 0x000e4000-0x000e7fff window]
[ 0.197714] pci_bus 0000:00: resource 13 [mem 0xe0000000-0xfeafffff window]
[ 0.197715] pci_bus 0000:01: resource 0 [io 0xe000-0xefff]
[ 0.197716] pci_bus 0000:01: resource 1 [mem 0xf7e00000-0xf7efffff]
[ 0.197716] pci_bus 0000:01: resource 2 [mem 0xe0000000-0xefffffff 64bit pref]
[ 0.197718] pci_bus 0000:03: resource 4 [io 0x0000-0x0cf7 window]
[ 0.197718] pci_bus 0000:03: resource 5 [io 0x0d00-0xffff window]
[ 0.197719] pci_bus 0000:03: resource 6 [mem 0x000a0000-0x000bffff window]
[ 0.197720] pci_bus 0000:03: resource 7 [mem 0x000d0000-0x000d3fff window]
[ 0.197721] pci_bus 0000:03: resource 8 [mem 0x000d4000-0x000d7fff window]
[ 0.197722] pci_bus 0000:03: resource 9 [mem 0x000d8000-0x000dbfff window]
[ 0.197722] pci_bus 0000:03: resource 10 [mem 0x000dc000-0x000dffff window]
[ 0.197723] pci_bus 0000:03: resource 11 [mem 0x000e0000-0x000e3fff window]
[ 0.197724] pci_bus 0000:03: resource 12 [mem 0x000e4000-0x000e7fff window]
[ 0.197725] pci_bus 0000:03: resource 13 [mem 0xe0000000-0xfeafffff window]
[ 0.197726] pci_bus 0000:04: resource 4 [io 0x0000-0x0cf7 window]
[ 0.197726] pci_bus 0000:04: resource 5 [io 0x0d00-0xffff window]
[ 0.197727] pci_bus 0000:04: resource 6 [mem 0x000a0000-0x000bffff window]
[ 0.197728] pci_bus 0000:04: resource 7 [mem 0x000d0000-0x000d3fff window]
[ 0.197729] pci_bus 0000:04: resource 8 [mem 0x000d4000-0x000d7fff window]
[ 0.197730] pci_bus 0000:04: resource 9 [mem 0x000d8000-0x000dbfff window]
[ 0.197730] pci_bus 0000:04: resource 10 [mem 0x000dc000-0x000dffff window]
[ 0.197731] pci_bus 0000:04: resource 11 [mem 0x000e0000-0x000e3fff window]
[ 0.197732] pci_bus 0000:04: resource 12 [mem 0x000e4000-0x000e7fff window]
[ 0.197733] pci_bus 0000:04: resource 13 [mem 0xe0000000-0xfeafffff window]
[ 0.197734] pci_bus 0000:05: resource 0 [io 0xd000-0xdfff]
[ 0.197735] pci_bus 0000:05: resource 2 [mem 0xf0000000-0xf00fffff 64bit pref]
[ 0.197735] pci_bus 0000:06: resource 0 [io 0xc000-0xcfff]
[ 0.197736] pci_bus 0000:06: resource 1 [mem 0xf7d00000-0xf7dfffff]
[ 0.222890] pci 0000:00:1a.0: quirk_usb_early_handoff+0x0/0x662 took 24314 usecs
[ 0.246886] pci 0000:00:1d.0: quirk_usb_early_handoff+0x0/0x662 took 23419 usecs
[ 0.246898] pci 0000:01:00.0: Video device with shadowed ROM at [mem 0x000c0000-0x000dffff]
[ 0.246903] pci 0000:01:00.1: D0 power state depends on 0000:01:00.0
[ 0.246910] pci 0000:03:00.0: CLS mismatch (64 != 32), using 64 bytes
[ 0.246965] Trying to unpack rootfs image as initramfs...
[ 0.364777] Freeing initrd memory: 53636K
[ 0.364812] PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
[ 0.364814] software IO TLB: mapped [mem 0xd6600000-0xda600000] (64MB)
[ 0.365043] check: Scanning for low memory corruption every 60 seconds
[ 0.365380] Initialise system trusted keyrings
[ 0.365388] Key type blacklist registered
[ 0.365410] workingset: timestamp_bits=36 max_order=21 bucket_order=0
[ 0.366383] zbud: loaded
[ 0.366576] squashfs: version 4.0 (2009/01/31) Phillip Lougher
[ 0.366695] fuse: init (API version 7.31)
[ 0.366801] integrity: Platform Keyring initialized
[ 0.375417] Key type asymmetric registered
[ 0.375418] Asymmetric key parser 'x509' registered
[ 0.375424] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 244)
[ 0.375457] io scheduler mq-deadline registered
[ 0.376279] shpchp: Standard Hot Plug PCI Controller Driver version: 0.4
[ 0.376316] vesafb: mode is 1280x1024x32, linelength=5120, pages=0
[ 0.376317] vesafb: scrolling: redraw
[ 0.376318] vesafb: Truecolor: size=0:8:8:8, shift=0:16:8:0
[ 0.376332] vesafb: framebuffer at 0xe0000000, mapped to 0x0000000076879528, using 5120k, total 5120k
[ 0.376360] fbcon: Deferring console take-over
[ 0.376361] fb0: VESA VGA frame buffer device
[ 0.376369] intel_idle: MWAIT substates: 0x1120
[ 0.376370] intel_idle: v0.5.1 model 0x3A
[ 0.376490] intel_idle: Local APIC timer is reliable in all C-states
[ 0.376584] input: Power Button as /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0C0C:00/input/input0
[ 0.376600] ACPI: Power Button [PWRB]
[ 0.376625] input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input1
[ 0.376649] ACPI: Power Button [PWRF]
[ 0.376964] Serial: 8250/16550 driver, 32 ports, IRQ sharing enabled
[ 0.397473] 00:07: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 115200) is a 16550A
[ 0.399205] Linux agpgart interface v0.103
[ 0.401124] loop: module loaded
[ 0.401322] libphy: Fixed MDIO Bus: probed
[ 0.401323] tun: Universal TUN/TAP device driver, 1.6
[ 0.401342] PPP generic driver version 2.4.2
[ 0.401375] VFIO - User Level meta-driver version: 0.3
[ 0.401445] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
[ 0.401447] ehci-pci: EHCI PCI platform driver
[ 0.401544] ehci-pci 0000:00:1a.0: EHCI Host Controller
[ 0.401548] ehci-pci 0000:00:1a.0: new USB bus registered, assigned bus number 1
[ 0.401558] ehci-pci 0000:00:1a.0: debug port 2
[ 0.405470] ehci-pci 0000:00:1a.0: cache line size of 64 is not supported
[ 0.405481] ehci-pci 0000:00:1a.0: irq 16, io mem 0xf7f18000
[ 0.418782] ehci-pci 0000:00:1a.0: USB 2.0 started, EHCI 1.00
[ 0.418858] usb usb1: New USB device found, idVendor=1d6b, idProduct=0002, bcdDevice= 5.08
[ 0.418859] usb usb1: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[ 0.418860] usb usb1: Product: EHCI Host Controller
[ 0.418861] usb usb1: Manufacturer: Linux 5.8.0-050800rc6-generic ehci_hcd
[ 0.418861] usb usb1: SerialNumber: 0000:00:1a.0
[ 0.419022] hub 1-0:1.0: USB hub found
[ 0.419029] hub 1-0:1.0: 2 ports detected
[ 0.419235] ehci-pci 0000:00:1d.0: EHCI Host Controller
[ 0.419238] ehci-pci 0000:00:1d.0: new USB bus registered, assigned bus number 2
[ 0.419247] ehci-pci 0000:00:1d.0: debug port 2
[ 0.423140] ehci-pci 0000:00:1d.0: cache line size of 64 is not supported
[ 0.423147] ehci-pci 0000:00:1d.0: irq 23, io mem 0xf7f17000
[ 0.438781] ehci-pci 0000:00:1d.0: USB 2.0 started, EHCI 1.00
[ 0.438850] usb usb2: New USB device found, idVendor=1d6b, idProduct=0002, bcdDevice= 5.08
[ 0.438851] usb usb2: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[ 0.438852] usb usb2: Product: EHCI Host Controller
[ 0.438853] usb usb2: Manufacturer: Linux 5.8.0-050800rc6-generic ehci_hcd
[ 0.438854] usb usb2: SerialNumber: 0000:00:1d.0
[ 0.439011] hub 2-0:1.0: USB hub found
[ 0.439017] hub 2-0:1.0: 2 ports detected
[ 0.439137] ehci-platform: EHCI generic platform driver
[ 0.439143] ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
[ 0.439146] ohci-pci: OHCI PCI platform driver
[ 0.439153] ohci-platform: OHCI generic platform driver
[ 0.439157] uhci_hcd: USB Universal Host Controller Interface driver
[ 0.439197] i8042: PNP: PS/2 Controller [PNP0303:PS2K] at 0x60,0x64 irq 1
[ 0.439197] i8042: PNP: PS/2 appears to have AUX port disabled, if this is incorrect please boot with i8042.nopnp
[ 0.439680] serio: i8042 KBD port at 0x60,0x64 irq 1
[ 0.439888] mousedev: PS/2 mouse device common for all mice
[ 0.440215] rtc_cmos 00:02: RTC can wake from S4
[ 0.440407] rtc_cmos 00:02: registered as rtc0
[ 0.440466] rtc_cmos 00:02: setting system clock to 2020-07-26T12:11:09 UTC (1595765469)
[ 0.440478] rtc_cmos 00:02: alarms up to one month, y3k, 242 bytes nvram, hpet irqs
[ 0.440483] i2c /dev entries driver
[ 0.440513] device-mapper: uevent: version 1.0.3
[ 0.440556] device-mapper: ioctl: 4.42.0-ioctl (2020-02-27) initialised: dm-devel@redhat.com
[ 0.440571] platform eisa.0: Probing EISA bus 0
[ 0.440572] platform eisa.0: EISA: Cannot allocate resource for mainboard
[ 0.440573] platform eisa.0: Cannot allocate resource for EISA slot 1
[ 0.440574] platform eisa.0: Cannot allocate resource for EISA slot 2
[ 0.440574] platform eisa.0: Cannot allocate resource for EISA slot 3
[ 0.440575] platform eisa.0: Cannot allocate resource for EISA slot 4
[ 0.440576] platform eisa.0: Cannot allocate resource for EISA slot 5
[ 0.440577] platform eisa.0: Cannot allocate resource for EISA slot 6
[ 0.440577] platform eisa.0: Cannot allocate resource for EISA slot 7
[ 0.440578] platform eisa.0: Cannot allocate resource for EISA slot 8
[ 0.440579] platform eisa.0: EISA: Detected 0 cards
[ 0.440583] intel_pstate: Intel P-state driver initializing
[ 0.440811] ledtrig-cpu: registered to indicate activity on CPUs
[ 0.440850] drop_monitor: Initializing network drop monitor service
[ 0.440988] NET: Registered protocol family 10
[ 0.446148] Segment Routing with IPv6
[ 0.446163] NET: Registered protocol family 17
[ 0.446230] Key type dns_resolver registered
[ 0.446481] microcode: sig=0x306a9, pf=0x2, revision=0x21
[ 0.446529] microcode: Microcode Update Driver: v2.2.
[ 0.446532] IPI shorthand broadcast: enabled
[ 0.446537] sched_clock: Marking stable (446362121, 164421)->(452023091, -5496549)
[ 0.446596] registered taskstats version 1
[ 0.446605] Loading compiled-in X.509 certificates
[ 0.447191] Loaded X.509 cert 'Build time autogenerated kernel key: f5ed095bb538b9d2a07de73aa8b3b326e45d53f0'
[ 0.447219] zswap: loaded using pool lzo/zbud
[ 0.447327] Key type ._fscrypt registered
[ 0.447328] Key type .fscrypt registered
[ 0.447328] Key type fscrypt-provisioning registered
[ 0.449435] Key type encrypted registered
[ 0.449437] AppArmor: AppArmor sha1 policy hashing enabled
[ 0.449442] ima: No TPM chip found, activating TPM-bypass!
[ 0.449445] ima: Allocated hash algorithm: sha1
[ 0.449452] ima: No architecture policies found
[ 0.449462] evm: Initialising EVM extended attributes:
[ 0.449462] evm: security.selinux
[ 0.449463] evm: security.SMACK64
[ 0.449463] evm: security.SMACK64EXEC
[ 0.449463] evm: security.SMACK64TRANSMUTE
[ 0.449464] evm: security.SMACK64MMAP
[ 0.449464] evm: security.apparmor
[ 0.449464] evm: security.ima
[ 0.449464] evm: security.capability
[ 0.449465] evm: HMAC attrs: 0x1
[ 0.449711] PM: Magic number: 12:847:178
[ 0.449746] acpi device:0e: hash matches
[ 0.449762] platform: hash matches
[ 0.449851] RAS: Correctable Errors collector initialized.
[ 0.450788] Freeing unused decrypted memory: 2040K
[ 0.451226] Freeing unused kernel image (initmem) memory: 2632K
[ 0.464247] input: AT Translated Set 2 keyboard as /devices/platform/i8042/serio0/input/input2
[ 0.470785] Write protecting the kernel read-only data: 26624k
[ 0.471421] Freeing unused kernel image (text/rodata gap) memory: 2044K
[ 0.471711] Freeing unused kernel image (rodata/data gap) memory: 1504K
[ 0.511328] x86/mm: Checked W+X mappings: passed, no W+X pages found.
[ 0.511329] x86/mm: Checking user space page tables
[ 0.550008] x86/mm: Checked W+X mappings: passed, no W+X pages found.
[ 0.550011] Run /init as init process
[ 0.550012] with arguments:
[ 0.550012] /init
[ 0.550013] splash
[ 0.550013] with environment:
[ 0.550014] HOME=/
[ 0.550014] TERM=linux
[ 0.550014] BOOT_IMAGE=/boot/vmlinuz-5.8.0-050800rc6-generic
[ 0.616201] xhci_hcd 0000:00:14.0: xHCI Host Controller
[ 0.616206] xhci_hcd 0000:00:14.0: new USB bus registered, assigned bus number 3
[ 0.617408] xhci_hcd 0000:00:14.0: hcc params 0x20007181 hci version 0x100 quirks 0x000000000000b930
[ 0.617412] xhci_hcd 0000:00:14.0: cache line size of 64 is not supported
[ 0.617453] ACPI Warning: SystemIO range 0x0000000000000428-0x000000000000042F conflicts with OpRegion 0x0000000000000400-0x000000000000047F (\PMIO) (20200528/utaddress-204)
[ 0.617458] ACPI: If an ACPI driver is available for this device, you should use it instead of the native driver
[ 0.617460] ACPI Warning: SystemIO range 0x0000000000000540-0x000000000000054F conflicts with OpRegion 0x0000000000000500-0x000000000000057F (\GPR2) (20200528/utaddress-204)
[ 0.617463] ACPI Warning: SystemIO range 0x0000000000000540-0x000000000000054F conflicts with OpRegion 0x0000000000000500-0x0000000000000563 (\GPIO) (20200528/utaddress-204)
[ 0.617465] ACPI: If an ACPI driver is available for this device, you should use it instead of the native driver
[ 0.617465] ACPI Warning: SystemIO range 0x0000000000000530-0x000000000000053F conflicts with OpRegion 0x0000000000000500-0x000000000000057F (\GPR2) (20200528/utaddress-204)
[ 0.617467] ACPI Warning: SystemIO range 0x0000000000000530-0x000000000000053F conflicts with OpRegion 0x0000000000000500-0x0000000000000563 (\GPIO) (20200528/utaddress-204)
[ 0.617469] ACPI: If an ACPI driver is available for this device, you should use it instead of the native driver
[ 0.617469] ACPI Warning: SystemIO range 0x0000000000000500-0x000000000000052F conflicts with OpRegion 0x0000000000000500-0x000000000000057F (\GPR2) (20200528/utaddress-204)
[ 0.617471] ACPI Warning: SystemIO range 0x0000000000000500-0x000000000000052F conflicts with OpRegion 0x0000000000000500-0x0000000000000563 (\GPIO) (20200528/utaddress-204)
[ 0.617473] ACPI: If an ACPI driver is available for this device, you should use it instead of the native driver
[ 0.617473] lpc_ich: Resource conflict(s) found affecting gpio_ich
[ 0.617550] usb usb3: New USB device found, idVendor=1d6b, idProduct=0002, bcdDevice= 5.08
[ 0.617551] usb usb3: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[ 0.617551] usb usb3: Product: xHCI Host Controller
[ 0.617552] usb usb3: Manufacturer: Linux 5.8.0-050800rc6-generic xhci-hcd
[ 0.617553] usb usb3: SerialNumber: 0000:00:14.0
[ 0.619611] ahci 0000:00:1f.2: version 3.0
[ 0.619698] r8169 0000:05:00.0: can't disable ASPM; OS doesn't have ASPM control
[ 0.619813] hub 3-0:1.0: USB hub found
[ 0.620778] hub 3-0:1.0: 4 ports detected
[ 0.630937] ahci 0000:00:1f.2: AHCI 0001.0300 32 slots 6 ports 6 Gbps 0x3f impl SATA mode
[ 0.630939] ahci 0000:00:1f.2: flags: 64bit ncq pm led clo pio slum part ems apst
[ 0.636087] i801_smbus 0000:00:1f.3: SMBus using PCI interrupt
[ 0.636977] xhci_hcd 0000:00:14.0: xHCI Host Controller
[ 0.636980] xhci_hcd 0000:00:14.0: new USB bus registered, assigned bus number 4
[ 0.636982] xhci_hcd 0000:00:14.0: Host supports USB 3.0 SuperSpeed
[ 0.637007] i2c i2c-0: 2/4 memory slots populated (from DMI)
[ 0.637019] usb usb4: New USB device found, idVendor=1d6b, idProduct=0003, bcdDevice= 5.08
[ 0.637020] usb usb4: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[ 0.637021] usb usb4: Product: xHCI Host Controller
[ 0.637021] usb usb4: Manufacturer: Linux 5.8.0-050800rc6-generic xhci-hcd
[ 0.637022] usb usb4: SerialNumber: 0000:00:14.0
[ 0.637102] hub 4-0:1.0: USB hub found
[ 0.637109] hub 4-0:1.0: 4 ports detected
[ 0.637356] i2c i2c-0: Successfully instantiated SPD at 0x50
[ 0.637656] i2c i2c-0: Successfully instantiated SPD at 0x51
[ 0.650843] libphy: r8169: probed
[ 0.659022] r8169 0000:05:00.0 eth0: RTL8168evl/8111evl, bc:5f:f4:99:82:b4, XID 2c9, IRQ 31
[ 0.659023] r8169 0000:05:00.0 eth0: jumbo features [frames: 9194 bytes, tx checksumming: ko]
[ 0.695313] scsi host0: ahci
[ 0.695501] scsi host1: ahci
[ 0.695605] scsi host2: ahci
[ 0.695702] scsi host3: ahci
[ 0.695832] scsi host4: ahci
[ 0.695947] scsi host5: ahci
[ 0.695978] ata1: SATA max UDMA/133 abar m2048@0xf7f16000 port 0xf7f16100 irq 30
[ 0.695979] ata2: SATA max UDMA/133 abar m2048@0xf7f16000 port 0xf7f16180 irq 30
[ 0.695981] ata3: SATA max UDMA/133 abar m2048@0xf7f16000 port 0xf7f16200 irq 30
[ 0.695982] ata4: SATA max UDMA/133 abar m2048@0xf7f16000 port 0xf7f16280 irq 30
[ 0.695983] ata5: SATA max UDMA/133 abar m2048@0xf7f16000 port 0xf7f16300 irq 30
[ 0.695984] ata6: SATA max UDMA/133 abar m2048@0xf7f16000 port 0xf7f16380 irq 30
[ 0.696142] ahci 0000:06:00.0: SSS flag set, parallel bus scan disabled
[ 0.696180] ahci 0000:06:00.0: AHCI 0001.0200 32 slots 2 ports 6 Gbps 0x3 impl SATA mode
[ 0.696181] ahci 0000:06:00.0: flags: 64bit ncq sntf stag led clo pmp pio slum part ccc sxs
[ 0.696361] scsi host6: ahci
[ 0.696415] scsi host7: ahci
[ 0.696446] ata7: SATA max UDMA/133 abar m512@0xf7d00000 port 0xf7d00100 irq 32
[ 0.696448] ata8: SATA max UDMA/133 abar m512@0xf7d00000 port 0xf7d00180 irq 32
[ 0.754782] usb 1-1: new high-speed USB device number 2 using ehci-pci
[ 0.774790] usb 2-1: new high-speed USB device number 2 using ehci-pci
[ 0.911507] usb 1-1: New USB device found, idVendor=8087, idProduct=0024, bcdDevice= 0.00
[ 0.911508] usb 1-1: New USB device strings: Mfr=0, Product=0, SerialNumber=0
[ 0.911849] hub 1-1:1.0: USB hub found
[ 0.912053] hub 1-1:1.0: 6 ports detected
[ 0.931162] usb 2-1: New USB device found, idVendor=8087, idProduct=0024, bcdDevice= 0.00
[ 0.931165] usb 2-1: New USB device strings: Mfr=0, Product=0, SerialNumber=0
[ 0.931557] hub 2-1:1.0: USB hub found
[ 0.931651] hub 2-1:1.0: 8 ports detected
[ 1.010804] ata7: SATA link down (SStatus 0 SControl 300)
[ 1.010808] ata6: SATA link down (SStatus 0 SControl 300)
[ 1.010836] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 1.010857] ata5: SATA link down (SStatus 0 SControl 300)
[ 1.010883] ata2: SATA link down (SStatus 0 SControl 300)
[ 1.010895] ata1: SATA link down (SStatus 0 SControl 300)
[ 1.010908] ata4: SATA link down (SStatus 0 SControl 300)
[ 1.012014] ata3.00: ACPI cmd ef/10:06:00:00:00:00 (SET FEATURES) succeeded
[ 1.012018] ata3.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
[ 1.012020] ata3.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
[ 1.047436] ata3.00: ATA-7: ST3360320AS, 3.AAM, max UDMA/133
[ 1.047437] ata3.00: 703282608 sectors, multi 16: LBA48 NCQ (depth 32)
[ 1.073177] ata3.00: ACPI cmd ef/10:06:00:00:00:00 (SET FEATURES) succeeded
[ 1.073180] ata3.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
[ 1.073183] ata3.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
[ 1.105743] ata3.00: configured for UDMA/133
[ 1.105861] scsi 2:0:0:0: Direct-Access ATA ST3360320AS M PQ: 0 ANSI: 5
[ 1.106002] sd 2:0:0:0: Attached scsi generic sg0 type 0
[ 1.106029] sd 2:0:0:0: [sda] 703282608 512-byte logical blocks: (360 GB/335 GiB)
[ 1.106036] sd 2:0:0:0: [sda] Write Protect is off
[ 1.106037] sd 2:0:0:0: [sda] Mode Sense: 00 3a 00 00
[ 1.106050] sd 2:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 1.173751] sda: sda1 sda2
[ 1.174077] sd 2:0:0:0: [sda] Attached SCSI disk
[ 1.178771] usb 1-1.5: new low-speed USB device number 3 using ehci-pci
[ 1.302266] usb 1-1.5: New USB device found, idVendor=045e, idProduct=0040, bcdDevice= 1.21
[ 1.302269] usb 1-1.5: New USB device strings: Mfr=1, Product=2, SerialNumber=0
[ 1.302279] usb 1-1.5: Product: Microsoft Wheel Mouse Optical®
[ 1.302280] usb 1-1.5: Manufacturer: Microsoft
[ 1.306529] hid: raw HID events driver (C) Jiri Kosina
[ 1.313170] usbcore: registered new interface driver usbhid
[ 1.313170] usbhid: USB HID core driver
[ 1.315148] input: Microsoft Microsoft Wheel Mouse Optical® as /devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.5/1-1.5:1.0/0003:045E:0040.0001/input/input3
[ 1.315224] hid-generic 0003:045E:0040.0001: input,hidraw0: USB HID v1.00 Mouse [Microsoft Microsoft Wheel Mouse Optical®] on usb-0000:00:1a.0-1.5/input0
[ 1.366782] tsc: Refined TSC clocksource calibration: 3392.293 MHz
[ 1.366789] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x30e5de2a436, max_idle_ns: 440795285127 ns
[ 1.366901] clocksource: Switched to clocksource tsc
[ 1.382775] usb 1-1.6: new high-speed USB device number 4 using ehci-pci
[ 1.405193] ata8: SATA link down (SStatus 0 SControl 300)
[ 1.493243] usb 1-1.6: New USB device found, idVendor=05e3, idProduct=0605, bcdDevice= 6.0b
[ 1.493244] usb 1-1.6: New USB device strings: Mfr=0, Product=1, SerialNumber=0
[ 1.493245] usb 1-1.6: Product: USB2.0 Hub
[ 1.493691] hub 1-1.6:1.0: USB hub found
[ 1.494115] hub 1-1.6:1.0: 4 ports detected
[ 2.119687] fbcon: Taking over console
[ 2.119758] Console: switching to colour frame buffer device 160x64
[ 2.192425] EXT4-fs (sda1): mounted filesystem with ordered data mode. Opts: (null)
[ 4.153346] systemd[1]: Inserted module 'autofs4'
[ 4.317155] systemd[1]: systemd 245.4-4ubuntu3.2 running in system mode. (+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=hybrid)
[ 4.334876] systemd[1]: Detected architecture x86-64.
[ 4.360873] systemd[1]: Set hostname to <utente-desktop>.
[ 7.546847] systemd[1]: /lib/systemd/system/dbus.socket:5: ListenStream= references a path below legacy directory /var/run/, updating /var/run/dbus/system_bus_socket → /run/dbus/system_bus_socket; please update the unit file accordingly.
[ 8.593193] systemd[1]: Created slice Virtual Machine and Container Slice.
[ 8.593492] systemd[1]: Created slice system-modprobe.slice.
[ 8.593642] systemd[1]: Created slice User and Session Slice.
[ 8.593690] systemd[1]: Started Forward Password Requests to Wall Directory Watch.
[ 8.593823] systemd[1]: Set up automount Arbitrary Executable File Formats File System Automount Point.
[ 8.593856] systemd[1]: Reached target User and Group Name Lookups.
[ 8.593866] systemd[1]: Reached target Remote File Systems.
[ 8.593872] systemd[1]: Reached target Slices.
[ 8.593888] systemd[1]: Reached target Libvirt guests shutdown.
[ 8.593938] systemd[1]: Listening on Device-mapper event daemon FIFOs.
[ 8.594003] systemd[1]: Listening on LVM2 poll daemon socket.
[ 8.605075] systemd[1]: Listening on Syslog Socket.
[ 8.605141] systemd[1]: Listening on fsck to fsckd communication Socket.
[ 8.605180] systemd[1]: Listening on initctl Compatibility Named Pipe.
[ 8.605314] systemd[1]: Listening on Journal Audit Socket.
[ 8.605367] systemd[1]: Listening on Journal Socket (/dev/log).
[ 8.605436] systemd[1]: Listening on Journal Socket.
[ 8.605529] systemd[1]: Listening on Network Service Netlink Socket.
[ 8.605591] systemd[1]: Listening on udev Control Socket.
[ 8.605632] systemd[1]: Listening on udev Kernel Socket.
[ 8.606314] systemd[1]: Mounting Huge Pages File System...
[ 8.607032] systemd[1]: Mounting POSIX Message Queue File System...
[ 8.607828] systemd[1]: Mounting Kernel Debug File System...
[ 8.608560] systemd[1]: Mounting Kernel Trace File System...
[ 8.609756] systemd[1]: Starting Journal Service...
[ 8.610486] systemd[1]: Starting Availability of block devices...
[ 8.611470] systemd[1]: Starting Set the console keyboard layout...
[ 8.612340] systemd[1]: Starting Create list of static device nodes for the current kernel...
[ 8.613086] systemd[1]: Starting Monitoring of LVM2 mirrors, snapshots etc. using dmeventd or progress polling...
[ 8.613818] systemd[1]: Starting Load Kernel Module drm...
[ 8.755368] systemd[1]: Condition check resulted in Set Up Additional Binary Formats being skipped.
[ 8.755416] systemd[1]: Condition check resulted in File System Check on Root Device being skipped.
[ 8.834411] systemd[1]: Starting Load Kernel Modules...
[ 8.835159] systemd[1]: Starting Remount Root and Kernel File Systems...
[ 8.835857] systemd[1]: Starting udev Coldplug all Devices...
[ 8.836525] systemd[1]: Starting Uncomplicated firewall...
[ 8.837906] systemd[1]: Mounted Huge Pages File System.
[ 8.838007] systemd[1]: Mounted POSIX Message Queue File System.
[ 8.838088] systemd[1]: Mounted Kernel Debug File System.
[ 8.838167] systemd[1]: Mounted Kernel Trace File System.
[ 8.838502] systemd[1]: Finished Availability of block devices.
[ 8.846510] systemd[1]: Finished Create list of static device nodes for the current kernel.
[ 9.003539] systemd[1]: Started Journal Service.
[ 9.039225] EXT4-fs (sda1): re-mounted. Opts: errors=remount-ro
[ 9.207828] systemd-journald[295]: Received client request to flush runtime journal.
[ 9.534583] lp: driver loaded but no devices found
[ 9.675407] ppdev: user-space parallel port driver
[ 13.179050] audit: type=1400 audit(1595765482.234:2): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/usr/bin/man" pid=388 comm="apparmor_parser"
[ 13.179061] audit: type=1400 audit(1595765482.234:3): apparmor="STATUS" operation="profile_load" profile="unconfined" name="man_filter" pid=388 comm="apparmor_parser"
[ 13.179063] audit: type=1400 audit(1595765482.234:4): apparmor="STATUS" operation="profile_load" profile="unconfined" name="man_groff" pid=388 comm="apparmor_parser"
[ 13.228910] audit: type=1400 audit(1595765482.282:5): apparmor="STATUS" operation="profile_load" profile="unconfined" name="libreoffice-oopslash" pid=390 comm="apparmor_parser"
[ 13.321052] audit: type=1400 audit(1595765482.374:6): apparmor="STATUS" operation="profile_load" profile="unconfined" name="unity8-dash" pid=392 comm="apparmor_parser"
[ 13.327188] audit: type=1400 audit(1595765482.382:7): apparmor="STATUS" operation="profile_load" profile="unconfined" name="content-hub-peer-picker" pid=391 comm="apparmor_parser"
[ 13.391780] audit: type=1400 audit(1595765482.446:8): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/usr/sbin/tcpdump" pid=387 comm="apparmor_parser"
[ 13.470023] audit: type=1400 audit(1595765482.522:9): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/usr/sbin/cups-browsed" pid=393 comm="apparmor_parser"
[ 13.493912] audit: type=1400 audit(1595765482.546:10): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/usr/lib/cups/backend/cups-pdf" pid=389 comm="apparmor_parser"
[ 13.493923] audit: type=1400 audit(1595765482.546:11): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/usr/sbin/cupsd" pid=389 comm="apparmor_parser"
[ 16.818546] at24 0-0050: supply vcc not found, using dummy regulator
[ 16.819139] at24 0-0050: 256 byte spd EEPROM, read-only
[ 16.819164] at24 0-0051: supply vcc not found, using dummy regulator
[ 16.819730] at24 0-0051: 256 byte spd EEPROM, read-only
[ 20.037329] RAPL PMU: API unit is 2^-32 Joules, 2 fixed counters, 163840 ms ovfl timer
[ 20.037330] RAPL PMU: hw unit of domain pp0-core 2^-16 Joules
[ 20.037331] RAPL PMU: hw unit of domain package 2^-16 Joules
[ 21.044402] [drm] radeon kernel modesetting enabled.
[ 21.044450] radeon 0000:01:00.0: SI support disabled by module param
[ 21.048448] cryptd: max_cpu_qlen set to 1000
[ 21.477046] AVX version of gcm_enc/dec engaged.
[ 21.477048] AES CTR mode by8 optimization enabled
[ 21.618260] AMD-Vi: AMD IOMMUv2 driver by Joerg Roedel <jroedel@suse.de>
[ 21.618262] AMD-Vi: AMD IOMMUv2 functionality not available on this system
[ 22.281348] [drm] amdgpu kernel modesetting enabled.
[ 22.281415] CRAT table not found
[ 22.281418] Virtual CRAT table created for CPU
[ 22.281432] amdgpu: Topology: Add CPU node
[ 22.281502] checking generic (e0000000 500000) vs hw (e0000000 10000000)
[ 22.281503] fb0: switching to amdgpudrmfb from VESA VGA
[ 22.281577] Console: switching to colour dummy device 80x25
[ 22.281606] amdgpu 0000:01:00.0: vgaarb: deactivate vga console
[ 22.281726] [drm] initializing kernel modesetting (TAHITI 0x1002:0x679A 0x174B:0xE207 0x00).
[ 22.281728] amdgpu 0000:01:00.0: amdgpu: Trusted Memory Zone (TMZ) feature not supported
[ 22.281734] [drm] register mmio base: 0xF7E00000
[ 22.281734] [drm] register mmio size: 262144
[ 22.281735] [drm] PCIE atomic ops is not supported
[ 22.281739] [drm] add ip block number 0 <si_common>
[ 22.281739] [drm] add ip block number 1 <gmc_v6_0>
[ 22.281740] [drm] add ip block number 2 <si_ih>
[ 22.281740] [drm] add ip block number 3 <gfx_v6_0>
[ 22.281741] [drm] add ip block number 4 <si_dma>
[ 22.281741] [drm] add ip block number 5 <si_dpm>
[ 22.281742] [drm] add ip block number 6 <dce_v6_0>
[ 22.281743] kfd kfd: TAHITI not supported in kfd
[ 22.288950] [drm] BIOS signature incorrect 0 0
[ 22.288955] resource sanity check: requesting [mem 0x000c0000-0x000dffff], which spans more than PCI Bus 0000:00 [mem 0x000d0000-0x000d3fff window]
[ 22.288958] caller pci_map_rom+0x71/0x18c mapping multiple BARs
[ 22.288975] amdgpu: ATOM BIOS: 113-1E207200SA-T47
[ 22.289285] [drm] vm size is 64 GB, 2 levels, block size is 10-bit, fragment size is 9-bit
[ 22.380020] snd_hda_intel 0000:01:00.1: Force to non-snoop mode
[ 22.490933] input: HDA ATI HDMI HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input4
[ 22.490969] input: HDA ATI HDMI HDMI/DP,pcm=7 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input5
[ 22.490998] input: HDA ATI HDMI HDMI/DP,pcm=8 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input6
[ 22.491027] input: HDA ATI HDMI HDMI/DP,pcm=9 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input7
[ 22.491058] input: HDA ATI HDMI HDMI/DP,pcm=10 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input8
[ 22.491087] input: HDA ATI HDMI HDMI/DP,pcm=11 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input9
[ 22.813179] amdgpu 0000:01:00.0: amdgpu: VRAM: 3072M 0x000000F400000000 - 0x000000F4BFFFFFFF (3072M used)
[ 22.813181] amdgpu 0000:01:00.0: amdgpu: GART: 1024M 0x000000FF00000000 - 0x000000FF3FFFFFFF
[ 22.813190] [drm] Detected VRAM RAM=3072M, BAR=256M
[ 22.813190] [drm] RAM width 384bits GDDR5
[ 22.813279] [TTM] Zone kernel: Available graphics memory: 4051868 KiB
[ 22.813280] [TTM] Zone dma32: Available graphics memory: 2097152 KiB
[ 22.813280] [TTM] Initializing pool allocator
[ 22.813283] [TTM] Initializing DMA pool allocator
[ 22.813315] [drm] amdgpu: 3072M of VRAM memory ready
[ 22.813317] [drm] amdgpu: 3072M of GTT memory ready.
[ 22.813320] [drm] GART: num cpu pages 262144, num gpu pages 262144
[ 22.813765] amdgpu 0000:01:00.0: amdgpu: PCIE GART of 1024M enabled (table at 0x000000F400500000).
[ 22.813811] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[ 22.828189] intel_rapl_common: Found RAPL domain package
[ 22.828190] intel_rapl_common: Found RAPL domain core
[ 23.047397] snd_hda_codec_realtek hdaudioC0D0: autoconfig for ALC892: line_outs=3 (0x14/0x15/0x16/0x0/0x0) type:line
[ 23.047399] snd_hda_codec_realtek hdaudioC0D0: speaker_outs=0 (0x0/0x0/0x0/0x0/0x0)
[ 23.047400] snd_hda_codec_realtek hdaudioC0D0: hp_outs=1 (0x1b/0x0/0x0/0x0/0x0)
[ 23.047400] snd_hda_codec_realtek hdaudioC0D0: mono: mono_out=0x0
[ 23.047401] snd_hda_codec_realtek hdaudioC0D0: dig-out=0x1e/0x0
[ 23.047401] snd_hda_codec_realtek hdaudioC0D0: inputs:
[ 23.047403] snd_hda_codec_realtek hdaudioC0D0: Front Mic=0x19
[ 23.047404] snd_hda_codec_realtek hdaudioC0D0: Rear Mic=0x18
[ 23.047404] snd_hda_codec_realtek hdaudioC0D0: Line=0x1a
[ 23.060290] input: HDA Intel PCH Rear Mic as /devices/pci0000:00/0000:00:1b.0/sound/card0/input10
[ 23.060326] input: HDA Intel PCH Line as /devices/pci0000:00/0000:00:1b.0/sound/card0/input11
[ 23.060356] input: HDA Intel PCH Line Out Front as /devices/pci0000:00/0000:00:1b.0/sound/card0/input12
[ 23.060386] input: HDA Intel PCH Line Out Surround as /devices/pci0000:00/0000:00:1b.0/sound/card0/input13
[ 23.060424] input: HDA Intel PCH Line Out CLFE as /devices/pci0000:00/0000:00:1b.0/sound/card0/input14
[ 23.132188] [drm] Internal thermal controller with fan control
[ 23.132195] [drm] amdgpu: dpm initialized
[ 23.132231] [drm] AMDGPU Display Connectors
[ 23.132231] [drm] Connector 0:
[ 23.132232] [drm] DP-1
[ 23.132232] [drm] HPD5
[ 23.132233] [drm] DDC: 0x194c 0x194c 0x194d 0x194d 0x194e 0x194e 0x194f 0x194f
[ 23.132233] [drm] Encoders:
[ 23.132234] [drm] DFP1: INTERNAL_UNIPHY2
[ 23.132234] [drm] Connector 1:
[ 23.132234] [drm] DP-2
[ 23.132235] [drm] HPD4
[ 23.132235] [drm] DDC: 0x1950 0x1950 0x1951 0x1951 0x1952 0x1952 0x1953 0x1953
[ 23.132236] [drm] Encoders:
[ 23.132236] [drm] DFP2: INTERNAL_UNIPHY2
[ 23.132236] [drm] Connector 2:
[ 23.132236] [drm] HDMI-A-1
[ 23.132237] [drm] HPD1
[ 23.132237] [drm] DDC: 0x1954 0x1954 0x1955 0x1955 0x1956 0x1956 0x1957 0x1957
[ 23.132238] [drm] Encoders:
[ 23.132238] [drm] DFP3: INTERNAL_UNIPHY1
[ 23.132238] [drm] Connector 3:
[ 23.132238] [drm] DVI-I-1
[ 23.132239] [drm] HPD3
[ 23.132239] [drm] DDC: 0x1960 0x1960 0x1961 0x1961 0x1962 0x1962 0x1963 0x1963
[ 23.132240] [drm] Encoders:
[ 23.132240] [drm] DFP4: INTERNAL_UNIPHY
[ 23.132240] [drm] CRT1: INTERNAL_KLDSCP_DAC1
[ 23.132527] [drm] PCIE gen 3 link speeds already enabled
[ 23.274921] amdgpu 0000:01:00.0: amdgpu: SE 2, SH per SE 2, CU per SH 8, active_cu_number 28
[ 23.364927] [drm] fb mappable at 0xE0703000
[ 23.364928] [drm] vram apper at 0xE0000000
[ 23.364929] [drm] size 5242880
[ 23.364929] [drm] fb depth is 24
[ 23.364929] [drm] pitch is 5120
[ 23.365091] fbcon: amdgpudrmfb (fb0) is primary device
[ 23.463699] Console: switching to colour frame buffer device 160x64
[ 23.465607] amdgpu 0000:01:00.0: fb0: amdgpudrmfb frame buffer device
[ 23.736585] [drm] Initialized amdgpu 3.38.0 20150101 for 0000:01:00.0 on minor 0
...
[ 7723.674495] arb_gpu_shader5[114877]: segfault at 7fbb937fe9d0 ip 00007fbbbaad8aab sp 00007fff47d256a0 error 4 in libpthread-2.31.so[7fbbbaad5000+11000]
[ 7723.674502] Code: Bad RIP value.
[ 7758.485659] arb_enhanced_la[124954]: segfault at 290001 ip 00007f73e6c3ad5a sp 00007ffdbe5d4aa8 error 4 in libc-2.31.so[7f73e6bab000+178000]
[ 7758.485664] Code: Bad RIP value.
[ 7759.173405] arb_enhanced_la[125230]: segfault at 290001 ip 00007f5ad9fa7d5a sp 00007fffd9aaa1e8 error 4 in libc-2.31.so[7f5ad9f18000+178000]
[ 7759.173411] Code: Bad RIP value.
[ 7805.053360] amdgpu 0000:01:00.0: amdgpu: GPU fault detected: 146 0x0006880c
[ 7805.053364] amdgpu 0000:01:00.0: amdgpu: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00000000
[ 7805.053365] amdgpu 0000:01:00.0: amdgpu: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0608800C
[ 7805.053367] amdgpu 0000:01:00.0: amdgpu: VM fault (0x0c, vmid 3) at page 0, read from '' (0x00000000) (136)
[ 7813.142358] [TTM] Failed to find memory space for buffer 0x00000000812205b0 eviction
[ 7813.142371] [TTM] No space for 00000000812205b0 (524288 pages, 2097152K, 2048M)
[ 7813.142373] [TTM] placement[0]=0x00060002 (1)
[ 7813.142374] [TTM] has_type: 1
[ 7813.142374] [TTM] use_type: 1
[ 7813.142375] [TTM] flags: 0x0000000A
[ 7813.142376] [TTM] gpu_offset: 0xFF00000000
[ 7813.142376] [TTM] size: 786432
[ 7813.142377] [TTM] available_caching: 0x00070000
[ 7813.142377] [TTM] default_caching: 0x00010000
[ 7813.142379] [TTM] 0x0000000000000400-0x0000000000000402: 2: used
[ 7813.142380] [TTM] 0x0000000000000402-0x0000000000000412: 16: used
[ 7813.142381] [TTM] 0x0000000000000412-0x0000000000000414: 2: used
[ 7813.142382] [TTM] 0x0000000000000414-0x0000000000000416: 2: used
[ 7813.142383] [TTM] 0x0000000000000416-0x0000000000000418: 2: used
[ 7813.142384] [TTM] 0x0000000000000418-0x000000000000041a: 2: used
[ 7813.142384] [TTM] 0x000000000000041a-0x000000000000041c: 2: used
[ 7813.142385] [TTM] 0x000000000000041c-0x000000000000051c: 256: used
[ 7813.142386] [TTM] 0x000000000000051c-0x000000000000061c: 256: used
[ 7813.142387] [TTM] 0x000000000000061c-0x000000000000061e: 2: used
[ 7813.142388] [TTM] 0x000000000000061e-0x0000000000000620: 2: used
[ 7813.142388] [TTM] 0x0000000000000620-0x0000000000000622: 2: used
[ 7813.142389] [TTM] 0x0000000000000622-0x0000000000000624: 2: used
[ 7813.142390] [TTM] 0x0000000000000624-0x0000000000000626: 2: used
[ 7813.142391] [TTM] 0x0000000000000626-0x0000000000000628: 2: used
[ 7813.142391] [TTM] 0x0000000000000628-0x000000000000062a: 2: used
[ 7813.142392] [TTM] 0x000000000000062a-0x000000000000062c: 2: used
[ 7813.142393] [TTM] 0x000000000000062c-0x000000000000062e: 2: used
[ 7813.142393] [TTM] 0x000000000000062e-0x0000000000000630: 2: used
[ 7813.142394] [TTM] 0x0000000000000630-0x0000000000000632: 2: used
[ 7813.142395] [TTM] 0x0000000000000632-0x0000000000000634: 2: used
[ 7813.142395] [TTM] 0x0000000000000634-0x0000000000000636: 2: used
[ 7813.142396] [TTM] 0x0000000000000636-0x0000000000000638: 2: used
[ 7813.142397] [TTM] 0x0000000000000638-0x000000000000063a: 2: used
[ 7813.142398] [TTM] 0x000000000000063a-0x000000000000063c: 2: used
[ 7813.142399] [TTM] 0x000000000000063c-0x000000000000063e: 2: used
[ 7813.142400] [TTM] 0x000000000000063e-0x000000000000063f: 1: used
[ 7813.142400] [TTM] 0x000000000000063f-0x0000000000000641: 2: used
[ 7813.142401] [TTM] 0x0000000000000641-0x0000000000000643: 2: used
[ 7813.142402] [TTM] 0x0000000000000643-0x0000000000000645: 2: used
[ 7813.142402] [TTM] 0x0000000000000645-0x0000000000000647: 2: used
[ 7813.142403] [TTM] 0x0000000000000647-0x0000000000000649: 2: used
[ 7813.142404] [TTM] 0x0000000000000649-0x000000000000064b: 2: used
[ 7813.142405] [TTM] 0x000000000000064b-0x000000000000064d: 2: used
[ 7813.142406] [TTM] 0x000000000000064d-0x000000000000064f: 2: used
[ 7813.142406] [TTM] 0x000000000000064f-0x0000000000000651: 2: used
[ 7813.142407] [TTM] 0x0000000000000651-0x0000000000000653: 2: used
[ 7813.142408] [TTM] 0x0000000000000653-0x0000000000000655: 2: used
[ 7813.142409] [TTM] 0x0000000000000655-0x0000000000000657: 2: used
[ 7813.142409] [TTM] 0x0000000000000657-0x0000000000000659: 2: used
[ 7813.142410] [TTM] 0x0000000000000659-0x000000000000065b: 2: used
[ 7813.142411] [TTM] 0x000000000000065b-0x0000000000000692: 55: free
[ 7813.142411] [TTM] 0x0000000000000692-0x0000000000000694: 2: used
[ 7813.142412] [TTM] 0x0000000000000694-0x000000000000070f: 123: free
[ 7813.142413] [TTM] 0x000000000000070f-0x0000000000000711: 2: used
[ 7813.142413] [TTM] 0x0000000000000711-0x000000000000079c: 139: free
[ 7813.142414] [TTM] 0x000000000000079c-0x000000000000079e: 2: used
[ 7813.142415] [TTM] 0x000000000000079e-0x00000000000007ee: 80: free
[ 7813.142415] [TTM] 0x00000000000007ee-0x00000000000007f0: 2: used
[ 7813.142461] [TTM] 0x00000000000007f0-0x00000000000007f2: 2: used
[ 7813.142462] [TTM] 0x00000000000007f2-0x00000000000007fe: 12: free
[ 7813.142463] [TTM] 0x00000000000007fe-0x0000000000000800: 2: used
[ 7813.142463] [TTM] 0x0000000000000800-0x0000000000000806: 6: free
[ 7813.142464] [TTM] 0x0000000000000806-0x0000000000000808: 2: used
[ 7813.142464] [TTM] 0x0000000000000808-0x000000000000080e: 6: free
[ 7813.142465] [TTM] 0x000000000000080e-0x000000000000082e: 32: used
[ 7813.142465] [TTM] 0x000000000000082e-0x000000000000083a: 12: free
[ 7813.142466] [TTM] 0x000000000000083a-0x000000000000083c: 2: used
[ 7813.142467] [TTM] 0x000000000000083c-0x000000000000083e: 2: used
[ 7813.142467] [TTM] 0x000000000000083e-0x0000000000000840: 2: used
[ 7813.142469] [TTM] 0x0000000000000840-0x0000000000000842: 2: used
[ 7813.142469] [TTM] 0x0000000000000842-0x0000000000000844: 2: used
[ 7813.142470] [TTM] 0x0000000000000844-0x0000000000000846: 2: used
[ 7813.142471] [TTM] 0x0000000000000846-0x0000000000000848: 2: used
[ 7813.142472] [TTM] 0x0000000000000848-0x000000000000084a: 2: used
[ 7813.142473] [TTM] 0x000000000000084a-0x000000000000084c: 2: used
[ 7813.142473] [TTM] 0x000000000000084c-0x000000000000084e: 2: used
[ 7813.142474] [TTM] 0x000000000000084e-0x0000000000000850: 2: used
[ 7813.142475] [TTM] 0x0000000000000850-0x0000000000000852: 2: used
[ 7813.142475] [TTM] 0x0000000000000852-0x0000000000000854: 2: used
[ 7813.142476] [TTM] 0x0000000000000854-0x000000000000088a: 54: free
[ 7813.142476] [TTM] 0x000000000000088a-0x000000000000088c: 2: used
[ 7813.142477] [TTM] 0x000000000000088c-0x0000000000040000: 259956: free
[ 7813.142478] [TTM] total: 261120, used 677 free 260443
[ 7813.142479] [TTM] man size:786432 pages, gtt available:260443 pages, usage:2054MB
[ 7813.270091] [TTM] Failed to find memory space for buffer 0x00000000812205b0 eviction
[ 7813.270104] [TTM] No space for 00000000812205b0 (524288 pages, 2097152K, 2048M)
[ 7813.270105] [TTM] placement[0]=0x00060002 (1)
[ 7813.270105] [TTM] has_type: 1
[ 7813.270106] [TTM] use_type: 1
[ 7813.270106] [TTM] flags: 0x0000000A
[ 7813.270107] [TTM] gpu_offset: 0xFF00000000
[ 7813.270108] [TTM] size: 786432
[ 7813.270108] [TTM] available_caching: 0x00070000
[ 7813.270109] [TTM] default_caching: 0x00010000
[ 7813.270110] [TTM] 0x0000000000000400-0x0000000000000402: 2: used
[ 7813.270111] [TTM] 0x0000000000000402-0x0000000000000412: 16: used
[ 7813.270112] [TTM] 0x0000000000000412-0x0000000000000414: 2: used
[ 7813.270113] [TTM] 0x0000000000000414-0x0000000000000416: 2: used
[ 7813.270113] [TTM] 0x0000000000000416-0x0000000000000418: 2: used
[ 7813.270114] [TTM] 0x0000000000000418-0x000000000000041a: 2: used
[ 7813.270115] [TTM] 0x000000000000041a-0x000000000000041c: 2: used
[ 7813.270116] [TTM] 0x000000000000041c-0x000000000000051c: 256: used
[ 7813.270116] [TTM] 0x000000000000051c-0x000000000000061c: 256: used
[ 7813.270117] [TTM] 0x000000000000061c-0x000000000000061e: 2: used
[ 7813.270118] [TTM] 0x000000000000061e-0x0000000000040000: 260578: free
[ 7813.270119] [TTM] total: 261120, used 542 free 260578
[ 7813.270120] [TTM] man size:786432 pages, gtt available:261602 pages, usage:2050MB
[ 7813.339330] [TTM] Failed to find memory space for buffer 0x00000000812205b0 eviction
[ 7813.339339] [TTM] No space for 00000000812205b0 (524288 pages, 2097152K, 2048M)
[ 7813.339340] [TTM] placement[0]=0x00060002 (1)
[ 7813.339341] [TTM] has_type: 1
[ 7813.339341] [TTM] use_type: 1
[ 7813.339342] [TTM] flags: 0x0000000A
[ 7813.339343] [TTM] gpu_offset: 0xFF00000000
[ 7813.339343] [TTM] size: 786432
[ 7813.339344] [TTM] available_caching: 0x00070000
[ 7813.339344] [TTM] default_caching: 0x00010000
[ 7813.339347] [TTM] 0x0000000000000400-0x0000000000000402: 2: used
[ 7813.339348] [TTM] 0x0000000000000402-0x0000000000000412: 16: used
[ 7813.339348] [TTM] 0x0000000000000412-0x0000000000000414: 2: used
[ 7813.339349] [TTM] 0x0000000000000414-0x0000000000000416: 2: used
[ 7813.339350] [TTM] 0x0000000000000416-0x0000000000000418: 2: used
[ 7813.339350] [TTM] 0x0000000000000418-0x000000000000041a: 2: used
[ 7813.339351] [TTM] 0x000000000000041a-0x000000000000041c: 2: used
[ 7813.339352] [TTM] 0x000000000000041c-0x000000000000051c: 256: used
[ 7813.339353] [TTM] 0x000000000000051c-0x000000000000061c: 256: used
[ 7813.339353] [TTM] 0x000000000000061c-0x000000000000061e: 2: used
[ 7813.339354] [TTM] 0x000000000000061e-0x0000000000000620: 2: used
[ 7813.339355] [TTM] 0x0000000000000620-0x0000000000000622: 2: used
[ 7813.339356] [TTM] 0x0000000000000622-0x0000000000000624: 2: used
[ 7813.339357] [TTM] 0x0000000000000624-0x0000000000000626: 2: used
[ 7813.339357] [TTM] 0x0000000000000626-0x0000000000000628: 2: used
[ 7813.339358] [TTM] 0x0000000000000628-0x00000000000006fe: 214: free
[ 7813.339359] [TTM] 0x00000000000006fe-0x000000000000071e: 32: used
[ 7813.339360] [TTM] 0x000000000000071e-0x000000000000071f: 1: used
[ 7813.339360] [TTM] 0x000000000000071f-0x0000000000040000: 260321: free
[ 7813.339361] [TTM] total: 261120, used 585 free 260535
[ 7813.339363] [TTM] man size:786432 pages, gtt available:260791 pages, usage:2053MB
[ 7813.437505] [TTM] Failed to find memory space for buffer 0x00000000812205b0 eviction
[ 7813.437516] [TTM] No space for 00000000812205b0 (524288 pages, 2097152K, 2048M)
[ 7813.437517] [TTM] placement[0]=0x00060002 (1)
[ 7813.437518] [TTM] has_type: 1
[ 7813.437519] [TTM] use_type: 1
[ 7813.437519] [TTM] flags: 0x0000000A
[ 7813.437520] [TTM] gpu_offset: 0xFF00000000
[ 7813.437521] [TTM] size: 786432
[ 7813.437521] [TTM] available_caching: 0x00070000
[ 7813.437522] [TTM] default_caching: 0x00010000
[ 7813.437523] [TTM] 0x0000000000000400-0x0000000000000402: 2: used
[ 7813.437524] [TTM] 0x0000000000000402-0x0000000000000412: 16: used
[ 7813.437525] [TTM] 0x0000000000000412-0x0000000000000414: 2: used
[ 7813.437526] [TTM] 0x0000000000000414-0x0000000000000416: 2: used
[ 7813.437527] [TTM] 0x0000000000000416-0x0000000000000418: 2: used
[ 7813.437527] [TTM] 0x0000000000000418-0x000000000000041a: 2: used
[ 7813.437528] [TTM] 0x000000000000041a-0x000000000000041c: 2: used
[ 7813.437529] [TTM] 0x000000000000041c-0x000000000000051c: 256: used
[
7813.437529] [TTM] 0x000000000000051c-0x000000000000061c: 256: used [ 7813.437530] [TTM] 0x000000000000061c-0x000000000000061e: 2: used [ 7813.437531] [TTM] 0x000000000000061e-0x0000000000040000: 260578: free [ 7813.437531] [TTM] total: 261120, used 542 free 260578 [ 7813.437533] [TTM] man size:786432 pages, gtt available:261602 pages, usage:2050MB [ 7813.438518] arb_uniform_buf[143135]: segfault at 0 ip 00007f20b6f990d7 sp 00007ffdebfcc8c8 error 6 in libc-2.31.so[7f20b6eff000+178000] [ 7813.438532] Code: Bad RIP value. [ 7919.344885] arb_shader_stor[146734]: segfault at 0 ip 00007fe2ab5020d7 sp 00007fff6027eda8 error 6 in libc-2.31.so[7fe2ab468000+178000] [ 7919.344894] Code: Bad RIP value. [ 7919.897315] arb_shader_stor[146769]: segfault at 0 ip 00007f10d8fbd0d7 sp 00007ffcf8895608 error 6 in libc-2.31.so[7f10d8f23000+178000] [ 7919.897332] Code: Bad RIP value. [ 8009.208256] egl-copy-buffer[147619]: segfault at 18 ip 00007f968e8c9e9b sp 00007ffe7ca12200 error 4 in libEGL_mesa.so.0.0.0[7f968e8a9000+26000] [ 8009.208263] Code: Bad RIP value. [ 8032.266864] perf: interrupt took too long (2507 > 2500), lowering kernel.perf_event_max_sample_rate to 79750 [ 8070.875068] [TTM] Buffer eviction failed [ 8080.462745] amdgpu 0000:01:00.0: amdgpu: GPU fault detected: 146 0x00ce8804 [ 8080.462756] amdgpu 0000:01:00.0: amdgpu: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00134006 [ 8080.462758] amdgpu 0000:01:00.0: amdgpu: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0E088004 [ 8080.462759] amdgpu 0000:01:00.0: amdgpu: VM fault (0x04, vmid 7) at page 1261574, read from '' (0x00000000) (136) [ 8080.478266] amdgpu 0000:01:00.0: amdgpu: GPU fault detected: 146 0x00c28804 [ 8080.478271] amdgpu 0000:01:00.0: amdgpu: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00134006 [ 8080.478272] amdgpu 0000:01:00.0: amdgpu: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x02088004 [ 8080.478274] amdgpu 0000:01:00.0: amdgpu: VM fault (0x04, vmid 1) at page 1261574, read from '' (0x00000000) (136) [ 8204.864339] shader_runner[168816]: segfault at 7f64df7fe9d0 ip 00007f6506a47aab sp 00007fff3961d340 error 4 [ 8204.864343] shader_runner[168803]: segfault at 7faa7d7fa9d0 ip 00007faa941ccaab sp 00007ffca97f4490 error 4 [ 8204.864345] in libpthread-2.31.so[7f6506a44000+11000] [ 8204.864348] Code: Bad RIP value. [ 8204.864349] in libpthread-2.31.so[7faa941c9000+11000] [ 8204.864351] Code: Bad RIP value. [ 8204.864376] shader_runner[168801]: segfault at 7f12bf7fe9d0 ip 00007f12d3155aab sp 00007fff81846f80 error 4 in libpthread-2.31.so[7f12d3152000+11000] [ 8204.864381] Code: Bad RIP value. [ 8204.864501] shader_runner[168802]: segfault at 7f7225ffb9d0 ip 00007f723cfa4aab sp 00007ffda6a8a890 error 4 in libpthread-2.31.so[7f723cfa1000+11000] [ 8204.864507] Code: Bad RIP value. [ 8207.293001] shader_runner[168847]: segfault at 7f4781ffb9d0 ip 00007f4799379aab sp 00007ffd72820630 error 4 in libpthread-2.31.so[7f4799376000+11000] [ 8207.293009] Code: Bad RIP value. [ 8207.303214] shader_runner[168849]: segfault at 7f01a27fc9d0 ip 00007f01c1c58aab sp 00007ffef3fc31d0 error 4 in libpthread-2.31.so[7f01c1c55000+11000] [ 8207.303220] Code: Bad RIP value. [ 8207.333651] shader_runner[168872]: segfault at 7f84fffff9d0 ip 00007f852f5f4aab sp 00007ffc03821a30 error 4 in libpthread-2.31.so[7f852f5f1000+11000] [ 8207.333656] Code: Bad RIP value. [ 8207.339399] shader_runner[168875]: segfault at 7f5dedffb9d0 ip 00007f5e04e37aab sp 00007ffd41558ad0 error 4 in libpthread-2.31.so[7f5e04e34000+11000] [ 8207.339405] Code: Bad RIP value. 
[ 8207.515900] shader_runner[168890]: segfault at 7f3e677fe9d0 ip 00007f3e76a5baab sp 00007ffe1bdbfa30 error 4 in libpthread-2.31.so[7f3e76a58000+11000] [ 8207.515907] Code: Bad RIP value. [ 8207.551837] shader_runner[168915]: segfault at 7f14667fc9d0 ip 00007f147dbdbaab sp 00007ffef737bb30 error 4 in libpthread-2.31.so[7f147dbd8000+11000] [ 8207.551842] Code: Bad RIP value. [ 8209.900683] show_signal_msg: 38 callbacks suppressed [ 8209.900686] shader_runner[169450]: segfault at 7fe88d1119d0 ip 00007fe897dc7aab sp 00007fff9a7994e0 error 4 in libpthread-2.31.so[7fe897dc4000+11000] [ 8209.900695] Code: Bad RIP value. [ 8209.958317] shader_runner[169463]: segfault at 7f05d8ff99d0 ip 00007f05e82a9aab sp 00007ffd29495db0 error 4 in libpthread-2.31.so[7f05e82a6000+11000] [ 8209.958323] Code: Bad RIP value. [ 8210.016780] shader_runner[169477]: segfault at 7fd1657fa9d0 ip 00007fd174c58aab sp 00007ffd46a738b0 error 4 in libpthread-2.31.so[7fd174c55000+11000] [ 8210.016787] Code: Bad RIP value. [ 8210.095393] shader_runner[169492]: segfault at 7f8d79d7c9d0 ip 00007f8d84a32aab sp 00007ffe83c7c320 error 4 in libpthread-2.31.so[7f8d84a2f000+11000] [ 8210.095398] Code: Bad RIP value. [ 8210.175068] shader_runner[169506]: segfault at 7f27877fe9d0 ip 00007f27a68b4aab sp 00007ffd39ff79a0 error 4 in libpthread-2.31.so[7f27a68b1000+11000] [ 8210.175075] Code: Bad RIP value. [ 8210.202147] shader_runner[169519]: segfault at 7f315a7fc9d0 ip 00007f316970daab sp 00007ffee6c3a210 error 4 in libpthread-2.31.so[7f316970a000+11000] [ 8210.202156] Code: Bad RIP value. [ 8210.288298] shader_runner[169534]: segfault at 7f7a3cff99d0 ip 00007f7a4c23baab sp 00007ffc087caeb0 error 4 in libpthread-2.31.so[7f7a4c238000+11000] [ 8210.288303] Code: Bad RIP value. [ 8210.329530] shader_runner[169547]: segfault at 7f63f57fa9d0 ip 00007f6404af5aab sp 00007ffdf3e7f790 error 4 in libpthread-2.31.so[7f6404af2000+11000] [ 8210.329536] Code: Bad RIP value. [ 8210.412320] shader_runner[169562]: segfault at 7f622471f9d0 ip 00007f622f3d5aab sp 00007fff6f38f6b0 error 4 in libpthread-2.31.so[7f622f3d2000+11000] [ 8210.412325] Code: Bad RIP value. [ 8210.455261] shader_runner[169575]: segfault at 7f0d177fe9d0 ip 00007f0d2e351aab sp 00007fff77b01400 error 4 in libpthread-2.31.so[7f0d2e34e000+11000] [ 8210.455269] Code: Bad RIP value. [ 8218.886289] show_signal_msg: 27 callbacks suppressed [ 8218.886292] shader_runner[172286]: segfault at 56393e81e408 ip 00007f4feb9a3ed9 sp 00007ffe74015800 error 4 in radeonsi_dri.so[7f4feb6ad000+d49000] [ 8218.886297] Code: Bad RIP value. [ 8218.899687] shader_runner[172285]: segfault at 563750011378 ip 00007ff7236e4ed9 sp 00007ffe2e978e10 error 4 in radeonsi_dri.so[7ff7233ee000+d49000] [ 8218.899692] Code: Bad RIP value. [ 8219.001985] shader_runner[172334]: segfault at 5623ce8c4848 ip 00007fa239f2bed9 sp 00007ffcaf7c4170 error 4 in radeonsi_dri.so[7fa239c35000+d49000] [ 8219.001991] Code: Bad RIP value. [ 8219.490115] shader_runner[172514]: segfault at 55f2d3009314 ip 00007fad22647500 sp 00007ffe441c0120 error 4 in radeonsi_dri.so[7fad2234f000+d49000] [ 8219.490123] Code: Bad RIP value. [ 8219.491095] shader_runner[172516]: segfault at 563bd86d20a4 ip 00007fb9e40f9500 sp 00007ffcd77518b0 error 4 in radeonsi_dri.so[7fb9e3e01000+d49000] [ 8219.491101] Code: Bad RIP value. [ 8219.711083] shader_runner[172588]: segfault at 55ca9ae686a4 ip 00007fe140555500 sp 00007ffe9cae1400 error 4 in radeonsi_dri.so[7fe14025d000+d49000] [ 8219.711090] Code: Bad RIP value. 
[ 8430.203633] perf: interrupt took too long (3138 > 3133), lowering kernel.perf_event_max_sample_rate to 63500 [ 9055.012725] audit: type=1400 audit(1595774523.846:84): apparmor="ALLOWED" operation="open" profile="libreoffice-soffice" name="/usr/share/libdrm/amdgpu.ids" pid=383072 comm="soffice.bin" requested_mask="r" denied_mask="r" fsuid=1000 ouid=0 [-- Attachment #3: piglit_tests_amddcsi.ods --] [-- Type: application/vnd.oasis.opendocument.spreadsheet, Size: 34347 bytes --] [-- Attachment #4: Type: text/plain, Size: 154 bytes --] _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: 2020-07-26 15:31 ` Re: Mauro Rossi @ 2020-07-27 18:31 ` Alex Deucher 2020-07-27 19:46 ` Re: Mauro Rossi 0 siblings, 1 reply; 50+ messages in thread From: Alex Deucher @ 2020-07-27 18:31 UTC (permalink / raw) To: Mauro Rossi Cc: Deucher, Alexander, Harry Wentland, Christian Koenig, amd-gfx list On Sun, Jul 26, 2020 at 11:31 AM Mauro Rossi <issor.oruam@gmail.com> wrote: > > Hello, > > On Fri, Jul 24, 2020 at 8:31 PM Alex Deucher <alexdeucher@gmail.com> wrote: >> >> On Wed, Jul 22, 2020 at 3:57 AM Mauro Rossi <issor.oruam@gmail.com> wrote: >> > >> > Hello, >> > re-sending and copying full DL >> > >> > On Wed, Jul 22, 2020 at 4:51 AM Alex Deucher <alexdeucher@gmail.com> wrote: >> >> >> >> On Mon, Jul 20, 2020 at 6:00 AM Mauro Rossi <issor.oruam@gmail.com> wrote: >> >> > >> >> > Hi Christian, >> >> > >> >> > On Mon, Jul 20, 2020 at 11:00 AM Christian König >> >> > <ckoenig.leichtzumerken@gmail.com> wrote: >> >> > > >> >> > > Hi Mauro, >> >> > > >> >> > > I'm not deep into the whole DC design, so just some general high level >> >> > > comments on the cover letter: >> >> > > >> >> > > 1. Please add a subject line to the cover letter, my spam filter thinks >> >> > > that this is suspicious otherwise. >> >> > >> >> > My mistake in the editing of covert letter with git send-email, >> >> > I may have forgot to keep the Subject at the top >> >> > >> >> > > >> >> > > 2. Then you should probably note how well (badly?) is that tested. Since >> >> > > you noted proof of concept it might not even work. >> >> > >> >> > The Changelog is to be read as: >> >> > >> >> > [RFC] was the initial Proof of concept was the RFC and [PATCH v2] was >> >> > just a rebase onto amd-staging-drm-next >> >> > >> >> > this series [PATCH v3] has all the known changes required for DCE6 specificity >> >> > and based on a long offline thread with Alexander Deutcher and past >> >> > dri-devel chats with Harry Wentland. >> >> > >> >> > It was tested for my possibilities of testing with HD7750 and HD7950, >> >> > with checks in dmesg output for not getting "missing registers/masks" >> >> > kernel WARNING >> >> > and with kernel build on Ubuntu 20.04 and with android-x86 >> >> > >> >> > The proposal I made to Alex is that AMD testing systems will be used >> >> > for further regression testing, >> >> > as part of review and validation for eligibility to amd-staging-drm-next >> >> > >> >> >> >> We will certainly test it once it lands, but presumably this is >> >> working on the SI cards you have access to? >> > >> > >> > Yes, most of my testing was done with android-x86 Android CTS (EGL, GLES2, GLES3, VK) >> > >> > I am also in contact with a person with Firepro W5130M who is running a piglit session >> > >> > I had bought an HD7850 to test with Pitcairn, but it arrived as defective so I could not test with Pitcair >> > >> > >> >> >> >> > > >> >> > > 3. How feature complete (HDMI audio?, Freesync?) is it? >> >> > >> >> > All the changes in DC impacting DCE8 (dc/dce80 path) were ported to >> >> > DCE6 (dc/dce60 path) in the last two years from initial submission >> >> > >> >> > > >> >> > > Apart from that it looks like a rather impressive piece of work :) >> >> > > >> >> > > Cheers, >> >> > > Christian. >> >> > >> >> > Thanks, >> >> > please consider that most of the latest DCE6 specific parts were >> >> > possible due to recent Alex support in getting the correct DCE6 >> >> > headers, >> >> > his suggestions and continuous feedback. >> >> > >> >> > I would suggest that Alex comments on the proposed next steps to follow. 
>> >> >> >> The code looks pretty good to me. I'd like to get some feedback from >> >> the display team to see if they have any concerns, but beyond that I >> >> think we can pull it into the tree and continue improving it there. >> >> Do you have a link to a git tree I can pull directly that contains >> >> these patches? Is this the right branch? >> >> https://github.com/maurossi/linux/commits/kernel-5.8rc4_si_next >> >> >> >> Thanks! >> >> >> >> Alex >> > >> > >> > The following branch was pushed with the series on top of amd-staging-drm-next >> > >> > https://github.com/maurossi/linux/commits/kernel-5.6_si_drm-next >> >> I gave this a quick test on all of the SI asics and the various >> monitors I had available and it looks good. A few minor patches I >> noticed are attached. If they look good to you, I'll squash them into >> the series when I commit it. I've pushed it to my fdo tree as well: >> https://cgit.freedesktop.org/~agd5f/linux/log/?h=si_dc_support >> >> Thanks! >> >> Alex > > > The new patches are ok and with the following infomation about piglit tests, > the series may be good to go. > > I have performed piglit tests on Tahiti HD7950 on kernel 5.8.0-rc6 with AMD DC support for SI > and comparison with vanilla kernel 5.8.0-rc6 > > Results are the following > > [piglit gpu tests with kernel 5.8.0-rc6-amddcsi] > > utente@utente-desktop:~/piglit$ ./piglit run gpu . > [26714/26714] skip: 1731, pass: 24669, warn: 15, fail: 288, crash: 11 > Thank you for running Piglit! > Results have been written to /home/utente/piglit > > [piglit gpu tests with vanilla 5.8.0-rc6] > > utente@utente-desktop:~/piglit$ ./piglit run gpu . > [26714/26714] skip: 1731, pass: 24673, warn: 13, fail: 283, crash: 14 > Thank you for running Piglit! > Results have been written to /home/utente/piglit > > In the attachment the comparison of "5.8.0-rc6-amddcsi" vs "5.8.0-rc6" vanilla > and viceversa, I see no significant regression and in the delta of failed tests I don't recognize DC related test cases, > but you may also have a look. Looks good to me. The series is: Reviewed-by: Alex Deucher <alexander.deucher@amd.com> > > dmesg for "5.8.0-rc6-amddcsi" is also provide the check the crashes > > Regarding the other user testing the series with Firepro W5130M > he found an already existing issue in amdgpu si_support=1 which is independent from my series and matches a problem alrady reported. [1] > amdgpu does not currently implement GPU reset support for SI. 
Alex > Mauro > > [1] https://bbs.archlinux.org/viewtopic.php?id=249097 > >> >> >> > >> >> >> >> >> >> > >> >> > Mauro >> >> > >> >> > > >> >> > > Am 16.07.20 um 23:22 schrieb Mauro Rossi: >> >> > > > The series adds SI support to AMD DC >> >> > > > >> >> > > > Changelog: >> >> > > > >> >> > > > [RFC] >> >> > > > Preliminar Proof Of Concept, with DCE8 headers still used in dce60_resources.c >> >> > > > >> >> > > > [PATCH v2] >> >> > > > Rebase on amd-staging-drm-next dated 17-Oct-2018 >> >> > > > >> >> > > > [PATCH v3] >> >> > > > Add support for DCE6 specific headers, >> >> > > > ad hoc DCE6 macros, funtions and fixes, >> >> > > > rebase on current amd-staging-drm-next >> >> > > > >> >> > > > >> >> > > > Commits [01/27]..[08/27] SI support added in various DC components >> >> > > > >> >> > > > [PATCH v3 01/27] drm/amdgpu: add some required DCE6 registers (v6) >> >> > > > [PATCH v3 02/27] drm/amd/display: add asics info for SI parts >> >> > > > [PATCH v3 03/27] drm/amd/display: dc/dce: add initial DCE6 support (v9b) >> >> > > > [PATCH v3 04/27] drm/amd/display: dc/core: add SI/DCE6 support (v2) >> >> > > > [PATCH v3 05/27] drm/amd/display: dc/bios: add support for DCE6 >> >> > > > [PATCH v3 06/27] drm/amd/display: dc/gpio: add support for DCE6 (v2) >> >> > > > [PATCH v3 07/27] drm/amd/display: dc/irq: add support for DCE6 (v4) >> >> > > > [PATCH v3 08/27] drm/amd/display: amdgpu_dm: add SI support (v4) >> >> > > > >> >> > > > Commits [09/27]..[24/27] DCE6 specific code adaptions >> >> > > > >> >> > > > [PATCH v3 09/27] drm/amd/display: dc/clk_mgr: add support for SI parts (v2) >> >> > > > [PATCH v3 10/27] drm/amd/display: dc/dce60: set max_cursor_size to 64 >> >> > > > [PATCH v3 11/27] drm/amd/display: dce_audio: add DCE6 specific macros,functions >> >> > > > [PATCH v3 12/27] drm/amd/display: dce_dmcu: add DCE6 specific macros >> >> > > > [PATCH v3 13/27] drm/amd/display: dce_hwseq: add DCE6 specific macros,functions >> >> > > > [PATCH v3 14/27] drm/amd/display: dce_ipp: add DCE6 specific macros,functions >> >> > > > [PATCH v3 15/27] drm/amd/display: dce_link_encoder: add DCE6 specific macros,functions >> >> > > > [PATCH v3 16/27] drm/amd/display: dce_mem_input: add DCE6 specific macros,functions >> >> > > > [PATCH v3 17/27] drm/amd/display: dce_opp: add DCE6 specific macros,functions >> >> > > > [PATCH v3 18/27] drm/amd/display: dce_transform: add DCE6 specific macros,functions >> >> > > > [PATCH v3 19/27] drm/amdgpu: add some required DCE6 registers (v7) >> >> > > > [PATCH v3 20/27] drm/amd/display: dce_transform: DCE6 Scaling Horizontal Filter Init >> >> > > > [PATCH v3 21/27] drm/amd/display: dce60_hw_sequencer: add DCE6 macros,functions >> >> > > > [PATCH v3 22/27] drm/amd/display: dce60_hw_sequencer: add DCE6 specific .cursor_lock >> >> > > > [PATCH v3 23/27] drm/amd/display: dce60_timing_generator: add DCE6 specific functions >> >> > > > [PATCH v3 24/27] drm/amd/display: dc/dce60: use DCE6 headers (v6) >> >> > > > >> >> > > > >> >> > > > Commits [25/27]..[27/27] SI support final enablements >> >> > > > >> >> > > > [PATCH v3 25/27] drm/amd/display: create plane rotation property for Bonarie and later >> >> > > > [PATCH v3 26/27] drm/amdgpu: enable DC support for SI parts (v2) >> >> > > > [PATCH v3 27/27] drm/amd/display: enable SI support in the Kconfig (v2) >> >> > > > >> >> > > > >> >> > > > Signed-off-by: Mauro Rossi <issor.oruam@gmail.com> >> >> > > > >> >> > > > _______________________________________________ >> >> > > > amd-gfx mailing list >> >> > > > 
amd-gfx@lists.freedesktop.org >> >> > > > https://lists.freedesktop.org/mailman/listinfo/amd-gfx >> >> > > >> >> > _______________________________________________ >> >> > amd-gfx mailing list >> >> > amd-gfx@lists.freedesktop.org >> >> > https://lists.freedesktop.org/mailman/listinfo/amd-gfx _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: 2020-07-27 18:31 ` Re: Alex Deucher @ 2020-07-27 19:46 ` Mauro Rossi 2020-07-27 19:54 ` Re: Alex Deucher 0 siblings, 1 reply; 50+ messages in thread From: Mauro Rossi @ 2020-07-27 19:46 UTC (permalink / raw) To: Alex Deucher Cc: Deucher, Alexander, Harry Wentland, Christian Koenig, amd-gfx list [-- Attachment #1.1: Type: text/plain, Size: 10849 bytes --] On Mon, Jul 27, 2020 at 8:31 PM Alex Deucher <alexdeucher@gmail.com> wrote: > On Sun, Jul 26, 2020 at 11:31 AM Mauro Rossi <issor.oruam@gmail.com> > wrote: > > > > Hello, > > > > On Fri, Jul 24, 2020 at 8:31 PM Alex Deucher <alexdeucher@gmail.com> > wrote: > >> > >> On Wed, Jul 22, 2020 at 3:57 AM Mauro Rossi <issor.oruam@gmail.com> > wrote: > >> > > >> > Hello, > >> > re-sending and copying full DL > >> > > >> > On Wed, Jul 22, 2020 at 4:51 AM Alex Deucher <alexdeucher@gmail.com> > wrote: > >> >> > >> >> On Mon, Jul 20, 2020 at 6:00 AM Mauro Rossi <issor.oruam@gmail.com> > wrote: > >> >> > > >> >> > Hi Christian, > >> >> > > >> >> > On Mon, Jul 20, 2020 at 11:00 AM Christian König > >> >> > <ckoenig.leichtzumerken@gmail.com> wrote: > >> >> > > > >> >> > > Hi Mauro, > >> >> > > > >> >> > > I'm not deep into the whole DC design, so just some general high > level > >> >> > > comments on the cover letter: > >> >> > > > >> >> > > 1. Please add a subject line to the cover letter, my spam filter > thinks > >> >> > > that this is suspicious otherwise. > >> >> > > >> >> > My mistake in the editing of covert letter with git send-email, > >> >> > I may have forgot to keep the Subject at the top > >> >> > > >> >> > > > >> >> > > 2. Then you should probably note how well (badly?) is that > tested. Since > >> >> > > you noted proof of concept it might not even work. > >> >> > > >> >> > The Changelog is to be read as: > >> >> > > >> >> > [RFC] was the initial Proof of concept was the RFC and [PATCH v2] > was > >> >> > just a rebase onto amd-staging-drm-next > >> >> > > >> >> > this series [PATCH v3] has all the known changes required for DCE6 > specificity > >> >> > and based on a long offline thread with Alexander Deutcher and past > >> >> > dri-devel chats with Harry Wentland. > >> >> > > >> >> > It was tested for my possibilities of testing with HD7750 and > HD7950, > >> >> > with checks in dmesg output for not getting "missing > registers/masks" > >> >> > kernel WARNING > >> >> > and with kernel build on Ubuntu 20.04 and with android-x86 > >> >> > > >> >> > The proposal I made to Alex is that AMD testing systems will be > used > >> >> > for further regression testing, > >> >> > as part of review and validation for eligibility to > amd-staging-drm-next > >> >> > > >> >> > >> >> We will certainly test it once it lands, but presumably this is > >> >> working on the SI cards you have access to? > >> > > >> > > >> > Yes, most of my testing was done with android-x86 Android CTS (EGL, > GLES2, GLES3, VK) > >> > > >> > I am also in contact with a person with Firepro W5130M who is running > a piglit session > >> > > >> > I had bought an HD7850 to test with Pitcairn, but it arrived as > defective so I could not test with Pitcair > >> > > >> > > >> >> > >> >> > > > >> >> > > 3. How feature complete (HDMI audio?, Freesync?) is it? > >> >> > > >> >> > All the changes in DC impacting DCE8 (dc/dce80 path) were ported to > >> >> > DCE6 (dc/dce60 path) in the last two years from initial submission > >> >> > > >> >> > > > >> >> > > Apart from that it looks like a rather impressive piece of work > :) > >> >> > > > >> >> > > Cheers, > >> >> > > Christian. 
> >> >> > > >> >> > Thanks, > >> >> > please consider that most of the latest DCE6 specific parts were > >> >> > possible due to recent Alex support in getting the correct DCE6 > >> >> > headers, > >> >> > his suggestions and continuous feedback. > >> >> > > >> >> > I would suggest that Alex comments on the proposed next steps to > follow. > >> >> > >> >> The code looks pretty good to me. I'd like to get some feedback from > >> >> the display team to see if they have any concerns, but beyond that I > >> >> think we can pull it into the tree and continue improving it there. > >> >> Do you have a link to a git tree I can pull directly that contains > >> >> these patches? Is this the right branch? > >> >> https://github.com/maurossi/linux/commits/kernel-5.8rc4_si_next > >> >> > >> >> Thanks! > >> >> > >> >> Alex > >> > > >> > > >> > The following branch was pushed with the series on top of > amd-staging-drm-next > >> > > >> > https://github.com/maurossi/linux/commits/kernel-5.6_si_drm-next > >> > >> I gave this a quick test on all of the SI asics and the various > >> monitors I had available and it looks good. A few minor patches I > >> noticed are attached. If they look good to you, I'll squash them into > >> the series when I commit it. I've pushed it to my fdo tree as well: > >> https://cgit.freedesktop.org/~agd5f/linux/log/?h=si_dc_support > >> > >> Thanks! > >> > >> Alex > > > > > > The new patches are ok and with the following infomation about piglit > tests, > > the series may be good to go. > > > > I have performed piglit tests on Tahiti HD7950 on kernel 5.8.0-rc6 with > AMD DC support for SI > > and comparison with vanilla kernel 5.8.0-rc6 > > > > Results are the following > > > > [piglit gpu tests with kernel 5.8.0-rc6-amddcsi] > > > > utente@utente-desktop:~/piglit$ ./piglit run gpu . > > [26714/26714] skip: 1731, pass: 24669, warn: 15, fail: 288, crash: 11 > > Thank you for running Piglit! > > Results have been written to /home/utente/piglit > > > > [piglit gpu tests with vanilla 5.8.0-rc6] > > > > utente@utente-desktop:~/piglit$ ./piglit run gpu . > > [26714/26714] skip: 1731, pass: 24673, warn: 13, fail: 283, crash: 14 > > Thank you for running Piglit! > > Results have been written to /home/utente/piglit > > > > In the attachment the comparison of "5.8.0-rc6-amddcsi" vs "5.8.0-rc6" > vanilla > > and viceversa, I see no significant regression and in the delta of > failed tests I don't recognize DC related test cases, > > but you may also have a look. > > Looks good to me. The series is: > Reviewed-by: Alex Deucher <alexander.deucher@amd.com> > Thank you Alex for review and the help in finalizing the series and to Harry who initially encouraged me and provided the feedbacks to previous v2 series > > > > > dmesg for "5.8.0-rc6-amddcsi" is also provide the check the crashes > > > > Regarding the other user testing the series with Firepro W5130M > > he found an already existing issue in amdgpu si_support=1 which is > independent from my series and matches a problem alrady reported. [1] > > > > amdgpu does not currently implement GPU reset support for SI. 
> > Alex > If you have in the plans to add support and prevent those crashes, the user would be glad to be available for glxgears and piglit testing on Firepro W5130M Please let me know Mauro > > > Mauro > > > > [1] https://bbs.archlinux.org/viewtopic.php?id=249097 > > > >> > >> > >> > > >> >> > >> >> > >> >> > > >> >> > Mauro > >> >> > > >> >> > > > >> >> > > Am 16.07.20 um 23:22 schrieb Mauro Rossi: > >> >> > > > The series adds SI support to AMD DC > >> >> > > > > >> >> > > > Changelog: > >> >> > > > > >> >> > > > [RFC] > >> >> > > > Preliminar Proof Of Concept, with DCE8 headers still used in > dce60_resources.c > >> >> > > > > >> >> > > > [PATCH v2] > >> >> > > > Rebase on amd-staging-drm-next dated 17-Oct-2018 > >> >> > > > > >> >> > > > [PATCH v3] > >> >> > > > Add support for DCE6 specific headers, > >> >> > > > ad hoc DCE6 macros, funtions and fixes, > >> >> > > > rebase on current amd-staging-drm-next > >> >> > > > > >> >> > > > > >> >> > > > Commits [01/27]..[08/27] SI support added in various DC > components > >> >> > > > > >> >> > > > [PATCH v3 01/27] drm/amdgpu: add some required DCE6 registers > (v6) > >> >> > > > [PATCH v3 02/27] drm/amd/display: add asics info for SI parts > >> >> > > > [PATCH v3 03/27] drm/amd/display: dc/dce: add initial DCE6 > support (v9b) > >> >> > > > [PATCH v3 04/27] drm/amd/display: dc/core: add SI/DCE6 support > (v2) > >> >> > > > [PATCH v3 05/27] drm/amd/display: dc/bios: add support for DCE6 > >> >> > > > [PATCH v3 06/27] drm/amd/display: dc/gpio: add support for > DCE6 (v2) > >> >> > > > [PATCH v3 07/27] drm/amd/display: dc/irq: add support for DCE6 > (v4) > >> >> > > > [PATCH v3 08/27] drm/amd/display: amdgpu_dm: add SI support > (v4) > >> >> > > > > >> >> > > > Commits [09/27]..[24/27] DCE6 specific code adaptions > >> >> > > > > >> >> > > > [PATCH v3 09/27] drm/amd/display: dc/clk_mgr: add support for > SI parts (v2) > >> >> > > > [PATCH v3 10/27] drm/amd/display: dc/dce60: set > max_cursor_size to 64 > >> >> > > > [PATCH v3 11/27] drm/amd/display: dce_audio: add DCE6 specific > macros,functions > >> >> > > > [PATCH v3 12/27] drm/amd/display: dce_dmcu: add DCE6 specific > macros > >> >> > > > [PATCH v3 13/27] drm/amd/display: dce_hwseq: add DCE6 specific > macros,functions > >> >> > > > [PATCH v3 14/27] drm/amd/display: dce_ipp: add DCE6 specific > macros,functions > >> >> > > > [PATCH v3 15/27] drm/amd/display: dce_link_encoder: add DCE6 > specific macros,functions > >> >> > > > [PATCH v3 16/27] drm/amd/display: dce_mem_input: add DCE6 > specific macros,functions > >> >> > > > [PATCH v3 17/27] drm/amd/display: dce_opp: add DCE6 specific > macros,functions > >> >> > > > [PATCH v3 18/27] drm/amd/display: dce_transform: add DCE6 > specific macros,functions > >> >> > > > [PATCH v3 19/27] drm/amdgpu: add some required DCE6 registers > (v7) > >> >> > > > [PATCH v3 20/27] drm/amd/display: dce_transform: DCE6 Scaling > Horizontal Filter Init > >> >> > > > [PATCH v3 21/27] drm/amd/display: dce60_hw_sequencer: add DCE6 > macros,functions > >> >> > > > [PATCH v3 22/27] drm/amd/display: dce60_hw_sequencer: add DCE6 > specific .cursor_lock > >> >> > > > [PATCH v3 23/27] drm/amd/display: dce60_timing_generator: add > DCE6 specific functions > >> >> > > > [PATCH v3 24/27] drm/amd/display: dc/dce60: use DCE6 headers > (v6) > >> >> > > > > >> >> > > > > >> >> > > > Commits [25/27]..[27/27] SI support final enablements > >> >> > > > > >> >> > > > [PATCH v3 25/27] drm/amd/display: create plane rotation > property for Bonarie and later > >> >> > > > [PATCH v3 
26/27] drm/amdgpu: enable DC support for SI parts > (v2) > >> >> > > > [PATCH v3 27/27] drm/amd/display: enable SI support in the > Kconfig (v2) > >> >> > > > > >> >> > > > > >> >> > > > Signed-off-by: Mauro Rossi <issor.oruam@gmail.com> > >> >> > > > > >> >> > > > _______________________________________________ > >> >> > > > amd-gfx mailing list > >> >> > > > amd-gfx@lists.freedesktop.org > >> >> > > > https://lists.freedesktop.org/mailman/listinfo/amd-gfx > >> >> > > > >> >> > _______________________________________________ > >> >> > amd-gfx mailing list > >> >> > amd-gfx@lists.freedesktop.org > >> >> > https://lists.freedesktop.org/mailman/listinfo/amd-gfx > [-- Attachment #1.2: Type: text/html, Size: 16193 bytes --] [-- Attachment #2: Type: text/plain, Size: 154 bytes --] _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: 2020-07-27 19:46 ` Re: Mauro Rossi @ 2020-07-27 19:54 ` Alex Deucher 0 siblings, 0 replies; 50+ messages in thread From: Alex Deucher @ 2020-07-27 19:54 UTC (permalink / raw) To: Mauro Rossi Cc: Deucher, Alexander, Harry Wentland, Christian Koenig, amd-gfx list On Mon, Jul 27, 2020 at 3:46 PM Mauro Rossi <issor.oruam@gmail.com> wrote: > > > > On Mon, Jul 27, 2020 at 8:31 PM Alex Deucher <alexdeucher@gmail.com> wrote: >> >> On Sun, Jul 26, 2020 at 11:31 AM Mauro Rossi <issor.oruam@gmail.com> wrote: >> > >> > Hello, >> > >> > On Fri, Jul 24, 2020 at 8:31 PM Alex Deucher <alexdeucher@gmail.com> wrote: >> >> >> >> On Wed, Jul 22, 2020 at 3:57 AM Mauro Rossi <issor.oruam@gmail.com> wrote: >> >> > >> >> > Hello, >> >> > re-sending and copying full DL >> >> > >> >> > On Wed, Jul 22, 2020 at 4:51 AM Alex Deucher <alexdeucher@gmail.com> wrote: >> >> >> >> >> >> On Mon, Jul 20, 2020 at 6:00 AM Mauro Rossi <issor.oruam@gmail.com> wrote: >> >> >> > >> >> >> > Hi Christian, >> >> >> > >> >> >> > On Mon, Jul 20, 2020 at 11:00 AM Christian König >> >> >> > <ckoenig.leichtzumerken@gmail.com> wrote: >> >> >> > > >> >> >> > > Hi Mauro, >> >> >> > > >> >> >> > > I'm not deep into the whole DC design, so just some general high level >> >> >> > > comments on the cover letter: >> >> >> > > >> >> >> > > 1. Please add a subject line to the cover letter, my spam filter thinks >> >> >> > > that this is suspicious otherwise. >> >> >> > >> >> >> > My mistake in the editing of covert letter with git send-email, >> >> >> > I may have forgot to keep the Subject at the top >> >> >> > >> >> >> > > >> >> >> > > 2. Then you should probably note how well (badly?) is that tested. Since >> >> >> > > you noted proof of concept it might not even work. >> >> >> > >> >> >> > The Changelog is to be read as: >> >> >> > >> >> >> > [RFC] was the initial Proof of concept was the RFC and [PATCH v2] was >> >> >> > just a rebase onto amd-staging-drm-next >> >> >> > >> >> >> > this series [PATCH v3] has all the known changes required for DCE6 specificity >> >> >> > and based on a long offline thread with Alexander Deutcher and past >> >> >> > dri-devel chats with Harry Wentland. >> >> >> > >> >> >> > It was tested for my possibilities of testing with HD7750 and HD7950, >> >> >> > with checks in dmesg output for not getting "missing registers/masks" >> >> >> > kernel WARNING >> >> >> > and with kernel build on Ubuntu 20.04 and with android-x86 >> >> >> > >> >> >> > The proposal I made to Alex is that AMD testing systems will be used >> >> >> > for further regression testing, >> >> >> > as part of review and validation for eligibility to amd-staging-drm-next >> >> >> > >> >> >> >> >> >> We will certainly test it once it lands, but presumably this is >> >> >> working on the SI cards you have access to? >> >> > >> >> > >> >> > Yes, most of my testing was done with android-x86 Android CTS (EGL, GLES2, GLES3, VK) >> >> > >> >> > I am also in contact with a person with Firepro W5130M who is running a piglit session >> >> > >> >> > I had bought an HD7850 to test with Pitcairn, but it arrived as defective so I could not test with Pitcair >> >> > >> >> > >> >> >> >> >> >> > > >> >> >> > > 3. How feature complete (HDMI audio?, Freesync?) is it? 
>> >> >> > >> >> >> > All the changes in DC impacting DCE8 (dc/dce80 path) were ported to >> >> >> > DCE6 (dc/dce60 path) in the last two years from initial submission >> >> >> > >> >> >> > > >> >> >> > > Apart from that it looks like a rather impressive piece of work :) >> >> >> > > >> >> >> > > Cheers, >> >> >> > > Christian. >> >> >> > >> >> >> > Thanks, >> >> >> > please consider that most of the latest DCE6 specific parts were >> >> >> > possible due to recent Alex support in getting the correct DCE6 >> >> >> > headers, >> >> >> > his suggestions and continuous feedback. >> >> >> > >> >> >> > I would suggest that Alex comments on the proposed next steps to follow. >> >> >> >> >> >> The code looks pretty good to me. I'd like to get some feedback from >> >> >> the display team to see if they have any concerns, but beyond that I >> >> >> think we can pull it into the tree and continue improving it there. >> >> >> Do you have a link to a git tree I can pull directly that contains >> >> >> these patches? Is this the right branch? >> >> >> https://github.com/maurossi/linux/commits/kernel-5.8rc4_si_next >> >> >> >> >> >> Thanks! >> >> >> >> >> >> Alex >> >> > >> >> > >> >> > The following branch was pushed with the series on top of amd-staging-drm-next >> >> > >> >> > https://github.com/maurossi/linux/commits/kernel-5.6_si_drm-next >> >> >> >> I gave this a quick test on all of the SI asics and the various >> >> monitors I had available and it looks good. A few minor patches I >> >> noticed are attached. If they look good to you, I'll squash them into >> >> the series when I commit it. I've pushed it to my fdo tree as well: >> >> https://cgit.freedesktop.org/~agd5f/linux/log/?h=si_dc_support >> >> >> >> Thanks! >> >> >> >> Alex >> > >> > >> > The new patches are ok and with the following infomation about piglit tests, >> > the series may be good to go. >> > >> > I have performed piglit tests on Tahiti HD7950 on kernel 5.8.0-rc6 with AMD DC support for SI >> > and comparison with vanilla kernel 5.8.0-rc6 >> > >> > Results are the following >> > >> > [piglit gpu tests with kernel 5.8.0-rc6-amddcsi] >> > >> > utente@utente-desktop:~/piglit$ ./piglit run gpu . >> > [26714/26714] skip: 1731, pass: 24669, warn: 15, fail: 288, crash: 11 >> > Thank you for running Piglit! >> > Results have been written to /home/utente/piglit >> > >> > [piglit gpu tests with vanilla 5.8.0-rc6] >> > >> > utente@utente-desktop:~/piglit$ ./piglit run gpu . >> > [26714/26714] skip: 1731, pass: 24673, warn: 13, fail: 283, crash: 14 >> > Thank you for running Piglit! >> > Results have been written to /home/utente/piglit >> > >> > In the attachment the comparison of "5.8.0-rc6-amddcsi" vs "5.8.0-rc6" vanilla >> > and viceversa, I see no significant regression and in the delta of failed tests I don't recognize DC related test cases, >> > but you may also have a look. >> >> Looks good to me. The series is: >> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> > > > Thank you Alex for review and the help in finalizing the series > and to Harry who initially encouraged me and provided the feedbacks to previous v2 series > Thanks for sticking with this! > >> >> >> > >> > dmesg for "5.8.0-rc6-amddcsi" is also provide the check the crashes >> > >> > Regarding the other user testing the series with Firepro W5130M >> > he found an already existing issue in amdgpu si_support=1 which is independent from my series and matches a problem alrady reported. [1] >> > >> >> amdgpu does not currently implement GPU reset support for SI. 
>> >> Alex > > > If you have in the plans to add support and prevent those crashes, > the user would be glad to be available for glxgears and piglit testing on Firepro W5130M Initial patch here: https://patchwork.freedesktop.org/patch/380648/ Alex > > Please let me know > > Mauro > >> >> >> > Mauro >> > >> > [1] https://bbs.archlinux.org/viewtopic.php?id=249097 >> > >> >> >> >> >> >> > >> >> >> >> >> >> >> >> >> > >> >> >> > Mauro >> >> >> > >> >> >> > > >> >> >> > > Am 16.07.20 um 23:22 schrieb Mauro Rossi: >> >> >> > > > The series adds SI support to AMD DC >> >> >> > > > >> >> >> > > > Changelog: >> >> >> > > > >> >> >> > > > [RFC] >> >> >> > > > Preliminar Proof Of Concept, with DCE8 headers still used in dce60_resources.c >> >> >> > > > >> >> >> > > > [PATCH v2] >> >> >> > > > Rebase on amd-staging-drm-next dated 17-Oct-2018 >> >> >> > > > >> >> >> > > > [PATCH v3] >> >> >> > > > Add support for DCE6 specific headers, >> >> >> > > > ad hoc DCE6 macros, funtions and fixes, >> >> >> > > > rebase on current amd-staging-drm-next >> >> >> > > > >> >> >> > > > >> >> >> > > > Commits [01/27]..[08/27] SI support added in various DC components >> >> >> > > > >> >> >> > > > [PATCH v3 01/27] drm/amdgpu: add some required DCE6 registers (v6) >> >> >> > > > [PATCH v3 02/27] drm/amd/display: add asics info for SI parts >> >> >> > > > [PATCH v3 03/27] drm/amd/display: dc/dce: add initial DCE6 support (v9b) >> >> >> > > > [PATCH v3 04/27] drm/amd/display: dc/core: add SI/DCE6 support (v2) >> >> >> > > > [PATCH v3 05/27] drm/amd/display: dc/bios: add support for DCE6 >> >> >> > > > [PATCH v3 06/27] drm/amd/display: dc/gpio: add support for DCE6 (v2) >> >> >> > > > [PATCH v3 07/27] drm/amd/display: dc/irq: add support for DCE6 (v4) >> >> >> > > > [PATCH v3 08/27] drm/amd/display: amdgpu_dm: add SI support (v4) >> >> >> > > > >> >> >> > > > Commits [09/27]..[24/27] DCE6 specific code adaptions >> >> >> > > > >> >> >> > > > [PATCH v3 09/27] drm/amd/display: dc/clk_mgr: add support for SI parts (v2) >> >> >> > > > [PATCH v3 10/27] drm/amd/display: dc/dce60: set max_cursor_size to 64 >> >> >> > > > [PATCH v3 11/27] drm/amd/display: dce_audio: add DCE6 specific macros,functions >> >> >> > > > [PATCH v3 12/27] drm/amd/display: dce_dmcu: add DCE6 specific macros >> >> >> > > > [PATCH v3 13/27] drm/amd/display: dce_hwseq: add DCE6 specific macros,functions >> >> >> > > > [PATCH v3 14/27] drm/amd/display: dce_ipp: add DCE6 specific macros,functions >> >> >> > > > [PATCH v3 15/27] drm/amd/display: dce_link_encoder: add DCE6 specific macros,functions >> >> >> > > > [PATCH v3 16/27] drm/amd/display: dce_mem_input: add DCE6 specific macros,functions >> >> >> > > > [PATCH v3 17/27] drm/amd/display: dce_opp: add DCE6 specific macros,functions >> >> >> > > > [PATCH v3 18/27] drm/amd/display: dce_transform: add DCE6 specific macros,functions >> >> >> > > > [PATCH v3 19/27] drm/amdgpu: add some required DCE6 registers (v7) >> >> >> > > > [PATCH v3 20/27] drm/amd/display: dce_transform: DCE6 Scaling Horizontal Filter Init >> >> >> > > > [PATCH v3 21/27] drm/amd/display: dce60_hw_sequencer: add DCE6 macros,functions >> >> >> > > > [PATCH v3 22/27] drm/amd/display: dce60_hw_sequencer: add DCE6 specific .cursor_lock >> >> >> > > > [PATCH v3 23/27] drm/amd/display: dce60_timing_generator: add DCE6 specific functions >> >> >> > > > [PATCH v3 24/27] drm/amd/display: dc/dce60: use DCE6 headers (v6) >> >> >> > > > >> >> >> > > > >> >> >> > > > Commits [25/27]..[27/27] SI support final enablements >> >> >> > > > >> >> >> > > > 
[PATCH v3 25/27] drm/amd/display: create plane rotation property for Bonarie and later >> >> >> > > > [PATCH v3 26/27] drm/amdgpu: enable DC support for SI parts (v2) >> >> >> > > > [PATCH v3 27/27] drm/amd/display: enable SI support in the Kconfig (v2) >> >> >> > > > >> >> >> > > > >> >> >> > > > Signed-off-by: Mauro Rossi <issor.oruam@gmail.com> >> >> >> > > > >> >> >> > > > _______________________________________________ >> >> >> > > > amd-gfx mailing list >> >> >> > > > amd-gfx@lists.freedesktop.org >> >> >> > > > https://lists.freedesktop.org/mailman/listinfo/amd-gfx >> >> >> > > >> >> >> > _______________________________________________ >> >> >> > amd-gfx mailing list >> >> >> > amd-gfx@lists.freedesktop.org >> >> >> > https://lists.freedesktop.org/mailman/listinfo/amd-gfx _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ^ permalink raw reply [flat|nested] 50+ messages in thread
[parent not found: <20191205030032.GA26925@ray.huang@amd.com>]
* RE: [not found] <20191205030032.GA26925@ray.huang@amd.com> @ 2019-12-09 1:26 ` Quan, Evan 0 siblings, 0 replies; 50+ messages in thread From: Quan, Evan @ 2019-12-09 1:26 UTC (permalink / raw) To: Huang, Ray, Wang, Kevin(Yang) Cc: Deucher, Alexander, amd-gfx@lists.freedesktop.org I actually do not see any problem with this change. 1. If smu_read_smc_arg() always returns 0, I see no point in keeping the "return 0". Making it a "void" API is more reasonable. 2. Making " WREG32_SOC15(MP1, 0, mmMP1_SMN_C2PMSG_66, msg);" a separate API is ridiculous while " WREG32_SOC15(MP1, 0, mmMP1_SMN_C2PMSG_90, 0);" and " WREG32_SOC15(MP1, 0, mmMP1_SMN_C2PMSG_82, param);" are not. Actually, these three writes combined make up the real "message sending". Anyway, it's fine with me if you guys can live with the original poor code. > -----Original Message----- > From: Huang Rui <ray.huang@amd.com> > Sent: Thursday, December 5, 2019 11:01 AM > To: Wang, Kevin(Yang) <Kevin1.Wang@amd.com> > Cc: Quan, Evan <Evan.Quan@amd.com>; amd-gfx@lists.freedesktop.org; > Deucher, Alexander <Alexander.Deucher@amd.com> > Subject: > > Bcc: > Subject: Re: [PATCH 1/2] drm/amd/powerplay: drop unnecessary API wrapper > and return value > Reply-To: > In-Reply-To: > <MN2PR12MB32961EFFD79528A4EFF4BF5AA25D0@MN2PR12MB3296.namprd12.prod.outlook.com> > > On Wed, Dec 04, 2019 at 08:41:00PM +0800, Wang, Kevin(Yang) wrote: > > [AMD Official Use Only - Internal Distribution Only] > > > > this change doesn't make sense; if you really think the return > > value is useless, it would be more reasonable to return the value > > directly rather than fill in an out-parameter. > > I think these two patches make the code look worse, unless there's a > > bug in it. > > adding [1]@Huang, Ray to double check. > > Best Regards, > > Kevin > > > ________________________________________________________________ __ > > > > From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> on behalf of Evan > > Quan <evan.quan@amd.com> > > Sent: Wednesday, December 4, 2019 5:53 PM > > To: amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org> > > Cc: Quan, Evan <Evan.Quan@amd.com> > > Subject: [PATCH 1/2] drm/amd/powerplay: drop unnecessary API wrapper > > and return value > > > > Some minor cosmetic fixes. 
> > Change-Id: I3ec217289f4cb491720430f2d0b0b4efe5e2b9aa > > Signed-off-by: Evan Quan <evan.quan@amd.com> > > --- > > drivers/gpu/drm/amd/powerplay/amdgpu_smu.c | 12 ++---- > > .../gpu/drm/amd/powerplay/inc/amdgpu_smu.h | 2 +- > > drivers/gpu/drm/amd/powerplay/inc/smu_v11_0.h | 2 +- > > drivers/gpu/drm/amd/powerplay/inc/smu_v12_0.h | 2 +- > > drivers/gpu/drm/amd/powerplay/smu_v11_0.c | 39 +++++-------------- > > drivers/gpu/drm/amd/powerplay/smu_v12_0.c | 22 ++--------- > > 6 files changed, 19 insertions(+), 60 deletions(-) > > diff --git a/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c > > b/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c > > index 2dd960e85a24..00a0df9b41c9 100644 > > --- a/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c > > +++ b/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c > > @@ -198,9 +198,7 @@ int smu_get_smc_version(struct smu_context *smu, > > uint32_t *if_version, uint32_t > > if (ret) > > return ret; > > > > - ret = smu_read_smc_arg(smu, if_version); > > - if (ret) > > - return ret; > > + smu_read_smc_arg(smu, if_version); > > } > > > > if (smu_version) { > > @@ -208,9 +206,7 @@ int smu_get_smc_version(struct smu_context *smu, > > uint32_t *if_version, uint32_t > > if (ret) > > return ret; > > > > - ret = smu_read_smc_arg(smu, smu_version); > > - if (ret) > > - return ret; > > + smu_read_smc_arg(smu, smu_version); > > } > > > > return ret; > > @@ -339,9 +335,7 @@ int smu_get_dpm_freq_by_index(struct > smu_context > > *smu, enum smu_clk_type clk_typ > > if (ret) > > return ret; > > > > - ret = smu_read_smc_arg(smu, ¶m); > > - if (ret) > > - return ret; > > + smu_read_smc_arg(smu, ¶m); > > > > /* BIT31: 0 - Fine grained DPM, 1 - Dicrete DPM > > * now, we un-support it */ > > diff --git a/drivers/gpu/drm/amd/powerplay/inc/amdgpu_smu.h > > b/drivers/gpu/drm/amd/powerplay/inc/amdgpu_smu.h > > index ca3fdc6777cf..e7b18b209bc7 100644 > > --- a/drivers/gpu/drm/amd/powerplay/inc/amdgpu_smu.h > > +++ b/drivers/gpu/drm/amd/powerplay/inc/amdgpu_smu.h > > @@ -502,7 +502,7 @@ struct pptable_funcs { > > int (*system_features_control)(struct smu_context *smu, bool > > en); > > int (*send_smc_msg_with_param)(struct smu_context *smu, > > enum smu_message_type msg, > > uint32_t param); > > - int (*read_smc_arg)(struct smu_context *smu, uint32_t *arg); > > + void (*read_smc_arg)(struct smu_context *smu, uint32_t *arg); > > int (*init_display_count)(struct smu_context *smu, uint32_t > > count); > > int (*set_allowed_mask)(struct smu_context *smu); > > int (*get_enabled_mask)(struct smu_context *smu, uint32_t > > *feature_mask, uint32_t num); > > diff --git a/drivers/gpu/drm/amd/powerplay/inc/smu_v11_0.h > > b/drivers/gpu/drm/amd/powerplay/inc/smu_v11_0.h > > index 610e301a5fce..4160147a03f3 100644 > > --- a/drivers/gpu/drm/amd/powerplay/inc/smu_v11_0.h > > +++ b/drivers/gpu/drm/amd/powerplay/inc/smu_v11_0.h > > @@ -183,7 +183,7 @@ smu_v11_0_send_msg_with_param(struct > smu_context > > *smu, > > enum smu_message_type msg, > > uint32_t param); > > > > -int smu_v11_0_read_arg(struct smu_context *smu, uint32_t *arg); > > +void smu_v11_0_read_arg(struct smu_context *smu, uint32_t *arg); > > > > int smu_v11_0_init_display_count(struct smu_context *smu, uint32_t > > count); > > > > diff --git a/drivers/gpu/drm/amd/powerplay/inc/smu_v12_0.h > > b/drivers/gpu/drm/amd/powerplay/inc/smu_v12_0.h > > index 922973b7e29f..710af2860a8f 100644 > > --- a/drivers/gpu/drm/amd/powerplay/inc/smu_v12_0.h > > +++ b/drivers/gpu/drm/amd/powerplay/inc/smu_v12_0.h > > @@ -40,7 +40,7 @@ struct smu_12_0_cmn2aisc_mapping { > > 
int smu_v12_0_send_msg_without_waiting(struct smu_context *smu, > > uint16_t msg); > > > > -int smu_v12_0_read_arg(struct smu_context *smu, uint32_t *arg); > > +void smu_v12_0_read_arg(struct smu_context *smu, uint32_t *arg); > > > > int smu_v12_0_wait_for_response(struct smu_context *smu); > > > > diff --git a/drivers/gpu/drm/amd/powerplay/smu_v11_0.c > > b/drivers/gpu/drm/amd/powerplay/smu_v11_0.c > > index 8683e0678b56..325ec4864f90 100644 > > --- a/drivers/gpu/drm/amd/powerplay/smu_v11_0.c > > +++ b/drivers/gpu/drm/amd/powerplay/smu_v11_0.c > > @@ -53,20 +53,11 @@ MODULE_FIRMWARE("amdgpu/navi12_smc.bin"); > > > > #define SMU11_VOLTAGE_SCALE 4 > > > > -static int smu_v11_0_send_msg_without_waiting(struct smu_context *smu, > > - uint16_t msg) > > -{ > > - struct amdgpu_device *adev = smu->adev; > > - WREG32_SOC15(MP1, 0, mmMP1_SMN_C2PMSG_66, msg); > > - return 0; > > -} > > - > > -int smu_v11_0_read_arg(struct smu_context *smu, uint32_t *arg) > > +void smu_v11_0_read_arg(struct smu_context *smu, uint32_t *arg) > > { > > struct amdgpu_device *adev = smu->adev; > > > > *arg = RREG32_SOC15(MP1, 0, mmMP1_SMN_C2PMSG_82); > > - return 0; > > } > > > > static int smu_v11_0_wait_for_response(struct smu_context *smu) > > @@ -109,7 +100,7 @@ smu_v11_0_send_msg_with_param(struct > smu_context > > *smu, > > > > WREG32_SOC15(MP1, 0, mmMP1_SMN_C2PMSG_82, param); > > > > - smu_v11_0_send_msg_without_waiting(smu, (uint16_t)index); > > + WREG32_SOC15(MP1, 0, mmMP1_SMN_C2PMSG_66, > (uint16_t)index); > > > > ret = smu_v11_0_wait_for_response(smu); > > if (ret) > > @@ -843,16 +834,12 @@ int smu_v11_0_get_enabled_mask(struct > smu_context > > *smu, > > ret = smu_send_smc_msg(smu, > > SMU_MSG_GetEnabledSmuFeaturesHigh); > > if (ret) > > return ret; > > - ret = smu_read_smc_arg(smu, &feature_mask_high); > > - if (ret) > > - return ret; > > + smu_read_smc_arg(smu, &feature_mask_high); > > > > ret = smu_send_smc_msg(smu, > SMU_MSG_GetEnabledSmuFeaturesLow); > > if (ret) > > return ret; > > - ret = smu_read_smc_arg(smu, &feature_mask_low); > > - if (ret) > > - return ret; > > + smu_read_smc_arg(smu, &feature_mask_low); > > > > feature_mask[0] = feature_mask_low; > > feature_mask[1] = feature_mask_high; > > @@ -924,9 +911,7 @@ smu_v11_0_get_max_sustainable_clock(struct > > smu_context *smu, uint32_t *clock, > > return ret; > > } > > > > - ret = smu_read_smc_arg(smu, clock); > > - if (ret) > > - return ret; > > + smu_read_smc_arg(smu, clock); > > > > if (*clock != 0) > > return 0; > > @@ -939,7 +924,7 @@ smu_v11_0_get_max_sustainable_clock(struct > > smu_context *smu, uint32_t *clock, > > return ret; > > } > > > > - ret = smu_read_smc_arg(smu, clock); > > + smu_read_smc_arg(smu, clock); > > > > return ret; > > } > > @@ -1107,9 +1092,7 @@ int smu_v11_0_get_current_clk_freq(struct > > smu_context *smu, > > if (ret) > > return ret; > > > > - ret = smu_read_smc_arg(smu, &freq); > > - if (ret) > > - return ret; > > + smu_read_smc_arg(smu, &freq); > > } > > > > freq *= 100; > > @@ -1749,18 +1732,14 @@ int smu_v11_0_get_dpm_ultimate_freq(struct > > smu_context *smu, enum smu_clk_type c > > ret = smu_send_smc_msg_with_param(smu, > > SMU_MSG_GetMaxDpmFreq, param); > > if (ret) > > goto failed; > > - ret = smu_read_smc_arg(smu, max); > > - if (ret) > > - goto failed; > > + smu_read_smc_arg(smu, max); > > } > > > > if (min) { > > ret = smu_send_smc_msg_with_param(smu, > > SMU_MSG_GetMinDpmFreq, param); > > if (ret) > > goto failed; > > - ret = smu_read_smc_arg(smu, min); > > - if (ret) > > - goto failed; > > + 
smu_read_smc_arg(smu, min); > > } > > > > failed: > > diff --git a/drivers/gpu/drm/amd/powerplay/smu_v12_0.c > > b/drivers/gpu/drm/amd/powerplay/smu_v12_0.c > > index 269a7d73b58d..7f5f7e12a41e 100644 > > --- a/drivers/gpu/drm/amd/powerplay/smu_v12_0.c > > +++ b/drivers/gpu/drm/amd/powerplay/smu_v12_0.c > > @@ -41,21 +41,11 @@ > > #define SMUIO_GFX_MISC_CNTL__PWR_GFXOFF_STATUS_MASK > > 0x00000006L > > #define SMUIO_GFX_MISC_CNTL__PWR_GFXOFF_STATUS__SHIFT 0x1 > > > > -int smu_v12_0_send_msg_without_waiting(struct smu_context *smu, > > - uint16_t msg) > > -{ > > - struct amdgpu_device *adev = smu->adev; > > - > > - WREG32_SOC15(MP1, 0, mmMP1_SMN_C2PMSG_66, msg); > > - return 0; > > -} > > - > > -int smu_v12_0_read_arg(struct smu_context *smu, uint32_t *arg) > > +void smu_v12_0_read_arg(struct smu_context *smu, uint32_t *arg) > > { > > struct amdgpu_device *adev = smu->adev; > > > > *arg = RREG32_SOC15(MP1, 0, mmMP1_SMN_C2PMSG_82); > > - return 0; > > } > > > > int smu_v12_0_wait_for_response(struct smu_context *smu) > > @@ -98,7 +88,7 @@ smu_v12_0_send_msg_with_param(struct > smu_context > > *smu, > > > > WREG32_SOC15(MP1, 0, mmMP1_SMN_C2PMSG_82, param); > > > > - smu_v12_0_send_msg_without_waiting(smu, (uint16_t)index); > > + WREG32_SOC15(MP1, 0, mmMP1_SMN_C2PMSG_66, > (uint16_t)index); > > smu_v12_0_send_msg_without_waiting() function is more readable than using > raw register programming. > > Thanks, > Ray > > > > > ret = smu_v12_0_wait_for_response(smu); > > if (ret) > > @@ -352,9 +342,7 @@ int smu_v12_0_get_dpm_ultimate_freq(struct > > smu_context *smu, enum smu_clk_type c > > pr_err("Attempt to get max GX > > frequency from SMC Failed !\n"); > > goto failed; > > } > > - ret = smu_read_smc_arg(smu, max); > > - if (ret) > > - goto failed; > > + smu_read_smc_arg(smu, max); > > break; > > case SMU_UCLK: > > case SMU_FCLK: > > @@ -383,9 +371,7 @@ int smu_v12_0_get_dpm_ultimate_freq(struct > > smu_context *smu, enum smu_clk_type c > > pr_err("Attempt to get min GX > > frequency from SMC Failed !\n"); > > goto failed; > > } > > - ret = smu_read_smc_arg(smu, min); > > - if (ret) > > - goto failed; > > + smu_read_smc_arg(smu, min); > > break; > > case SMU_UCLK: > > case SMU_FCLK: > > -- > > 2.24.0 > > _______________________________________________ > > amd-gfx mailing list > > amd-gfx@lists.freedesktop.org > > [2]https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fli > > sts.freedesktop.org%2Fmailman%2Flistinfo%2Famd- > gfx&data=02%7C01%7CK > > > evin1.Wang%40amd.com%7Cb2381beaed6e4f83074608d7789fe6ef%7C3dd896 > 1fe4884 > > > e608e11a82d994e183d%7C0%7C0%7C637110500489978071&sdata=U15c > qXp2n00L > > RZDeu2482cwoZmEIrXWHCgF4NFap%2BkQ%3D&reserved=0 > > > > References > > > > 1. mailto:Ray.Huang@amd.com > > 2. > > https://nam11.safelinks.protection.outlook.com/?url=https://lists.free > > desktop.org/mailman/listinfo/amd- > gfx&data=02|01|Kevin1.Wang@amd.co > > > m|b2381beaed6e4f83074608d7789fe6ef|3dd8961fe4884e608e11a82d994e183 > d|0| > > > 0|637110500489978071&sdata=U15cqXp2n00LRZDeu2482cwoZmEIrXWH > CgF4NFa > > p+kQ=&reserved=0 _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ^ permalink raw reply [flat|nested] 50+ messages in thread
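To make the disputed calling convention concrete, here is a minimal sketch of the API after the patch, reconstructed only from the hunks and register names quoted in this thread; it is not the full driver source, and smu_send_smc_msg()/SMU_MSG_GetEnabledSmuFeaturesLow appear exactly as they do in the diff above:

    /* Sketch only: a plain MMIO read has no failure mode, which is the
     * rationale for dropping the int return value. */
    void smu_v11_0_read_arg(struct smu_context *smu, uint32_t *arg)
    {
            struct amdgpu_device *adev = smu->adev;

            *arg = RREG32_SOC15(MP1, 0, mmMP1_SMN_C2PMSG_82);
    }

    /* Caller pattern after the change: only the message send can fail. */
    ret = smu_send_smc_msg(smu, SMU_MSG_GetEnabledSmuFeaturesLow);
    if (ret)
            return ret;
    smu_read_smc_arg(smu, &feature_mask_low);

Evan's second point is that transmitting a message is really the three register writes he lists, performed back to back before waiting for the response:

    WREG32_SOC15(MP1, 0, mmMP1_SMN_C2PMSG_90, 0);               /* clear previous response */
    WREG32_SOC15(MP1, 0, mmMP1_SMN_C2PMSG_82, param);           /* message argument */
    WREG32_SOC15(MP1, 0, mmMP1_SMN_C2PMSG_66, (uint16_t)index); /* fire the message */

so splitting only the last write into its own helper is, in his view, an arbitrary cut, while Ray's counter-argument is that the named wrapper documents intent better than a raw register write.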
[parent not found: <[PATCH xf86-video-amdgpu 0/3] Add non-desktop and leasing support>]
* (unknown),
  [not found] <[PATCH xf86-video-amdgpu 0/3] Add non-desktop and leasing support>
@ 2018-03-03  4:49 ` Keith Packard
  [not found] ` <20180303044931.6902-1-keithp-aN4HjG94KOLQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 50+ messages in thread
From: Keith Packard @ 2018-03-03  4:49 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: michel-otUistvHUpPR7s880joybQ, keithp-aN4HjG94KOLQT0dZR+AlfA

Here are the patches to the modesetting driver amended for the amdgpu
driver.

-keith
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
^ permalink raw reply	[flat|nested] 50+ messages in thread
[parent not found: <20180303044931.6902-1-keithp-aN4HjG94KOLQT0dZR+AlfA@public.gmane.org>]
* Re:
  [not found] ` <20180303044931.6902-1-keithp-aN4HjG94KOLQT0dZR+AlfA@public.gmane.org>
@ 2018-03-05 10:02   ` Michel Dänzer
  [not found]     ` <82fc592b-f680-c663-1a0f-7b522ca932d2-otUistvHUpPR7s880joybQ@public.gmane.org>
  0 siblings, 1 reply; 50+ messages in thread
From: Michel Dänzer @ 2018-03-05 10:02 UTC (permalink / raw)
  To: Keith Packard; +Cc: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

On 2018-03-03 05:49 AM, Keith Packard wrote:
> Here are the patches to the modesetting driver amended for the amdgpu
> driver.

Thanks for the patches. Unfortunately, since this driver still has to
compile and work with xserver >= 1.13, at least patches 1 & 3 cannot be
applied as is.

I was going to port these and take care of that anyway, though I might
not get around to it before April. If it can't wait that long, I can
give you details about what needs to be done.


-- 
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
^ permalink raw reply	[flat|nested] 50+ messages in thread
[parent not found: <82fc592b-f680-c663-1a0f-7b522ca932d2-otUistvHUpPR7s880joybQ@public.gmane.org>]
* Re:
  [not found]     ` <82fc592b-f680-c663-1a0f-7b522ca932d2-otUistvHUpPR7s880joybQ@public.gmane.org>
@ 2018-03-05 16:41       ` Keith Packard
  0 siblings, 0 replies; 50+ messages in thread
From: Keith Packard @ 2018-03-05 16:41 UTC (permalink / raw)
  To: Michel Dänzer; +Cc: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

[-- Attachment #1.1: Type: text/plain, Size: 749 bytes --]

Michel Dänzer <michel-otUistvHUpPR7s880joybQ@public.gmane.org> writes:

> On 2018-03-03 05:49 AM, Keith Packard wrote:
>> Here are the patches to the modesetting driver amended for the amdgpu
>> driver.
>
> Thanks for the patches. Unfortunately, since this driver still has to
> compile and work with xserver >= 1.13, at least patches 1 & 3 cannot be
> applied as is.
>
> I was going to port these and take care of that anyway, though I might
> not get around to it before April. If it can't wait that long, I can
> give you details about what needs to be done.

I'm good with that -- I needed this to test amdgpu vs modesetting for
some applications, and just having the patches with support is good
enough for me.

-- 
-keith

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

[-- Attachment #2: Type: text/plain, Size: 154 bytes --]

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
^ permalink raw reply	[flat|nested] 50+ messages in thread
end of thread, other threads:[~2025-08-30 16:16 UTC | newest]

Thread overview: 50+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-08-20 14:33 Christian König
2025-08-20 14:33 ` [PATCH 1/3] drm/ttm: use apply_page_range instead of vmf_insert_pfn_prot Christian König
2025-08-20 14:33 ` [PATCH 2/3] drm/ttm: reapply increase ttm pre-fault value to PMD size" Christian König
2025-08-20 14:33 ` [PATCH 3/3] drm/ttm: disable changing the global caching flags on newer AMD CPUs v2 Christian König
2025-08-20 15:12 ` Borislav Petkov
2025-08-20 15:23   ` David Hildenbrand
2025-08-21  8:10     ` Re: Christian König
2025-08-25 19:10       ` Re: David Hildenbrand
2025-08-26  8:38         ` Re: Christian König
2025-08-26  8:46           ` Re: David Hildenbrand
2025-08-26  9:00             ` Re: Christian König
2025-08-26  9:17               ` Re: David Hildenbrand
2025-08-26  9:56                 ` Re: Christian König
2025-08-26 12:07                   ` Re: David Hildenbrand
2025-08-26 16:09                     ` Re: Christian König
2025-08-27  9:13                       ` [PATCH 0/3] drm/ttm: Michel Dänzer
2025-08-28 21:18                         ` stupid and complicated PAT :) David Hildenbrand
2025-08-28 21:28                           ` David Hildenbrand
2025-08-28 21:32                             ` David Hildenbrand
2025-08-29 10:50                               ` Christian König
2025-08-29 19:52                                 ` David Hildenbrand
2025-08-29 19:58                                   ` David Hildenbrand
2025-08-26 14:27                   ` Thomas Hellström
2025-08-28 21:01                     ` stupid PAT :) David Hildenbrand
2025-08-26 12:37               ` David Hildenbrand
2025-08-21  9:16 ` your mail Lorenzo Stoakes
2025-08-21  9:30   ` David Hildenbrand
2025-08-21 10:05     ` Lorenzo Stoakes
2025-08-21 10:16       ` David Hildenbrand
2025-08-25 18:35         ` Christian König
2025-08-25 19:20           ` David Hildenbrand
-- strict thread matches above, loose matches on Subject: below --
2025-01-08 13:59 Jiang Liu
2025-01-08 14:10 ` Christian König
2025-01-08 16:33   ` Re: Mario Limonciello
2025-01-09  5:34     ` Re: Gerry Liu
2025-01-09 17:10       ` Re: Mario Limonciello
2025-01-13  1:19         ` Re: Gerry Liu
2025-01-13 21:59           ` Re: Mario Limonciello
2022-09-12 12:36 Christian König
2022-09-13  2:04 ` Alex Deucher
2020-07-16 21:22 Mauro Rossi
2020-07-20  9:00 ` Christian König
2020-07-20  9:59   ` Re: Mauro Rossi
2020-07-22  2:51     ` Re: Alex Deucher
2020-07-22  7:56       ` Re: Mauro Rossi
2020-07-24 18:31         ` Re: Alex Deucher
2020-07-26 15:31           ` Re: Mauro Rossi
2020-07-27 18:31             ` Re: Alex Deucher
2020-07-27 19:46               ` Re: Mauro Rossi
2020-07-27 19:54                 ` Re: Alex Deucher
     [not found] <20191205030032.GA26925@ray.huang@amd.com>
2019-12-09  1:26 ` Quan, Evan
     [not found] <[PATCH xf86-video-amdgpu 0/3] Add non-desktop and leasing support>
2018-03-03  4:49 ` (unknown), Keith Packard
     [not found] ` <20180303044931.6902-1-keithp-aN4HjG94KOLQT0dZR+AlfA@public.gmane.org>
2018-03-05 10:02   ` Michel Dänzer
     [not found]     ` <82fc592b-f680-c663-1a0f-7b522ca932d2-otUistvHUpPR7s880joybQ@public.gmane.org>
2018-03-05 16:41       ` Re: Keith Packard