* Re: [PATCH v2] loongarch/mm: disable WUC for pgprot_writecombine as same as ioremap_wc [not found] <20231009042841.635366-1-uwu@icenowy.me> @ 2023-10-09 14:32 ` Sui Jingfeng 2023-10-10 0:15 ` WANG Xuerui 2023-10-10 0:50 ` Icenowy Zheng 0 siblings, 2 replies; 31+ messages in thread From: Sui Jingfeng @ 2023-10-09 14:32 UTC (permalink / raw) To: Icenowy Zheng, Huacai Chen, WANG Xuerui Cc: Andrew Morton, Weihao Li, Mike Rapoport (IBM), Jun Yi, Baoquan He, Matthew Wilcox (Oracle), David Hildenbrand, Hongchen Zhang, Binbin Zhou, Zhen Lei, Tiezhu Yang, Thomas Gleixner, Zhihong Dong, loongarch, linux-kernel Hi, On 2023/10/9 12:28, Icenowy Zheng wrote: > Currently the code disables WUC only disables it for ioremap_wc(), which > is only used when mapping writecombine pages like ioremap() (mapped to > the kernel space). For VRAM mapped in TTM/GEM, it's mapped with a > crafted pgprot with pgprot_writecombine() function, which isn't > corrently disabled now. > > Disable WUC for pgprot_writecombine() (fallback to SUC) too. > > This improves AMDGPU driver stability on Loongson 3A5000 machines. > > Signed-off-by: Icenowy Zheng <uwu@icenowy.me> > --- > Changes since v1: > - Removed _WC macros > - Mention ioremap_wc in commit message > > arch/loongarch/include/asm/io.h | 5 ++--- > arch/loongarch/include/asm/pgtable-bits.h | 4 +++- > arch/loongarch/kernel/setup.c | 10 +++++----- > 3 files changed, 10 insertions(+), 9 deletions(-) > > diff --git a/arch/loongarch/include/asm/io.h b/arch/loongarch/include/asm/io.h > index 0dcb36b32cb25..290aad87a8847 100644 > --- a/arch/loongarch/include/asm/io.h > +++ b/arch/loongarch/include/asm/io.h > @@ -52,10 +52,9 @@ static inline void __iomem *ioremap_prot(phys_addr_t offset, unsigned long size, > * @offset: bus address of the memory > * @size: size of the resource to map > */ > -extern pgprot_t pgprot_wc; > - > #define ioremap_wc(offset, size) \ > - ioremap_prot((offset), (size), pgprot_val(pgprot_wc)) > + ioremap_prot((offset), (size), pgprot_val( \ > + wc_enabled ? PAGE_KERNEL_WUC : PAGE_KERNEL_SUC)) > > #define ioremap_cache(offset, size) \ > ioremap_prot((offset), (size), pgprot_val(PAGE_KERNEL)) > diff --git a/arch/loongarch/include/asm/pgtable-bits.h b/arch/loongarch/include/asm/pgtable-bits.h > index 35348d4c4209a..a3d827701736d 100644 > --- a/arch/loongarch/include/asm/pgtable-bits.h > +++ b/arch/loongarch/include/asm/pgtable-bits.h > @@ -92,6 +92,8 @@ > > #ifndef __ASSEMBLY__ > > +extern bool wc_enabled; > + > #define _PAGE_IOREMAP pgprot_val(PAGE_KERNEL_SUC) > > #define pgprot_noncached pgprot_noncached > @@ -111,7 +113,7 @@ static inline pgprot_t pgprot_writecombine(pgprot_t _prot) > { > unsigned long prot = pgprot_val(_prot); > > - prot = (prot & ~_CACHE_MASK) | _CACHE_WUC; > + prot = (prot & ~_CACHE_MASK) | (wc_enabled ? _CACHE_WUC : _CACHE_SUC); > > return __pgprot(prot); > } > diff --git a/arch/loongarch/kernel/setup.c b/arch/loongarch/kernel/setup.c > index 7783f0a3d742c..465c1dbb6f4b4 100644 > --- a/arch/loongarch/kernel/setup.c > +++ b/arch/loongarch/kernel/setup.c > @@ -161,19 +161,19 @@ static void __init smbios_parse(void) > } > > #ifdef CONFIG_ARCH_WRITECOMBINE > -pgprot_t pgprot_wc = PAGE_KERNEL_WUC; > +bool wc_enabled = true; > #else > -pgprot_t pgprot_wc = PAGE_KERNEL_SUC; > +bool wc_enabled; > #endif > > -EXPORT_SYMBOL(pgprot_wc); > +EXPORT_SYMBOL(wc_enabled); > > static int __init setup_writecombine(char *p) > { > if (!strcmp(p, "on")) > - pgprot_wc = PAGE_KERNEL_WUC; > + wc_enabled = true; > else if (!strcmp(p, "off")) > - pgprot_wc = PAGE_KERNEL_SUC; > + wc_enabled = false; > else > pr_warn("Unknown writecombine setting \"%s\".\n", p); > Good catch! But this will make the write combine(WC) mappings completely unusable on LoongArch. This is nearly equivalent to say that LoongArch don't support write combine at all. But the write combine(WC) mappings works fine for software based drm drivers, such as drm/loongson and drm/ast etc. Even include drm/radeon and drm/amdgpu with pure software rendering setup (by putting Option "Accel" "off" into 10-amdgpu.conf or 10-radeon.conf) After merge this patch, the performance drop dramatically for 2D software rendering based display controller drivers. Well, this patch itself is a good catch, as it is a fix for the commit <16c52e503043> ("LoongArch: Make WriteCombine configurable for ioremap()"). But I'm afraid that both of this commit and the <16c52e503043> commit are not a *real* fix write combine related issue on LoongArch. It just negative sidestep of the real problem. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v2] loongarch/mm: disable WUC for pgprot_writecombine as same as ioremap_wc 2023-10-09 14:32 ` [PATCH v2] loongarch/mm: disable WUC for pgprot_writecombine as same as ioremap_wc Sui Jingfeng @ 2023-10-10 0:15 ` WANG Xuerui 2023-10-10 3:02 ` Sui Jingfeng 2023-10-10 0:50 ` Icenowy Zheng 1 sibling, 1 reply; 31+ messages in thread From: WANG Xuerui @ 2023-10-10 0:15 UTC (permalink / raw) To: Sui Jingfeng, Icenowy Zheng, Huacai Chen Cc: Andrew Morton, Weihao Li, Mike Rapoport (IBM), Jun Yi, Baoquan He, Matthew Wilcox (Oracle), David Hildenbrand, Hongchen Zhang, Binbin Zhou, Zhen Lei, Tiezhu Yang, Thomas Gleixner, Zhihong Dong, loongarch, linux-kernel On 10/9/23 22:32, Sui Jingfeng wrote: > Hi, > > > On 2023/10/9 12:28, Icenowy Zheng wrote: >> Currently the code disables WUC only disables it for ioremap_wc(), which >> is only used when mapping writecombine pages like ioremap() (mapped to >> the kernel space). For VRAM mapped in TTM/GEM, it's mapped with a >> crafted pgprot with pgprot_writecombine() function, which isn't >> corrently disabled now. >> >> Disable WUC for pgprot_writecombine() (fallback to SUC) too. >> >> This improves AMDGPU driver stability on Loongson 3A5000 machines. >> >> Signed-off-by: Icenowy Zheng <uwu@icenowy.me> >> --- >> Changes since v1: >> - Removed _WC macros >> - Mention ioremap_wc in commit message >> >> arch/loongarch/include/asm/io.h | 5 ++--- >> arch/loongarch/include/asm/pgtable-bits.h | 4 +++- >> arch/loongarch/kernel/setup.c | 10 +++++----- >> 3 files changed, 10 insertions(+), 9 deletions(-) >> >> diff --git a/arch/loongarch/include/asm/io.h >> b/arch/loongarch/include/asm/io.h >> index 0dcb36b32cb25..290aad87a8847 100644 >> --- a/arch/loongarch/include/asm/io.h >> +++ b/arch/loongarch/include/asm/io.h >> @@ -52,10 +52,9 @@ static inline void __iomem >> *ioremap_prot(phys_addr_t offset, unsigned long size, >> * @offset: bus address of the memory >> * @size: size of the resource to map >> */ >> -extern pgprot_t pgprot_wc; >> - >> #define ioremap_wc(offset, size) \ >> - ioremap_prot((offset), (size), pgprot_val(pgprot_wc)) >> + ioremap_prot((offset), (size), pgprot_val( \ >> + wc_enabled ? PAGE_KERNEL_WUC : PAGE_KERNEL_SUC)) >> #define ioremap_cache(offset, size) \ >> ioremap_prot((offset), (size), pgprot_val(PAGE_KERNEL)) >> diff --git a/arch/loongarch/include/asm/pgtable-bits.h >> b/arch/loongarch/include/asm/pgtable-bits.h >> index 35348d4c4209a..a3d827701736d 100644 >> --- a/arch/loongarch/include/asm/pgtable-bits.h >> +++ b/arch/loongarch/include/asm/pgtable-bits.h >> @@ -92,6 +92,8 @@ >> #ifndef __ASSEMBLY__ >> +extern bool wc_enabled; >> + >> #define _PAGE_IOREMAP pgprot_val(PAGE_KERNEL_SUC) >> #define pgprot_noncached pgprot_noncached >> @@ -111,7 +113,7 @@ static inline pgprot_t >> pgprot_writecombine(pgprot_t _prot) >> { >> unsigned long prot = pgprot_val(_prot); >> - prot = (prot & ~_CACHE_MASK) | _CACHE_WUC; >> + prot = (prot & ~_CACHE_MASK) | (wc_enabled ? _CACHE_WUC : >> _CACHE_SUC); >> return __pgprot(prot); >> } >> diff --git a/arch/loongarch/kernel/setup.c >> b/arch/loongarch/kernel/setup.c >> index 7783f0a3d742c..465c1dbb6f4b4 100644 >> --- a/arch/loongarch/kernel/setup.c >> +++ b/arch/loongarch/kernel/setup.c >> @@ -161,19 +161,19 @@ static void __init smbios_parse(void) >> } >> #ifdef CONFIG_ARCH_WRITECOMBINE >> -pgprot_t pgprot_wc = PAGE_KERNEL_WUC; >> +bool wc_enabled = true; >> #else >> -pgprot_t pgprot_wc = PAGE_KERNEL_SUC; >> +bool wc_enabled; >> #endif >> -EXPORT_SYMBOL(pgprot_wc); >> +EXPORT_SYMBOL(wc_enabled); >> static int __init setup_writecombine(char *p) >> { >> if (!strcmp(p, "on")) >> - pgprot_wc = PAGE_KERNEL_WUC; >> + wc_enabled = true; >> else if (!strcmp(p, "off")) >> - pgprot_wc = PAGE_KERNEL_SUC; >> + wc_enabled = false; >> else >> pr_warn("Unknown writecombine setting \"%s\".\n", p); > > > Good catch! > > But this will make the write combine(WC) mappings completely unusable > on LoongArch. > This is nearly equivalent to say that LoongArch don't support write > combine at all. > But the write combine(WC) mappings works fine for software based drm > drivers, > such as drm/loongson and drm/ast etc. Even include drm/radeon and > drm/amdgpu with > pure software rendering setup (by putting Option "Accel" "off" into > 10-amdgpu.conf > or 10-radeon.conf) After merge this patch, the performance drop > dramatically for > 2D software rendering based display controller drivers. > > Well, this patch itself is a good catch, as it is a fix for the commit > <16c52e503043> > ("LoongArch: Make WriteCombine configurable for ioremap()"). But I'm > afraid that > both of this commit and the <16c52e503043> commit are not a *real* fix > write combine > related issue on LoongArch. It just negative sidestep of the real > problem. Sure, but given the public information I have access to, I don't think it's possible to "really" fix the root cause with software only. Do you have any insight on this, given from your perspective and language, such a solution seems to exist? -- WANG "xen0n" Xuerui Linux/LoongArch mailing list: https://lore.kernel.org/loongarch/ ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v2] loongarch/mm: disable WUC for pgprot_writecombine as same as ioremap_wc 2023-10-10 0:15 ` WANG Xuerui @ 2023-10-10 3:02 ` Sui Jingfeng 2023-10-10 12:26 ` Xi Ruoyao 0 siblings, 1 reply; 31+ messages in thread From: Sui Jingfeng @ 2023-10-10 3:02 UTC (permalink / raw) To: WANG Xuerui, Icenowy Zheng, Huacai Chen Cc: Andrew Morton, Weihao Li, Mike Rapoport (IBM), Jun Yi, Baoquan He, Matthew Wilcox (Oracle), David Hildenbrand, Hongchen Zhang, Binbin Zhou, Zhen Lei, Tiezhu Yang, Thomas Gleixner, Zhihong Dong, loongarch, linux-kernel On 2023/10/10 08:15, WANG Xuerui wrote: > > On 10/9/23 22:32, Sui Jingfeng wrote: >> Hi, >> >> >> On 2023/10/9 12:28, Icenowy Zheng wrote: >>> Currently the code disables WUC only disables it for ioremap_wc(), >>> which >>> is only used when mapping writecombine pages like ioremap() (mapped to >>> the kernel space). For VRAM mapped in TTM/GEM, it's mapped with a >>> crafted pgprot with pgprot_writecombine() function, which isn't >>> corrently disabled now. >>> >>> Disable WUC for pgprot_writecombine() (fallback to SUC) too. >>> >>> This improves AMDGPU driver stability on Loongson 3A5000 machines. >>> >>> Signed-off-by: Icenowy Zheng <uwu@icenowy.me> >>> --- >>> Changes since v1: >>> - Removed _WC macros >>> - Mention ioremap_wc in commit message >>> >>> arch/loongarch/include/asm/io.h | 5 ++--- >>> arch/loongarch/include/asm/pgtable-bits.h | 4 +++- >>> arch/loongarch/kernel/setup.c | 10 +++++----- >>> 3 files changed, 10 insertions(+), 9 deletions(-) >>> >>> diff --git a/arch/loongarch/include/asm/io.h >>> b/arch/loongarch/include/asm/io.h >>> index 0dcb36b32cb25..290aad87a8847 100644 >>> --- a/arch/loongarch/include/asm/io.h >>> +++ b/arch/loongarch/include/asm/io.h >>> @@ -52,10 +52,9 @@ static inline void __iomem >>> *ioremap_prot(phys_addr_t offset, unsigned long size, >>> * @offset: bus address of the memory >>> * @size: size of the resource to map >>> */ >>> -extern pgprot_t pgprot_wc; >>> - >>> #define ioremap_wc(offset, size) \ >>> - ioremap_prot((offset), (size), pgprot_val(pgprot_wc)) >>> + ioremap_prot((offset), (size), pgprot_val( \ >>> + wc_enabled ? PAGE_KERNEL_WUC : PAGE_KERNEL_SUC)) >>> #define ioremap_cache(offset, size) \ >>> ioremap_prot((offset), (size), pgprot_val(PAGE_KERNEL)) >>> diff --git a/arch/loongarch/include/asm/pgtable-bits.h >>> b/arch/loongarch/include/asm/pgtable-bits.h >>> index 35348d4c4209a..a3d827701736d 100644 >>> --- a/arch/loongarch/include/asm/pgtable-bits.h >>> +++ b/arch/loongarch/include/asm/pgtable-bits.h >>> @@ -92,6 +92,8 @@ >>> #ifndef __ASSEMBLY__ >>> +extern bool wc_enabled; >>> + >>> #define _PAGE_IOREMAP pgprot_val(PAGE_KERNEL_SUC) >>> #define pgprot_noncached pgprot_noncached >>> @@ -111,7 +113,7 @@ static inline pgprot_t >>> pgprot_writecombine(pgprot_t _prot) >>> { >>> unsigned long prot = pgprot_val(_prot); >>> - prot = (prot & ~_CACHE_MASK) | _CACHE_WUC; >>> + prot = (prot & ~_CACHE_MASK) | (wc_enabled ? _CACHE_WUC : >>> _CACHE_SUC); >>> return __pgprot(prot); >>> } >>> diff --git a/arch/loongarch/kernel/setup.c >>> b/arch/loongarch/kernel/setup.c >>> index 7783f0a3d742c..465c1dbb6f4b4 100644 >>> --- a/arch/loongarch/kernel/setup.c >>> +++ b/arch/loongarch/kernel/setup.c >>> @@ -161,19 +161,19 @@ static void __init smbios_parse(void) >>> } >>> #ifdef CONFIG_ARCH_WRITECOMBINE >>> -pgprot_t pgprot_wc = PAGE_KERNEL_WUC; >>> +bool wc_enabled = true; >>> #else >>> -pgprot_t pgprot_wc = PAGE_KERNEL_SUC; >>> +bool wc_enabled; >>> #endif >>> -EXPORT_SYMBOL(pgprot_wc); >>> +EXPORT_SYMBOL(wc_enabled); >>> static int __init setup_writecombine(char *p) >>> { >>> if (!strcmp(p, "on")) >>> - pgprot_wc = PAGE_KERNEL_WUC; >>> + wc_enabled = true; >>> else if (!strcmp(p, "off")) >>> - pgprot_wc = PAGE_KERNEL_SUC; >>> + wc_enabled = false; >>> else >>> pr_warn("Unknown writecombine setting \"%s\".\n", p); >> >> >> Good catch! >> >> But this will make the write combine(WC) mappings completely unusable >> on LoongArch. >> This is nearly equivalent to say that LoongArch don't support write >> combine at all. >> But the write combine(WC) mappings works fine for software based drm >> drivers, >> such as drm/loongson and drm/ast etc. Even include drm/radeon and >> drm/amdgpu with >> pure software rendering setup (by putting Option "Accel" "off" into >> 10-amdgpu.conf >> or 10-radeon.conf) After merge this patch, the performance drop >> dramatically for >> 2D software rendering based display controller drivers. >> >> Well, this patch itself is a good catch, as it is a fix for the >> commit <16c52e503043> >> ("LoongArch: Make WriteCombine configurable for ioremap()"). But I'm >> afraid that >> both of this commit and the <16c52e503043> commit are not a *real* >> fix write combine >> related issue on LoongArch. It just negative sidestep of the real >> problem. > Sure, but given the public information I have access to, I don't think > it's possible to "really" fix the root cause with software only. Do > you have any insight on this, given from your perspective and > language, such a solution seems to exist? > On LoongArch, cached mapping and uncached mappings are DMA-coherent and guaranteed by the hardware. While WC mappings is *NOT* DMA-coherent when 3D GPU is involved. Therefore, On downstream kernel, We disable write combine(WC) mappings at the drm drivers side. - For buffers at VRAM(device memory), we replace the WC mappings with uncached mappings. - For buffers reside in RAM, we replace the WC mappings with cached mappings. By this way, we were able to minimum the side effects, and meet the usable requirements for all of the GPU drivers. For DMA non-coherent buffers, we should try to implement arch-specific dma_map_ops, invalidate the CPU cache and flush the CPU write buffer before the device do DMA. Instead of pretend to be DMA coherent for all buffers, a kernel cmdline is not a system level solution for all of GPU drivers and OS release. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v2] loongarch/mm: disable WUC for pgprot_writecombine as same as ioremap_wc 2023-10-10 3:02 ` Sui Jingfeng @ 2023-10-10 12:26 ` Xi Ruoyao 2023-10-13 11:12 ` Sui Jingfeng ` (2 more replies) 0 siblings, 3 replies; 31+ messages in thread From: Xi Ruoyao @ 2023-10-10 12:26 UTC (permalink / raw) To: Sui Jingfeng, WANG Xuerui, Icenowy Zheng, Huacai Chen Cc: Andrew Morton, Weihao Li, Mike Rapoport (IBM), Jun Yi, Baoquan He, Matthew Wilcox (Oracle), David Hildenbrand, Hongchen Zhang, Binbin Zhou, Zhen Lei, Tiezhu Yang, Thomas Gleixner, Zhihong Dong, loongarch, linux-kernel On Tue, 2023-10-10 at 11:02 +0800, Sui Jingfeng wrote: > > On LoongArch, cached mapping and uncached mappings are DMA-coherent and guaranteed by > the hardware. While WC mappings is *NOT* DMA-coherent when 3D GPU is involved. Therefore, > On downstream kernel, We disable write combine(WC) mappings at the drm drivers side. Why it's only an issue when 3D GPU is involved? What's the difference between 3D GPUs and other devices? Is it possible that the other devices (say neural accelerators) start to perform DMA accesses in a similar way and then suddenly broken? > - For buffers at VRAM(device memory), we replace the WC mappings with uncached mappings. > - For buffers reside in RAM, we replace the WC mappings with cached mappings. > > By this way, we were able to minimum the side effects, and meet the usable requirements > for all of the GPU drivers. AFAIK there has been some clear NAK from DRM maintainers towards this "approach". So it's not possible to be applied upstream. > For DMA non-coherent buffers, we should try to implement arch-specific dma_map_ops, > invalidate the CPU cache and flush the CPU write buffer before the device do DMA. Instead > of pretend to be DMA coherent for all buffers, a kernel cmdline is not a system level > solution for all of GPU drivers and OS release. IIUC this is a hardware bug of 7A1000 and 7A2000, so the proper location of the workaround is in the bridge chip driver. Or am I misunderstanding something? -- Xi Ruoyao <xry111@xry111.site> School of Aerospace Science and Technology, Xidian University ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v2] loongarch/mm: disable WUC for pgprot_writecombine as same as ioremap_wc 2023-10-10 12:26 ` Xi Ruoyao @ 2023-10-13 11:12 ` Sui Jingfeng 2023-10-13 12:51 ` Sui Jingfeng 2024-12-02 16:23 ` Sui Jingfeng 2 siblings, 0 replies; 31+ messages in thread From: Sui Jingfeng @ 2023-10-13 11:12 UTC (permalink / raw) To: Xi Ruoyao, WANG Xuerui, Icenowy Zheng, Huacai Chen Cc: Andrew Morton, Weihao Li, Mike Rapoport (IBM), Jun Yi, Baoquan He, Matthew Wilcox (Oracle), David Hildenbrand, Hongchen Zhang, Binbin Zhou, Zhen Lei, Tiezhu Yang, Thomas Gleixner, Zhihong Dong, loongarch, linux-kernel Hi, On 2023/10/10 20:26, Xi Ruoyao wrote: >> For DMA non-coherent buffers, we should try to implement arch-specific dma_map_ops, >> invalidate the CPU cache and flush the CPU write buffer before the device do DMA. Instead >> of pretend to be DMA coherent for all buffers, a kernel cmdline is not a system level >> solution for all of GPU drivers and OS release. > IIUC this is a hardware bug of 7A1000 and 7A2000, so the proper location > of the workaround is in the bridge chip driver. Or am I > misunderstanding something? The ls2k1000 and ls2k2000 SoC (which don't use with 7A1000 and 7A2000) are also suffer form this problem. If this is a hardware bug of 7A1000 and 7A2000, why forbidden WC mapping of pages in system RAM? The problem of this patch and the <16c52e503043> commit is that it forbidden all WC mappings by default. Even pages in system RAM. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v2] loongarch/mm: disable WUC for pgprot_writecombine as same as ioremap_wc 2023-10-10 12:26 ` Xi Ruoyao 2023-10-13 11:12 ` Sui Jingfeng @ 2023-10-13 12:51 ` Sui Jingfeng 2023-10-13 13:15 ` Xi Ruoyao 2024-12-02 16:23 ` Sui Jingfeng 2 siblings, 1 reply; 31+ messages in thread From: Sui Jingfeng @ 2023-10-13 12:51 UTC (permalink / raw) To: Xi Ruoyao, WANG Xuerui, Icenowy Zheng, Huacai Chen Cc: Andrew Morton, Weihao Li, Mike Rapoport (IBM), Jun Yi, Baoquan He, Matthew Wilcox (Oracle), David Hildenbrand, Hongchen Zhang, Binbin Zhou, Zhen Lei, Tiezhu Yang, Thomas Gleixner, Zhihong Dong, loongarch, linux-kernel Hi, On 2023/10/10 20:26, Xi Ruoyao wrote: >> - For buffers at VRAM(device memory), we replace the WC mappings with uncached mappings. >> - For buffers reside in RAM, we replace the WC mappings with cached mappings. >> >> By this way, we were able to minimum the side effects, and meet the usable requirements >> for all of the GPU drivers. > AFAIK there has been some clear NAK from DRM maintainers towards this > "approach". So it's not possible to be applied upstream. Yeah, domain specific reviewers are really hard to persuade to accept our solution. Probably because they are not know our hardware very well. But the side effects of this patch is really too hurt to accept. In this case, if you really want to make a progress by workaround. I think you could try scan the PCI device in the whole system on boot time. Turn off the WC mappings when there is a AMD or ATI GPUs found. Leave the the WC mappings open at the rest of cases. But this is yet another ugly workaround. Perhaps it is a bit of better than this because there is no way left. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v2] loongarch/mm: disable WUC for pgprot_writecombine as same as ioremap_wc 2023-10-13 12:51 ` Sui Jingfeng @ 2023-10-13 13:15 ` Xi Ruoyao 2023-10-13 13:53 ` Xi Ruoyao 0 siblings, 1 reply; 31+ messages in thread From: Xi Ruoyao @ 2023-10-13 13:15 UTC (permalink / raw) To: Sui Jingfeng, WANG Xuerui, Icenowy Zheng, Huacai Chen Cc: Andrew Morton, Weihao Li, Mike Rapoport (IBM), Jun Yi, Baoquan He, Matthew Wilcox (Oracle), David Hildenbrand, Hongchen Zhang, Binbin Zhou, Zhen Lei, Tiezhu Yang, Thomas Gleixner, Zhihong Dong, loongarch, linux-kernel On Fri, 2023-10-13 at 20:51 +0800, Sui Jingfeng wrote: > Hi, > > > On 2023/10/10 20:26, Xi Ruoyao wrote: > > > - For buffers at VRAM(device memory), we replace the WC mappings with uncached mappings. > > > - For buffers reside in RAM, we replace the WC mappings with cached mappings. > > > > > > By this way, we were able to minimum the side effects, and meet the usable requirements > > > for all of the GPU drivers. > > AFAIK there has been some clear NAK from DRM maintainers towards this > > "approach". So it's not possible to be applied upstream. > > > Yeah, domain specific reviewers are really hard to persuade to accept our solution. > Probably because they are not know our hardware very well. The problem: why this is only an issue with DRM devices? Work around it in DRM subsystem just does not make sense to me, at all. You cannot exclude the possibility that another subsystem will start to hit the same issue in the future. Then such a "solution" will cause #if CONFIG_LOONGARCH scattering everywhere and doing so is completely unmaintainable. > But the side effects of this patch is really too hurt to accept. In this case, if you really want to make > a progress by workaround. This is not a valid reason. For example, the spectre-like bug workarounds hurts the performance as well, but we enable them on affected CPUs by default. > I think you could try scan the PCI device in the whole > system on boot time. Turn off the WC mappings when there is a AMD or ATI GPUs found. Again, why this is only an issue with AMD or ATI GPUs? Can you provide some detailed documentations about this hardware issue so the community can help to figure out a solution? > Leave the the WC mappings open at the rest of cases. But this is yet another ugly > workaround. Perhaps it is a bit of better than this because there is no way left. And this also won't work with PCI hotplug, I'm not sure if it's supported now but I'm 99% sure you'll add it in the future. -- Xi Ruoyao <xry111@xry111.site> School of Aerospace Science and Technology, Xidian University ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v2] loongarch/mm: disable WUC for pgprot_writecombine as same as ioremap_wc 2023-10-13 13:15 ` Xi Ruoyao @ 2023-10-13 13:53 ` Xi Ruoyao 2025-01-21 9:19 ` Sui Jingfeng 0 siblings, 1 reply; 31+ messages in thread From: Xi Ruoyao @ 2023-10-13 13:53 UTC (permalink / raw) To: Sui Jingfeng, WANG Xuerui, Icenowy Zheng, Huacai Chen Cc: Andrew Morton, Weihao Li, Mike Rapoport (IBM), Jun Yi, Baoquan He, Matthew Wilcox (Oracle), David Hildenbrand, Hongchen Zhang, Binbin Zhou, Zhen Lei, Tiezhu Yang, Thomas Gleixner, Zhihong Dong, loongarch, linux-kernel On Fri, 2023-10-13 at 21:15 +0800, Xi Ruoyao wrote: > Again, why this is only an issue with AMD or ATI GPUs? > > Can you provide some detailed documentations about this hardware issue > so the community can help to figure out a solution? 3 years ago we had https://lkml.org/lkml/2020/8/10/255: In this case the patch is a clear NAK since you haven't root caused the issue and are just working around it in a very questionable manner. and But when the hardware doesn't correctly implement WC for PCIe BARs, then this is a violation of the PCIe spec and a bit more serious issue for the whole platform. So do we know the root cause now? <rant>Or in all the 3 years we just keep carrying a problematic workaround downstream, burying the head into the sand like an ostrich, and self comforting with "oh they don't understand our hardware"?!</rant> Even if the problem is really "they don't understand our hardware" you need to provide some materials to help people understanding the hardware better. If we cannot figure out the root cause or a proper fix is too difficult, we should *at least* have a cmdline option and/or a configuration option to allow the user to decide, like how we treat these spectre-like bugs. "Should the option be enabled or disabled by default" can be debated later. And please try to fix the hardware, to me it will be a compelling reason to pay some money for an upgrade :). -- Xi Ruoyao <xry111@xry111.site> School of Aerospace Science and Technology, Xidian University ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v2] loongarch/mm: disable WUC for pgprot_writecombine as same as ioremap_wc 2023-10-13 13:53 ` Xi Ruoyao @ 2025-01-21 9:19 ` Sui Jingfeng 0 siblings, 0 replies; 31+ messages in thread From: Sui Jingfeng @ 2025-01-21 9:19 UTC (permalink / raw) To: Xi Ruoyao, WANG Xuerui, Icenowy Zheng, Huacai Chen Cc: Andrew Morton, Weihao Li, Mike Rapoport (IBM), Jun Yi, Baoquan He, Matthew Wilcox (Oracle), David Hildenbrand, Hongchen Zhang, Binbin Zhou, Zhen Lei, Tiezhu Yang, Thomas Gleixner, Zhihong Dong, loongarch, linux-kernel, Shuah, conduct@kernel.org Hi, On 2023/10/13 21:53, Xi Ruoyao wrote: > On Fri, 2023-10-13 at 21:15 +0800, Xi Ruoyao wrote: >> Again, why this is only an issue with AMD or ATI GPUs? >> >> Can you provide some detailed documentations about this hardware issue >> so the community can help to figure out a solution? I don't ever owns you anything, so I think I don't have any obligation to have to provide you something, right? > 3 years ago we had https://lkml.org/lkml/2020/8/10/255: Its not "we". Just him or "you" if you want to pretend to be included. > In this case the patch is a clear NAK since youhaven't root caused the issue Excuse me, its not me posting that patch to upstream anyway. So who don't have the root cause? Can you figure out different person at the first place? Please? > are just working around it in a very questionable manner. The patch[1] that this patch fixes are circumvent the root issue in a very questionable manner. So, who don't know the root cause? or just cheating the community, pretend it solve the problem? [1] https://lore.kernel.org/loongarch/20230315085314.1083141-1-chenhuacai@loongson.cn/ > and > > But when the hardware doesn't correctly implement WC for PCIe BARs, then > this is a violation of the PCIe spec and a bit more serious issue for > the whole platform. > > So do we know the root cause now? Its not "we" here, we are not on the same side. So the proper expression is "Do you *really* know the root cause now"? > <rant>Or in all the 3 years we Not including you here. > just keep carrying a problematic workaround downstream, Who told you that its problematic? Again, its works *super* well for *both* discrete GPU and Loongson integrated graphics. With the patch applied, Loongson has been sell tons of machine and making million dollars, if not billion. So *who* exactly told you that its problematic? Did him ever use Loongson machine? Did him responsible for Loongson downstream *product* and commercial deals? If you can't effectively answer those questions, please stop your nonsense. > burying the head into the sand like an ostrich, I neverbury my head into the sand and I'm not like an ostrich. In the past, Loongson downstream developers are rather diligent and busy . I don't think I deserve this insult, so please stop it and apology. CCing conduct@kernel.org and CCing shuah. > and self comforting with "oh they don't understand our hardware"?!</rant> This is exactly what I'm helping to resolve, just help the community to figure out the root issue. Seek better solution so that we can avoid endless the low-end repeation. We certainly don't hope Loongson continue to produce such an ill CPU and/or bridge chips all the time. > Even if the problem is really "they don't understand our hardware" you > need to provide some materials to help people understanding the hardware > better. Again, Its not me posting the patch anyway. So please stop the blaming and the spams. > If we cannot figure out the root cause or a proper fix is too difficult, > we should *at least* have a cmdline option and/or a configuration option > to allow the user to decide, like how we treat these spectre-like bugs. > "Should the option be enabled or disabled by default" can be debated > later. > > And please try to fix the hardware, to me it will be a compelling reason > to pay some money for an upgrade :). > -- Best regards, Sui ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v2] loongarch/mm: disable WUC for pgprot_writecombine as same as ioremap_wc 2023-10-10 12:26 ` Xi Ruoyao 2023-10-13 11:12 ` Sui Jingfeng 2023-10-13 12:51 ` Sui Jingfeng @ 2024-12-02 16:23 ` Sui Jingfeng 2024-12-17 18:18 ` Shuah 2024-12-17 23:44 ` Icenowy Zheng 2 siblings, 2 replies; 31+ messages in thread From: Sui Jingfeng @ 2024-12-02 16:23 UTC (permalink / raw) To: Xi Ruoyao, WANG Xuerui, Icenowy Zheng, Huacai Chen Cc: Andrew Morton, Mike Rapoport (IBM), Baoquan He, Matthew Wilcox (Oracle), David Hildenbrand, Zhen Lei, Thomas Gleixner, Zhihong Dong, loongarch, linux-kernel Hi, On 10/10/23 20:26, Xi Ruoyao wrote: > On Tue, 2023-10-10 at 11:02 +0800, Sui Jingfeng wrote: > >> >> On LoongArch, cached mapping and uncached mappings are DMA-coherent and guaranteed by >> the hardware. While WC mappings is *NOT* DMA-coherent when 3D GPU is involved. Therefore, >> On downstream kernel, We disable write combine(WC) mappings at the drm drivers side. > > Why it's only an issue when 3D GPU is involved? No one saying that only 3D GPU is suffer from this kind of issue, I just meant that the issue is there at least for GPU > What's the difference between 3D GPUs and other devices? Is it possible that the other > devices (say neural accelerators) start to perform DMA accesses in a > similar way and then suddenly broken? You, the patch contributor or the maintainer or whatever stuff should carry on the test, right? We are not intended to against the patch though. >> - For buffers at VRAM(device memory), we replace the WC mappings with uncached mappings. >> - For buffers reside in RAM, we replace the WC mappings with cached mappings. >> >> By this way, we were able to minimum the side effects, and meet the usable requirements >> for all of the GPU drivers. > > AFAIK there has been some clear NAK from DRM maintainers towards this > "approach". So it's not possible to be applied upstream. That's your guys problems, stealing other programmer's patch. And then, submit it to upstream without knowing and/or presenting decent hardware details. >> For DMA non-coherent buffers, we should try to implement arch-specific dma_map_ops, >> invalidate the CPU cache and flush the CPU write buffer before the device do DMA. Instead >> of pretend to be DMA coherent for all buffers, a kernel cmdline is not a system level >> solution for all of GPU drivers and OS release. > > IIUC this is a hardware bug of 7A1000 and 7A2000, so the proper location > of the workaround is in the bridge chip driver. Or am I > misunderstanding something? > You are misunderstanding everything and ranting like a dog. The write buffers are inside the CPU, and the write-combine is related to *both* the CPU side and the GPU side. The GPU side could choose no snooping access mode, while the CPU side have to address such request properly. What's we arguing is that if this is a hardware bug of north bridge, we at least still should be able to use WC at the CPU side, that is, WC on system pages should be usable without any issue. While the weird commit disable everything. ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v2] loongarch/mm: disable WUC for pgprot_writecombine as same as ioremap_wc 2024-12-02 16:23 ` Sui Jingfeng @ 2024-12-17 18:18 ` Shuah 2024-12-18 3:24 ` Sui Jingfeng 2024-12-17 23:44 ` Icenowy Zheng 1 sibling, 1 reply; 31+ messages in thread From: Shuah @ 2024-12-17 18:18 UTC (permalink / raw) To: Sui Jingfeng, Xi Ruoyao, WANG Xuerui, Icenowy Zheng, Huacai Chen Cc: Andrew Morton, Mike Rapoport (IBM), Baoquan He, Matthew Wilcox (Oracle), David Hildenbrand, Zhen Lei, Thomas Gleixner, Zhihong Dong, loongarch, linux-kernel, shuah, conduct@kernel.org On 12/2/24 09:23, Sui Jingfeng wrote: > Hi, > >> IIUC this is a hardware bug of 7A1000 and 7A2000, so the proper location >> of the workaround is in the bridge chip driver. Or am I >> misunderstanding something? >> > > You are misunderstanding everything and ranting like a dog. > Sui Jingfeng, This is not the way to work with your fellow developers in the community to express disagreements. I would recommend following up with an apology. thanks, -- Shuah ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v2] loongarch/mm: disable WUC for pgprot_writecombine as same as ioremap_wc 2024-12-17 18:18 ` Shuah @ 2024-12-18 3:24 ` Sui Jingfeng 2024-12-18 6:23 ` Icenowy Zheng 2024-12-20 16:43 ` Shuah 0 siblings, 2 replies; 31+ messages in thread From: Sui Jingfeng @ 2024-12-18 3:24 UTC (permalink / raw) To: Shuah, Xi Ruoyao, WANG Xuerui, Icenowy Zheng, Huacai Chen Cc: Andrew Morton, Mike Rapoport (IBM), Baoquan He, Matthew Wilcox (Oracle), David Hildenbrand, Zhen Lei, Thomas Gleixner, Zhihong Dong, loongarch, linux-kernel, conduct@kernel.org Hi, On 2024/12/18 02:18, Shuah wrote: > On 12/2/24 09:23, Sui Jingfeng wrote: >> Hi, >> > >>> IIUC this is a hardware bug of 7A1000 and 7A2000, so the proper >>> location >>> of the workaround is in the bridge chip driver. Or am I >>> misunderstanding something? >>> >> >> You are misunderstanding everything and ranting like a dog. >> > > Sui Jingfeng, > > This is not the way to work with your fellow developers in the > community to express disagreements. I'm not expressing disagreements, but argue that the contributor and/or talker should provide *sufficient* hardware details and tests. Instead of pointless *ranting* in order to get harmful patch merged. The discussion completely not make scene at all. > I would recommend following up with an apology. > I will not apology to indecent contributors and/or maintainers like this, never. > thanks, > -- Shuah > > > > > -- Best regards, Sui ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v2] loongarch/mm: disable WUC for pgprot_writecombine as same as ioremap_wc 2024-12-18 3:24 ` Sui Jingfeng @ 2024-12-18 6:23 ` Icenowy Zheng 2024-12-18 10:05 ` Sui Jingfeng 2024-12-20 16:43 ` Shuah 1 sibling, 1 reply; 31+ messages in thread From: Icenowy Zheng @ 2024-12-18 6:23 UTC (permalink / raw) To: Sui Jingfeng, Shuah, Xi Ruoyao, WANG Xuerui, Huacai Chen Cc: Andrew Morton, Mike Rapoport (IBM), Baoquan He, Matthew Wilcox (Oracle), David Hildenbrand, Zhen Lei, Thomas Gleixner, Zhihong Dong, loongarch, linux-kernel, conduct@kernel.org 在 2024-12-18星期三的 11:24 +0800,Sui Jingfeng写道: > Hi, > > > On 2024/12/18 02:18, Shuah wrote: > > On 12/2/24 09:23, Sui Jingfeng wrote: > > > Hi, > > > > > > > > > IIUC this is a hardware bug of 7A1000 and 7A2000, so the proper > > > > location > > > > of the workaround is in the bridge chip driver. Or am I > > > > misunderstanding something? > > > > > > > > > > You are misunderstanding everything and ranting like a dog. > > > > > > > Sui Jingfeng, > > > > This is not the way to work with your fellow developers in the > > community to express disagreements. > > > I'm not expressing disagreements, but argue that the contributor > and/or talker should provide *sufficient* hardware details and > tests. Instead of pointless *ranting* in order to get harmful > patch merged. I don't think the original patch is harmful. It changes wrong things to corect, do not break known-good things, although I assume it will slightly harm the performance -- but I think performance w/o correctness is just meaningless. In the perspective of graphics, you can nearly always "enhance your performance" by sending everything to /dev/null and pretend them to be done. > > The discussion completely not make scene at all. > > > > I would recommend following up with an apology. > > > > I will not apology to indecent contributors and/or maintainers > like this, never. > > > > thanks, > > -- Shuah > > > > > > > > > > ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v2] loongarch/mm: disable WUC for pgprot_writecombine as same as ioremap_wc 2024-12-18 6:23 ` Icenowy Zheng @ 2024-12-18 10:05 ` Sui Jingfeng 2024-12-18 12:37 ` Icenowy Zheng 0 siblings, 1 reply; 31+ messages in thread From: Sui Jingfeng @ 2024-12-18 10:05 UTC (permalink / raw) To: Icenowy Zheng, Shuah, Xi Ruoyao, WANG Xuerui, Huacai Chen Cc: Andrew Morton, Mike Rapoport (IBM), Baoquan He, Matthew Wilcox (Oracle), David Hildenbrand, Zhen Lei, Thomas Gleixner, Zhihong Dong, loongarch, linux-kernel, conduct@kernel.org Hi, On 2024/12/18 14:23, Icenowy Zheng wrote: > 在 2024-12-18星期三的 11:24 +0800,Sui Jingfeng写道: >> Hi, >> >> >> On 2024/12/18 02:18, Shuah wrote: >>> On 12/2/24 09:23, Sui Jingfeng wrote: >>>> Hi, >>>> >>>>> IIUC this is a hardware bug of 7A1000 and 7A2000, so the proper >>>>> location >>>>> of the workaround is in the bridge chip driver. Or am I >>>>> misunderstanding something? >>>>> >>>> You are misunderstanding everything and ranting like a dog. >>>> >>> Sui Jingfeng, >>> >>> This is not the way to work with your fellow developers in the >>> community to express disagreements. >> >> I'm not expressing disagreements, but argue that the contributor >> and/or talker should provide *sufficient* hardware details and >> tests. Instead of pointless *ranting* in order to get harmful >> patch merged. > I don't think the original patch is harmful. > I have told you countless times, that is, disabling Write-Combine made the performance of BMC graphics(drm/ast) decrease dramatically. And downstream Loongson developer really dislike related patch, and asking for solutions to me. >> The discussion completely not make scene at all. >> >> >>> I would recommend following up with an apology. >>> >> I will not apology to indecent contributors and/or maintainers >> like this, never. >> >> >>> thanks, >>> -- Shuah >>> >>> >>> >>> >>> -- Best regards, Sui ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v2] loongarch/mm: disable WUC for pgprot_writecombine as same as ioremap_wc 2024-12-18 10:05 ` Sui Jingfeng @ 2024-12-18 12:37 ` Icenowy Zheng 2024-12-19 3:17 ` Sui Jingfeng 0 siblings, 1 reply; 31+ messages in thread From: Icenowy Zheng @ 2024-12-18 12:37 UTC (permalink / raw) To: Sui Jingfeng, Shuah, Xi Ruoyao, WANG Xuerui, Huacai Chen Cc: Andrew Morton, Mike Rapoport (IBM), Baoquan He, Matthew Wilcox (Oracle), David Hildenbrand, Zhen Lei, Thomas Gleixner, Zhihong Dong, loongarch, linux-kernel, conduct@kernel.org 在 2024-12-18星期三的 18:05 +0800,Sui Jingfeng写道: > Hi, > > > On 2024/12/18 14:23, Icenowy Zheng wrote: > > 在 2024-12-18星期三的 11:24 +0800,Sui Jingfeng写道: > > > Hi, > > > > > > > > > On 2024/12/18 02:18, Shuah wrote: > > > > On 12/2/24 09:23, Sui Jingfeng wrote: > > > > > Hi, > > > > > > > > > > > IIUC this is a hardware bug of 7A1000 and 7A2000, so the > > > > > > proper > > > > > > location > > > > > > of the workaround is in the bridge chip driver. Or am I > > > > > > misunderstanding something? > > > > > > > > > > > You are misunderstanding everything and ranting like a dog. > > > > > > > > > Sui Jingfeng, > > > > > > > > This is not the way to work with your fellow developers in the > > > > community to express disagreements. > > > > > > I'm not expressing disagreements, but argue that the contributor > > > and/or talker should provide *sufficient* hardware details and > > > tests. Instead of pointless *ranting* in order to get harmful > > > patch merged. > > I don't think the original patch is harmful. > > > > I have told you countless times, that is, disabling Write-Combine > made the performance of BMC graphics(drm/ast) decrease dramatically. So do the ast driver have any kind of command queues that needs to be read by the card and executed? If it has, I assume it will break like amdgpu/radeon if this is utilized (I remember ast has some 2D accel); if it does not have, then maybe you can add an architecture-specific pgprot type (e.g. pgprot_wuc), see how pgprot_noncached_wc is defined for PowerPC or pgprot_device is defined for ARM64 -- in fact you may find another architecture that have some page attribute that behave like WUC on LongArch; and then utilize the newly introduced pgprot type in drm/ast, instead of using WUC as pgprot_writecombine, which do not exactly match (usual driver expect an ARM-like behavior of pgprot_combine, I assume). > And downstream Loongson developer really dislike related patch, > and asking for solutions to me. > > > > > The discussion completely not make scene at all. > > > > > > > > > > I would recommend following up with an apology. > > > > > > > I will not apology to indecent contributors and/or maintainers > > > like this, never. > > > > > > > > > > thanks, > > > > -- Shuah > > > > > > > > > > > > > > > > > > > > ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v2] loongarch/mm: disable WUC for pgprot_writecombine as same as ioremap_wc 2024-12-18 12:37 ` Icenowy Zheng @ 2024-12-19 3:17 ` Sui Jingfeng 2024-12-19 4:54 ` Icenowy Zheng 0 siblings, 1 reply; 31+ messages in thread From: Sui Jingfeng @ 2024-12-19 3:17 UTC (permalink / raw) To: Icenowy Zheng, Shuah, Xi Ruoyao, WANG Xuerui, Huacai Chen Cc: Andrew Morton, Mike Rapoport (IBM), Baoquan He, Matthew Wilcox (Oracle), David Hildenbrand, Zhen Lei, Thomas Gleixner, Zhihong Dong, loongarch, linux-kernel, conduct@kernel.org Hi, On 2024/12/18 20:37, Icenowy Zheng wrote: > 在 2024-12-18星期三的 18:05 +0800,Sui Jingfeng写道: >> Hi, >> >> >> On 2024/12/18 14:23, Icenowy Zheng wrote: >>> 在 2024-12-18星期三的 11:24 +0800,Sui Jingfeng写道: >>>> Hi, >>>> >>>> >>>> On 2024/12/18 02:18, Shuah wrote: >>>>> On 12/2/24 09:23, Sui Jingfeng wrote: >>>>>> Hi, >>>>>> >>>>>>> IIUC this is a hardware bug of 7A1000 and 7A2000, so the >>>>>>> proper >>>>>>> location >>>>>>> of the workaround is in the bridge chip driver. Or am I >>>>>>> misunderstanding something? >>>>>>> >>>>>> You are misunderstanding everything and ranting like a dog. >>>>>> >>>>> Sui Jingfeng, >>>>> >>>>> This is not the way to work with your fellow developers in the >>>>> community to express disagreements. >>>> I'm not expressing disagreements, but argue that the contributor >>>> and/or talker should provide *sufficient* hardware details and >>>> tests. Instead of pointless *ranting* in order to get harmful >>>> patch merged. >>> I don't think the original patch is harmful. >>> >> I have told you countless times, that is, disabling Write-Combine >> made the performance of BMC graphics(drm/ast) decrease dramatically. > So do the ast driver have any kind of command queues that needs to be > read by the card and executed? If it has, I assume it will break like > amdgpu/radeon if this is utilized (I remember ast has some 2D accel); > if it does not have, then maybe you can add an architecture-specific > pgprot type (e.g. pgprot_wuc), see how pgprot_noncached_wc is defined > for PowerPC or pgprot_device is defined for ARM64 drm/ast driver doesn't perform DMA to/from system RAM so far. > -- in fact you may > find another architecture that have some page attribute that behave > like WUC on LongArch; and then utilize the newly introduced pgprot type > in drm/ast, instead of using WUC as pgprot_writecombine, which do not > exactly match (usual driver expect an ARM-like behavior of > pgprot_combine, I assume). This exactly saying that your patch is unqualified, because its not entirely an arch-specific problem. Your patch, together with other WC disable patch make the WC mapping of LoongArch completely broken. then you told us "you don't think the original patch is harmful" ??? >> And downstream Loongson developer really dislike related patch, >> and asking for solutions to me. >> >> >>>> The discussion completely not make scene at all. >>>> >>>> >>>>> I would recommend following up with an apology. >>>>> >>>> I will not apology to indecent contributors and/or maintainers >>>> like this, never. >>>> >>>> >>>>> thanks, >>>>> -- Shuah >>>>> >>>>> >>>>> >>>>> >>>>> -- Best regards, Sui ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v2] loongarch/mm: disable WUC for pgprot_writecombine as same as ioremap_wc 2024-12-19 3:17 ` Sui Jingfeng @ 2024-12-19 4:54 ` Icenowy Zheng 0 siblings, 0 replies; 31+ messages in thread From: Icenowy Zheng @ 2024-12-19 4:54 UTC (permalink / raw) To: Sui Jingfeng, Shuah, Xi Ruoyao, WANG Xuerui, Huacai Chen Cc: Andrew Morton, Mike Rapoport (IBM), Baoquan He, Matthew Wilcox (Oracle), David Hildenbrand, Zhen Lei, Thomas Gleixner, Zhihong Dong, loongarch, linux-kernel, conduct@kernel.org 在 2024-12-19星期四的 11:17 +0800,Sui Jingfeng写道: > Hi, > > > On 2024/12/18 20:37, Icenowy Zheng wrote: > > 在 2024-12-18星期三的 18:05 +0800,Sui Jingfeng写道: > > > Hi, > > > > > > > > > On 2024/12/18 14:23, Icenowy Zheng wrote: > > > > 在 2024-12-18星期三的 11:24 +0800,Sui Jingfeng写道: > > > > > Hi, > > > > > > > > > > > > > > > On 2024/12/18 02:18, Shuah wrote: > > > > > > On 12/2/24 09:23, Sui Jingfeng wrote: > > > > > > > Hi, > > > > > > > > > > > > > > > IIUC this is a hardware bug of 7A1000 and 7A2000, so > > > > > > > > the > > > > > > > > proper > > > > > > > > location > > > > > > > > of the workaround is in the bridge chip driver. Or am > > > > > > > > I > > > > > > > > misunderstanding something? > > > > > > > > > > > > > > > You are misunderstanding everything and ranting like a > > > > > > > dog. > > > > > > > > > > > > > Sui Jingfeng, > > > > > > > > > > > > This is not the way to work with your fellow developers in > > > > > > the > > > > > > community to express disagreements. > > > > > I'm not expressing disagreements, but argue that the > > > > > contributor > > > > > and/or talker should provide *sufficient* hardware details > > > > > and > > > > > tests. Instead of pointless *ranting* in order to get harmful > > > > > patch merged. > > > > I don't think the original patch is harmful. > > > > > > > I have told you countless times, that is, disabling Write-Combine > > > made the performance of BMC graphics(drm/ast) decrease > > > dramatically. > > So do the ast driver have any kind of command queues that needs to > > be > > read by the card and executed? If it has, I assume it will break > > like > > amdgpu/radeon if this is utilized (I remember ast has some 2D > > accel); > > if it does not have, then maybe you can add an architecture- > > specific > > pgprot type (e.g. pgprot_wuc), see how pgprot_noncached_wc is > > defined > > for PowerPC or pgprot_device is defined for ARM64 > > > drm/ast driver doesn't perform DMA to/from system RAM so far. > > > > -- in fact you may > > find another architecture that have some page attribute that behave > > like WUC on LongArch; and then utilize the newly introduced pgprot > > type > > in drm/ast, instead of using WUC as pgprot_writecombine, which do > > not > > exactly match (usual driver expect an ARM-like behavior of > > pgprot_combine, I assume). > > > This exactly saying that your patch is unqualified, because its not > entirely an arch-specific problem. Well I don't declare that there is one architecture, I am only saying "may". I am only describing the fact that architectures can have custom pgprot_*() functions that maps to some architecture details when mapping to generic pgprot_*() is not appropriate. > > Your patch, together with other WC disable patch make the WC mapping > of LoongArch completely broken. then you told us "you don't think > the original patch is harmful" ??? WC mapping is not broken now. WC is only an optimization, not a requirement. If you see `include/linux/pgtable.h`, you will find the fallback implementation of pgprot_writecombine() is just pgprot_noncached(), which is just strongly ordered. However the original implementation that maps WC to LoongArch WUC is what is broken, so broken that observable by desktop end-users. > > > > > And downstream Loongson developer really dislike related patch, > > > and asking for solutions to me. > > > > > > > > > > > The discussion completely not make scene at all. > > > > > > > > > > > > > > > > I would recommend following up with an apology. > > > > > > > > > > > I will not apology to indecent contributors and/or > > > > > maintainers > > > > > like this, never. > > > > > > > > > > > > > > > > thanks, > > > > > > -- Shuah > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v2] loongarch/mm: disable WUC for pgprot_writecombine as same as ioremap_wc 2024-12-18 3:24 ` Sui Jingfeng 2024-12-18 6:23 ` Icenowy Zheng @ 2024-12-20 16:43 ` Shuah 1 sibling, 0 replies; 31+ messages in thread From: Shuah @ 2024-12-20 16:43 UTC (permalink / raw) To: Sui Jingfeng, Shuah, Xi Ruoyao, WANG Xuerui, Icenowy Zheng, Huacai Chen Cc: Andrew Morton, Mike Rapoport (IBM), Baoquan He, Matthew Wilcox (Oracle), David Hildenbrand, Zhen Lei, Thomas Gleixner, Zhihong Dong, loongarch, linux-kernel, conduct@kernel.org On 12/17/24 20:24, Sui Jingfeng wrote: > Hi, > > > On 2024/12/18 02:18, Shuah wrote: >> On 12/2/24 09:23, Sui Jingfeng wrote: >>> Hi, >>> >> >>>> IIUC this is a hardware bug of 7A1000 and 7A2000, so the proper location >>>> of the workaround is in the bridge chip driver. Or am I >>>> misunderstanding something? >>>> >>> >>> You are misunderstanding everything and ranting like a dog. >>> >> >> Sui Jingfeng, >> >> This is not the way to work with your fellow developers in the >> community to express disagreements. > > > I'm not expressing disagreements, but argue that the contributor > and/or talker should provide *sufficient* hardware details and > tests. Instead of pointless *ranting* in order to get harmful > patch merged. > > The discussion completely not make scene at all. > > >> I would recommend following up with an apology. >> > > I will not apology to indecent contributors and/or maintainers > like this, never. > Either way the way the wording you are using your discussion is not what is expected and understood as the code of conduct. https://docs.kernel.org/process/code-of-conduct.html thanks, -- Shuah (On behalf of the Code of Conduct Committee) ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v2] loongarch/mm: disable WUC for pgprot_writecombine as same as ioremap_wc 2024-12-02 16:23 ` Sui Jingfeng 2024-12-17 18:18 ` Shuah @ 2024-12-17 23:44 ` Icenowy Zheng 2024-12-18 3:05 ` Sui Jingfeng 1 sibling, 1 reply; 31+ messages in thread From: Icenowy Zheng @ 2024-12-17 23:44 UTC (permalink / raw) To: Sui Jingfeng, Xi Ruoyao, WANG Xuerui, Huacai Chen Cc: Andrew Morton, Mike Rapoport (IBM), Baoquan He, Matthew Wilcox (Oracle), David Hildenbrand, Zhen Lei, Thomas Gleixner, Zhihong Dong, loongarch, linux-kernel 在 2024-12-03星期二的 00:23 +0800,Sui Jingfeng写道: > Hi, > > On 10/10/23 20:26, Xi Ruoyao wrote: > > On Tue, 2023-10-10 at 11:02 +0800, Sui Jingfeng wrote: > > > > > > > > On LoongArch, cached mapping and uncached mappings are DMA- > > > coherent and guaranteed by > > > the hardware. While WC mappings is *NOT* DMA-coherent when 3D GPU > > > is involved. Therefore, > > > On downstream kernel, We disable write combine(WC) mappings at > > > the drm drivers side. > > > > Why it's only an issue when 3D GPU is involved? > > No one saying that only 3D GPU is suffer from this kind of issue, > I just meant that the issue is there at least for GPU > > > What's the difference between 3D GPUs and other devices? Is it > > possible that the other > > devices (say neural accelerators) start to perform DMA accesses in > > a > > similar way and then suddenly broken? > > You, the patch contributor or the maintainer or whatever stuff > should carry on the test, right? Well doing some test on PCIe peripherals need some professional tool, then I assume who raises the idea should do it, because not everyone can do. > > We are not intended to against the patch though. > > > > - For buffers at VRAM(device memory), we replace the WC mappings > > > with uncached mappings. > > > - For buffers reside in RAM, we replace the WC mappings with > > > cached mappings. > > > > > > By this way, we were able to minimum the side effects, and meet > > > the usable requirements > > > for all of the GPU drivers. > > > > AFAIK there has been some clear NAK from DRM maintainers towards > > this > > "approach". So it's not possible to be applied upstream. > > That's your guys problems, stealing other programmer's patch. > And then, submit it to upstream without knowing and/or presenting > decent hardware details. > > > > > For DMA non-coherent buffers, we should try to implement arch- > > > specific dma_map_ops, > > > invalidate the CPU cache and flush the CPU write buffer before > > > the device do DMA. Instead > > > of pretend to be DMA coherent for all buffers, a kernel cmdline > > > is not a system level > > > solution for all of GPU drivers and OS release. > > > > IIUC this is a hardware bug of 7A1000 and 7A2000, so the proper > > location > > of the workaround is in the bridge chip driver. Or am I > > misunderstanding something? > > > > You are misunderstanding everything and ranting like a dog. > > The write buffers are inside the CPU, and the write-combine is > related > to *both* the CPU side and the GPU side. The GPU side could choose > no snooping access mode, while the CPU side have to address such > request > properly. Well I think the radeon driver unconditionally maps VRAM with WC property, and only map system memory with WC when drm_arch_can_wc_memory() test passes, this is why blocklisting LoongArch in drm_arch_can_wc_memory() do not solve all problems; and for the GPU to access its own memory (VRAM), snooping the CPU sounds not acceptable. > > What's we arguing is that if this is a hardware bug of north bridge, > we > at least still should be able to use WC at the CPU side, that is, WC > on > system pages should be usable without any issue. While the weird > commit > disable everything. Well, what's the point of using WC on system pages? Why don't we just use normal cached property? I think non-cached memory attributes are only there for communication with peripherals, and at least 3A5000/6000, no meaningful DMA-capable peripheral could be accessible w/o the bridge chip. > > ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v2] loongarch/mm: disable WUC for pgprot_writecombine as same as ioremap_wc 2024-12-17 23:44 ` Icenowy Zheng @ 2024-12-18 3:05 ` Sui Jingfeng 2024-12-18 5:47 ` Icenowy Zheng 0 siblings, 1 reply; 31+ messages in thread From: Sui Jingfeng @ 2024-12-18 3:05 UTC (permalink / raw) To: Icenowy Zheng, Xi Ruoyao, WANG Xuerui, Huacai Chen Cc: Andrew Morton, Mike Rapoport (IBM), Baoquan He, Matthew Wilcox (Oracle), David Hildenbrand, Zhen Lei, Thomas Gleixner, Zhihong Dong, loongarch, linux-kernel Hi, On 2024/12/18 07:44, Icenowy Zheng wrote: > 在 2024-12-03星期二的 00:23 +0800,Sui Jingfeng写道: >> Hi, >> >> On 10/10/23 20:26, Xi Ruoyao wrote: >>> On Tue, 2023-10-10 at 11:02 +0800, Sui Jingfeng wrote: >>> >>>> On LoongArch, cached mapping and uncached mappings are DMA- >>>> coherent and guaranteed by >>>> the hardware. While WC mappings is *NOT* DMA-coherent when 3D GPU >>>> is involved. Therefore, >>>> On downstream kernel, We disable write combine(WC) mappings at >>>> the drm drivers side. >>> Why it's only an issue when 3D GPU is involved? >> No one saying that only 3D GPU is suffer from this kind of issue, >> I just meant that the issue is there at least for GPU >> >>> What's the difference between 3D GPUs and other devices? Is it >>> possible that the other >>> devices (say neural accelerators) start to perform DMA accesses in >>> a >>> similar way and then suddenly broken? >> You, the patch contributor or the maintainer or whatever stuff >> should carry on the test, right? > Well doing some test on PCIe peripherals need some professional tool, > then I assume who raises the idea should do it, because not everyone > can do. You are posting the patch, and then you are acclaiming that you are not even able to do sufficient test? right? ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v2] loongarch/mm: disable WUC for pgprot_writecombine as same as ioremap_wc 2024-12-18 3:05 ` Sui Jingfeng @ 2024-12-18 5:47 ` Icenowy Zheng 2024-12-18 10:29 ` Sui Jingfeng 0 siblings, 1 reply; 31+ messages in thread From: Icenowy Zheng @ 2024-12-18 5:47 UTC (permalink / raw) To: Sui Jingfeng, Xi Ruoyao, WANG Xuerui, Huacai Chen Cc: Andrew Morton, Mike Rapoport (IBM), Baoquan He, Matthew Wilcox (Oracle), David Hildenbrand, Zhen Lei, Thomas Gleixner, Zhihong Dong, loongarch, linux-kernel 在 2024-12-18星期三的 11:05 +0800,Sui Jingfeng写道: > Hi, > > > On 2024/12/18 07:44, Icenowy Zheng wrote: > > 在 2024-12-03星期二的 00:23 +0800,Sui Jingfeng写道: > > > Hi, > > > > > > On 10/10/23 20:26, Xi Ruoyao wrote: > > > > On Tue, 2023-10-10 at 11:02 +0800, Sui Jingfeng wrote: > > > > > > > > > On LoongArch, cached mapping and uncached mappings are DMA- > > > > > coherent and guaranteed by > > > > > the hardware. While WC mappings is *NOT* DMA-coherent when 3D > > > > > GPU > > > > > is involved. Therefore, > > > > > On downstream kernel, We disable write combine(WC) mappings > > > > > at > > > > > the drm drivers side. > > > > Why it's only an issue when 3D GPU is involved? > > > No one saying that only 3D GPU is suffer from this kind of issue, > > > I just meant that the issue is there at least for GPU > > > > > > > What's the difference between 3D GPUs and other devices? Is it > > > > possible that the other > > > > devices (say neural accelerators) start to perform DMA accesses > > > > in > > > > a > > > > similar way and then suddenly broken? > > > You, the patch contributor or the maintainer or whatever stuff > > > should carry on the test, right? > > Well doing some test on PCIe peripherals need some professional > > tool, > > then I assume who raises the idea should do it, because not > > everyone > > can do. > > > You are posting the patch, and then you are acclaiming that you are > not even able to do sufficient test? right? What I can test is that w/o this patch things break and w/ this patch things work. If you are against this patch, you need to provide more effective evidence / a better solution than my approach. Trying to make the problem more complex without providing extra evidence isn't constructive. > > ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v2] loongarch/mm: disable WUC for pgprot_writecombine as same as ioremap_wc 2024-12-18 5:47 ` Icenowy Zheng @ 2024-12-18 10:29 ` Sui Jingfeng 2024-12-18 12:43 ` Icenowy Zheng 0 siblings, 1 reply; 31+ messages in thread From: Sui Jingfeng @ 2024-12-18 10:29 UTC (permalink / raw) To: Icenowy Zheng, Xi Ruoyao, WANG Xuerui, Huacai Chen Cc: Andrew Morton, Mike Rapoport (IBM), Baoquan He, Matthew Wilcox (Oracle), David Hildenbrand, Zhen Lei, Thomas Gleixner, Zhihong Dong, loongarch, linux-kernel On 2024/12/18 13:47, Icenowy Zheng wrote: > 在 2024-12-18星期三的 11:05 +0800,Sui Jingfeng写道: >> Hi, >> >> >> On 2024/12/18 07:44, Icenowy Zheng wrote: >>> 在 2024-12-03星期二的 00:23 +0800,Sui Jingfeng写道: >>>> Hi, >>>> >>>> On 10/10/23 20:26, Xi Ruoyao wrote: >>>>> On Tue, 2023-10-10 at 11:02 +0800, Sui Jingfeng wrote: >>>>> >>>>>> On LoongArch, cached mapping and uncached mappings are DMA- >>>>>> coherent and guaranteed by >>>>>> the hardware. While WC mappings is *NOT* DMA-coherent when 3D >>>>>> GPU >>>>>> is involved. Therefore, >>>>>> On downstream kernel, We disable write combine(WC) mappings >>>>>> at >>>>>> the drm drivers side. >>>>> Why it's only an issue when 3D GPU is involved? >>>> No one saying that only 3D GPU is suffer from this kind of issue, >>>> I just meant that the issue is there at least for GPU >>>> >>>>> What's the difference between 3D GPUs and other devices? Is it >>>>> possible that the other >>>>> devices (say neural accelerators) start to perform DMA accesses >>>>> in >>>>> a >>>>> similar way and then suddenly broken? >>>> You, the patch contributor or the maintainer or whatever stuff >>>> should carry on the test, right? >>> Well doing some test on PCIe peripherals need some professional >>> tool, >>> then I assume who raises the idea should do it, because not >>> everyone >>> can do. >> >> You are posting the patch, and then you are acclaiming that you are >> not even able to do sufficient test? right? > What I can test is that w/o this patch things break and w/ this patch > things work. > If you are against this patch, you need to provide more effective > evidence / a better solution than my approach. I have told you several times, we are asking for hardware details. Why the Loongarch suffer from the WC bug, like the Loongson Mips? Why the performance of drm/ast's drop dramatically when your patch is applied? >> -- Best regards, Sui ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v2] loongarch/mm: disable WUC for pgprot_writecombine as same as ioremap_wc 2024-12-18 10:29 ` Sui Jingfeng @ 2024-12-18 12:43 ` Icenowy Zheng 2024-12-19 2:54 ` Sui Jingfeng 0 siblings, 1 reply; 31+ messages in thread From: Icenowy Zheng @ 2024-12-18 12:43 UTC (permalink / raw) To: Sui Jingfeng, Xi Ruoyao, WANG Xuerui, Huacai Chen Cc: Andrew Morton, Mike Rapoport (IBM), Baoquan He, Matthew Wilcox (Oracle), David Hildenbrand, Zhen Lei, Thomas Gleixner, Zhihong Dong, loongarch, linux-kernel 在 2024-12-18星期三的 18:29 +0800,Sui Jingfeng写道: > > On 2024/12/18 13:47, Icenowy Zheng wrote: > > 在 2024-12-18星期三的 11:05 +0800,Sui Jingfeng写道: > > > Hi, > > > > > > > > > On 2024/12/18 07:44, Icenowy Zheng wrote: > > > > 在 2024-12-03星期二的 00:23 +0800,Sui Jingfeng写道: > > > > > Hi, > > > > > > > > > > On 10/10/23 20:26, Xi Ruoyao wrote: > > > > > > On Tue, 2023-10-10 at 11:02 +0800, Sui Jingfeng wrote: > > > > > > > > > > > > > On LoongArch, cached mapping and uncached mappings are > > > > > > > DMA- > > > > > > > coherent and guaranteed by > > > > > > > the hardware. While WC mappings is *NOT* DMA-coherent > > > > > > > when 3D > > > > > > > GPU > > > > > > > is involved. Therefore, > > > > > > > On downstream kernel, We disable write combine(WC) > > > > > > > mappings > > > > > > > at > > > > > > > the drm drivers side. > > > > > > Why it's only an issue when 3D GPU is involved? > > > > > No one saying that only 3D GPU is suffer from this kind of > > > > > issue, > > > > > I just meant that the issue is there at least for GPU > > > > > > > > > > > What's the difference between 3D GPUs and other devices? > > > > > > Is it > > > > > > possible that the other > > > > > > devices (say neural accelerators) start to perform DMA > > > > > > accesses > > > > > > in > > > > > > a > > > > > > similar way and then suddenly broken? > > > > > You, the patch contributor or the maintainer or whatever > > > > > stuff > > > > > should carry on the test, right? > > > > Well doing some test on PCIe peripherals need some professional > > > > tool, > > > > then I assume who raises the idea should do it, because not > > > > everyone > > > > can do. > > > > > > You are posting the patch, and then you are acclaiming that you > > > are > > > not even able to do sufficient test? right? > > What I can test is that w/o this patch things break and w/ this > > patch > > things work. > > > > If you are against this patch, you need to provide more effective > > evidence / a better solution than my approach. > > > I have told you several times, we are asking for hardware details. > Why the Loongarch suffer from the WC bug, like the Loongson Mips? > Why the performance of drm/ast's drop dramatically when your patch > is applied? Well the fact is that before applying this patch, graphical glitches usually happen on amdgpu/radeon, and after applying this they disappears -- this matches what happens when WC is enabled on MIPS Loongson. For the fact of drm/ast's dramatical drop, it's because write to the framebuffer can no longer be reordered. In fact if using amdgpu/radeon w/o graphical acceleration, the same issue happens. (This can be clearly seen when on MIPS Loongson w/ RS780E -- outputing the boot log can dramatically slow down the boot process). What I can assume now is that the Loongson WC/WUC is safe for display framebuffers, but not safe for GPU commands/textures/etc. > > > > > ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v2] loongarch/mm: disable WUC for pgprot_writecombine as same as ioremap_wc 2024-12-18 12:43 ` Icenowy Zheng @ 2024-12-19 2:54 ` Sui Jingfeng 2024-12-19 4:49 ` Icenowy Zheng 0 siblings, 1 reply; 31+ messages in thread From: Sui Jingfeng @ 2024-12-19 2:54 UTC (permalink / raw) To: Icenowy Zheng, Xi Ruoyao, WANG Xuerui, Huacai Chen Cc: Andrew Morton, Mike Rapoport (IBM), Baoquan He, Matthew Wilcox (Oracle), David Hildenbrand, Zhen Lei, Thomas Gleixner, Zhihong Dong, loongarch, linux-kernel On 2024/12/18 20:43, Icenowy Zheng wrote: > For the fact of drm/ast's dramatical drop, it's because write to the > framebuffer can no longer be reordered. No, your understanding is wrong, very very wrong and a big wrong. It's not because it can't reorder the write. Rather, it's because that the CPU can't do write gathering and can't do burst write any more. So do you still think your patch is harmless? -- Best regards, Sui ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v2] loongarch/mm: disable WUC for pgprot_writecombine as same as ioremap_wc 2024-12-19 2:54 ` Sui Jingfeng @ 2024-12-19 4:49 ` Icenowy Zheng 2024-12-19 5:49 ` Sui Jingfeng 0 siblings, 1 reply; 31+ messages in thread From: Icenowy Zheng @ 2024-12-19 4:49 UTC (permalink / raw) To: Sui Jingfeng, Xi Ruoyao, WANG Xuerui, Huacai Chen Cc: Andrew Morton, Mike Rapoport (IBM), Baoquan He, Matthew Wilcox (Oracle), David Hildenbrand, Zhen Lei, Thomas Gleixner, Zhihong Dong, loongarch, linux-kernel 在 2024-12-19星期四的 10:54 +0800,Sui Jingfeng写道: > > On 2024/12/18 20:43, Icenowy Zheng wrote: > > For the fact of drm/ast's dramatical drop, it's because write to > > the > > framebuffer can no longer be reordered. > > > No, your understanding is wrong, very very wrong and a big wrong. > > It's not because it can't reorder the write. Rather, it's because > that the CPU can't do write gathering and can't do burst write any > more. Write gathering is a kind of write reordering, comparing to strongly ordered writing (which is literally one byte per write). > > So do you still think your patch is harmless? Well, I said that performance w/o correctness is meaningless. > > ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v2] loongarch/mm: disable WUC for pgprot_writecombine as same as ioremap_wc 2024-12-19 4:49 ` Icenowy Zheng @ 2024-12-19 5:49 ` Sui Jingfeng 2024-12-19 6:34 ` Icenowy Zheng 2024-12-19 6:38 ` Icenowy Zheng 0 siblings, 2 replies; 31+ messages in thread From: Sui Jingfeng @ 2024-12-19 5:49 UTC (permalink / raw) To: Icenowy Zheng, Xi Ruoyao, WANG Xuerui, Huacai Chen Cc: Andrew Morton, Mike Rapoport (IBM), Baoquan He, Matthew Wilcox (Oracle), David Hildenbrand, Zhen Lei, Thomas Gleixner, Zhihong Dong, loongarch, linux-kernel On 2024/12/19 12:49, Icenowy Zheng wrote: > 在 2024-12-19星期四的 10:54 +0800,Sui Jingfeng写道: >> On 2024/12/18 20:43, Icenowy Zheng wrote: >>> For the fact of drm/ast's dramatical drop, it's because write to >>> the >>> framebuffer can no longer be reordered. >> >> No, your understanding is wrong, very very wrong and a big wrong. >> >> It's not because it can't reorder the write. Rather, it's because >> that the CPU can't do write gathering and can't do burst write any >> more. > Write gathering is a kind of write reordering, No, your understanding is broken. Write gathering *isn't* a kind of write reordering. Its doesn't have to reorder, it just cache the write operation with the CPU's write buffer. > comparing to strongly > ordered writing (which is literally one byte per write). > >> So do you still think your patch is harmless? > Well, I said that performance w/o correctness is meaningless. The point is that Write-Combine on drm/ast will get both correctness and performance. >> -- Best regards, Sui ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v2] loongarch/mm: disable WUC for pgprot_writecombine as same as ioremap_wc 2024-12-19 5:49 ` Sui Jingfeng @ 2024-12-19 6:34 ` Icenowy Zheng 2024-12-19 7:46 ` Sui Jingfeng 2024-12-19 6:38 ` Icenowy Zheng 1 sibling, 1 reply; 31+ messages in thread From: Icenowy Zheng @ 2024-12-19 6:34 UTC (permalink / raw) To: Sui Jingfeng, Xi Ruoyao, WANG Xuerui, Huacai Chen Cc: Andrew Morton, Mike Rapoport (IBM), Baoquan He, Matthew Wilcox (Oracle), David Hildenbrand, Zhen Lei, Thomas Gleixner, Zhihong Dong, loongarch, linux-kernel 在 2024-12-19星期四的 13:49 +0800,Sui Jingfeng写道: > > On 2024/12/19 12:49, Icenowy Zheng wrote: > > 在 2024-12-19星期四的 10:54 +0800,Sui Jingfeng写道: > > > On 2024/12/18 20:43, Icenowy Zheng wrote: > > > > For the fact of drm/ast's dramatical drop, it's because write > > > > to > > > > the > > > > framebuffer can no longer be reordered. > > > > > > No, your understanding is wrong, very very wrong and a big wrong. > > > > > > It's not because it can't reorder the write. Rather, it's because > > > that the CPU can't do write gathering and can't do burst write > > > any > > > more. > > Write gathering is a kind of write reordering, > > > No, your understanding is broken. > > Write gathering *isn't* a kind of write reordering. > Its doesn't have to reorder, it just cache the write operation with > the CPU's write buffer. > > > > comparing to strongly > > ordered writing (which is literally one byte per write). > > > > > So do you still think your patch is harmless? > > Well, I said that performance w/o correctness is meaningless. > > > The point is that Write-Combine on drm/ast will get both correctness > and performance. But in general writecombining is broken (if it's broken in one place, it's broken and not totally right, and needs to be handled). I said in another mail that if you want to keep this performance, you should add another pgprot_ type instead of using pgprot_writecombine (which breaks other drivers). > > > > > ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v2] loongarch/mm: disable WUC for pgprot_writecombine as same as ioremap_wc 2024-12-19 6:34 ` Icenowy Zheng @ 2024-12-19 7:46 ` Sui Jingfeng 0 siblings, 0 replies; 31+ messages in thread From: Sui Jingfeng @ 2024-12-19 7:46 UTC (permalink / raw) To: Icenowy Zheng, Xi Ruoyao, WANG Xuerui, Huacai Chen Cc: Andrew Morton, Mike Rapoport (IBM), Baoquan He, Matthew Wilcox (Oracle), David Hildenbrand, Zhen Lei, Thomas Gleixner, Zhihong Dong, loongarch, linux-kernel On 2024/12/19 14:34, Icenowy Zheng wrote: > 在 2024-12-19星期四的 13:49 +0800,Sui Jingfeng写道: >> On 2024/12/19 12:49, Icenowy Zheng wrote: >>> 在 2024-12-19星期四的 10:54 +0800,Sui Jingfeng写道: >>>> On 2024/12/18 20:43, Icenowy Zheng wrote: >>>>> For the fact of drm/ast's dramatical drop, it's because write >>>>> to >>>>> the >>>>> framebuffer can no longer be reordered. >>>> No, your understanding is wrong, very very wrong and a big wrong. >>>> >>>> It's not because it can't reorder the write. Rather, it's because >>>> that the CPU can't do write gathering and can't do burst write >>>> any >>>> more. >>> Write gathering is a kind of write reordering, >> >> No, your understanding is broken. >> >> Write gathering *isn't* a kind of write reordering. >> Its doesn't have to reorder, it just cache the write operation with >> the CPU's write buffer. >> >> >>> comparing to strongly >>> ordered writing (which is literally one byte per write). >>> >>>> So do you still think your patch is harmless? >>> Well, I said that performance w/o correctness is meaningless. >> >> The point is that Write-Combine on drm/ast will get both correctness >> and performance. > But in general writecombining is broken Cached coherent ARM64 SoC has similar issue, but they didn't disable WC in that arch-specific code. Again, your understanding is broken. > (if it's broken in one place, > it's broken and not totally right, and needs to be handled). No, your logic is completely wrong here. 1) We needs the *hardware* details (not bug phenomenon description) before make further judgement. 2) Its very likely that the specific device driver that suffer from the problems is broken. 3) Loongson downstream kernel has the write-combine enabled, and all drm drivers works very well. Its just that you guys chase to radeon/andgpu driver, doing the abuse at arch specific code. > -- Best regards, Sui ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v2] loongarch/mm: disable WUC for pgprot_writecombine as same as ioremap_wc 2024-12-19 5:49 ` Sui Jingfeng 2024-12-19 6:34 ` Icenowy Zheng @ 2024-12-19 6:38 ` Icenowy Zheng 2024-12-19 10:39 ` Sui Jingfeng 1 sibling, 1 reply; 31+ messages in thread From: Icenowy Zheng @ 2024-12-19 6:38 UTC (permalink / raw) To: Sui Jingfeng, Xi Ruoyao, WANG Xuerui, Huacai Chen Cc: Andrew Morton, Mike Rapoport (IBM), Baoquan He, Matthew Wilcox (Oracle), David Hildenbrand, Zhen Lei, Thomas Gleixner, Zhihong Dong, loongarch, linux-kernel 在 2024-12-19星期四的 13:49 +0800,Sui Jingfeng写道: > > On 2024/12/19 12:49, Icenowy Zheng wrote: > > 在 2024-12-19星期四的 10:54 +0800,Sui Jingfeng写道: > > > On 2024/12/18 20:43, Icenowy Zheng wrote: > > > > For the fact of drm/ast's dramatical drop, it's because write > > > > to > > > > the > > > > framebuffer can no longer be reordered. > > > > > > No, your understanding is wrong, very very wrong and a big wrong. > > > > > > It's not because it can't reorder the write. Rather, it's because > > > that the CPU can't do write gathering and can't do burst write > > > any > > > more. > > Write gathering is a kind of write reordering, > > > No, your understanding is broken. > > Write gathering *isn't* a kind of write reordering. It is, it changes the order "write A -> write B -> write C -> write D" to "write ABCD concurrently". If one of B/C/D is a register that triggers latching A, in the former case it will latch A correctly but in the latter case it will wrongly latch the old value of A instead, so write gathering is not strongly-ordered. > Its doesn't have to reorder, it just cache the write operation with > the CPU's write buffer. > > > > comparing to strongly > > ordered writing (which is literally one byte per write). > > > > > So do you still think your patch is harmless? > > Well, I said that performance w/o correctness is meaningless. > > > The point is that Write-Combine on drm/ast will get both correctness > and performance. > > > > > ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v2] loongarch/mm: disable WUC for pgprot_writecombine as same as ioremap_wc 2024-12-19 6:38 ` Icenowy Zheng @ 2024-12-19 10:39 ` Sui Jingfeng 0 siblings, 0 replies; 31+ messages in thread From: Sui Jingfeng @ 2024-12-19 10:39 UTC (permalink / raw) To: Icenowy Zheng, Xi Ruoyao, WANG Xuerui, Huacai Chen Cc: Andrew Morton, Mike Rapoport (IBM), Baoquan He, Matthew Wilcox (Oracle), David Hildenbrand, Zhen Lei, Thomas Gleixner, Zhihong Dong, loongarch, linux-kernel On 2024/12/19 14:38, Icenowy Zheng wrote: > 在 2024-12-19星期四的 13:49 +0800,Sui Jingfeng写道: >> On 2024/12/19 12:49, Icenowy Zheng wrote: >>> 在 2024-12-19星期四的 10:54 +0800,Sui Jingfeng写道: >>>> On 2024/12/18 20:43, Icenowy Zheng wrote: >>>>> For the fact of drm/ast's dramatical drop, it's because write >>>>> to >>>>> the >>>>> framebuffer can no longer be reordered. >>>> No, your understanding is wrong, very very wrong and a big wrong. >>>> >>>> It's not because it can't reorder the write. Rather, it's because >>>> that the CPU can't do write gathering and can't do burst write >>>> any >>>> more. >>> Write gathering is a kind of write reordering, >> >> No, your understanding is broken. >> >> Write gathering *isn't* a kind of write reordering. > It is, it changes the order "write A -> write B -> write C -> write D" > to "write ABCD concurrently". The reorder mentioned here isn't the main reason that affect the performance. While the cache-like behavior and better bandwidth utilizing (burst write) is. > If one of B/C/D is a register that triggers latching A Mips/Loonarch CPUs doesn't allow *uncached read* bypass writes. This means that when you issue a *uncached read*, the former write operation must have been resolved by the hardware memory. But this is true only for *uncached read* issued by the *CPU*. How can the "write B", "write C" and "write D" will trigger the latching A here? > in the former case it will latch A correctly but > in the latter case it will wrongly latch the old value of A instead, so > write gathering is not strongly-ordered. > > For accesses from the CPU side, registers are mapped with *uncached*. register access by the CPU are all strong ordered. All DRM drivers mapped their register with strong order uncached fashion. Do you ever seen any exceptions? Even with the command submit approach, registers will not get written to the hardware until the kickoff command is issued to the hardware. The write order depend on the occurrence order in the ring buffer, not the issue order. Commands that rank first in the ring buffer will get executed first. But there is still no hints that "write B", "write C" and "write D" will lead to "latch A", So please stop cheating us by making up cock-and-bull story. >> Its doesn't have to reorder, it just cache the write operation with >> the CPU's write buffer. >> >> >>> comparing to strongly >>> ordered writing (which is literally one byte per write). >>> >>>> So do you still think your patch is harmless? >>> Well, I said that performance w/o correctness is meaningless. >> >> The point is that Write-Combine on drm/ast will get both correctness >> and performance. >> >> -- Best regards, Sui ^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v2] loongarch/mm: disable WUC for pgprot_writecombine as same as ioremap_wc 2023-10-09 14:32 ` [PATCH v2] loongarch/mm: disable WUC for pgprot_writecombine as same as ioremap_wc Sui Jingfeng 2023-10-10 0:15 ` WANG Xuerui @ 2023-10-10 0:50 ` Icenowy Zheng 1 sibling, 0 replies; 31+ messages in thread From: Icenowy Zheng @ 2023-10-10 0:50 UTC (permalink / raw) To: Sui Jingfeng, Huacai Chen, WANG Xuerui Cc: Andrew Morton, Weihao Li, Mike Rapoport (IBM), Jun Yi, Baoquan He, Matthew Wilcox (Oracle), David Hildenbrand, Hongchen Zhang, Binbin Zhou, Zhen Lei, Tiezhu Yang, Thomas Gleixner, Zhihong Dong, loongarch, linux-kernel 在 2023-10-09星期一的 22:32 +0800,Sui Jingfeng写道: > Hi, > > > On 2023/10/9 12:28, Icenowy Zheng wrote: > > Currently the code disables WUC only disables it for ioremap_wc(), > > which > > is only used when mapping writecombine pages like ioremap() (mapped > > to > > the kernel space). For VRAM mapped in TTM/GEM, it's mapped with a > > crafted pgprot with pgprot_writecombine() function, which isn't > > corrently disabled now. > > > > Disable WUC for pgprot_writecombine() (fallback to SUC) too. > > > > This improves AMDGPU driver stability on Loongson 3A5000 machines. > > > > Signed-off-by: Icenowy Zheng <uwu@icenowy.me> > > --- > > Changes since v1: > > - Removed _WC macros > > - Mention ioremap_wc in commit message > > > > arch/loongarch/include/asm/io.h | 5 ++--- > > arch/loongarch/include/asm/pgtable-bits.h | 4 +++- > > arch/loongarch/kernel/setup.c | 10 +++++----- > > 3 files changed, 10 insertions(+), 9 deletions(-) > > > > diff --git a/arch/loongarch/include/asm/io.h > > b/arch/loongarch/include/asm/io.h > > index 0dcb36b32cb25..290aad87a8847 100644 > > --- a/arch/loongarch/include/asm/io.h > > +++ b/arch/loongarch/include/asm/io.h > > @@ -52,10 +52,9 @@ static inline void __iomem > > *ioremap_prot(phys_addr_t offset, unsigned long size, > > * @offset: bus address of the memory > > * @size: size of the resource to map > > */ > > -extern pgprot_t pgprot_wc; > > - > > #define ioremap_wc(offset, size) \ > > - ioremap_prot((offset), (size), pgprot_val(pgprot_wc)) > > + ioremap_prot((offset), (size), pgprot_val( \ > > + wc_enabled ? PAGE_KERNEL_WUC : PAGE_KERNEL_SUC)) > > > > #define ioremap_cache(offset, size) \ > > ioremap_prot((offset), (size), pgprot_val(PAGE_KERNEL)) > > diff --git a/arch/loongarch/include/asm/pgtable-bits.h > > b/arch/loongarch/include/asm/pgtable-bits.h > > index 35348d4c4209a..a3d827701736d 100644 > > --- a/arch/loongarch/include/asm/pgtable-bits.h > > +++ b/arch/loongarch/include/asm/pgtable-bits.h > > @@ -92,6 +92,8 @@ > > > > #ifndef __ASSEMBLY__ > > > > +extern bool wc_enabled; > > + > > #define _PAGE_IOREMAP pgprot_val(PAGE_KERNEL_SUC) > > > > #define pgprot_noncached pgprot_noncached > > @@ -111,7 +113,7 @@ static inline pgprot_t > > pgprot_writecombine(pgprot_t _prot) > > { > > unsigned long prot = pgprot_val(_prot); > > > > - prot = (prot & ~_CACHE_MASK) | _CACHE_WUC; > > + prot = (prot & ~_CACHE_MASK) | (wc_enabled ? _CACHE_WUC : > > _CACHE_SUC); > > > > return __pgprot(prot); > > } > > diff --git a/arch/loongarch/kernel/setup.c > > b/arch/loongarch/kernel/setup.c > > index 7783f0a3d742c..465c1dbb6f4b4 100644 > > --- a/arch/loongarch/kernel/setup.c > > +++ b/arch/loongarch/kernel/setup.c > > @@ -161,19 +161,19 @@ static void __init smbios_parse(void) > > } > > > > #ifdef CONFIG_ARCH_WRITECOMBINE > > -pgprot_t pgprot_wc = PAGE_KERNEL_WUC; > > +bool wc_enabled = true; > > #else > > -pgprot_t pgprot_wc = PAGE_KERNEL_SUC; > > +bool wc_enabled; > > #endif > > > > -EXPORT_SYMBOL(pgprot_wc); > > +EXPORT_SYMBOL(wc_enabled); > > > > static int __init setup_writecombine(char *p) > > { > > if (!strcmp(p, "on")) > > - pgprot_wc = PAGE_KERNEL_WUC; > > + wc_enabled = true; > > else if (!strcmp(p, "off")) > > - pgprot_wc = PAGE_KERNEL_SUC; > > + wc_enabled = false; > > else > > pr_warn("Unknown writecombine setting \"%s\".\n", > > p); > > > > > Good catch! > > But this will make the write combine(WC) mappings completely unusable > on LoongArch. Not completely unusable -- we always have the writecombine=on kernel parameter that overrides it (or CONFIG_ARCH_WRITECOMBINE build-time option). People knowing what they're doing can utilize them to gain back the performace of WUC, however for the default setup I prefer functionality correctness to (unstable) performance. > This is nearly equivalent to say that LoongArch don't support write > combine at all. > But the write combine(WC) mappings works fine for software based drm > drivers, > such as drm/loongson and drm/ast etc. Even include drm/radeon and > drm/amdgpu with > pure software rendering setup (by putting Option "Accel" "off" into > 10-amdgpu.conf > or 10-radeon.conf) After merge this patch, the performance drop > dramatically for > 2D software rendering based display controller drivers. > > Well, this patch itself is a good catch, as it is a fix for the > commit <16c52e503043> > ("LoongArch: Make WriteCombine configurable for ioremap()"). But I'm > afraid that > both of this commit and the <16c52e503043> commit are not a *real* > fix write combine Yes, this is a workaround rather than a fix, however I'm a software developer and this is the best I can do. If you can, please report this to your hardware developers and try to get it fixed for 3A7000/7A3000 (or maybe grabbing some chicken bit that will fix this for existing chips). I will be quite appreciated. Thanks, Icenowy > related issue on LoongArch. It just negative sidestep of the real > problem. > ^ permalink raw reply [flat|nested] 31+ messages in thread
end of thread, other threads:[~2025-01-21 9:20 UTC | newest] Thread overview: 31+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- [not found] <20231009042841.635366-1-uwu@icenowy.me> 2023-10-09 14:32 ` [PATCH v2] loongarch/mm: disable WUC for pgprot_writecombine as same as ioremap_wc Sui Jingfeng 2023-10-10 0:15 ` WANG Xuerui 2023-10-10 3:02 ` Sui Jingfeng 2023-10-10 12:26 ` Xi Ruoyao 2023-10-13 11:12 ` Sui Jingfeng 2023-10-13 12:51 ` Sui Jingfeng 2023-10-13 13:15 ` Xi Ruoyao 2023-10-13 13:53 ` Xi Ruoyao 2025-01-21 9:19 ` Sui Jingfeng 2024-12-02 16:23 ` Sui Jingfeng 2024-12-17 18:18 ` Shuah 2024-12-18 3:24 ` Sui Jingfeng 2024-12-18 6:23 ` Icenowy Zheng 2024-12-18 10:05 ` Sui Jingfeng 2024-12-18 12:37 ` Icenowy Zheng 2024-12-19 3:17 ` Sui Jingfeng 2024-12-19 4:54 ` Icenowy Zheng 2024-12-20 16:43 ` Shuah 2024-12-17 23:44 ` Icenowy Zheng 2024-12-18 3:05 ` Sui Jingfeng 2024-12-18 5:47 ` Icenowy Zheng 2024-12-18 10:29 ` Sui Jingfeng 2024-12-18 12:43 ` Icenowy Zheng 2024-12-19 2:54 ` Sui Jingfeng 2024-12-19 4:49 ` Icenowy Zheng 2024-12-19 5:49 ` Sui Jingfeng 2024-12-19 6:34 ` Icenowy Zheng 2024-12-19 7:46 ` Sui Jingfeng 2024-12-19 6:38 ` Icenowy Zheng 2024-12-19 10:39 ` Sui Jingfeng 2023-10-10 0:50 ` Icenowy Zheng
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).