* flush_dcache_page vs kunmap_local
From: Matthew Wilcox @ 2021-11-04 15:00 UTC
To: Linus Torvalds, Christoph Hellwig
Cc: linux-arch, Ira Weiny, Andrew Morton, linux-kernel, Thomas Gleixner

In https://lore.kernel.org/lkml/CAHk-=wijdojzo56FzYqE5TOYw2Vws7ik3LEMGj9SPQaJJ+Z73Q@mail.gmail.com/
Linus offers the opinion that kunmap calls should imply a
flush_dcache_page(). Christoph added calls to flush_dcache_page()
in commit 8dad53a11f8d. Was this "voodoo programming", or was there
a real problem being addressed?
* Re: flush_dcache_page vs kunmap_local
From: Linus Torvalds @ 2021-11-04 15:30 UTC
To: Matthew Wilcox
Cc: Christoph Hellwig, linux-arch, Ira Weiny, Andrew Morton,
	Linux Kernel Mailing List, Thomas Gleixner

On Thu, Nov 4, 2021 at 8:03 AM Matthew Wilcox <willy@infradead.org> wrote:
>
> Linus offers the opinion that kunmap calls should imply a
> flush_dcache_page(). Christoph added calls to flush_dcache_page()
> in commit 8dad53a11f8d. Was this "voodoo programming", or was there
> a real problem being addressed?

I don't think anybody actually uses/cares about flush_dcache_page() at
all, and pretty much all uses are random and voodoo.

No sane architecture uses pure virtual caches, and the insane ones
haven't been an issue for a long time either.

But if there are still systems with pure virtual caches, and they need
manual cache flushing, then I do think that kunmap is one of the points
that needs it, since that's the "I'm done accessing this data through
this virtual address" place.

End result: I really don't think anybody cares any more (and only truly
broken architectures ever did). I'd personally be perfectly happy just
saying "we might as well drop support for non-coherent caches
entirely".

But as long as we have those random odd "flush dcache manually" things,
I think kunmap() is one of the places that probably should continue to
do them.

Of course, the kunmap case is _doubly_ irrelevant, because we should
certainly hope that not only are those noncoherent pure virtual caches
a thing of the past, highmem itself should be going away.

Why did this come up? Do you actually have some hardware or situation
that cares?

               Linus
* Re: flush_dcache_page vs kunmap_local
From: Catalin Marinas @ 2021-11-04 16:54 UTC
To: Linus Torvalds
Cc: Matthew Wilcox, Christoph Hellwig, linux-arch, Ira Weiny,
	Andrew Morton, Linux Kernel Mailing List, Thomas Gleixner

On Thu, Nov 04, 2021 at 08:30:55AM -0700, Linus Torvalds wrote:
> On Thu, Nov 4, 2021 at 8:03 AM Matthew Wilcox <willy@infradead.org> wrote:
> > Linus offers the opinion that kunmap calls should imply a
> > flush_dcache_page(). Christoph added calls to flush_dcache_page()
> > in commit 8dad53a11f8d. Was this "voodoo programming", or was there
> > a real problem being addressed?
>
> I don't think anybody actually uses/cares about flush_dcache_page() at
> all, and pretty much all uses are random and voodoo.

We do. flush_dcache_page() is not just about virtual caches. On arm32/64
(and powerpc), even with PIPT-like caches, we use it to flag a page's
D-cache as no longer clean. Subsequently in set_pte_at(), if the mapping
is executable, we do the cache maintenance to ensure the I and D caches
are coherent with each other.

I wouldn't add this call to kmap/kunmap_local(); it would be a slight
unnecessary overhead (we had a customer complaining about kmap_atomic()
breaking write-streaming; I think the new kmap_local() solved this
problem, if used in the right context).

--
Catalin
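The deferred D-cache maintenance Catalin describes can be sketched as a small user-space model: flush_dcache_page() only clears a per-page "clean" flag, and the expensive I/D synchronisation is deferred until set_pte_at() maps the page executable. Everything below is a stand-in (on arm64 the flag is roughly PG_dcache_clean and the sync is done by the arch cache-maintenance code), not the kernel's actual implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* User-space model of the deferred maintenance scheme: a per-page
 * "D-cache clean" flag cleared by flush_dcache_page(), checked by
 * set_pte_at() for executable mappings.  Names mirror the kernel's,
 * but every definition here is hypothetical. */

struct page {
	bool dcache_clean;	/* stands in for the PG_dcache_clean bit */
};

static int icache_syncs;	/* counts expensive I/D sync operations */

/* The kernel wrote to the page: flag its D-cache as no longer clean. */
static void flush_dcache_page(struct page *page)
{
	page->dcache_clean = false;
}

/* Mapping the page into user space: do the I/D sync only if the
 * mapping is executable and the page is not already clean. */
static void set_pte_at(struct page *page, bool executable)
{
	if (executable && !page->dcache_clean) {
		icache_syncs++;			/* the actual cache maintenance */
		page->dcache_clean = true;	/* later mappings skip the sync */
	}
}
```

The point of the flag is visible in the model: repeated executable mappings of a clean page cost nothing, and non-executable mappings never pay for the sync at all.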
* Re: flush_dcache_page vs kunmap_local
From: Linus Torvalds @ 2021-11-04 17:08 UTC
To: Catalin Marinas
Cc: Matthew Wilcox, Christoph Hellwig, linux-arch, Ira Weiny,
	Andrew Morton, Linux Kernel Mailing List, Thomas Gleixner

On Thu, Nov 4, 2021 at 9:54 AM Catalin Marinas <catalin.marinas@arm.com> wrote:
>
> We do. flush_dcache_page() is not just about virtual caches. On arm32/64
> (and powerpc), even with PIPT-like caches, we use it to flag a page's
> D-cache as no longer clean. Subsequently in set_pte_at(), if the mapping
> is executable, we do the cache maintenance to ensure the I and D caches
> are coherent with each other.

Ugh. Ok, so we have two very different use-cases for that function.

Perhaps more importantly, they have hugely different semantics. For
you, it's about pages that can be mapped executable, so it's only
relevant for mappable pages.

For the traditional broken pure virtual cache case, it's not about
user mappings at all, it's about any data structure that we might have
in highmem.

Of course, I think we got rid of most of the other uses of highmem,
and we no longer put any "normal" kernel data in highmem pages. There
used to be patches that did inodes and things like that in highmem,
and they actually depended on the "cache the virtual address so that
it's always the same" behavior.

> I wouldn't add this call to kmap/kunmap_local(), it would be a slight
> unnecessary overhead (we had a customer complaining about kmap_atomic()
> breaking write-streaming, I think the new kmap_local() solved this
> problem, if in the right context).

kmap_local() ends up being (I think) fundamentally broken for virtual
cache coherency anyway, because two different CPUs can see two
different virtual addresses at the same time for the same page (in
ways that the old kmap interfaces could not).
So maybe the answer is "let's forget about the old virtual cache
coherence issue, and make it purely about the I$ mapping case".

At that point, kmap is irrelevant from a virtual address standpoint,
and so it doesn't make much sense to flush on kunmap - but anybody who
writes to a page still needs that flush_dcache_page() thing.

               Linus
* Re: flush_dcache_page vs kunmap_local
From: Catalin Marinas @ 2021-11-04 18:04 UTC
To: Linus Torvalds
Cc: Matthew Wilcox, Christoph Hellwig, linux-arch, Ira Weiny,
	Andrew Morton, Linux Kernel Mailing List, Thomas Gleixner,
	Russell King

+ rmk

On Thu, Nov 04, 2021 at 10:08:40AM -0700, Linus Torvalds wrote:
> On Thu, Nov 4, 2021 at 9:54 AM Catalin Marinas <catalin.marinas@arm.com> wrote:
> > We do. flush_dcache_page() is not just about virtual caches. On arm32/64
> > (and powerpc), even with PIPT-like caches, we use it to flag a page's
> > D-cache as no longer clean. Subsequently in set_pte_at(), if the mapping
> > is executable, we do the cache maintenance to ensure the I and D caches
> > are coherent with each other.
>
> Ugh. Ok, so we have two very different use-cases for that function.
>
> Perhaps more importantly, they have hugely different semantics. For
> you, it's about pages that can be mapped executable, so it's only
> relevant for mappable pages.
>
> For the traditional broken pure virtual cache case, it's not about
> user mappings at all, it's about any data structure that we might have
> in highmem.
>
> Of course, I think we got rid of most of the other uses of highmem,
> and we no longer put any "normal" kernel data in highmem pages. There
> used to be patches that did inodes and things like that in highmem,
> and they actually depended on the "cache the virtual address so that
> it's always the same" behavior.

We can still have ptes in highmem.

> > I wouldn't add this call to kmap/kunmap_local(), it would be a slight
> > unnecessary overhead (we had a customer complaining about kmap_atomic()
> > breaking write-streaming, I think the new kmap_local() solved this
> > problem, if in the right context).
>
> kmap_local() ends up being (I think) fundamentally broken for virtual
> cache coherency anyway, because two different CPUs can see two
> different virtual addresses at the same time for the same page (in
> ways that the old kmap interfaces could not).

Luckily I don't think we have a (working) SMP system with VIVT caches.
On UP, looking at arm, for VIVT caches it flushes the D-cache before
kunmap_local() (arch_kmap_local_pre_unmap()). So any new kmap_local()
would see the correct data even if it's in a different location.

> So maybe the answer is "let's forget about the old virtual cache
> coherence issue, and make it purely about the I$ mapping case".

We still have VIVT processors supported in the kernel and a few where
the VIPT cache is aliasing (some ARMv6 CPUs). On these,
flush_dcache_page() is still used to ensure the user aliases are
coherent with the kernel one, so it's not just about the I/D-cache
coherency.

> At that point, kmap is irrelevant from a virtual address standpoint
> and so it doesn't make much sense to flush on kunmap - but anybody
> who writes to a page still needs that flush_dcache_page() thing.

The cachetlb.rst doc states the two cases where flush_dcache_page()
should be called:

1. After writing to a page cache page (that's what we need on arm64 for
   the I-cache).

2. Before reading from a page cache page and user mappings potentially
   exist. I think arm32 ensures the D-cache user aliases are coherent
   with the kernel one (added rmk to confirm).

Now, whether the kernel code does call flush_dcache_page() in the above
scenarios is another matter. But if we are to remove the 2nd case, for
VIVT/aliasing-VIPT hardware we'd need kmap() to perform some cache
maintenance even if the page is not in highmem.

--
Catalin
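The aliasing problem behind the second cachetlb.rst rule can be illustrated with a toy model: one physical byte seen through two virtual aliases (kernel and user), each with its own cache line. This is purely illustrative user-space C with hypothetical names, not kernel code; it just shows why a write through one alias is invisible through the other until a flush.

```c
#include <assert.h>

/* Toy model of an aliasing (VIVT-style) D-cache: the same physical
 * byte cached independently under two virtual aliases. */

struct alias {
	char cached;	/* data held in this alias's cache line */
	int  valid;	/* is the cache line valid? */
};

static char phys_mem;		/* the single physical byte */

static void cache_write(struct alias *a, char v)
{
	a->cached = v;		/* the write hits this alias's cache only */
	a->valid = 1;
}

static char cache_read(struct alias *a)
{
	if (!a->valid) {	/* miss: fill from physical memory */
		a->cached = phys_mem;
		a->valid = 1;
	}
	return a->cached;
}

/* What flush_dcache_page() does for this model: write back and
 * invalidate an alias so other aliases can see the data. */
static void flush_alias(struct alias *a)
{
	if (a->valid)
		phys_mem = a->cached;
	a->valid = 0;
}
```

In the model, a kernel-side write followed by a user-side read returns stale data unless the kernel alias is flushed (rule 1) and the user alias invalidated before the read (rule 2).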
* Re: flush_dcache_page vs kunmap_local
From: Russell King (Oracle) @ 2021-11-04 18:20 UTC
To: Catalin Marinas
Cc: Linus Torvalds, Matthew Wilcox, Christoph Hellwig, linux-arch,
	Ira Weiny, Andrew Morton, Linux Kernel Mailing List, Thomas Gleixner

On Thu, Nov 04, 2021 at 06:04:45PM +0000, Catalin Marinas wrote:
> The cachetlb.rst doc states the two cases where flush_dcache_page()
> should be called:
>
> 1. After writing to a page cache page (that's what we need on arm64 for
>    the I-cache).
>
> 2. Before reading from a page cache page and user mappings potentially
>    exist. I think arm32 ensures the D-cache user aliases are coherent
>    with the kernel one (added rmk to confirm).

Yes, where necessary, we flush the user aliases in flush_dcache_page().

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
* Re: flush_dcache_page vs kunmap_local
From: Linus Torvalds @ 2021-11-04 18:23 UTC
To: Catalin Marinas
Cc: Matthew Wilcox, Christoph Hellwig, linux-arch, Ira Weiny,
	Andrew Morton, Linux Kernel Mailing List, Thomas Gleixner,
	Russell King

On Thu, Nov 4, 2021 at 11:04 AM Catalin Marinas <catalin.marinas@arm.com> wrote:
>
> Luckily I don't think we have a (working) SMP system with VIVT caches.
> On UP, looking at arm, for VIVT caches it flushes the D-cache before
> kunmap_local() (arch_kmap_local_pre_unmap()). So any new kmap_local()
> would see the correct data even if it's in a different location.

Ok, good. Yeah, because kmap_local and SMP really would seem to be a
"that can't work with VIVT".

> We still have VIVT processors supported in the kernel and a few where
> the VIPT cache is aliasing (some ARMv6 CPUs). On these,
> flush_dcache_page() is still used to ensure the user aliases are
> coherent with the kernel one, so it's not just about the I/D-cache
> coherency.

Maybe we could try to split it up and make each function have more
well-defined rules? One of the issues with the flush_dcache thing is
that it's always been so ad-hoc and it's not been hugely clear.

For example, people seem to think it's purely about flushing writes.
But for the virtual aliasing issue and kmap, you may need to flush
purely between reads too - just to make sure that you don't have any
stale contents. That's why kunmap needs to have some unconditional
cache flush thing for the virtual aliasing issue.

But hey, it's entirely possible that it should *not* have that
"flush_dcache_page()" thing, but something that is private to the
architecture.
So VIVT arm (and whoever else) would continue to do the cache flushing
at kunmap_local time (or kmap - I don't think it matters which one you
do, as long as you make sure there are no stale contents from the
previous use of that address).

And then we'd relegate flush_dcache_page() purely to uses where
somebody modifies the data and wants to make sure it ends up being
coherent with subsequent uses (whether kmap and VIVT, or I$/D$
coherency issues)?

> The cachetlb.rst doc states the two cases where flush_dcache_page()
> should be called:
>
> 1. After writing to a page cache page (that's what we need on arm64 for
>    the I-cache).
>
> 2. Before reading from a page cache page and user mappings potentially
>    exist. I think arm32 ensures the D-cache user aliases are coherent
>    with the kernel one (added rmk to confirm).

I think the "kernel cache coherency" matters too. The PTE contents
thing seems relevant if we use kmap for that... So I do think that the
"page cache or user mapping" case is not necessarily the only one.

But personally I consider these situations so broken at a hardware
level that I can't find it in myself to care too deeply.

Because user space with non-coherent I$/D$ should do its own cache
flushing if it does "read()" to modify an executable range - exactly
the same way it has to do it for just doing regular stores to that
range. It really shouldn't be the kernel that cares at all.

               Linus
* Re: flush_dcache_page vs kunmap_local
From: Catalin Marinas @ 2021-11-05 19:40 UTC
To: Linus Torvalds
Cc: Matthew Wilcox, Christoph Hellwig, linux-arch, Ira Weiny,
	Andrew Morton, Linux Kernel Mailing List, Thomas Gleixner,
	Russell King

On Thu, Nov 04, 2021 at 11:23:56AM -0700, Linus Torvalds wrote:
> On Thu, Nov 4, 2021 at 11:04 AM Catalin Marinas <catalin.marinas@arm.com> wrote:
> > We still have VIVT processors supported in the kernel and a few where
> > the VIPT cache is aliasing (some ARMv6 CPUs). On these,
> > flush_dcache_page() is still used to ensure the user aliases are
> > coherent with the kernel one, so it's not just about the I/D-cache
> > coherency.
>
> Maybe we could try to split it up and make each function have more
> well-defined rules? One of the issues with the flush_dcache thing is
> that it's always been so ad-hoc and it's not been hugely clear.
[...]
> So VIVT arm (and whoever else) would continue to do the cache flushing
> at kunmap_local time (or kmap - I don't think it matters which one you
> do, as long as you make sure there are no stale contents from the
> previous use of that address).
>
> And then we'd relegate flush_dcache_page() purely for uses where
> somebody modifies the data and wants to make sure it ends up being
> coherent with subsequent uses (whether kmap and VIVT or I$/D$
> coherency issues)?

For PIPT hardware (I suppose most newish architectures),
flush_dcache_page() only matters with separate I$ and D$. So we can
indeed redefine it as only meaningful when a user page has been written
by the kernel (and maybe we can give it a better name).

For VIVT, kmap/kunmap() can take care of synchronising the aliases. If
the kmap'ed page has a user mapping, kmap() would also need to flush
the aliases, not just kunmap() (currently relying on
flush_dcache_page() for the read side as well).
I suspect this is going to make kmap() more expensive for those highmem
pages only used by the kernel, or for pages not yet mapped in user
space (say during a page fault on an mmap'ed file). A long time ago on
arm32 we used to do a check with page_mapping() and mapping_mapped(),
but I think the latter has since disappeared from the kernel.

> > The cachetlb.rst doc states the two cases where flush_dcache_page()
> > should be called:
> >
> > 1. After writing to a page cache page (that's what we need on arm64 for
> >    the I-cache).
> >
> > 2. Before reading from a page cache page and user mappings potentially
> >    exist. I think arm32 ensures the D-cache user aliases are coherent
> >    with the kernel one (added rmk to confirm).
>
> I think the "kernel cache coherency" matters too. The PTE contents
> thing seems relevant if we use kmap for that...
>
> So I do think that the "page cache or user mapping" is not necessarily
> the only case.

At least the arm32 set_pte() for VIVT caches does its own D$ flushing
on the kmap() address, so kunmap() flushing is not strictly necessary
for this specific case (I think). But there may be other cases where it
matters.

> But personally I consider these situations so broken at a hardware
> level that I can't find it in myself to care too deeply.

We've had them supported in mainline for so many years, and working
(mostly; there was the odd driver that did not call the right API). But
I'm fine with deprecating them, making them slower in favour of cleaner
semantics for kmap, flush_dcache_page etc.

> Because user space with non-coherent I$/D$ should do its own cache
> flushing if it does "read()" to modify an executable range - exactly
> the same way it has to do it for just doing regular stores to that
> range.

Yes, if the user did a read(), it should flush the caches it cares
about. I don't think we even have a flush_dcache_page() call in the
kernel in these cases, just copy_to_user().
Basically the kernel should only flush if it wrote via its own mapping
(linear, kmap). However, with mmap(PROT_EXEC), the user expects the
I$/D$ to be coherent without explicit user cache maintenance, even with
PIPT hardware. That's where flush_dcache_page() matters.

We also had some weird bugs with a dynamic loader mapping a page
initially as PROT_READ|PROT_EXEC, doing an
mprotect(PROT_READ|PROT_WRITE) just to write some data (not text, so it
never thought explicit cache flushing by the user was needed) and then
back to mprotect(PROT_READ|PROT_EXEC). Because of the CoW, the new page
did not have the I$/D$ synchronised, leading to the occasional SIGILL.
Again, flush_dcache_page() after CoW is needed, though hidden in the
arch code (we do this on arm64 in copy_user_highpage()).

--
Catalin
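The CoW SIGILL scenario above reduces to a small model: the page copy goes through the D-cache, and unless the arch copy routine also syncs the I-cache (as arm64's copy_user_highpage() does), execution on the new page fetches stale instructions. This is a hypothetical user-space sketch of the failure mode, not the real kernel code paths.

```c
#include <assert.h>

/* Model of the CoW I$/D$ bug: per-physical-page D-cache and I-cache
 * contents tracked separately.  All names are stand-ins. */

static int dcache[2];	/* what the D-cache holds for each page */
static int icache[2];	/* what the I-cache holds for each page */

/* CoW copies the data through the D-cache; the optional sync step is
 * what the arm64 fix added to its copy_user_highpage() equivalent. */
static void cow_copy(int src, int dst, int sync_icache)
{
	dcache[dst] = dcache[src];
	if (sync_icache)
		icache[dst] = dcache[dst];	/* make I$ coherent with D$ */
}

/* "Executing" from a page: the CPU fetches through the I-cache. */
static int fetch_insn(int page)
{
	return icache[page];
}
```

Without the sync, the freshly copied page executes whatever the I-cache last held for that physical page, which is the occasional-SIGILL symptom described above.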
* Re: flush_dcache_page vs kunmap_local
From: Matthew Wilcox @ 2021-11-04 18:38 UTC
To: Linus Torvalds
Cc: Christoph Hellwig, linux-arch, Ira Weiny, Andrew Morton,
	Linux Kernel Mailing List, Thomas Gleixner

On Thu, Nov 04, 2021 at 08:30:55AM -0700, Linus Torvalds wrote:
> Why did this come up? Do you actually have some hardware or situation
> that cares?

Oh, we're doing review of the XFS/iomap folio patches, which led to
looking at zero_user_segments(), and I realised that memzero_page()
was now functionally identical to zero_user(). And you'd been quite
specific about not having flush_dcache_page() in there, so ... I
wondered if you'd had a change of mind.
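The functional identity Matthew points out can be sketched like this — shapes only, with user-space stand-ins for the kernel primitives (the real helpers take a struct page and live in the highmem header; zero_user() is really the single-range case of zero_user_segments()):

```c
#include <assert.h>
#include <string.h>

/* Stand-ins for the kernel primitives; a "page" is just a buffer here. */
static void *kmap_local_page(void *page)  { return page; }
static void kunmap_local(void *addr)      { (void)addr; }
static void flush_dcache_page(void *page) { (void)page; }

/* memzero_page(): zero a range of a page through a temporary mapping. */
static void memzero_page(void *page, size_t offset, size_t len)
{
	char *addr = kmap_local_page(page);

	memset(addr + offset, 0, len);
	flush_dcache_page(page);	/* the call that made the two identical */
	kunmap_local(addr);
}

/* zero_user(): historically the "flushing" variant -- same steps. */
static void zero_user(void *page, size_t start, size_t size)
{
	char *addr = kmap_local_page(page);

	memset(addr + start, 0, size);
	flush_dcache_page(page);
	kunmap_local(addr);
}
```

Once memzero_page() gained the flush_dcache_page() call, the two bodies perform the same sequence of operations, which is the redundancy noticed during the folio review.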
* Re: flush_dcache_page vs kunmap_local
From: Linus Torvalds @ 2021-11-04 21:02 UTC
To: Matthew Wilcox
Cc: Christoph Hellwig, linux-arch, Ira Weiny, Andrew Morton,
	Linux Kernel Mailing List, Thomas Gleixner

On Thu, Nov 4, 2021 at 11:39 AM Matthew Wilcox <willy@infradead.org> wrote:
>
> Oh, we're doing review of the XFS/iomap folio patches, which led to
> looking at zero_user_segments(), and I realised that memzero_page()
> was now functionally identical to zero_user(). And you'd been quite
> specific about not having flush_dcache_page() in there, so ... I
> wondered if you'd had a change of mind.

Ugh. I guess it ends up being there whether I like it or not.

All that zero_user_segments() stuff is too ugly for words, though, so
I think whoever wrote it must have been on some interesting
pharmaceuticals. What the hell are the two start/end things? And most
users actually just want a single page and should never have used that
thing.

Nasty. I'm not touching that with a ten-foot pole.

               Linus