* Re: [PATCH] block: Remove special-casing of compound pages
       [not found] ` <170198306635.1954272.10907610290128291539.b4-ty@kernel.dk>
@ 2024-02-29 18:25   ` Greg Edwards
  2024-02-29 19:37     ` Matthew Wilcox
  0 siblings, 1 reply; 3+ messages in thread
From: Greg Edwards @ 2024-02-29 18:25 UTC
  To: Jens Axboe
  Cc: Matthew Wilcox (Oracle), Andrew Morton, Kirill A . Shutemov,
	Hugh Dickins, linux-mm, linux-block, linux-fsdevel, linux-kernel,
	stable, qemu-devel

On Thu, Dec 07, 2023 at 02:04:26PM -0700, Jens Axboe wrote:
> On Mon, 14 Aug 2023 15:41:00 +0100, Matthew Wilcox (Oracle) wrote:
>> The special casing was originally added in pre-git history; reproducing
>> the commit log here:
>>
>>> commit a318a92567d77
>>> Author: Andrew Morton <akpm@osdl.org>
>>> Date: Sun Sep 21 01:42:22 2003 -0700
>>>
>>> [PATCH] Speed up direct-io hugetlbpage handling
>>>
>>> This patch short-circuits all the direct-io page dirtying logic for
>>> higher-order pages. Without this, we pointlessly bounce BIOs up to
>>> keventd all the time.
>>
>> [...]
>
> Applied, thanks!
>
> [1/1] block: Remove special-casing of compound pages
>       commit: 1b151e2435fc3a9b10c8946c6aebe9f3e1938c55

This commit results in a change of behavior for QEMU VMs backed by hugepages
that open their VM disk image file with O_DIRECT (QEMU cache=none or
cache.direct=on options). When the VM shuts down and the QEMU process exits,
one or two hugepages may fail to free correctly. It appears to be a race, as
it doesn't happen every time.

From debugging on 6.8-rc6, when it occurs, the hugepage that fails to free
has a non-zero refcount when it hits the folio_put_testzero(folio) test in
release_pages(). On a failure test iteration with 1 GiB hugepages, the
failing folio had a mapcount of 0, refcount of 35, and
folio_maybe_dma_pinned was true.

The problem only occurs when the VM disk image file is opened with O_DIRECT.
When using QEMU cache=writeback or cache.direct=off options, it does not
occur.

We first noticed it on the 6.1.y stable kernel when this commit landed there
(6.1.75).

A very simple reproducer without KVM (just boot VM up, then shut it down):

    echo 512 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages

    qemu-system-x86_64 \
      -cpu qemu64 \
      -m 1024 \
      -nographic \
      -mem-path /dev/hugepages/vm00 \
      -mem-prealloc \
      -drive file=test.qcow2,if=none,cache=none,id=drive0 \
      -device virtio-blk-pci,drive=drive0,id=disk0,bootindex=1

    rm -f /dev/hugepages/vm00

Some testing notes:

* occurs with 6.1.75, 6.6.14, 6.8-rc6, and linux-next-20240229
* occurs with 1 GiB and 2 MiB huge pages, with both hugetlbfs and memfd
* occurs with QEMU 8.0.y, 8.1.y, 8.2.y, and master
* occurs with (-enable-kvm -cpu host) or without (-cpu qemu64) KVM

Thanks for your time!

Greg
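One way to tell whether a run of the reproducer above left a hugepage behind is to compare the pool's total and free counts once QEMU has exited and the backing file has been removed. The following is only a sketch and not part of the original report: the function names read_count and hugepages_in_use are hypothetical, and it assumes that nothing else on the machine is using the same hugepage pool.

```shell
#!/bin/sh
# Hypothetical helpers (names invented for illustration, not from the thread).
# Assumption: no other process holds pages from the pool being inspected.

# Print the integer stored in a single-value sysfs file.
read_count() {
    cat "$1"
}

# Print how many hugepages of the pool directory "$1" are still in use,
# computed as nr_hugepages - free_hugepages.
hugepages_in_use() {
    pool_dir=$1
    total=$(read_count "$pool_dir/nr_hugepages")
    free=$(read_count "$pool_dir/free_hugepages")
    echo $((total - free))
}
```

After the final rm -f in the reproducer, running hugepages_in_use /sys/kernel/mm/hugepages/hugepages-2048kB would be expected to print 0. Under the assumption stated above, a non-zero value would be consistent with the "one or two hugepages may fail to free" symptom. Because the failure is described as a race, the check would need to be repeated across several boot/shutdown cycles.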
* Re: [PATCH] block: Remove special-casing of compound pages
  2024-02-29 18:25   ` [PATCH] block: Remove special-casing of compound pages Greg Edwards
@ 2024-02-29 19:37     ` Matthew Wilcox
  2024-02-29 20:05       ` Greg Edwards
  0 siblings, 1 reply; 3+ messages in thread
From: Matthew Wilcox @ 2024-02-29 19:37 UTC
  To: Greg Edwards
  Cc: Jens Axboe, Andrew Morton, Kirill A . Shutemov, Hugh Dickins,
	linux-mm, linux-block, linux-fsdevel, linux-kernel, stable,
	qemu-devel

On Thu, Feb 29, 2024 at 11:25:13AM -0700, Greg Edwards wrote:
> > [1/1] block: Remove special-casing of compound pages
> >       commit: 1b151e2435fc3a9b10c8946c6aebe9f3e1938c55
>
> This commit results in a change of behavior for QEMU VMs backed by hugepages
> that open their VM disk image file with O_DIRECT (QEMU cache=none or
> cache.direct=on options). When the VM shuts down and the QEMU process exits,
> one or two hugepages may fail to free correctly. It appears to be a race, as
> it doesn't happen every time.

Hi Greg,

By sheer coincidence the very next email after this one was:

https://lore.kernel.org/linux-mm/86e592a9-98d4-4cff-a646-0c0084328356@cybernetics.com/T/#u

Can you try Tony's patch and see if it fixes your problem?
I haven't even begun to analyse either your email or his patch,
but there's a strong likelihood that they're the same thing.
* Re: [PATCH] block: Remove special-casing of compound pages
  2024-02-29 19:37     ` Matthew Wilcox
@ 2024-02-29 20:05       ` Greg Edwards
  0 siblings, 0 replies; 3+ messages in thread
From: Greg Edwards @ 2024-02-29 20:05 UTC
  To: Matthew Wilcox
  Cc: Jens Axboe, Andrew Morton, Kirill A . Shutemov, Hugh Dickins,
	linux-mm, linux-block, linux-fsdevel, linux-kernel, stable,
	qemu-devel

On Thu, Feb 29, 2024 at 07:37:11PM +0000, Matthew Wilcox wrote:
> On Thu, Feb 29, 2024 at 11:25:13AM -0700, Greg Edwards wrote:
>>> [1/1] block: Remove special-casing of compound pages
>>>       commit: 1b151e2435fc3a9b10c8946c6aebe9f3e1938c55
>>
>> This commit results in a change of behavior for QEMU VMs backed by hugepages
>> that open their VM disk image file with O_DIRECT (QEMU cache=none or
>> cache.direct=on options). When the VM shuts down and the QEMU process exits,
>> one or two hugepages may fail to free correctly. It appears to be a race, as
>> it doesn't happen every time.
>
> By sheer coincidence the very next email after this one was:
>
> https://lore.kernel.org/linux-mm/86e592a9-98d4-4cff-a646-0c0084328356@cybernetics.com/T/#u
>
> Can you try Tony's patch and see if it fixes your problem?
> I haven't even begun to analyse either your email or his patch,
> but there's a strong likelihood that they're the same thing.

This does appear to fix it. Thank you!

I'll do some more testing on it today, then add a Tested-by: tag if it holds
up.

Greg