* Re: [PATCH] block: Remove special-casing of compound pages
From: Greg Edwards @ 2024-02-29 18:25 UTC
To: Jens Axboe
Cc: Matthew Wilcox (Oracle), Andrew Morton, Kirill A . Shutemov,
Hugh Dickins, linux-mm, linux-block, linux-fsdevel, linux-kernel,
stable, qemu-devel
On Thu, Dec 07, 2023 at 02:04:26PM -0700, Jens Axboe wrote:
> On Mon, 14 Aug 2023 15:41:00 +0100, Matthew Wilcox (Oracle) wrote:
>> The special casing was originally added in pre-git history; reproducing
>> the commit log here:
>>
>>> commit a318a92567d77
>>> Author: Andrew Morton <akpm@osdl.org>
>>> Date: Sun Sep 21 01:42:22 2003 -0700
>>>
>>> [PATCH] Speed up direct-io hugetlbpage handling
>>>
>>> This patch short-circuits all the direct-io page dirtying logic for
>>> higher-order pages. Without this, we pointlessly bounce BIOs up to
>>> keventd all the time.
>>
>> [...]
>
> Applied, thanks!
>
> [1/1] block: Remove special-casing of compound pages
> commit: 1b151e2435fc3a9b10c8946c6aebe9f3e1938c55
This commit results in a change of behavior for QEMU VMs backed by hugepages
that open their VM disk image file with O_DIRECT (QEMU cache=none or
cache.direct=on options). When the VM shuts down and the QEMU process exits,
one or two hugepages may fail to free correctly. It appears to be a race, as
it doesn't happen every time.

From debugging on 6.8-rc6, when it occurs, the hugepage that fails to free has
a non-zero refcount when it hits the folio_put_testzero(folio) test in
release_pages(). On a failure test iteration with 1 GiB hugepages, the failing
folio had a mapcount of 0, refcount of 35, and folio_maybe_dma_pinned was true.

The problem only occurs when the VM disk image file is opened with O_DIRECT.
When using QEMU cache=writeback or cache.direct=off options, it does not occur.

We first noticed it on the 6.1.y stable kernel when this commit landed there
(6.1.75).
A very simple reproducer without KVM (just boot VM up, then shut it down):

  echo 512 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages

  qemu-system-x86_64 \
    -cpu qemu64 \
    -m 1024 \
    -nographic \
    -mem-path /dev/hugepages/vm00 \
    -mem-prealloc \
    -drive file=test.qcow2,if=none,cache=none,id=drive0 \
    -device virtio-blk-pci,drive=drive0,id=disk0,bootindex=1

  rm -f /dev/hugepages/vm00
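
One way to check for the leak after QEMU exits is to compare the pool size
with the free count in the same sysfs directory the reproducer writes to.
A minimal sketch, assuming nothing else on the machine is using 2 MiB
hugepages; hugepages_in_use is a hypothetical helper name, not something
from the original report:

```sh
# Print how many hugepages of one size are still in use.
# $1 is the per-size sysfs directory; it defaults to the 2 MiB pool
# used by the reproducer. (Helper name is illustrative only.)
hugepages_in_use() {
    dir="${1:-/sys/kernel/mm/hugepages/hugepages-2048kB}"
    total=$(cat "$dir/nr_hugepages") || return 1
    free=$(cat "$dir/free_hugepages") || return 1
    # Pages neither free nor returned to the pool count as "in use".
    echo $((total - free))
}
```

Run after the rm, a result of 0 means every page went back to the pool; a
non-zero value would be consistent with the failure described here.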
Some testing notes:
* occurs with 6.1.75, 6.6.14, 6.8-rc6, and linux-next-20240229
* occurs with 1 GiB and 2 MiB huge pages, with both hugetlbfs and memfd
* occurs with QEMU 8.0.y, 8.1.y, 8.2.y, and master
* occurs with (-enable-kvm -cpu host) or without (-cpu qemu64) KVM
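
Since the failure is intermittent, it may help to repeat the boot/shutdown
cycle and count how often it goes wrong. The sketch below assumes a
user-supplied command (called one_iteration.sh in the usage comment, a
hypothetical name) that runs the reproducer once and exits non-zero if
hugepages stayed in use; count_failures is likewise an illustrative name:

```sh
# count_failures N CMD [ARGS...]
# Run CMD N times and print how many runs exited non-zero.
count_failures() {
    iterations=$1
    shift
    failures=0
    i=0
    while [ "$i" -lt "$iterations" ]; do
        # "||" keeps a failing run from aborting the loop under "set -e".
        "$@" || failures=$((failures + 1))
        i=$((i + 1))
    done
    echo "$failures"
}

# Example (hypothetical script name):
#   count_failures 20 ./one_iteration.sh
```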
Thanks for your time!
Greg
* Re: [PATCH] block: Remove special-casing of compound pages
From: Matthew Wilcox @ 2024-02-29 19:37 UTC
To: Greg Edwards
Cc: Jens Axboe, Andrew Morton, Kirill A . Shutemov, Hugh Dickins,
linux-mm, linux-block, linux-fsdevel, linux-kernel, stable,
qemu-devel
On Thu, Feb 29, 2024 at 11:25:13AM -0700, Greg Edwards wrote:
> > [1/1] block: Remove special-casing of compound pages
> > commit: 1b151e2435fc3a9b10c8946c6aebe9f3e1938c55
>
> This commit results in a change of behavior for QEMU VMs backed by hugepages
> that open their VM disk image file with O_DIRECT (QEMU cache=none or
> cache.direct=on options). When the VM shuts down and the QEMU process exits,
> one or two hugepages may fail to free correctly. It appears to be a race, as
> it doesn't happen every time.
Hi Greg,
By sheer coincidence the very next email after this one was:
https://lore.kernel.org/linux-mm/86e592a9-98d4-4cff-a646-0c0084328356@cybernetics.com/T/#u
Can you try Tony's patch and see if it fixes your problem?
I haven't even begun to analyse either your email or his patch,
but there's a strong likelihood that they're the same thing.
* Re: [PATCH] block: Remove special-casing of compound pages
From: Greg Edwards @ 2024-02-29 20:05 UTC
To: Matthew Wilcox
Cc: Jens Axboe, Andrew Morton, Kirill A . Shutemov, Hugh Dickins,
linux-mm, linux-block, linux-fsdevel, linux-kernel, stable,
qemu-devel
On Thu, Feb 29, 2024 at 07:37:11PM +0000, Matthew Wilcox wrote:
> On Thu, Feb 29, 2024 at 11:25:13AM -0700, Greg Edwards wrote:
>>> [1/1] block: Remove special-casing of compound pages
>>> commit: 1b151e2435fc3a9b10c8946c6aebe9f3e1938c55
>>
>> This commit results in a change of behavior for QEMU VMs backed by hugepages
>> that open their VM disk image file with O_DIRECT (QEMU cache=none or
>> cache.direct=on options). When the VM shuts down and the QEMU process exits,
>> one or two hugepages may fail to free correctly. It appears to be a race, as
>> it doesn't happen every time.
>
> By sheer coincidence the very next email after this one was:
>
> https://lore.kernel.org/linux-mm/86e592a9-98d4-4cff-a646-0c0084328356@cybernetics.com/T/#u
>
> Can you try Tony's patch and see if it fixes your problem?
> I haven't even begun to analyse either your email or his patch,
> but there's a strong likelihood that they're the same thing.
This does appear to fix it. Thank you!
I'll do some more testing on it today, then add a Tested-by: tag if it
holds up.
Greg
Thread overview: 3+ messages
[not found] <20230814144100.596749-1-willy@infradead.org>
[not found] ` <170198306635.1954272.10907610290128291539.b4-ty@kernel.dk>
2024-02-29 18:25 ` [PATCH] block: Remove special-casing of compound pages Greg Edwards
2024-02-29 19:37 ` Matthew Wilcox
2024-02-29 20:05 ` Greg Edwards