* [PATCH] selftest: mm: Test if hugepage does not get leaked during __bio_release_pages()
@ 2024-05-23 6:39 Donet Tom
2024-05-23 19:13 ` Andrew Morton
2024-05-24 18:13 ` Muhammad Usama Anjum
0 siblings, 2 replies; 15+ messages in thread
From: Donet Tom @ 2024-05-23 6:39 UTC (permalink / raw)
To: Andrew Morton, Shuah Khan, Matthew Wilcox, Tony Battersby
Cc: linux-mm, linux-kselftest, linux-kernel, Ritesh Harjani,
Mike Rapoport, Muchun Song, David Hildenbrand, stable
Commit 1b151e2435fc ("block: Remove special-casing of compound
pages") caused a change in behaviour when releasing the pages
if the buffer does not start at the beginning of the page. This
was because the calculation of the number of pages to release
was incorrect.
This was fixed by commit 38b43539d64b ("block: Fix page refcounts
for unaligned buffers in __bio_release_pages()").
We pin the user buffer during direct I/O writes. If this buffer is a
hugepage, bio_release_page() will unpin it and decrement all references
and pin counts at ->bi_end_io. However, if any references to the hugepage
remain post-I/O, the hugepage will not be freed upon unmap, leading
to a memory leak.
This patch verifies that a hugepage, used as a user buffer for DIO
operations, is correctly freed upon unmapping, regardless of whether
the offsets are aligned or unaligned w.r.t page boundary.
Test Result Fail Scenario (Without the fix)
--------------------------------------------------------
[]# ./hugetlb_dio
TAP version 13
1..4
No. Free pages before allocation : 7
No. Free pages after munmap : 7
ok 1 : Huge pages freed successfully !
No. Free pages before allocation : 7
No. Free pages after munmap : 7
ok 2 : Huge pages freed successfully !
No. Free pages before allocation : 7
No. Free pages after munmap : 7
ok 3 : Huge pages freed successfully !
No. Free pages before allocation : 7
No. Free pages after munmap : 6
not ok 4 : Huge pages not freed!
Totals: pass:3 fail:1 xfail:0 xpass:0 skip:0 error:0
Test Result PASS Scenario (With the fix)
---------------------------------------------------------
[]#./hugetlb_dio
TAP version 13
1..4
No. Free pages before allocation : 7
No. Free pages after munmap : 7
ok 1 : Huge pages freed successfully !
No. Free pages before allocation : 7
No. Free pages after munmap : 7
ok 2 : Huge pages freed successfully !
No. Free pages before allocation : 7
No. Free pages after munmap : 7
ok 3 : Huge pages freed successfully !
No. Free pages before allocation : 7
No. Free pages after munmap : 7
ok 4 : Huge pages freed successfully !
Totals: pass:4 fail:0 xfail:0 xpass:0 skip:0 error:0
Signed-off-by: Donet Tom <donettom@linux.ibm.com>
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
---
tools/testing/selftests/mm/Makefile | 1 +
tools/testing/selftests/mm/hugetlb_dio.c | 118 +++++++++++++++++++++++
2 files changed, 119 insertions(+)
create mode 100644 tools/testing/selftests/mm/hugetlb_dio.c
diff --git a/tools/testing/selftests/mm/Makefile b/tools/testing/selftests/mm/Makefile
index eb5f39a2668b..87d8130b3376 100644
--- a/tools/testing/selftests/mm/Makefile
+++ b/tools/testing/selftests/mm/Makefile
@@ -71,6 +71,7 @@ TEST_GEN_FILES += ksm_functional_tests
TEST_GEN_FILES += mdwe_test
TEST_GEN_FILES += hugetlb_fault_after_madv
TEST_GEN_FILES += hugetlb_madv_vs_map
+TEST_GEN_FILES += hugetlb_dio
ifneq ($(ARCH),arm64)
TEST_GEN_FILES += soft-dirty
diff --git a/tools/testing/selftests/mm/hugetlb_dio.c b/tools/testing/selftests/mm/hugetlb_dio.c
new file mode 100644
index 000000000000..6f6587c7913c
--- /dev/null
+++ b/tools/testing/selftests/mm/hugetlb_dio.c
@@ -0,0 +1,118 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * This program tests for hugepage leaks after DIO writes to a file using a
+ * hugepage as the user buffer. During DIO, the user buffer is pinned and
+ * should be properly unpinned upon completion. This patch verifies that the
+ * kernel correctly unpins the buffer at DIO completion for both aligned and
+ * unaligned user buffer offsets (w.r.t page boundary), ensuring the hugepage
+ * is freed upon unmapping.
+ */
+
+#define _GNU_SOURCE
+#include <stdio.h>
+#include <sys/stat.h>
+#include <stdlib.h>
+#include <fcntl.h>
+#include <stdint.h>
+#include <unistd.h>
+#include <string.h>
+#include <sys/mman.h>
+#include "vm_util.h"
+#include "../kselftest.h"
+
+void run_dio_using_hugetlb(unsigned int start_off, unsigned int end_off)
+{
+ int fd;
+ char *buffer = NULL;
+ char *orig_buffer = NULL;
+ size_t h_pagesize = 0;
+ size_t writesize;
+ int free_hpage_b = 0;
+ int free_hpage_a = 0;
+
+ writesize = end_off - start_off;
+
+ /* Get the default huge page size */
+ h_pagesize = default_huge_page_size();
+ if (!h_pagesize)
+ ksft_exit_fail_msg("Unable to determine huge page size\n");
+
+ /* Open the file to DIO */
+ fd = open("/tmp", O_TMPFILE | O_RDWR | O_DIRECT);
+ if (fd < 0)
+ ksft_exit_fail_msg("Error opening file");
+
+ /* Get the free huge pages before allocation */
+ free_hpage_b = get_free_hugepages();
+ if (free_hpage_b == 0) {
+ close(fd);
+ ksft_exit_skip("No free hugepage, exiting!\n");
+ }
+
+ /* Allocate a hugetlb page */
+ orig_buffer = mmap(NULL, h_pagesize, PROT_READ | PROT_WRITE, MAP_PRIVATE
+ | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
+ if (orig_buffer == MAP_FAILED) {
+ close(fd);
+ ksft_exit_fail_msg("Error mapping memory");
+ }
+ buffer = orig_buffer;
+ buffer += start_off;
+
+ memset(buffer, 'A', writesize);
+
+ /* Write the buffer to the file */
+ if (write(fd, buffer, writesize) != (writesize)) {
+ munmap(orig_buffer, h_pagesize);
+ close(fd);
+ ksft_exit_fail_msg("Error writing to file");
+ }
+
+ /* unmap the huge page */
+ munmap(orig_buffer, h_pagesize);
+ close(fd);
+
+ /* Get the free huge pages after unmap*/
+ free_hpage_a = get_free_hugepages();
+
+ /*
+ * If the no. of free hugepages before allocation and after unmap does
+ * not match - that means there could still be a page which is pinned.
+ */
+ if (free_hpage_a != free_hpage_b) {
+ printf("No. Free pages before allocation : %d\n", free_hpage_b);
+ printf("No. Free pages after munmap : %d\n", free_hpage_a);
+ ksft_test_result_fail(": Huge pages not freed!\n");
+ } else {
+ printf("No. Free pages before allocation : %d\n", free_hpage_b);
+ printf("No. Free pages after munmap : %d\n", free_hpage_a);
+ ksft_test_result_pass(": Huge pages freed successfully !\n");
+ }
+}
+
+int main(void)
+{
+ size_t pagesize = 0;
+
+ ksft_print_header();
+ ksft_set_plan(4);
+
+ /* Get base page size */
+ pagesize = psize();
+
+ /* start and end is aligned to pagesize */
+ run_dio_using_hugetlb(0, (pagesize * 3));
+
+ /* start is aligned but end is not aligned */
+ run_dio_using_hugetlb(0, (pagesize * 3) - (pagesize / 2));
+
+ /* start is unaligned and end is aligned */
+ run_dio_using_hugetlb(pagesize / 2, (pagesize * 3));
+
+ /* both start and end are unaligned */
+ run_dio_using_hugetlb(pagesize / 2, (pagesize * 3) + (pagesize / 2));
+
+ ksft_finished();
+ return 0;
+}
+
--
2.39.3
^ permalink raw reply related [flat|nested] 15+ messages in thread* Re: [PATCH] selftest: mm: Test if hugepage does not get leaked during __bio_release_pages()
2024-05-23 6:39 [PATCH] selftest: mm: Test if hugepage does not get leaked during __bio_release_pages() Donet Tom
@ 2024-05-23 19:13 ` Andrew Morton
2024-05-23 20:40 ` David Hildenbrand
2024-05-24 18:13 ` Muhammad Usama Anjum
1 sibling, 1 reply; 15+ messages in thread
From: Andrew Morton @ 2024-05-23 19:13 UTC (permalink / raw)
To: Donet Tom
Cc: Shuah Khan, Matthew Wilcox, Tony Battersby, linux-mm,
linux-kselftest, linux-kernel, Ritesh Harjani, Mike Rapoport,
Muchun Song, David Hildenbrand, stable
On Thu, 23 May 2024 01:39:05 -0500 Donet Tom <donettom@linux.ibm.com> wrote:
> Commit 1b151e2435fc ("block: Remove special-casing of compound
> pages") caused a change in behaviour when releasing the pages
> if the buffer does not start at the beginning of the page. This
> was because the calculation of the number of pages to release
> was incorrect.
> This was fixed by commit 38b43539d64b ("block: Fix page refcounts
> for unaligned buffers in __bio_release_pages()").
>
> We pin the user buffer during direct I/O writes. If this buffer is a
> hugepage, bio_release_page() will unpin it and decrement all references
> and pin counts at ->bi_end_io. However, if any references to the hugepage
> remain post-I/O, the hugepage will not be freed upon unmap, leading
> to a memory leak.
>
> This patch verifies that a hugepage, used as a user buffer for DIO
> operations, is correctly freed upon unmapping, regardless of whether
> the offsets are aligned or unaligned w.r.t page boundary.
>
You have stable@vger.kernel.org in the mail headers, so I assume you're
proposing this for backporting. When doing this, please include
Cc: <stable@vger.kernel.org>
in the changelog footers and also include a Fixes: target. I'm
assuming the suitable Fixes: target for this patch is 38b43539d64b?
^ permalink raw reply [flat|nested] 15+ messages in thread* Re: [PATCH] selftest: mm: Test if hugepage does not get leaked during __bio_release_pages()
2024-05-23 19:13 ` Andrew Morton
@ 2024-05-23 20:40 ` David Hildenbrand
2024-05-24 2:57 ` Andrew Morton
2024-05-24 6:53 ` Ritesh Harjani
0 siblings, 2 replies; 15+ messages in thread
From: David Hildenbrand @ 2024-05-23 20:40 UTC (permalink / raw)
To: Andrew Morton, Donet Tom
Cc: Shuah Khan, Matthew Wilcox, Tony Battersby, linux-mm,
linux-kselftest, linux-kernel, Ritesh Harjani, Mike Rapoport,
Muchun Song, stable
On 23.05.24 21:13, Andrew Morton wrote:
> On Thu, 23 May 2024 01:39:05 -0500 Donet Tom <donettom@linux.ibm.com> wrote:
>
>> Commit 1b151e2435fc ("block: Remove special-casing of compound
>> pages") caused a change in behaviour when releasing the pages
>> if the buffer does not start at the beginning of the page. This
>> was because the calculation of the number of pages to release
>> was incorrect.
>> This was fixed by commit 38b43539d64b ("block: Fix page refcounts
>> for unaligned buffers in __bio_release_pages()").
>>
>> We pin the user buffer during direct I/O writes. If this buffer is a
>> hugepage, bio_release_page() will unpin it and decrement all references
>> and pin counts at ->bi_end_io. However, if any references to the hugepage
>> remain post-I/O, the hugepage will not be freed upon unmap, leading
>> to a memory leak.
>>
>> This patch verifies that a hugepage, used as a user buffer for DIO
>> operations, is correctly freed upon unmapping, regardless of whether
>> the offsets are aligned or unaligned w.r.t page boundary.
>>
>
Two SOF, is there a Co-developed-by: missing?
> You have stable@vger.kernel.org in the mail headers, so I assume you're
> proposing this for backporting. When doing this, please include
>
> Cc: <stable@vger.kernel.org>
>
> in the changelog footers and also include a Fixes: target. I'm
> assuming the suitable Fixes: target for this patch is 38b43539d64b?
This adds a new selfest to make sure what was fixed (and backported to
stable) remains fixed.
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 15+ messages in thread* Re: [PATCH] selftest: mm: Test if hugepage does not get leaked during __bio_release_pages()
2024-05-23 20:40 ` David Hildenbrand
@ 2024-05-24 2:57 ` Andrew Morton
2024-05-24 6:21 ` David Hildenbrand
2024-05-24 6:53 ` Ritesh Harjani
1 sibling, 1 reply; 15+ messages in thread
From: Andrew Morton @ 2024-05-24 2:57 UTC (permalink / raw)
To: David Hildenbrand
Cc: Donet Tom, Shuah Khan, Matthew Wilcox, Tony Battersby, linux-mm,
linux-kselftest, linux-kernel, Ritesh Harjani, Mike Rapoport,
Muchun Song, stable
On Thu, 23 May 2024 22:40:25 +0200 David Hildenbrand <david@redhat.com> wrote:
> > You have stable@vger.kernel.org in the mail headers, so I assume you're
> > proposing this for backporting. When doing this, please include
> >
> > Cc: <stable@vger.kernel.org>
> >
> > in the changelog footers and also include a Fixes: target. I'm
> > assuming the suitable Fixes: target for this patch is 38b43539d64b?
>
> This adds a new selfest to make sure what was fixed (and backported to
> stable) remains fixed.
Sure. But we should provide -stable maintainers guidance for "how far
back to go". There isn't much point in backporting this into kernels
where it's known to fail!
I'm still thinking that we want this in kernels which have 38b43539d64b?
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] selftest: mm: Test if hugepage does not get leaked during __bio_release_pages()
2024-05-24 2:57 ` Andrew Morton
@ 2024-05-24 6:21 ` David Hildenbrand
2024-05-24 6:43 ` Ritesh Harjani
0 siblings, 1 reply; 15+ messages in thread
From: David Hildenbrand @ 2024-05-24 6:21 UTC (permalink / raw)
To: Andrew Morton
Cc: Donet Tom, Shuah Khan, Matthew Wilcox, Tony Battersby, linux-mm,
linux-kselftest, linux-kernel, Ritesh Harjani, Mike Rapoport,
Muchun Song, stable
On 24.05.24 04:57, Andrew Morton wrote:
> On Thu, 23 May 2024 22:40:25 +0200 David Hildenbrand <david@redhat.com> wrote:
>
>>> You have stable@vger.kernel.org in the mail headers, so I assume you're
>>> proposing this for backporting. When doing this, please include
>>>
>>> Cc: <stable@vger.kernel.org>
>>>
>>> in the changelog footers and also include a Fixes: target. I'm
>>> assuming the suitable Fixes: target for this patch is 38b43539d64b?
>>
>> This adds a new selfest to make sure what was fixed (and backported to
>> stable) remains fixed.
>
> Sure. But we should provide -stable maintainers guidance for "how far
> back to go". There isn't much point in backporting this into kernels
> where it's known to fail!
I'm probably missing something important.
1) It's a test that does not fall into the common stable kernels
categories (see Documentation/process/stable-kernel-rules.rst).
2) If it fails in a kernel *it achieved its goal* of highlighting that
something serious is broken.
>
> I'm still thinking that we want this in kernels which have 38b43539d64b?
To hide that the other kernels are seriously broken and miss that fix?
Really (1) this shouldn't be backported. I'm not even sure it should be
a selftest (sounds more like a reproducer that we usually attach to
commits, but that's too late). And if people care about backporting it,
(2) you really want this test to succeed everywhere. Especially also to
find kernels *without* 38b43539d64b
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] selftest: mm: Test if hugepage does not get leaked during __bio_release_pages()
2024-05-24 6:21 ` David Hildenbrand
@ 2024-05-24 6:43 ` Ritesh Harjani
2024-05-24 7:01 ` David Hildenbrand
0 siblings, 1 reply; 15+ messages in thread
From: Ritesh Harjani @ 2024-05-24 6:43 UTC (permalink / raw)
To: David Hildenbrand, Andrew Morton
Cc: Donet Tom, Shuah Khan, Matthew Wilcox, Tony Battersby, linux-mm,
linux-kselftest, linux-kernel, Mike Rapoport, Muchun Song
David Hildenbrand <david@redhat.com> writes:
dropping stable@vger.kernel.org
> On 24.05.24 04:57, Andrew Morton wrote:
>> On Thu, 23 May 2024 22:40:25 +0200 David Hildenbrand <david@redhat.com> wrote:
>>
>>>> You have stable@vger.kernel.org in the mail headers, so I assume you're
>>>> proposing this for backporting. When doing this, please include
>>>>
>>>> Cc: <stable@vger.kernel.org>
>>>>
>>>> in the changelog footers and also include a Fixes: target. I'm
>>>> assuming the suitable Fixes: target for this patch is 38b43539d64b?
>>>
>>> This adds a new selfest to make sure what was fixed (and backported to
>>> stable) remains fixed.
>>
>> Sure. But we should provide -stable maintainers guidance for "how far
>> back to go". There isn't much point in backporting this into kernels
>> where it's known to fail!
>
> I'm probably missing something important.
>
> 1) It's a test that does not fall into the common stable kernels
> categories (see Documentation/process/stable-kernel-rules.rst).
>
> 2) If it fails in a kernel *it achieved its goal* of highlighting that
> something serious is broken.
>
>>
>> I'm still thinking that we want this in kernels which have 38b43539d64b?
>
> To hide that the other kernels are seriously broken and miss that fix?
>
> Really (1) this shouldn't be backported. I'm not even sure it should be
> a selftest (sounds more like a reproducer that we usually attach to
> commits, but that's too late). And if people care about backporting it,
> (2) you really want this test to succeed everywhere. Especially also to
> find kernels *without* 38b43539d64b
Sorry about the noise and cc'd to stable. I believe we don't need to
backport this test. The idea of adding a selftests was "also" to catch any
future bugs like this.
I am unaware of any functional test suite where we could add such tests
like how filesystems have fstests. Hence the ideas was to add this in
selftests.
So this begs the question which I also asked few people at LSFMM,
Does mm has any mmfvt (mm functional verification tests)? Should we have
something like this? Was it tried in past?
(Sorry, mmtests name is already taken - "MMTests is a configurable test suite that runs performance tests against arbitrary workloads.")
-ritesh
>
> --
> Cheers,
>
> David / dhildenb
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] selftest: mm: Test if hugepage does not get leaked during __bio_release_pages()
2024-05-24 6:43 ` Ritesh Harjani
@ 2024-05-24 7:01 ` David Hildenbrand
2024-05-24 9:31 ` Donet Tom
0 siblings, 1 reply; 15+ messages in thread
From: David Hildenbrand @ 2024-05-24 7:01 UTC (permalink / raw)
To: Ritesh Harjani (IBM), Andrew Morton
Cc: Donet Tom, Shuah Khan, Matthew Wilcox, Tony Battersby, linux-mm,
linux-kselftest, linux-kernel, Mike Rapoport, Muchun Song
On 24.05.24 08:43, Ritesh Harjani (IBM) wrote:
> David Hildenbrand <david@redhat.com> writes:
>
> dropping stable@vger.kernel.org
>
>> On 24.05.24 04:57, Andrew Morton wrote:
>>> On Thu, 23 May 2024 22:40:25 +0200 David Hildenbrand <david@redhat.com> wrote:
>>>
>>>>> You have stable@vger.kernel.org in the mail headers, so I assume you're
>>>>> proposing this for backporting. When doing this, please include
>>>>>
>>>>> Cc: <stable@vger.kernel.org>
>>>>>
>>>>> in the changelog footers and also include a Fixes: target. I'm
>>>>> assuming the suitable Fixes: target for this patch is 38b43539d64b?
>>>>
>>>> This adds a new selfest to make sure what was fixed (and backported to
>>>> stable) remains fixed.
>>>
>>> Sure. But we should provide -stable maintainers guidance for "how far
>>> back to go". There isn't much point in backporting this into kernels
>>> where it's known to fail!
>>
>> I'm probably missing something important.
>>
>> 1) It's a test that does not fall into the common stable kernels
>> categories (see Documentation/process/stable-kernel-rules.rst).
>>
>> 2) If it fails in a kernel *it achieved its goal* of highlighting that
>> something serious is broken.
>>
>>>
>>> I'm still thinking that we want this in kernels which have 38b43539d64b?
>>
>> To hide that the other kernels are seriously broken and miss that fix?
>>
>> Really (1) this shouldn't be backported. I'm not even sure it should be
>> a selftest (sounds more like a reproducer that we usually attach to
>> commits, but that's too late). And if people care about backporting it,
>> (2) you really want this test to succeed everywhere. Especially also to
>> find kernels *without* 38b43539d64b
>
>
> Sorry about the noise and cc'd to stable. I believe we don't need to
> backport this test. The idea of adding a selftests was "also" to catch any
> future bugs like this.
Yes, for that purpose it's fine, but it has quite the "specific
reproducer taste". Having it as part of something that is prepared to
run against arbitrary kernels (which selftests frequently are not) to
detect known problems feels better.
>
> I am unaware of any functional test suite where we could add such tests
> like how filesystems have fstests. Hence the ideas was to add this in
> selftests.
LTP has quite some MM testcases in testcases/kernel/mem/. But it often
has a different focus (CVE or advanced feature/syscall tests). Now that
most things are CVEs ... it might not be a bad fit ... :)
>
> So this begs the question which I also asked few people at LSFMM,
> Does mm has any mmfvt (mm functional verification tests)? Should we have
> something like this? Was it tried in past?
I think LTP is mostly the only "bigger" thing we have that is prepared
to run against arbitrary kernel versions.
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] selftest: mm: Test if hugepage does not get leaked during __bio_release_pages()
2024-05-24 7:01 ` David Hildenbrand
@ 2024-05-24 9:31 ` Donet Tom
2024-05-24 15:20 ` David Hildenbrand
0 siblings, 1 reply; 15+ messages in thread
From: Donet Tom @ 2024-05-24 9:31 UTC (permalink / raw)
To: David Hildenbrand, Ritesh Harjani (IBM), Andrew Morton
Cc: Shuah Khan, Matthew Wilcox, Tony Battersby, linux-mm,
linux-kselftest, linux-kernel, Mike Rapoport, Muchun Song
On 5/24/24 12:31, David Hildenbrand wrote:
> On 24.05.24 08:43, Ritesh Harjani (IBM) wrote:
>> David Hildenbrand <david@redhat.com> writes:
>>
>> dropping stable@vger.kernel.org
>>
>>> On 24.05.24 04:57, Andrew Morton wrote:
>>>> On Thu, 23 May 2024 22:40:25 +0200 David Hildenbrand
>>>> <david@redhat.com> wrote:
>>>>
>>>>>> You have stable@vger.kernel.org in the mail headers, so I assume
>>>>>> you're
>>>>>> proposing this for backporting. When doing this, please include
>>>>>>
>>>>>> Cc: <stable@vger.kernel.org>
>>>>>>
>>>>>> in the changelog footers and also include a Fixes: target. I'm
>>>>>> assuming the suitable Fixes: target for this patch is 38b43539d64b?
>>>>>
>>>>> This adds a new selfest to make sure what was fixed (and
>>>>> backported to
>>>>> stable) remains fixed.
>>>>
>>>> Sure. But we should provide -stable maintainers guidance for "how far
>>>> back to go". There isn't much point in backporting this into kernels
>>>> where it's known to fail!
>>>
>>> I'm probably missing something important.
>>>
>>> 1) It's a test that does not fall into the common stable kernels
>>> categories (see Documentation/process/stable-kernel-rules.rst).
>>>
>>> 2) If it fails in a kernel *it achieved its goal* of highlighting that
>>> something serious is broken.
>>>
>>>>
>>>> I'm still thinking that we want this in kernels which have
>>>> 38b43539d64b?
>>>
>>> To hide that the other kernels are seriously broken and miss that fix?
>>>
>>> Really (1) this shouldn't be backported. I'm not even sure it should be
>>> a selftest (sounds more like a reproducer that we usually attach to
>>> commits, but that's too late). And if people care about backporting it,
>>> (2) you really want this test to succeed everywhere. Especially also to
>>> find kernels *without* 38b43539d64b
>>
>>
>> Sorry about the noise and cc'd to stable. I believe we don't need to
>> backport this test. The idea of adding a selftests was "also" to
>> catch any
>> future bugs like this.
>
> Yes, for that purpose it's fine, but it has quite the "specific
> reproducer taste". Having it as part of something that is prepared to
> run against arbitrary kernels (which selftests frequently are not) to
> detect known problems feels better.
>
> I have seen some hugetlbfs directio tests in LTP. If you think
> selftest is not the correct place to add this test, we can drop this
> test from selftests and add it to LTP.
>
> Thanks
> Donet
>
>>
>> I am unaware of any functional test suite where we could add such tests
>> like how filesystems have fstests. Hence the ideas was to add this in
>> selftests.
>
> LTP has quite some MM testcases in testcases/kernel/mem/. But it often
> has a different focus (CVE or advanced feature/syscall tests). Now
> that most things are CVEs ... it might not be a bad fit ... :)
>
>>
>> So this begs the question which I also asked few people at LSFMM,
>> Does mm has any mmfvt (mm functional verification tests)? Should we have
>> something like this? Was it tried in past?
>
> I think LTP is mostly the only "bigger" thing we have that is prepared
> to run against arbitrary kernel versions.
>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] selftest: mm: Test if hugepage does not get leaked during __bio_release_pages()
2024-05-24 9:31 ` Donet Tom
@ 2024-05-24 15:20 ` David Hildenbrand
0 siblings, 0 replies; 15+ messages in thread
From: David Hildenbrand @ 2024-05-24 15:20 UTC (permalink / raw)
To: Donet Tom, Ritesh Harjani (IBM), Andrew Morton
Cc: Shuah Khan, Matthew Wilcox, Tony Battersby, linux-mm,
linux-kselftest, linux-kernel, Mike Rapoport, Muchun Song
On 24.05.24 11:31, Donet Tom wrote:
>
> On 5/24/24 12:31, David Hildenbrand wrote:
>> On 24.05.24 08:43, Ritesh Harjani (IBM) wrote:
>>> David Hildenbrand <david@redhat.com> writes:
>>>
>>> dropping stable@vger.kernel.org
>>>
>>>> On 24.05.24 04:57, Andrew Morton wrote:
>>>>> On Thu, 23 May 2024 22:40:25 +0200 David Hildenbrand
>>>>> <david@redhat.com> wrote:
>>>>>
>>>>>>> You have stable@vger.kernel.org in the mail headers, so I assume
>>>>>>> you're
>>>>>>> proposing this for backporting. When doing this, please include
>>>>>>>
>>>>>>> Cc: <stable@vger.kernel.org>
>>>>>>>
>>>>>>> in the changelog footers and also include a Fixes: target. I'm
>>>>>>> assuming the suitable Fixes: target for this patch is 38b43539d64b?
>>>>>>
>>>>>> This adds a new selfest to make sure what was fixed (and
>>>>>> backported to
>>>>>> stable) remains fixed.
>>>>>
>>>>> Sure. But we should provide -stable maintainers guidance for "how far
>>>>> back to go". There isn't much point in backporting this into kernels
>>>>> where it's known to fail!
>>>>
>>>> I'm probably missing something important.
>>>>
>>>> 1) It's a test that does not fall into the common stable kernels
>>>> categories (see Documentation/process/stable-kernel-rules.rst).
>>>>
>>>> 2) If it fails in a kernel *it achieved its goal* of highlighting that
>>>> something serious is broken.
>>>>
>>>>>
>>>>> I'm still thinking that we want this in kernels which have
>>>>> 38b43539d64b?
>>>>
>>>> To hide that the other kernels are seriously broken and miss that fix?
>>>>
>>>> Really (1) this shouldn't be backported. I'm not even sure it should be
>>>> a selftest (sounds more like a reproducer that we usually attach to
>>>> commits, but that's too late). And if people care about backporting it,
>>>> (2) you really want this test to succeed everywhere. Especially also to
>>>> find kernels *without* 38b43539d64b
>>>
>>>
>>> Sorry about the noise and cc'd to stable. I believe we don't need to
>>> backport this test. The idea of adding a selftests was "also" to
>>> catch any
>>> future bugs like this.
>>
>> Yes, for that purpose it's fine, but it has quite the "specific
>> reproducer taste". Having it as part of something that is prepared to
>> run against arbitrary kernels (which selftests frequently are not) to
>> detect known problems feels better.
>>
> I have seen some hugetlbfs directio tests in LTP. If you think
> selftest is not the correct place to add this test, we can drop this
> test from selftests and add it to LTP.
I think LTP might be a better fit to spot such issues in the wild. But I
don't have a strong opinion.
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] selftest: mm: Test if hugepage does not get leaked during __bio_release_pages()
2024-05-23 20:40 ` David Hildenbrand
2024-05-24 2:57 ` Andrew Morton
@ 2024-05-24 6:53 ` Ritesh Harjani
2024-05-24 20:12 ` Andrew Morton
1 sibling, 1 reply; 15+ messages in thread
From: Ritesh Harjani @ 2024-05-24 6:53 UTC (permalink / raw)
To: David Hildenbrand, Andrew Morton, Donet Tom
Cc: Shuah Khan, Matthew Wilcox, Tony Battersby, linux-mm,
linux-kselftest, linux-kernel, Mike Rapoport, Muchun Song
dropping stable email again.
David Hildenbrand <david@redhat.com> writes:
> On 23.05.24 21:13, Andrew Morton wrote:
>> On Thu, 23 May 2024 01:39:05 -0500 Donet Tom <donettom@linux.ibm.com> wrote:
>>
>>> Commit 1b151e2435fc ("block: Remove special-casing of compound
>>> pages") caused a change in behaviour when releasing the pages
>>> if the buffer does not start at the beginning of the page. This
>>> was because the calculation of the number of pages to release
>>> was incorrect.
>>> This was fixed by commit 38b43539d64b ("block: Fix page refcounts
>>> for unaligned buffers in __bio_release_pages()").
>>>
>>> We pin the user buffer during direct I/O writes. If this buffer is a
>>> hugepage, bio_release_page() will unpin it and decrement all references
>>> and pin counts at ->bi_end_io. However, if any references to the hugepage
>>> remain post-I/O, the hugepage will not be freed upon unmap, leading
>>> to a memory leak.
>>>
>>> This patch verifies that a hugepage, used as a user buffer for DIO
>>> operations, is correctly freed upon unmapping, regardless of whether
>>> the offsets are aligned or unaligned w.r.t page boundary.
>>>
>>
>
> Two SOF, is there a Co-developed-by: missing?
>
Sorry about that. Andrew, could you please add the tag (let me know if you
would like me to send v2). Will take care of it next time.
Co-developed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
-ritesh
^ permalink raw reply [flat|nested] 15+ messages in thread* Re: [PATCH] selftest: mm: Test if hugepage does not get leaked during __bio_release_pages()
2024-05-24 6:53 ` Ritesh Harjani
@ 2024-05-24 20:12 ` Andrew Morton
0 siblings, 0 replies; 15+ messages in thread
From: Andrew Morton @ 2024-05-24 20:12 UTC (permalink / raw)
To: Ritesh Harjani
Cc: David Hildenbrand, Donet Tom, Shuah Khan, Matthew Wilcox,
Tony Battersby, linux-mm, linux-kselftest, linux-kernel,
Mike Rapoport, Muchun Song
On Fri, 24 May 2024 12:23:15 +0530 Ritesh Harjani (IBM) <ritesh.list@gmail.com> wrote:
> >>> This patch verifies that a hugepage, used as a user buffer for DIO
> >>> operations, is correctly freed upon unmapping, regardless of whether
> >>> the offsets are aligned or unaligned w.r.t page boundary.
> >>>
> >>
> >
> > Two SOF, is there a Co-developed-by: missing?
> >
>
> Sorry about that. Andrew, could you please add the tag (let me know if you
> would like me to send v2). Will take care of it next time.
>
> Co-developed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
I made that edit, thanks.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] selftest: mm: Test if hugepage does not get leaked during __bio_release_pages()
2024-05-23 6:39 [PATCH] selftest: mm: Test if hugepage does not get leaked during __bio_release_pages() Donet Tom
2024-05-23 19:13 ` Andrew Morton
@ 2024-05-24 18:13 ` Muhammad Usama Anjum
2024-05-25 15:40 ` Donet Tom
1 sibling, 1 reply; 15+ messages in thread
From: Muhammad Usama Anjum @ 2024-05-24 18:13 UTC (permalink / raw)
To: Donet Tom, Andrew Morton, Shuah Khan, Matthew Wilcox,
Tony Battersby
Cc: Muhammad Usama Anjum, linux-mm, linux-kselftest, linux-kernel,
Ritesh Harjani, Mike Rapoport, Muchun Song, David Hildenbrand,
stable
Thank you for submitting a patch.
On 5/22/24 11:39 PM, Donet Tom wrote:
> Commit 1b151e2435fc ("block: Remove special-casing of compound
> pages") caused a change in behaviour when releasing the pages
> if the buffer does not start at the beginning of the page. This
> was because the calculation of the number of pages to release
> was incorrect.
> This was fixed by commit 38b43539d64b ("block: Fix page refcounts
> for unaligned buffers in __bio_release_pages()").
>
> We pin the user buffer during direct I/O writes. If this buffer is a
> hugepage, bio_release_page() will unpin it and decrement all references
> and pin counts at ->bi_end_io. However, if any references to the hugepage
> remain post-I/O, the hugepage will not be freed upon unmap, leading
> to a memory leak.
>
> This patch verifies that a hugepage, used as a user buffer for DIO
> operations, is correctly freed upon unmapping, regardless of whether
> the offsets are aligned or unaligned w.r.t page boundary.
>
> Test Result Fail Scenario (Without the fix)
> --------------------------------------------------------
> []# ./hugetlb_dio
> TAP version 13
> 1..4
> No. Free pages before allocation : 7
> No. Free pages after munmap : 7
> ok 1 : Huge pages freed successfully !
> No. Free pages before allocation : 7
> No. Free pages after munmap : 7
> ok 2 : Huge pages freed successfully !
> No. Free pages before allocation : 7
> No. Free pages after munmap : 7
> ok 3 : Huge pages freed successfully !
> No. Free pages before allocation : 7
> No. Free pages after munmap : 6
> not ok 4 : Huge pages not freed!
> Totals: pass:3 fail:1 xfail:0 xpass:0 skip:0 error:0
>
> Test Result PASS Scenario (With the fix)
> ---------------------------------------------------------
> []#./hugetlb_dio
> TAP version 13
> 1..4
> No. Free pages before allocation : 7
> No. Free pages after munmap : 7
> ok 1 : Huge pages freed successfully !
> No. Free pages before allocation : 7
> No. Free pages after munmap : 7
> ok 2 : Huge pages freed successfully !
> No. Free pages before allocation : 7
> No. Free pages after munmap : 7
> ok 3 : Huge pages freed successfully !
> No. Free pages before allocation : 7
> No. Free pages after munmap : 7
> ok 4 : Huge pages freed successfully !
> Totals: pass:4 fail:0 xfail:0 xpass:0 skip:0 error:0
>
> Signed-off-by: Donet Tom <donettom@linux.ibm.com>
> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
> ---
> tools/testing/selftests/mm/Makefile | 1 +
> tools/testing/selftests/mm/hugetlb_dio.c | 118 +++++++++++++++++++++++
Add this test to vm_runtest.sh as all the tests are run from this script in
CIs.
> 2 files changed, 119 insertions(+)
> create mode 100644 tools/testing/selftests/mm/hugetlb_dio.c
>
> diff --git a/tools/testing/selftests/mm/Makefile b/tools/testing/selftests/mm/Makefile
> index eb5f39a2668b..87d8130b3376 100644
> --- a/tools/testing/selftests/mm/Makefile
> +++ b/tools/testing/selftests/mm/Makefile
> @@ -71,6 +71,7 @@ TEST_GEN_FILES += ksm_functional_tests
> TEST_GEN_FILES += mdwe_test
> TEST_GEN_FILES += hugetlb_fault_after_madv
> TEST_GEN_FILES += hugetlb_madv_vs_map
> +TEST_GEN_FILES += hugetlb_dio
>
> ifneq ($(ARCH),arm64)
> TEST_GEN_FILES += soft-dirty
> diff --git a/tools/testing/selftests/mm/hugetlb_dio.c b/tools/testing/selftests/mm/hugetlb_dio.c
> new file mode 100644
> index 000000000000..6f6587c7913c
> --- /dev/null
> +++ b/tools/testing/selftests/mm/hugetlb_dio.c
> @@ -0,0 +1,118 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * This program tests for hugepage leaks after DIO writes to a file using a
> + * hugepage as the user buffer. During DIO, the user buffer is pinned and
> + * should be properly unpinned upon completion. This patch verifies that the
> + * kernel correctly unpins the buffer at DIO completion for both aligned and
> + * unaligned user buffer offsets (w.r.t page boundary), ensuring the hugepage
> + * is freed upon unmapping.
> + */
> +
> +#define _GNU_SOURCE
> +#include <stdio.h>
> +#include <sys/stat.h>
> +#include <stdlib.h>
> +#include <fcntl.h>
> +#include <stdint.h>
> +#include <unistd.h>
> +#include <string.h>
> +#include <sys/mman.h>
> +#include "vm_util.h"
> +#include "../kselftest.h"
> +
> +void run_dio_using_hugetlb(unsigned int start_off, unsigned int end_off)
> +{
> + int fd;
> + char *buffer = NULL;
> + char *orig_buffer = NULL;
> + size_t h_pagesize = 0;
> + size_t writesize;
> + int free_hpage_b = 0;
> + int free_hpage_a = 0;
> +
> + writesize = end_off - start_off;
> +
> + /* Get the default huge page size */
> + h_pagesize = default_huge_page_size();
> + if (!h_pagesize)
> + ksft_exit_fail_msg("Unable to determine huge page size\n");
> +
> + /* Open the file to DIO */
> + fd = open("/tmp", O_TMPFILE | O_RDWR | O_DIRECT);
> + if (fd < 0)
> + ksft_exit_fail_msg("Error opening file");
Use ksft_exit_fail_perror to print the info about the error
> +
> + /* Get the free huge pages before allocation */
> + free_hpage_b = get_free_hugepages();
> + if (free_hpage_b == 0) {
> + close(fd);
> + ksft_exit_skip("No free hugepage, exiting!\n");
> + }
> +
> + /* Allocate a hugetlb page */
> + orig_buffer = mmap(NULL, h_pagesize, PROT_READ | PROT_WRITE, MAP_PRIVATE
> + | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
Better align the arguments. Put all flags in one line instead of slitting
like this
> + if (orig_buffer == MAP_FAILED) {
> + close(fd);
> + ksft_exit_fail_msg("Error mapping memory");
nit: "\n" is missing from here.
> + }
> + buffer = orig_buffer;
> + buffer += start_off;
> +
> + memset(buffer, 'A', writesize);
> +
> + /* Write the buffer to the file */
> + if (write(fd, buffer, writesize) != (writesize)) {
> + munmap(orig_buffer, h_pagesize);
> + close(fd);
> + ksft_exit_fail_msg("Error writing to file");
> + }
> +
> + /* unmap the huge page */
> + munmap(orig_buffer, h_pagesize);
> + close(fd);
> +
> + /* Get the free huge pages after unmap*/
> + free_hpage_a = get_free_hugepages();
> +
> + /*
> + * If the no. of free hugepages before allocation and after unmap does
> + * not match - that means there could still be a page which is pinned.
> + */
> + if (free_hpage_a != free_hpage_b) {
> + printf("No. Free pages before allocation : %d\n", free_hpage_b);
Use ksft_print_msg instead
> + printf("No. Free pages after munmap : %d\n", free_hpage_a);
> + ksft_test_result_fail(": Huge pages not freed!\n");
> + } else {
> + printf("No. Free pages before allocation : %d\n", free_hpage_b);
> + printf("No. Free pages after munmap : %d\n", free_hpage_a);
> + ksft_test_result_pass(": Huge pages freed successfully !\n");
> + }
> +}
> +
> +int main(void)
> +{
> + size_t pagesize = 0;
> +
> + ksft_print_header();
> + ksft_set_plan(4);
> +
> + /* Get base page size */
> + pagesize = psize();
> +
> + /* start and end is aligned to pagesize */
> + run_dio_using_hugetlb(0, (pagesize * 3));
> +
> + /* start is aligned but end is not aligned */
> + run_dio_using_hugetlb(0, (pagesize * 3) - (pagesize / 2));
> +
> + /* start is unaligned and end is aligned */
> + run_dio_using_hugetlb(pagesize / 2, (pagesize * 3));
> +
> + /* both start and end are unaligned */
> + run_dio_using_hugetlb(pagesize / 2, (pagesize * 3) + (pagesize / 2));
> +
> + ksft_finished();
ksft_finished() never returns. Remove the following line.
> + return 0;
> +}
> +
--
BR,
Muhammad Usama Anjum
^ permalink raw reply [flat|nested] 15+ messages in thread* Re: [PATCH] selftest: mm: Test if hugepage does not get leaked during __bio_release_pages()
2024-05-24 18:13 ` Muhammad Usama Anjum
@ 2024-05-25 15:40 ` Donet Tom
0 siblings, 0 replies; 15+ messages in thread
From: Donet Tom @ 2024-05-25 15:40 UTC (permalink / raw)
To: Muhammad Usama Anjum, Andrew Morton, Shuah Khan, Matthew Wilcox,
Tony Battersby
Cc: linux-mm, linux-kselftest, linux-kernel, Ritesh Harjani,
Mike Rapoport, Muchun Song, David Hildenbrand, stable
On 5/24/24 23:43, Muhammad Usama Anjum wrote:
> Thank you for submitting a patch.
>
> On 5/22/24 11:39 PM, Donet Tom wrote:
>> Commit 1b151e2435fc ("block: Remove special-casing of compound
>> pages") caused a change in behaviour when releasing the pages
>> if the buffer does not start at the beginning of the page. This
>> was because the calculation of the number of pages to release
>> was incorrect.
>> This was fixed by commit 38b43539d64b ("block: Fix page refcounts
>> for unaligned buffers in __bio_release_pages()").
>>
>> We pin the user buffer during direct I/O writes. If this buffer is a
>> hugepage, bio_release_page() will unpin it and decrement all references
>> and pin counts at ->bi_end_io. However, if any references to the hugepage
>> remain post-I/O, the hugepage will not be freed upon unmap, leading
>> to a memory leak.
>>
>> This patch verifies that a hugepage, used as a user buffer for DIO
>> operations, is correctly freed upon unmapping, regardless of whether
>> the offsets are aligned or unaligned w.r.t page boundary.
>>
>> Test Result Fail Scenario (Without the fix)
>> --------------------------------------------------------
>> []# ./hugetlb_dio
>> TAP version 13
>> 1..4
>> No. Free pages before allocation : 7
>> No. Free pages after munmap : 7
>> ok 1 : Huge pages freed successfully !
>> No. Free pages before allocation : 7
>> No. Free pages after munmap : 7
>> ok 2 : Huge pages freed successfully !
>> No. Free pages before allocation : 7
>> No. Free pages after munmap : 7
>> ok 3 : Huge pages freed successfully !
>> No. Free pages before allocation : 7
>> No. Free pages after munmap : 6
>> not ok 4 : Huge pages not freed!
>> Totals: pass:3 fail:1 xfail:0 xpass:0 skip:0 error:0
>>
>> Test Result PASS Scenario (With the fix)
>> ---------------------------------------------------------
>> []#./hugetlb_dio
>> TAP version 13
>> 1..4
>> No. Free pages before allocation : 7
>> No. Free pages after munmap : 7
>> ok 1 : Huge pages freed successfully !
>> No. Free pages before allocation : 7
>> No. Free pages after munmap : 7
>> ok 2 : Huge pages freed successfully !
>> No. Free pages before allocation : 7
>> No. Free pages after munmap : 7
>> ok 3 : Huge pages freed successfully !
>> No. Free pages before allocation : 7
>> No. Free pages after munmap : 7
>> ok 4 : Huge pages freed successfully !
>> Totals: pass:4 fail:0 xfail:0 xpass:0 skip:0 error:0
>>
>> Signed-off-by: Donet Tom <donettom@linux.ibm.com>
>> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
>> ---
>> tools/testing/selftests/mm/Makefile | 1 +
>> tools/testing/selftests/mm/hugetlb_dio.c | 118 +++++++++++++++++++++++
> Add this test to vm_runtest.sh as all the tests are run from this script in
> CIs.
>
>> 2 files changed, 119 insertions(+)
>> create mode 100644 tools/testing/selftests/mm/hugetlb_dio.c
>>
>> diff --git a/tools/testing/selftests/mm/Makefile b/tools/testing/selftests/mm/Makefile
>> index eb5f39a2668b..87d8130b3376 100644
>> --- a/tools/testing/selftests/mm/Makefile
>> +++ b/tools/testing/selftests/mm/Makefile
>> @@ -71,6 +71,7 @@ TEST_GEN_FILES += ksm_functional_tests
>> TEST_GEN_FILES += mdwe_test
>> TEST_GEN_FILES += hugetlb_fault_after_madv
>> TEST_GEN_FILES += hugetlb_madv_vs_map
>> +TEST_GEN_FILES += hugetlb_dio
>>
>> ifneq ($(ARCH),arm64)
>> TEST_GEN_FILES += soft-dirty
>> diff --git a/tools/testing/selftests/mm/hugetlb_dio.c b/tools/testing/selftests/mm/hugetlb_dio.c
>> new file mode 100644
>> index 000000000000..6f6587c7913c
>> --- /dev/null
>> +++ b/tools/testing/selftests/mm/hugetlb_dio.c
>> @@ -0,0 +1,118 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/*
>> + * This program tests for hugepage leaks after DIO writes to a file using a
>> + * hugepage as the user buffer. During DIO, the user buffer is pinned and
>> + * should be properly unpinned upon completion. This patch verifies that the
>> + * kernel correctly unpins the buffer at DIO completion for both aligned and
>> + * unaligned user buffer offsets (w.r.t page boundary), ensuring the hugepage
>> + * is freed upon unmapping.
>> + */
>> +
>> +#define _GNU_SOURCE
>> +#include <stdio.h>
>> +#include <sys/stat.h>
>> +#include <stdlib.h>
>> +#include <fcntl.h>
>> +#include <stdint.h>
>> +#include <unistd.h>
>> +#include <string.h>
>> +#include <sys/mman.h>
>> +#include "vm_util.h"
>> +#include "../kselftest.h"
>> +
>> +void run_dio_using_hugetlb(unsigned int start_off, unsigned int end_off)
>> +{
>> + int fd;
>> + char *buffer = NULL;
>> + char *orig_buffer = NULL;
>> + size_t h_pagesize = 0;
>> + size_t writesize;
>> + int free_hpage_b = 0;
>> + int free_hpage_a = 0;
>> +
>> + writesize = end_off - start_off;
>> +
>> + /* Get the default huge page size */
>> + h_pagesize = default_huge_page_size();
>> + if (!h_pagesize)
>> + ksft_exit_fail_msg("Unable to determine huge page size\n");
>> +
>> + /* Open the file to DIO */
>> + fd = open("/tmp", O_TMPFILE | O_RDWR | O_DIRECT);
>> + if (fd < 0)
>> + ksft_exit_fail_msg("Error opening file");
> Use ksft_exit_fail_perror to print the info about the error
>> +
>> + /* Get the free huge pages before allocation */
>> + free_hpage_b = get_free_hugepages();
>> + if (free_hpage_b == 0) {
>> + close(fd);
>> + ksft_exit_skip("No free hugepage, exiting!\n");
>> + }
>> +
>> + /* Allocate a hugetlb page */
>> + orig_buffer = mmap(NULL, h_pagesize, PROT_READ | PROT_WRITE, MAP_PRIVATE
>> + | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
> Better align the arguments. Put all flags in one line instead of slitting
> like this
>
>> + if (orig_buffer == MAP_FAILED) {
>> + close(fd);
>> + ksft_exit_fail_msg("Error mapping memory");
> nit: "\n" is missing from here.
>
>> + }
>> + buffer = orig_buffer;
>> + buffer += start_off;
>> +
>> + memset(buffer, 'A', writesize);
>> +
>> + /* Write the buffer to the file */
>> + if (write(fd, buffer, writesize) != (writesize)) {
>> + munmap(orig_buffer, h_pagesize);
>> + close(fd);
>> + ksft_exit_fail_msg("Error writing to file");
>> + }
>> +
>> + /* unmap the huge page */
>> + munmap(orig_buffer, h_pagesize);
>> + close(fd);
>> +
>> + /* Get the free huge pages after unmap*/
>> + free_hpage_a = get_free_hugepages();
>> +
>> + /*
>> + * If the no. of free hugepages before allocation and after unmap does
>> + * not match - that means there could still be a page which is pinned.
>> + */
>> + if (free_hpage_a != free_hpage_b) {
>> + printf("No. Free pages before allocation : %d\n", free_hpage_b);
> Use ksft_print_msg instead
>
>> + printf("No. Free pages after munmap : %d\n", free_hpage_a);
>> + ksft_test_result_fail(": Huge pages not freed!\n");
>> + } else {
>> + printf("No. Free pages before allocation : %d\n", free_hpage_b);
>> + printf("No. Free pages after munmap : %d\n", free_hpage_a);
>> + ksft_test_result_pass(": Huge pages freed successfully !\n");
>> + }
>> +}
>> +
>> +int main(void)
>> +{
>> + size_t pagesize = 0;
>> +
>> + ksft_print_header();
>> + ksft_set_plan(4);
>> +
>> + /* Get base page size */
>> + pagesize = psize();
>> +
>> + /* start and end is aligned to pagesize */
>> + run_dio_using_hugetlb(0, (pagesize * 3));
>> +
>> + /* start is aligned but end is not aligned */
>> + run_dio_using_hugetlb(0, (pagesize * 3) - (pagesize / 2));
>> +
>> + /* start is unaligned and end is aligned */
>> + run_dio_using_hugetlb(pagesize / 2, (pagesize * 3));
>> +
>> + /* both start and end are unaligned */
>> + run_dio_using_hugetlb(pagesize / 2, (pagesize * 3) + (pagesize / 2));
>> +
>> + ksft_finished();
> ksft_finished() never returns. Remove the following line.
Thank you for all your suggestions. I will make all the changes and send V2.
Thanks
Donet
>> + return 0;
>> +}
>> +
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH v2] selftest: mm: Test if hugepage does not get leaked during __bio_release_pages()
@ 2024-06-04 13:28 Donet Tom
2024-06-06 13:14 ` [PATCH] " Pankaj Raghav
0 siblings, 1 reply; 15+ messages in thread
From: Donet Tom @ 2024-06-04 13:28 UTC (permalink / raw)
To: Andrew Morton, Shuah Khan, Matthew Wilcox, Tony Battersby
Cc: linux-mm, linux-kselftest, linux-kernel, Ritesh Harjani,
Mike Rapoport, Muchun Song, David Hildenbrand
Commit 1b151e2435fc ("block: Remove special-casing of compound
pages") caused a change in behaviour when releasing the pages
if the buffer does not start at the beginning of the page. This
was because the calculation of the number of pages to release
was incorrect.
This was fixed by commit 38b43539d64b ("block: Fix page refcounts
for unaligned buffers in __bio_release_pages()").
We pin the user buffer during direct I/O writes. If this buffer is a
hugepage, bio_release_page() will unpin it and decrement all references
and pin counts at ->bi_end_io. However, if any references to the hugepage
remain post-I/O, the hugepage will not be freed upon unmap, leading
to a memory leak.
This patch verifies that a hugepage, used as a user buffer for DIO
operations, is correctly freed upon unmapping, regardless of whether
the offsets are aligned or unaligned w.r.t page boundary.
Test Result Fail Scenario (Without the fix)
--------------------------------------------------------
[]# ./hugetlb_dio
TAP version 13
1..4
No. Free pages before allocation : 7
No. Free pages after munmap : 7
ok 1 : Huge pages freed successfully !
No. Free pages before allocation : 7
No. Free pages after munmap : 7
ok 2 : Huge pages freed successfully !
No. Free pages before allocation : 7
No. Free pages after munmap : 7
ok 3 : Huge pages freed successfully !
No. Free pages before allocation : 7
No. Free pages after munmap : 6
not ok 4 : Huge pages not freed!
Totals: pass:3 fail:1 xfail:0 xpass:0 skip:0 error:0
Test Result PASS Scenario (With the fix)
---------------------------------------------------------
[]#./hugetlb_dio
TAP version 13
1..4
No. Free pages before allocation : 7
No. Free pages after munmap : 7
ok 1 : Huge pages freed successfully !
No. Free pages before allocation : 7
No. Free pages after munmap : 7
ok 2 : Huge pages freed successfully !
No. Free pages before allocation : 7
No. Free pages after munmap : 7
ok 3 : Huge pages freed successfully !
No. Free pages before allocation : 7
No. Free pages after munmap : 7
ok 4 : Huge pages freed successfully !
Totals: pass:4 fail:0 xfail:0 xpass:0 skip:0 error:0
V2:
- Addressed all review commets from Muhammad Usama Anjum
V1:
https://lore.kernel.org/all/20240523063905.3173-1-donettom@linux.ibm.com/#t
Signed-off-by: Donet Tom <donettom@linux.ibm.com>
Co-developed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
---
tools/testing/selftests/mm/Makefile | 1 +
tools/testing/selftests/mm/hugetlb_dio.c | 118 +++++++++++++++++++++++
2 files changed, 119 insertions(+)
create mode 100644 tools/testing/selftests/mm/hugetlb_dio.c
diff --git a/tools/testing/selftests/mm/Makefile b/tools/testing/selftests/mm/Makefile
index 3b49bc3d0a3b..a1748a4c7df1 100644
--- a/tools/testing/selftests/mm/Makefile
+++ b/tools/testing/selftests/mm/Makefile
@@ -73,6 +73,7 @@ TEST_GEN_FILES += ksm_functional_tests
TEST_GEN_FILES += mdwe_test
TEST_GEN_FILES += hugetlb_fault_after_madv
TEST_GEN_FILES += hugetlb_madv_vs_map
+TEST_GEN_FILES += hugetlb_dio
ifneq ($(ARCH),arm64)
TEST_GEN_FILES += soft-dirty
diff --git a/tools/testing/selftests/mm/hugetlb_dio.c b/tools/testing/selftests/mm/hugetlb_dio.c
new file mode 100644
index 000000000000..e4f4924179c8
--- /dev/null
+++ b/tools/testing/selftests/mm/hugetlb_dio.c
@@ -0,0 +1,118 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * This program tests for hugepage leaks after DIO writes to a file using a
+ * hugepage as the user buffer. During DIO, the user buffer is pinned and
+ * should be properly unpinned upon completion. This patch verifies that the
+ * kernel correctly unpins the buffer at DIO completion for both aligned and
+ * unaligned user buffer offsets (w.r.t page boundary), ensuring the hugepage
+ * is freed upon unmapping.
+ */
+
+#define _GNU_SOURCE
+#include <stdio.h>
+#include <sys/stat.h>
+#include <stdlib.h>
+#include <fcntl.h>
+#include <stdint.h>
+#include <unistd.h>
+#include <string.h>
+#include <sys/mman.h>
+#include "vm_util.h"
+#include "../kselftest.h"
+
+void run_dio_using_hugetlb(unsigned int start_off, unsigned int end_off)
+{
+ int fd;
+ char *buffer = NULL;
+ char *orig_buffer = NULL;
+ size_t h_pagesize = 0;
+ size_t writesize;
+ int free_hpage_b = 0;
+ int free_hpage_a = 0;
+ const int mmap_flags = MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB;
+ const int mmap_prot = PROT_READ | PROT_WRITE;
+
+ writesize = end_off - start_off;
+
+ /* Get the default huge page size */
+ h_pagesize = default_huge_page_size();
+ if (!h_pagesize)
+ ksft_exit_fail_msg("Unable to determine huge page size\n");
+
+ /* Open the file to DIO */
+ fd = open("/tmp", O_TMPFILE | O_RDWR | O_DIRECT);
+ if (fd < 0)
+ ksft_exit_fail_perror("Error opening file\n");
+
+ /* Get the free huge pages before allocation */
+ free_hpage_b = get_free_hugepages();
+ if (free_hpage_b == 0) {
+ close(fd);
+ ksft_exit_skip("No free hugepage, exiting!\n");
+ }
+
+ /* Allocate a hugetlb page */
+ orig_buffer = mmap(NULL, h_pagesize, mmap_prot, mmap_flags, -1, 0);
+ if (orig_buffer == MAP_FAILED) {
+ close(fd);
+ ksft_exit_fail_perror("Error mapping memory\n");
+ }
+ buffer = orig_buffer;
+ buffer += start_off;
+
+ memset(buffer, 'A', writesize);
+
+ /* Write the buffer to the file */
+ if (write(fd, buffer, writesize) != (writesize)) {
+ munmap(orig_buffer, h_pagesize);
+ close(fd);
+ ksft_exit_fail_perror("Error writing to file\n");
+ }
+
+ /* unmap the huge page */
+ munmap(orig_buffer, h_pagesize);
+ close(fd);
+
+ /* Get the free huge pages after unmap*/
+ free_hpage_a = get_free_hugepages();
+
+ /*
+ * If the no. of free hugepages before allocation and after unmap does
+ * not match - that means there could still be a page which is pinned.
+ */
+ if (free_hpage_a != free_hpage_b) {
+ ksft_print_msg("No. Free pages before allocation : %d\n", free_hpage_b);
+ ksft_print_msg("No. Free pages after munmap : %d\n", free_hpage_a);
+ ksft_test_result_fail(": Huge pages not freed!\n");
+ } else {
+ ksft_print_msg("No. Free pages before allocation : %d\n", free_hpage_b);
+ ksft_print_msg("No. Free pages after munmap : %d\n", free_hpage_a);
+ ksft_test_result_pass(": Huge pages freed successfully !\n");
+ }
+}
+
+int main(void)
+{
+ size_t pagesize = 0;
+
+ ksft_print_header();
+ ksft_set_plan(4);
+
+ /* Get base page size */
+ pagesize = psize();
+
+ /* start and end is aligned to pagesize */
+ run_dio_using_hugetlb(0, (pagesize * 3));
+
+ /* start is aligned but end is not aligned */
+ run_dio_using_hugetlb(0, (pagesize * 3) - (pagesize / 2));
+
+ /* start is unaligned and end is aligned */
+ run_dio_using_hugetlb(pagesize / 2, (pagesize * 3));
+
+ /* both start and end are unaligned */
+ run_dio_using_hugetlb(pagesize / 2, (pagesize * 3) + (pagesize / 2));
+
+ ksft_finished();
+}
+
--
2.43.0
^ permalink raw reply related [flat|nested] 15+ messages in thread* Re: [PATCH] selftest: mm: Test if hugepage does not get leaked during __bio_release_pages()
2024-06-04 13:28 [PATCH v2] " Donet Tom
@ 2024-06-06 13:14 ` Pankaj Raghav
2024-06-06 13:58 ` Donet Tom
0 siblings, 1 reply; 15+ messages in thread
From: Pankaj Raghav @ 2024-06-06 13:14 UTC (permalink / raw)
To: donettom
Cc: akpm, david, linux-kernel, linux-kselftest, linux-mm, ritesh.list,
rppt, shuah, songmuchun, tonyb, willy
> +void run_dio_using_hugetlb(unsigned int start_off, unsigned int end_off)
> +{
> + int fd;
> + char *buffer = NULL;
> + char *orig_buffer = NULL;
> + size_t h_pagesize = 0;
> + size_t writesize;
> + int free_hpage_b = 0;
> + int free_hpage_a = 0;
> +
> + writesize = end_off - start_off;
> +
> + /* Get the default huge page size */
> + h_pagesize = default_huge_page_size();
> + if (!h_pagesize)
> + ksft_exit_fail_msg("Unable to determine huge page size\n");
> +
> + /* Open the file to DIO */
> + fd = open("/tmp", O_TMPFILE | O_RDWR | O_DIRECT);
I encountered a build error as follows in NixOS:
In file included from /nix/store/fwh4fxd747m0py3ib3s5abamia9nrf90-glibc-2.39-52-dev/include/fcntl.h:342,
from hugetlb_dio.c:15:
In function ‘open’,
inlined from ‘run_dio_using_hugetlb’ at hugetlb_dio.c:41:7:
/nix/store/fwh4fxd747m0py3ib3s5abamia9nrf90-glibc-2.39-52-dev/include/bits/fcntl2.h:50:11: error: call to ‘__open_missing_mode’ declared with attribute error: open with O_CREAT or O_TMPFILE in second argument needs 3 arguments
50 | __open_missing_mode ();
I saw a commit that fixed similar issues with open syscall before:
8b65ef5ad486 ("selftests/mm: Fix build with _FORTIFY_SOURCE")
So something like this should fix the issue?
- fd = open("/tmp", O_TMPFILE | O_RDWR | O_DIRECT);
+ fd = open("/tmp", O_TMPFILE | O_RDWR | O_DIRECT, 0664);
> + if (fd < 0)
> + ksft_exit_fail_msg("Error opening file");
> +
^ permalink raw reply [flat|nested] 15+ messages in thread* Re: [PATCH] selftest: mm: Test if hugepage does not get leaked during __bio_release_pages()
2024-06-06 13:14 ` [PATCH] " Pankaj Raghav
@ 2024-06-06 13:58 ` Donet Tom
0 siblings, 0 replies; 15+ messages in thread
From: Donet Tom @ 2024-06-06 13:58 UTC (permalink / raw)
To: Pankaj Raghav
Cc: akpm, david, linux-kernel, linux-kselftest, linux-mm, ritesh.list,
rppt, shuah, songmuchun, tonyb, willy
On 6/6/24 18:44, Pankaj Raghav wrote:
>> +void run_dio_using_hugetlb(unsigned int start_off, unsigned int end_off)
>> +{
>> + int fd;
>> + char *buffer = NULL;
>> + char *orig_buffer = NULL;
>> + size_t h_pagesize = 0;
>> + size_t writesize;
>> + int free_hpage_b = 0;
>> + int free_hpage_a = 0;
>> +
>> + writesize = end_off - start_off;
>> +
>> + /* Get the default huge page size */
>> + h_pagesize = default_huge_page_size();
>> + if (!h_pagesize)
>> + ksft_exit_fail_msg("Unable to determine huge page size\n");
>> +
>> + /* Open the file to DIO */
>> + fd = open("/tmp", O_TMPFILE | O_RDWR | O_DIRECT);
> I encountered a build error as follows in NixOS:
>
> In file included from /nix/store/fwh4fxd747m0py3ib3s5abamia9nrf90-glibc-2.39-52-dev/include/fcntl.h:342,
> from hugetlb_dio.c:15:
> In function ‘open’,
> inlined from ‘run_dio_using_hugetlb’ at hugetlb_dio.c:41:7:
> /nix/store/fwh4fxd747m0py3ib3s5abamia9nrf90-glibc-2.39-52-dev/include/bits/fcntl2.h:50:11: error: call to ‘__open_missing_mode’ declared with attribute error: open with O_CREAT or O_TMPFILE in second argument needs 3 arguments
> 50 | __open_missing_mode ();
>
>
> I saw a commit that fixed similar issues with open syscall before:
> 8b65ef5ad486 ("selftests/mm: Fix build with _FORTIFY_SOURCE")
>
> So something like this should fix the issue?
>
> - fd = open("/tmp", O_TMPFILE | O_RDWR | O_DIRECT);
> + fd = open("/tmp", O_TMPFILE | O_RDWR | O_DIRECT, 0664);
Thank you Pankaj.
I am able to reproduce this error with "-D_FORTIFY_SOURCE=2 -O2". I will post v3 with this fix.
-donet
>
>> + if (fd < 0)
>> + ksft_exit_fail_msg("Error opening file");
>> +
^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2024-06-06 13:59 UTC | newest]
Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-05-23 6:39 [PATCH] selftest: mm: Test if hugepage does not get leaked during __bio_release_pages() Donet Tom
2024-05-23 19:13 ` Andrew Morton
2024-05-23 20:40 ` David Hildenbrand
2024-05-24 2:57 ` Andrew Morton
2024-05-24 6:21 ` David Hildenbrand
2024-05-24 6:43 ` Ritesh Harjani
2024-05-24 7:01 ` David Hildenbrand
2024-05-24 9:31 ` Donet Tom
2024-05-24 15:20 ` David Hildenbrand
2024-05-24 6:53 ` Ritesh Harjani
2024-05-24 20:12 ` Andrew Morton
2024-05-24 18:13 ` Muhammad Usama Anjum
2024-05-25 15:40 ` Donet Tom
-- strict thread matches above, loose matches on Subject: below --
2024-06-04 13:28 [PATCH v2] " Donet Tom
2024-06-06 13:14 ` [PATCH] " Pankaj Raghav
2024-06-06 13:58 ` Donet Tom
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox