[PATCH v1 0/3] mm/secretmem: one fix and one refactoring

linux-mm.kvack.org archive mirror

* [PATCH v1 0/3] mm/secretmem: one fix and one refactoring
@ 2024-03-25 13:41 David Hildenbrand
  2024-03-25 13:41 ` [PATCH v1 1/3] mm/secretmem: fix GUP-fast succeeding on secretmem folios David Hildenbrand
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: David Hildenbrand @ 2024-03-25 13:41 UTC
  To: linux-kernel
  Cc: linux-mm, David Hildenbrand, Andrew Morton, Mike Rapoport,
	Miklos Szeredi, Lorenzo Stoakes, xingwei lee, yue sun

Patch #1 fixes a GUP-fast issue, whereby we might succeed in pinning
secretmem folios. Patch #2 extends the memfd_secret selftest to cover
that case. Patch #3 removes folio_is_secretmem() and instead lets
folio_fast_pin_allowed() cover that case as well.

With this series, the reproducer (+selftests) works as expected. To
test patch #3, the gup_longterm test does exactly what we need, and
keeps on working as expected.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Miklos Szeredi <mszeredi@redhat.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: xingwei lee <xrivendell7@gmail.com>
Cc: yue sun <samsun1006219@gmail.com>

David Hildenbrand (3):
  mm/secretmem: fix GUP-fast succeeding on secretmem folios
  selftests/memfd_secret: add vmsplice() test
  mm: merge folio_is_secretmem() into folio_fast_pin_allowed()

 include/linux/secretmem.h                 | 21 ++---------
 mm/gup.c                                  | 33 ++++++++++-------
 tools/testing/selftests/mm/memfd_secret.c | 44 +++++++++++++++++++++--
 3 files changed, 65 insertions(+), 33 deletions(-)

-- 
2.43.2




* [PATCH v1 1/3] mm/secretmem: fix GUP-fast succeeding on secretmem folios
  2024-03-25 13:41 [PATCH v1 0/3] mm/secretmem: one fix and one refactoring David Hildenbrand
@ 2024-03-25 13:41 ` David Hildenbrand
  2024-03-25 18:30   ` Andrew Morton
  2024-03-25 13:41 ` [PATCH v1 2/3] selftests/memfd_secret: add vmsplice() test David Hildenbrand
  2024-03-25 13:41 ` [PATCH v1 3/3] mm: merge folio_is_secretmem() into folio_fast_pin_allowed() David Hildenbrand
  2 siblings, 1 reply; 11+ messages in thread
From: David Hildenbrand @ 2024-03-25 13:41 UTC
  To: linux-kernel
  Cc: linux-mm, David Hildenbrand, Andrew Morton, Mike Rapoport,
	Miklos Szeredi, Lorenzo Stoakes, xingwei lee, yue sun,
	Miklos Szeredi, stable

folio_is_secretmem() states that secretmem folios cannot be LRU folios,
so we may only exit early if we find an LRU folio. Yet, we exit early if
we find a folio that is not an LRU folio.

Consequently, folio_is_secretmem() fails to detect secretmem folios and,
therefore, we can succeed in grabbing a secretmem folio during GUP-fast,
crashing the kernel when we later try reading/writing to the folio, because
the folio has been unmapped from the directmap.

Reported-by: xingwei lee <xrivendell7@gmail.com>
Reported-by: yue sun <samsun1006219@gmail.com>
Closes: https://lore.kernel.org/lkml/CABOYnLyevJeravW=QrH0JUPYEcDN160aZFb7kwndm-J2rmz0HQ@mail.gmail.com/
Debugged-by: Miklos Szeredi <miklos@szeredi.hu>
Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
Tested-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes: 1507f51255c9 ("mm: introduce memfd_secret system call to create "secret" memory areas")
Cc: <stable@vger.kernel.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 include/linux/secretmem.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/secretmem.h b/include/linux/secretmem.h
index 35f3a4a8ceb1..6996f1f53f14 100644
--- a/include/linux/secretmem.h
+++ b/include/linux/secretmem.h
@@ -16,7 +16,7 @@ static inline bool folio_is_secretmem(struct folio *folio)
 	 * We know that secretmem pages are not compound and LRU so we can
 	 * save a couple of cycles here.
 	 */
-	if (folio_test_large(folio) || !folio_test_lru(folio))
+	if (folio_test_large(folio) || folio_test_lru(folio))
 		return false;
 
 	mapping = (struct address_space *)
-- 
2.43.2




* Re: [PATCH v1 1/3] mm/secretmem: fix GUP-fast succeeding on secretmem folios
  2024-03-25 13:41 ` [PATCH v1 1/3] mm/secretmem: fix GUP-fast succeeding on secretmem folios David Hildenbrand
@ 2024-03-25 18:30   ` Andrew Morton
  2024-03-26 13:23     ` David Hildenbrand
  0 siblings, 1 reply; 11+ messages in thread
From: Andrew Morton @ 2024-03-25 18:30 UTC
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, Mike Rapoport, Miklos Szeredi,
	Lorenzo Stoakes, xingwei lee, yue sun, Miklos Szeredi, stable

On Mon, 25 Mar 2024 14:41:12 +0100 David Hildenbrand <david@redhat.com> wrote:

> folio_is_secretmem() states that secretmem folios cannot be LRU folios,
> so we may only exit early if we find an LRU folio. Yet, we exit early if
> we find a folio that is not an LRU folio.
> 
> ...
>
> Cc: <stable@vger.kernel.org>

Thanks, I split up this series.  Because this patch goes
mm-hotfixes-unstable -> mm-hotfixes-stable -> mainline whereas the
other two go mm-unstable -> mm-stable -> mainline at a later time.




* Re: [PATCH v1 1/3] mm/secretmem: fix GUP-fast succeeding on secretmem folios
  2024-03-25 18:30   ` Andrew Morton
@ 2024-03-26 13:23     ` David Hildenbrand
  0 siblings, 0 replies; 11+ messages in thread
From: David Hildenbrand @ 2024-03-26 13:23 UTC
  To: Andrew Morton
  Cc: linux-kernel, linux-mm, Mike Rapoport, Miklos Szeredi,
	Lorenzo Stoakes, xingwei lee, yue sun, Miklos Szeredi, stable

On 25.03.24 19:30, Andrew Morton wrote:
> On Mon, 25 Mar 2024 14:41:12 +0100 David Hildenbrand <david@redhat.com> wrote:
> 
>> folio_is_secretmem() states that secretmem folios cannot be LRU folios,
>> so we may only exit early if we find an LRU folio. Yet, we exit early if
>> we find a folio that is not an LRU folio.
>>
>> ...
>>
>> Cc: <stable@vger.kernel.org>
> 
> Thanks, I split up this series.  Because this patch goes
> mm-hotfixes-unstable -> mm-hotfixes-stable -> mainline whereas the
> other two go mm-unstable -> mm-stable -> mainline at a later time.

Makes sense. I'll resend a v2 later, because there are some major 
changes (the fix was wrong/incomplete; we have to remove the LRU test 
completely).

-- 
Cheers,

David / dhildenb




* [PATCH v1 2/3] selftests/memfd_secret: add vmsplice() test
  2024-03-25 13:41 [PATCH v1 0/3] mm/secretmem: one fix and one refactoring David Hildenbrand
  2024-03-25 13:41 ` [PATCH v1 1/3] mm/secretmem: fix GUP-fast succeeding on secretmem folios David Hildenbrand
@ 2024-03-25 13:41 ` David Hildenbrand
  2024-03-26  6:17   ` Mike Rapoport
  2024-03-25 13:41 ` [PATCH v1 3/3] mm: merge folio_is_secretmem() into folio_fast_pin_allowed() David Hildenbrand
  2 siblings, 1 reply; 11+ messages in thread
From: David Hildenbrand @ 2024-03-25 13:41 UTC
  To: linux-kernel
  Cc: linux-mm, David Hildenbrand, Andrew Morton, Mike Rapoport,
	Miklos Szeredi, Lorenzo Stoakes, xingwei lee, yue sun

Let's add a simple reproducer for a scenario where GUP-fast could succeed
on secretmem folios, making vmsplice() succeed instead of failing. The
reproducer is based on a reproducer [1] by Miklos Szeredi.

Perform the ftruncate() only once, and check the return value.

For some reason, vmsplice() reliably fails (making the test succeed) when
we move the test_vmsplice() call after test_process_vm_read() /
test_ptrace(). Properly cleaning up in test_remote_access(), which is not
part of this change, won't change that behavior. Therefore, run the
vmsplice() test for now first -- something is a bit off once we involve
fork().

[1] https://lkml.kernel.org/r/CAJfpegt3UCsMmxd0taOY11Uaw5U=eS1fE5dn0wZX3HF0oy8-oQ@mail.gmail.com

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 tools/testing/selftests/mm/memfd_secret.c | 44 +++++++++++++++++++++--
 1 file changed, 42 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/mm/memfd_secret.c b/tools/testing/selftests/mm/memfd_secret.c
index 9b298f6a04b3..0acbdcf8230e 100644
--- a/tools/testing/selftests/mm/memfd_secret.c
+++ b/tools/testing/selftests/mm/memfd_secret.c
@@ -20,6 +20,7 @@
 #include <unistd.h>
 #include <errno.h>
 #include <stdio.h>
+#include <fcntl.h>
 
 #include "../kselftest.h"
 
@@ -83,6 +84,43 @@ static void test_mlock_limit(int fd)
 	pass("mlock limit is respected\n");
 }
 
+static void test_vmsplice(int fd)
+{
+	ssize_t transferred;
+	struct iovec iov;
+	int pipefd[2];
+	char *mem;
+
+	if (pipe(pipefd)) {
+		fail("pipe failed: %s\n", strerror(errno));
+		return;
+	}
+
+	mem = mmap(NULL, page_size, prot, mode, fd, 0);
+	if (mem == MAP_FAILED) {
+		fail("Unable to mmap secret memory\n");
+		goto close_pipe;
+	}
+
+	/*
+	 * vmsplice() may use GUP-fast, which must also fail. Prefault the
+	 * page table, so GUP-fast could find it.
+	 */
+	memset(mem, PATTERN, page_size);
+
+	iov.iov_base = mem;
+	iov.iov_len = page_size;
+	transferred = vmsplice(pipefd[1], &iov, 1, 0);
+
+	ksft_test_result(transferred < 0 && errno == EFAULT,
+			 "vmsplice is blocked as expected\n");
+
+	munmap(mem, page_size);
+close_pipe:
+	close(pipefd[0]);
+	close(pipefd[1]);
+}
+
 static void try_process_vm_read(int fd, int pipefd[2])
 {
 	struct iovec liov, riov;
@@ -187,7 +225,6 @@ static void test_remote_access(int fd, const char *name,
 		return;
 	}
 
-	ftruncate(fd, page_size);
 	memset(mem, PATTERN, page_size);
 
 	if (write(pipefd[1], &mem, sizeof(mem)) < 0) {
@@ -258,7 +295,7 @@ static void prepare(void)
 				   strerror(errno));
 }
 
-#define NUM_TESTS 4
+#define NUM_TESTS 5
 
 int main(int argc, char *argv[])
 {
@@ -277,9 +314,12 @@ int main(int argc, char *argv[])
 			ksft_exit_fail_msg("memfd_secret failed: %s\n",
 					   strerror(errno));
 	}
+	if (ftruncate(fd, page_size))
+		ksft_exit_fail_msg("ftruncate failed: %s\n", strerror(errno));
 
 	test_mlock_limit(fd);
 	test_file_apis(fd);
+	test_vmsplice(fd);
 	test_process_vm_read(fd);
 	test_ptrace(fd);
 
-- 
2.43.2




* Re: [PATCH v1 2/3] selftests/memfd_secret: add vmsplice() test
  2024-03-25 13:41 ` [PATCH v1 2/3] selftests/memfd_secret: add vmsplice() test David Hildenbrand
@ 2024-03-26  6:17   ` Mike Rapoport
  2024-03-26 12:32     ` David Hildenbrand
  0 siblings, 1 reply; 11+ messages in thread
From: Mike Rapoport @ 2024-03-26  6:17 UTC
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, Andrew Morton, Miklos Szeredi,
	Lorenzo Stoakes, xingwei lee, yue sun

Hi David,

On Mon, Mar 25, 2024 at 02:41:13PM +0100, David Hildenbrand wrote:
> Let's add a simple reproducer for a scenario where GUP-fast could succeed
> on secretmem folios, making vmsplice() succeed instead of failing. The
> reproducer is based on a reproducer [1] by Miklos Szeredi.
> 
> Perform the ftruncate() only once, and check the return value.
> 
> For some reason, vmsplice() reliably fails (making the test succeed) when
> we move the test_vmsplice() call after test_process_vm_read() /
> test_ptrace().

That's because the ftruncate() call was in test_remote_access(), and you need it
to mmap secretmem.

> Properly cleaning up in test_remote_access(), which is not
> part of this change, won't change that behavior. Therefore, run the
> vmsplice() test for now first -- something is a bit off once we involve
> fork().
> 
> [1] https://lkml.kernel.org/r/CAJfpegt3UCsMmxd0taOY11Uaw5U=eS1fE5dn0wZX3HF0oy8-oQ@mail.gmail.com
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  tools/testing/selftests/mm/memfd_secret.c | 44 +++++++++++++++++++++--
>  1 file changed, 42 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/testing/selftests/mm/memfd_secret.c b/tools/testing/selftests/mm/memfd_secret.c
> index 9b298f6a04b3..0acbdcf8230e 100644
> --- a/tools/testing/selftests/mm/memfd_secret.c
> +++ b/tools/testing/selftests/mm/memfd_secret.c
> @@ -20,6 +20,7 @@
>  #include <unistd.h>
>  #include <errno.h>
>  #include <stdio.h>
> +#include <fcntl.h>
>  
>  #include "../kselftest.h"
>  
> @@ -83,6 +84,43 @@ static void test_mlock_limit(int fd)
>  	pass("mlock limit is respected\n");
>  }
>  
> +static void test_vmsplice(int fd)
> +{
> +	ssize_t transferred;
> +	struct iovec iov;
> +	int pipefd[2];
> +	char *mem;
> +
> +	if (pipe(pipefd)) {
> +		fail("pipe failed: %s\n", strerror(errno));
> +		return;
> +	}
> +
> +	mem = mmap(NULL, page_size, prot, mode, fd, 0);
> +	if (mem == MAP_FAILED) {
> +		fail("Unable to mmap secret memory\n");
> +		goto close_pipe;
> +	}
> +
> +	/*
> +	 * vmsplice() may use GUP-fast, which must also fail. Prefault the
> +	 * page table, so GUP-fast could find it.
> +	 */
> +	memset(mem, PATTERN, page_size);
> +
> +	iov.iov_base = mem;
> +	iov.iov_len = page_size;
> +	transferred = vmsplice(pipefd[1], &iov, 1, 0);
> +
> +	ksft_test_result(transferred < 0 && errno == EFAULT,
> +			 "vmsplice is blocked as expected\n");

The same message will be printed on success and on failure.

I think 

	if (transferred < 0 && errno == EFAULT)
		pass("vmsplice is blocked as expected\n");
	else
		fail("vmsplice: unexpected memory access\n");

is clearer than feeding different strings to ksft_test_result().

Other than that

Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>

> +
> +	munmap(mem, page_size);
> +close_pipe:
> +	close(pipefd[0]);
> +	close(pipefd[1]);
> +}

-- 
Sincerely yours,
Mike.



* Re: [PATCH v1 2/3] selftests/memfd_secret: add vmsplice() test
  2024-03-26  6:17   ` Mike Rapoport
@ 2024-03-26 12:32     ` David Hildenbrand
  2024-03-26 13:11       ` David Hildenbrand
  0 siblings, 1 reply; 11+ messages in thread
From: David Hildenbrand @ 2024-03-26 12:32 UTC
  To: Mike Rapoport
  Cc: linux-kernel, linux-mm, Andrew Morton, Miklos Szeredi,
	Lorenzo Stoakes, xingwei lee, yue sun

On 26.03.24 07:17, Mike Rapoport wrote:
> Hi David,
> 
> On Mon, Mar 25, 2024 at 02:41:13PM +0100, David Hildenbrand wrote:
>> Let's add a simple reproducer for a scenario where GUP-fast could succeed
>> on secretmem folios, making vmsplice() succeed instead of failing. The
>> reproducer is based on a reproducer [1] by Miklos Szeredi.
>>
>> Perform the ftruncate() only once, and check the return value.
>>
>> For some reason, vmsplice() reliably fails (making the test succeed) when
>> we move the test_vmsplice() call after test_process_vm_read() /
>> test_ptrace().
> 
> That's because the ftruncate() call was in test_remote_access(), and you need it
> to mmap secretmem.

I don't think that's the reason. I reshuffled the code a couple of times
without luck.

And in fact, even executing the vmsplice() test twice results in the
second iteration succeeding on an old kernel (6.7.4-200.fc39.x86_64).

ok 1 mlock limit is respected
ok 2 file IO is blocked as expected
not ok 3 vmsplice is blocked as expected
ok 4 vmsplice is blocked as expected
ok 5 process_vm_read is blocked as expected
ok 6 ptrace is blocked as expected

Note that the mmap()+memset() succeeded. So the secretmem pages should be in the page table.


Even weirder, if I simply mmap()+memset()+munmap() secretmem *once* beforehand, the test passes:

diff --git a/tools/testing/selftests/mm/memfd_secret.c b/tools/testing/selftests/mm/memfd_secret.c
index 0acbdcf8230e..7a973ec6ac8f 100644
--- a/tools/testing/selftests/mm/memfd_secret.c
+++ b/tools/testing/selftests/mm/memfd_secret.c
@@ -96,6 +96,14 @@ static void test_vmsplice(int fd)
                 return;
         }
  
+       mem = mmap(NULL, page_size, prot, mode, fd, 0);
+       if (mem == MAP_FAILED) {
+               fail("Unable to mmap secret memory\n");
+               goto close_pipe;
+       }
+       memset(mem, PATTERN, page_size);
+       munmap(mem, page_size);
+
         mem = mmap(NULL, page_size, prot, mode, fd, 0);
         if (mem == MAP_FAILED) {
                 fail("Unable to mmap secret memory\n");

ok 1 mlock limit is respected
ok 2 file IO is blocked as expected
ok 3 vmsplice is blocked as expected
ok 4 process_vm_read is blocked as expected
ok 5 ptrace is blocked as expected


... could it be that munmap()+mmap() will end up turning these pages into LRU pages?

I am 100% sure that is happening -- likely because VM_LOCKED is involved --
because on the patched kernel, I see the following:

ok 1 mlock limit is respected
ok 2 file IO is blocked as expected
ok 3 vmsplice is blocked as expected
not ok 4 vmsplice is blocked as expected
ok 5 process_vm_read is blocked as expected
ok 6 ptrace is blocked as expected


At this point, I think we should remove the LRU test for secretmem.

I'll adjust patch #1 and extend this test to cover that case as well.

> 
>> Properly cleaning up in test_remote_access(), which is not
>> part of this change, won't change that behavior. Therefore, run the
>> vmsplice() test for now first -- something is a bit off once we involve
>> fork().
>>
>> [1] https://lkml.kernel.org/r/CAJfpegt3UCsMmxd0taOY11Uaw5U=eS1fE5dn0wZX3HF0oy8-oQ@mail.gmail.com
>>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>>   tools/testing/selftests/mm/memfd_secret.c | 44 +++++++++++++++++++++--
>>   1 file changed, 42 insertions(+), 2 deletions(-)
>>
>> diff --git a/tools/testing/selftests/mm/memfd_secret.c b/tools/testing/selftests/mm/memfd_secret.c
>> index 9b298f6a04b3..0acbdcf8230e 100644
>> --- a/tools/testing/selftests/mm/memfd_secret.c
>> +++ b/tools/testing/selftests/mm/memfd_secret.c
>> @@ -20,6 +20,7 @@
>>   #include <unistd.h>
>>   #include <errno.h>
>>   #include <stdio.h>
>> +#include <fcntl.h>
>>   
>>   #include "../kselftest.h"
>>   
>> @@ -83,6 +84,43 @@ static void test_mlock_limit(int fd)
>>   	pass("mlock limit is respected\n");
>>   }
>>   
>> +static void test_vmsplice(int fd)
>> +{
>> +	ssize_t transferred;
>> +	struct iovec iov;
>> +	int pipefd[2];
>> +	char *mem;
>> +
>> +	if (pipe(pipefd)) {
>> +		fail("pipe failed: %s\n", strerror(errno));
>> +		return;
>> +	}
>> +
>> +	mem = mmap(NULL, page_size, prot, mode, fd, 0);
>> +	if (mem == MAP_FAILED) {
>> +		fail("Unable to mmap secret memory\n");
>> +		goto close_pipe;
>> +	}
>> +
>> +	/*
>> +	 * vmsplice() may use GUP-fast, which must also fail. Prefault the
>> +	 * page table, so GUP-fast could find it.
>> +	 */
>> +	memset(mem, PATTERN, page_size);
>> +
>> +	iov.iov_base = mem;
>> +	iov.iov_len = page_size;
>> +	transferred = vmsplice(pipefd[1], &iov, 1, 0);
>> +
>> +	ksft_test_result(transferred < 0 && errno == EFAULT,
>> +			 "vmsplice is blocked as expected\n");
> 
> The same message will be printed on success and on failure.
> 
> I think
> 
> 	if (transferred < 0 && errno == EFAULT)
> 		pass("vmsplice is blocked as expected\n");
> 	else
> 		fail("vmsplice: unexpected memory access\n");
> 
> is clearer than feeding different strings to ksft_test_result().
> 

Can do, thanks!

-- 
Cheers,

David / dhildenb




* Re: [PATCH v1 2/3] selftests/memfd_secret: add vmsplice() test
  2024-03-26 12:32     ` David Hildenbrand
@ 2024-03-26 13:11       ` David Hildenbrand
  0 siblings, 0 replies; 11+ messages in thread
From: David Hildenbrand @ 2024-03-26 13:11 UTC
  To: Mike Rapoport
  Cc: linux-kernel, linux-mm, Andrew Morton, Miklos Szeredi,
	Lorenzo Stoakes, xingwei lee, yue sun

On 26.03.24 13:32, David Hildenbrand wrote:
> On 26.03.24 07:17, Mike Rapoport wrote:
>> Hi David,
>>
>> On Mon, Mar 25, 2024 at 02:41:13PM +0100, David Hildenbrand wrote:
>>> Let's add a simple reproducer for a scenario where GUP-fast could succeed
>>> on secretmem folios, making vmsplice() succeed instead of failing. The
>>> reproducer is based on a reproducer [1] by Miklos Szeredi.
>>>
>>> Perform the ftruncate() only once, and check the return value.
>>>
>>> For some reason, vmsplice() reliably fails (making the test succeed) when
>>> we move the test_vmsplice() call after test_process_vm_read() /
>>> test_ptrace().
>>
>> That's because the ftruncate() call was in test_remote_access(), and you need it
>> to mmap secretmem.
> 
> I don't think that's the reason. I reshuffled the code a couple of times
> without luck.
> 
> And in fact, even executing the vmsplice() test twice results in the
> second iteration succeeding on an old kernel (6.7.4-200.fc39.x86_64).
> 
> ok 1 mlock limit is respected
> ok 2 file IO is blocked as expected
> not ok 3 vmsplice is blocked as expected
> ok 4 vmsplice is blocked as expected
> ok 5 process_vm_read is blocked as expected
> ok 6 ptrace is blocked as expected
> 
> Note that the mmap()+memset() succeeded. So the secretmem pages should be in the page table.
> 
> 
> Even weirder, if I simply mmap()+memset()+munmap() secretmem *once*, the test passes
> 
> diff --git a/tools/testing/selftests/mm/memfd_secret.c b/tools/testing/selftests/mm/memfd_secret.c
> index 0acbdcf8230e..7a973ec6ac8f 100644
> --- a/tools/testing/selftests/mm/memfd_secret.c
> +++ b/tools/testing/selftests/mm/memfd_secret.c
> @@ -96,6 +96,14 @@ static void test_vmsplice(int fd)
>                   return;
>           }
>    
> +       mem = mmap(NULL, page_size, prot, mode, fd, 0);
> +       if (mem == MAP_FAILED) {
> +               fail("Unable to mmap secret memory\n");
> +               goto close_pipe;
> +       }
> +       memset(mem, PATTERN, page_size);
> +       munmap(mem, page_size);
> +
>           mem = mmap(NULL, page_size, prot, mode, fd, 0);
>           if (mem == MAP_FAILED) {
>                   fail("Unable to mmap secret memory\n");
> 
> ok 1 mlock limit is respected
> ok 2 file IO is blocked as expected
> ok 3 vmsplice is blocked as expected
> ok 4 process_vm_read is blocked as expected
> ok 5 ptrace is blocked as expected
> 
> 
> ... could it be that munmap()+mmap() will end up turning these pages into LRU pages?

Okay, now I am completely confused.

secretmem_fault() calls filemap_add_folio(), which should turn this into 
an LRU page.

So secretmem pages should always be LRU pages ... unless we're batching 
in the LRU cache and haven't done the lru_add_drain() yet ...

And likely, the munmap() will drain the LRU cache and turn the page into 
an LRU page.

Okay, I'll go check whether that's the case. If so, relying on the page 
being LRU vs. not LRU in GUP-fast is unreliable and shall be dropped.

-- 
Cheers,

David / dhildenb




* [PATCH v1 3/3] mm: merge folio_is_secretmem() into folio_fast_pin_allowed()
  2024-03-25 13:41 [PATCH v1 0/3] mm/secretmem: one fix and one refactoring David Hildenbrand
  2024-03-25 13:41 ` [PATCH v1 1/3] mm/secretmem: fix GUP-fast succeeding on secretmem folios David Hildenbrand
  2024-03-25 13:41 ` [PATCH v1 2/3] selftests/memfd_secret: add vmsplice() test David Hildenbrand
@ 2024-03-25 13:41 ` David Hildenbrand
  2024-03-26  6:30   ` Mike Rapoport
  2 siblings, 1 reply; 11+ messages in thread
From: David Hildenbrand @ 2024-03-25 13:41 UTC
  To: linux-kernel
  Cc: linux-mm, David Hildenbrand, Andrew Morton, Mike Rapoport,
	Miklos Szeredi, Lorenzo Stoakes, xingwei lee, yue sun

folio_is_secretmem() is currently only used during GUP-fast, and using
it in the wrong context, where concurrent truncation might happen, could
be problematic.

Nowadays, folio_fast_pin_allowed() performs similar checks during
GUP-fast and contains a lot of careful handling (READ_ONCE()), sanity
checks (lockdep_assert_irqs_disabled()) and helpful comments on how
this handling is safe and correct.

So let's merge folio_is_secretmem() into folio_fast_pin_allowed(), still
avoiding checking the actual mapping unless really required.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 include/linux/secretmem.h | 21 ++-------------------
 mm/gup.c                  | 33 +++++++++++++++++++++------------
 2 files changed, 23 insertions(+), 31 deletions(-)

diff --git a/include/linux/secretmem.h b/include/linux/secretmem.h
index 6996f1f53f14..e918f96881f5 100644
--- a/include/linux/secretmem.h
+++ b/include/linux/secretmem.h
@@ -6,25 +6,8 @@
 
 extern const struct address_space_operations secretmem_aops;
 
-static inline bool folio_is_secretmem(struct folio *folio)
+static inline bool secretmem_mapping(struct address_space *mapping)
 {
-	struct address_space *mapping;
-
-	/*
-	 * Using folio_mapping() is quite slow because of the actual call
-	 * instruction.
-	 * We know that secretmem pages are not compound and LRU so we can
-	 * save a couple of cycles here.
-	 */
-	if (folio_test_large(folio) || folio_test_lru(folio))
-		return false;
-
-	mapping = (struct address_space *)
-		((unsigned long)folio->mapping & ~PAGE_MAPPING_FLAGS);
-
-	if (!mapping || mapping != folio->mapping)
-		return false;
-
 	return mapping->a_ops == &secretmem_aops;
 }
 
@@ -38,7 +21,7 @@ static inline bool vma_is_secretmem(struct vm_area_struct *vma)
 	return false;
 }
 
-static inline bool folio_is_secretmem(struct folio *folio)
+static inline bool secretmem_mapping(struct address_space *mapping)
 {
 	return false;
 }
diff --git a/mm/gup.c b/mm/gup.c
index e7510b6ce765..69d8bc8e4451 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2472,6 +2472,8 @@ EXPORT_SYMBOL(get_user_pages_unlocked);
  * This call assumes the caller has pinned the folio, that the lowest page table
  * level still points to this folio, and that interrupts have been disabled.
  *
+ * GUP-fast must reject all secretmem folios.
+ *
  * Writing to pinned file-backed dirty tracked folios is inherently problematic
  * (see comment describing the writable_file_mapping_allowed() function). We
  * therefore try to avoid the most egregious case of a long-term mapping doing
@@ -2484,22 +2486,32 @@ EXPORT_SYMBOL(get_user_pages_unlocked);
 static bool folio_fast_pin_allowed(struct folio *folio, unsigned int flags)
 {
 	struct address_space *mapping;
+	bool check_secretmem = false;
+	bool reject_file_backed = false;
 	unsigned long mapping_flags;
 
 	/*
 	 * If we aren't pinning then no problematic write can occur. A long term
 	 * pin is the most egregious case so this is the one we disallow.
 	 */
-	if ((flags & (FOLL_PIN | FOLL_LONGTERM | FOLL_WRITE)) !=
+	if ((flags & (FOLL_PIN | FOLL_LONGTERM | FOLL_WRITE)) ==
 	    (FOLL_PIN | FOLL_LONGTERM | FOLL_WRITE))
-		return true;
+		reject_file_backed = true;
+
+	/* We hold a folio reference, so we can safely access folio fields. */
 
-	/* The folio is pinned, so we can safely access folio fields. */
+	/* secretmem folios are only order-0 folios and never LRU folios. */
+	if (IS_ENABLED(CONFIG_SECRETMEM) && !folio_test_large(folio) &&
+	    !folio_test_lru(folio))
+		check_secretmem = true;
+
+	if (!reject_file_backed && !check_secretmem)
+		return true;
 
 	if (WARN_ON_ONCE(folio_test_slab(folio)))
 		return false;
 
-	/* hugetlb mappings do not require dirty-tracking. */
+	/* hugetlb neither requires dirty-tracking nor can be secretmem. */
 	if (folio_test_hugetlb(folio))
 		return true;
 
@@ -2535,10 +2547,12 @@ static bool folio_fast_pin_allowed(struct folio *folio, unsigned int flags)
 
 	/*
 	 * At this point, we know the mapping is non-null and points to an
-	 * address_space object. The only remaining whitelisted file system is
-	 * shmem.
+	 * address_space object.
 	 */
-	return shmem_mapping(mapping);
+	if (check_secretmem && secretmem_mapping(mapping))
+		return false;
+	/* The only remaining allowed file system is shmem. */
+	return !reject_file_backed || shmem_mapping(mapping);
 }
 
 static void __maybe_unused undo_dev_pagemap(int *nr, int nr_start,
@@ -2624,11 +2638,6 @@ static int gup_pte_range(pmd_t pmd, pmd_t *pmdp, unsigned long addr,
 		if (!folio)
 			goto pte_unmap;
 
-		if (unlikely(folio_is_secretmem(folio))) {
-			gup_put_folio(folio, 1, flags);
-			goto pte_unmap;
-		}
-
 		if (unlikely(pmd_val(pmd) != pmd_val(*pmdp)) ||
 		    unlikely(pte_val(pte) != pte_val(ptep_get(ptep)))) {
 			gup_put_folio(folio, 1, flags);
-- 
2.43.2




* Re: [PATCH v1 3/3] mm: merge folio_is_secretmem() into folio_fast_pin_allowed()
  2024-03-25 13:41 ` [PATCH v1 3/3] mm: merge folio_is_secretmem() into folio_fast_pin_allowed() David Hildenbrand
@ 2024-03-26  6:30   ` Mike Rapoport
  2024-03-26  8:40     ` David Hildenbrand
  0 siblings, 1 reply; 11+ messages in thread
From: Mike Rapoport @ 2024-03-26  6:30 UTC
  To: David Hildenbrand
  Cc: linux-kernel, linux-mm, Andrew Morton, Miklos Szeredi,
	Lorenzo Stoakes, xingwei lee, yue sun

On Mon, Mar 25, 2024 at 02:41:14PM +0100, David Hildenbrand wrote:
> folio_is_secretmem() is currently only used during GUP-fast, and using
> it in the wrong context, where concurrent truncation might happen, could
> be problematic.
> 
> Nowadays, folio_fast_pin_allowed() performs similar checks during
> GUP-fast and contains a lot of careful handling (READ_ONCE()), sanity
> checks (lockdep_assert_irqs_disabled()) and helpful comments on how
> this handling is safe and correct.
> 
> So let's merge folio_is_secretmem() into folio_fast_pin_allowed(), still
> avoiding checking the actual mapping unless really required.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>

A few comments below, no strong feelings about them.

> ---
>  include/linux/secretmem.h | 21 ++-------------------
>  mm/gup.c                  | 33 +++++++++++++++++++++------------
>  2 files changed, 23 insertions(+), 31 deletions(-)
> 
> diff --git a/include/linux/secretmem.h b/include/linux/secretmem.h
> index 6996f1f53f14..e918f96881f5 100644
> --- a/include/linux/secretmem.h
> +++ b/include/linux/secretmem.h
> @@ -6,25 +6,8 @@
>  
>  extern const struct address_space_operations secretmem_aops;
>  
> -static inline bool folio_is_secretmem(struct folio *folio)
> +static inline bool secretmem_mapping(struct address_space *mapping)
>  {
> -	struct address_space *mapping;
> -
> -	/*
> -	 * Using folio_mapping() is quite slow because of the actual call
> -	 * instruction.
> -	 * We know that secretmem pages are not compound and LRU so we can
> -	 * save a couple of cycles here.
> -	 */
> -	if (folio_test_large(folio) || folio_test_lru(folio))
> -		return false;
> -
> -	mapping = (struct address_space *)
> -		((unsigned long)folio->mapping & ~PAGE_MAPPING_FLAGS);
> -
> -	if (!mapping || mapping != folio->mapping)
> -		return false;
> -
>  	return mapping->a_ops == &secretmem_aops;
>  }
>  
> @@ -38,7 +21,7 @@ static inline bool vma_is_secretmem(struct vm_area_struct *vma)
>  	return false;
>  }
>  
> -static inline bool folio_is_secretmem(struct folio *folio)
> +static inline bool secretmem_mapping(struct address_space *mapping)
>  {
>  	return false;
>  }
> diff --git a/mm/gup.c b/mm/gup.c
> index e7510b6ce765..69d8bc8e4451 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -2472,6 +2472,8 @@ EXPORT_SYMBOL(get_user_pages_unlocked);
>   * This call assumes the caller has pinned the folio, that the lowest page table
>   * level still points to this folio, and that interrupts have been disabled.
>   *
> + * GUP-fast must reject all secretmem folios.
> + *
>   * Writing to pinned file-backed dirty tracked folios is inherently problematic
>   * (see comment describing the writable_file_mapping_allowed() function). We
>   * therefore try to avoid the most egregious case of a long-term mapping doing
> @@ -2484,22 +2486,32 @@ EXPORT_SYMBOL(get_user_pages_unlocked);
>  static bool folio_fast_pin_allowed(struct folio *folio, unsigned int flags)

Now that this function checks for GUP in general, maybe it's worth renaming it
to, say, folio_fast_gup_allowed().

>  {
>  	struct address_space *mapping;
> +	bool check_secretmem = false;
> +	bool reject_file_backed = false;
>  	unsigned long mapping_flags;
>  
>  	/*
>  	 * If we aren't pinning then no problematic write can occur. A long term
>  	 * pin is the most egregious case so this is the one we disallow.
>  	 */
> -	if ((flags & (FOLL_PIN | FOLL_LONGTERM | FOLL_WRITE)) !=
> +	if ((flags & (FOLL_PIN | FOLL_LONGTERM | FOLL_WRITE)) ==
>  	    (FOLL_PIN | FOLL_LONGTERM | FOLL_WRITE))
> -		return true;
> +		reject_file_backed = true;
> +
> +	/* We hold a folio reference, so we can safely access folio fields. */
>  
> -	/* The folio is pinned, so we can safely access folio fields. */
> +	/* secretmem folios are only order-0 folios and never LRU folios. */

Nit:                           ^ always

> +	if (IS_ENABLED(CONFIG_SECRETMEM) && !folio_test_large(folio) &&
> +	    !folio_test_lru(folio))
> +		check_secretmem = true;
> +
> +	if (!reject_file_backed && !check_secretmem)
> +		return true;
>  
>  	if (WARN_ON_ONCE(folio_test_slab(folio)))
>  		return false;
>  
> -	/* hugetlb mappings do not require dirty-tracking. */
> +	/* hugetlb neither requires dirty-tracking nor can be secretmem. */
>  	if (folio_test_hugetlb(folio))
>  		return true;
>  
> @@ -2535,10 +2547,12 @@ static bool folio_fast_pin_allowed(struct folio *folio, unsigned int flags)
>  
>  	/*
>  	 * At this point, we know the mapping is non-null and points to an
> -	 * address_space object. The only remaining whitelisted file system is
> -	 * shmem.
> +	 * address_space object.
>  	 */
> -	return shmem_mapping(mapping);
> +	if (check_secretmem && secretmem_mapping(mapping))
> +		return false;
> +	/* The only remaining allowed file system is shmem. */
> +	return !reject_file_backed || shmem_mapping(mapping);
>  }
>  
>  static void __maybe_unused undo_dev_pagemap(int *nr, int nr_start,
> @@ -2624,11 +2638,6 @@ static int gup_pte_range(pmd_t pmd, pmd_t *pmdp, unsigned long addr,
>  		if (!folio)
>  			goto pte_unmap;
>  
> -		if (unlikely(folio_is_secretmem(folio))) {
> -			gup_put_folio(folio, 1, flags);
> -			goto pte_unmap;
> -		}
> -
>  		if (unlikely(pmd_val(pmd) != pmd_val(*pmdp)) ||
>  		    unlikely(pte_val(pte) != pte_val(ptep_get(ptep)))) {
>  			gup_put_folio(folio, 1, flags);
> -- 
> 2.43.2
> 

-- 
Sincerely yours,
Mike.
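The control flow of the merged folio_fast_pin_allowed() in the diff above can be summarized with a small userspace model. This is a sketch under stated assumptions, not kernel code. The MODEL_* flag values, struct model_folio and the enum describing what backs a folio are hypothetical stand-ins. The model also omits the slab warning, the IS_ENABLED(CONFIG_SECRETMEM) test, the truncated-mapping case and the movable/KSM distinctions of the real function. It shows why the two booleans exist: the mapping is examined only when one of the two checks could actually fail.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for FOLL_WRITE/FOLL_PIN/FOLL_LONGTERM; values are arbitrary. */
#define MODEL_FOLL_WRITE    (1u << 0)
#define MODEL_FOLL_PIN      (1u << 1)
#define MODEL_FOLL_LONGTERM (1u << 2)
#define MODEL_FOLL_PROBLEM  (MODEL_FOLL_PIN | MODEL_FOLL_LONGTERM | MODEL_FOLL_WRITE)

/* What backs the folio; the real code derives this from folio->mapping. */
enum model_backing {
	MODEL_ANON,
	MODEL_HUGETLB,
	MODEL_SHMEM,
	MODEL_SECRETMEM,
	MODEL_OTHER_FILE,
};

struct model_folio {
	bool large; /* models folio_test_large() */
	bool lru;   /* models folio_test_lru() */
	enum model_backing backing;
};

/* Counts how often the final address_space comparisons are reached. */
static int model_mapping_checks;

static bool model_gup_fast_folio_allowed(const struct model_folio *folio,
					 unsigned int flags)
{
	/* Only a writable long-term pin must avoid dirty-tracked file folios. */
	bool reject_file_backed =
		(flags & MODEL_FOLL_PROBLEM) == MODEL_FOLL_PROBLEM;
	/* Per the patch, secretmem folios are always order-0 and never LRU. */
	bool check_secretmem = !folio->large && !folio->lru;

	/* Neither check can fail: skip looking at the mapping entirely. */
	if (!reject_file_backed && !check_secretmem)
		return true;

	/* hugetlb neither requires dirty-tracking nor can be secretmem. */
	if (folio->backing == MODEL_HUGETLB)
		return true;

	/* In this model, anonymous folios pass both checks. */
	if (folio->backing == MODEL_ANON)
		return true;

	model_mapping_checks++;
	if (check_secretmem && folio->backing == MODEL_SECRETMEM)
		return false;
	/* The only remaining allowed file system is shmem. */
	return !reject_file_backed || folio->backing == MODEL_SHMEM;
}
```

For example, a large LRU file folio requested with only the pin flag is allowed without ever reaching the mapping comparison. An order-0 secretmem folio is rejected even for a plain lookup without any FOLL_* flags, matching "GUP-fast must reject all secretmem folios."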



* Re: [PATCH v1 3/3] mm: merge folio_is_secretmem() into folio_fast_pin_allowed()
  2024-03-26  6:30   ` Mike Rapoport
@ 2024-03-26  8:40     ` David Hildenbrand
  0 siblings, 0 replies; 11+ messages in thread
From: David Hildenbrand @ 2024-03-26  8:40 UTC (permalink / raw)
  To: Mike Rapoport
  Cc: linux-kernel, linux-mm, Andrew Morton, Miklos Szeredi,
	Lorenzo Stoakes, xingwei lee, yue sun

On 26.03.24 07:30, Mike Rapoport wrote:
> On Mon, Mar 25, 2024 at 02:41:14PM +0100, David Hildenbrand wrote:
>> folio_is_secretmem() is currently only used during GUP-fast, and using
>> it in the wrong context, where concurrent truncation might happen, could be
>> problematic.
>>
>> Nowadays, folio_fast_pin_allowed() performs similar checks during
>> GUP-fast and contains a lot of careful handling -- READ_ONCE( -- ), sanity

Re-reading my stuff ...

s/( -- )/() --/

>> checks -- lockdep_assert_irqs_disabled() -- and helpful comments on how
>> this handling is safe and correct.
>>
>> So let's merge folio_is_secretmem() into folio_fast_pin_allowed(), still
>> avoiding checking the actual mapping only if really required.

s/avoiding//
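The "careful handling" mentioned in the commit message is easiest to see in how folio->mapping is read. The removed folio_is_secretmem(), quoted further down, loads folio->mapping twice with plain accesses. folio_fast_pin_allowed() instead takes one READ_ONCE() snapshot and derives everything from that local copy. The following is a minimal userspace sketch of that single-snapshot pattern. It assumes C11 atomics as a stand-in for READ_ONCE(), and the MODEL_* tag values (modeled on PAGE_MAPPING_ANON and PAGE_MAPPING_MOVABLE), the model types and both helpers are hypothetical. It does not capture why dereferencing the snapshot is safe in the kernel; the sanity checks the commit message mentions, such as lockdep_assert_irqs_disabled(), guard assumptions this model leaves out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed low-bit tags, modeled on PAGE_MAPPING_ANON/PAGE_MAPPING_MOVABLE. */
#define MODEL_MAPPING_ANON    0x1UL
#define MODEL_MAPPING_MOVABLE 0x2UL
#define MODEL_MAPPING_FLAGS   (MODEL_MAPPING_ANON | MODEL_MAPPING_MOVABLE)

struct model_aops { int id; };
struct model_mapping { const struct model_aops *a_ops; };

/* folio->mapping as a tagged word that a concurrent truncation may clear. */
struct model_folio { _Atomic uintptr_t mapping; };

static const struct model_aops model_secretmem_aops = { 1 };
static const struct model_aops model_shmem_aops = { 2 };

/*
 * Take exactly one snapshot of the mapping word (the role READ_ONCE()
 * plays in the kernel) and run every check on that local copy.
 * Returns NULL if the folio was truncated or the word carries tag bits.
 */
static struct model_mapping *model_file_mapping(struct model_folio *folio)
{
	uintptr_t word = atomic_load_explicit(&folio->mapping,
					      memory_order_relaxed);

	if (!word || (word & MODEL_MAPPING_FLAGS))
		return NULL;
	return (struct model_mapping *)word;
}

static bool model_is_secretmem(struct model_folio *folio)
{
	struct model_mapping *mapping = model_file_mapping(folio);

	/* Never re-read folio->mapping here: it may have been cleared meanwhile. */
	return mapping && mapping->a_ops == &model_secretmem_aops;
}
```

Because every check operates on the single local snapshot, the NULL test, the tag test and the a_ops comparison always agree about which mapping they examined, even if folio->mapping changes concurrently.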

>>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
> 
> Reviewed-by: Mike Rapoport (IBM) <rppt@kernel.org>
> 

Thanks!

> A few comments below, no strong feelings about them.
> 
>> ---
>>   include/linux/secretmem.h | 21 ++-------------------
>>   mm/gup.c                  | 33 +++++++++++++++++++++------------
>>   2 files changed, 23 insertions(+), 31 deletions(-)
>>
>> diff --git a/include/linux/secretmem.h b/include/linux/secretmem.h
>> index 6996f1f53f14..e918f96881f5 100644
>> --- a/include/linux/secretmem.h
>> +++ b/include/linux/secretmem.h
>> @@ -6,25 +6,8 @@
>>   
>>   extern const struct address_space_operations secretmem_aops;
>>   
>> -static inline bool folio_is_secretmem(struct folio *folio)
>> +static inline bool secretmem_mapping(struct address_space *mapping)
>>   {
>> -	struct address_space *mapping;
>> -
>> -	/*
>> -	 * Using folio_mapping() is quite slow because of the actual call
>> -	 * instruction.
>> -	 * We know that secretmem pages are not compound and LRU so we can
>> -	 * save a couple of cycles here.
>> -	 */
>> -	if (folio_test_large(folio) || folio_test_lru(folio))
>> -		return false;
>> -
>> -	mapping = (struct address_space *)
>> -		((unsigned long)folio->mapping & ~PAGE_MAPPING_FLAGS);
>> -
>> -	if (!mapping || mapping != folio->mapping)
>> -		return false;
>> -
>>   	return mapping->a_ops == &secretmem_aops;
>>   }
>>   
>> @@ -38,7 +21,7 @@ static inline bool vma_is_secretmem(struct vm_area_struct *vma)
>>   	return false;
>>   }
>>   
>> -static inline bool folio_is_secretmem(struct folio *folio)
>> +static inline bool secretmem_mapping(struct address_space *mapping)
>>   {
>>   	return false;
>>   }
>> diff --git a/mm/gup.c b/mm/gup.c
>> index e7510b6ce765..69d8bc8e4451 100644
>> --- a/mm/gup.c
>> +++ b/mm/gup.c
>> @@ -2472,6 +2472,8 @@ EXPORT_SYMBOL(get_user_pages_unlocked);
>>    * This call assumes the caller has pinned the folio, that the lowest page table
>>    * level still points to this folio, and that interrupts have been disabled.
>>    *
>> + * GUP-fast must reject all secretmem folios.
>> + *
>>    * Writing to pinned file-backed dirty tracked folios is inherently problematic
>>    * (see comment describing the writable_file_mapping_allowed() function). We
>>    * therefore try to avoid the most egregious case of a long-term mapping doing
>> @@ -2484,22 +2486,32 @@ EXPORT_SYMBOL(get_user_pages_unlocked);
>>   static bool folio_fast_pin_allowed(struct folio *folio, unsigned int flags)
> 
> Now that this function checks for GUP in general, maybe it's worth
> renaming it to, say, folio_fast_gup_allowed.

Had exactly the same thought, so I'll do it!

Not sure about "fast gup" vs. "gup fast" vs. "lockless gup"; it's all
inconsistent, and we should likely clean that up.

Likely, we should just prefix all relevant functions with "gup_fast". 
I'll call this "gup_fast_folio_allowed" for now.

The first description of the function becomes: "Used in the GUP-fast 
path to determine whether GUP is permitted to work on a specific folio."

> 
>>   {
>>   	struct address_space *mapping;
>> +	bool check_secretmem = false;
>> +	bool reject_file_backed = false;
>>   	unsigned long mapping_flags;
>>   
>>   	/*
>>   	 * If we aren't pinning then no problematic write can occur. A long term
>>   	 * pin is the most egregious case so this is the one we disallow.
>>   	 */
>> -	if ((flags & (FOLL_PIN | FOLL_LONGTERM | FOLL_WRITE)) !=
>> +	if ((flags & (FOLL_PIN | FOLL_LONGTERM | FOLL_WRITE)) ==
>>   	    (FOLL_PIN | FOLL_LONGTERM | FOLL_WRITE))
>> -		return true;
>> +		reject_file_backed = true;
>> +
>> +	/* We hold a folio reference, so we can safely access folio fields. */
>>   
>> -	/* The folio is pinned, so we can safely access folio fields. */
>> +	/* secretmem folios are only order-0 folios and never LRU folios. */
> 
> Nit:                           ^ always

ack


Thanks for the review!

-- 
Cheers,

David / dhildenb





Thread overview: 11+ messages
2024-03-25 13:41 [PATCH v1 0/3] mm/secretmem: one fix and one refactoring David Hildenbrand
2024-03-25 13:41 ` [PATCH v1 1/3] mm/secretmem: fix GUP-fast succeeding on secretmem folios David Hildenbrand
2024-03-25 18:30   ` Andrew Morton
2024-03-26 13:23     ` David Hildenbrand
2024-03-25 13:41 ` [PATCH v1 2/3] selftests/memfd_secret: add vmsplice() test David Hildenbrand
2024-03-26  6:17   ` Mike Rapoport
2024-03-26 12:32     ` David Hildenbrand
2024-03-26 13:11       ` David Hildenbrand
2024-03-25 13:41 ` [PATCH v1 3/3] mm: merge folio_is_secretmem() into folio_fast_pin_allowed() David Hildenbrand
2024-03-26  6:30   ` Mike Rapoport
2024-03-26  8:40     ` David Hildenbrand
