* [PATCH v2 bpf-next 1/2] bpf: unify VM_WRITE vs VM_MAYWRITE use in BPF map mmaping logic
@ 2025-01-29 1:22 Andrii Nakryiko
2025-01-29 1:22 ` [PATCH v2 bpf-next 2/2] bpf: avoid holding freeze_mutex during mmap operation Andrii Nakryiko
2025-01-29 18:10 ` [PATCH v2 bpf-next 1/2] bpf: unify VM_WRITE vs VM_MAYWRITE use in BPF map mmaping logic patchwork-bot+netdevbpf
0 siblings, 2 replies; 3+ messages in thread
From: Andrii Nakryiko @ 2025-01-29 1:22 UTC (permalink / raw)
To: bpf, ast, daniel, martin.lau
Cc: andrii, kernel-team, Jann Horn, Suren Baghdasaryan, Shakeel Butt
For all BPF maps we ensure that VM_MAYWRITE is cleared when
memory-mapping BPF map contents as initially read-only VMA. This is
because in some cases BPF verifier relies on the underlying data to not
be modified afterwards by user space, so once something is mapped
read-only, it shouldn't be re-mmap'ed as read-write.
As such, it's not necessary to check VM_MAYWRITE in bpf_map_mmap() and
map->ops->map_mmap() callbacks: VM_WRITE should be consistently set for
read-write mappings, and if VM_WRITE is not set, there is no way for
user space to upgrade read-only mapping to read-write one.
This patch cleans up this VM_WRITE vs VM_MAYWRITE handling within
bpf_map_mmap(), which is an entry point for any BPF map mmap()-ing
logic. We also drop unnecessary sanitization of VM_MAYWRITE in BPF
ringbuf's map_mmap() callback implementation, as it is already performed
by common code in bpf_map_mmap().
Note, though, that in bpf_map_mmap_{open,close}() callbacks we can't
drop VM_MAYWRITE use, because it's possible (and is outside of
subsystem's control) to have initially read-write memory mapping, which
is subsequently dropped to read-only by user space through mprotect().
In such case, from BPF verifier POV it's read-write data throughout the
lifetime of BPF map, and is counted as "active writer".
But its VMAs will start out as VM_WRITE|VM_MAYWRITE, then mprotect() can
change it to just VM_MAYWRITE (and no VM_WRITE), so when its finally
munmap()'ed and bpf_map_mmap_close() is called, vm_flags will be just
VM_MAYWRITE, but we still need to decrement active writer count with
bpf_map_write_active_dec() as it's still considered to be a read-write
mapping by the rest of BPF subsystem.
Similar reasoning applies to bpf_map_mmap_open(), which is called
whenever mmap(), munmap(), and/or mprotect() forces mm subsystem to
split original VMA into multiple discontiguous VMAs.
Memory-mapping handling is a bit tricky, yes.
Cc: Jann Horn <jannh@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
kernel/bpf/ringbuf.c | 4 ----
kernel/bpf/syscall.c | 10 ++++++++--
2 files changed, 8 insertions(+), 6 deletions(-)
diff --git a/kernel/bpf/ringbuf.c b/kernel/bpf/ringbuf.c
index e1cfe890e0be..1499d8caa9a3 100644
--- a/kernel/bpf/ringbuf.c
+++ b/kernel/bpf/ringbuf.c
@@ -268,8 +268,6 @@ static int ringbuf_map_mmap_kern(struct bpf_map *map, struct vm_area_struct *vma
/* allow writable mapping for the consumer_pos only */
if (vma->vm_pgoff != 0 || vma->vm_end - vma->vm_start != PAGE_SIZE)
return -EPERM;
- } else {
- vm_flags_clear(vma, VM_MAYWRITE);
}
/* remap_vmalloc_range() checks size and offset constraints */
return remap_vmalloc_range(vma, rb_map->rb,
@@ -289,8 +287,6 @@ static int ringbuf_map_mmap_user(struct bpf_map *map, struct vm_area_struct *vma
* position, and the ring buffer data itself.
*/
return -EPERM;
- } else {
- vm_flags_clear(vma, VM_MAYWRITE);
}
/* remap_vmalloc_range() checks size and offset constraints */
return remap_vmalloc_range(vma, rb_map->rb, vma->vm_pgoff + RINGBUF_PGOFF);
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 0daf098e3207..9bec3dce421f 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1065,15 +1065,21 @@ static int bpf_map_mmap(struct file *filp, struct vm_area_struct *vma)
vma->vm_ops = &bpf_map_default_vmops;
vma->vm_private_data = map;
vm_flags_clear(vma, VM_MAYEXEC);
+ /* If mapping is read-only, then disallow potentially re-mapping with
+ * PROT_WRITE by dropping VM_MAYWRITE flag. This VM_MAYWRITE clearing
+ * means that as far as BPF map's memory-mapped VMAs are concerned,
+ * VM_WRITE and VM_MAYWRITE and equivalent, if one of them is set,
+ * both should be set, so we can forget about VM_MAYWRITE and always
+ * check just VM_WRITE
+ */
if (!(vma->vm_flags & VM_WRITE))
- /* disallow re-mapping with PROT_WRITE */
vm_flags_clear(vma, VM_MAYWRITE);
err = map->ops->map_mmap(map, vma);
if (err)
goto out;
- if (vma->vm_flags & VM_MAYWRITE)
+ if (vma->vm_flags & VM_WRITE)
bpf_map_write_active_inc(map);
out:
mutex_unlock(&map->freeze_mutex);
--
2.43.5
^ permalink raw reply related [flat|nested] 3+ messages in thread
* [PATCH v2 bpf-next 2/2] bpf: avoid holding freeze_mutex during mmap operation
2025-01-29 1:22 [PATCH v2 bpf-next 1/2] bpf: unify VM_WRITE vs VM_MAYWRITE use in BPF map mmaping logic Andrii Nakryiko
@ 2025-01-29 1:22 ` Andrii Nakryiko
2025-01-29 18:10 ` [PATCH v2 bpf-next 1/2] bpf: unify VM_WRITE vs VM_MAYWRITE use in BPF map mmaping logic patchwork-bot+netdevbpf
1 sibling, 0 replies; 3+ messages in thread
From: Andrii Nakryiko @ 2025-01-29 1:22 UTC (permalink / raw)
To: bpf, ast, daniel, martin.lau
Cc: andrii, kernel-team, Jann Horn, Suren Baghdasaryan, Shakeel Butt,
syzbot+4dc041c686b7c816a71e
We use map->freeze_mutex to prevent races between map_freeze() and
memory mapping BPF map contents with writable permissions. The way we
naively do this means we'll hold freeze_mutex for entire duration of all
the mm and VMA manipulations, which is completely unnecessary. This can
potentially also lead to deadlocks, as reported by syzbot in [0].
So, instead, hold freeze_mutex only during writeability checks, bump
(proactively) "write active" count for the map, unlock the mutex and
proceed with mmap logic. And only if something went wrong during mmap
logic, then undo that "write active" counter increment.
[0] https://lore.kernel.org/bpf/678dcbc9.050a0220.303755.0066.GAE@google.com/
Fixes: fc9702273e2e ("bpf: Add mmap() support for BPF_MAP_TYPE_ARRAY")
Reported-by: syzbot+4dc041c686b7c816a71e@syzkaller.appspotmail.com
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
kernel/bpf/syscall.c | 17 ++++++++++-------
1 file changed, 10 insertions(+), 7 deletions(-)
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 9bec3dce421f..14d6e99459d3 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1035,7 +1035,7 @@ static const struct vm_operations_struct bpf_map_default_vmops = {
static int bpf_map_mmap(struct file *filp, struct vm_area_struct *vma)
{
struct bpf_map *map = filp->private_data;
- int err;
+ int err = 0;
if (!map->ops->map_mmap || !IS_ERR_OR_NULL(map->record))
return -ENOTSUPP;
@@ -1059,7 +1059,12 @@ static int bpf_map_mmap(struct file *filp, struct vm_area_struct *vma)
err = -EACCES;
goto out;
}
+ bpf_map_write_active_inc(map);
}
+out:
+ mutex_unlock(&map->freeze_mutex);
+ if (err)
+ return err;
/* set default open/close callbacks */
vma->vm_ops = &bpf_map_default_vmops;
@@ -1076,13 +1081,11 @@ static int bpf_map_mmap(struct file *filp, struct vm_area_struct *vma)
vm_flags_clear(vma, VM_MAYWRITE);
err = map->ops->map_mmap(map, vma);
- if (err)
- goto out;
+ if (err) {
+ if (vma->vm_flags & VM_WRITE)
+ bpf_map_write_active_dec(map);
+ }
- if (vma->vm_flags & VM_WRITE)
- bpf_map_write_active_inc(map);
-out:
- mutex_unlock(&map->freeze_mutex);
return err;
}
--
2.43.5
^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: [PATCH v2 bpf-next 1/2] bpf: unify VM_WRITE vs VM_MAYWRITE use in BPF map mmaping logic
2025-01-29 1:22 [PATCH v2 bpf-next 1/2] bpf: unify VM_WRITE vs VM_MAYWRITE use in BPF map mmaping logic Andrii Nakryiko
2025-01-29 1:22 ` [PATCH v2 bpf-next 2/2] bpf: avoid holding freeze_mutex during mmap operation Andrii Nakryiko
@ 2025-01-29 18:10 ` patchwork-bot+netdevbpf
1 sibling, 0 replies; 3+ messages in thread
From: patchwork-bot+netdevbpf @ 2025-01-29 18:10 UTC (permalink / raw)
To: Andrii Nakryiko
Cc: bpf, ast, daniel, martin.lau, kernel-team, jannh, surenb,
shakeel.butt
Hello:
This series was applied to bpf/bpf.git (master)
by Alexei Starovoitov <ast@kernel.org>:
On Tue, 28 Jan 2025 17:22:45 -0800 you wrote:
> For all BPF maps we ensure that VM_MAYWRITE is cleared when
> memory-mapping BPF map contents as initially read-only VMA. This is
> because in some cases BPF verifier relies on the underlying data to not
> be modified afterwards by user space, so once something is mapped
> read-only, it shouldn't be re-mmap'ed as read-write.
>
> As such, it's not necessary to check VM_MAYWRITE in bpf_map_mmap() and
> map->ops->map_mmap() callbacks: VM_WRITE should be consistently set for
> read-write mappings, and if VM_WRITE is not set, there is no way for
> user space to upgrade read-only mapping to read-write one.
>
> [...]
Here is the summary with links:
- [v2,bpf-next,1/2] bpf: unify VM_WRITE vs VM_MAYWRITE use in BPF map mmaping logic
https://git.kernel.org/bpf/bpf/c/98671a0fd1f1
- [v2,bpf-next,2/2] bpf: avoid holding freeze_mutex during mmap operation
https://git.kernel.org/bpf/bpf/c/bc27c52eea18
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2025-01-29 18:10 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-01-29 1:22 [PATCH v2 bpf-next 1/2] bpf: unify VM_WRITE vs VM_MAYWRITE use in BPF map mmaping logic Andrii Nakryiko
2025-01-29 1:22 ` [PATCH v2 bpf-next 2/2] bpf: avoid holding freeze_mutex during mmap operation Andrii Nakryiko
2025-01-29 18:10 ` [PATCH v2 bpf-next 1/2] bpf: unify VM_WRITE vs VM_MAYWRITE use in BPF map mmaping logic patchwork-bot+netdevbpf
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox