* [PATCH 02/39] introduce fd_file(), convert all accessors to it.
2024-07-30 5:15 ` [PATCH 01/39] memcg_write_event_control(): fix a user-triggerable oops viro
@ 2024-07-30 5:15 ` viro
2024-08-07 9:55 ` Christian Brauner
2024-07-30 5:15 ` [PATCH 03/39] struct fd: representation change viro
` (37 subsequent siblings)
38 siblings, 1 reply; 134+ messages in thread
From: viro @ 2024-07-30 5:15 UTC (permalink / raw)
To: linux-fsdevel; +Cc: amir73il, bpf, brauner, cgroups, kvm, netdev, torvalds
From: Al Viro <viro@zeniv.linux.org.uk>
For any changes of struct fd representation we need to
turn existing accesses to fields into calls of wrappers.
Accesses to struct fd::flags are very few (3 in linux/file.h,
1 in net/socket.c, 3 in fs/overlayfs/file.c and 3 more in
explicit initializers).
Those can be dealt with in the commit converting to
new layout; accesses to struct fd::file are too many for that.
This commit converts (almost) all of f.file to
fd_file(f). It's not entirely mechanical ('file' is used as
a member name more than just in struct fd) and it does not
even attempt to distinguish the uses in pointer context from
those in boolean context; the latter will be eventually turned
into a separate helper (fd_empty()).
NOTE: mass conversion to fd_empty(), tempting as it
might be, is a bad idea; better do that piecewise in commit
that convert from fdget...() to CLASS(...).
[conflicts in fs/fhandle.c, kernel/bpf/syscall.c, mm/memcontrol.c
caught by git; fs/stat.c one got caught by git grep]
[fs/xattr.c conflict]
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
arch/alpha/kernel/osf_sys.c | 4 +-
arch/arm/kernel/sys_oabi-compat.c | 10 +-
arch/powerpc/kvm/book3s_64_vio.c | 4 +-
arch/powerpc/kvm/powerpc.c | 12 +--
arch/powerpc/platforms/cell/spu_syscalls.c | 8 +-
arch/x86/kernel/cpu/sgx/main.c | 4 +-
arch/x86/kvm/svm/sev.c | 16 +--
drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c | 8 +-
drivers/gpu/drm/drm_syncobj.c | 6 +-
drivers/infiniband/core/ucma.c | 6 +-
drivers/infiniband/core/uverbs_cmd.c | 8 +-
drivers/media/mc/mc-request.c | 6 +-
drivers/media/rc/lirc_dev.c | 8 +-
drivers/vfio/group.c | 6 +-
drivers/vfio/virqfd.c | 6 +-
drivers/virt/acrn/irqfd.c | 6 +-
drivers/xen/privcmd.c | 10 +-
fs/btrfs/ioctl.c | 4 +-
fs/coda/inode.c | 4 +-
fs/eventfd.c | 4 +-
fs/eventpoll.c | 30 +++---
fs/ext4/ioctl.c | 6 +-
fs/f2fs/file.c | 6 +-
fs/fcntl.c | 38 +++----
fs/fhandle.c | 4 +-
fs/fsopen.c | 6 +-
fs/fuse/dev.c | 6 +-
fs/ioctl.c | 30 +++---
fs/kernel_read_file.c | 4 +-
fs/locks.c | 14 +--
fs/namei.c | 10 +-
fs/namespace.c | 12 +--
fs/notify/fanotify/fanotify_user.c | 12 +--
fs/notify/inotify/inotify_user.c | 12 +--
fs/ocfs2/cluster/heartbeat.c | 6 +-
fs/open.c | 24 ++---
fs/overlayfs/file.c | 40 +++----
fs/quota/quota.c | 8 +-
fs/read_write.c | 118 ++++++++++-----------
fs/readdir.c | 20 ++--
fs/remap_range.c | 2 +-
fs/select.c | 8 +-
fs/signalfd.c | 6 +-
fs/smb/client/ioctl.c | 8 +-
fs/splice.c | 22 ++--
fs/stat.c | 8 +-
fs/statfs.c | 4 +-
fs/sync.c | 14 +--
fs/timerfd.c | 8 +-
fs/utimes.c | 4 +-
fs/xattr.c | 36 +++----
fs/xfs/xfs_exchrange.c | 4 +-
fs/xfs/xfs_handle.c | 4 +-
fs/xfs/xfs_ioctl.c | 28 ++---
include/linux/cleanup.h | 2 +-
include/linux/file.h | 6 +-
io_uring/sqpoll.c | 10 +-
ipc/mqueue.c | 50 ++++-----
kernel/bpf/bpf_inode_storage.c | 14 +--
kernel/bpf/btf.c | 6 +-
kernel/bpf/syscall.c | 42 ++++----
kernel/bpf/token.c | 10 +-
kernel/cgroup/cgroup.c | 4 +-
kernel/events/core.c | 12 +--
kernel/module/main.c | 2 +-
kernel/nsproxy.c | 12 +--
kernel/pid.c | 10 +-
kernel/signal.c | 6 +-
kernel/sys.c | 10 +-
kernel/taskstats.c | 4 +-
kernel/watch_queue.c | 4 +-
mm/fadvise.c | 4 +-
mm/filemap.c | 6 +-
mm/memcontrol-v1.c | 12 +--
mm/readahead.c | 10 +-
net/core/net_namespace.c | 6 +-
net/socket.c | 12 +--
security/integrity/ima/ima_main.c | 4 +-
security/landlock/syscalls.c | 22 ++--
security/loadpin/loadpin.c | 4 +-
sound/core/pcm_native.c | 6 +-
virt/kvm/eventfd.c | 6 +-
virt/kvm/vfio.c | 8 +-
83 files changed, 504 insertions(+), 502 deletions(-)
diff --git a/arch/alpha/kernel/osf_sys.c b/arch/alpha/kernel/osf_sys.c
index e5f881bc8288..56fea57f9642 100644
--- a/arch/alpha/kernel/osf_sys.c
+++ b/arch/alpha/kernel/osf_sys.c
@@ -160,10 +160,10 @@ SYSCALL_DEFINE4(osf_getdirentries, unsigned int, fd,
.count = count
};
- if (!arg.file)
+ if (!fd_file(arg))
return -EBADF;
- error = iterate_dir(arg.file, &buf.ctx);
+ error = iterate_dir(fd_file(arg), &buf.ctx);
if (error >= 0)
error = buf.error;
if (count != buf.count)
diff --git a/arch/arm/kernel/sys_oabi-compat.c b/arch/arm/kernel/sys_oabi-compat.c
index d00f4040a9f5..f5781ff54a5c 100644
--- a/arch/arm/kernel/sys_oabi-compat.c
+++ b/arch/arm/kernel/sys_oabi-compat.c
@@ -239,19 +239,19 @@ asmlinkage long sys_oabi_fcntl64(unsigned int fd, unsigned int cmd,
struct flock64 flock;
long err = -EBADF;
- if (!f.file)
+ if (!fd_file(f))
goto out;
switch (cmd) {
case F_GETLK64:
case F_OFD_GETLK:
- err = security_file_fcntl(f.file, cmd, arg);
+ err = security_file_fcntl(fd_file(f), cmd, arg);
if (err)
break;
err = get_oabi_flock(&flock, argp);
if (err)
break;
- err = fcntl_getlk64(f.file, cmd, &flock);
+ err = fcntl_getlk64(fd_file(f), cmd, &flock);
if (!err)
err = put_oabi_flock(&flock, argp);
break;
@@ -259,13 +259,13 @@ asmlinkage long sys_oabi_fcntl64(unsigned int fd, unsigned int cmd,
case F_SETLKW64:
case F_OFD_SETLK:
case F_OFD_SETLKW:
- err = security_file_fcntl(f.file, cmd, arg);
+ err = security_file_fcntl(fd_file(f), cmd, arg);
if (err)
break;
err = get_oabi_flock(&flock, argp);
if (err)
break;
- err = fcntl_setlk64(fd, f.file, cmd, &flock);
+ err = fcntl_setlk64(fd, fd_file(f), cmd, &flock);
break;
default:
err = sys_fcntl64(fd, cmd, arg);
diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
index 3ff3de9a52ac..34c0adb9fdbf 100644
--- a/arch/powerpc/kvm/book3s_64_vio.c
+++ b/arch/powerpc/kvm/book3s_64_vio.c
@@ -118,12 +118,12 @@ long kvm_spapr_tce_attach_iommu_group(struct kvm *kvm, int tablefd,
struct fd f;
f = fdget(tablefd);
- if (!f.file)
+ if (!fd_file(f))
return -EBADF;
rcu_read_lock();
list_for_each_entry_rcu(stt, &kvm->arch.spapr_tce_tables, list) {
- if (stt == f.file->private_data) {
+ if (stt == fd_file(f)->private_data) {
found = true;
break;
}
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 5e6c7b527677..f14329989e9a 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -1938,11 +1938,11 @@ static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu,
r = -EBADF;
f = fdget(cap->args[0]);
- if (!f.file)
+ if (!fd_file(f))
break;
r = -EPERM;
- dev = kvm_device_from_filp(f.file);
+ dev = kvm_device_from_filp(fd_file(f));
if (dev)
r = kvmppc_mpic_connect_vcpu(dev, vcpu, cap->args[1]);
@@ -1957,11 +1957,11 @@ static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu,
r = -EBADF;
f = fdget(cap->args[0]);
- if (!f.file)
+ if (!fd_file(f))
break;
r = -EPERM;
- dev = kvm_device_from_filp(f.file);
+ dev = kvm_device_from_filp(fd_file(f));
if (dev) {
if (xics_on_xive())
r = kvmppc_xive_connect_vcpu(dev, vcpu, cap->args[1]);
@@ -1980,7 +1980,7 @@ static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu,
r = -EBADF;
f = fdget(cap->args[0]);
- if (!f.file)
+ if (!fd_file(f))
break;
r = -ENXIO;
@@ -1990,7 +1990,7 @@ static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu,
}
r = -EPERM;
- dev = kvm_device_from_filp(f.file);
+ dev = kvm_device_from_filp(fd_file(f));
if (dev)
r = kvmppc_xive_native_connect_vcpu(dev, vcpu,
cap->args[1]);
diff --git a/arch/powerpc/platforms/cell/spu_syscalls.c b/arch/powerpc/platforms/cell/spu_syscalls.c
index 87ad7d563cfa..cd7d42fc12a6 100644
--- a/arch/powerpc/platforms/cell/spu_syscalls.c
+++ b/arch/powerpc/platforms/cell/spu_syscalls.c
@@ -66,8 +66,8 @@ SYSCALL_DEFINE4(spu_create, const char __user *, name, unsigned int, flags,
if (flags & SPU_CREATE_AFFINITY_SPU) {
struct fd neighbor = fdget(neighbor_fd);
ret = -EBADF;
- if (neighbor.file) {
- ret = calls->create_thread(name, flags, mode, neighbor.file);
+ if (fd_file(neighbor)) {
+ ret = calls->create_thread(name, flags, mode, fd_file(neighbor));
fdput(neighbor);
}
} else
@@ -89,8 +89,8 @@ SYSCALL_DEFINE3(spu_run,int, fd, __u32 __user *, unpc, __u32 __user *, ustatus)
ret = -EBADF;
arg = fdget(fd);
- if (arg.file) {
- ret = calls->spu_run(arg.file, unpc, ustatus);
+ if (fd_file(arg)) {
+ ret = calls->spu_run(fd_file(arg), unpc, ustatus);
fdput(arg);
}
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 27892e57c4ef..d01deb386395 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -895,10 +895,10 @@ int sgx_set_attribute(unsigned long *allowed_attributes,
{
struct fd f = fdget(attribute_fd);
- if (!f.file)
+ if (!fd_file(f))
return -EINVAL;
- if (f.file->f_op != &sgx_provision_fops) {
+ if (fd_file(f)->f_op != &sgx_provision_fops) {
fdput(f);
return -EINVAL;
}
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index a16c873b3232..0e38b5223263 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -534,10 +534,10 @@ static int __sev_issue_cmd(int fd, int id, void *data, int *error)
int ret;
f = fdget(fd);
- if (!f.file)
+ if (!fd_file(f))
return -EBADF;
- ret = sev_issue_cmd_external_user(f.file, id, data, error);
+ ret = sev_issue_cmd_external_user(fd_file(f), id, data, error);
fdput(f);
return ret;
@@ -2078,15 +2078,15 @@ int sev_vm_move_enc_context_from(struct kvm *kvm, unsigned int source_fd)
bool charged = false;
int ret;
- if (!f.file)
+ if (!fd_file(f))
return -EBADF;
- if (!file_is_kvm(f.file)) {
+ if (!file_is_kvm(fd_file(f))) {
ret = -EBADF;
goto out_fput;
}
- source_kvm = f.file->private_data;
+ source_kvm = fd_file(f)->private_data;
ret = sev_lock_two_vms(kvm, source_kvm);
if (ret)
goto out_fput;
@@ -2801,15 +2801,15 @@ int sev_vm_copy_enc_context_from(struct kvm *kvm, unsigned int source_fd)
struct kvm_sev_info *source_sev, *mirror_sev;
int ret;
- if (!f.file)
+ if (!fd_file(f))
return -EBADF;
- if (!file_is_kvm(f.file)) {
+ if (!file_is_kvm(fd_file(f))) {
ret = -EBADF;
goto e_source_fput;
}
- source_kvm = f.file->private_data;
+ source_kvm = fd_file(f)->private_data;
ret = sev_lock_two_vms(kvm, source_kvm);
if (ret)
goto e_source_fput;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c
index 863b2a34b2d6..a9298cb8d19a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c
@@ -43,10 +43,10 @@ static int amdgpu_sched_process_priority_override(struct amdgpu_device *adev,
uint32_t id;
int r;
- if (!f.file)
+ if (!fd_file(f))
return -EINVAL;
- r = amdgpu_file_to_fpriv(f.file, &fpriv);
+ r = amdgpu_file_to_fpriv(fd_file(f), &fpriv);
if (r) {
fdput(f);
return r;
@@ -72,10 +72,10 @@ static int amdgpu_sched_context_priority_override(struct amdgpu_device *adev,
struct amdgpu_ctx *ctx;
int r;
- if (!f.file)
+ if (!fd_file(f))
return -EINVAL;
- r = amdgpu_file_to_fpriv(f.file, &fpriv);
+ r = amdgpu_file_to_fpriv(fd_file(f), &fpriv);
if (r) {
fdput(f);
return r;
diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index a0e94217b511..7fb31ca3b5fc 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -715,16 +715,16 @@ static int drm_syncobj_fd_to_handle(struct drm_file *file_private,
struct fd f = fdget(fd);
int ret;
- if (!f.file)
+ if (!fd_file(f))
return -EINVAL;
- if (f.file->f_op != &drm_syncobj_file_fops) {
+ if (fd_file(f)->f_op != &drm_syncobj_file_fops) {
fdput(f);
return -EINVAL;
}
/* take a reference to put in the idr */
- syncobj = f.file->private_data;
+ syncobj = fd_file(f)->private_data;
drm_syncobj_get(syncobj);
idr_preload(GFP_KERNEL);
diff --git a/drivers/infiniband/core/ucma.c b/drivers/infiniband/core/ucma.c
index 5f5ad8faf86e..dc57d07a1f45 100644
--- a/drivers/infiniband/core/ucma.c
+++ b/drivers/infiniband/core/ucma.c
@@ -1624,13 +1624,13 @@ static ssize_t ucma_migrate_id(struct ucma_file *new_file,
/* Get current fd to protect against it being closed */
f = fdget(cmd.fd);
- if (!f.file)
+ if (!fd_file(f))
return -ENOENT;
- if (f.file->f_op != &ucma_fops) {
+ if (fd_file(f)->f_op != &ucma_fops) {
ret = -EINVAL;
goto file_put;
}
- cur_file = f.file->private_data;
+ cur_file = fd_file(f)->private_data;
/* Validate current fd and prevent destruction of id. */
ctx = ucma_get_ctx(cur_file, cmd.id);
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 1b3ea71f2c33..3f85575cf971 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -584,12 +584,12 @@ static int ib_uverbs_open_xrcd(struct uverbs_attr_bundle *attrs)
if (cmd.fd != -1) {
/* search for file descriptor */
f = fdget(cmd.fd);
- if (!f.file) {
+ if (!fd_file(f)) {
ret = -EBADF;
goto err_tree_mutex_unlock;
}
- inode = file_inode(f.file);
+ inode = file_inode(fd_file(f));
xrcd = find_xrcd(ibudev, inode);
if (!xrcd && !(cmd.oflags & O_CREAT)) {
/* no file descriptor. Need CREATE flag */
@@ -632,7 +632,7 @@ static int ib_uverbs_open_xrcd(struct uverbs_attr_bundle *attrs)
atomic_inc(&xrcd->usecnt);
}
- if (f.file)
+ if (fd_file(f))
fdput(f);
mutex_unlock(&ibudev->xrcd_tree_mutex);
@@ -648,7 +648,7 @@ static int ib_uverbs_open_xrcd(struct uverbs_attr_bundle *attrs)
uobj_alloc_abort(&obj->uobject, attrs);
err_tree_mutex_unlock:
- if (f.file)
+ if (fd_file(f))
fdput(f);
mutex_unlock(&ibudev->xrcd_tree_mutex);
diff --git a/drivers/media/mc/mc-request.c b/drivers/media/mc/mc-request.c
index addb8f2d8939..e064914c476e 100644
--- a/drivers/media/mc/mc-request.c
+++ b/drivers/media/mc/mc-request.c
@@ -254,12 +254,12 @@ media_request_get_by_fd(struct media_device *mdev, int request_fd)
return ERR_PTR(-EBADR);
f = fdget(request_fd);
- if (!f.file)
+ if (!fd_file(f))
goto err_no_req_fd;
- if (f.file->f_op != &request_fops)
+ if (fd_file(f)->f_op != &request_fops)
goto err_fput;
- req = f.file->private_data;
+ req = fd_file(f)->private_data;
if (req->mdev != mdev)
goto err_fput;
diff --git a/drivers/media/rc/lirc_dev.c b/drivers/media/rc/lirc_dev.c
index 717c441b4a86..b8dfd530fab7 100644
--- a/drivers/media/rc/lirc_dev.c
+++ b/drivers/media/rc/lirc_dev.c
@@ -820,20 +820,20 @@ struct rc_dev *rc_dev_get_from_fd(int fd, bool write)
struct lirc_fh *fh;
struct rc_dev *dev;
- if (!f.file)
+ if (!fd_file(f))
return ERR_PTR(-EBADF);
- if (f.file->f_op != &lirc_fops) {
+ if (fd_file(f)->f_op != &lirc_fops) {
fdput(f);
return ERR_PTR(-EINVAL);
}
- if (write && !(f.file->f_mode & FMODE_WRITE)) {
+ if (write && !(fd_file(f)->f_mode & FMODE_WRITE)) {
fdput(f);
return ERR_PTR(-EPERM);
}
- fh = f.file->private_data;
+ fh = fd_file(f)->private_data;
dev = fh->rc;
get_device(&dev->dev);
diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
index ded364588d29..95b336de8a17 100644
--- a/drivers/vfio/group.c
+++ b/drivers/vfio/group.c
@@ -112,7 +112,7 @@ static int vfio_group_ioctl_set_container(struct vfio_group *group,
return -EFAULT;
f = fdget(fd);
- if (!f.file)
+ if (!fd_file(f))
return -EBADF;
mutex_lock(&group->group_lock);
@@ -125,13 +125,13 @@ static int vfio_group_ioctl_set_container(struct vfio_group *group,
goto out_unlock;
}
- container = vfio_container_from_file(f.file);
+ container = vfio_container_from_file(fd_file(f));
if (container) {
ret = vfio_container_attach_group(container, group);
goto out_unlock;
}
- iommufd = iommufd_ctx_from_file(f.file);
+ iommufd = iommufd_ctx_from_file(fd_file(f));
if (!IS_ERR(iommufd)) {
if (IS_ENABLED(CONFIG_VFIO_NOIOMMU) &&
group->type == VFIO_NO_IOMMU)
diff --git a/drivers/vfio/virqfd.c b/drivers/vfio/virqfd.c
index 532269133801..d22881245e89 100644
--- a/drivers/vfio/virqfd.c
+++ b/drivers/vfio/virqfd.c
@@ -134,12 +134,12 @@ int vfio_virqfd_enable(void *opaque,
INIT_WORK(&virqfd->flush_inject, virqfd_flush_inject);
irqfd = fdget(fd);
- if (!irqfd.file) {
+ if (!fd_file(irqfd)) {
ret = -EBADF;
goto err_fd;
}
- ctx = eventfd_ctx_fileget(irqfd.file);
+ ctx = eventfd_ctx_fileget(fd_file(irqfd));
if (IS_ERR(ctx)) {
ret = PTR_ERR(ctx);
goto err_ctx;
@@ -171,7 +171,7 @@ int vfio_virqfd_enable(void *opaque,
init_waitqueue_func_entry(&virqfd->wait, virqfd_wakeup);
init_poll_funcptr(&virqfd->pt, virqfd_ptable_queue_proc);
- events = vfs_poll(irqfd.file, &virqfd->pt);
+ events = vfs_poll(fd_file(irqfd), &virqfd->pt);
/*
* Check if there was an event already pending on the eventfd
diff --git a/drivers/virt/acrn/irqfd.c b/drivers/virt/acrn/irqfd.c
index d4ad211dce7a..9994d818bb7e 100644
--- a/drivers/virt/acrn/irqfd.c
+++ b/drivers/virt/acrn/irqfd.c
@@ -125,12 +125,12 @@ static int acrn_irqfd_assign(struct acrn_vm *vm, struct acrn_irqfd *args)
INIT_WORK(&irqfd->shutdown, hsm_irqfd_shutdown_work);
f = fdget(args->fd);
- if (!f.file) {
+ if (!fd_file(f)) {
ret = -EBADF;
goto out;
}
- eventfd = eventfd_ctx_fileget(f.file);
+ eventfd = eventfd_ctx_fileget(fd_file(f));
if (IS_ERR(eventfd)) {
ret = PTR_ERR(eventfd);
goto fail;
@@ -157,7 +157,7 @@ static int acrn_irqfd_assign(struct acrn_vm *vm, struct acrn_irqfd *args)
mutex_unlock(&vm->irqfds_lock);
/* Check the pending event in this stage */
- events = vfs_poll(f.file, &irqfd->pt);
+ events = vfs_poll(fd_file(f), &irqfd->pt);
if (events & EPOLLIN)
acrn_irqfd_inject(irqfd);
diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
index 9563650dfbaf..54e4f285c0f4 100644
--- a/drivers/xen/privcmd.c
+++ b/drivers/xen/privcmd.c
@@ -959,12 +959,12 @@ static int privcmd_irqfd_assign(struct privcmd_irqfd *irqfd)
INIT_WORK(&kirqfd->shutdown, irqfd_shutdown);
f = fdget(irqfd->fd);
- if (!f.file) {
+ if (!fd_file(f)) {
ret = -EBADF;
goto error_kfree;
}
- kirqfd->eventfd = eventfd_ctx_fileget(f.file);
+ kirqfd->eventfd = eventfd_ctx_fileget(fd_file(f));
if (IS_ERR(kirqfd->eventfd)) {
ret = PTR_ERR(kirqfd->eventfd);
goto error_fd_put;
@@ -995,7 +995,7 @@ static int privcmd_irqfd_assign(struct privcmd_irqfd *irqfd)
* Check if there was an event already pending on the eventfd before we
* registered, and trigger it as if we didn't miss it.
*/
- events = vfs_poll(f.file, &kirqfd->pt);
+ events = vfs_poll(fd_file(f), &kirqfd->pt);
if (events & EPOLLIN)
irqfd_inject(kirqfd);
@@ -1345,12 +1345,12 @@ static int privcmd_ioeventfd_assign(struct privcmd_ioeventfd *ioeventfd)
return -ENOMEM;
f = fdget(ioeventfd->event_fd);
- if (!f.file) {
+ if (!fd_file(f)) {
ret = -EBADF;
goto error_kfree;
}
- kioeventfd->eventfd = eventfd_ctx_fileget(f.file);
+ kioeventfd->eventfd = eventfd_ctx_fileget(fd_file(f));
fdput(f);
if (IS_ERR(kioeventfd->eventfd)) {
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index e0a664b8a46a..32ddd3d31719 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1312,12 +1312,12 @@ static noinline int __btrfs_ioctl_snap_create(struct file *file,
} else {
struct fd src = fdget(fd);
struct inode *src_inode;
- if (!src.file) {
+ if (!fd_file(src)) {
ret = -EINVAL;
goto out_drop_write;
}
- src_inode = file_inode(src.file);
+ src_inode = file_inode(fd_file(src));
if (src_inode->i_sb != file_inode(file)->i_sb) {
btrfs_info(BTRFS_I(file_inode(file))->root->fs_info,
"Snapshot src from another FS");
diff --git a/fs/coda/inode.c b/fs/coda/inode.c
index 6898dc621011..7d56b6d1e4c3 100644
--- a/fs/coda/inode.c
+++ b/fs/coda/inode.c
@@ -127,9 +127,9 @@ static int coda_parse_fd(struct fs_context *fc, int fd)
int idx;
f = fdget(fd);
- if (!f.file)
+ if (!fd_file(f))
return -EBADF;
- inode = file_inode(f.file);
+ inode = file_inode(fd_file(f));
if (!S_ISCHR(inode->i_mode) || imajor(inode) != CODA_PSDEV_MAJOR) {
fdput(f);
return invalf(fc, "code: Not coda psdev");
diff --git a/fs/eventfd.c b/fs/eventfd.c
index 9afdb722fa92..22c934f3a080 100644
--- a/fs/eventfd.c
+++ b/fs/eventfd.c
@@ -349,9 +349,9 @@ struct eventfd_ctx *eventfd_ctx_fdget(int fd)
{
struct eventfd_ctx *ctx;
struct fd f = fdget(fd);
- if (!f.file)
+ if (!fd_file(f))
return ERR_PTR(-EBADF);
- ctx = eventfd_ctx_fileget(f.file);
+ ctx = eventfd_ctx_fileget(fd_file(f));
fdput(f);
return ctx;
}
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index f53ca4f7fced..28d1a754cf33 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -2266,17 +2266,17 @@ int do_epoll_ctl(int epfd, int op, int fd, struct epoll_event *epds,
error = -EBADF;
f = fdget(epfd);
- if (!f.file)
+ if (!fd_file(f))
goto error_return;
/* Get the "struct file *" for the target file */
tf = fdget(fd);
- if (!tf.file)
+ if (!fd_file(tf))
goto error_fput;
/* The target file descriptor must support poll */
error = -EPERM;
- if (!file_can_poll(tf.file))
+ if (!file_can_poll(fd_file(tf)))
goto error_tgt_fput;
/* Check if EPOLLWAKEUP is allowed */
@@ -2289,7 +2289,7 @@ int do_epoll_ctl(int epfd, int op, int fd, struct epoll_event *epds,
* adding an epoll file descriptor inside itself.
*/
error = -EINVAL;
- if (f.file == tf.file || !is_file_epoll(f.file))
+ if (fd_file(f) == fd_file(tf) || !is_file_epoll(fd_file(f)))
goto error_tgt_fput;
/*
@@ -2300,7 +2300,7 @@ int do_epoll_ctl(int epfd, int op, int fd, struct epoll_event *epds,
if (ep_op_has_event(op) && (epds->events & EPOLLEXCLUSIVE)) {
if (op == EPOLL_CTL_MOD)
goto error_tgt_fput;
- if (op == EPOLL_CTL_ADD && (is_file_epoll(tf.file) ||
+ if (op == EPOLL_CTL_ADD && (is_file_epoll(fd_file(tf)) ||
(epds->events & ~EPOLLEXCLUSIVE_OK_BITS)))
goto error_tgt_fput;
}
@@ -2309,7 +2309,7 @@ int do_epoll_ctl(int epfd, int op, int fd, struct epoll_event *epds,
* At this point it is safe to assume that the "private_data" contains
* our own data structure.
*/
- ep = f.file->private_data;
+ ep = fd_file(f)->private_data;
/*
* When we insert an epoll file descriptor inside another epoll file
@@ -2330,16 +2330,16 @@ int do_epoll_ctl(int epfd, int op, int fd, struct epoll_event *epds,
if (error)
goto error_tgt_fput;
if (op == EPOLL_CTL_ADD) {
- if (READ_ONCE(f.file->f_ep) || ep->gen == loop_check_gen ||
- is_file_epoll(tf.file)) {
+ if (READ_ONCE(fd_file(f)->f_ep) || ep->gen == loop_check_gen ||
+ is_file_epoll(fd_file(tf))) {
mutex_unlock(&ep->mtx);
error = epoll_mutex_lock(&epnested_mutex, 0, nonblock);
if (error)
goto error_tgt_fput;
loop_check_gen++;
full_check = 1;
- if (is_file_epoll(tf.file)) {
- tep = tf.file->private_data;
+ if (is_file_epoll(fd_file(tf))) {
+ tep = fd_file(tf)->private_data;
error = -ELOOP;
if (ep_loop_check(ep, tep) != 0)
goto error_tgt_fput;
@@ -2355,14 +2355,14 @@ int do_epoll_ctl(int epfd, int op, int fd, struct epoll_event *epds,
* above, we can be sure to be able to use the item looked up by
* ep_find() till we release the mutex.
*/
- epi = ep_find(ep, tf.file, fd);
+ epi = ep_find(ep, fd_file(tf), fd);
error = -EINVAL;
switch (op) {
case EPOLL_CTL_ADD:
if (!epi) {
epds->events |= EPOLLERR | EPOLLHUP;
- error = ep_insert(ep, epds, tf.file, fd, full_check);
+ error = ep_insert(ep, epds, fd_file(tf), fd, full_check);
} else
error = -EEXIST;
break;
@@ -2443,7 +2443,7 @@ static int do_epoll_wait(int epfd, struct epoll_event __user *events,
/* Get the "struct file *" for the eventpoll file */
f = fdget(epfd);
- if (!f.file)
+ if (!fd_file(f))
return -EBADF;
/*
@@ -2451,14 +2451,14 @@ static int do_epoll_wait(int epfd, struct epoll_event __user *events,
* the user passed to us _is_ an eventpoll file.
*/
error = -EINVAL;
- if (!is_file_epoll(f.file))
+ if (!is_file_epoll(fd_file(f)))
goto error_fput;
/*
* At this point it is safe to assume that the "private_data" contains
* our own data structure.
*/
- ep = f.file->private_data;
+ ep = fd_file(f)->private_data;
/* Time to fish for events ... */
error = ep_poll(ep, events, maxevents, to);
diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c
index e8bf5972dd47..1c77400bd88e 100644
--- a/fs/ext4/ioctl.c
+++ b/fs/ext4/ioctl.c
@@ -1343,10 +1343,10 @@ static long __ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
me.moved_len = 0;
donor = fdget(me.donor_fd);
- if (!donor.file)
+ if (!fd_file(donor))
return -EBADF;
- if (!(donor.file->f_mode & FMODE_WRITE)) {
+ if (!(fd_file(donor)->f_mode & FMODE_WRITE)) {
err = -EBADF;
goto mext_out;
}
@@ -1367,7 +1367,7 @@ static long __ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
if (err)
goto mext_out;
- err = ext4_move_extents(filp, donor.file, me.orig_start,
+ err = ext4_move_extents(filp, fd_file(donor), me.orig_start,
me.donor_start, me.len, &me.moved_len);
mnt_drop_write_file(filp);
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 168f08507004..903337f8d21a 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -3014,10 +3014,10 @@ static int __f2fs_ioc_move_range(struct file *filp,
return -EBADF;
dst = fdget(range->dst_fd);
- if (!dst.file)
+ if (!fd_file(dst))
return -EBADF;
- if (!(dst.file->f_mode & FMODE_WRITE)) {
+ if (!(fd_file(dst)->f_mode & FMODE_WRITE)) {
err = -EBADF;
goto err_out;
}
@@ -3026,7 +3026,7 @@ static int __f2fs_ioc_move_range(struct file *filp,
if (err)
goto err_out;
- err = f2fs_move_file_range(filp, range->pos_in, dst.file,
+ err = f2fs_move_file_range(filp, range->pos_in, fd_file(dst),
range->pos_out, range->len);
mnt_drop_write_file(filp);
diff --git a/fs/fcntl.c b/fs/fcntl.c
index 300e5d9ad913..2b5616762354 100644
--- a/fs/fcntl.c
+++ b/fs/fcntl.c
@@ -340,7 +340,7 @@ static long f_dupfd_query(int fd, struct file *filp)
* overkill, but given our lockless file pointer lookup, the
* alternatives are complicated.
*/
- return f.file == filp;
+ return fd_file(f) == filp;
}
static long do_fcntl(int fd, unsigned int cmd, unsigned long arg,
@@ -479,17 +479,17 @@ SYSCALL_DEFINE3(fcntl, unsigned int, fd, unsigned int, cmd, unsigned long, arg)
struct fd f = fdget_raw(fd);
long err = -EBADF;
- if (!f.file)
+ if (!fd_file(f))
goto out;
- if (unlikely(f.file->f_mode & FMODE_PATH)) {
+ if (unlikely(fd_file(f)->f_mode & FMODE_PATH)) {
if (!check_fcntl_cmd(cmd))
goto out1;
}
- err = security_file_fcntl(f.file, cmd, arg);
+ err = security_file_fcntl(fd_file(f), cmd, arg);
if (!err)
- err = do_fcntl(fd, cmd, arg, f.file);
+ err = do_fcntl(fd, cmd, arg, fd_file(f));
out1:
fdput(f);
@@ -506,15 +506,15 @@ SYSCALL_DEFINE3(fcntl64, unsigned int, fd, unsigned int, cmd,
struct flock64 flock;
long err = -EBADF;
- if (!f.file)
+ if (!fd_file(f))
goto out;
- if (unlikely(f.file->f_mode & FMODE_PATH)) {
+ if (unlikely(fd_file(f)->f_mode & FMODE_PATH)) {
if (!check_fcntl_cmd(cmd))
goto out1;
}
- err = security_file_fcntl(f.file, cmd, arg);
+ err = security_file_fcntl(fd_file(f), cmd, arg);
if (err)
goto out1;
@@ -524,7 +524,7 @@ SYSCALL_DEFINE3(fcntl64, unsigned int, fd, unsigned int, cmd,
err = -EFAULT;
if (copy_from_user(&flock, argp, sizeof(flock)))
break;
- err = fcntl_getlk64(f.file, cmd, &flock);
+ err = fcntl_getlk64(fd_file(f), cmd, &flock);
if (!err && copy_to_user(argp, &flock, sizeof(flock)))
err = -EFAULT;
break;
@@ -535,10 +535,10 @@ SYSCALL_DEFINE3(fcntl64, unsigned int, fd, unsigned int, cmd,
err = -EFAULT;
if (copy_from_user(&flock, argp, sizeof(flock)))
break;
- err = fcntl_setlk64(fd, f.file, cmd, &flock);
+ err = fcntl_setlk64(fd, fd_file(f), cmd, &flock);
break;
default:
- err = do_fcntl(fd, cmd, arg, f.file);
+ err = do_fcntl(fd, cmd, arg, fd_file(f));
break;
}
out1:
@@ -643,15 +643,15 @@ static long do_compat_fcntl64(unsigned int fd, unsigned int cmd,
struct flock flock;
long err = -EBADF;
- if (!f.file)
+ if (!fd_file(f))
return err;
- if (unlikely(f.file->f_mode & FMODE_PATH)) {
+ if (unlikely(fd_file(f)->f_mode & FMODE_PATH)) {
if (!check_fcntl_cmd(cmd))
goto out_put;
}
- err = security_file_fcntl(f.file, cmd, arg);
+ err = security_file_fcntl(fd_file(f), cmd, arg);
if (err)
goto out_put;
@@ -660,7 +660,7 @@ static long do_compat_fcntl64(unsigned int fd, unsigned int cmd,
err = get_compat_flock(&flock, compat_ptr(arg));
if (err)
break;
- err = fcntl_getlk(f.file, convert_fcntl_cmd(cmd), &flock);
+ err = fcntl_getlk(fd_file(f), convert_fcntl_cmd(cmd), &flock);
if (err)
break;
err = fixup_compat_flock(&flock);
@@ -672,7 +672,7 @@ static long do_compat_fcntl64(unsigned int fd, unsigned int cmd,
err = get_compat_flock64(&flock, compat_ptr(arg));
if (err)
break;
- err = fcntl_getlk(f.file, convert_fcntl_cmd(cmd), &flock);
+ err = fcntl_getlk(fd_file(f), convert_fcntl_cmd(cmd), &flock);
if (!err)
err = put_compat_flock64(&flock, compat_ptr(arg));
break;
@@ -681,7 +681,7 @@ static long do_compat_fcntl64(unsigned int fd, unsigned int cmd,
err = get_compat_flock(&flock, compat_ptr(arg));
if (err)
break;
- err = fcntl_setlk(fd, f.file, convert_fcntl_cmd(cmd), &flock);
+ err = fcntl_setlk(fd, fd_file(f), convert_fcntl_cmd(cmd), &flock);
break;
case F_SETLK64:
case F_SETLKW64:
@@ -690,10 +690,10 @@ static long do_compat_fcntl64(unsigned int fd, unsigned int cmd,
err = get_compat_flock64(&flock, compat_ptr(arg));
if (err)
break;
- err = fcntl_setlk(fd, f.file, convert_fcntl_cmd(cmd), &flock);
+ err = fcntl_setlk(fd, fd_file(f), convert_fcntl_cmd(cmd), &flock);
break;
default:
- err = do_fcntl(fd, cmd, arg, f.file);
+ err = do_fcntl(fd, cmd, arg, fd_file(f));
break;
}
out_put:
diff --git a/fs/fhandle.c b/fs/fhandle.c
index 6e8cea16790e..3f07b52874a8 100644
--- a/fs/fhandle.c
+++ b/fs/fhandle.c
@@ -125,9 +125,9 @@ static int get_path_from_fd(int fd, struct path *root)
spin_unlock(&fs->lock);
} else {
struct fd f = fdget(fd);
- if (!f.file)
+ if (!fd_file(f))
return -EBADF;
- *root = f.file->f_path;
+ *root = fd_file(f)->f_path;
path_get(root);
fdput(f);
}
diff --git a/fs/fsopen.c b/fs/fsopen.c
index ed2dd000622e..ee92ca58429e 100644
--- a/fs/fsopen.c
+++ b/fs/fsopen.c
@@ -394,13 +394,13 @@ SYSCALL_DEFINE5(fsconfig,
}
f = fdget(fd);
- if (!f.file)
+ if (!fd_file(f))
return -EBADF;
ret = -EINVAL;
- if (f.file->f_op != &fscontext_fops)
+ if (fd_file(f)->f_op != &fscontext_fops)
goto out_f;
- fc = f.file->private_data;
+ fc = fd_file(f)->private_data;
if (fc->ops == &legacy_fs_context_ops) {
switch (cmd) {
case FSCONFIG_SET_BINARY:
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 9eb191b5c4de..991b9ae8e7c9 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -2321,15 +2321,15 @@ static long fuse_dev_ioctl_clone(struct file *file, __u32 __user *argp)
return -EFAULT;
f = fdget(oldfd);
- if (!f.file)
+ if (!fd_file(f))
return -EINVAL;
/*
* Check against file->f_op because CUSE
* uses the same ioctl handler.
*/
- if (f.file->f_op == file->f_op)
- fud = fuse_get_dev(f.file);
+ if (fd_file(f)->f_op == file->f_op)
+ fud = fuse_get_dev(fd_file(f));
res = -EINVAL;
if (fud) {
diff --git a/fs/ioctl.c b/fs/ioctl.c
index 64776891120c..6e0c954388d4 100644
--- a/fs/ioctl.c
+++ b/fs/ioctl.c
@@ -235,9 +235,9 @@ static long ioctl_file_clone(struct file *dst_file, unsigned long srcfd,
loff_t cloned;
int ret;
- if (!src_file.file)
+ if (!fd_file(src_file))
return -EBADF;
- cloned = vfs_clone_file_range(src_file.file, off, dst_file, destoff,
+ cloned = vfs_clone_file_range(fd_file(src_file), off, dst_file, destoff,
olen, 0);
if (cloned < 0)
ret = cloned;
@@ -895,16 +895,16 @@ SYSCALL_DEFINE3(ioctl, unsigned int, fd, unsigned int, cmd, unsigned long, arg)
struct fd f = fdget(fd);
int error;
- if (!f.file)
+ if (!fd_file(f))
return -EBADF;
- error = security_file_ioctl(f.file, cmd, arg);
+ error = security_file_ioctl(fd_file(f), cmd, arg);
if (error)
goto out;
- error = do_vfs_ioctl(f.file, fd, cmd, arg);
+ error = do_vfs_ioctl(fd_file(f), fd, cmd, arg);
if (error == -ENOIOCTLCMD)
- error = vfs_ioctl(f.file, cmd, arg);
+ error = vfs_ioctl(fd_file(f), cmd, arg);
out:
fdput(f);
@@ -953,32 +953,32 @@ COMPAT_SYSCALL_DEFINE3(ioctl, unsigned int, fd, unsigned int, cmd,
struct fd f = fdget(fd);
int error;
- if (!f.file)
+ if (!fd_file(f))
return -EBADF;
- error = security_file_ioctl_compat(f.file, cmd, arg);
+ error = security_file_ioctl_compat(fd_file(f), cmd, arg);
if (error)
goto out;
switch (cmd) {
/* FICLONE takes an int argument, so don't use compat_ptr() */
case FICLONE:
- error = ioctl_file_clone(f.file, arg, 0, 0, 0);
+ error = ioctl_file_clone(fd_file(f), arg, 0, 0, 0);
break;
#if defined(CONFIG_X86_64)
/* these get messy on amd64 due to alignment differences */
case FS_IOC_RESVSP_32:
case FS_IOC_RESVSP64_32:
- error = compat_ioctl_preallocate(f.file, 0, compat_ptr(arg));
+ error = compat_ioctl_preallocate(fd_file(f), 0, compat_ptr(arg));
break;
case FS_IOC_UNRESVSP_32:
case FS_IOC_UNRESVSP64_32:
- error = compat_ioctl_preallocate(f.file, FALLOC_FL_PUNCH_HOLE,
+ error = compat_ioctl_preallocate(fd_file(f), FALLOC_FL_PUNCH_HOLE,
compat_ptr(arg));
break;
case FS_IOC_ZERO_RANGE_32:
- error = compat_ioctl_preallocate(f.file, FALLOC_FL_ZERO_RANGE,
+ error = compat_ioctl_preallocate(fd_file(f), FALLOC_FL_ZERO_RANGE,
compat_ptr(arg));
break;
#endif
@@ -998,13 +998,13 @@ COMPAT_SYSCALL_DEFINE3(ioctl, unsigned int, fd, unsigned int, cmd,
* argument.
*/
default:
- error = do_vfs_ioctl(f.file, fd, cmd,
+ error = do_vfs_ioctl(fd_file(f), fd, cmd,
(unsigned long)compat_ptr(arg));
if (error != -ENOIOCTLCMD)
break;
- if (f.file->f_op->compat_ioctl)
- error = f.file->f_op->compat_ioctl(f.file, cmd, arg);
+ if (fd_file(f)->f_op->compat_ioctl)
+ error = fd_file(f)->f_op->compat_ioctl(fd_file(f), cmd, arg);
if (error == -ENOIOCTLCMD)
error = -ENOTTY;
break;
diff --git a/fs/kernel_read_file.c b/fs/kernel_read_file.c
index c429c42a6867..9ff37ae650ea 100644
--- a/fs/kernel_read_file.c
+++ b/fs/kernel_read_file.c
@@ -178,10 +178,10 @@ ssize_t kernel_read_file_from_fd(int fd, loff_t offset, void **buf,
struct fd f = fdget(fd);
ssize_t ret = -EBADF;
- if (!f.file || !(f.file->f_mode & FMODE_READ))
+ if (!fd_file(f) || !(fd_file(f)->f_mode & FMODE_READ))
goto out;
- ret = kernel_read_file(f.file, offset, buf, buf_size, file_size, id);
+ ret = kernel_read_file(fd_file(f), offset, buf, buf_size, file_size, id);
out:
fdput(f);
return ret;
diff --git a/fs/locks.c b/fs/locks.c
index 9afb16e0683f..239759a356e9 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -2153,15 +2153,15 @@ SYSCALL_DEFINE2(flock, unsigned int, fd, unsigned int, cmd)
error = -EBADF;
f = fdget(fd);
- if (!f.file)
+ if (!fd_file(f))
return error;
- if (type != F_UNLCK && !(f.file->f_mode & (FMODE_READ | FMODE_WRITE)))
+ if (type != F_UNLCK && !(fd_file(f)->f_mode & (FMODE_READ | FMODE_WRITE)))
goto out_putf;
- flock_make_lock(f.file, &fl, type);
+ flock_make_lock(fd_file(f), &fl, type);
- error = security_file_lock(f.file, fl.c.flc_type);
+ error = security_file_lock(fd_file(f), fl.c.flc_type);
if (error)
goto out_putf;
@@ -2169,12 +2169,12 @@ SYSCALL_DEFINE2(flock, unsigned int, fd, unsigned int, cmd)
if (can_sleep)
fl.c.flc_flags |= FL_SLEEP;
- if (f.file->f_op->flock)
- error = f.file->f_op->flock(f.file,
+ if (fd_file(f)->f_op->flock)
+ error = fd_file(f)->f_op->flock(fd_file(f),
(can_sleep) ? F_SETLKW : F_SETLK,
&fl);
else
- error = locks_lock_file_wait(f.file, &fl);
+ error = locks_lock_file_wait(fd_file(f), &fl);
locks_release_private(&fl);
out_putf:
diff --git a/fs/namei.c b/fs/namei.c
index 5512cb10fa89..af86e3330594 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2492,25 +2492,25 @@ static const char *path_init(struct nameidata *nd, unsigned flags)
struct fd f = fdget_raw(nd->dfd);
struct dentry *dentry;
- if (!f.file)
+ if (!fd_file(f))
return ERR_PTR(-EBADF);
if (flags & LOOKUP_LINKAT_EMPTY) {
- if (f.file->f_cred != current_cred() &&
- !ns_capable(f.file->f_cred->user_ns, CAP_DAC_READ_SEARCH)) {
+ if (fd_file(f)->f_cred != current_cred() &&
+ !ns_capable(fd_file(f)->f_cred->user_ns, CAP_DAC_READ_SEARCH)) {
fdput(f);
return ERR_PTR(-ENOENT);
}
}
- dentry = f.file->f_path.dentry;
+ dentry = fd_file(f)->f_path.dentry;
if (*s && unlikely(!d_can_lookup(dentry))) {
fdput(f);
return ERR_PTR(-ENOTDIR);
}
- nd->path = f.file->f_path;
+ nd->path = fd_file(f)->f_path;
if (flags & LOOKUP_RCU) {
nd->inode = nd->path.dentry->d_inode;
nd->seq = read_seqcount_begin(&nd->path.dentry->d_seq);
diff --git a/fs/namespace.c b/fs/namespace.c
index 328087a4df8a..c46d48bb38cd 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -4099,14 +4099,14 @@ SYSCALL_DEFINE3(fsmount, int, fs_fd, unsigned int, flags,
}
f = fdget(fs_fd);
- if (!f.file)
+ if (!fd_file(f))
return -EBADF;
ret = -EINVAL;
- if (f.file->f_op != &fscontext_fops)
+ if (fd_file(f)->f_op != &fscontext_fops)
goto err_fsfd;
- fc = f.file->private_data;
+ fc = fd_file(f)->private_data;
ret = mutex_lock_interruptible(&fc->uapi_mutex);
if (ret < 0)
@@ -4649,15 +4649,15 @@ static int build_mount_idmapped(const struct mount_attr *attr, size_t usize,
return -EINVAL;
f = fdget(attr->userns_fd);
- if (!f.file)
+ if (!fd_file(f))
return -EBADF;
- if (!proc_ns_file(f.file)) {
+ if (!proc_ns_file(fd_file(f))) {
err = -EINVAL;
goto out_fput;
}
- ns = get_proc_ns(file_inode(f.file));
+ ns = get_proc_ns(file_inode(fd_file(f)));
if (ns->ops->type != CLONE_NEWUSER) {
err = -EINVAL;
goto out_fput;
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index 9ec313e9f6e1..13454e5fd3fb 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -1006,17 +1006,17 @@ static int fanotify_find_path(int dfd, const char __user *filename,
struct fd f = fdget(dfd);
ret = -EBADF;
- if (!f.file)
+ if (!fd_file(f))
goto out;
ret = -ENOTDIR;
if ((flags & FAN_MARK_ONLYDIR) &&
- !(S_ISDIR(file_inode(f.file)->i_mode))) {
+ !(S_ISDIR(file_inode(fd_file(f))->i_mode))) {
fdput(f);
goto out;
}
- *path = f.file->f_path;
+ *path = fd_file(f)->f_path;
path_get(path);
fdput(f);
} else {
@@ -1753,14 +1753,14 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask,
}
f = fdget(fanotify_fd);
- if (unlikely(!f.file))
+ if (unlikely(!fd_file(f)))
return -EBADF;
/* verify that this is indeed an fanotify instance */
ret = -EINVAL;
- if (unlikely(f.file->f_op != &fanotify_fops))
+ if (unlikely(fd_file(f)->f_op != &fanotify_fops))
goto fput_and_out;
- group = f.file->private_data;
+ group = fd_file(f)->private_data;
/*
* An unprivileged user is not allowed to setup mount nor filesystem
diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c
index 4ffc30606e0b..c7e451d5bd51 100644
--- a/fs/notify/inotify/inotify_user.c
+++ b/fs/notify/inotify/inotify_user.c
@@ -753,7 +753,7 @@ SYSCALL_DEFINE3(inotify_add_watch, int, fd, const char __user *, pathname,
return -EINVAL;
f = fdget(fd);
- if (unlikely(!f.file))
+ if (unlikely(!fd_file(f)))
return -EBADF;
/* IN_MASK_ADD and IN_MASK_CREATE don't make sense together */
@@ -763,7 +763,7 @@ SYSCALL_DEFINE3(inotify_add_watch, int, fd, const char __user *, pathname,
}
/* verify that this is indeed an inotify instance */
- if (unlikely(f.file->f_op != &inotify_fops)) {
+ if (unlikely(fd_file(f)->f_op != &inotify_fops)) {
ret = -EINVAL;
goto fput_and_out;
}
@@ -780,7 +780,7 @@ SYSCALL_DEFINE3(inotify_add_watch, int, fd, const char __user *, pathname,
/* inode held in place by reference to path; group by fget on fd */
inode = path.dentry->d_inode;
- group = f.file->private_data;
+ group = fd_file(f)->private_data;
/* create/update an inode mark */
ret = inotify_update_watch(group, inode, mask);
@@ -798,14 +798,14 @@ SYSCALL_DEFINE2(inotify_rm_watch, int, fd, __s32, wd)
int ret = -EINVAL;
f = fdget(fd);
- if (unlikely(!f.file))
+ if (unlikely(!fd_file(f)))
return -EBADF;
/* verify that this is indeed an inotify instance */
- if (unlikely(f.file->f_op != &inotify_fops))
+ if (unlikely(fd_file(f)->f_op != &inotify_fops))
goto out;
- group = f.file->private_data;
+ group = fd_file(f)->private_data;
i_mark = inotify_idr_find(group, wd);
if (unlikely(!i_mark))
diff --git a/fs/ocfs2/cluster/heartbeat.c b/fs/ocfs2/cluster/heartbeat.c
index 1bde1281d514..4b9f45d7049e 100644
--- a/fs/ocfs2/cluster/heartbeat.c
+++ b/fs/ocfs2/cluster/heartbeat.c
@@ -1785,17 +1785,17 @@ static ssize_t o2hb_region_dev_store(struct config_item *item,
goto out;
f = fdget(fd);
- if (f.file == NULL)
+ if (fd_file(f) == NULL)
goto out;
if (reg->hr_blocks == 0 || reg->hr_start_block == 0 ||
reg->hr_block_bytes == 0)
goto out2;
- if (!S_ISBLK(f.file->f_mapping->host->i_mode))
+ if (!S_ISBLK(fd_file(f)->f_mapping->host->i_mode))
goto out2;
- reg->hr_bdev_file = bdev_file_open_by_dev(f.file->f_mapping->host->i_rdev,
+ reg->hr_bdev_file = bdev_file_open_by_dev(fd_file(f)->f_mapping->host->i_rdev,
BLK_OPEN_WRITE | BLK_OPEN_READ, NULL, NULL);
if (IS_ERR(reg->hr_bdev_file)) {
ret = PTR_ERR(reg->hr_bdev_file);
diff --git a/fs/open.c b/fs/open.c
index 22adbef7ecc2..a388828ccd22 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -193,10 +193,10 @@ long do_sys_ftruncate(unsigned int fd, loff_t length, int small)
if (length < 0)
return -EINVAL;
f = fdget(fd);
- if (!f.file)
+ if (!fd_file(f))
return -EBADF;
- error = do_ftruncate(f.file, length, small);
+ error = do_ftruncate(fd_file(f), length, small);
fdput(f);
return error;
@@ -353,8 +353,8 @@ int ksys_fallocate(int fd, int mode, loff_t offset, loff_t len)
struct fd f = fdget(fd);
int error = -EBADF;
- if (f.file) {
- error = vfs_fallocate(f.file, mode, offset, len);
+ if (fd_file(f)) {
+ error = vfs_fallocate(fd_file(f), mode, offset, len);
fdput(f);
}
return error;
@@ -585,16 +585,16 @@ SYSCALL_DEFINE1(fchdir, unsigned int, fd)
int error;
error = -EBADF;
- if (!f.file)
+ if (!fd_file(f))
goto out;
error = -ENOTDIR;
- if (!d_can_lookup(f.file->f_path.dentry))
+ if (!d_can_lookup(fd_file(f)->f_path.dentry))
goto out_putf;
- error = file_permission(f.file, MAY_EXEC | MAY_CHDIR);
+ error = file_permission(fd_file(f), MAY_EXEC | MAY_CHDIR);
if (!error)
- set_fs_pwd(current->fs, &f.file->f_path);
+ set_fs_pwd(current->fs, &fd_file(f)->f_path);
out_putf:
fdput(f);
out:
@@ -675,8 +675,8 @@ SYSCALL_DEFINE2(fchmod, unsigned int, fd, umode_t, mode)
struct fd f = fdget(fd);
int err = -EBADF;
- if (f.file) {
- err = vfs_fchmod(f.file, mode);
+ if (fd_file(f)) {
+ err = vfs_fchmod(fd_file(f), mode);
fdput(f);
}
return err;
@@ -869,8 +869,8 @@ int ksys_fchown(unsigned int fd, uid_t user, gid_t group)
struct fd f = fdget(fd);
int error = -EBADF;
- if (f.file) {
- error = vfs_fchown(f.file, user, group);
+ if (fd_file(f)) {
+ error = vfs_fchown(fd_file(f), user, group);
fdput(f);
}
return error;
diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
index 1a411cae57ed..c4963d0c5549 100644
--- a/fs/overlayfs/file.c
+++ b/fs/overlayfs/file.c
@@ -209,13 +209,13 @@ static loff_t ovl_llseek(struct file *file, loff_t offset, int whence)
* files, so we use the real file to perform seeks.
*/
ovl_inode_lock(inode);
- real.file->f_pos = file->f_pos;
+ fd_file(real)->f_pos = file->f_pos;
old_cred = ovl_override_creds(inode->i_sb);
- ret = vfs_llseek(real.file, offset, whence);
+ ret = vfs_llseek(fd_file(real), offset, whence);
revert_creds(old_cred);
- file->f_pos = real.file->f_pos;
+ file->f_pos = fd_file(real)->f_pos;
ovl_inode_unlock(inode);
fdput(real);
@@ -275,7 +275,7 @@ static ssize_t ovl_read_iter(struct kiocb *iocb, struct iov_iter *iter)
if (ret)
return ret;
- ret = backing_file_read_iter(real.file, iter, iocb, iocb->ki_flags,
+ ret = backing_file_read_iter(fd_file(real), iter, iocb, iocb->ki_flags,
&ctx);
fdput(real);
@@ -314,7 +314,7 @@ static ssize_t ovl_write_iter(struct kiocb *iocb, struct iov_iter *iter)
* this property in case it is set by the issuer.
*/
ifl &= ~IOCB_DIO_CALLER_COMP;
- ret = backing_file_write_iter(real.file, iter, iocb, ifl, &ctx);
+ ret = backing_file_write_iter(fd_file(real), iter, iocb, ifl, &ctx);
fdput(real);
out_unlock:
@@ -339,7 +339,7 @@ static ssize_t ovl_splice_read(struct file *in, loff_t *ppos,
if (ret)
return ret;
- ret = backing_file_splice_read(real.file, ppos, pipe, len, flags, &ctx);
+ ret = backing_file_splice_read(fd_file(real), ppos, pipe, len, flags, &ctx);
fdput(real);
return ret;
@@ -348,7 +348,7 @@ static ssize_t ovl_splice_read(struct file *in, loff_t *ppos,
/*
* Calling iter_file_splice_write() directly from overlay's f_op may deadlock
* due to lock order inversion between pipe->mutex in iter_file_splice_write()
- * and file_start_write(real.file) in ovl_write_iter().
+ * and file_start_write(fd_file(real)) in ovl_write_iter().
*
* So do everything ovl_write_iter() does and call iter_file_splice_write() on
* the real file.
@@ -373,7 +373,7 @@ static ssize_t ovl_splice_write(struct pipe_inode_info *pipe, struct file *out,
if (ret)
goto out_unlock;
- ret = backing_file_splice_write(pipe, real.file, ppos, len, flags, &ctx);
+ ret = backing_file_splice_write(pipe, fd_file(real), ppos, len, flags, &ctx);
fdput(real);
out_unlock:
@@ -397,9 +397,9 @@ static int ovl_fsync(struct file *file, loff_t start, loff_t end, int datasync)
return ret;
/* Don't sync lower file for fear of receiving EROFS error */
- if (file_inode(real.file) == ovl_inode_upper(file_inode(file))) {
+ if (file_inode(fd_file(real)) == ovl_inode_upper(file_inode(file))) {
old_cred = ovl_override_creds(file_inode(file)->i_sb);
- ret = vfs_fsync_range(real.file, start, end, datasync);
+ ret = vfs_fsync_range(fd_file(real), start, end, datasync);
revert_creds(old_cred);
}
@@ -439,7 +439,7 @@ static long ovl_fallocate(struct file *file, int mode, loff_t offset, loff_t len
goto out_unlock;
old_cred = ovl_override_creds(file_inode(file)->i_sb);
- ret = vfs_fallocate(real.file, mode, offset, len);
+ ret = vfs_fallocate(fd_file(real), mode, offset, len);
revert_creds(old_cred);
/* Update size */
@@ -464,7 +464,7 @@ static int ovl_fadvise(struct file *file, loff_t offset, loff_t len, int advice)
return ret;
old_cred = ovl_override_creds(file_inode(file)->i_sb);
- ret = vfs_fadvise(real.file, offset, len, advice);
+ ret = vfs_fadvise(fd_file(real), offset, len, advice);
revert_creds(old_cred);
fdput(real);
@@ -509,18 +509,18 @@ static loff_t ovl_copyfile(struct file *file_in, loff_t pos_in,
old_cred = ovl_override_creds(file_inode(file_out)->i_sb);
switch (op) {
case OVL_COPY:
- ret = vfs_copy_file_range(real_in.file, pos_in,
- real_out.file, pos_out, len, flags);
+ ret = vfs_copy_file_range(fd_file(real_in), pos_in,
+ fd_file(real_out), pos_out, len, flags);
break;
case OVL_CLONE:
- ret = vfs_clone_file_range(real_in.file, pos_in,
- real_out.file, pos_out, len, flags);
+ ret = vfs_clone_file_range(fd_file(real_in), pos_in,
+ fd_file(real_out), pos_out, len, flags);
break;
case OVL_DEDUPE:
- ret = vfs_dedupe_file_range_one(real_in.file, pos_in,
- real_out.file, pos_out, len,
+ ret = vfs_dedupe_file_range_one(fd_file(real_in), pos_in,
+ fd_file(real_out), pos_out, len,
flags);
break;
}
@@ -583,9 +583,9 @@ static int ovl_flush(struct file *file, fl_owner_t id)
if (err)
return err;
- if (real.file->f_op->flush) {
+ if (fd_file(real)->f_op->flush) {
old_cred = ovl_override_creds(file_inode(file)->i_sb);
- err = real.file->f_op->flush(real.file, id);
+ err = fd_file(real)->f_op->flush(fd_file(real), id);
revert_creds(old_cred);
}
fdput(real);
diff --git a/fs/quota/quota.c b/fs/quota/quota.c
index 0e41fb84060f..290157bc7bec 100644
--- a/fs/quota/quota.c
+++ b/fs/quota/quota.c
@@ -980,7 +980,7 @@ SYSCALL_DEFINE4(quotactl_fd, unsigned int, fd, unsigned int, cmd,
int ret;
f = fdget_raw(fd);
- if (!f.file)
+ if (!fd_file(f))
return -EBADF;
ret = -EINVAL;
@@ -988,12 +988,12 @@ SYSCALL_DEFINE4(quotactl_fd, unsigned int, fd, unsigned int, cmd,
goto out;
if (quotactl_cmd_write(cmds)) {
- ret = mnt_want_write(f.file->f_path.mnt);
+ ret = mnt_want_write(fd_file(f)->f_path.mnt);
if (ret)
goto out;
}
- sb = f.file->f_path.mnt->mnt_sb;
+ sb = fd_file(f)->f_path.mnt->mnt_sb;
if (quotactl_cmd_onoff(cmds))
down_write(&sb->s_umount);
else
@@ -1007,7 +1007,7 @@ SYSCALL_DEFINE4(quotactl_fd, unsigned int, fd, unsigned int, cmd,
up_read(&sb->s_umount);
if (quotactl_cmd_write(cmds))
- mnt_drop_write(f.file->f_path.mnt);
+ mnt_drop_write(fd_file(f)->f_path.mnt);
out:
fdput(f);
return ret;
diff --git a/fs/read_write.c b/fs/read_write.c
index 90e283b31ca1..59d6d0dee579 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -294,12 +294,12 @@ static off_t ksys_lseek(unsigned int fd, off_t offset, unsigned int whence)
{
off_t retval;
struct fd f = fdget_pos(fd);
- if (!f.file)
+ if (!fd_file(f))
return -EBADF;
retval = -EINVAL;
if (whence <= SEEK_MAX) {
- loff_t res = vfs_llseek(f.file, offset, whence);
+ loff_t res = vfs_llseek(fd_file(f), offset, whence);
retval = res;
if (res != (loff_t)retval)
retval = -EOVERFLOW; /* LFS: should only happen on 32 bit platforms */
@@ -330,14 +330,14 @@ SYSCALL_DEFINE5(llseek, unsigned int, fd, unsigned long, offset_high,
struct fd f = fdget_pos(fd);
loff_t offset;
- if (!f.file)
+ if (!fd_file(f))
return -EBADF;
retval = -EINVAL;
if (whence > SEEK_MAX)
goto out_putf;
- offset = vfs_llseek(f.file, ((loff_t) offset_high << 32) | offset_low,
+ offset = vfs_llseek(fd_file(f), ((loff_t) offset_high << 32) | offset_low,
whence);
retval = (int)offset;
@@ -610,15 +610,15 @@ ssize_t ksys_read(unsigned int fd, char __user *buf, size_t count)
struct fd f = fdget_pos(fd);
ssize_t ret = -EBADF;
- if (f.file) {
- loff_t pos, *ppos = file_ppos(f.file);
+ if (fd_file(f)) {
+ loff_t pos, *ppos = file_ppos(fd_file(f));
if (ppos) {
pos = *ppos;
ppos = &pos;
}
- ret = vfs_read(f.file, buf, count, ppos);
+ ret = vfs_read(fd_file(f), buf, count, ppos);
if (ret >= 0 && ppos)
- f.file->f_pos = pos;
+ fd_file(f)->f_pos = pos;
fdput_pos(f);
}
return ret;
@@ -634,15 +634,15 @@ ssize_t ksys_write(unsigned int fd, const char __user *buf, size_t count)
struct fd f = fdget_pos(fd);
ssize_t ret = -EBADF;
- if (f.file) {
- loff_t pos, *ppos = file_ppos(f.file);
+ if (fd_file(f)) {
+ loff_t pos, *ppos = file_ppos(fd_file(f));
if (ppos) {
pos = *ppos;
ppos = &pos;
}
- ret = vfs_write(f.file, buf, count, ppos);
+ ret = vfs_write(fd_file(f), buf, count, ppos);
if (ret >= 0 && ppos)
- f.file->f_pos = pos;
+ fd_file(f)->f_pos = pos;
fdput_pos(f);
}
@@ -665,10 +665,10 @@ ssize_t ksys_pread64(unsigned int fd, char __user *buf, size_t count,
return -EINVAL;
f = fdget(fd);
- if (f.file) {
+ if (fd_file(f)) {
ret = -ESPIPE;
- if (f.file->f_mode & FMODE_PREAD)
- ret = vfs_read(f.file, buf, count, &pos);
+ if (fd_file(f)->f_mode & FMODE_PREAD)
+ ret = vfs_read(fd_file(f), buf, count, &pos);
fdput(f);
}
@@ -699,10 +699,10 @@ ssize_t ksys_pwrite64(unsigned int fd, const char __user *buf,
return -EINVAL;
f = fdget(fd);
- if (f.file) {
+ if (fd_file(f)) {
ret = -ESPIPE;
- if (f.file->f_mode & FMODE_PWRITE)
- ret = vfs_write(f.file, buf, count, &pos);
+ if (fd_file(f)->f_mode & FMODE_PWRITE)
+ ret = vfs_write(fd_file(f), buf, count, &pos);
fdput(f);
}
@@ -985,15 +985,15 @@ static ssize_t do_readv(unsigned long fd, const struct iovec __user *vec,
struct fd f = fdget_pos(fd);
ssize_t ret = -EBADF;
- if (f.file) {
- loff_t pos, *ppos = file_ppos(f.file);
+ if (fd_file(f)) {
+ loff_t pos, *ppos = file_ppos(fd_file(f));
if (ppos) {
pos = *ppos;
ppos = &pos;
}
- ret = vfs_readv(f.file, vec, vlen, ppos, flags);
+ ret = vfs_readv(fd_file(f), vec, vlen, ppos, flags);
if (ret >= 0 && ppos)
- f.file->f_pos = pos;
+ fd_file(f)->f_pos = pos;
fdput_pos(f);
}
@@ -1009,15 +1009,15 @@ static ssize_t do_writev(unsigned long fd, const struct iovec __user *vec,
struct fd f = fdget_pos(fd);
ssize_t ret = -EBADF;
- if (f.file) {
- loff_t pos, *ppos = file_ppos(f.file);
+ if (fd_file(f)) {
+ loff_t pos, *ppos = file_ppos(fd_file(f));
if (ppos) {
pos = *ppos;
ppos = &pos;
}
- ret = vfs_writev(f.file, vec, vlen, ppos, flags);
+ ret = vfs_writev(fd_file(f), vec, vlen, ppos, flags);
if (ret >= 0 && ppos)
- f.file->f_pos = pos;
+ fd_file(f)->f_pos = pos;
fdput_pos(f);
}
@@ -1043,10 +1043,10 @@ static ssize_t do_preadv(unsigned long fd, const struct iovec __user *vec,
return -EINVAL;
f = fdget(fd);
- if (f.file) {
+ if (fd_file(f)) {
ret = -ESPIPE;
- if (f.file->f_mode & FMODE_PREAD)
- ret = vfs_readv(f.file, vec, vlen, &pos, flags);
+ if (fd_file(f)->f_mode & FMODE_PREAD)
+ ret = vfs_readv(fd_file(f), vec, vlen, &pos, flags);
fdput(f);
}
@@ -1066,10 +1066,10 @@ static ssize_t do_pwritev(unsigned long fd, const struct iovec __user *vec,
return -EINVAL;
f = fdget(fd);
- if (f.file) {
+ if (fd_file(f)) {
ret = -ESPIPE;
- if (f.file->f_mode & FMODE_PWRITE)
- ret = vfs_writev(f.file, vec, vlen, &pos, flags);
+ if (fd_file(f)->f_mode & FMODE_PWRITE)
+ ret = vfs_writev(fd_file(f), vec, vlen, &pos, flags);
fdput(f);
}
@@ -1235,19 +1235,19 @@ static ssize_t do_sendfile(int out_fd, int in_fd, loff_t *ppos,
*/
retval = -EBADF;
in = fdget(in_fd);
- if (!in.file)
+ if (!fd_file(in))
goto out;
- if (!(in.file->f_mode & FMODE_READ))
+ if (!(fd_file(in)->f_mode & FMODE_READ))
goto fput_in;
retval = -ESPIPE;
if (!ppos) {
- pos = in.file->f_pos;
+ pos = fd_file(in)->f_pos;
} else {
pos = *ppos;
- if (!(in.file->f_mode & FMODE_PREAD))
+ if (!(fd_file(in)->f_mode & FMODE_PREAD))
goto fput_in;
}
- retval = rw_verify_area(READ, in.file, &pos, count);
+ retval = rw_verify_area(READ, fd_file(in), &pos, count);
if (retval < 0)
goto fput_in;
if (count > MAX_RW_COUNT)
@@ -1258,13 +1258,13 @@ static ssize_t do_sendfile(int out_fd, int in_fd, loff_t *ppos,
*/
retval = -EBADF;
out = fdget(out_fd);
- if (!out.file)
+ if (!fd_file(out))
goto fput_in;
- if (!(out.file->f_mode & FMODE_WRITE))
+ if (!(fd_file(out)->f_mode & FMODE_WRITE))
goto fput_out;
- in_inode = file_inode(in.file);
- out_inode = file_inode(out.file);
- out_pos = out.file->f_pos;
+ in_inode = file_inode(fd_file(in));
+ out_inode = file_inode(fd_file(out));
+ out_pos = fd_file(out)->f_pos;
if (!max)
max = min(in_inode->i_sb->s_maxbytes, out_inode->i_sb->s_maxbytes);
@@ -1284,33 +1284,33 @@ static ssize_t do_sendfile(int out_fd, int in_fd, loff_t *ppos,
* and the application is arguably buggy if it doesn't expect
* EAGAIN on a non-blocking file descriptor.
*/
- if (in.file->f_flags & O_NONBLOCK)
+ if (fd_file(in)->f_flags & O_NONBLOCK)
fl = SPLICE_F_NONBLOCK;
#endif
- opipe = get_pipe_info(out.file, true);
+ opipe = get_pipe_info(fd_file(out), true);
if (!opipe) {
- retval = rw_verify_area(WRITE, out.file, &out_pos, count);
+ retval = rw_verify_area(WRITE, fd_file(out), &out_pos, count);
if (retval < 0)
goto fput_out;
- retval = do_splice_direct(in.file, &pos, out.file, &out_pos,
+ retval = do_splice_direct(fd_file(in), &pos, fd_file(out), &out_pos,
count, fl);
} else {
- if (out.file->f_flags & O_NONBLOCK)
+ if (fd_file(out)->f_flags & O_NONBLOCK)
fl |= SPLICE_F_NONBLOCK;
- retval = splice_file_to_pipe(in.file, opipe, &pos, count, fl);
+ retval = splice_file_to_pipe(fd_file(in), opipe, &pos, count, fl);
}
if (retval > 0) {
add_rchar(current, retval);
add_wchar(current, retval);
- fsnotify_access(in.file);
- fsnotify_modify(out.file);
- out.file->f_pos = out_pos;
+ fsnotify_access(fd_file(in));
+ fsnotify_modify(fd_file(out));
+ fd_file(out)->f_pos = out_pos;
if (ppos)
*ppos = pos;
else
- in.file->f_pos = pos;
+ fd_file(in)->f_pos = pos;
}
inc_syscr(current);
@@ -1583,11 +1583,11 @@ SYSCALL_DEFINE6(copy_file_range, int, fd_in, loff_t __user *, off_in,
ssize_t ret = -EBADF;
f_in = fdget(fd_in);
- if (!f_in.file)
+ if (!fd_file(f_in))
goto out2;
f_out = fdget(fd_out);
- if (!f_out.file)
+ if (!fd_file(f_out))
goto out1;
ret = -EFAULT;
@@ -1595,21 +1595,21 @@ SYSCALL_DEFINE6(copy_file_range, int, fd_in, loff_t __user *, off_in,
if (copy_from_user(&pos_in, off_in, sizeof(loff_t)))
goto out;
} else {
- pos_in = f_in.file->f_pos;
+ pos_in = fd_file(f_in)->f_pos;
}
if (off_out) {
if (copy_from_user(&pos_out, off_out, sizeof(loff_t)))
goto out;
} else {
- pos_out = f_out.file->f_pos;
+ pos_out = fd_file(f_out)->f_pos;
}
ret = -EINVAL;
if (flags != 0)
goto out;
- ret = vfs_copy_file_range(f_in.file, pos_in, f_out.file, pos_out, len,
+ ret = vfs_copy_file_range(fd_file(f_in), pos_in, fd_file(f_out), pos_out, len,
flags);
if (ret > 0) {
pos_in += ret;
@@ -1619,14 +1619,14 @@ SYSCALL_DEFINE6(copy_file_range, int, fd_in, loff_t __user *, off_in,
if (copy_to_user(off_in, &pos_in, sizeof(loff_t)))
ret = -EFAULT;
} else {
- f_in.file->f_pos = pos_in;
+ fd_file(f_in)->f_pos = pos_in;
}
if (off_out) {
if (copy_to_user(off_out, &pos_out, sizeof(loff_t)))
ret = -EFAULT;
} else {
- f_out.file->f_pos = pos_out;
+ fd_file(f_out)->f_pos = pos_out;
}
}
diff --git a/fs/readdir.c b/fs/readdir.c
index d6c82421902a..6d29cab8576e 100644
--- a/fs/readdir.c
+++ b/fs/readdir.c
@@ -225,10 +225,10 @@ SYSCALL_DEFINE3(old_readdir, unsigned int, fd,
.dirent = dirent
};
- if (!f.file)
+ if (!fd_file(f))
return -EBADF;
- error = iterate_dir(f.file, &buf.ctx);
+ error = iterate_dir(fd_file(f), &buf.ctx);
if (buf.result)
error = buf.result;
@@ -318,10 +318,10 @@ SYSCALL_DEFINE3(getdents, unsigned int, fd,
int error;
f = fdget_pos(fd);
- if (!f.file)
+ if (!fd_file(f))
return -EBADF;
- error = iterate_dir(f.file, &buf.ctx);
+ error = iterate_dir(fd_file(f), &buf.ctx);
if (error >= 0)
error = buf.error;
if (buf.prev_reclen) {
@@ -401,10 +401,10 @@ SYSCALL_DEFINE3(getdents64, unsigned int, fd,
int error;
f = fdget_pos(fd);
- if (!f.file)
+ if (!fd_file(f))
return -EBADF;
- error = iterate_dir(f.file, &buf.ctx);
+ error = iterate_dir(fd_file(f), &buf.ctx);
if (error >= 0)
error = buf.error;
if (buf.prev_reclen) {
@@ -483,10 +483,10 @@ COMPAT_SYSCALL_DEFINE3(old_readdir, unsigned int, fd,
.dirent = dirent
};
- if (!f.file)
+ if (!fd_file(f))
return -EBADF;
- error = iterate_dir(f.file, &buf.ctx);
+ error = iterate_dir(fd_file(f), &buf.ctx);
if (buf.result)
error = buf.result;
@@ -569,10 +569,10 @@ COMPAT_SYSCALL_DEFINE3(getdents, unsigned int, fd,
int error;
f = fdget_pos(fd);
- if (!f.file)
+ if (!fd_file(f))
return -EBADF;
- error = iterate_dir(f.file, &buf.ctx);
+ error = iterate_dir(fd_file(f), &buf.ctx);
if (error >= 0)
error = buf.error;
if (buf.prev_reclen) {
diff --git a/fs/remap_range.c b/fs/remap_range.c
index 28246dfc8485..4403d5c68fcb 100644
--- a/fs/remap_range.c
+++ b/fs/remap_range.c
@@ -537,7 +537,7 @@ int vfs_dedupe_file_range(struct file *file, struct file_dedupe_range *same)
for (i = 0, info = same->info; i < count; i++, info++) {
struct fd dst_fd = fdget(info->dest_fd);
- struct file *dst_file = dst_fd.file;
+ struct file *dst_file = fd_file(dst_fd);
if (!dst_file) {
info->status = -EBADF;
diff --git a/fs/select.c b/fs/select.c
index 9515c3fa1a03..97e1009dde00 100644
--- a/fs/select.c
+++ b/fs/select.c
@@ -532,10 +532,10 @@ static noinline_for_stack int do_select(int n, fd_set_bits *fds, struct timespec
continue;
mask = EPOLLNVAL;
f = fdget(i);
- if (f.file) {
+ if (fd_file(f)) {
wait_key_set(wait, in, out, bit,
busy_flag);
- mask = vfs_poll(f.file, wait);
+ mask = vfs_poll(fd_file(f), wait);
fdput(f);
}
@@ -864,13 +864,13 @@ static inline __poll_t do_pollfd(struct pollfd *pollfd, poll_table *pwait,
goto out;
mask = EPOLLNVAL;
f = fdget(fd);
- if (!f.file)
+ if (!fd_file(f))
goto out;
/* userland u16 ->events contains POLL... bitmap */
filter = demangle_poll(pollfd->events) | EPOLLERR | EPOLLHUP;
pwait->_key = filter | busy_flag;
- mask = vfs_poll(f.file, pwait);
+ mask = vfs_poll(fd_file(f), pwait);
if (mask & busy_flag)
*can_busy_poll = true;
mask &= filter; /* Mask out unneeded events. */
diff --git a/fs/signalfd.c b/fs/signalfd.c
index ec7b2da2477a..777e889ab0e8 100644
--- a/fs/signalfd.c
+++ b/fs/signalfd.c
@@ -289,10 +289,10 @@ static int do_signalfd4(int ufd, sigset_t *mask, int flags)
fd_install(ufd, file);
} else {
struct fd f = fdget(ufd);
- if (!f.file)
+ if (!fd_file(f))
return -EBADF;
- ctx = f.file->private_data;
- if (f.file->f_op != &signalfd_fops) {
+ ctx = fd_file(f)->private_data;
+ if (fd_file(f)->f_op != &signalfd_fops) {
fdput(f);
return -EINVAL;
}
diff --git a/fs/smb/client/ioctl.c b/fs/smb/client/ioctl.c
index 855ac5a62edf..94bf2e5014d9 100644
--- a/fs/smb/client/ioctl.c
+++ b/fs/smb/client/ioctl.c
@@ -90,23 +90,23 @@ static long cifs_ioctl_copychunk(unsigned int xid, struct file *dst_file,
}
src_file = fdget(srcfd);
- if (!src_file.file) {
+ if (!fd_file(src_file)) {
rc = -EBADF;
goto out_drop_write;
}
- if (src_file.file->f_op->unlocked_ioctl != cifs_ioctl) {
+ if (fd_file(src_file)->f_op->unlocked_ioctl != cifs_ioctl) {
rc = -EBADF;
cifs_dbg(VFS, "src file seems to be from a different filesystem type\n");
goto out_fput;
}
- src_inode = file_inode(src_file.file);
+ src_inode = file_inode(fd_file(src_file));
rc = -EINVAL;
if (S_ISDIR(src_inode->i_mode))
goto out_fput;
- rc = cifs_file_copychunk_range(xid, src_file.file, 0, dst_file, 0,
+ rc = cifs_file_copychunk_range(xid, fd_file(src_file), 0, dst_file, 0,
src_inode->i_size, 0);
if (rc > 0)
rc = 0;
diff --git a/fs/splice.c b/fs/splice.c
index 60aed8de21f8..06232d7e505f 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -1566,11 +1566,11 @@ static ssize_t vmsplice_to_pipe(struct file *file, struct iov_iter *iter,
static int vmsplice_type(struct fd f, int *type)
{
- if (!f.file)
+ if (!fd_file(f))
return -EBADF;
- if (f.file->f_mode & FMODE_WRITE) {
+ if (fd_file(f)->f_mode & FMODE_WRITE) {
*type = ITER_SOURCE;
- } else if (f.file->f_mode & FMODE_READ) {
+ } else if (fd_file(f)->f_mode & FMODE_READ) {
*type = ITER_DEST;
} else {
fdput(f);
@@ -1621,9 +1621,9 @@ SYSCALL_DEFINE4(vmsplice, int, fd, const struct iovec __user *, uiov,
if (!iov_iter_count(&iter))
error = 0;
else if (type == ITER_SOURCE)
- error = vmsplice_to_pipe(f.file, &iter, flags);
+ error = vmsplice_to_pipe(fd_file(f), &iter, flags);
else
- error = vmsplice_to_user(f.file, &iter, flags);
+ error = vmsplice_to_user(fd_file(f), &iter, flags);
kfree(iov);
out_fdput:
@@ -1646,10 +1646,10 @@ SYSCALL_DEFINE6(splice, int, fd_in, loff_t __user *, off_in,
error = -EBADF;
in = fdget(fd_in);
- if (in.file) {
+ if (fd_file(in)) {
out = fdget(fd_out);
- if (out.file) {
- error = __do_splice(in.file, off_in, out.file, off_out,
+ if (fd_file(out)) {
+ error = __do_splice(fd_file(in), off_in, fd_file(out), off_out,
len, flags);
fdput(out);
}
@@ -2016,10 +2016,10 @@ SYSCALL_DEFINE4(tee, int, fdin, int, fdout, size_t, len, unsigned int, flags)
error = -EBADF;
in = fdget(fdin);
- if (in.file) {
+ if (fd_file(in)) {
out = fdget(fdout);
- if (out.file) {
- error = do_tee(in.file, out.file, len, flags);
+ if (fd_file(out)) {
+ error = do_tee(fd_file(in), fd_file(out), len, flags);
fdput(out);
}
fdput(in);
diff --git a/fs/stat.c b/fs/stat.c
index 89ce1be56310..41e598376d7e 100644
--- a/fs/stat.c
+++ b/fs/stat.c
@@ -224,9 +224,9 @@ int vfs_fstat(int fd, struct kstat *stat)
int error;
f = fdget_raw(fd);
- if (!f.file)
+ if (!fd_file(f))
return -EBADF;
- error = vfs_getattr(&f.file->f_path, stat, STATX_BASIC_STATS, 0);
+ error = vfs_getattr(&fd_file(f)->f_path, stat, STATX_BASIC_STATS, 0);
fdput(f);
return error;
}
@@ -277,9 +277,9 @@ static int vfs_statx_fd(int fd, int flags, struct kstat *stat,
u32 request_mask)
{
CLASS(fd_raw, f)(fd);
- if (!f.file)
+ if (!fd_file(f))
return -EBADF;
- return vfs_statx_path(&f.file->f_path, flags, stat, request_mask);
+ return vfs_statx_path(&fd_file(f)->f_path, flags, stat, request_mask);
}
/**
diff --git a/fs/statfs.c b/fs/statfs.c
index 96d1c3edf289..9c7bb27e7932 100644
--- a/fs/statfs.c
+++ b/fs/statfs.c
@@ -116,8 +116,8 @@ int fd_statfs(int fd, struct kstatfs *st)
{
struct fd f = fdget_raw(fd);
int error = -EBADF;
- if (f.file) {
- error = vfs_statfs(&f.file->f_path, st);
+ if (fd_file(f)) {
+ error = vfs_statfs(&fd_file(f)->f_path, st);
fdput(f);
}
return error;
diff --git a/fs/sync.c b/fs/sync.c
index dc725914e1ed..67df255eb189 100644
--- a/fs/sync.c
+++ b/fs/sync.c
@@ -152,15 +152,15 @@ SYSCALL_DEFINE1(syncfs, int, fd)
struct super_block *sb;
int ret, ret2;
- if (!f.file)
+ if (!fd_file(f))
return -EBADF;
- sb = f.file->f_path.dentry->d_sb;
+ sb = fd_file(f)->f_path.dentry->d_sb;
down_read(&sb->s_umount);
ret = sync_filesystem(sb);
up_read(&sb->s_umount);
- ret2 = errseq_check_and_advance(&sb->s_wb_err, &f.file->f_sb_err);
+ ret2 = errseq_check_and_advance(&sb->s_wb_err, &fd_file(f)->f_sb_err);
fdput(f);
return ret ? ret : ret2;
@@ -208,8 +208,8 @@ static int do_fsync(unsigned int fd, int datasync)
struct fd f = fdget(fd);
int ret = -EBADF;
- if (f.file) {
- ret = vfs_fsync(f.file, datasync);
+ if (fd_file(f)) {
+ ret = vfs_fsync(fd_file(f), datasync);
fdput(f);
}
return ret;
@@ -360,8 +360,8 @@ int ksys_sync_file_range(int fd, loff_t offset, loff_t nbytes,
ret = -EBADF;
f = fdget(fd);
- if (f.file)
- ret = sync_file_range(f.file, offset, nbytes, flags);
+ if (fd_file(f))
+ ret = sync_file_range(fd_file(f), offset, nbytes, flags);
fdput(f);
return ret;
diff --git a/fs/timerfd.c b/fs/timerfd.c
index 4bf2f8bfec11..137523e0bb21 100644
--- a/fs/timerfd.c
+++ b/fs/timerfd.c
@@ -397,9 +397,9 @@ static const struct file_operations timerfd_fops = {
static int timerfd_fget(int fd, struct fd *p)
{
struct fd f = fdget(fd);
- if (!f.file)
+ if (!fd_file(f))
return -EBADF;
- if (f.file->f_op != &timerfd_fops) {
+ if (fd_file(f)->f_op != &timerfd_fops) {
fdput(f);
return -EINVAL;
}
@@ -482,7 +482,7 @@ static int do_timerfd_settime(int ufd, int flags,
ret = timerfd_fget(ufd, &f);
if (ret)
return ret;
- ctx = f.file->private_data;
+ ctx = fd_file(f)->private_data;
if (isalarm(ctx) && !capable(CAP_WAKE_ALARM)) {
fdput(f);
@@ -546,7 +546,7 @@ static int do_timerfd_gettime(int ufd, struct itimerspec64 *t)
int ret = timerfd_fget(ufd, &f);
if (ret)
return ret;
- ctx = f.file->private_data;
+ ctx = fd_file(f)->private_data;
spin_lock_irq(&ctx->wqh.lock);
if (ctx->expired && ctx->tintv) {
diff --git a/fs/utimes.c b/fs/utimes.c
index 3701b3946f88..99b26f792b89 100644
--- a/fs/utimes.c
+++ b/fs/utimes.c
@@ -115,9 +115,9 @@ static int do_utimes_fd(int fd, struct timespec64 *times, int flags)
return -EINVAL;
f = fdget(fd);
- if (!f.file)
+ if (!fd_file(f))
return -EBADF;
- error = vfs_utimes(&f.file->f_path, times);
+ error = vfs_utimes(&fd_file(f)->f_path, times);
fdput(f);
return error;
}
diff --git a/fs/xattr.c b/fs/xattr.c
index 7672ce5486c5..05ec7e7d9e87 100644
--- a/fs/xattr.c
+++ b/fs/xattr.c
@@ -697,19 +697,19 @@ SYSCALL_DEFINE5(fsetxattr, int, fd, const char __user *, name,
int error;
CLASS(fd, f)(fd);
- if (!f.file)
+ if (!fd_file(f))
return -EBADF;
- audit_file(f.file);
+ audit_file(fd_file(f));
error = setxattr_copy(name, &ctx);
if (error)
return error;
- error = mnt_want_write_file(f.file);
+ error = mnt_want_write_file(fd_file(f));
if (!error) {
- error = do_setxattr(file_mnt_idmap(f.file),
- f.file->f_path.dentry, &ctx);
- mnt_drop_write_file(f.file);
+ error = do_setxattr(file_mnt_idmap(fd_file(f)),
+ fd_file(f)->f_path.dentry, &ctx);
+ mnt_drop_write_file(fd_file(f));
}
kvfree(ctx.kvalue);
return error;
@@ -812,10 +812,10 @@ SYSCALL_DEFINE4(fgetxattr, int, fd, const char __user *, name,
struct fd f = fdget(fd);
ssize_t error = -EBADF;
- if (!f.file)
+ if (!fd_file(f))
return error;
- audit_file(f.file);
- error = getxattr(file_mnt_idmap(f.file), f.file->f_path.dentry,
+ audit_file(fd_file(f));
+ error = getxattr(file_mnt_idmap(fd_file(f)), fd_file(f)->f_path.dentry,
name, value, size);
fdput(f);
return error;
@@ -888,10 +888,10 @@ SYSCALL_DEFINE3(flistxattr, int, fd, char __user *, list, size_t, size)
struct fd f = fdget(fd);
ssize_t error = -EBADF;
- if (!f.file)
+ if (!fd_file(f))
return error;
- audit_file(f.file);
- error = listxattr(f.file->f_path.dentry, list, size);
+ audit_file(fd_file(f));
+ error = listxattr(fd_file(f)->f_path.dentry, list, size);
fdput(f);
return error;
}
@@ -954,9 +954,9 @@ SYSCALL_DEFINE2(fremovexattr, int, fd, const char __user *, name)
char kname[XATTR_NAME_MAX + 1];
int error = -EBADF;
- if (!f.file)
+ if (!fd_file(f))
return error;
- audit_file(f.file);
+ audit_file(fd_file(f));
error = strncpy_from_user(kname, name, sizeof(kname));
if (error == 0 || error == sizeof(kname))
@@ -964,11 +964,11 @@ SYSCALL_DEFINE2(fremovexattr, int, fd, const char __user *, name)
if (error < 0)
return error;
- error = mnt_want_write_file(f.file);
+ error = mnt_want_write_file(fd_file(f));
if (!error) {
- error = removexattr(file_mnt_idmap(f.file),
- f.file->f_path.dentry, kname);
- mnt_drop_write_file(f.file);
+ error = removexattr(file_mnt_idmap(fd_file(f)),
+ fd_file(f)->f_path.dentry, kname);
+ mnt_drop_write_file(fd_file(f));
}
fdput(f);
return error;
diff --git a/fs/xfs/xfs_exchrange.c b/fs/xfs/xfs_exchrange.c
index c8a655c92c92..9790e0f45d14 100644
--- a/fs/xfs/xfs_exchrange.c
+++ b/fs/xfs/xfs_exchrange.c
@@ -794,9 +794,9 @@ xfs_ioc_exchange_range(
fxr.flags = args.flags;
file1 = fdget(args.file1_fd);
- if (!file1.file)
+ if (!fd_file(file1))
return -EBADF;
- fxr.file1 = file1.file;
+ fxr.file1 = fd_file(file1);
error = xfs_exchange_range(&fxr);
fdput(file1);
diff --git a/fs/xfs/xfs_handle.c b/fs/xfs/xfs_handle.c
index cf5acbd3c7ca..7bcc4f519cb8 100644
--- a/fs/xfs/xfs_handle.c
+++ b/fs/xfs/xfs_handle.c
@@ -92,9 +92,9 @@ xfs_find_handle(
if (cmd == XFS_IOC_FD_TO_HANDLE) {
f = fdget(hreq->fd);
- if (!f.file)
+ if (!fd_file(f))
return -EBADF;
- inode = file_inode(f.file);
+ inode = file_inode(fd_file(f));
} else {
error = user_path_at(AT_FDCWD, hreq->path, 0, &path);
if (error)
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index 4e933db75b12..95641817370b 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -1005,33 +1005,33 @@ xfs_ioc_swapext(
/* Pull information for the target fd */
f = fdget((int)sxp->sx_fdtarget);
- if (!f.file) {
+ if (!fd_file(f)) {
error = -EINVAL;
goto out;
}
- if (!(f.file->f_mode & FMODE_WRITE) ||
- !(f.file->f_mode & FMODE_READ) ||
- (f.file->f_flags & O_APPEND)) {
+ if (!(fd_file(f)->f_mode & FMODE_WRITE) ||
+ !(fd_file(f)->f_mode & FMODE_READ) ||
+ (fd_file(f)->f_flags & O_APPEND)) {
error = -EBADF;
goto out_put_file;
}
tmp = fdget((int)sxp->sx_fdtmp);
- if (!tmp.file) {
+ if (!fd_file(tmp)) {
error = -EINVAL;
goto out_put_file;
}
- if (!(tmp.file->f_mode & FMODE_WRITE) ||
- !(tmp.file->f_mode & FMODE_READ) ||
- (tmp.file->f_flags & O_APPEND)) {
+ if (!(fd_file(tmp)->f_mode & FMODE_WRITE) ||
+ !(fd_file(tmp)->f_mode & FMODE_READ) ||
+ (fd_file(tmp)->f_flags & O_APPEND)) {
error = -EBADF;
goto out_put_tmp_file;
}
- if (IS_SWAPFILE(file_inode(f.file)) ||
- IS_SWAPFILE(file_inode(tmp.file))) {
+ if (IS_SWAPFILE(file_inode(fd_file(f))) ||
+ IS_SWAPFILE(file_inode(fd_file(tmp)))) {
error = -EINVAL;
goto out_put_tmp_file;
}
@@ -1041,14 +1041,14 @@ xfs_ioc_swapext(
* before we cast and access them as XFS structures as we have no
* control over what the user passes us here.
*/
- if (f.file->f_op != &xfs_file_operations ||
- tmp.file->f_op != &xfs_file_operations) {
+ if (fd_file(f)->f_op != &xfs_file_operations ||
+ fd_file(tmp)->f_op != &xfs_file_operations) {
error = -EINVAL;
goto out_put_tmp_file;
}
- ip = XFS_I(file_inode(f.file));
- tip = XFS_I(file_inode(tmp.file));
+ ip = XFS_I(file_inode(fd_file(f)));
+ tip = XFS_I(file_inode(fd_file(tmp)));
if (ip->i_mount != tip->i_mount) {
error = -EINVAL;
diff --git a/include/linux/cleanup.h b/include/linux/cleanup.h
index d9e613803df1..a3d3e888cf1f 100644
--- a/include/linux/cleanup.h
+++ b/include/linux/cleanup.h
@@ -98,7 +98,7 @@ const volatile void * __must_check_fn(const volatile void *val)
* DEFINE_CLASS(fdget, struct fd, fdput(_T), fdget(fd), int fd)
*
* CLASS(fdget, f)(fd);
- * if (!f.file)
+ * if (!fd_file(f))
* return -EBADF;
*
* // use 'f' without concern
diff --git a/include/linux/file.h b/include/linux/file.h
index 237931f20739..0f3f369f2450 100644
--- a/include/linux/file.h
+++ b/include/linux/file.h
@@ -42,10 +42,12 @@ struct fd {
#define FDPUT_FPUT 1
#define FDPUT_POS_UNLOCK 2
+#define fd_file(f) ((f).file)
+
static inline void fdput(struct fd fd)
{
if (fd.flags & FDPUT_FPUT)
- fput(fd.file);
+ fput(fd_file(fd));
}
extern struct file *fget(unsigned int fd);
@@ -79,7 +81,7 @@ static inline struct fd fdget_pos(int fd)
static inline void fdput_pos(struct fd f)
{
if (f.flags & FDPUT_POS_UNLOCK)
- __f_unlock_pos(f.file);
+ __f_unlock_pos(fd_file(f));
fdput(f);
}
diff --git a/io_uring/sqpoll.c b/io_uring/sqpoll.c
index b3722e5275e7..ffa7d341bd95 100644
--- a/io_uring/sqpoll.c
+++ b/io_uring/sqpoll.c
@@ -108,14 +108,14 @@ static struct io_sq_data *io_attach_sq_data(struct io_uring_params *p)
struct fd f;
f = fdget(p->wq_fd);
- if (!f.file)
+ if (!fd_file(f))
return ERR_PTR(-ENXIO);
- if (!io_is_uring_fops(f.file)) {
+ if (!io_is_uring_fops(fd_file(f))) {
fdput(f);
return ERR_PTR(-EINVAL);
}
- ctx_attach = f.file->private_data;
+ ctx_attach = fd_file(f)->private_data;
sqd = ctx_attach->sq_data;
if (!sqd) {
fdput(f);
@@ -418,9 +418,9 @@ __cold int io_sq_offload_create(struct io_ring_ctx *ctx,
struct fd f;
f = fdget(p->wq_fd);
- if (!f.file)
+ if (!fd_file(f))
return -ENXIO;
- if (!io_is_uring_fops(f.file)) {
+ if (!io_is_uring_fops(fd_file(f))) {
fdput(f);
return -EINVAL;
}
diff --git a/ipc/mqueue.c b/ipc/mqueue.c
index a7cbd69efbef..34fa0bd8bb11 100644
--- a/ipc/mqueue.c
+++ b/ipc/mqueue.c
@@ -1085,20 +1085,20 @@ static int do_mq_timedsend(mqd_t mqdes, const char __user *u_msg_ptr,
audit_mq_sendrecv(mqdes, msg_len, msg_prio, ts);
f = fdget(mqdes);
- if (unlikely(!f.file)) {
+ if (unlikely(!fd_file(f))) {
ret = -EBADF;
goto out;
}
- inode = file_inode(f.file);
- if (unlikely(f.file->f_op != &mqueue_file_operations)) {
+ inode = file_inode(fd_file(f));
+ if (unlikely(fd_file(f)->f_op != &mqueue_file_operations)) {
ret = -EBADF;
goto out_fput;
}
info = MQUEUE_I(inode);
- audit_file(f.file);
+ audit_file(fd_file(f));
- if (unlikely(!(f.file->f_mode & FMODE_WRITE))) {
+ if (unlikely(!(fd_file(f)->f_mode & FMODE_WRITE))) {
ret = -EBADF;
goto out_fput;
}
@@ -1138,7 +1138,7 @@ static int do_mq_timedsend(mqd_t mqdes, const char __user *u_msg_ptr,
}
if (info->attr.mq_curmsgs == info->attr.mq_maxmsg) {
- if (f.file->f_flags & O_NONBLOCK) {
+ if (fd_file(f)->f_flags & O_NONBLOCK) {
ret = -EAGAIN;
} else {
wait.task = current;
@@ -1199,20 +1199,20 @@ static int do_mq_timedreceive(mqd_t mqdes, char __user *u_msg_ptr,
audit_mq_sendrecv(mqdes, msg_len, 0, ts);
f = fdget(mqdes);
- if (unlikely(!f.file)) {
+ if (unlikely(!fd_file(f))) {
ret = -EBADF;
goto out;
}
- inode = file_inode(f.file);
- if (unlikely(f.file->f_op != &mqueue_file_operations)) {
+ inode = file_inode(fd_file(f));
+ if (unlikely(fd_file(f)->f_op != &mqueue_file_operations)) {
ret = -EBADF;
goto out_fput;
}
info = MQUEUE_I(inode);
- audit_file(f.file);
+ audit_file(fd_file(f));
- if (unlikely(!(f.file->f_mode & FMODE_READ))) {
+ if (unlikely(!(fd_file(f)->f_mode & FMODE_READ))) {
ret = -EBADF;
goto out_fput;
}
@@ -1242,7 +1242,7 @@ static int do_mq_timedreceive(mqd_t mqdes, char __user *u_msg_ptr,
}
if (info->attr.mq_curmsgs == 0) {
- if (f.file->f_flags & O_NONBLOCK) {
+ if (fd_file(f)->f_flags & O_NONBLOCK) {
spin_unlock(&info->lock);
ret = -EAGAIN;
} else {
@@ -1356,11 +1356,11 @@ static int do_mq_notify(mqd_t mqdes, const struct sigevent *notification)
/* and attach it to the socket */
retry:
f = fdget(notification->sigev_signo);
- if (!f.file) {
+ if (!fd_file(f)) {
ret = -EBADF;
goto out;
}
- sock = netlink_getsockbyfilp(f.file);
+ sock = netlink_getsockbyfilp(fd_file(f));
fdput(f);
if (IS_ERR(sock)) {
ret = PTR_ERR(sock);
@@ -1379,13 +1379,13 @@ static int do_mq_notify(mqd_t mqdes, const struct sigevent *notification)
}
f = fdget(mqdes);
- if (!f.file) {
+ if (!fd_file(f)) {
ret = -EBADF;
goto out;
}
- inode = file_inode(f.file);
- if (unlikely(f.file->f_op != &mqueue_file_operations)) {
+ inode = file_inode(fd_file(f));
+ if (unlikely(fd_file(f)->f_op != &mqueue_file_operations)) {
ret = -EBADF;
goto out_fput;
}
@@ -1460,31 +1460,31 @@ static int do_mq_getsetattr(int mqdes, struct mq_attr *new, struct mq_attr *old)
return -EINVAL;
f = fdget(mqdes);
- if (!f.file)
+ if (!fd_file(f))
return -EBADF;
- if (unlikely(f.file->f_op != &mqueue_file_operations)) {
+ if (unlikely(fd_file(f)->f_op != &mqueue_file_operations)) {
fdput(f);
return -EBADF;
}
- inode = file_inode(f.file);
+ inode = file_inode(fd_file(f));
info = MQUEUE_I(inode);
spin_lock(&info->lock);
if (old) {
*old = info->attr;
- old->mq_flags = f.file->f_flags & O_NONBLOCK;
+ old->mq_flags = fd_file(f)->f_flags & O_NONBLOCK;
}
if (new) {
audit_mq_getsetattr(mqdes, new);
- spin_lock(&f.file->f_lock);
+ spin_lock(&fd_file(f)->f_lock);
if (new->mq_flags & O_NONBLOCK)
- f.file->f_flags |= O_NONBLOCK;
+ fd_file(f)->f_flags |= O_NONBLOCK;
else
- f.file->f_flags &= ~O_NONBLOCK;
- spin_unlock(&f.file->f_lock);
+ fd_file(f)->f_flags &= ~O_NONBLOCK;
+ spin_unlock(&fd_file(f)->f_lock);
inode_set_atime_to_ts(inode, inode_set_ctime_current(inode));
}
diff --git a/kernel/bpf/bpf_inode_storage.c b/kernel/bpf/bpf_inode_storage.c
index b0ef45db207c..0a79aee6523d 100644
--- a/kernel/bpf/bpf_inode_storage.c
+++ b/kernel/bpf/bpf_inode_storage.c
@@ -80,10 +80,10 @@ static void *bpf_fd_inode_storage_lookup_elem(struct bpf_map *map, void *key)
struct bpf_local_storage_data *sdata;
struct fd f = fdget_raw(*(int *)key);
- if (!f.file)
+ if (!fd_file(f))
return ERR_PTR(-EBADF);
- sdata = inode_storage_lookup(file_inode(f.file), map, true);
+ sdata = inode_storage_lookup(file_inode(fd_file(f)), map, true);
fdput(f);
return sdata ? sdata->data : NULL;
}
@@ -94,14 +94,14 @@ static long bpf_fd_inode_storage_update_elem(struct bpf_map *map, void *key,
struct bpf_local_storage_data *sdata;
struct fd f = fdget_raw(*(int *)key);
- if (!f.file)
+ if (!fd_file(f))
return -EBADF;
- if (!inode_storage_ptr(file_inode(f.file))) {
+ if (!inode_storage_ptr(file_inode(fd_file(f)))) {
fdput(f);
return -EBADF;
}
- sdata = bpf_local_storage_update(file_inode(f.file),
+ sdata = bpf_local_storage_update(file_inode(fd_file(f)),
(struct bpf_local_storage_map *)map,
value, map_flags, GFP_ATOMIC);
fdput(f);
@@ -126,10 +126,10 @@ static long bpf_fd_inode_storage_delete_elem(struct bpf_map *map, void *key)
struct fd f = fdget_raw(*(int *)key);
int err;
- if (!f.file)
+ if (!fd_file(f))
return -EBADF;
- err = inode_storage_delete(file_inode(f.file), map);
+ err = inode_storage_delete(file_inode(fd_file(f)), map);
fdput(f);
return err;
}
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 520f49f422fe..4de1e3dc2284 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -7677,15 +7677,15 @@ struct btf *btf_get_by_fd(int fd)
f = fdget(fd);
- if (!f.file)
+ if (!fd_file(f))
return ERR_PTR(-EBADF);
- if (f.file->f_op != &btf_fops) {
+ if (fd_file(f)->f_op != &btf_fops) {
fdput(f);
return ERR_PTR(-EINVAL);
}
- btf = f.file->private_data;
+ btf = fd_file(f)->private_data;
refcount_inc(&btf->refcnt);
fdput(f);
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index bf6c5f685ea2..3093bf2cc266 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -829,7 +829,7 @@ static int bpf_map_release(struct inode *inode, struct file *filp)
static fmode_t map_get_sys_perms(struct bpf_map *map, struct fd f)
{
- fmode_t mode = f.file->f_mode;
+ fmode_t mode = fd_file(f)->f_mode;
/* Our file permissions may have been overridden by global
* map permissions facing syscall side.
@@ -1423,14 +1423,14 @@ static int map_create(union bpf_attr *attr)
*/
struct bpf_map *__bpf_map_get(struct fd f)
{
- if (!f.file)
+ if (!fd_file(f))
return ERR_PTR(-EBADF);
- if (f.file->f_op != &bpf_map_fops) {
+ if (fd_file(f)->f_op != &bpf_map_fops) {
fdput(f);
return ERR_PTR(-EINVAL);
}
- return f.file->private_data;
+ return fd_file(f)->private_data;
}
void bpf_map_inc(struct bpf_map *map)
@@ -1651,7 +1651,7 @@ static int map_update_elem(union bpf_attr *attr, bpfptr_t uattr)
goto free_key;
}
- err = bpf_map_update_value(map, f.file, key, value, attr->flags);
+ err = bpf_map_update_value(map, fd_file(f), key, value, attr->flags);
if (!err)
maybe_wait_bpf_programs(map);
@@ -2409,14 +2409,14 @@ int bpf_prog_new_fd(struct bpf_prog *prog)
static struct bpf_prog *____bpf_prog_get(struct fd f)
{
- if (!f.file)
+ if (!fd_file(f))
return ERR_PTR(-EBADF);
- if (f.file->f_op != &bpf_prog_fops) {
+ if (fd_file(f)->f_op != &bpf_prog_fops) {
fdput(f);
return ERR_PTR(-EINVAL);
}
- return f.file->private_data;
+ return fd_file(f)->private_data;
}
void bpf_prog_add(struct bpf_prog *prog, int i)
@@ -3259,14 +3259,14 @@ struct bpf_link *bpf_link_get_from_fd(u32 ufd)
struct fd f = fdget(ufd);
struct bpf_link *link;
- if (!f.file)
+ if (!fd_file(f))
return ERR_PTR(-EBADF);
- if (f.file->f_op != &bpf_link_fops && f.file->f_op != &bpf_link_fops_poll) {
+ if (fd_file(f)->f_op != &bpf_link_fops && fd_file(f)->f_op != &bpf_link_fops_poll) {
fdput(f);
return ERR_PTR(-EINVAL);
}
- link = f.file->private_data;
+ link = fd_file(f)->private_data;
bpf_link_inc(link);
fdput(f);
@@ -4982,19 +4982,19 @@ static int bpf_obj_get_info_by_fd(const union bpf_attr *attr,
return -EINVAL;
f = fdget(ufd);
- if (!f.file)
+ if (!fd_file(f))
return -EBADFD;
- if (f.file->f_op == &bpf_prog_fops)
- err = bpf_prog_get_info_by_fd(f.file, f.file->private_data, attr,
+ if (fd_file(f)->f_op == &bpf_prog_fops)
+ err = bpf_prog_get_info_by_fd(fd_file(f), fd_file(f)->private_data, attr,
uattr);
- else if (f.file->f_op == &bpf_map_fops)
- err = bpf_map_get_info_by_fd(f.file, f.file->private_data, attr,
+ else if (fd_file(f)->f_op == &bpf_map_fops)
+ err = bpf_map_get_info_by_fd(fd_file(f), fd_file(f)->private_data, attr,
uattr);
- else if (f.file->f_op == &btf_fops)
- err = bpf_btf_get_info_by_fd(f.file, f.file->private_data, attr, uattr);
- else if (f.file->f_op == &bpf_link_fops || f.file->f_op == &bpf_link_fops_poll)
- err = bpf_link_get_info_by_fd(f.file, f.file->private_data,
+ else if (fd_file(f)->f_op == &btf_fops)
+ err = bpf_btf_get_info_by_fd(fd_file(f), fd_file(f)->private_data, attr, uattr);
+ else if (fd_file(f)->f_op == &bpf_link_fops || fd_file(f)->f_op == &bpf_link_fops_poll)
+ err = bpf_link_get_info_by_fd(fd_file(f), fd_file(f)->private_data,
attr, uattr);
else
err = -EINVAL;
@@ -5215,7 +5215,7 @@ static int bpf_map_do_batch(const union bpf_attr *attr,
else if (cmd == BPF_MAP_LOOKUP_AND_DELETE_BATCH)
BPF_DO_BATCH(map->ops->map_lookup_and_delete_batch, map, attr, uattr);
else if (cmd == BPF_MAP_UPDATE_BATCH)
- BPF_DO_BATCH(map->ops->map_update_batch, map, f.file, attr, uattr);
+ BPF_DO_BATCH(map->ops->map_update_batch, map, fd_file(f), attr, uattr);
else
BPF_DO_BATCH(map->ops->map_delete_batch, map, attr, uattr);
err_put:
diff --git a/kernel/bpf/token.c b/kernel/bpf/token.c
index d6ccf8d00eab..9a1d356e79ed 100644
--- a/kernel/bpf/token.c
+++ b/kernel/bpf/token.c
@@ -122,10 +122,10 @@ int bpf_token_create(union bpf_attr *attr)
int err, fd;
f = fdget(attr->token_create.bpffs_fd);
- if (!f.file)
+ if (!fd_file(f))
return -EBADF;
- path = f.file->f_path;
+ path = fd_file(f)->f_path;
path_get(&path);
fdput(f);
@@ -235,14 +235,14 @@ struct bpf_token *bpf_token_get_from_fd(u32 ufd)
struct fd f = fdget(ufd);
struct bpf_token *token;
- if (!f.file)
+ if (!fd_file(f))
return ERR_PTR(-EBADF);
- if (f.file->f_op != &bpf_token_fops) {
+ if (fd_file(f)->f_op != &bpf_token_fops) {
fdput(f);
return ERR_PTR(-EINVAL);
}
- token = f.file->private_data;
+ token = fd_file(f)->private_data;
bpf_token_inc(token);
fdput(f);
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index c8e4b62b436a..b96489277f70 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -6901,10 +6901,10 @@ struct cgroup *cgroup_v1v2_get_from_fd(int fd)
{
struct cgroup *cgrp;
struct fd f = fdget_raw(fd);
- if (!f.file)
+ if (!fd_file(f))
return ERR_PTR(-EBADF);
- cgrp = cgroup_v1v2_get_from_file(f.file);
+ cgrp = cgroup_v1v2_get_from_file(fd_file(f));
fdput(f);
return cgrp;
}
diff --git a/kernel/events/core.c b/kernel/events/core.c
index aa3450bdc227..17b19d3e74ba 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -933,10 +933,10 @@ static inline int perf_cgroup_connect(int fd, struct perf_event *event,
struct fd f = fdget(fd);
int ret = 0;
- if (!f.file)
+ if (!fd_file(f))
return -EBADF;
- css = css_tryget_online_from_dir(f.file->f_path.dentry,
+ css = css_tryget_online_from_dir(fd_file(f)->f_path.dentry,
&perf_event_cgrp_subsys);
if (IS_ERR(css)) {
ret = PTR_ERR(css);
@@ -5898,10 +5898,10 @@ static const struct file_operations perf_fops;
static inline int perf_fget_light(int fd, struct fd *p)
{
struct fd f = fdget(fd);
- if (!f.file)
+ if (!fd_file(f))
return -EBADF;
- if (f.file->f_op != &perf_fops) {
+ if (fd_file(f)->f_op != &perf_fops) {
fdput(f);
return -EBADF;
}
@@ -5961,7 +5961,7 @@ static long _perf_ioctl(struct perf_event *event, unsigned int cmd, unsigned lon
ret = perf_fget_light(arg, &output);
if (ret)
return ret;
- output_event = output.file->private_data;
+ output_event = fd_file(output)->private_data;
ret = perf_event_set_output(event, output_event);
fdput(output);
} else {
@@ -12549,7 +12549,7 @@ SYSCALL_DEFINE5(perf_event_open,
err = perf_fget_light(group_fd, &group);
if (err)
goto err_fd;
- group_leader = group.file->private_data;
+ group_leader = fd_file(group)->private_data;
if (flags & PERF_FLAG_FD_OUTPUT)
output_event = group_leader;
if (flags & PERF_FLAG_FD_NO_GROUP)
diff --git a/kernel/module/main.c b/kernel/module/main.c
index d9592195c5bb..6ed334eecc14 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -3211,7 +3211,7 @@ SYSCALL_DEFINE3(finit_module, int, fd, const char __user *, uargs, int, flags)
return -EINVAL;
f = fdget(fd);
- err = idempotent_init_module(f.file, uargs, flags);
+ err = idempotent_init_module(fd_file(f), uargs, flags);
fdput(f);
return err;
}
diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
index 6ec3deec68c2..dc952c3b05af 100644
--- a/kernel/nsproxy.c
+++ b/kernel/nsproxy.c
@@ -550,15 +550,15 @@ SYSCALL_DEFINE2(setns, int, fd, int, flags)
struct nsset nsset = {};
int err = 0;
- if (!f.file)
+ if (!fd_file(f))
return -EBADF;
- if (proc_ns_file(f.file)) {
- ns = get_proc_ns(file_inode(f.file));
+ if (proc_ns_file(fd_file(f))) {
+ ns = get_proc_ns(file_inode(fd_file(f)));
if (flags && (ns->ops->type != flags))
err = -EINVAL;
flags = ns->ops->type;
- } else if (!IS_ERR(pidfd_pid(f.file))) {
+ } else if (!IS_ERR(pidfd_pid(fd_file(f)))) {
err = check_setns_flags(flags);
} else {
err = -EINVAL;
@@ -570,10 +570,10 @@ SYSCALL_DEFINE2(setns, int, fd, int, flags)
if (err)
goto out;
- if (proc_ns_file(f.file))
+ if (proc_ns_file(fd_file(f)))
err = validate_ns(&nsset, ns);
else
- err = validate_nsset(&nsset, pidfd_pid(f.file));
+ err = validate_nsset(&nsset, pidfd_pid(fd_file(f)));
if (!err) {
commit_nsset(&nsset);
perf_event_namespaces(current);
diff --git a/kernel/pid.c b/kernel/pid.c
index da76ed1873f7..2715afb77eab 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -540,13 +540,13 @@ struct pid *pidfd_get_pid(unsigned int fd, unsigned int *flags)
struct pid *pid;
f = fdget(fd);
- if (!f.file)
+ if (!fd_file(f))
return ERR_PTR(-EBADF);
- pid = pidfd_pid(f.file);
+ pid = pidfd_pid(fd_file(f));
if (!IS_ERR(pid)) {
get_pid(pid);
- *flags = f.file->f_flags;
+ *flags = fd_file(f)->f_flags;
}
fdput(f);
@@ -755,10 +755,10 @@ SYSCALL_DEFINE3(pidfd_getfd, int, pidfd, int, fd,
return -EINVAL;
f = fdget(pidfd);
- if (!f.file)
+ if (!fd_file(f))
return -EBADF;
- pid = pidfd_pid(f.file);
+ pid = pidfd_pid(fd_file(f));
if (IS_ERR(pid))
ret = PTR_ERR(pid);
else
diff --git a/kernel/signal.c b/kernel/signal.c
index 60c737e423a1..cc5d87cfa7c0 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -3922,11 +3922,11 @@ SYSCALL_DEFINE4(pidfd_send_signal, int, pidfd, int, sig,
return -EINVAL;
f = fdget(pidfd);
- if (!f.file)
+ if (!fd_file(f))
return -EBADF;
/* Is this a pidfd? */
- pid = pidfd_to_pid(f.file);
+ pid = pidfd_to_pid(fd_file(f));
if (IS_ERR(pid)) {
ret = PTR_ERR(pid);
goto err;
@@ -3939,7 +3939,7 @@ SYSCALL_DEFINE4(pidfd_send_signal, int, pidfd, int, sig,
switch (flags) {
case 0:
/* Infer scope from the type of pidfd. */
- if (f.file->f_flags & PIDFD_THREAD)
+ if (fd_file(f)->f_flags & PIDFD_THREAD)
type = PIDTYPE_PID;
else
type = PIDTYPE_TGID;
diff --git a/kernel/sys.c b/kernel/sys.c
index 3a2df1bd9f64..a4be1e568ff5 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1916,10 +1916,10 @@ static int prctl_set_mm_exe_file(struct mm_struct *mm, unsigned int fd)
int err;
exe = fdget(fd);
- if (!exe.file)
+ if (!fd_file(exe))
return -EBADF;
- inode = file_inode(exe.file);
+ inode = file_inode(fd_file(exe));
/*
* Because the original mm->exe_file points to executable file, make
@@ -1927,14 +1927,14 @@ static int prctl_set_mm_exe_file(struct mm_struct *mm, unsigned int fd)
* overall picture.
*/
err = -EACCES;
- if (!S_ISREG(inode->i_mode) || path_noexec(&exe.file->f_path))
+ if (!S_ISREG(inode->i_mode) || path_noexec(&fd_file(exe)->f_path))
goto exit;
- err = file_permission(exe.file, MAY_EXEC);
+ err = file_permission(fd_file(exe), MAY_EXEC);
if (err)
goto exit;
- err = replace_mm_exe_file(mm, exe.file);
+ err = replace_mm_exe_file(mm, fd_file(exe));
exit:
fdput(exe);
return err;
diff --git a/kernel/taskstats.c b/kernel/taskstats.c
index 4354ea231fab..0700f40c53ac 100644
--- a/kernel/taskstats.c
+++ b/kernel/taskstats.c
@@ -419,7 +419,7 @@ static int cgroupstats_user_cmd(struct sk_buff *skb, struct genl_info *info)
fd = nla_get_u32(info->attrs[CGROUPSTATS_CMD_ATTR_FD]);
f = fdget(fd);
- if (!f.file)
+ if (!fd_file(f))
return 0;
size = nla_total_size(sizeof(struct cgroupstats));
@@ -440,7 +440,7 @@ static int cgroupstats_user_cmd(struct sk_buff *skb, struct genl_info *info)
stats = nla_data(na);
memset(stats, 0, sizeof(*stats));
- rc = cgroupstats_build(stats, f.file->f_path.dentry);
+ rc = cgroupstats_build(stats, fd_file(f)->f_path.dentry);
if (rc < 0) {
nlmsg_free(rep_skb);
goto err;
diff --git a/kernel/watch_queue.c b/kernel/watch_queue.c
index 03b90d7d2175..d36242fd4936 100644
--- a/kernel/watch_queue.c
+++ b/kernel/watch_queue.c
@@ -666,8 +666,8 @@ struct watch_queue *get_watch_queue(int fd)
struct fd f;
f = fdget(fd);
- if (f.file) {
- pipe = get_pipe_info(f.file, false);
+ if (fd_file(f)) {
+ pipe = get_pipe_info(fd_file(f), false);
if (pipe && pipe->watch_queue) {
wqueue = pipe->watch_queue;
kref_get(&wqueue->usage);
diff --git a/mm/fadvise.c b/mm/fadvise.c
index 6c39d42f16dc..532dee205c6e 100644
--- a/mm/fadvise.c
+++ b/mm/fadvise.c
@@ -193,10 +193,10 @@ int ksys_fadvise64_64(int fd, loff_t offset, loff_t len, int advice)
struct fd f = fdget(fd);
int ret;
- if (!f.file)
+ if (!fd_file(f))
return -EBADF;
- ret = vfs_fadvise(f.file, offset, len, advice);
+ ret = vfs_fadvise(fd_file(f), offset, len, advice);
fdput(f);
return ret;
diff --git a/mm/filemap.c b/mm/filemap.c
index d62150418b91..0b5cbd644fdd 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -4393,7 +4393,7 @@ SYSCALL_DEFINE4(cachestat, unsigned int, fd,
struct cachestat cs;
pgoff_t first_index, last_index;
- if (!f.file)
+ if (!fd_file(f))
return -EBADF;
if (copy_from_user(&csr, cstat_range,
@@ -4403,7 +4403,7 @@ SYSCALL_DEFINE4(cachestat, unsigned int, fd,
}
/* hugetlbfs is not supported */
- if (is_file_hugepages(f.file)) {
+ if (is_file_hugepages(fd_file(f))) {
fdput(f);
return -EOPNOTSUPP;
}
@@ -4417,7 +4417,7 @@ SYSCALL_DEFINE4(cachestat, unsigned int, fd,
last_index =
csr.len == 0 ? ULONG_MAX : (csr.off + csr.len - 1) >> PAGE_SHIFT;
memset(&cs, 0, sizeof(struct cachestat));
- mapping = f.file->f_mapping;
+ mapping = fd_file(f)->f_mapping;
filemap_cachestat(mapping, first_index, last_index, &cs);
fdput(f);
diff --git a/mm/memcontrol-v1.c b/mm/memcontrol-v1.c
index 417c96f2da28..9725c731fb21 100644
--- a/mm/memcontrol-v1.c
+++ b/mm/memcontrol-v1.c
@@ -1860,26 +1860,26 @@ static ssize_t memcg_write_event_control(struct kernfs_open_file *of,
INIT_WORK(&event->remove, memcg_event_remove);
efile = fdget(efd);
- if (!efile.file) {
+ if (!fd_file(efile)) {
ret = -EBADF;
goto out_kfree;
}
- event->eventfd = eventfd_ctx_fileget(efile.file);
+ event->eventfd = eventfd_ctx_fileget(fd_file(efile));
if (IS_ERR(event->eventfd)) {
ret = PTR_ERR(event->eventfd);
goto out_put_efile;
}
cfile = fdget(cfd);
- if (!cfile.file) {
+ if (!fd_file(cfile)) {
ret = -EBADF;
goto out_put_eventfd;
}
/* the process need read permission on control file */
/* AV: shouldn't we check that it's been opened for read instead? */
- ret = file_permission(cfile.file, MAY_READ);
+ ret = file_permission(fd_file(cfile), MAY_READ);
if (ret < 0)
goto out_put_cfile;
@@ -1887,7 +1887,7 @@ static ssize_t memcg_write_event_control(struct kernfs_open_file *of,
* The control file must be a regular cgroup1 file. As a regular cgroup
* file can't be renamed, it's safe to access its name afterwards.
*/
- cdentry = cfile.file->f_path.dentry;
+ cdentry = fd_file(cfile)->f_path.dentry;
if (cdentry->d_sb->s_type != &cgroup_fs_type || !d_is_reg(cdentry)) {
ret = -EINVAL;
goto out_put_cfile;
@@ -1939,7 +1939,7 @@ static ssize_t memcg_write_event_control(struct kernfs_open_file *of,
if (ret)
goto out_put_css;
- vfs_poll(efile.file, &event->pt);
+ vfs_poll(fd_file(efile), &event->pt);
spin_lock_irq(&memcg->event_list_lock);
list_add(&event->list, &memcg->event_list);
diff --git a/mm/readahead.c b/mm/readahead.c
index 517c0be7ce66..e83fe1c6e5ac 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -646,7 +646,7 @@ ssize_t ksys_readahead(int fd, loff_t offset, size_t count)
ret = -EBADF;
f = fdget(fd);
- if (!f.file || !(f.file->f_mode & FMODE_READ))
+ if (!fd_file(f) || !(fd_file(f)->f_mode & FMODE_READ))
goto out;
/*
@@ -655,12 +655,12 @@ ssize_t ksys_readahead(int fd, loff_t offset, size_t count)
* on this file, then we must return -EINVAL.
*/
ret = -EINVAL;
- if (!f.file->f_mapping || !f.file->f_mapping->a_ops ||
- (!S_ISREG(file_inode(f.file)->i_mode) &&
- !S_ISBLK(file_inode(f.file)->i_mode)))
+ if (!fd_file(f)->f_mapping || !fd_file(f)->f_mapping->a_ops ||
+ (!S_ISREG(file_inode(fd_file(f))->i_mode) &&
+ !S_ISBLK(file_inode(fd_file(f))->i_mode)))
goto out;
- ret = vfs_fadvise(f.file, offset, count, POSIX_FADV_WILLNEED);
+ ret = vfs_fadvise(fd_file(f), offset, count, POSIX_FADV_WILLNEED);
out:
fdput(f);
return ret;
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index 6a823ba906c6..18b7c8f31234 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -711,11 +711,11 @@ struct net *get_net_ns_by_fd(int fd)
struct fd f = fdget(fd);
struct net *net = ERR_PTR(-EINVAL);
- if (!f.file)
+ if (!fd_file(f))
return ERR_PTR(-EBADF);
- if (proc_ns_file(f.file)) {
- struct ns_common *ns = get_proc_ns(file_inode(f.file));
+ if (proc_ns_file(fd_file(f))) {
+ struct ns_common *ns = get_proc_ns(file_inode(fd_file(f)));
if (ns->ops == &netns_operations)
net = get_net(container_of(ns, struct net, ns));
}
diff --git a/net/socket.c b/net/socket.c
index fcbdd5bc47ac..f77a42a74510 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -556,8 +556,8 @@ static struct socket *sockfd_lookup_light(int fd, int *err, int *fput_needed)
struct socket *sock;
*err = -EBADF;
- if (f.file) {
- sock = sock_from_file(f.file);
+ if (fd_file(f)) {
+ sock = sock_from_file(fd_file(f));
if (likely(sock)) {
*fput_needed = f.flags & FDPUT_FPUT;
return sock;
@@ -2008,8 +2008,8 @@ int __sys_accept4(int fd, struct sockaddr __user *upeer_sockaddr,
struct fd f;
f = fdget(fd);
- if (f.file) {
- ret = __sys_accept4_file(f.file, upeer_sockaddr,
+ if (fd_file(f)) {
+ ret = __sys_accept4_file(fd_file(f), upeer_sockaddr,
upeer_addrlen, flags);
fdput(f);
}
@@ -2070,12 +2070,12 @@ int __sys_connect(int fd, struct sockaddr __user *uservaddr, int addrlen)
struct fd f;
f = fdget(fd);
- if (f.file) {
+ if (fd_file(f)) {
struct sockaddr_storage address;
ret = move_addr_to_kernel(uservaddr, addrlen, &address);
if (!ret)
- ret = __sys_connect_file(f.file, &address, addrlen, 0);
+ ret = __sys_connect_file(fd_file(f), &address, addrlen, 0);
fdput(f);
}
diff --git a/security/integrity/ima/ima_main.c b/security/integrity/ima/ima_main.c
index f04f43af651c..e7c1d3ae33fe 100644
--- a/security/integrity/ima/ima_main.c
+++ b/security/integrity/ima/ima_main.c
@@ -1068,10 +1068,10 @@ void ima_kexec_cmdline(int kernel_fd, const void *buf, int size)
return;
f = fdget(kernel_fd);
- if (!f.file)
+ if (!fd_file(f))
return;
- process_buffer_measurement(file_mnt_idmap(f.file), file_inode(f.file),
+ process_buffer_measurement(file_mnt_idmap(fd_file(f)), file_inode(fd_file(f)),
buf, size, "kexec-cmdline", KEXEC_CMDLINE, 0,
NULL, false, NULL, 0);
fdput(f);
diff --git a/security/landlock/syscalls.c b/security/landlock/syscalls.c
index ccc8bc6c1584..00b63971ab64 100644
--- a/security/landlock/syscalls.c
+++ b/security/landlock/syscalls.c
@@ -238,19 +238,19 @@ static struct landlock_ruleset *get_ruleset_from_fd(const int fd,
struct landlock_ruleset *ruleset;
ruleset_f = fdget(fd);
- if (!ruleset_f.file)
+ if (!fd_file(ruleset_f))
return ERR_PTR(-EBADF);
/* Checks FD type and access right. */
- if (ruleset_f.file->f_op != &ruleset_fops) {
+ if (fd_file(ruleset_f)->f_op != &ruleset_fops) {
ruleset = ERR_PTR(-EBADFD);
goto out_fdput;
}
- if (!(ruleset_f.file->f_mode & mode)) {
+ if (!(fd_file(ruleset_f)->f_mode & mode)) {
ruleset = ERR_PTR(-EPERM);
goto out_fdput;
}
- ruleset = ruleset_f.file->private_data;
+ ruleset = fd_file(ruleset_f)->private_data;
if (WARN_ON_ONCE(ruleset->num_layers != 1)) {
ruleset = ERR_PTR(-EINVAL);
goto out_fdput;
@@ -277,22 +277,22 @@ static int get_path_from_fd(const s32 fd, struct path *const path)
/* Handles O_PATH. */
f = fdget_raw(fd);
- if (!f.file)
+ if (!fd_file(f))
return -EBADF;
/*
* Forbids ruleset FDs, internal filesystems (e.g. nsfs), including
* pseudo filesystems that will never be mountable (e.g. sockfs,
* pipefs).
*/
- if ((f.file->f_op == &ruleset_fops) ||
- (f.file->f_path.mnt->mnt_flags & MNT_INTERNAL) ||
- (f.file->f_path.dentry->d_sb->s_flags & SB_NOUSER) ||
- d_is_negative(f.file->f_path.dentry) ||
- IS_PRIVATE(d_backing_inode(f.file->f_path.dentry))) {
+ if ((fd_file(f)->f_op == &ruleset_fops) ||
+ (fd_file(f)->f_path.mnt->mnt_flags & MNT_INTERNAL) ||
+ (fd_file(f)->f_path.dentry->d_sb->s_flags & SB_NOUSER) ||
+ d_is_negative(fd_file(f)->f_path.dentry) ||
+ IS_PRIVATE(d_backing_inode(fd_file(f)->f_path.dentry))) {
err = -EBADFD;
goto out_fdput;
}
- *path = f.file->f_path;
+ *path = fd_file(f)->f_path;
path_get(path);
out_fdput:
diff --git a/security/loadpin/loadpin.c b/security/loadpin/loadpin.c
index 93fd4d47b334..02144ec39f43 100644
--- a/security/loadpin/loadpin.c
+++ b/security/loadpin/loadpin.c
@@ -296,7 +296,7 @@ static int read_trusted_verity_root_digests(unsigned int fd)
return -EPERM;
f = fdget(fd);
- if (!f.file)
+ if (!fd_file(f))
return -EINVAL;
data = kzalloc(SZ_4K, GFP_KERNEL);
@@ -305,7 +305,7 @@ static int read_trusted_verity_root_digests(unsigned int fd)
goto err;
}
- rc = kernel_read_file(f.file, 0, (void **)&data, SZ_4K - 1, NULL, READING_POLICY);
+ rc = kernel_read_file(fd_file(f), 0, (void **)&data, SZ_4K - 1, NULL, READING_POLICY);
if (rc < 0)
goto err;
diff --git a/sound/core/pcm_native.c b/sound/core/pcm_native.c
index 4057f9f10aee..cbb9c972cb93 100644
--- a/sound/core/pcm_native.c
+++ b/sound/core/pcm_native.c
@@ -2250,12 +2250,12 @@ static int snd_pcm_link(struct snd_pcm_substream *substream, int fd)
bool nonatomic = substream->pcm->nonatomic;
CLASS(fd, f)(fd);
- if (!f.file)
+ if (!fd_file(f))
return -EBADFD;
- if (!is_pcm_file(f.file))
+ if (!is_pcm_file(fd_file(f)))
return -EBADFD;
- pcm_file = f.file->private_data;
+ pcm_file = fd_file(f)->private_data;
substream1 = pcm_file->substream;
if (substream == substream1)
diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index 229570059a1b..65efb3735e79 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -327,12 +327,12 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
seqcount_spinlock_init(&irqfd->irq_entry_sc, &kvm->irqfds.lock);
f = fdget(args->fd);
- if (!f.file) {
+ if (!fd_file(f)) {
ret = -EBADF;
goto out;
}
- eventfd = eventfd_ctx_fileget(f.file);
+ eventfd = eventfd_ctx_fileget(fd_file(f));
if (IS_ERR(eventfd)) {
ret = PTR_ERR(eventfd);
goto fail;
@@ -419,7 +419,7 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
* Check if there was an event already pending on the eventfd
* before we registered, and trigger it as if we didn't miss it.
*/
- events = vfs_poll(f.file, &irqfd->pt);
+ events = vfs_poll(fd_file(f), &irqfd->pt);
if (events & EPOLLIN)
schedule_work(&irqfd->inject);
diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
index 76b7f6085dcd..388ae471d258 100644
--- a/virt/kvm/vfio.c
+++ b/virt/kvm/vfio.c
@@ -194,7 +194,7 @@ static int kvm_vfio_file_del(struct kvm_device *dev, unsigned int fd)
int ret;
f = fdget(fd);
- if (!f.file)
+ if (!fd_file(f))
return -EBADF;
ret = -ENOENT;
@@ -202,7 +202,7 @@ static int kvm_vfio_file_del(struct kvm_device *dev, unsigned int fd)
mutex_lock(&kv->lock);
list_for_each_entry(kvf, &kv->file_list, node) {
- if (kvf->file != f.file)
+ if (kvf->file != fd_file(f))
continue;
list_del(&kvf->node);
@@ -240,7 +240,7 @@ static int kvm_vfio_file_set_spapr_tce(struct kvm_device *dev,
return -EFAULT;
f = fdget(param.groupfd);
- if (!f.file)
+ if (!fd_file(f))
return -EBADF;
ret = -ENOENT;
@@ -248,7 +248,7 @@ static int kvm_vfio_file_set_spapr_tce(struct kvm_device *dev,
mutex_lock(&kv->lock);
list_for_each_entry(kvf, &kv->file_list, node) {
- if (kvf->file != f.file)
+ if (kvf->file != fd_file(f))
continue;
if (!kvf->iommu_group) {
--
2.39.2
^ permalink raw reply related [flat|nested] 134+ messages in thread* Re: [PATCH 02/39] introduce fd_file(), convert all accessors to it.
2024-07-30 5:15 ` [PATCH 02/39] introduce fd_file(), convert all accessors to it viro
@ 2024-08-07 9:55 ` Christian Brauner
0 siblings, 0 replies; 134+ messages in thread
From: Christian Brauner @ 2024-08-07 9:55 UTC (permalink / raw)
To: viro; +Cc: linux-fsdevel, amir73il, bpf, cgroups, kvm, netdev, torvalds
On Tue, Jul 30, 2024 at 01:15:48AM GMT, viro@kernel.org wrote:
> From: Al Viro <viro@zeniv.linux.org.uk>
>
> For any changes of struct fd representation we need to
> turn existing accesses to fields into calls of wrappers.
> Accesses to struct fd::flags are very few (3 in linux/file.h,
> 1 in net/socket.c, 3 in fs/overlayfs/file.c and 3 more in
> explicit initializers).
> Those can be dealt with in the commit converting to
> new layout; accesses to struct fd::file are too many for that.
> This commit converts (almost) all of f.file to
> fd_file(f). It's not entirely mechanical ('file' is used as
> a member name more than just in struct fd) and it does not
> even attempt to distinguish the uses in pointer context from
> those in boolean context; the latter will be eventually turned
> into a separate helper (fd_empty()).
>
> NOTE: mass conversion to fd_empty(), tempting as it
> might be, is a bad idea; better do that piecewise in commit
> that convert from fdget...() to CLASS(...).
>
> [conflicts in fs/fhandle.c, kernel/bpf/syscall.c, mm/memcontrol.c
> caught by git; fs/stat.c one got caught by git grep]
> [fs/xattr.c conflict]
>
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
Reviewed-by: Christian Brauner <brauner@kernel.org>
^ permalink raw reply [flat|nested] 134+ messages in thread
* [PATCH 03/39] struct fd: representation change
2024-07-30 5:15 ` [PATCH 01/39] memcg_write_event_control(): fix a user-triggerable oops viro
2024-07-30 5:15 ` [PATCH 02/39] introduce fd_file(), convert all accessors to it viro
@ 2024-07-30 5:15 ` viro
2024-07-30 18:10 ` Josef Bacik
2024-08-07 10:03 ` Christian Brauner
2024-07-30 5:15 ` [PATCH 04/39] add struct fd constructors, get rid of __to_fd() viro
` (36 subsequent siblings)
38 siblings, 2 replies; 134+ messages in thread
From: viro @ 2024-07-30 5:15 UTC (permalink / raw)
To: linux-fsdevel; +Cc: amir73il, bpf, brauner, cgroups, kvm, netdev, torvalds
From: Al Viro <viro@zeniv.linux.org.uk>
The absolute majority of instances comes from fdget() and its
relatives; the underlying primitives actually return a struct file
reference and a couple of flags encoded into an unsigned long - the lower
two bits of file address are always zero, so we can stash the flags
into those. On the way out we use __to_fd() to unpack that unsigned
long into struct fd.
Let's use that representation for struct fd itself - make it
a structure with a single unsigned long member (.word), with the value
equal either to (unsigned long)p | flags, p being an address of some
struct file instance, or to 0 for an empty fd.
Note that we never used a struct fd instance with NULL ->file
and non-zero ->flags; the emptiness had been checked as (!f.file) and
we expected e.g. fdput(empty) to be a no-op. With new representation
we can use (!f.word) for emptiness check; that is enough for compiler
to figure out that (f.word & FDPUT_FPUT) will be false and that fdput(f)
will be a no-op in such case.
For now the new predicate (fd_empty(f)) has no users; all the
existing checks have form (!fd_file(f)). We will convert to fd_empty()
use later; here we only define it (and tell the compiler that it's
unlikely to return true).
This commit only deals with representation change; there will
be followups.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
drivers/infiniband/core/uverbs_cmd.c | 2 +-
fs/overlayfs/file.c | 28 +++++++++++++++-------------
fs/xfs/xfs_handle.c | 2 +-
include/linux/file.h | 22 ++++++++++++++++------
kernel/events/core.c | 2 +-
net/socket.c | 2 +-
6 files changed, 35 insertions(+), 23 deletions(-)
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 3f85575cf971..a4cce360df21 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -572,7 +572,7 @@ static int ib_uverbs_open_xrcd(struct uverbs_attr_bundle *attrs)
struct inode *inode = NULL;
int new_xrcd = 0;
struct ib_device *ib_dev;
- struct fd f = {};
+ struct fd f = EMPTY_FD;
int ret;
ret = uverbs_request(attrs, &cmd, sizeof(cmd));
diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
index c4963d0c5549..2b7a5a3a7a2f 100644
--- a/fs/overlayfs/file.c
+++ b/fs/overlayfs/file.c
@@ -93,11 +93,11 @@ static int ovl_real_fdget_meta(const struct file *file, struct fd *real,
bool allow_meta)
{
struct dentry *dentry = file_dentry(file);
+ struct file *realfile = file->private_data;
struct path realpath;
int err;
- real->flags = 0;
- real->file = file->private_data;
+ real->word = (unsigned long)realfile;
if (allow_meta) {
ovl_path_real(dentry, &realpath);
@@ -113,16 +113,17 @@ static int ovl_real_fdget_meta(const struct file *file, struct fd *real,
return -EIO;
/* Has it been copied up since we'd opened it? */
- if (unlikely(file_inode(real->file) != d_inode(realpath.dentry))) {
- real->flags = FDPUT_FPUT;
- real->file = ovl_open_realfile(file, &realpath);
-
- return PTR_ERR_OR_ZERO(real->file);
+ if (unlikely(file_inode(realfile) != d_inode(realpath.dentry))) {
+ struct file *f = ovl_open_realfile(file, &realpath);
+ if (IS_ERR(f))
+ return PTR_ERR(f);
+ real->word = (unsigned long)ovl_open_realfile(file, &realpath) | FDPUT_FPUT;
+ return 0;
}
/* Did the flags change since open? */
- if (unlikely((file->f_flags ^ real->file->f_flags) & ~OVL_OPEN_FLAGS))
- return ovl_change_flags(real->file, file->f_flags);
+ if (unlikely((file->f_flags ^ realfile->f_flags) & ~OVL_OPEN_FLAGS))
+ return ovl_change_flags(realfile, file->f_flags);
return 0;
}
@@ -130,10 +131,11 @@ static int ovl_real_fdget_meta(const struct file *file, struct fd *real,
static int ovl_real_fdget(const struct file *file, struct fd *real)
{
if (d_is_dir(file_dentry(file))) {
- real->flags = 0;
- real->file = ovl_dir_real_file(file, false);
-
- return PTR_ERR_OR_ZERO(real->file);
+ struct file *f = ovl_dir_real_file(file, false);
+ if (IS_ERR(f))
+ return PTR_ERR(f);
+ real->word = (unsigned long)f;
+ return 0;
}
return ovl_real_fdget_meta(file, real, false);
diff --git a/fs/xfs/xfs_handle.c b/fs/xfs/xfs_handle.c
index 7bcc4f519cb8..49e5e5f04e60 100644
--- a/fs/xfs/xfs_handle.c
+++ b/fs/xfs/xfs_handle.c
@@ -85,7 +85,7 @@ xfs_find_handle(
int hsize;
xfs_handle_t handle;
struct inode *inode;
- struct fd f = {NULL};
+ struct fd f = EMPTY_FD;
struct path path;
int error;
struct xfs_inode *ip;
diff --git a/include/linux/file.h b/include/linux/file.h
index 0f3f369f2450..bdd6e1766839 100644
--- a/include/linux/file.h
+++ b/include/linux/file.h
@@ -35,18 +35,28 @@ static inline void fput_light(struct file *file, int fput_needed)
fput(file);
}
+/* either a reference to struct file + flags
+ * (cloned vs. borrowed, pos locked), with
+ * flags stored in lower bits of value,
+ * or empty (represented by 0).
+ */
struct fd {
- struct file *file;
- unsigned int flags;
+ unsigned long word;
};
#define FDPUT_FPUT 1
#define FDPUT_POS_UNLOCK 2
-#define fd_file(f) ((f).file)
+#define fd_file(f) ((struct file *)((f).word & ~3))
+static inline bool fd_empty(struct fd f)
+{
+ return unlikely(!f.word);
+}
+
+#define EMPTY_FD (struct fd){0}
static inline void fdput(struct fd fd)
{
- if (fd.flags & FDPUT_FPUT)
+ if (fd.word & FDPUT_FPUT)
fput(fd_file(fd));
}
@@ -60,7 +70,7 @@ extern void __f_unlock_pos(struct file *);
static inline struct fd __to_fd(unsigned long v)
{
- return (struct fd){(struct file *)(v & ~3),v & 3};
+ return (struct fd){v};
}
static inline struct fd fdget(unsigned int fd)
@@ -80,7 +90,7 @@ static inline struct fd fdget_pos(int fd)
static inline void fdput_pos(struct fd f)
{
- if (f.flags & FDPUT_POS_UNLOCK)
+ if (f.word & FDPUT_POS_UNLOCK)
__f_unlock_pos(fd_file(f));
fdput(f);
}
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 17b19d3e74ba..fd2ac9c7fd77 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -12474,7 +12474,7 @@ SYSCALL_DEFINE5(perf_event_open,
struct perf_event_attr attr;
struct perf_event_context *ctx;
struct file *event_file = NULL;
- struct fd group = {NULL, 0};
+ struct fd group = EMPTY_FD;
struct task_struct *task = NULL;
struct pmu *pmu;
int event_fd;
diff --git a/net/socket.c b/net/socket.c
index f77a42a74510..c0d4f5032374 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -559,7 +559,7 @@ static struct socket *sockfd_lookup_light(int fd, int *err, int *fput_needed)
if (fd_file(f)) {
sock = sock_from_file(fd_file(f));
if (likely(sock)) {
- *fput_needed = f.flags & FDPUT_FPUT;
+ *fput_needed = f.word & FDPUT_FPUT;
return sock;
}
*err = -ENOTSOCK;
--
2.39.2
^ permalink raw reply related [flat|nested] 134+ messages in thread* Re: [PATCH 03/39] struct fd: representation change
2024-07-30 5:15 ` [PATCH 03/39] struct fd: representation change viro
@ 2024-07-30 18:10 ` Josef Bacik
2024-08-07 10:07 ` Christian Brauner
2024-08-07 10:03 ` Christian Brauner
1 sibling, 1 reply; 134+ messages in thread
From: Josef Bacik @ 2024-07-30 18:10 UTC (permalink / raw)
To: viro; +Cc: linux-fsdevel, amir73il, bpf, brauner, cgroups, kvm, netdev,
torvalds
On Tue, Jul 30, 2024 at 01:15:49AM -0400, viro@kernel.org wrote:
> From: Al Viro <viro@zeniv.linux.org.uk>
>
> The absolute majority of instances comes from fdget() and its
> relatives; the underlying primitives actually return a struct file
> reference and a couple of flags encoded into an unsigned long - the lower
> two bits of file address are always zero, so we can stash the flags
> into those. On the way out we use __to_fd() to unpack that unsigned
> long into struct fd.
>
> Let's use that representation for struct fd itself - make it
> a structure with a single unsigned long member (.word), with the value
> equal either to (unsigned long)p | flags, p being an address of some
> struct file instance, or to 0 for an empty fd.
>
> Note that we never used a struct fd instance with NULL ->file
> and non-zero ->flags; the emptiness had been checked as (!f.file) and
> we expected e.g. fdput(empty) to be a no-op. With new representation
> we can use (!f.word) for emptiness check; that is enough for compiler
> to figure out that (f.word & FDPUT_FPUT) will be false and that fdput(f)
> will be a no-op in such case.
>
> For now the new predicate (fd_empty(f)) has no users; all the
> existing checks have form (!fd_file(f)). We will convert to fd_empty()
> use later; here we only define it (and tell the compiler that it's
> unlikely to return true).
>
> This commit only deals with representation change; there will
> be followups.
I'm still trawling through all of this code and trying to grok it, but one thing
I kept wondering was wtf would we do this ->word trick in the first place? It
seemed needlessly complicated and I don't love having structs with inprecise
members.
But then buried deep in your cover letter you have this
"""
It's not that hard to deal with - the real primitives behind fdget()
et.al. are returning an unsigned long value, unpacked by (inlined)
__to_fd() into the current struct file * + int. Linus suggested that
keeping that unsigned long around with the extractions done by inlined
accessors should generate a sane code and that turns out to be the case.
Turning struct fd into a struct-wrapped unsinged long, with
fd_empty(f) => unlikely(f.word == 0)
fd_file(f) => (struct file *)(f.word & ~3)
fdput(f) => if (f.word & 1) fput(fd_file(f))
ends up with compiler doing the right thing. The cost is the patch
footprint, of course - we need to switch f.file to fd_file(f) all over
the tree, and it's not doable with simple search and replace; there are
false positives, etc. Might become a PITA at merge time; however,
actual update of that from 6.10-rc1 to 6.11-rc1 had brought surprisingly
few conflicts.
"""
Which makes a whole lot of sense. The member name doesn't matter much since
we're now using helpers everywhere, but for an idiot like me having something
like this
struct fd {
unsigned long __file_ptr;
};
would make it so that you don't have random people deciding they can access
fd->file, and it helps make it more clear what exactly the point of this thing
is.
That being said I think renaming it really isn't the important part, I think
including something like the bit you wrote above in the commit message would
help when somebody is trying to figure out why this more obtuse strategy is used
rather than just having struct file *__file with the low bit masking stuff
tacked on the side.
I'm still working through all the patches, but in general the doc made sense,
and the patches thusfar are easy enough to follow. Thanks,
Josef
^ permalink raw reply [flat|nested] 134+ messages in thread* Re: [PATCH 03/39] struct fd: representation change
2024-07-30 5:15 ` [PATCH 03/39] struct fd: representation change viro
2024-07-30 18:10 ` Josef Bacik
@ 2024-08-07 10:03 ` Christian Brauner
1 sibling, 0 replies; 134+ messages in thread
From: Christian Brauner @ 2024-08-07 10:03 UTC (permalink / raw)
To: viro; +Cc: linux-fsdevel, amir73il, bpf, cgroups, kvm, netdev, torvalds
On Tue, Jul 30, 2024 at 01:15:49AM GMT, viro@kernel.org wrote:
> From: Al Viro <viro@zeniv.linux.org.uk>
>
> The absolute majority of instances comes from fdget() and its
> relatives; the underlying primitives actually return a struct file
> reference and a couple of flags encoded into an unsigned long - the lower
> two bits of file address are always zero, so we can stash the flags
> into those. On the way out we use __to_fd() to unpack that unsigned
> long into struct fd.
>
> Let's use that representation for struct fd itself - make it
> a structure with a single unsigned long member (.word), with the value
> equal either to (unsigned long)p | flags, p being an address of some
> struct file instance, or to 0 for an empty fd.
>
> Note that we never used a struct fd instance with NULL ->file
> and non-zero ->flags; the emptiness had been checked as (!f.file) and
> we expected e.g. fdput(empty) to be a no-op. With new representation
> we can use (!f.word) for emptiness check; that is enough for compiler
> to figure out that (f.word & FDPUT_FPUT) will be false and that fdput(f)
> will be a no-op in such case.
>
> For now the new predicate (fd_empty(f)) has no users; all the
> existing checks have form (!fd_file(f)). We will convert to fd_empty()
> use later; here we only define it (and tell the compiler that it's
> unlikely to return true).
>
> This commit only deals with representation change; there will
> be followups.
>
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
> drivers/infiniband/core/uverbs_cmd.c | 2 +-
> fs/overlayfs/file.c | 28 +++++++++++++++-------------
> fs/xfs/xfs_handle.c | 2 +-
> include/linux/file.h | 22 ++++++++++++++++------
> kernel/events/core.c | 2 +-
> net/socket.c | 2 +-
> 6 files changed, 35 insertions(+), 23 deletions(-)
>
> diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
> index 3f85575cf971..a4cce360df21 100644
> --- a/drivers/infiniband/core/uverbs_cmd.c
> +++ b/drivers/infiniband/core/uverbs_cmd.c
> @@ -572,7 +572,7 @@ static int ib_uverbs_open_xrcd(struct uverbs_attr_bundle *attrs)
> struct inode *inode = NULL;
> int new_xrcd = 0;
> struct ib_device *ib_dev;
> - struct fd f = {};
> + struct fd f = EMPTY_FD;
> int ret;
>
> ret = uverbs_request(attrs, &cmd, sizeof(cmd));
> diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
> index c4963d0c5549..2b7a5a3a7a2f 100644
> --- a/fs/overlayfs/file.c
> +++ b/fs/overlayfs/file.c
> @@ -93,11 +93,11 @@ static int ovl_real_fdget_meta(const struct file *file, struct fd *real,
> bool allow_meta)
> {
> struct dentry *dentry = file_dentry(file);
> + struct file *realfile = file->private_data;
> struct path realpath;
> int err;
>
> - real->flags = 0;
> - real->file = file->private_data;
> + real->word = (unsigned long)realfile;
>
> if (allow_meta) {
> ovl_path_real(dentry, &realpath);
> @@ -113,16 +113,17 @@ static int ovl_real_fdget_meta(const struct file *file, struct fd *real,
> return -EIO;
>
> /* Has it been copied up since we'd opened it? */
> - if (unlikely(file_inode(real->file) != d_inode(realpath.dentry))) {
> - real->flags = FDPUT_FPUT;
> - real->file = ovl_open_realfile(file, &realpath);
> -
> - return PTR_ERR_OR_ZERO(real->file);
> + if (unlikely(file_inode(realfile) != d_inode(realpath.dentry))) {
> + struct file *f = ovl_open_realfile(file, &realpath);
> + if (IS_ERR(f))
> + return PTR_ERR(f);
> + real->word = (unsigned long)ovl_open_realfile(file, &realpath) | FDPUT_FPUT;
> + return 0;
> }
>
> /* Did the flags change since open? */
> - if (unlikely((file->f_flags ^ real->file->f_flags) & ~OVL_OPEN_FLAGS))
> - return ovl_change_flags(real->file, file->f_flags);
> + if (unlikely((file->f_flags ^ realfile->f_flags) & ~OVL_OPEN_FLAGS))
> + return ovl_change_flags(realfile, file->f_flags);
>
> return 0;
> }
> @@ -130,10 +131,11 @@ static int ovl_real_fdget_meta(const struct file *file, struct fd *real,
> static int ovl_real_fdget(const struct file *file, struct fd *real)
> {
> if (d_is_dir(file_dentry(file))) {
> - real->flags = 0;
> - real->file = ovl_dir_real_file(file, false);
> -
> - return PTR_ERR_OR_ZERO(real->file);
> + struct file *f = ovl_dir_real_file(file, false);
> + if (IS_ERR(f))
> + return PTR_ERR(f);
> + real->word = (unsigned long)f;
> + return 0;
> }
>
> return ovl_real_fdget_meta(file, real, false);
> diff --git a/fs/xfs/xfs_handle.c b/fs/xfs/xfs_handle.c
> index 7bcc4f519cb8..49e5e5f04e60 100644
> --- a/fs/xfs/xfs_handle.c
> +++ b/fs/xfs/xfs_handle.c
> @@ -85,7 +85,7 @@ xfs_find_handle(
> int hsize;
> xfs_handle_t handle;
> struct inode *inode;
> - struct fd f = {NULL};
> + struct fd f = EMPTY_FD;
> struct path path;
> int error;
> struct xfs_inode *ip;
> diff --git a/include/linux/file.h b/include/linux/file.h
> index 0f3f369f2450..bdd6e1766839 100644
> --- a/include/linux/file.h
> +++ b/include/linux/file.h
> @@ -35,18 +35,28 @@ static inline void fput_light(struct file *file, int fput_needed)
> fput(file);
> }
>
> +/* either a reference to struct file + flags
> + * (cloned vs. borrowed, pos locked), with
> + * flags stored in lower bits of value,
> + * or empty (represented by 0).
> + */
> struct fd {
> - struct file *file;
> - unsigned int flags;
> + unsigned long word;
> };
> #define FDPUT_FPUT 1
> #define FDPUT_POS_UNLOCK 2
>
> -#define fd_file(f) ((f).file)
> +#define fd_file(f) ((struct file *)((f).word & ~3))
Minor thing but it would be nice if you'd use the macros for this
instead of the raw number:
#define fd_file(f) ((struct file *)((f).word & ~(FDPUT_FPUT | FDPUT_POS_UNLOCK))
Reviewed-by: Christian Brauner <brauner@kernel.org>
^ permalink raw reply [flat|nested] 134+ messages in thread
* [PATCH 04/39] add struct fd constructors, get rid of __to_fd()
2024-07-30 5:15 ` [PATCH 01/39] memcg_write_event_control(): fix a user-triggerable oops viro
2024-07-30 5:15 ` [PATCH 02/39] introduce fd_file(), convert all accessors to it viro
2024-07-30 5:15 ` [PATCH 03/39] struct fd: representation change viro
@ 2024-07-30 5:15 ` viro
2024-08-07 10:09 ` Christian Brauner
2024-07-30 5:15 ` [PATCH 05/39] regularize emptiness checks in fini_module(2) and vfs_dedupe_file_range() viro
` (35 subsequent siblings)
38 siblings, 1 reply; 134+ messages in thread
From: viro @ 2024-07-30 5:15 UTC (permalink / raw)
To: linux-fsdevel; +Cc: amir73il, bpf, brauner, cgroups, kvm, netdev, torvalds
From: Al Viro <viro@zeniv.linux.org.uk>
Make __fdget() et.al. return struct fd directly.
New helpers: BORROWED_FD(file) and CLONED_FD(file), for
borrowed and cloned file references resp.
NOTE: this might need tuning; in particular, inline on
__fget_light() is there to keep the code generation same as
before - we probably want to keep it inlined in fdget() et.al.
(especially so in fdget_pos()), but that needs profiling.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
fs/file.c | 26 +++++++++++++-------------
include/linux/file.h | 33 +++++++++++----------------------
2 files changed, 24 insertions(+), 35 deletions(-)
diff --git a/fs/file.c b/fs/file.c
index a3b72aa64f11..6f5ed4daafd2 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -1128,7 +1128,7 @@ EXPORT_SYMBOL(task_lookup_next_fdget_rcu);
* The fput_needed flag returned by fget_light should be passed to the
* corresponding fput_light.
*/
-static unsigned long __fget_light(unsigned int fd, fmode_t mask)
+static inline struct fd __fget_light(unsigned int fd, fmode_t mask)
{
struct files_struct *files = current->files;
struct file *file;
@@ -1145,22 +1145,22 @@ static unsigned long __fget_light(unsigned int fd, fmode_t mask)
if (likely(atomic_read_acquire(&files->count) == 1)) {
file = files_lookup_fd_raw(files, fd);
if (!file || unlikely(file->f_mode & mask))
- return 0;
- return (unsigned long)file;
+ return EMPTY_FD;
+ return BORROWED_FD(file);
} else {
file = __fget_files(files, fd, mask);
if (!file)
- return 0;
- return FDPUT_FPUT | (unsigned long)file;
+ return EMPTY_FD;
+ return CLONED_FD(file);
}
}
-unsigned long __fdget(unsigned int fd)
+struct fd fdget(unsigned int fd)
{
return __fget_light(fd, FMODE_PATH);
}
-EXPORT_SYMBOL(__fdget);
+EXPORT_SYMBOL(fdget);
-unsigned long __fdget_raw(unsigned int fd)
+struct fd fdget_raw(unsigned int fd)
{
return __fget_light(fd, 0);
}
@@ -1181,16 +1181,16 @@ static inline bool file_needs_f_pos_lock(struct file *file)
(file_count(file) > 1 || file->f_op->iterate_shared);
}
-unsigned long __fdget_pos(unsigned int fd)
+struct fd fdget_pos(unsigned int fd)
{
- unsigned long v = __fdget(fd);
- struct file *file = (struct file *)(v & ~3);
+ struct fd f = fdget(fd);
+ struct file *file = fd_file(f);
if (file && file_needs_f_pos_lock(file)) {
- v |= FDPUT_POS_UNLOCK;
+ f.word |= FDPUT_POS_UNLOCK;
mutex_lock(&file->f_pos_lock);
}
- return v;
+ return f;
}
void __f_unlock_pos(struct file *f)
diff --git a/include/linux/file.h b/include/linux/file.h
index bdd6e1766839..00a42604d322 100644
--- a/include/linux/file.h
+++ b/include/linux/file.h
@@ -53,6 +53,14 @@ static inline bool fd_empty(struct fd f)
}
#define EMPTY_FD (struct fd){0}
+static inline struct fd BORROWED_FD(struct file *f)
+{
+ return (struct fd){(unsigned long)f};
+}
+static inline struct fd CLONED_FD(struct file *f)
+{
+ return (struct fd){(unsigned long)f | FDPUT_FPUT};
+}
static inline void fdput(struct fd fd)
{
@@ -63,30 +71,11 @@ static inline void fdput(struct fd fd)
extern struct file *fget(unsigned int fd);
extern struct file *fget_raw(unsigned int fd);
extern struct file *fget_task(struct task_struct *task, unsigned int fd);
-extern unsigned long __fdget(unsigned int fd);
-extern unsigned long __fdget_raw(unsigned int fd);
-extern unsigned long __fdget_pos(unsigned int fd);
extern void __f_unlock_pos(struct file *);
-static inline struct fd __to_fd(unsigned long v)
-{
- return (struct fd){v};
-}
-
-static inline struct fd fdget(unsigned int fd)
-{
- return __to_fd(__fdget(fd));
-}
-
-static inline struct fd fdget_raw(unsigned int fd)
-{
- return __to_fd(__fdget_raw(fd));
-}
-
-static inline struct fd fdget_pos(int fd)
-{
- return __to_fd(__fdget_pos(fd));
-}
+struct fd fdget(unsigned int fd);
+struct fd fdget_raw(unsigned int fd);
+struct fd fdget_pos(unsigned int fd);
static inline void fdput_pos(struct fd f)
{
--
2.39.2
^ permalink raw reply related [flat|nested] 134+ messages in thread* Re: [PATCH 04/39] add struct fd constructors, get rid of __to_fd()
2024-07-30 5:15 ` [PATCH 04/39] add struct fd constructors, get rid of __to_fd() viro
@ 2024-08-07 10:09 ` Christian Brauner
0 siblings, 0 replies; 134+ messages in thread
From: Christian Brauner @ 2024-08-07 10:09 UTC (permalink / raw)
To: viro; +Cc: linux-fsdevel, amir73il, bpf, cgroups, kvm, netdev, torvalds
On Tue, Jul 30, 2024 at 01:15:50AM GMT, viro@kernel.org wrote:
> From: Al Viro <viro@zeniv.linux.org.uk>
>
> Make __fdget() et.al. return struct fd directly.
> New helpers: BORROWED_FD(file) and CLONED_FD(file), for
> borrowed and cloned file references resp.
>
> NOTE: this might need tuning; in particular, inline on
> __fget_light() is there to keep the code generation same as
> before - we probably want to keep it inlined in fdget() et.al.
> (especially so in fdget_pos()), but that needs profiling.
>
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
Reviewed-by: Christian Brauner <brauner@kernel.org>
^ permalink raw reply [flat|nested] 134+ messages in thread
* [PATCH 05/39] regularize emptiness checks in fini_module(2) and vfs_dedupe_file_range()
2024-07-30 5:15 ` [PATCH 01/39] memcg_write_event_control(): fix a user-triggerable oops viro
` (2 preceding siblings ...)
2024-07-30 5:15 ` [PATCH 04/39] add struct fd constructors, get rid of __to_fd() viro
@ 2024-07-30 5:15 ` viro
2024-08-07 10:10 ` Christian Brauner
2024-07-30 5:15 ` [PATCH 06/39] net/socket.c: switch to CLASS(fd) viro
` (34 subsequent siblings)
38 siblings, 1 reply; 134+ messages in thread
From: viro @ 2024-07-30 5:15 UTC (permalink / raw)
To: linux-fsdevel; +Cc: amir73il, bpf, brauner, cgroups, kvm, netdev, torvalds
From: Al Viro <viro@zeniv.linux.org.uk>
Currently all emptiness checks are done as fd_file(...) in boolean
context (usually something like if (!fd_file(f))...); those will be
taken care of later.
However, there's a couple of places where we do those checks as
'store fd_file(...) into a variable, then check if this variable is
NULL' and those are harder to spot.
Get rid of those now.
use fd_empty() instead of extracting file and then checking it for NULL.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
fs/remap_range.c | 5 ++---
kernel/module/main.c | 4 +++-
2 files changed, 5 insertions(+), 4 deletions(-)
diff --git a/fs/remap_range.c b/fs/remap_range.c
index 4403d5c68fcb..017d0d1ea6c9 100644
--- a/fs/remap_range.c
+++ b/fs/remap_range.c
@@ -537,9 +537,8 @@ int vfs_dedupe_file_range(struct file *file, struct file_dedupe_range *same)
for (i = 0, info = same->info; i < count; i++, info++) {
struct fd dst_fd = fdget(info->dest_fd);
- struct file *dst_file = fd_file(dst_fd);
- if (!dst_file) {
+ if (fd_empty(dst_fd)) {
info->status = -EBADF;
goto next_loop;
}
@@ -549,7 +548,7 @@ int vfs_dedupe_file_range(struct file *file, struct file_dedupe_range *same)
goto next_fdput;
}
- deduped = vfs_dedupe_file_range_one(file, off, dst_file,
+ deduped = vfs_dedupe_file_range_one(file, off, fd_file(dst_fd),
info->dest_offset, len,
REMAP_FILE_CAN_SHORTEN);
if (deduped == -EBADE)
diff --git a/kernel/module/main.c b/kernel/module/main.c
index 6ed334eecc14..93cc9fea9d9d 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -3180,7 +3180,7 @@ static int idempotent_init_module(struct file *f, const char __user * uargs, int
{
struct idempotent idem;
- if (!f || !(f->f_mode & FMODE_READ))
+ if (!(f->f_mode & FMODE_READ))
return -EBADF;
/* See if somebody else is doing the operation? */
@@ -3211,6 +3211,8 @@ SYSCALL_DEFINE3(finit_module, int, fd, const char __user *, uargs, int, flags)
return -EINVAL;
f = fdget(fd);
+ if (fd_empty(f))
+ return -EBADF;
err = idempotent_init_module(fd_file(f), uargs, flags);
fdput(f);
return err;
--
2.39.2
^ permalink raw reply related [flat|nested] 134+ messages in thread* Re: [PATCH 05/39] regularize emptiness checks in fini_module(2) and vfs_dedupe_file_range()
2024-07-30 5:15 ` [PATCH 05/39] regularize emptiness checks in fini_module(2) and vfs_dedupe_file_range() viro
@ 2024-08-07 10:10 ` Christian Brauner
0 siblings, 0 replies; 134+ messages in thread
From: Christian Brauner @ 2024-08-07 10:10 UTC (permalink / raw)
To: viro; +Cc: linux-fsdevel, amir73il, bpf, cgroups, kvm, netdev, torvalds
On Tue, Jul 30, 2024 at 01:15:51AM GMT, viro@kernel.org wrote:
> From: Al Viro <viro@zeniv.linux.org.uk>
>
> Currently all emptiness checks are done as fd_file(...) in boolean
> context (usually something like if (!fd_file(f))...); those will be
> taken care of later.
>
> However, there's a couple of places where we do those checks as
> 'store fd_file(...) into a variable, then check if this variable is
> NULL' and those are harder to spot.
>
> Get rid of those now.
>
> use fd_empty() instead of extracting file and then checking it for NULL.
>
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
Reviewed-by: Christian Brauner <brauner@kernel.org>
^ permalink raw reply [flat|nested] 134+ messages in thread
* [PATCH 06/39] net/socket.c: switch to CLASS(fd)
2024-07-30 5:15 ` [PATCH 01/39] memcg_write_event_control(): fix a user-triggerable oops viro
` (3 preceding siblings ...)
2024-07-30 5:15 ` [PATCH 05/39] regularize emptiness checks in fini_module(2) and vfs_dedupe_file_range() viro
@ 2024-07-30 5:15 ` viro
2024-08-07 10:13 ` Christian Brauner
2024-07-30 5:15 ` [PATCH 07/39] introduce struct fderr, convert overlayfs uses to that viro
` (33 subsequent siblings)
38 siblings, 1 reply; 134+ messages in thread
From: viro @ 2024-07-30 5:15 UTC (permalink / raw)
To: linux-fsdevel; +Cc: amir73il, bpf, brauner, cgroups, kvm, netdev, torvalds
From: Al Viro <viro@zeniv.linux.org.uk>
I strongly suspect that important part in sockfd_lookup_light()
is avoiding needless file refcount operations, not the marginal reduction
of the register pressure from not keeping a struct file pointer in
the caller.
If that's true, we should get the same benefits from straight
fdget()/fdput(). And AFAICS with sane use of CLASS(fd) we can get a
better code generation...
Would be nice if somebody tested it on networking test suites
(including benchmarks)...
sockfd_lookup_light() does fdget(), uses sock_from_file() to
get the associated socket and returns the struct socket reference to
the caller, along with "do we need to fput()" flag. No matching fdput(),
the caller does its equivalent manually, using the fact that sock->file
points to the struct file the socket has come from.
Get rid of that - have the callers do fdget()/fdput() and
use sock_from_file() directly. That kills sockfd_lookup_light()
and fput_light() (no users left).
What's more, we can get rid of explicit fdget()/fdput() by
switching to CLASS(fd, ...) - code generation does not suffer, since
now fdput() inserted on "descriptor is not opened" failure exit
is recognized to be a no-op by compiler.
We could split that commit in two (getting rid of sockd_lookup_light()
and switch to CLASS(fd, ...)), but AFAICS it ends up being harder to read
that way.
[conflicts in a couple of functions]
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
include/linux/file.h | 6 -
net/socket.c | 303 +++++++++++++++++++------------------------
2 files changed, 137 insertions(+), 172 deletions(-)
diff --git a/include/linux/file.h b/include/linux/file.h
index 00a42604d322..3353d70fd460 100644
--- a/include/linux/file.h
+++ b/include/linux/file.h
@@ -29,12 +29,6 @@ extern struct file *alloc_file_pseudo_noaccount(struct inode *, struct vfsmount
extern struct file *alloc_file_clone(struct file *, int flags,
const struct file_operations *);
-static inline void fput_light(struct file *file, int fput_needed)
-{
- if (fput_needed)
- fput(file);
-}
-
/* either a reference to struct file + flags
* (cloned vs. borrowed, pos locked), with
* flags stored in lower bits of value,
diff --git a/net/socket.c b/net/socket.c
index c0d4f5032374..c09c69eddafa 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -510,7 +510,7 @@ static int sock_map_fd(struct socket *sock, int flags)
struct socket *sock_from_file(struct file *file)
{
- if (file->f_op == &socket_file_ops)
+ if (likely(file->f_op == &socket_file_ops))
return file->private_data; /* set in sock_alloc_file */
return NULL;
@@ -550,24 +550,6 @@ struct socket *sockfd_lookup(int fd, int *err)
}
EXPORT_SYMBOL(sockfd_lookup);
-static struct socket *sockfd_lookup_light(int fd, int *err, int *fput_needed)
-{
- struct fd f = fdget(fd);
- struct socket *sock;
-
- *err = -EBADF;
- if (fd_file(f)) {
- sock = sock_from_file(fd_file(f));
- if (likely(sock)) {
- *fput_needed = f.word & FDPUT_FPUT;
- return sock;
- }
- *err = -ENOTSOCK;
- fdput(f);
- }
- return NULL;
-}
-
static ssize_t sockfs_listxattr(struct dentry *dentry, char *buffer,
size_t size)
{
@@ -1848,16 +1830,20 @@ int __sys_bind(int fd, struct sockaddr __user *umyaddr, int addrlen)
{
struct socket *sock;
struct sockaddr_storage address;
- int err, fput_needed;
-
- sock = sockfd_lookup_light(fd, &err, &fput_needed);
- if (sock) {
- err = move_addr_to_kernel(umyaddr, addrlen, &address);
- if (!err)
- err = __sys_bind_socket(sock, &address, addrlen);
- fput_light(sock->file, fput_needed);
- }
- return err;
+ CLASS(fd, f)(fd);
+ int err;
+
+ if (fd_empty(f))
+ return -EBADF;
+ sock = sock_from_file(fd_file(f));
+ if (unlikely(!sock))
+ return -ENOTSOCK;
+
+ err = move_addr_to_kernel(umyaddr, addrlen, &address);
+ if (unlikely(err))
+ return err;
+
+ return __sys_bind_socket(sock, &address, addrlen);
}
SYSCALL_DEFINE3(bind, int, fd, struct sockaddr __user *, umyaddr, int, addrlen)
@@ -1886,15 +1872,16 @@ int __sys_listen_socket(struct socket *sock, int backlog)
int __sys_listen(int fd, int backlog)
{
+ CLASS(fd, f)(fd);
struct socket *sock;
- int err, fput_needed;
- sock = sockfd_lookup_light(fd, &err, &fput_needed);
- if (sock) {
- err = __sys_listen_socket(sock, backlog);
- fput_light(sock->file, fput_needed);
- }
- return err;
+ if (fd_empty(f))
+ return -EBADF;
+ sock = sock_from_file(fd_file(f));
+ if (unlikely(!sock))
+ return -ENOTSOCK;
+
+ return __sys_listen_socket(sock, backlog);
}
SYSCALL_DEFINE2(listen, int, fd, int, backlog)
@@ -2004,17 +1991,12 @@ static int __sys_accept4_file(struct file *file, struct sockaddr __user *upeer_s
int __sys_accept4(int fd, struct sockaddr __user *upeer_sockaddr,
int __user *upeer_addrlen, int flags)
{
- int ret = -EBADF;
- struct fd f;
+ CLASS(fd, f)(fd);
- f = fdget(fd);
- if (fd_file(f)) {
- ret = __sys_accept4_file(fd_file(f), upeer_sockaddr,
+ if (fd_empty(f))
+ return -EBADF;
+ return __sys_accept4_file(fd_file(f), upeer_sockaddr,
upeer_addrlen, flags);
- fdput(f);
- }
-
- return ret;
}
SYSCALL_DEFINE4(accept4, int, fd, struct sockaddr __user *, upeer_sockaddr,
@@ -2066,20 +2048,18 @@ int __sys_connect_file(struct file *file, struct sockaddr_storage *address,
int __sys_connect(int fd, struct sockaddr __user *uservaddr, int addrlen)
{
- int ret = -EBADF;
- struct fd f;
+ struct sockaddr_storage address;
+ CLASS(fd, f)(fd);
+ int ret;
- f = fdget(fd);
- if (fd_file(f)) {
- struct sockaddr_storage address;
+ if (fd_empty(f))
+ return -EBADF;
- ret = move_addr_to_kernel(uservaddr, addrlen, &address);
- if (!ret)
- ret = __sys_connect_file(fd_file(f), &address, addrlen, 0);
- fdput(f);
- }
+ ret = move_addr_to_kernel(uservaddr, addrlen, &address);
+ if (ret)
+ return ret;
- return ret;
+ return __sys_connect_file(fd_file(f), &address, addrlen, 0);
}
SYSCALL_DEFINE3(connect, int, fd, struct sockaddr __user *, uservaddr,
@@ -2098,26 +2078,25 @@ int __sys_getsockname(int fd, struct sockaddr __user *usockaddr,
{
struct socket *sock;
struct sockaddr_storage address;
- int err, fput_needed;
+ CLASS(fd, f)(fd);
+ int err;
- sock = sockfd_lookup_light(fd, &err, &fput_needed);
- if (!sock)
- goto out;
+ if (fd_empty(f))
+ return -EBADF;
+ sock = sock_from_file(fd_file(f));
+ if (unlikely(!sock))
+ return -ENOTSOCK;
err = security_socket_getsockname(sock);
if (err)
- goto out_put;
+ return err;
err = READ_ONCE(sock->ops)->getname(sock, (struct sockaddr *)&address, 0);
if (err < 0)
- goto out_put;
- /* "err" is actually length in this case */
- err = move_addr_to_user(&address, err, usockaddr, usockaddr_len);
+ return err;
-out_put:
- fput_light(sock->file, fput_needed);
-out:
- return err;
+ /* "err" is actually length in this case */
+ return move_addr_to_user(&address, err, usockaddr, usockaddr_len);
}
SYSCALL_DEFINE3(getsockname, int, fd, struct sockaddr __user *, usockaddr,
@@ -2136,26 +2115,25 @@ int __sys_getpeername(int fd, struct sockaddr __user *usockaddr,
{
struct socket *sock;
struct sockaddr_storage address;
- int err, fput_needed;
+ CLASS(fd, f)(fd);
+ int err;
- sock = sockfd_lookup_light(fd, &err, &fput_needed);
- if (sock != NULL) {
- const struct proto_ops *ops = READ_ONCE(sock->ops);
+ if (fd_empty(f))
+ return -EBADF;
+ sock = sock_from_file(fd_file(f));
+ if (unlikely(!sock))
+ return -ENOTSOCK;
- err = security_socket_getpeername(sock);
- if (err) {
- fput_light(sock->file, fput_needed);
- return err;
- }
+ err = security_socket_getpeername(sock);
+ if (err)
+ return err;
- err = ops->getname(sock, (struct sockaddr *)&address, 1);
- if (err >= 0)
- /* "err" is actually length in this case */
- err = move_addr_to_user(&address, err, usockaddr,
- usockaddr_len);
- fput_light(sock->file, fput_needed);
- }
- return err;
+ err = READ_ONCE(sock->ops)->getname(sock, (struct sockaddr *)&address, 1);
+ if (err < 0)
+ return err;
+
+ /* "err" is actually length in this case */
+ return move_addr_to_user(&address, err, usockaddr, usockaddr_len);
}
SYSCALL_DEFINE3(getpeername, int, fd, struct sockaddr __user *, usockaddr,
@@ -2176,14 +2154,17 @@ int __sys_sendto(int fd, void __user *buff, size_t len, unsigned int flags,
struct sockaddr_storage address;
int err;
struct msghdr msg;
- int fput_needed;
err = import_ubuf(ITER_SOURCE, buff, len, &msg.msg_iter);
if (unlikely(err))
return err;
- sock = sockfd_lookup_light(fd, &err, &fput_needed);
- if (!sock)
- goto out;
+
+ CLASS(fd, f)(fd);
+ if (fd_empty(f))
+ return -EBADF;
+ sock = sock_from_file(fd_file(f));
+ if (unlikely(!sock))
+ return -ENOTSOCK;
msg.msg_name = NULL;
msg.msg_control = NULL;
@@ -2193,7 +2174,7 @@ int __sys_sendto(int fd, void __user *buff, size_t len, unsigned int flags,
if (addr) {
err = move_addr_to_kernel(addr, addr_len, &address);
if (err < 0)
- goto out_put;
+ return err;
msg.msg_name = (struct sockaddr *)&address;
msg.msg_namelen = addr_len;
}
@@ -2201,12 +2182,7 @@ int __sys_sendto(int fd, void __user *buff, size_t len, unsigned int flags,
if (sock->file->f_flags & O_NONBLOCK)
flags |= MSG_DONTWAIT;
msg.msg_flags = flags;
- err = __sock_sendmsg(sock, &msg);
-
-out_put:
- fput_light(sock->file, fput_needed);
-out:
- return err;
+ return __sock_sendmsg(sock, &msg);
}
SYSCALL_DEFINE6(sendto, int, fd, void __user *, buff, size_t, len,
@@ -2241,14 +2217,18 @@ int __sys_recvfrom(int fd, void __user *ubuf, size_t size, unsigned int flags,
};
struct socket *sock;
int err, err2;
- int fput_needed;
err = import_ubuf(ITER_DEST, ubuf, size, &msg.msg_iter);
if (unlikely(err))
return err;
- sock = sockfd_lookup_light(fd, &err, &fput_needed);
- if (!sock)
- goto out;
+
+ CLASS(fd, f)(fd);
+
+ if (fd_empty(f))
+ return -EBADF;
+ sock = sock_from_file(fd_file(f));
+ if (unlikely(!sock))
+ return -ENOTSOCK;
if (sock->file->f_flags & O_NONBLOCK)
flags |= MSG_DONTWAIT;
@@ -2260,9 +2240,6 @@ int __sys_recvfrom(int fd, void __user *ubuf, size_t size, unsigned int flags,
if (err2 < 0)
err = err2;
}
-
- fput_light(sock->file, fput_needed);
-out:
return err;
}
@@ -2337,17 +2314,16 @@ int __sys_setsockopt(int fd, int level, int optname, char __user *user_optval,
{
sockptr_t optval = USER_SOCKPTR(user_optval);
bool compat = in_compat_syscall();
- int err, fput_needed;
struct socket *sock;
+ CLASS(fd, f)(fd);
- sock = sockfd_lookup_light(fd, &err, &fput_needed);
- if (!sock)
- return err;
-
- err = do_sock_setsockopt(sock, compat, level, optname, optval, optlen);
+ if (fd_empty(f))
+ return -EBADF;
+ sock = sock_from_file(fd_file(f));
+ if (unlikely(!sock))
+ return -ENOTSOCK;
- fput_light(sock->file, fput_needed);
- return err;
+ return do_sock_setsockopt(sock, compat, level, optname, optval, optlen);
}
SYSCALL_DEFINE5(setsockopt, int, fd, int, level, int, optname,
@@ -2403,20 +2379,17 @@ EXPORT_SYMBOL(do_sock_getsockopt);
int __sys_getsockopt(int fd, int level, int optname, char __user *optval,
int __user *optlen)
{
- int err, fput_needed;
struct socket *sock;
- bool compat;
+ CLASS(fd, f)(fd);
- sock = sockfd_lookup_light(fd, &err, &fput_needed);
- if (!sock)
- return err;
+ if (fd_empty(f))
+ return -EBADF;
+ sock = sock_from_file(fd_file(f));
+ if (unlikely(!sock))
+ return -ENOTSOCK;
- compat = in_compat_syscall();
- err = do_sock_getsockopt(sock, compat, level, optname,
+ return do_sock_getsockopt(sock, in_compat_syscall(), level, optname,
USER_SOCKPTR(optval), USER_SOCKPTR(optlen));
-
- fput_light(sock->file, fput_needed);
- return err;
}
SYSCALL_DEFINE5(getsockopt, int, fd, int, level, int, optname,
@@ -2442,15 +2415,16 @@ int __sys_shutdown_sock(struct socket *sock, int how)
int __sys_shutdown(int fd, int how)
{
- int err, fput_needed;
struct socket *sock;
+ CLASS(fd, f)(fd);
- sock = sockfd_lookup_light(fd, &err, &fput_needed);
- if (sock != NULL) {
- err = __sys_shutdown_sock(sock, how);
- fput_light(sock->file, fput_needed);
- }
- return err;
+ if (fd_empty(f))
+ return -EBADF;
+ sock = sock_from_file(fd_file(f));
+ if (unlikely(!sock))
+ return -ENOTSOCK;
+
+ return __sys_shutdown_sock(sock, how);
}
SYSCALL_DEFINE2(shutdown, int, fd, int, how)
@@ -2666,22 +2640,21 @@ long __sys_sendmsg_sock(struct socket *sock, struct msghdr *msg,
long __sys_sendmsg(int fd, struct user_msghdr __user *msg, unsigned int flags,
bool forbid_cmsg_compat)
{
- int fput_needed, err;
struct msghdr msg_sys;
struct socket *sock;
if (forbid_cmsg_compat && (flags & MSG_CMSG_COMPAT))
return -EINVAL;
- sock = sockfd_lookup_light(fd, &err, &fput_needed);
- if (!sock)
- goto out;
+ CLASS(fd, f)(fd);
- err = ___sys_sendmsg(sock, msg, &msg_sys, flags, NULL, 0);
+ if (fd_empty(f))
+ return -EBADF;
+ sock = sock_from_file(fd_file(f));
+ if (unlikely(!sock))
+ return -ENOTSOCK;
- fput_light(sock->file, fput_needed);
-out:
- return err;
+ return ___sys_sendmsg(sock, msg, &msg_sys, flags, NULL, 0);
}
SYSCALL_DEFINE3(sendmsg, int, fd, struct user_msghdr __user *, msg, unsigned int, flags)
@@ -2696,7 +2669,7 @@ SYSCALL_DEFINE3(sendmsg, int, fd, struct user_msghdr __user *, msg, unsigned int
int __sys_sendmmsg(int fd, struct mmsghdr __user *mmsg, unsigned int vlen,
unsigned int flags, bool forbid_cmsg_compat)
{
- int fput_needed, err, datagrams;
+ int err, datagrams;
struct socket *sock;
struct mmsghdr __user *entry;
struct compat_mmsghdr __user *compat_entry;
@@ -2712,9 +2685,13 @@ int __sys_sendmmsg(int fd, struct mmsghdr __user *mmsg, unsigned int vlen,
datagrams = 0;
- sock = sockfd_lookup_light(fd, &err, &fput_needed);
- if (!sock)
- return err;
+ CLASS(fd, f)(fd);
+
+ if (fd_empty(f))
+ return -EBADF;
+ sock = sock_from_file(fd_file(f));
+ if (unlikely(!sock))
+ return -ENOTSOCK;
used_address.name_len = UINT_MAX;
entry = mmsg;
@@ -2751,8 +2728,6 @@ int __sys_sendmmsg(int fd, struct mmsghdr __user *mmsg, unsigned int vlen,
cond_resched();
}
- fput_light(sock->file, fput_needed);
-
/* We only return an error if no datagrams were able to be sent */
if (datagrams != 0)
return datagrams;
@@ -2874,22 +2849,21 @@ long __sys_recvmsg_sock(struct socket *sock, struct msghdr *msg,
long __sys_recvmsg(int fd, struct user_msghdr __user *msg, unsigned int flags,
bool forbid_cmsg_compat)
{
- int fput_needed, err;
struct msghdr msg_sys;
struct socket *sock;
if (forbid_cmsg_compat && (flags & MSG_CMSG_COMPAT))
return -EINVAL;
- sock = sockfd_lookup_light(fd, &err, &fput_needed);
- if (!sock)
- goto out;
+ CLASS(fd, f)(fd);
- err = ___sys_recvmsg(sock, msg, &msg_sys, flags, 0);
+ if (fd_empty(f))
+ return -EBADF;
+ sock = sock_from_file(fd_file(f));
+ if (unlikely(!sock))
+ return -ENOTSOCK;
- fput_light(sock->file, fput_needed);
-out:
- return err;
+ return ___sys_recvmsg(sock, msg, &msg_sys, flags, 0);
}
SYSCALL_DEFINE3(recvmsg, int, fd, struct user_msghdr __user *, msg,
@@ -2906,7 +2880,7 @@ static int do_recvmmsg(int fd, struct mmsghdr __user *mmsg,
unsigned int vlen, unsigned int flags,
struct timespec64 *timeout)
{
- int fput_needed, err, datagrams;
+ int err, datagrams;
struct socket *sock;
struct mmsghdr __user *entry;
struct compat_mmsghdr __user *compat_entry;
@@ -2921,16 +2895,18 @@ static int do_recvmmsg(int fd, struct mmsghdr __user *mmsg,
datagrams = 0;
- sock = sockfd_lookup_light(fd, &err, &fput_needed);
- if (!sock)
- return err;
+ CLASS(fd, f)(fd);
+
+ if (fd_empty(f))
+ return -EBADF;
+ sock = sock_from_file(fd_file(f));
+ if (unlikely(!sock))
+ return -ENOTSOCK;
if (likely(!(flags & MSG_ERRQUEUE))) {
err = sock_error(sock->sk);
- if (err) {
- datagrams = err;
- goto out_put;
- }
+ if (err)
+ return err;
}
entry = mmsg;
@@ -2987,12 +2963,10 @@ static int do_recvmmsg(int fd, struct mmsghdr __user *mmsg,
}
if (err == 0)
- goto out_put;
+ return datagrams;
- if (datagrams == 0) {
- datagrams = err;
- goto out_put;
- }
+ if (datagrams == 0)
+ return err;
/*
* We may return less entries than requested (vlen) if the
@@ -3007,9 +2981,6 @@ static int do_recvmmsg(int fd, struct mmsghdr __user *mmsg,
*/
WRITE_ONCE(sock->sk->sk_err, -err);
}
-out_put:
- fput_light(sock->file, fput_needed);
-
return datagrams;
}
--
2.39.2
^ permalink raw reply related [flat|nested] 134+ messages in thread* Re: [PATCH 06/39] net/socket.c: switch to CLASS(fd)
2024-07-30 5:15 ` [PATCH 06/39] net/socket.c: switch to CLASS(fd) viro
@ 2024-08-07 10:13 ` Christian Brauner
0 siblings, 0 replies; 134+ messages in thread
From: Christian Brauner @ 2024-08-07 10:13 UTC (permalink / raw)
To: viro; +Cc: linux-fsdevel, amir73il, bpf, cgroups, kvm, netdev, torvalds
On Tue, Jul 30, 2024 at 01:15:52AM GMT, viro@kernel.org wrote:
> From: Al Viro <viro@zeniv.linux.org.uk>
>
> I strongly suspect that important part in sockfd_lookup_light()
> is avoiding needless file refcount operations, not the marginal reduction
> of the register pressure from not keeping a struct file pointer in
> the caller.
>
> If that's true, we should get the same benefits from straight
> fdget()/fdput(). And AFAICS with sane use of CLASS(fd) we can get a
> better code generation...
>
> Would be nice if somebody tested it on networking test suites
> (including benchmarks)...
>
> sockfd_lookup_light() does fdget(), uses sock_from_file() to
> get the associated socket and returns the struct socket reference to
> the caller, along with "do we need to fput()" flag. No matching fdput(),
> the caller does its equivalent manually, using the fact that sock->file
> points to the struct file the socket has come from.
>
> Get rid of that - have the callers do fdget()/fdput() and
> use sock_from_file() directly. That kills sockfd_lookup_light()
> and fput_light() (no users left).
>
> What's more, we can get rid of explicit fdget()/fdput() by
> switching to CLASS(fd, ...) - code generation does not suffer, since
> now fdput() inserted on "descriptor is not opened" failure exit
> is recognized to be a no-op by compiler.
>
> We could split that commit in two (getting rid of sockd_lookup_light()
> and switch to CLASS(fd, ...)), but AFAICS it ends up being harder to read
> that way.
>
> [conflicts in a couple of functions]
>
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
Reviewed-by: Christian Brauner <brauner@kernel.org>
^ permalink raw reply [flat|nested] 134+ messages in thread
* [PATCH 07/39] introduce struct fderr, convert overlayfs uses to that
2024-07-30 5:15 ` [PATCH 01/39] memcg_write_event_control(): fix a user-triggerable oops viro
` (4 preceding siblings ...)
2024-07-30 5:15 ` [PATCH 06/39] net/socket.c: switch to CLASS(fd) viro
@ 2024-07-30 5:15 ` viro
2024-07-30 5:15 ` [PATCH 08/39] experimental: convert fs/overlayfs/file.c to CLASS(...) viro
` (32 subsequent siblings)
38 siblings, 0 replies; 134+ messages in thread
From: viro @ 2024-07-30 5:15 UTC (permalink / raw)
To: linux-fsdevel; +Cc: amir73il, bpf, brauner, cgroups, kvm, netdev, torvalds
From: Al Viro <viro@zeniv.linux.org.uk>
Similar to struct fd; unlike struct fd, it can represent
error values.
Accessors:
* fd_empty(f): true if f represents an error
* fd_file(f): just as for struct fd it yields a pointer to
struct file if fd_empty(f) is false. If
fd_empty(f) is true, fd_file(f) is guaranteed
_not_ to be an address of any object (IS_ERR()
will be true in that case)
* fd_error(f): if f represents an error, returns that error,
otherwise the return value is junk.
Constructors:
* ERR_FD(-E...): an instance encoding given error [ERR_FDERR, perhaps?]
* BORROWED_FDERR(file): if file points to a struct file instance,
return a struct fderr representing that file
reference with no flags set.
if file is an ERR_PTR(-E...), return a struct
fderr representing that error.
file MUST NOT be NULL.
* CLONED_FDERR(file): similar, but in case when file points to
a struct file instance, set FDPUT_FPUT in flags.
Same destructor as for struct fd; I'm not entirely convinced that
playing with _Generic is a good idea here, but for now let's go
that way...
See fs/overlayfs/file.c for example of use.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
fs/overlayfs/file.c | 125 +++++++++++++++++++++----------------------
include/linux/file.h | 38 +++++++++++--
2 files changed, 95 insertions(+), 68 deletions(-)
diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
index 2b7a5a3a7a2f..4b9e145bc7b8 100644
--- a/fs/overlayfs/file.c
+++ b/fs/overlayfs/file.c
@@ -89,56 +89,47 @@ static int ovl_change_flags(struct file *file, unsigned int flags)
return 0;
}
-static int ovl_real_fdget_meta(const struct file *file, struct fd *real,
- bool allow_meta)
+static struct fderr ovl_real_fdget_meta(const struct file *file, bool allow_meta)
{
struct dentry *dentry = file_dentry(file);
struct file *realfile = file->private_data;
struct path realpath;
int err;
- real->word = (unsigned long)realfile;
-
if (allow_meta) {
ovl_path_real(dentry, &realpath);
} else {
/* lazy lookup and verify of lowerdata */
err = ovl_verify_lowerdata(dentry);
if (err)
- return err;
+ return ERR_FD(err);
ovl_path_realdata(dentry, &realpath);
}
if (!realpath.dentry)
- return -EIO;
+ return ERR_FD(-EIO);
/* Has it been copied up since we'd opened it? */
if (unlikely(file_inode(realfile) != d_inode(realpath.dentry))) {
- struct file *f = ovl_open_realfile(file, &realpath);
- if (IS_ERR(f))
- return PTR_ERR(f);
- real->word = (unsigned long)ovl_open_realfile(file, &realpath) | FDPUT_FPUT;
- return 0;
+ return CLONED_FDERR(ovl_open_realfile(file, &realpath));
}
/* Did the flags change since open? */
- if (unlikely((file->f_flags ^ realfile->f_flags) & ~OVL_OPEN_FLAGS))
- return ovl_change_flags(realfile, file->f_flags);
+ if (unlikely((file->f_flags ^ realfile->f_flags) & ~OVL_OPEN_FLAGS)) {
+ err = ovl_change_flags(realfile, file->f_flags);
+ if (err)
+ return ERR_FD(err);
+ }
- return 0;
+ return BORROWED_FDERR(realfile);
}
-static int ovl_real_fdget(const struct file *file, struct fd *real)
+static struct fderr ovl_real_fdget(const struct file *file)
{
- if (d_is_dir(file_dentry(file))) {
- struct file *f = ovl_dir_real_file(file, false);
- if (IS_ERR(f))
- return PTR_ERR(f);
- real->word = (unsigned long)f;
- return 0;
- }
+ if (d_is_dir(file_dentry(file)))
+ return BORROWED_FDERR(ovl_dir_real_file(file, false));
- return ovl_real_fdget_meta(file, real, false);
+ return ovl_real_fdget_meta(file, false);
}
static int ovl_open(struct inode *inode, struct file *file)
@@ -183,7 +174,7 @@ static int ovl_release(struct inode *inode, struct file *file)
static loff_t ovl_llseek(struct file *file, loff_t offset, int whence)
{
struct inode *inode = file_inode(file);
- struct fd real;
+ struct fderr real;
const struct cred *old_cred;
loff_t ret;
@@ -199,9 +190,9 @@ static loff_t ovl_llseek(struct file *file, loff_t offset, int whence)
return vfs_setpos(file, 0, 0);
}
- ret = ovl_real_fdget(file, &real);
- if (ret)
- return ret;
+ real = ovl_real_fdget(file);
+ if (fd_empty(real))
+ return fd_error(real);
/*
* Overlay file f_pos is the master copy that is preserved
@@ -262,7 +253,7 @@ static void ovl_file_accessed(struct file *file)
static ssize_t ovl_read_iter(struct kiocb *iocb, struct iov_iter *iter)
{
struct file *file = iocb->ki_filp;
- struct fd real;
+ struct fderr real;
ssize_t ret;
struct backing_file_ctx ctx = {
.cred = ovl_creds(file_inode(file)->i_sb),
@@ -273,9 +264,9 @@ static ssize_t ovl_read_iter(struct kiocb *iocb, struct iov_iter *iter)
if (!iov_iter_count(iter))
return 0;
- ret = ovl_real_fdget(file, &real);
- if (ret)
- return ret;
+ real = ovl_real_fdget(file);
+ if (fd_empty(real))
+ return fd_error(real);
ret = backing_file_read_iter(fd_file(real), iter, iocb, iocb->ki_flags,
&ctx);
@@ -288,7 +279,7 @@ static ssize_t ovl_write_iter(struct kiocb *iocb, struct iov_iter *iter)
{
struct file *file = iocb->ki_filp;
struct inode *inode = file_inode(file);
- struct fd real;
+ struct fderr real;
ssize_t ret;
int ifl = iocb->ki_flags;
struct backing_file_ctx ctx = {
@@ -304,9 +295,11 @@ static ssize_t ovl_write_iter(struct kiocb *iocb, struct iov_iter *iter)
/* Update mode */
ovl_copyattr(inode);
- ret = ovl_real_fdget(file, &real);
- if (ret)
+ real = ovl_real_fdget(file);
+ if (fd_empty(real)) {
+ ret = fd_error(real);
goto out_unlock;
+ }
if (!ovl_should_sync(OVL_FS(inode->i_sb)))
ifl &= ~(IOCB_DSYNC | IOCB_SYNC);
@@ -329,7 +322,7 @@ static ssize_t ovl_splice_read(struct file *in, loff_t *ppos,
struct pipe_inode_info *pipe, size_t len,
unsigned int flags)
{
- struct fd real;
+ struct fderr real;
ssize_t ret;
struct backing_file_ctx ctx = {
.cred = ovl_creds(file_inode(in)->i_sb),
@@ -337,9 +330,9 @@ static ssize_t ovl_splice_read(struct file *in, loff_t *ppos,
.accessed = ovl_file_accessed,
};
- ret = ovl_real_fdget(in, &real);
- if (ret)
- return ret;
+ real = ovl_real_fdget(in);
+ if (fd_empty(real))
+ return fd_error(real);
ret = backing_file_splice_read(fd_file(real), ppos, pipe, len, flags, &ctx);
fdput(real);
@@ -358,7 +351,7 @@ static ssize_t ovl_splice_read(struct file *in, loff_t *ppos,
static ssize_t ovl_splice_write(struct pipe_inode_info *pipe, struct file *out,
loff_t *ppos, size_t len, unsigned int flags)
{
- struct fd real;
+ struct fderr real;
struct inode *inode = file_inode(out);
ssize_t ret;
struct backing_file_ctx ctx = {
@@ -371,9 +364,11 @@ static ssize_t ovl_splice_write(struct pipe_inode_info *pipe, struct file *out,
/* Update mode */
ovl_copyattr(inode);
- ret = ovl_real_fdget(out, &real);
- if (ret)
+ real = ovl_real_fdget(out);
+ if (fd_empty(real)) {
+ ret = fd_error(real);
goto out_unlock;
+ }
ret = backing_file_splice_write(pipe, fd_file(real), ppos, len, flags, &ctx);
fdput(real);
@@ -386,7 +381,7 @@ static ssize_t ovl_splice_write(struct pipe_inode_info *pipe, struct file *out,
static int ovl_fsync(struct file *file, loff_t start, loff_t end, int datasync)
{
- struct fd real;
+ struct fderr real;
const struct cred *old_cred;
int ret;
@@ -394,9 +389,9 @@ static int ovl_fsync(struct file *file, loff_t start, loff_t end, int datasync)
if (ret <= 0)
return ret;
- ret = ovl_real_fdget_meta(file, &real, !datasync);
- if (ret)
- return ret;
+ real = ovl_real_fdget_meta(file, !datasync);
+ if (fd_empty(real))
+ return fd_error(real);
/* Don't sync lower file for fear of receiving EROFS error */
if (file_inode(fd_file(real)) == ovl_inode_upper(file_inode(file))) {
@@ -425,7 +420,7 @@ static int ovl_mmap(struct file *file, struct vm_area_struct *vma)
static long ovl_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
{
struct inode *inode = file_inode(file);
- struct fd real;
+ struct fderr real;
const struct cred *old_cred;
int ret;
@@ -435,10 +430,11 @@ static long ovl_fallocate(struct file *file, int mode, loff_t offset, loff_t len
ret = file_remove_privs(file);
if (ret)
goto out_unlock;
-
- ret = ovl_real_fdget(file, &real);
- if (ret)
+ real = ovl_real_fdget(file);
+ if (fd_empty(real)) {
+ ret = fd_error(real);
goto out_unlock;
+ }
old_cred = ovl_override_creds(file_inode(file)->i_sb);
ret = vfs_fallocate(fd_file(real), mode, offset, len);
@@ -457,13 +453,13 @@ static long ovl_fallocate(struct file *file, int mode, loff_t offset, loff_t len
static int ovl_fadvise(struct file *file, loff_t offset, loff_t len, int advice)
{
- struct fd real;
+ struct fderr real;
const struct cred *old_cred;
int ret;
- ret = ovl_real_fdget(file, &real);
- if (ret)
- return ret;
+ real = ovl_real_fdget(file);
+ if (fd_empty(real))
+ return fd_error(real);
old_cred = ovl_override_creds(file_inode(file)->i_sb);
ret = vfs_fadvise(fd_file(real), offset, len, advice);
@@ -485,7 +481,7 @@ static loff_t ovl_copyfile(struct file *file_in, loff_t pos_in,
loff_t len, unsigned int flags, enum ovl_copyop op)
{
struct inode *inode_out = file_inode(file_out);
- struct fd real_in, real_out;
+ struct fderr real_in, real_out;
const struct cred *old_cred;
loff_t ret;
@@ -498,13 +494,16 @@ static loff_t ovl_copyfile(struct file *file_in, loff_t pos_in,
goto out_unlock;
}
- ret = ovl_real_fdget(file_out, &real_out);
- if (ret)
+ real_out = ovl_real_fdget(file_out);
+ if (fd_empty(real_out)) {
+ ret = fd_error(real_out);
goto out_unlock;
+ }
- ret = ovl_real_fdget(file_in, &real_in);
- if (ret) {
+ real_in = ovl_real_fdget(file_in);
+ if (fd_empty(real_in)) {
fdput(real_out);
+ ret = fd_error(real_in);
goto out_unlock;
}
@@ -577,13 +576,13 @@ static loff_t ovl_remap_file_range(struct file *file_in, loff_t pos_in,
static int ovl_flush(struct file *file, fl_owner_t id)
{
- struct fd real;
+ struct fderr real;
const struct cred *old_cred;
- int err;
+ int err = 0;
- err = ovl_real_fdget(file, &real);
- if (err)
- return err;
+ real = ovl_real_fdget(file);
+ if (fd_empty(real))
+ return fd_error(real);
if (fd_file(real)->f_op->flush) {
old_cred = ovl_override_creds(file_inode(file)->i_sb);
diff --git a/include/linux/file.h b/include/linux/file.h
index 3353d70fd460..d3165d7a8112 100644
--- a/include/linux/file.h
+++ b/include/linux/file.h
@@ -10,6 +10,7 @@
#include <linux/types.h>
#include <linux/posix_types.h>
#include <linux/errno.h>
+#include <linux/err.h>
#include <linux/cleanup.h>
struct file;
@@ -37,13 +38,26 @@ extern struct file *alloc_file_clone(struct file *, int flags,
struct fd {
unsigned long word;
};
+
+/* either a reference to struct file + flags
+ * (cloned vs. borrowed, pos locked), with
+ * flags stored in lower bits of value,
+ * or an error (represented by small negative value).
+ */
+struct fderr {
+ unsigned long word;
+};
+
#define FDPUT_FPUT 1
#define FDPUT_POS_UNLOCK 2
+#define fd_empty(f) _Generic((f), \
+ struct fd: unlikely(!(f).word), \
+ struct fderr: IS_ERR_VALUE((f).word))
#define fd_file(f) ((struct file *)((f).word & ~3))
-static inline bool fd_empty(struct fd f)
+static inline long fd_error(struct fderr f)
{
- return unlikely(!f.word);
+ return (long)f.word;
}
#define EMPTY_FD (struct fd){0}
@@ -56,11 +70,25 @@ static inline struct fd CLONED_FD(struct file *f)
return (struct fd){(unsigned long)f | FDPUT_FPUT};
}
-static inline void fdput(struct fd fd)
+static inline struct fderr ERR_FD(long n)
+{
+ return (struct fderr){(unsigned long)n};
+}
+static inline struct fderr BORROWED_FDERR(struct file *f)
{
- if (fd.word & FDPUT_FPUT)
- fput(fd_file(fd));
+ return (struct fderr){(unsigned long)f};
}
+static inline struct fderr CLONED_FDERR(struct file *f)
+{
+ if (IS_ERR(f))
+ return BORROWED_FDERR(f);
+ return (struct fderr){(unsigned long)f | FDPUT_FPUT};
+}
+
+#define fdput(f) (void) (_Generic((f), \
+ struct fderr: IS_ERR_VALUE((f).word), \
+ struct fd: true) && \
+ ((f).word & FDPUT_FPUT) && (fput(fd_file(f)),0))
extern struct file *fget(unsigned int fd);
extern struct file *fget_raw(unsigned int fd);
--
2.39.2
^ permalink raw reply related [flat|nested] 134+ messages in thread* [PATCH 08/39] experimental: convert fs/overlayfs/file.c to CLASS(...)
2024-07-30 5:15 ` [PATCH 01/39] memcg_write_event_control(): fix a user-triggerable oops viro
` (5 preceding siblings ...)
2024-07-30 5:15 ` [PATCH 07/39] introduce struct fderr, convert overlayfs uses to that viro
@ 2024-07-30 5:15 ` viro
2024-07-30 19:10 ` Josef Bacik
2024-08-07 10:23 ` Christian Brauner
2024-07-30 5:15 ` [PATCH 09/39] timerfd: switch to CLASS(fd, ...) viro
` (31 subsequent siblings)
38 siblings, 2 replies; 134+ messages in thread
From: viro @ 2024-07-30 5:15 UTC (permalink / raw)
To: linux-fsdevel; +Cc: amir73il, bpf, brauner, cgroups, kvm, netdev, torvalds
From: Al Viro <viro@zeniv.linux.org.uk>
There are four places where we end up adding an extra scope
covering just the range from constructor to destructor;
not sure if that's the best way to handle that.
The functions in question are ovl_write_iter(), ovl_splice_write(),
ovl_fadvise() and ovl_copyfile().
This is very likely *NOT* the final form of that thing - it
needs to be discussed.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
fs/overlayfs/file.c | 72 ++++++++++++++++++---------------------------
1 file changed, 29 insertions(+), 43 deletions(-)
diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
index 4b9e145bc7b8..a2911c632137 100644
--- a/fs/overlayfs/file.c
+++ b/fs/overlayfs/file.c
@@ -132,6 +132,8 @@ static struct fderr ovl_real_fdget(const struct file *file)
return ovl_real_fdget_meta(file, false);
}
+DEFINE_CLASS(fd_real, struct fderr, fdput(_T), ovl_real_fdget(file), struct file *file)
+
static int ovl_open(struct inode *inode, struct file *file)
{
struct dentry *dentry = file_dentry(file);
@@ -174,7 +176,6 @@ static int ovl_release(struct inode *inode, struct file *file)
static loff_t ovl_llseek(struct file *file, loff_t offset, int whence)
{
struct inode *inode = file_inode(file);
- struct fderr real;
const struct cred *old_cred;
loff_t ret;
@@ -190,7 +191,7 @@ static loff_t ovl_llseek(struct file *file, loff_t offset, int whence)
return vfs_setpos(file, 0, 0);
}
- real = ovl_real_fdget(file);
+ CLASS(fd_real, real)(file);
if (fd_empty(real))
return fd_error(real);
@@ -211,8 +212,6 @@ static loff_t ovl_llseek(struct file *file, loff_t offset, int whence)
file->f_pos = fd_file(real)->f_pos;
ovl_inode_unlock(inode);
- fdput(real);
-
return ret;
}
@@ -253,8 +252,6 @@ static void ovl_file_accessed(struct file *file)
static ssize_t ovl_read_iter(struct kiocb *iocb, struct iov_iter *iter)
{
struct file *file = iocb->ki_filp;
- struct fderr real;
- ssize_t ret;
struct backing_file_ctx ctx = {
.cred = ovl_creds(file_inode(file)->i_sb),
.user_file = file,
@@ -264,22 +261,18 @@ static ssize_t ovl_read_iter(struct kiocb *iocb, struct iov_iter *iter)
if (!iov_iter_count(iter))
return 0;
- real = ovl_real_fdget(file);
+ CLASS(fd_real, real)(file);
if (fd_empty(real))
return fd_error(real);
- ret = backing_file_read_iter(fd_file(real), iter, iocb, iocb->ki_flags,
- &ctx);
- fdput(real);
-
- return ret;
+ return backing_file_read_iter(fd_file(real), iter, iocb, iocb->ki_flags,
+ &ctx);
}
static ssize_t ovl_write_iter(struct kiocb *iocb, struct iov_iter *iter)
{
struct file *file = iocb->ki_filp;
struct inode *inode = file_inode(file);
- struct fderr real;
ssize_t ret;
int ifl = iocb->ki_flags;
struct backing_file_ctx ctx = {
@@ -295,7 +288,9 @@ static ssize_t ovl_write_iter(struct kiocb *iocb, struct iov_iter *iter)
/* Update mode */
ovl_copyattr(inode);
- real = ovl_real_fdget(file);
+ {
+
+ CLASS(fd_real, real)(file);
if (fd_empty(real)) {
ret = fd_error(real);
goto out_unlock;
@@ -310,7 +305,8 @@ static ssize_t ovl_write_iter(struct kiocb *iocb, struct iov_iter *iter)
*/
ifl &= ~IOCB_DIO_CALLER_COMP;
ret = backing_file_write_iter(fd_file(real), iter, iocb, ifl, &ctx);
- fdput(real);
+
+ }
out_unlock:
inode_unlock(inode);
@@ -322,22 +318,18 @@ static ssize_t ovl_splice_read(struct file *in, loff_t *ppos,
struct pipe_inode_info *pipe, size_t len,
unsigned int flags)
{
- struct fderr real;
- ssize_t ret;
+ CLASS(fd_real, real)(in);
struct backing_file_ctx ctx = {
.cred = ovl_creds(file_inode(in)->i_sb),
.user_file = in,
.accessed = ovl_file_accessed,
};
- real = ovl_real_fdget(in);
if (fd_empty(real))
return fd_error(real);
- ret = backing_file_splice_read(fd_file(real), ppos, pipe, len, flags, &ctx);
- fdput(real);
-
- return ret;
+ return backing_file_splice_read(fd_file(real), ppos, pipe, len, flags,
+ &ctx);
}
/*
@@ -351,7 +343,6 @@ static ssize_t ovl_splice_read(struct file *in, loff_t *ppos,
static ssize_t ovl_splice_write(struct pipe_inode_info *pipe, struct file *out,
loff_t *ppos, size_t len, unsigned int flags)
{
- struct fderr real;
struct inode *inode = file_inode(out);
ssize_t ret;
struct backing_file_ctx ctx = {
@@ -364,15 +355,17 @@ static ssize_t ovl_splice_write(struct pipe_inode_info *pipe, struct file *out,
/* Update mode */
ovl_copyattr(inode);
- real = ovl_real_fdget(out);
+ {
+
+ CLASS(fd_real, real)(out);
if (fd_empty(real)) {
ret = fd_error(real);
goto out_unlock;
}
ret = backing_file_splice_write(pipe, fd_file(real), ppos, len, flags, &ctx);
- fdput(real);
+ }
out_unlock:
inode_unlock(inode);
@@ -420,7 +413,6 @@ static int ovl_mmap(struct file *file, struct vm_area_struct *vma)
static long ovl_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
{
struct inode *inode = file_inode(file);
- struct fderr real;
const struct cred *old_cred;
int ret;
@@ -430,7 +422,9 @@ static long ovl_fallocate(struct file *file, int mode, loff_t offset, loff_t len
ret = file_remove_privs(file);
if (ret)
goto out_unlock;
- real = ovl_real_fdget(file);
+ {
+
+ CLASS(fd_real, real)(file);
if (fd_empty(real)) {
ret = fd_error(real);
goto out_unlock;
@@ -443,8 +437,7 @@ static long ovl_fallocate(struct file *file, int mode, loff_t offset, loff_t len
/* Update size */
ovl_file_modified(file);
- fdput(real);
-
+ }
out_unlock:
inode_unlock(inode);
@@ -453,11 +446,10 @@ static long ovl_fallocate(struct file *file, int mode, loff_t offset, loff_t len
static int ovl_fadvise(struct file *file, loff_t offset, loff_t len, int advice)
{
- struct fderr real;
+ CLASS(fd_real, real)(file);
const struct cred *old_cred;
int ret;
- real = ovl_real_fdget(file);
if (fd_empty(real))
return fd_error(real);
@@ -465,8 +457,6 @@ static int ovl_fadvise(struct file *file, loff_t offset, loff_t len, int advice)
ret = vfs_fadvise(fd_file(real), offset, len, advice);
revert_creds(old_cred);
- fdput(real);
-
return ret;
}
@@ -481,7 +471,6 @@ static loff_t ovl_copyfile(struct file *file_in, loff_t pos_in,
loff_t len, unsigned int flags, enum ovl_copyop op)
{
struct inode *inode_out = file_inode(file_out);
- struct fderr real_in, real_out;
const struct cred *old_cred;
loff_t ret;
@@ -494,15 +483,16 @@ static loff_t ovl_copyfile(struct file *file_in, loff_t pos_in,
goto out_unlock;
}
- real_out = ovl_real_fdget(file_out);
+ {
+
+ CLASS(fd_real, real_out)(file_out);
if (fd_empty(real_out)) {
ret = fd_error(real_out);
goto out_unlock;
}
- real_in = ovl_real_fdget(file_in);
+ CLASS(fd_real, real_in)(file_in);
if (fd_empty(real_in)) {
- fdput(real_out);
ret = fd_error(real_in);
goto out_unlock;
}
@@ -530,8 +520,7 @@ static loff_t ovl_copyfile(struct file *file_in, loff_t pos_in,
/* Update size */
ovl_file_modified(file_out);
- fdput(real_in);
- fdput(real_out);
+ }
out_unlock:
inode_unlock(inode_out);
@@ -576,11 +565,10 @@ static loff_t ovl_remap_file_range(struct file *file_in, loff_t pos_in,
static int ovl_flush(struct file *file, fl_owner_t id)
{
- struct fderr real;
+ CLASS(fd_real, real)(file);
const struct cred *old_cred;
int err = 0;
- real = ovl_real_fdget(file);
if (fd_empty(real))
return fd_error(real);
@@ -589,8 +577,6 @@ static int ovl_flush(struct file *file, fl_owner_t id)
err = fd_file(real)->f_op->flush(fd_file(real), id);
revert_creds(old_cred);
}
- fdput(real);
-
return err;
}
--
2.39.2
^ permalink raw reply related [flat|nested] 134+ messages in thread* Re: [PATCH 08/39] experimental: convert fs/overlayfs/file.c to CLASS(...)
2024-07-30 5:15 ` [PATCH 08/39] experimental: convert fs/overlayfs/file.c to CLASS(...) viro
@ 2024-07-30 19:10 ` Josef Bacik
2024-07-30 21:12 ` Al Viro
2024-08-07 10:23 ` Christian Brauner
1 sibling, 1 reply; 134+ messages in thread
From: Josef Bacik @ 2024-07-30 19:10 UTC (permalink / raw)
To: viro; +Cc: linux-fsdevel, amir73il, bpf, brauner, cgroups, kvm, netdev,
torvalds
On Tue, Jul 30, 2024 at 01:15:54AM -0400, viro@kernel.org wrote:
> From: Al Viro <viro@zeniv.linux.org.uk>
>
> There are four places where we end up adding an extra scope
> covering just the range from constructor to destructor;
> not sure if that's the best way to handle that.
>
> The functions in question are ovl_write_iter(), ovl_splice_write(),
> ovl_fadvise() and ovl_copyfile().
>
> This is very likely *NOT* the final form of that thing - it
> needs to be discussed.
>
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
> fs/overlayfs/file.c | 72 ++++++++++++++++++---------------------------
> 1 file changed, 29 insertions(+), 43 deletions(-)
>
> diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
> index 4b9e145bc7b8..a2911c632137 100644
> --- a/fs/overlayfs/file.c
> +++ b/fs/overlayfs/file.c
> @@ -132,6 +132,8 @@ static struct fderr ovl_real_fdget(const struct file *file)
> return ovl_real_fdget_meta(file, false);
> }
>
> +DEFINE_CLASS(fd_real, struct fderr, fdput(_T), ovl_real_fdget(file), struct file *file)
> +
> static int ovl_open(struct inode *inode, struct file *file)
> {
> struct dentry *dentry = file_dentry(file);
> @@ -174,7 +176,6 @@ static int ovl_release(struct inode *inode, struct file *file)
> static loff_t ovl_llseek(struct file *file, loff_t offset, int whence)
> {
> struct inode *inode = file_inode(file);
> - struct fderr real;
> const struct cred *old_cred;
> loff_t ret;
>
> @@ -190,7 +191,7 @@ static loff_t ovl_llseek(struct file *file, loff_t offset, int whence)
> return vfs_setpos(file, 0, 0);
> }
>
> - real = ovl_real_fdget(file);
> + CLASS(fd_real, real)(file);
> if (fd_empty(real))
> return fd_error(real);
>
> @@ -211,8 +212,6 @@ static loff_t ovl_llseek(struct file *file, loff_t offset, int whence)
> file->f_pos = fd_file(real)->f_pos;
> ovl_inode_unlock(inode);
>
> - fdput(real);
> -
> return ret;
> }
>
> @@ -253,8 +252,6 @@ static void ovl_file_accessed(struct file *file)
> static ssize_t ovl_read_iter(struct kiocb *iocb, struct iov_iter *iter)
> {
> struct file *file = iocb->ki_filp;
> - struct fderr real;
> - ssize_t ret;
> struct backing_file_ctx ctx = {
> .cred = ovl_creds(file_inode(file)->i_sb),
> .user_file = file,
> @@ -264,22 +261,18 @@ static ssize_t ovl_read_iter(struct kiocb *iocb, struct iov_iter *iter)
> if (!iov_iter_count(iter))
> return 0;
>
> - real = ovl_real_fdget(file);
> + CLASS(fd_real, real)(file);
> if (fd_empty(real))
> return fd_error(real);
>
> - ret = backing_file_read_iter(fd_file(real), iter, iocb, iocb->ki_flags,
> - &ctx);
> - fdput(real);
> -
> - return ret;
> + return backing_file_read_iter(fd_file(real), iter, iocb, iocb->ki_flags,
> + &ctx);
> }
>
> static ssize_t ovl_write_iter(struct kiocb *iocb, struct iov_iter *iter)
> {
> struct file *file = iocb->ki_filp;
> struct inode *inode = file_inode(file);
> - struct fderr real;
> ssize_t ret;
> int ifl = iocb->ki_flags;
> struct backing_file_ctx ctx = {
> @@ -295,7 +288,9 @@ static ssize_t ovl_write_iter(struct kiocb *iocb, struct iov_iter *iter)
> /* Update mode */
> ovl_copyattr(inode);
>
> - real = ovl_real_fdget(file);
> + {
Is this what we want to do from a code cleanliness standpoint? This feels
pretty ugly to me, I feal like it would be better to have something like
scoped_class(fd_real, real) {
// code
}
rather than the {} at the same indent level as the underlying block.
I don't feel super strongly about this, but I do feel like we need to either
explicitly say "this is the way/an acceptable way to do this" from a code
formatting standpoint, or we need to come up with a cleaner way of representing
the scoped area. Thanks,
Josef
^ permalink raw reply [flat|nested] 134+ messages in thread* Re: [PATCH 08/39] experimental: convert fs/overlayfs/file.c to CLASS(...)
2024-07-30 19:10 ` Josef Bacik
@ 2024-07-30 21:12 ` Al Viro
2024-07-31 21:11 ` Josef Bacik
0 siblings, 1 reply; 134+ messages in thread
From: Al Viro @ 2024-07-30 21:12 UTC (permalink / raw)
To: Josef Bacik
Cc: viro, linux-fsdevel, amir73il, bpf, brauner, cgroups, kvm, netdev,
torvalds
On Tue, Jul 30, 2024 at 03:10:25PM -0400, Josef Bacik wrote:
> On Tue, Jul 30, 2024 at 01:15:54AM -0400, viro@kernel.org wrote:
> > From: Al Viro <viro@zeniv.linux.org.uk>
> >
> > There are four places where we end up adding an extra scope
> > covering just the range from constructor to destructor;
> > not sure if that's the best way to handle that.
> >
> > The functions in question are ovl_write_iter(), ovl_splice_write(),
> > ovl_fadvise() and ovl_copyfile().
> >
> > This is very likely *NOT* the final form of that thing - it
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > needs to be discussed.
> Is this what we want to do from a code cleanliness standpoint? This feels
> pretty ugly to me, I feal like it would be better to have something like
>
> scoped_class(fd_real, real) {
> // code
> }
>
> rather than the {} at the same indent level as the underlying block.
>
> I don't feel super strongly about this, but I do feel like we need to either
> explicitly say "this is the way/an acceptable way to do this" from a code
> formatting standpoint, or we need to come up with a cleaner way of representing
> the scoped area.
That's a bit painful in these cases - sure, we can do something like
scoped_class(fd_real, real)(file) {
if (fd_empty(fd_real)) {
ret = fd_error(real);
break;
}
old_cred = ovl_override_creds(file_inode(file)->i_sb);
ret = vfs_fallocate(fd_file(real), mode, offset, len);
revert_creds(old_cred);
/* Update size */
ovl_file_modified(file);
}
but that use of break would need to be documented. And IMO anything like
scoped_cond_guard (mutex_intr, return -ERESTARTNOINTR,
&task->signal->cred_guard_mutex) {
is just distasteful ;-/ Control flow should _not_ be hidden that way;
it's hard on casual reader.
The variant I'd put in there is obviously not suitable for merge - we need
something else, the question is what that something should be...
^ permalink raw reply [flat|nested] 134+ messages in thread* Re: [PATCH 08/39] experimental: convert fs/overlayfs/file.c to CLASS(...)
2024-07-30 21:12 ` Al Viro
@ 2024-07-31 21:11 ` Josef Bacik
0 siblings, 0 replies; 134+ messages in thread
From: Josef Bacik @ 2024-07-31 21:11 UTC (permalink / raw)
To: Al Viro
Cc: viro, linux-fsdevel, amir73il, bpf, brauner, cgroups, kvm, netdev,
torvalds
On Tue, Jul 30, 2024 at 10:12:25PM +0100, Al Viro wrote:
> On Tue, Jul 30, 2024 at 03:10:25PM -0400, Josef Bacik wrote:
> > On Tue, Jul 30, 2024 at 01:15:54AM -0400, viro@kernel.org wrote:
> > > From: Al Viro <viro@zeniv.linux.org.uk>
> > >
> > > There are four places where we end up adding an extra scope
> > > covering just the range from constructor to destructor;
> > > not sure if that's the best way to handle that.
> > >
> > > The functions in question are ovl_write_iter(), ovl_splice_write(),
> > > ovl_fadvise() and ovl_copyfile().
> > >
> > > This is very likely *NOT* the final form of that thing - it
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > > needs to be discussed.
>
Fair, I think I misunderstood what you were unhappy with in that code.
> > Is this what we want to do from a code cleanliness standpoint? This feels
> > pretty ugly to me, I feal like it would be better to have something like
> >
> > scoped_class(fd_real, real) {
> > // code
> > }
> >
> > rather than the {} at the same indent level as the underlying block.
> >
> > I don't feel super strongly about this, but I do feel like we need to either
> > explicitly say "this is the way/an acceptable way to do this" from a code
> > formatting standpoint, or we need to come up with a cleaner way of representing
> > the scoped area.
>
> That's a bit painful in these cases - sure, we can do something like
> scoped_class(fd_real, real)(file) {
> if (fd_empty(fd_real)) {
> ret = fd_error(real);
> break;
> }
> old_cred = ovl_override_creds(file_inode(file)->i_sb);
> ret = vfs_fallocate(fd_file(real), mode, offset, len);
> revert_creds(old_cred);
>
> /* Update size */
> ovl_file_modified(file);
> }
> but that use of break would need to be documented. And IMO anything like
> scoped_cond_guard (mutex_intr, return -ERESTARTNOINTR,
> &task->signal->cred_guard_mutex) {
> is just distasteful ;-/ Control flow should _not_ be hidden that way;
> it's hard on casual reader.
>
> The variant I'd put in there is obviously not suitable for merge - we need
> something else, the question is what that something should be...
I went and looked at our c++ codebase to see what they do here, and it appears
that this is the accepted norm for this style of scoped variables
{
CLASS(fd_real, real_out)(file_out);
// blah blah
}
Looking at our code guidelines this appears to be the widely accepted norm, and
I don't hate it. I feel like this is more readable than the scoped_class()
idea, and is honestly the cleanest solution. Thanks,
Josef
^ permalink raw reply [flat|nested] 134+ messages in thread
* Re: [PATCH 08/39] experimental: convert fs/overlayfs/file.c to CLASS(...)
2024-07-30 5:15 ` [PATCH 08/39] experimental: convert fs/overlayfs/file.c to CLASS(...) viro
2024-07-30 19:10 ` Josef Bacik
@ 2024-08-07 10:23 ` Christian Brauner
1 sibling, 0 replies; 134+ messages in thread
From: Christian Brauner @ 2024-08-07 10:23 UTC (permalink / raw)
To: viro; +Cc: linux-fsdevel, amir73il, bpf, cgroups, kvm, netdev, torvalds
On Tue, Jul 30, 2024 at 01:15:54AM GMT, viro@kernel.org wrote:
> From: Al Viro <viro@zeniv.linux.org.uk>
>
> There are four places where we end up adding an extra scope
> covering just the range from constructor to destructor;
> not sure if that's the best way to handle that.
I think it's fine and not worth obsessing about it.
Reviewed-by: Christian Brauner <brauner@kernel.org>
^ permalink raw reply [flat|nested] 134+ messages in thread
* [PATCH 09/39] timerfd: switch to CLASS(fd, ...)
2024-07-30 5:15 ` [PATCH 01/39] memcg_write_event_control(): fix a user-triggerable oops viro
` (6 preceding siblings ...)
2024-07-30 5:15 ` [PATCH 08/39] experimental: convert fs/overlayfs/file.c to CLASS(...) viro
@ 2024-07-30 5:15 ` viro
2024-08-07 10:24 ` Christian Brauner
2024-07-30 5:15 ` [PATCH 10/39] get rid of perf_fget_light(), convert kernel/events/core.c to CLASS(fd) viro
` (30 subsequent siblings)
38 siblings, 1 reply; 134+ messages in thread
From: viro @ 2024-07-30 5:15 UTC (permalink / raw)
To: linux-fsdevel; +Cc: amir73il, bpf, brauner, cgroups, kvm, netdev, torvalds
From: Al Viro <viro@zeniv.linux.org.uk>
Fold timerfd_fget() into both callers to have fdget() and fdput() in
the same scope. Could be done in different ways, but this is probably
the smallest solution.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
fs/timerfd.c | 40 ++++++++++++++--------------------------
1 file changed, 14 insertions(+), 26 deletions(-)
diff --git a/fs/timerfd.c b/fs/timerfd.c
index 137523e0bb21..4c32244b0508 100644
--- a/fs/timerfd.c
+++ b/fs/timerfd.c
@@ -394,19 +394,6 @@ static const struct file_operations timerfd_fops = {
.unlocked_ioctl = timerfd_ioctl,
};
-static int timerfd_fget(int fd, struct fd *p)
-{
- struct fd f = fdget(fd);
- if (!fd_file(f))
- return -EBADF;
- if (fd_file(f)->f_op != &timerfd_fops) {
- fdput(f);
- return -EINVAL;
- }
- *p = f;
- return 0;
-}
-
SYSCALL_DEFINE2(timerfd_create, int, clockid, int, flags)
{
int ufd;
@@ -471,7 +458,6 @@ static int do_timerfd_settime(int ufd, int flags,
const struct itimerspec64 *new,
struct itimerspec64 *old)
{
- struct fd f;
struct timerfd_ctx *ctx;
int ret;
@@ -479,15 +465,17 @@ static int do_timerfd_settime(int ufd, int flags,
!itimerspec64_valid(new))
return -EINVAL;
- ret = timerfd_fget(ufd, &f);
- if (ret)
- return ret;
+ CLASS(fd, f)(ufd);
+ if (fd_empty(f))
+ return -EBADF;
+
+ if (fd_file(f)->f_op != &timerfd_fops)
+ return -EINVAL;
+
ctx = fd_file(f)->private_data;
- if (isalarm(ctx) && !capable(CAP_WAKE_ALARM)) {
- fdput(f);
+ if (isalarm(ctx) && !capable(CAP_WAKE_ALARM))
return -EPERM;
- }
timerfd_setup_cancel(ctx, flags);
@@ -535,17 +523,18 @@ static int do_timerfd_settime(int ufd, int flags,
ret = timerfd_setup(ctx, flags, new);
spin_unlock_irq(&ctx->wqh.lock);
- fdput(f);
return ret;
}
static int do_timerfd_gettime(int ufd, struct itimerspec64 *t)
{
- struct fd f;
struct timerfd_ctx *ctx;
- int ret = timerfd_fget(ufd, &f);
- if (ret)
- return ret;
+ CLASS(fd, f)(ufd);
+
+ if (fd_empty(f))
+ return -EBADF;
+ if (fd_file(f)->f_op != &timerfd_fops)
+ return -EINVAL;
ctx = fd_file(f)->private_data;
spin_lock_irq(&ctx->wqh.lock);
@@ -567,7 +556,6 @@ static int do_timerfd_gettime(int ufd, struct itimerspec64 *t)
t->it_value = ktime_to_timespec64(timerfd_get_remaining(ctx));
t->it_interval = ktime_to_timespec64(ctx->tintv);
spin_unlock_irq(&ctx->wqh.lock);
- fdput(f);
return 0;
}
--
2.39.2
^ permalink raw reply related [flat|nested] 134+ messages in thread* Re: [PATCH 09/39] timerfd: switch to CLASS(fd, ...)
2024-07-30 5:15 ` [PATCH 09/39] timerfd: switch to CLASS(fd, ...) viro
@ 2024-08-07 10:24 ` Christian Brauner
0 siblings, 0 replies; 134+ messages in thread
From: Christian Brauner @ 2024-08-07 10:24 UTC (permalink / raw)
To: viro; +Cc: linux-fsdevel, amir73il, bpf, cgroups, kvm, netdev, torvalds
On Tue, Jul 30, 2024 at 01:15:55AM GMT, viro@kernel.org wrote:
> From: Al Viro <viro@zeniv.linux.org.uk>
>
> Fold timerfd_fget() into both callers to have fdget() and fdput() in
> the same scope. Could be done in different ways, but this is probably
> the smallest solution.
>
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
Reviewed-by: Christian Brauner <brauner@kernel.org>
^ permalink raw reply [flat|nested] 134+ messages in thread
* [PATCH 10/39] get rid of perf_fget_light(), convert kernel/events/core.c to CLASS(fd)
2024-07-30 5:15 ` [PATCH 01/39] memcg_write_event_control(): fix a user-triggerable oops viro
` (7 preceding siblings ...)
2024-07-30 5:15 ` [PATCH 09/39] timerfd: switch to CLASS(fd, ...) viro
@ 2024-07-30 5:15 ` viro
2024-08-07 10:25 ` Christian Brauner
2024-07-30 5:15 ` [PATCH 11/39] switch netlink_getsockbyfilp() to taking descriptor viro
` (29 subsequent siblings)
38 siblings, 1 reply; 134+ messages in thread
From: viro @ 2024-07-30 5:15 UTC (permalink / raw)
To: linux-fsdevel; +Cc: amir73il, bpf, brauner, cgroups, kvm, netdev, torvalds
From: Al Viro <viro@zeniv.linux.org.uk>
Lift fdget() and fdput() out of perf_fget_light(), turning it into
is_perf_file(struct fd f). The life gets easier in both callers
if we do fdget() unconditionally, including the case when we are
given -1 instead of a descriptor - that avoids a reassignment in
perf_event_open(2) and it avoids a nasty temptation in _perf_ioctl()
where we must *not* lift output_event out of scope for output.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
kernel/events/core.c | 49 +++++++++++++++-----------------------------
1 file changed, 16 insertions(+), 33 deletions(-)
diff --git a/kernel/events/core.c b/kernel/events/core.c
index fd2ac9c7fd77..dae815c30514 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5895,18 +5895,9 @@ EXPORT_SYMBOL_GPL(perf_event_period);
static const struct file_operations perf_fops;
-static inline int perf_fget_light(int fd, struct fd *p)
+static inline bool is_perf_file(struct fd f)
{
- struct fd f = fdget(fd);
- if (!fd_file(f))
- return -EBADF;
-
- if (fd_file(f)->f_op != &perf_fops) {
- fdput(f);
- return -EBADF;
- }
- *p = f;
- return 0;
+ return !fd_empty(f) && fd_file(f)->f_op == &perf_fops;
}
static int perf_event_set_output(struct perf_event *event,
@@ -5954,20 +5945,14 @@ static long _perf_ioctl(struct perf_event *event, unsigned int cmd, unsigned lon
case PERF_EVENT_IOC_SET_OUTPUT:
{
- int ret;
+ CLASS(fd, output)(arg); // arg == -1 => empty
+ struct perf_event *output_event = NULL;
if (arg != -1) {
- struct perf_event *output_event;
- struct fd output;
- ret = perf_fget_light(arg, &output);
- if (ret)
- return ret;
+ if (!is_perf_file(output))
+ return -EBADF;
output_event = fd_file(output)->private_data;
- ret = perf_event_set_output(event, output_event);
- fdput(output);
- } else {
- ret = perf_event_set_output(event, NULL);
}
- return ret;
+ return perf_event_set_output(event, output_event);
}
case PERF_EVENT_IOC_SET_FILTER:
@@ -12474,7 +12459,6 @@ SYSCALL_DEFINE5(perf_event_open,
struct perf_event_attr attr;
struct perf_event_context *ctx;
struct file *event_file = NULL;
- struct fd group = EMPTY_FD;
struct task_struct *task = NULL;
struct pmu *pmu;
int event_fd;
@@ -12545,10 +12529,12 @@ SYSCALL_DEFINE5(perf_event_open,
if (event_fd < 0)
return event_fd;
+ CLASS(fd, group)(group_fd); // group_fd == -1 => empty
if (group_fd != -1) {
- err = perf_fget_light(group_fd, &group);
- if (err)
+ if (!is_perf_file(group)) {
+ err = -EBADF;
goto err_fd;
+ }
group_leader = fd_file(group)->private_data;
if (flags & PERF_FLAG_FD_OUTPUT)
output_event = group_leader;
@@ -12560,7 +12546,7 @@ SYSCALL_DEFINE5(perf_event_open,
task = find_lively_task_by_vpid(pid);
if (IS_ERR(task)) {
err = PTR_ERR(task);
- goto err_group_fd;
+ goto err_fd;
}
}
@@ -12827,12 +12813,11 @@ SYSCALL_DEFINE5(perf_event_open,
mutex_unlock(¤t->perf_event_mutex);
/*
- * Drop the reference on the group_event after placing the
- * new event on the sibling_list. This ensures destruction
- * of the group leader will find the pointer to itself in
- * perf_group_detach().
+ * File reference in group guarantees that group_leader has been
+ * kept alive until we place the new event on the sibling_list.
+ * This ensures destruction of the group leader will find
+ * the pointer to itself in perf_group_detach().
*/
- fdput(group);
fd_install(event_fd, event_file);
return event_fd;
@@ -12851,8 +12836,6 @@ SYSCALL_DEFINE5(perf_event_open,
err_task:
if (task)
put_task_struct(task);
-err_group_fd:
- fdput(group);
err_fd:
put_unused_fd(event_fd);
return err;
--
2.39.2
^ permalink raw reply related [flat|nested] 134+ messages in thread* Re: [PATCH 10/39] get rid of perf_fget_light(), convert kernel/events/core.c to CLASS(fd)
2024-07-30 5:15 ` [PATCH 10/39] get rid of perf_fget_light(), convert kernel/events/core.c to CLASS(fd) viro
@ 2024-08-07 10:25 ` Christian Brauner
0 siblings, 0 replies; 134+ messages in thread
From: Christian Brauner @ 2024-08-07 10:25 UTC (permalink / raw)
To: viro; +Cc: linux-fsdevel, amir73il, bpf, cgroups, kvm, netdev, torvalds
On Tue, Jul 30, 2024 at 01:15:56AM GMT, viro@kernel.org wrote:
> From: Al Viro <viro@zeniv.linux.org.uk>
>
> Lift fdget() and fdput() out of perf_fget_light(), turning it into
> is_perf_file(struct fd f). The life gets easier in both callers
> if we do fdget() unconditionally, including the case when we are
> given -1 instead of a descriptor - that avoids a reassignment in
> perf_event_open(2) and it avoids a nasty temptation in _perf_ioctl()
> where we must *not* lift output_event out of scope for output.
>
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
Reviewed-by: Christian Brauner <brauner@kernel.org>
^ permalink raw reply [flat|nested] 134+ messages in thread
* [PATCH 11/39] switch netlink_getsockbyfilp() to taking descriptor
2024-07-30 5:15 ` [PATCH 01/39] memcg_write_event_control(): fix a user-triggerable oops viro
` (8 preceding siblings ...)
2024-07-30 5:15 ` [PATCH 10/39] get rid of perf_fget_light(), convert kernel/events/core.c to CLASS(fd) viro
@ 2024-07-30 5:15 ` viro
2024-08-07 10:26 ` Christian Brauner
2024-07-30 5:15 ` [PATCH 12/39] do_mq_notify(): saner skb freeing on failures viro
` (28 subsequent siblings)
38 siblings, 1 reply; 134+ messages in thread
From: viro @ 2024-07-30 5:15 UTC (permalink / raw)
To: linux-fsdevel; +Cc: amir73il, bpf, brauner, cgroups, kvm, netdev, torvalds
From: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
include/linux/netlink.h | 2 +-
ipc/mqueue.c | 8 +-------
net/netlink/af_netlink.c | 9 +++++++--
3 files changed, 9 insertions(+), 10 deletions(-)
diff --git a/include/linux/netlink.h b/include/linux/netlink.h
index b332c2048c75..a48a30842d84 100644
--- a/include/linux/netlink.h
+++ b/include/linux/netlink.h
@@ -239,7 +239,7 @@ int netlink_register_notifier(struct notifier_block *nb);
int netlink_unregister_notifier(struct notifier_block *nb);
/* finegrained unicast helpers: */
-struct sock *netlink_getsockbyfilp(struct file *filp);
+struct sock *netlink_getsockbyfd(int fd);
int netlink_attachskb(struct sock *sk, struct sk_buff *skb,
long *timeo, struct sock *ssk);
void netlink_detachskb(struct sock *sk, struct sk_buff *skb);
diff --git a/ipc/mqueue.c b/ipc/mqueue.c
index 34fa0bd8bb11..fd05e3d4f7b6 100644
--- a/ipc/mqueue.c
+++ b/ipc/mqueue.c
@@ -1355,13 +1355,7 @@ static int do_mq_notify(mqd_t mqdes, const struct sigevent *notification)
skb_put(nc, NOTIFY_COOKIE_LEN);
/* and attach it to the socket */
retry:
- f = fdget(notification->sigev_signo);
- if (!fd_file(f)) {
- ret = -EBADF;
- goto out;
- }
- sock = netlink_getsockbyfilp(fd_file(f));
- fdput(f);
+ sock = netlink_getsockbyfd(notification->sigev_signo);
if (IS_ERR(sock)) {
ret = PTR_ERR(sock);
goto free_skb;
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 0b7a89db3ab7..42451ac355d0 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -1180,11 +1180,16 @@ static struct sock *netlink_getsockbyportid(struct sock *ssk, u32 portid)
return sock;
}
-struct sock *netlink_getsockbyfilp(struct file *filp)
+struct sock *netlink_getsockbyfd(int fd)
{
- struct inode *inode = file_inode(filp);
+ CLASS(fd, f)(fd);
+ struct inode *inode;
struct sock *sock;
+ if (fd_empty(f))
+ return ERR_PTR(-EBADF);
+
+ inode = file_inode(fd_file(f));
if (!S_ISSOCK(inode->i_mode))
return ERR_PTR(-ENOTSOCK);
--
2.39.2
^ permalink raw reply related [flat|nested] 134+ messages in thread* Re: [PATCH 11/39] switch netlink_getsockbyfilp() to taking descriptor
2024-07-30 5:15 ` [PATCH 11/39] switch netlink_getsockbyfilp() to taking descriptor viro
@ 2024-08-07 10:26 ` Christian Brauner
0 siblings, 0 replies; 134+ messages in thread
From: Christian Brauner @ 2024-08-07 10:26 UTC (permalink / raw)
To: viro; +Cc: linux-fsdevel, amir73il, bpf, cgroups, kvm, netdev, torvalds
On Tue, Jul 30, 2024 at 01:15:57AM GMT, viro@kernel.org wrote:
> From: Al Viro <viro@zeniv.linux.org.uk>
>
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
Reviewed-by: Christian Brauner <brauner@kernel.org>
^ permalink raw reply [flat|nested] 134+ messages in thread
* [PATCH 12/39] do_mq_notify(): saner skb freeing on failures
2024-07-30 5:15 ` [PATCH 01/39] memcg_write_event_control(): fix a user-triggerable oops viro
` (9 preceding siblings ...)
2024-07-30 5:15 ` [PATCH 11/39] switch netlink_getsockbyfilp() to taking descriptor viro
@ 2024-07-30 5:15 ` viro
2024-07-30 5:15 ` [PATCH 13/39] do_mq_notify(): switch to CLASS(fd, ...) viro
` (27 subsequent siblings)
38 siblings, 0 replies; 134+ messages in thread
From: viro @ 2024-07-30 5:15 UTC (permalink / raw)
To: linux-fsdevel; +Cc: amir73il, bpf, brauner, cgroups, kvm, netdev, torvalds
From: Al Viro <viro@zeniv.linux.org.uk>
cleanup is convoluted enough as it is; it's easier to have early
failure outs do explicit kfree_skb(nc), rather than going to
convolutions needed to reuse the cleanup from late failures.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
ipc/mqueue.c | 16 +++++-----------
1 file changed, 5 insertions(+), 11 deletions(-)
diff --git a/ipc/mqueue.c b/ipc/mqueue.c
index fd05e3d4f7b6..48640a362637 100644
--- a/ipc/mqueue.c
+++ b/ipc/mqueue.c
@@ -1347,8 +1347,8 @@ static int do_mq_notify(mqd_t mqdes, const struct sigevent *notification)
if (copy_from_user(nc->data,
notification->sigev_value.sival_ptr,
NOTIFY_COOKIE_LEN)) {
- ret = -EFAULT;
- goto free_skb;
+ kfree_skb(nc);
+ return -EFAULT;
}
/* TODO: add a header? */
@@ -1357,16 +1357,14 @@ static int do_mq_notify(mqd_t mqdes, const struct sigevent *notification)
retry:
sock = netlink_getsockbyfd(notification->sigev_signo);
if (IS_ERR(sock)) {
- ret = PTR_ERR(sock);
- goto free_skb;
+ kfree_skb(nc);
+ return PTR_ERR(sock);
}
timeo = MAX_SCHEDULE_TIMEOUT;
ret = netlink_attachskb(sock, nc, &timeo, NULL);
- if (ret == 1) {
- sock = NULL;
+ if (ret == 1)
goto retry;
- }
if (ret)
return ret;
}
@@ -1425,10 +1423,6 @@ static int do_mq_notify(mqd_t mqdes, const struct sigevent *notification)
out:
if (sock)
netlink_detachskb(sock, nc);
- else
-free_skb:
- dev_kfree_skb(nc);
-
return ret;
}
--
2.39.2
^ permalink raw reply related [flat|nested] 134+ messages in thread* [PATCH 13/39] do_mq_notify(): switch to CLASS(fd, ...)
2024-07-30 5:15 ` [PATCH 01/39] memcg_write_event_control(): fix a user-triggerable oops viro
` (10 preceding siblings ...)
2024-07-30 5:15 ` [PATCH 12/39] do_mq_notify(): saner skb freeing on failures viro
@ 2024-07-30 5:15 ` viro
2024-08-07 10:27 ` Christian Brauner
2024-07-30 5:16 ` [PATCH 14/39] simplify xfs_find_handle() a bit viro
` (26 subsequent siblings)
38 siblings, 1 reply; 134+ messages in thread
From: viro @ 2024-07-30 5:15 UTC (permalink / raw)
To: linux-fsdevel; +Cc: amir73il, bpf, brauner, cgroups, kvm, netdev, torvalds
From: Al Viro <viro@zeniv.linux.org.uk>
semi-simple. The only failure exit before fdget() is a return, the
only thing done after fdput() is transposable with it.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
ipc/mqueue.c | 9 +++------
1 file changed, 3 insertions(+), 6 deletions(-)
diff --git a/ipc/mqueue.c b/ipc/mqueue.c
index 48640a362637..4f1dec518fae 100644
--- a/ipc/mqueue.c
+++ b/ipc/mqueue.c
@@ -1317,7 +1317,6 @@ SYSCALL_DEFINE5(mq_timedreceive, mqd_t, mqdes, char __user *, u_msg_ptr,
static int do_mq_notify(mqd_t mqdes, const struct sigevent *notification)
{
int ret;
- struct fd f;
struct sock *sock;
struct inode *inode;
struct mqueue_inode_info *info;
@@ -1370,8 +1369,8 @@ static int do_mq_notify(mqd_t mqdes, const struct sigevent *notification)
}
}
- f = fdget(mqdes);
- if (!fd_file(f)) {
+ CLASS(fd, f)(mqdes);
+ if (fd_empty(f)) {
ret = -EBADF;
goto out;
}
@@ -1379,7 +1378,7 @@ static int do_mq_notify(mqd_t mqdes, const struct sigevent *notification)
inode = file_inode(fd_file(f));
if (unlikely(fd_file(f)->f_op != &mqueue_file_operations)) {
ret = -EBADF;
- goto out_fput;
+ goto out;
}
info = MQUEUE_I(inode);
@@ -1418,8 +1417,6 @@ static int do_mq_notify(mqd_t mqdes, const struct sigevent *notification)
inode_set_atime_to_ts(inode, inode_set_ctime_current(inode));
}
spin_unlock(&info->lock);
-out_fput:
- fdput(f);
out:
if (sock)
netlink_detachskb(sock, nc);
--
2.39.2
^ permalink raw reply related [flat|nested] 134+ messages in thread* Re: [PATCH 13/39] do_mq_notify(): switch to CLASS(fd, ...)
2024-07-30 5:15 ` [PATCH 13/39] do_mq_notify(): switch to CLASS(fd, ...) viro
@ 2024-08-07 10:27 ` Christian Brauner
0 siblings, 0 replies; 134+ messages in thread
From: Christian Brauner @ 2024-08-07 10:27 UTC (permalink / raw)
To: viro; +Cc: linux-fsdevel, amir73il, bpf, cgroups, kvm, netdev, torvalds
On Tue, Jul 30, 2024 at 01:15:59AM GMT, viro@kernel.org wrote:
> From: Al Viro <viro@zeniv.linux.org.uk>
>
> semi-simple. The only failure exit before fdget() is a return, the
> only thing done after fdput() is transposable with it.
>
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
Reviewed-by: Christian Brauner <brauner@kernel.org>
^ permalink raw reply [flat|nested] 134+ messages in thread
* [PATCH 14/39] simplify xfs_find_handle() a bit
2024-07-30 5:15 ` [PATCH 01/39] memcg_write_event_control(): fix a user-triggerable oops viro
` (11 preceding siblings ...)
2024-07-30 5:15 ` [PATCH 13/39] do_mq_notify(): switch to CLASS(fd, ...) viro
@ 2024-07-30 5:16 ` viro
2024-07-30 5:16 ` [PATCH 15/39] convert vmsplice() to CLASS(fd, ...) viro
` (25 subsequent siblings)
38 siblings, 0 replies; 134+ messages in thread
From: viro @ 2024-07-30 5:16 UTC (permalink / raw)
To: linux-fsdevel; +Cc: amir73il, bpf, brauner, cgroups, kvm, netdev, torvalds
From: Al Viro <viro@zeniv.linux.org.uk>
XFS_IOC_FD_TO_HANDLE can grab a reference to copied ->f_path and
let the file go; results in simpler control flow - cleanup is
the same for both "by descriptor" and "by pathname" cases.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
fs/xfs/xfs_handle.c | 16 +++++++---------
1 file changed, 7 insertions(+), 9 deletions(-)
diff --git a/fs/xfs/xfs_handle.c b/fs/xfs/xfs_handle.c
index 49e5e5f04e60..f19fce557354 100644
--- a/fs/xfs/xfs_handle.c
+++ b/fs/xfs/xfs_handle.c
@@ -85,22 +85,23 @@ xfs_find_handle(
int hsize;
xfs_handle_t handle;
struct inode *inode;
- struct fd f = EMPTY_FD;
struct path path;
int error;
struct xfs_inode *ip;
if (cmd == XFS_IOC_FD_TO_HANDLE) {
- f = fdget(hreq->fd);
- if (!fd_file(f))
+ CLASS(fd, f)(hreq->fd);
+
+ if (fd_empty(f))
return -EBADF;
- inode = file_inode(fd_file(f));
+ path = fd_file(f)->f_path;
+ path_get(&path);
} else {
error = user_path_at(AT_FDCWD, hreq->path, 0, &path);
if (error)
return error;
- inode = d_inode(path.dentry);
}
+ inode = d_inode(path.dentry);
ip = XFS_I(inode);
/*
@@ -134,10 +135,7 @@ xfs_find_handle(
error = 0;
out_put:
- if (cmd == XFS_IOC_FD_TO_HANDLE)
- fdput(f);
- else
- path_put(&path);
+ path_put(&path);
return error;
}
--
2.39.2
^ permalink raw reply related [flat|nested] 134+ messages in thread* [PATCH 15/39] convert vmsplice() to CLASS(fd, ...)
2024-07-30 5:15 ` [PATCH 01/39] memcg_write_event_control(): fix a user-triggerable oops viro
` (12 preceding siblings ...)
2024-07-30 5:16 ` [PATCH 14/39] simplify xfs_find_handle() a bit viro
@ 2024-07-30 5:16 ` viro
2024-08-07 10:27 ` Christian Brauner
2024-07-30 5:16 ` [PATCH 16/39] convert __bpf_prog_get() " viro
` (24 subsequent siblings)
38 siblings, 1 reply; 134+ messages in thread
From: viro @ 2024-07-30 5:16 UTC (permalink / raw)
To: linux-fsdevel; +Cc: amir73il, bpf, brauner, cgroups, kvm, netdev, torvalds
From: Al Viro <viro@zeniv.linux.org.uk>
Irregularity here is fdput() not in the same scope as fdget(); we could
just lift it out vmsplice_type() in vmsplice(2), but there's no much
point keeping vmsplice_type() separate after that...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
fs/splice.c | 33 ++++++++++-----------------------
1 file changed, 10 insertions(+), 23 deletions(-)
diff --git a/fs/splice.c b/fs/splice.c
index 06232d7e505f..29cd39d7f4a0 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -1564,21 +1564,6 @@ static ssize_t vmsplice_to_pipe(struct file *file, struct iov_iter *iter,
return ret;
}
-static int vmsplice_type(struct fd f, int *type)
-{
- if (!fd_file(f))
- return -EBADF;
- if (fd_file(f)->f_mode & FMODE_WRITE) {
- *type = ITER_SOURCE;
- } else if (fd_file(f)->f_mode & FMODE_READ) {
- *type = ITER_DEST;
- } else {
- fdput(f);
- return -EBADF;
- }
- return 0;
-}
-
/*
* Note that vmsplice only really supports true splicing _from_ user memory
* to a pipe, not the other way around. Splicing from user memory is a simple
@@ -1602,21 +1587,25 @@ SYSCALL_DEFINE4(vmsplice, int, fd, const struct iovec __user *, uiov,
struct iovec *iov = iovstack;
struct iov_iter iter;
ssize_t error;
- struct fd f;
int type;
if (unlikely(flags & ~SPLICE_F_ALL))
return -EINVAL;
- f = fdget(fd);
- error = vmsplice_type(f, &type);
- if (error)
- return error;
+ CLASS(fd, f)(fd);
+ if (fd_empty(f))
+ return -EBADF;
+ if (fd_file(f)->f_mode & FMODE_WRITE)
+ type = ITER_SOURCE;
+ else if (fd_file(f)->f_mode & FMODE_READ)
+ type = ITER_DEST;
+ else
+ return -EBADF;
error = import_iovec(type, uiov, nr_segs,
ARRAY_SIZE(iovstack), &iov, &iter);
if (error < 0)
- goto out_fdput;
+ return error;
if (!iov_iter_count(&iter))
error = 0;
@@ -1626,8 +1615,6 @@ SYSCALL_DEFINE4(vmsplice, int, fd, const struct iovec __user *, uiov,
error = vmsplice_to_user(fd_file(f), &iter, flags);
kfree(iov);
-out_fdput:
- fdput(f);
return error;
}
--
2.39.2
^ permalink raw reply related [flat|nested] 134+ messages in thread* Re: [PATCH 15/39] convert vmsplice() to CLASS(fd, ...)
2024-07-30 5:16 ` [PATCH 15/39] convert vmsplice() to CLASS(fd, ...) viro
@ 2024-08-07 10:27 ` Christian Brauner
0 siblings, 0 replies; 134+ messages in thread
From: Christian Brauner @ 2024-08-07 10:27 UTC (permalink / raw)
To: viro; +Cc: linux-fsdevel, amir73il, bpf, cgroups, kvm, netdev, torvalds
On Tue, Jul 30, 2024 at 01:16:01AM GMT, viro@kernel.org wrote:
> From: Al Viro <viro@zeniv.linux.org.uk>
>
> Irregularity here is fdput() not in the same scope as fdget(); we could
> just lift it out vmsplice_type() in vmsplice(2), but there's no much
> point keeping vmsplice_type() separate after that...
>
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
Reviewed-by: Christian Brauner <brauner@kernel.org>
^ permalink raw reply [flat|nested] 134+ messages in thread
* [PATCH 16/39] convert __bpf_prog_get() to CLASS(fd, ...)
2024-07-30 5:15 ` [PATCH 01/39] memcg_write_event_control(): fix a user-triggerable oops viro
` (13 preceding siblings ...)
2024-07-30 5:16 ` [PATCH 15/39] convert vmsplice() to CLASS(fd, ...) viro
@ 2024-07-30 5:16 ` viro
2024-08-06 21:08 ` Andrii Nakryiko
2024-08-07 10:28 ` Christian Brauner
2024-07-30 5:16 ` [PATCH 17/39] bpf: resolve_pseudo_ldimm64(): take handling of a single ldimm64 insn into helper viro
` (23 subsequent siblings)
38 siblings, 2 replies; 134+ messages in thread
From: viro @ 2024-07-30 5:16 UTC (permalink / raw)
To: linux-fsdevel; +Cc: amir73il, bpf, brauner, cgroups, kvm, netdev, torvalds
From: Al Viro <viro@zeniv.linux.org.uk>
Irregularity here is fdput() not in the same scope as fdget();
just fold ____bpf_prog_get() into its (only) caller and that's
it...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
kernel/bpf/syscall.c | 32 +++++++++++---------------------
1 file changed, 11 insertions(+), 21 deletions(-)
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 3093bf2cc266..c5b252c0faa8 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -2407,18 +2407,6 @@ int bpf_prog_new_fd(struct bpf_prog *prog)
O_RDWR | O_CLOEXEC);
}
-static struct bpf_prog *____bpf_prog_get(struct fd f)
-{
- if (!fd_file(f))
- return ERR_PTR(-EBADF);
- if (fd_file(f)->f_op != &bpf_prog_fops) {
- fdput(f);
- return ERR_PTR(-EINVAL);
- }
-
- return fd_file(f)->private_data;
-}
-
void bpf_prog_add(struct bpf_prog *prog, int i)
{
atomic64_add(i, &prog->aux->refcnt);
@@ -2474,20 +2462,22 @@ bool bpf_prog_get_ok(struct bpf_prog *prog,
static struct bpf_prog *__bpf_prog_get(u32 ufd, enum bpf_prog_type *attach_type,
bool attach_drv)
{
- struct fd f = fdget(ufd);
+ CLASS(fd, f)(ufd);
struct bpf_prog *prog;
- prog = ____bpf_prog_get(f);
- if (IS_ERR(prog))
+ if (fd_empty(f))
+ return ERR_PTR(-EBADF);
+ if (fd_file(f)->f_op != &bpf_prog_fops)
+ return ERR_PTR(-EINVAL);
+
+ prog = fd_file(f)->private_data;
+ if (IS_ERR(prog)) // can that actually happen?
return prog;
- if (!bpf_prog_get_ok(prog, attach_type, attach_drv)) {
- prog = ERR_PTR(-EINVAL);
- goto out;
- }
+
+ if (!bpf_prog_get_ok(prog, attach_type, attach_drv))
+ return ERR_PTR(-EINVAL);
bpf_prog_inc(prog);
-out:
- fdput(f);
return prog;
}
--
2.39.2
^ permalink raw reply related [flat|nested] 134+ messages in thread* Re: [PATCH 16/39] convert __bpf_prog_get() to CLASS(fd, ...)
2024-07-30 5:16 ` [PATCH 16/39] convert __bpf_prog_get() " viro
@ 2024-08-06 21:08 ` Andrii Nakryiko
2024-08-07 10:28 ` Christian Brauner
1 sibling, 0 replies; 134+ messages in thread
From: Andrii Nakryiko @ 2024-08-06 21:08 UTC (permalink / raw)
To: viro; +Cc: linux-fsdevel, amir73il, bpf, brauner, cgroups, kvm, netdev,
torvalds
On Mon, Jul 29, 2024 at 10:19 PM <viro@kernel.org> wrote:
>
> From: Al Viro <viro@zeniv.linux.org.uk>
>
> Irregularity here is fdput() not in the same scope as fdget();
> just fold ____bpf_prog_get() into its (only) caller and that's
> it...
>
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
> kernel/bpf/syscall.c | 32 +++++++++++---------------------
> 1 file changed, 11 insertions(+), 21 deletions(-)
>
Folding makes total sense, the logic lgtm (though I find CLASS(fd,
f)(ufd) utterly non-intuitive naming-wise). Extra IS_ERR(prog) check
should be dropped though, see below.
Acked-by: Andrii Nakryiko <andrii@kernel.org>
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 3093bf2cc266..c5b252c0faa8 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -2407,18 +2407,6 @@ int bpf_prog_new_fd(struct bpf_prog *prog)
> O_RDWR | O_CLOEXEC);
> }
>
> -static struct bpf_prog *____bpf_prog_get(struct fd f)
> -{
> - if (!fd_file(f))
> - return ERR_PTR(-EBADF);
> - if (fd_file(f)->f_op != &bpf_prog_fops) {
> - fdput(f);
> - return ERR_PTR(-EINVAL);
> - }
> -
> - return fd_file(f)->private_data;
> -}
> -
> void bpf_prog_add(struct bpf_prog *prog, int i)
> {
> atomic64_add(i, &prog->aux->refcnt);
> @@ -2474,20 +2462,22 @@ bool bpf_prog_get_ok(struct bpf_prog *prog,
> static struct bpf_prog *__bpf_prog_get(u32 ufd, enum bpf_prog_type *attach_type,
> bool attach_drv)
> {
> - struct fd f = fdget(ufd);
> + CLASS(fd, f)(ufd);
> struct bpf_prog *prog;
>
> - prog = ____bpf_prog_get(f);
> - if (IS_ERR(prog))
> + if (fd_empty(f))
> + return ERR_PTR(-EBADF);
> + if (fd_file(f)->f_op != &bpf_prog_fops)
> + return ERR_PTR(-EINVAL);
> +
> + prog = fd_file(f)->private_data;
> + if (IS_ERR(prog)) // can that actually happen?
no, it can't, private_data will always be a valid pointer, otherwise
that file would never be successfully created
> return prog;
> - if (!bpf_prog_get_ok(prog, attach_type, attach_drv)) {
> - prog = ERR_PTR(-EINVAL);
> - goto out;
> - }
> +
> + if (!bpf_prog_get_ok(prog, attach_type, attach_drv))
> + return ERR_PTR(-EINVAL);
>
> bpf_prog_inc(prog);
> -out:
> - fdput(f);
> return prog;
> }
>
> --
> 2.39.2
>
>
^ permalink raw reply [flat|nested] 134+ messages in thread* Re: [PATCH 16/39] convert __bpf_prog_get() to CLASS(fd, ...)
2024-07-30 5:16 ` [PATCH 16/39] convert __bpf_prog_get() " viro
2024-08-06 21:08 ` Andrii Nakryiko
@ 2024-08-07 10:28 ` Christian Brauner
1 sibling, 0 replies; 134+ messages in thread
From: Christian Brauner @ 2024-08-07 10:28 UTC (permalink / raw)
To: viro; +Cc: linux-fsdevel, amir73il, bpf, cgroups, kvm, netdev, torvalds
On Tue, Jul 30, 2024 at 01:16:02AM GMT, viro@kernel.org wrote:
> From: Al Viro <viro@zeniv.linux.org.uk>
>
> Irregularity here is fdput() not in the same scope as fdget();
> just fold ____bpf_prog_get() into its (only) caller and that's
> it...
>
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
> kernel/bpf/syscall.c | 32 +++++++++++---------------------
> 1 file changed, 11 insertions(+), 21 deletions(-)
>
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 3093bf2cc266..c5b252c0faa8 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -2407,18 +2407,6 @@ int bpf_prog_new_fd(struct bpf_prog *prog)
> O_RDWR | O_CLOEXEC);
> }
>
> -static struct bpf_prog *____bpf_prog_get(struct fd f)
> -{
> - if (!fd_file(f))
> - return ERR_PTR(-EBADF);
> - if (fd_file(f)->f_op != &bpf_prog_fops) {
> - fdput(f);
> - return ERR_PTR(-EINVAL);
> - }
> -
> - return fd_file(f)->private_data;
> -}
> -
> void bpf_prog_add(struct bpf_prog *prog, int i)
> {
> atomic64_add(i, &prog->aux->refcnt);
> @@ -2474,20 +2462,22 @@ bool bpf_prog_get_ok(struct bpf_prog *prog,
> static struct bpf_prog *__bpf_prog_get(u32 ufd, enum bpf_prog_type *attach_type,
> bool attach_drv)
> {
> - struct fd f = fdget(ufd);
> + CLASS(fd, f)(ufd);
> struct bpf_prog *prog;
>
> - prog = ____bpf_prog_get(f);
> - if (IS_ERR(prog))
> + if (fd_empty(f))
> + return ERR_PTR(-EBADF);
> + if (fd_file(f)->f_op != &bpf_prog_fops)
> + return ERR_PTR(-EINVAL);
> +
> + prog = fd_file(f)->private_data;
> + if (IS_ERR(prog)) // can that actually happen?
> return prog;
With or without that removed:
Reviewed-by: Christian Brauner <brauner@kernel.org>
^ permalink raw reply [flat|nested] 134+ messages in thread
* [PATCH 17/39] bpf: resolve_pseudo_ldimm64(): take handling of a single ldimm64 insn into helper
2024-07-30 5:15 ` [PATCH 01/39] memcg_write_event_control(): fix a user-triggerable oops viro
` (14 preceding siblings ...)
2024-07-30 5:16 ` [PATCH 16/39] convert __bpf_prog_get() " viro
@ 2024-07-30 5:16 ` viro
2024-08-06 22:32 ` Andrii Nakryiko
2024-07-30 5:16 ` [PATCH 18/39] bpf maps: switch to CLASS(fd, ...) viro
` (22 subsequent siblings)
38 siblings, 1 reply; 134+ messages in thread
From: viro @ 2024-07-30 5:16 UTC (permalink / raw)
To: linux-fsdevel; +Cc: amir73il, bpf, brauner, cgroups, kvm, netdev, torvalds
From: Al Viro <viro@zeniv.linux.org.uk>
Equivalent transformation. For one thing, it's easier to follow that way.
For another, that simplifies the control flow in the vicinity of struct fd
handling in there, which will allow a switch to CLASS(fd) and make the
thing much easier to verify wrt leaks.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
kernel/bpf/verifier.c | 342 +++++++++++++++++++++---------------------
1 file changed, 172 insertions(+), 170 deletions(-)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 4cb5441ad75f..d54f61e12062 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -18414,11 +18414,180 @@ static bool bpf_map_is_cgroup_storage(struct bpf_map *map)
*
* NOTE: btf_vmlinux is required for converting pseudo btf_id.
*/
+static int do_one_ldimm64(struct bpf_verifier_env *env,
+ struct bpf_insn *insn,
+ struct bpf_insn_aux_data *aux)
+{
+ struct bpf_map *map;
+ struct fd f;
+ u64 addr;
+ u32 fd;
+ int err;
+
+ if (insn[0].src_reg == 0)
+ /* valid generic load 64-bit imm */
+ return 0;
+
+ if (insn[0].src_reg == BPF_PSEUDO_BTF_ID)
+ return check_pseudo_btf_id(env, insn, aux);
+
+ if (insn[0].src_reg == BPF_PSEUDO_FUNC) {
+ aux->ptr_type = PTR_TO_FUNC;
+ return 0;
+ }
+
+ /* In final convert_pseudo_ld_imm64() step, this is
+ * converted into regular 64-bit imm load insn.
+ */
+ switch (insn[0].src_reg) {
+ case BPF_PSEUDO_MAP_VALUE:
+ case BPF_PSEUDO_MAP_IDX_VALUE:
+ break;
+ case BPF_PSEUDO_MAP_FD:
+ case BPF_PSEUDO_MAP_IDX:
+ if (insn[1].imm == 0)
+ break;
+ fallthrough;
+ default:
+ verbose(env, "unrecognized bpf_ld_imm64 insn\n");
+ return -EINVAL;
+ }
+
+ switch (insn[0].src_reg) {
+ case BPF_PSEUDO_MAP_IDX_VALUE:
+ case BPF_PSEUDO_MAP_IDX:
+ if (bpfptr_is_null(env->fd_array)) {
+ verbose(env, "fd_idx without fd_array is invalid\n");
+ return -EPROTO;
+ }
+ if (copy_from_bpfptr_offset(&fd, env->fd_array,
+ insn[0].imm * sizeof(fd),
+ sizeof(fd)))
+ return -EFAULT;
+ break;
+ default:
+ fd = insn[0].imm;
+ break;
+ }
+
+ f = fdget(fd);
+ map = __bpf_map_get(f);
+ if (IS_ERR(map)) {
+ verbose(env, "fd %d is not pointing to valid bpf_map\n", fd);
+ return PTR_ERR(map);
+ }
+
+ err = check_map_prog_compatibility(env, map, env->prog);
+ if (err) {
+ fdput(f);
+ return err;
+ }
+
+ if (insn[0].src_reg == BPF_PSEUDO_MAP_FD ||
+ insn[0].src_reg == BPF_PSEUDO_MAP_IDX) {
+ addr = (unsigned long)map;
+ } else {
+ u32 off = insn[1].imm;
+
+ if (off >= BPF_MAX_VAR_OFF) {
+ verbose(env, "direct value offset of %u is not allowed\n", off);
+ fdput(f);
+ return -EINVAL;
+ }
+
+ if (!map->ops->map_direct_value_addr) {
+ verbose(env, "no direct value access support for this map type\n");
+ fdput(f);
+ return -EINVAL;
+ }
+
+ err = map->ops->map_direct_value_addr(map, &addr, off);
+ if (err) {
+ verbose(env, "invalid access to map value pointer, value_size=%u off=%u\n",
+ map->value_size, off);
+ fdput(f);
+ return err;
+ }
+
+ aux->map_off = off;
+ addr += off;
+ }
+
+ insn[0].imm = (u32)addr;
+ insn[1].imm = addr >> 32;
+
+ /* check whether we recorded this map already */
+ for (int i = 0; i < env->used_map_cnt; i++) {
+ if (env->used_maps[i] == map) {
+ aux->map_index = i;
+ fdput(f);
+ return 0;
+ }
+ }
+
+ if (env->used_map_cnt >= MAX_USED_MAPS) {
+ verbose(env, "The total number of maps per program has reached the limit of %u\n",
+ MAX_USED_MAPS);
+ fdput(f);
+ return -E2BIG;
+ }
+
+ if (env->prog->sleepable)
+ atomic64_inc(&map->sleepable_refcnt);
+ /* hold the map. If the program is rejected by verifier,
+ * the map will be released by release_maps() or it
+ * will be used by the valid program until it's unloaded
+ * and all maps are released in bpf_free_used_maps()
+ */
+ bpf_map_inc(map);
+
+ aux->map_index = env->used_map_cnt;
+ env->used_maps[env->used_map_cnt++] = map;
+
+ if (bpf_map_is_cgroup_storage(map) &&
+ bpf_cgroup_storage_assign(env->prog->aux, map)) {
+ verbose(env, "only one cgroup storage of each type is allowed\n");
+ fdput(f);
+ return -EBUSY;
+ }
+ if (map->map_type == BPF_MAP_TYPE_ARENA) {
+ if (env->prog->aux->arena) {
+ verbose(env, "Only one arena per program\n");
+ fdput(f);
+ return -EBUSY;
+ }
+ if (!env->allow_ptr_leaks || !env->bpf_capable) {
+ verbose(env, "CAP_BPF and CAP_PERFMON are required to use arena\n");
+ fdput(f);
+ return -EPERM;
+ }
+ if (!env->prog->jit_requested) {
+ verbose(env, "JIT is required to use arena\n");
+ fdput(f);
+ return -EOPNOTSUPP;
+ }
+ if (!bpf_jit_supports_arena()) {
+ verbose(env, "JIT doesn't support arena\n");
+ fdput(f);
+ return -EOPNOTSUPP;
+ }
+ env->prog->aux->arena = (void *)map;
+ if (!bpf_arena_get_user_vm_start(env->prog->aux->arena)) {
+ verbose(env, "arena's user address must be set via map_extra or mmap()\n");
+ fdput(f);
+ return -EINVAL;
+ }
+ }
+
+ fdput(f);
+ return 0;
+}
+
static int resolve_pseudo_ldimm64(struct bpf_verifier_env *env)
{
struct bpf_insn *insn = env->prog->insnsi;
int insn_cnt = env->prog->len;
- int i, j, err;
+ int i, err;
err = bpf_prog_calc_tag(env->prog);
if (err)
@@ -18433,12 +18602,6 @@ static int resolve_pseudo_ldimm64(struct bpf_verifier_env *env)
}
if (insn[0].code == (BPF_LD | BPF_IMM | BPF_DW)) {
- struct bpf_insn_aux_data *aux;
- struct bpf_map *map;
- struct fd f;
- u64 addr;
- u32 fd;
-
if (i == insn_cnt - 1 || insn[1].code != 0 ||
insn[1].dst_reg != 0 || insn[1].src_reg != 0 ||
insn[1].off != 0) {
@@ -18446,170 +18609,9 @@ static int resolve_pseudo_ldimm64(struct bpf_verifier_env *env)
return -EINVAL;
}
- if (insn[0].src_reg == 0)
- /* valid generic load 64-bit imm */
- goto next_insn;
-
- if (insn[0].src_reg == BPF_PSEUDO_BTF_ID) {
- aux = &env->insn_aux_data[i];
- err = check_pseudo_btf_id(env, insn, aux);
- if (err)
- return err;
- goto next_insn;
- }
-
- if (insn[0].src_reg == BPF_PSEUDO_FUNC) {
- aux = &env->insn_aux_data[i];
- aux->ptr_type = PTR_TO_FUNC;
- goto next_insn;
- }
-
- /* In final convert_pseudo_ld_imm64() step, this is
- * converted into regular 64-bit imm load insn.
- */
- switch (insn[0].src_reg) {
- case BPF_PSEUDO_MAP_VALUE:
- case BPF_PSEUDO_MAP_IDX_VALUE:
- break;
- case BPF_PSEUDO_MAP_FD:
- case BPF_PSEUDO_MAP_IDX:
- if (insn[1].imm == 0)
- break;
- fallthrough;
- default:
- verbose(env, "unrecognized bpf_ld_imm64 insn\n");
- return -EINVAL;
- }
-
- switch (insn[0].src_reg) {
- case BPF_PSEUDO_MAP_IDX_VALUE:
- case BPF_PSEUDO_MAP_IDX:
- if (bpfptr_is_null(env->fd_array)) {
- verbose(env, "fd_idx without fd_array is invalid\n");
- return -EPROTO;
- }
- if (copy_from_bpfptr_offset(&fd, env->fd_array,
- insn[0].imm * sizeof(fd),
- sizeof(fd)))
- return -EFAULT;
- break;
- default:
- fd = insn[0].imm;
- break;
- }
-
- f = fdget(fd);
- map = __bpf_map_get(f);
- if (IS_ERR(map)) {
- verbose(env, "fd %d is not pointing to valid bpf_map\n", fd);
- return PTR_ERR(map);
- }
-
- err = check_map_prog_compatibility(env, map, env->prog);
- if (err) {
- fdput(f);
+ err = do_one_ldimm64(env, insn, &env->insn_aux_data[i]);
+ if (err)
return err;
- }
-
- aux = &env->insn_aux_data[i];
- if (insn[0].src_reg == BPF_PSEUDO_MAP_FD ||
- insn[0].src_reg == BPF_PSEUDO_MAP_IDX) {
- addr = (unsigned long)map;
- } else {
- u32 off = insn[1].imm;
-
- if (off >= BPF_MAX_VAR_OFF) {
- verbose(env, "direct value offset of %u is not allowed\n", off);
- fdput(f);
- return -EINVAL;
- }
-
- if (!map->ops->map_direct_value_addr) {
- verbose(env, "no direct value access support for this map type\n");
- fdput(f);
- return -EINVAL;
- }
-
- err = map->ops->map_direct_value_addr(map, &addr, off);
- if (err) {
- verbose(env, "invalid access to map value pointer, value_size=%u off=%u\n",
- map->value_size, off);
- fdput(f);
- return err;
- }
-
- aux->map_off = off;
- addr += off;
- }
-
- insn[0].imm = (u32)addr;
- insn[1].imm = addr >> 32;
-
- /* check whether we recorded this map already */
- for (j = 0; j < env->used_map_cnt; j++) {
- if (env->used_maps[j] == map) {
- aux->map_index = j;
- fdput(f);
- goto next_insn;
- }
- }
-
- if (env->used_map_cnt >= MAX_USED_MAPS) {
- verbose(env, "The total number of maps per program has reached the limit of %u\n",
- MAX_USED_MAPS);
- fdput(f);
- return -E2BIG;
- }
-
- if (env->prog->sleepable)
- atomic64_inc(&map->sleepable_refcnt);
- /* hold the map. If the program is rejected by verifier,
- * the map will be released by release_maps() or it
- * will be used by the valid program until it's unloaded
- * and all maps are released in bpf_free_used_maps()
- */
- bpf_map_inc(map);
-
- aux->map_index = env->used_map_cnt;
- env->used_maps[env->used_map_cnt++] = map;
-
- if (bpf_map_is_cgroup_storage(map) &&
- bpf_cgroup_storage_assign(env->prog->aux, map)) {
- verbose(env, "only one cgroup storage of each type is allowed\n");
- fdput(f);
- return -EBUSY;
- }
- if (map->map_type == BPF_MAP_TYPE_ARENA) {
- if (env->prog->aux->arena) {
- verbose(env, "Only one arena per program\n");
- fdput(f);
- return -EBUSY;
- }
- if (!env->allow_ptr_leaks || !env->bpf_capable) {
- verbose(env, "CAP_BPF and CAP_PERFMON are required to use arena\n");
- fdput(f);
- return -EPERM;
- }
- if (!env->prog->jit_requested) {
- verbose(env, "JIT is required to use arena\n");
- fdput(f);
- return -EOPNOTSUPP;
- }
- if (!bpf_jit_supports_arena()) {
- verbose(env, "JIT doesn't support arena\n");
- fdput(f);
- return -EOPNOTSUPP;
- }
- env->prog->aux->arena = (void *)map;
- if (!bpf_arena_get_user_vm_start(env->prog->aux->arena)) {
- verbose(env, "arena's user address must be set via map_extra or mmap()\n");
- fdput(f);
- return -EINVAL;
- }
- }
-
- fdput(f);
-next_insn:
insn++;
i++;
continue;
--
2.39.2
^ permalink raw reply related [flat|nested] 134+ messages in thread* Re: [PATCH 17/39] bpf: resolve_pseudo_ldimm64(): take handling of a single ldimm64 insn into helper
2024-07-30 5:16 ` [PATCH 17/39] bpf: resolve_pseudo_ldimm64(): take handling of a single ldimm64 insn into helper viro
@ 2024-08-06 22:32 ` Andrii Nakryiko
2024-08-07 10:29 ` Christian Brauner
0 siblings, 1 reply; 134+ messages in thread
From: Andrii Nakryiko @ 2024-08-06 22:32 UTC (permalink / raw)
To: viro, bpf
Cc: linux-fsdevel, amir73il, brauner, cgroups, kvm, netdev, torvalds
On Mon, Jul 29, 2024 at 10:20 PM <viro@kernel.org> wrote:
>
> From: Al Viro <viro@zeniv.linux.org.uk>
>
> Equivalent transformation. For one thing, it's easier to follow that way.
> For another, that simplifies the control flow in the vicinity of struct fd
> handling in there, which will allow a switch to CLASS(fd) and make the
> thing much easier to verify wrt leaks.
>
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
> kernel/bpf/verifier.c | 342 +++++++++++++++++++++---------------------
> 1 file changed, 172 insertions(+), 170 deletions(-)
>
This looks unnecessarily intrusive. I think it's best to extract the
logic of fetching and adding bpf_map by fd into a helper and that way
contain fdget + fdput logic nicely. Something like below, which I can
send to bpf-next.
commit b5eec08241cc0263e560551de91eda73ccc5987d
Author: Andrii Nakryiko <andrii@kernel.org>
Date: Tue Aug 6 14:31:34 2024 -0700
bpf: factor out fetching bpf_map from FD and adding it to used_maps list
Factor out the logic to extract bpf_map instances from FD embedded in
bpf_insns, adding it to the list of used_maps (unless it's already
there, in which case we just reuse map's index). This simplifies the
logic in resolve_pseudo_ldimm64(), especially around `struct fd`
handling, as all that is now neatly contained in the helper and doesn't
leak into a dozen error handling paths.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index df3be12096cf..14e4ef687a59 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -18865,6 +18865,58 @@ static bool bpf_map_is_cgroup_storage(struct
bpf_map *map)
map->map_type == BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE);
}
+/* Add map behind fd to used maps list, if it's not already there, and return
+ * its index. Also set *reused to true if this map was already in the list of
+ * used maps.
+ * Returns <0 on error, or >= 0 index, on success.
+ */
+static int add_used_map_from_fd(struct bpf_verifier_env *env, int fd,
bool *reused)
+{
+ struct fd f = fdget(fd);
+ struct bpf_map *map;
+ int i;
+
+ map = __bpf_map_get(f);
+ if (IS_ERR(map)) {
+ verbose(env, "fd %d is not pointing to valid bpf_map\n", fd);
+ return PTR_ERR(map);
+ }
+
+ /* check whether we recorded this map already */
+ for (i = 0; i < env->used_map_cnt; i++) {
+ if (env->used_maps[i] == map) {
+ *reused = true;
+ fdput(f);
+ return i;
+ }
+ }
+
+ if (env->used_map_cnt >= MAX_USED_MAPS) {
+ verbose(env, "The total number of maps per program has
reached the limit of %u\n",
+ MAX_USED_MAPS);
+ fdput(f);
+ return -E2BIG;
+ }
+
+ if (env->prog->sleepable)
+ atomic64_inc(&map->sleepable_refcnt);
+
+ /* hold the map. If the program is rejected by verifier,
+ * the map will be released by release_maps() or it
+ * will be used by the valid program until it's unloaded
+ * and all maps are released in bpf_free_used_maps()
+ */
+ bpf_map_inc(map);
+
+ *reused = false;
+ env->used_maps[env->used_map_cnt++] = map;
+
+ fdput(f);
+
+ return env->used_map_cnt - 1;
+
+}
+
/* find and rewrite pseudo imm in ld_imm64 instructions:
*
* 1. if it accesses map FD, replace it with actual map pointer.
@@ -18876,7 +18928,7 @@ static int resolve_pseudo_ldimm64(struct
bpf_verifier_env *env)
{
struct bpf_insn *insn = env->prog->insnsi;
int insn_cnt = env->prog->len;
- int i, j, err;
+ int i, err;
err = bpf_prog_calc_tag(env->prog);
if (err)
@@ -18893,9 +18945,10 @@ static int resolve_pseudo_ldimm64(struct
bpf_verifier_env *env)
if (insn[0].code == (BPF_LD | BPF_IMM | BPF_DW)) {
struct bpf_insn_aux_data *aux;
struct bpf_map *map;
- struct fd f;
+ int map_idx;
u64 addr;
u32 fd;
+ bool reused;
if (i == insn_cnt - 1 || insn[1].code != 0 ||
insn[1].dst_reg != 0 || insn[1].src_reg != 0 ||
@@ -18956,20 +19009,18 @@ static int resolve_pseudo_ldimm64(struct
bpf_verifier_env *env)
break;
}
- f = fdget(fd);
- map = __bpf_map_get(f);
- if (IS_ERR(map)) {
- verbose(env, "fd %d is not pointing to valid bpf_map\n", fd);
- return PTR_ERR(map);
- }
+ map_idx = add_used_map_from_fd(env, fd, &reused);
+ if (map_idx < 0)
+ return map_idx;
+ map = env->used_maps[map_idx];
+
+ aux = &env->insn_aux_data[i];
+ aux->map_index = map_idx;
err = check_map_prog_compatibility(env, map, env->prog);
- if (err) {
- fdput(f);
+ if (err)
return err;
- }
- aux = &env->insn_aux_data[i];
if (insn[0].src_reg == BPF_PSEUDO_MAP_FD ||
insn[0].src_reg == BPF_PSEUDO_MAP_IDX) {
addr = (unsigned long)map;
@@ -18978,13 +19029,11 @@ static int resolve_pseudo_ldimm64(struct
bpf_verifier_env *env)
if (off >= BPF_MAX_VAR_OFF) {
verbose(env, "direct value offset of %u is not
allowed\n", off);
- fdput(f);
return -EINVAL;
}
if (!map->ops->map_direct_value_addr) {
verbose(env, "no direct value access support for
this map type\n");
- fdput(f);
return -EINVAL;
}
@@ -18992,7 +19041,6 @@ static int resolve_pseudo_ldimm64(struct
bpf_verifier_env *env)
if (err) {
verbose(env, "invalid access to map value
pointer, value_size=%u off=%u\n",
map->value_size, off);
- fdput(f);
return err;
}
@@ -19003,70 +19051,39 @@ static int resolve_pseudo_ldimm64(struct
bpf_verifier_env *env)
insn[0].imm = (u32)addr;
insn[1].imm = addr >> 32;
- /* check whether we recorded this map already */
- for (j = 0; j < env->used_map_cnt; j++) {
- if (env->used_maps[j] == map) {
- aux->map_index = j;
- fdput(f);
- goto next_insn;
- }
- }
-
- if (env->used_map_cnt >= MAX_USED_MAPS) {
- verbose(env, "The total number of maps per program
has reached the limit of %u\n",
- MAX_USED_MAPS);
- fdput(f);
- return -E2BIG;
- }
-
- if (env->prog->sleepable)
- atomic64_inc(&map->sleepable_refcnt);
- /* hold the map. If the program is rejected by verifier,
- * the map will be released by release_maps() or it
- * will be used by the valid program until it's unloaded
- * and all maps are released in bpf_free_used_maps()
- */
- bpf_map_inc(map);
-
- aux->map_index = env->used_map_cnt;
- env->used_maps[env->used_map_cnt++] = map;
+ /* proceed with extra checks only if its newly added used map */
+ if (reused)
+ goto next_insn;
if (bpf_map_is_cgroup_storage(map) &&
bpf_cgroup_storage_assign(env->prog->aux, map)) {
verbose(env, "only one cgroup storage of each type is
allowed\n");
- fdput(f);
return -EBUSY;
}
if (map->map_type == BPF_MAP_TYPE_ARENA) {
if (env->prog->aux->arena) {
verbose(env, "Only one arena per program\n");
- fdput(f);
return -EBUSY;
}
if (!env->allow_ptr_leaks || !env->bpf_capable) {
verbose(env, "CAP_BPF and CAP_PERFMON are
required to use arena\n");
- fdput(f);
return -EPERM;
}
if (!env->prog->jit_requested) {
verbose(env, "JIT is required to use arena\n");
- fdput(f);
return -EOPNOTSUPP;
}
if (!bpf_jit_supports_arena()) {
verbose(env, "JIT doesn't support arena\n");
- fdput(f);
return -EOPNOTSUPP;
}
env->prog->aux->arena = (void *)map;
if (!bpf_arena_get_user_vm_start(env->prog->aux->arena)) {
verbose(env, "arena's user address must be set
via map_extra or mmap()\n");
- fdput(f);
return -EINVAL;
}
}
- fdput(f);
next_insn:
insn++;
i++;
[...]
^ permalink raw reply related [flat|nested] 134+ messages in thread* Re: [PATCH 17/39] bpf: resolve_pseudo_ldimm64(): take handling of a single ldimm64 insn into helper
2024-08-06 22:32 ` Andrii Nakryiko
@ 2024-08-07 10:29 ` Christian Brauner
2024-08-07 15:30 ` Andrii Nakryiko
0 siblings, 1 reply; 134+ messages in thread
From: Christian Brauner @ 2024-08-07 10:29 UTC (permalink / raw)
To: Andrii Nakryiko
Cc: viro, bpf, linux-fsdevel, amir73il, cgroups, kvm, netdev,
torvalds
On Tue, Aug 06, 2024 at 03:32:20PM GMT, Andrii Nakryiko wrote:
> On Mon, Jul 29, 2024 at 10:20 PM <viro@kernel.org> wrote:
> >
> > From: Al Viro <viro@zeniv.linux.org.uk>
> >
> > Equivalent transformation. For one thing, it's easier to follow that way.
> > For another, that simplifies the control flow in the vicinity of struct fd
> > handling in there, which will allow a switch to CLASS(fd) and make the
> > thing much easier to verify wrt leaks.
> >
> > Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> > ---
> > kernel/bpf/verifier.c | 342 +++++++++++++++++++++---------------------
> > 1 file changed, 172 insertions(+), 170 deletions(-)
> >
>
> This looks unnecessarily intrusive. I think it's best to extract the
> logic of fetching and adding bpf_map by fd into a helper and that way
> contain fdget + fdput logic nicely. Something like below, which I can
> send to bpf-next.
>
> commit b5eec08241cc0263e560551de91eda73ccc5987d
> Author: Andrii Nakryiko <andrii@kernel.org>
> Date: Tue Aug 6 14:31:34 2024 -0700
>
> bpf: factor out fetching bpf_map from FD and adding it to used_maps list
>
> Factor out the logic to extract bpf_map instances from FD embedded in
> bpf_insns, adding it to the list of used_maps (unless it's already
> there, in which case we just reuse map's index). This simplifies the
> logic in resolve_pseudo_ldimm64(), especially around `struct fd`
> handling, as all that is now neatly contained in the helper and doesn't
> leak into a dozen error handling paths.
>
> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
>
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index df3be12096cf..14e4ef687a59 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -18865,6 +18865,58 @@ static bool bpf_map_is_cgroup_storage(struct
> bpf_map *map)
> map->map_type == BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE);
> }
>
> +/* Add map behind fd to used maps list, if it's not already there, and return
> + * its index. Also set *reused to true if this map was already in the list of
> + * used maps.
> + * Returns <0 on error, or >= 0 index, on success.
> + */
> +static int add_used_map_from_fd(struct bpf_verifier_env *env, int fd,
> bool *reused)
> +{
> + struct fd f = fdget(fd);
Use CLASS(fd, f)(fd) and you can avoid all that fdput() stuff.
^ permalink raw reply [flat|nested] 134+ messages in thread* Re: [PATCH 17/39] bpf: resolve_pseudo_ldimm64(): take handling of a single ldimm64 insn into helper
2024-08-07 10:29 ` Christian Brauner
@ 2024-08-07 15:30 ` Andrii Nakryiko
2024-08-08 16:51 ` Alexei Starovoitov
0 siblings, 1 reply; 134+ messages in thread
From: Andrii Nakryiko @ 2024-08-07 15:30 UTC (permalink / raw)
To: Christian Brauner
Cc: viro, bpf, linux-fsdevel, amir73il, cgroups, kvm, netdev,
torvalds
On Wed, Aug 7, 2024 at 3:30 AM Christian Brauner <brauner@kernel.org> wrote:
>
> On Tue, Aug 06, 2024 at 03:32:20PM GMT, Andrii Nakryiko wrote:
> > On Mon, Jul 29, 2024 at 10:20 PM <viro@kernel.org> wrote:
> > >
> > > From: Al Viro <viro@zeniv.linux.org.uk>
> > >
> > > Equivalent transformation. For one thing, it's easier to follow that way.
> > > For another, that simplifies the control flow in the vicinity of struct fd
> > > handling in there, which will allow a switch to CLASS(fd) and make the
> > > thing much easier to verify wrt leaks.
> > >
> > > Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> > > ---
> > > kernel/bpf/verifier.c | 342 +++++++++++++++++++++---------------------
> > > 1 file changed, 172 insertions(+), 170 deletions(-)
> > >
> >
> > This looks unnecessarily intrusive. I think it's best to extract the
> > logic of fetching and adding bpf_map by fd into a helper and that way
> > contain fdget + fdput logic nicely. Something like below, which I can
> > send to bpf-next.
> >
> > commit b5eec08241cc0263e560551de91eda73ccc5987d
> > Author: Andrii Nakryiko <andrii@kernel.org>
> > Date: Tue Aug 6 14:31:34 2024 -0700
> >
> > bpf: factor out fetching bpf_map from FD and adding it to used_maps list
> >
> > Factor out the logic to extract bpf_map instances from FD embedded in
> > bpf_insns, adding it to the list of used_maps (unless it's already
> > there, in which case we just reuse map's index). This simplifies the
> > logic in resolve_pseudo_ldimm64(), especially around `struct fd`
> > handling, as all that is now neatly contained in the helper and doesn't
> > leak into a dozen error handling paths.
> >
> > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> >
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > index df3be12096cf..14e4ef687a59 100644
> > --- a/kernel/bpf/verifier.c
> > +++ b/kernel/bpf/verifier.c
> > @@ -18865,6 +18865,58 @@ static bool bpf_map_is_cgroup_storage(struct
> > bpf_map *map)
> > map->map_type == BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE);
> > }
> >
> > +/* Add map behind fd to used maps list, if it's not already there, and return
> > + * its index. Also set *reused to true if this map was already in the list of
> > + * used maps.
> > + * Returns <0 on error, or >= 0 index, on success.
> > + */
> > +static int add_used_map_from_fd(struct bpf_verifier_env *env, int fd,
> > bool *reused)
> > +{
> > + struct fd f = fdget(fd);
>
> Use CLASS(fd, f)(fd) and you can avoid all that fdput() stuff.
That was the point of Al's next patch in the series, so I didn't want
to do it in this one that just refactored the logic of adding maps.
But I can fold that in and send it to bpf-next.
^ permalink raw reply [flat|nested] 134+ messages in thread* Re: [PATCH 17/39] bpf: resolve_pseudo_ldimm64(): take handling of a single ldimm64 insn into helper
2024-08-07 15:30 ` Andrii Nakryiko
@ 2024-08-08 16:51 ` Alexei Starovoitov
2024-08-08 20:35 ` Andrii Nakryiko
2024-08-10 3:29 ` Al Viro
0 siblings, 2 replies; 134+ messages in thread
From: Alexei Starovoitov @ 2024-08-08 16:51 UTC (permalink / raw)
To: Andrii Nakryiko
Cc: Christian Brauner, viro, bpf, Linux-Fsdevel, Amir Goldstein,
open list:CONTROL GROUP (CGROUP), kvm, Network Development,
Linus Torvalds
On Wed, Aug 7, 2024 at 8:31 AM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Wed, Aug 7, 2024 at 3:30 AM Christian Brauner <brauner@kernel.org> wrote:
> >
> > On Tue, Aug 06, 2024 at 03:32:20PM GMT, Andrii Nakryiko wrote:
> > > On Mon, Jul 29, 2024 at 10:20 PM <viro@kernel.org> wrote:
> > > >
> > > > From: Al Viro <viro@zeniv.linux.org.uk>
> > > >
> > > > Equivalent transformation. For one thing, it's easier to follow that way.
> > > > For another, that simplifies the control flow in the vicinity of struct fd
> > > > handling in there, which will allow a switch to CLASS(fd) and make the
> > > > thing much easier to verify wrt leaks.
> > > >
> > > > Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> > > > ---
> > > > kernel/bpf/verifier.c | 342 +++++++++++++++++++++---------------------
> > > > 1 file changed, 172 insertions(+), 170 deletions(-)
> > > >
> > >
> > > This looks unnecessarily intrusive. I think it's best to extract the
> > > logic of fetching and adding bpf_map by fd into a helper and that way
> > > contain fdget + fdput logic nicely. Something like below, which I can
> > > send to bpf-next.
> > >
> > > commit b5eec08241cc0263e560551de91eda73ccc5987d
> > > Author: Andrii Nakryiko <andrii@kernel.org>
> > > Date: Tue Aug 6 14:31:34 2024 -0700
> > >
> > > bpf: factor out fetching bpf_map from FD and adding it to used_maps list
> > >
> > > Factor out the logic to extract bpf_map instances from FD embedded in
> > > bpf_insns, adding it to the list of used_maps (unless it's already
> > > there, in which case we just reuse map's index). This simplifies the
> > > logic in resolve_pseudo_ldimm64(), especially around `struct fd`
> > > handling, as all that is now neatly contained in the helper and doesn't
> > > leak into a dozen error handling paths.
> > >
> > > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> > >
> > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > > index df3be12096cf..14e4ef687a59 100644
> > > --- a/kernel/bpf/verifier.c
> > > +++ b/kernel/bpf/verifier.c
> > > @@ -18865,6 +18865,58 @@ static bool bpf_map_is_cgroup_storage(struct
> > > bpf_map *map)
> > > map->map_type == BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE);
> > > }
> > >
> > > +/* Add map behind fd to used maps list, if it's not already there, and return
> > > + * its index. Also set *reused to true if this map was already in the list of
> > > + * used maps.
> > > + * Returns <0 on error, or >= 0 index, on success.
> > > + */
> > > +static int add_used_map_from_fd(struct bpf_verifier_env *env, int fd,
> > > bool *reused)
> > > +{
> > > + struct fd f = fdget(fd);
> >
> > Use CLASS(fd, f)(fd) and you can avoid all that fdput() stuff.
>
> That was the point of Al's next patch in the series, so I didn't want
> to do it in this one that just refactored the logic of adding maps.
> But I can fold that in and send it to bpf-next.
+1.
The bpf changes look ok and Andrii's approach is easier to grasp.
It's better to route bpf conversion to CLASS(fd,..) via bpf-next,
so it goes through bpf CI and our other testing.
bpf patches don't seem to depend on newly added CLASS(fd_pos, ...
and fderr, so pretty much independent from other patches.
^ permalink raw reply [flat|nested] 134+ messages in thread* Re: [PATCH 17/39] bpf: resolve_pseudo_ldimm64(): take handling of a single ldimm64 insn into helper
2024-08-08 16:51 ` Alexei Starovoitov
@ 2024-08-08 20:35 ` Andrii Nakryiko
2024-08-09 1:23 ` Alexei Starovoitov
2024-08-10 3:29 ` Al Viro
1 sibling, 1 reply; 134+ messages in thread
From: Andrii Nakryiko @ 2024-08-08 20:35 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Christian Brauner, viro, bpf, Linux-Fsdevel, Amir Goldstein,
open list:CONTROL GROUP (CGROUP), kvm, Network Development,
Linus Torvalds
On Thu, Aug 8, 2024 at 9:51 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Wed, Aug 7, 2024 at 8:31 AM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > On Wed, Aug 7, 2024 at 3:30 AM Christian Brauner <brauner@kernel.org> wrote:
> > >
> > > On Tue, Aug 06, 2024 at 03:32:20PM GMT, Andrii Nakryiko wrote:
> > > > On Mon, Jul 29, 2024 at 10:20 PM <viro@kernel.org> wrote:
> > > > >
> > > > > From: Al Viro <viro@zeniv.linux.org.uk>
> > > > >
> > > > > Equivalent transformation. For one thing, it's easier to follow that way.
> > > > > For another, that simplifies the control flow in the vicinity of struct fd
> > > > > handling in there, which will allow a switch to CLASS(fd) and make the
> > > > > thing much easier to verify wrt leaks.
> > > > >
> > > > > Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> > > > > ---
> > > > > kernel/bpf/verifier.c | 342 +++++++++++++++++++++---------------------
> > > > > 1 file changed, 172 insertions(+), 170 deletions(-)
> > > > >
> > > >
> > > > This looks unnecessarily intrusive. I think it's best to extract the
> > > > logic of fetching and adding bpf_map by fd into a helper and that way
> > > > contain fdget + fdput logic nicely. Something like below, which I can
> > > > send to bpf-next.
> > > >
> > > > commit b5eec08241cc0263e560551de91eda73ccc5987d
> > > > Author: Andrii Nakryiko <andrii@kernel.org>
> > > > Date: Tue Aug 6 14:31:34 2024 -0700
> > > >
> > > > bpf: factor out fetching bpf_map from FD and adding it to used_maps list
> > > >
> > > > Factor out the logic to extract bpf_map instances from FD embedded in
> > > > bpf_insns, adding it to the list of used_maps (unless it's already
> > > > there, in which case we just reuse map's index). This simplifies the
> > > > logic in resolve_pseudo_ldimm64(), especially around `struct fd`
> > > > handling, as all that is now neatly contained in the helper and doesn't
> > > > leak into a dozen error handling paths.
> > > >
> > > > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> > > >
> > > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > > > index df3be12096cf..14e4ef687a59 100644
> > > > --- a/kernel/bpf/verifier.c
> > > > +++ b/kernel/bpf/verifier.c
> > > > @@ -18865,6 +18865,58 @@ static bool bpf_map_is_cgroup_storage(struct
> > > > bpf_map *map)
> > > > map->map_type == BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE);
> > > > }
> > > >
> > > > +/* Add map behind fd to used maps list, if it's not already there, and return
> > > > + * its index. Also set *reused to true if this map was already in the list of
> > > > + * used maps.
> > > > + * Returns <0 on error, or >= 0 index, on success.
> > > > + */
> > > > +static int add_used_map_from_fd(struct bpf_verifier_env *env, int fd,
> > > > bool *reused)
> > > > +{
> > > > + struct fd f = fdget(fd);
> > >
> > > Use CLASS(fd, f)(fd) and you can avoid all that fdput() stuff.
> >
> > That was the point of Al's next patch in the series, so I didn't want
> > to do it in this one that just refactored the logic of adding maps.
> > But I can fold that in and send it to bpf-next.
>
> +1.
>
> The bpf changes look ok and Andrii's approach is easier to grasp.
> It's better to route bpf conversion to CLASS(fd,..) via bpf-next,
> so it goes through bpf CI and our other testing.
>
> bpf patches don't seem to depend on newly added CLASS(fd_pos, ...
> and fderr, so pretty much independent from other patches.
Ok, so CLASS(fd, f) won't work just yet because of peculiar
__bpf_map_get() contract: if it gets valid struct fd but it doesn't
contain a valid struct bpf_map, then __bpf_map_get() does fdput()
internally. In all other cases the caller has to do fdput() and
returned struct bpf_map's refcount has to be bumped by the caller
(__bpf_map_get() doesn't do that, I guess that's why it's
double-underscored).
I think the reason it was done was just a convenience to not have to
get/put bpf_map for temporary uses (and instead rely on file's
reference keeping bpf_map alive), plus we have bpf_map_inc() and
bpf_map_inc_uref() variants, so in some cases we need to bump just
refcount, and in some both user and normal refcounts.
So can't use CLASS(fd, ...) without some more clean up.
Alexei, how about changing __bpf_map_get(struct fd f) to
__bpf_map_get_from_fd(int ufd), doing fdget/fdput internally, and
always returning bpf_map with (normal) refcount bumped (if successful,
of course). We can then split bpf_map_inc_with_uref() into just
bpf_map_inc() and bpf_map_inc_uref(), and callers will be able to do
extra uref-only increment, if necessary.
I can do that as a pre-patch, there are about 15 callers, so not too
much work to clean this up. Let me know.
^ permalink raw reply [flat|nested] 134+ messages in thread* Re: [PATCH 17/39] bpf: resolve_pseudo_ldimm64(): take handling of a single ldimm64 insn into helper
2024-08-08 20:35 ` Andrii Nakryiko
@ 2024-08-09 1:23 ` Alexei Starovoitov
2024-08-09 17:23 ` Andrii Nakryiko
0 siblings, 1 reply; 134+ messages in thread
From: Alexei Starovoitov @ 2024-08-09 1:23 UTC (permalink / raw)
To: Andrii Nakryiko
Cc: Christian Brauner, viro, bpf, Linux-Fsdevel, Amir Goldstein,
open list:CONTROL GROUP (CGROUP), kvm, Network Development,
Linus Torvalds
On Thu, Aug 8, 2024 at 1:35 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Thu, Aug 8, 2024 at 9:51 AM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Wed, Aug 7, 2024 at 8:31 AM Andrii Nakryiko
> > <andrii.nakryiko@gmail.com> wrote:
> > >
> > > On Wed, Aug 7, 2024 at 3:30 AM Christian Brauner <brauner@kernel.org> wrote:
> > > >
> > > > On Tue, Aug 06, 2024 at 03:32:20PM GMT, Andrii Nakryiko wrote:
> > > > > On Mon, Jul 29, 2024 at 10:20 PM <viro@kernel.org> wrote:
> > > > > >
> > > > > > From: Al Viro <viro@zeniv.linux.org.uk>
> > > > > >
> > > > > > Equivalent transformation. For one thing, it's easier to follow that way.
> > > > > > For another, that simplifies the control flow in the vicinity of struct fd
> > > > > > handling in there, which will allow a switch to CLASS(fd) and make the
> > > > > > thing much easier to verify wrt leaks.
> > > > > >
> > > > > > Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> > > > > > ---
> > > > > > kernel/bpf/verifier.c | 342 +++++++++++++++++++++---------------------
> > > > > > 1 file changed, 172 insertions(+), 170 deletions(-)
> > > > > >
> > > > >
> > > > > This looks unnecessarily intrusive. I think it's best to extract the
> > > > > logic of fetching and adding bpf_map by fd into a helper and that way
> > > > > contain fdget + fdput logic nicely. Something like below, which I can
> > > > > send to bpf-next.
> > > > >
> > > > > commit b5eec08241cc0263e560551de91eda73ccc5987d
> > > > > Author: Andrii Nakryiko <andrii@kernel.org>
> > > > > Date: Tue Aug 6 14:31:34 2024 -0700
> > > > >
> > > > > bpf: factor out fetching bpf_map from FD and adding it to used_maps list
> > > > >
> > > > > Factor out the logic to extract bpf_map instances from FD embedded in
> > > > > bpf_insns, adding it to the list of used_maps (unless it's already
> > > > > there, in which case we just reuse map's index). This simplifies the
> > > > > logic in resolve_pseudo_ldimm64(), especially around `struct fd`
> > > > > handling, as all that is now neatly contained in the helper and doesn't
> > > > > leak into a dozen error handling paths.
> > > > >
> > > > > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> > > > >
> > > > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > > > > index df3be12096cf..14e4ef687a59 100644
> > > > > --- a/kernel/bpf/verifier.c
> > > > > +++ b/kernel/bpf/verifier.c
> > > > > @@ -18865,6 +18865,58 @@ static bool bpf_map_is_cgroup_storage(struct
> > > > > bpf_map *map)
> > > > > map->map_type == BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE);
> > > > > }
> > > > >
> > > > > +/* Add map behind fd to used maps list, if it's not already there, and return
> > > > > + * its index. Also set *reused to true if this map was already in the list of
> > > > > + * used maps.
> > > > > + * Returns <0 on error, or >= 0 index, on success.
> > > > > + */
> > > > > +static int add_used_map_from_fd(struct bpf_verifier_env *env, int fd,
> > > > > bool *reused)
> > > > > +{
> > > > > + struct fd f = fdget(fd);
> > > >
> > > > Use CLASS(fd, f)(fd) and you can avoid all that fdput() stuff.
> > >
> > > That was the point of Al's next patch in the series, so I didn't want
> > > to do it in this one that just refactored the logic of adding maps.
> > > But I can fold that in and send it to bpf-next.
> >
> > +1.
> >
> > The bpf changes look ok and Andrii's approach is easier to grasp.
> > It's better to route bpf conversion to CLASS(fd,..) via bpf-next,
> > so it goes through bpf CI and our other testing.
> >
> > bpf patches don't seem to depend on newly added CLASS(fd_pos, ...
> > and fderr, so pretty much independent from other patches.
>
> Ok, so CLASS(fd, f) won't work just yet because of peculiar
> __bpf_map_get() contract: if it gets valid struct fd but it doesn't
> contain a valid struct bpf_map, then __bpf_map_get() does fdput()
> internally. In all other cases the caller has to do fdput() and
> returned struct bpf_map's refcount has to be bumped by the caller
> (__bpf_map_get() doesn't do that, I guess that's why it's
> double-underscored).
>
> I think the reason it was done was just a convenience to not have to
> get/put bpf_map for temporary uses (and instead rely on file's
> reference keeping bpf_map alive), plus we have bpf_map_inc() and
> bpf_map_inc_uref() variants, so in some cases we need to bump just
> refcount, and in some both user and normal refcounts.
>
> So can't use CLASS(fd, ...) without some more clean up.
>
> Alexei, how about changing __bpf_map_get(struct fd f) to
> __bpf_map_get_from_fd(int ufd), doing fdget/fdput internally, and
> always returning bpf_map with (normal) refcount bumped (if successful,
> of course). We can then split bpf_map_inc_with_uref() into just
> bpf_map_inc() and bpf_map_inc_uref(), and callers will be able to do
> extra uref-only increment, if necessary.
>
> I can do that as a pre-patch, there are about 15 callers, so not too
> much work to clean this up. Let me know.
Yeah. Let's kill __bpf_map_get(struct fd ..) altogether.
This logic was added in 2014.
fdget() had to be first and fdput() last to make sure
the map won't disappear while sys_bpf command is running.
All of the places can use bpf_map_get(), bpf_map_put() pair
and rely on map->refcnt, but...
- it's atomic64_inc(&map->refcnt); The cost is probably
in the noise compared to all the work that map sys_bpf commands do.
- It also opens new fuzzing opportunity to do some map operation
in one thread and close(map_fd) in the other, so map->usercnt can
drop to zero and map_release_uref() cleanup can start while
the other thread is still busy doing something like map_update_elem().
It can be mitigated by doing bpf_map_get_with_uref(), but two
atomic64_inc() is kinda too much.
So let's remove __bpf_map_get() and replace all users with bpf_map_get(),
but we may need to revisit that later.
^ permalink raw reply [flat|nested] 134+ messages in thread* Re: [PATCH 17/39] bpf: resolve_pseudo_ldimm64(): take handling of a single ldimm64 insn into helper
2024-08-09 1:23 ` Alexei Starovoitov
@ 2024-08-09 17:23 ` Andrii Nakryiko
0 siblings, 0 replies; 134+ messages in thread
From: Andrii Nakryiko @ 2024-08-09 17:23 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Christian Brauner, viro, bpf, Linux-Fsdevel, Amir Goldstein,
open list:CONTROL GROUP (CGROUP), kvm, Network Development,
Linus Torvalds
On Thu, Aug 8, 2024 at 6:23 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Thu, Aug 8, 2024 at 1:35 PM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > On Thu, Aug 8, 2024 at 9:51 AM Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> > >
> > > On Wed, Aug 7, 2024 at 8:31 AM Andrii Nakryiko
> > > <andrii.nakryiko@gmail.com> wrote:
> > > >
> > > > On Wed, Aug 7, 2024 at 3:30 AM Christian Brauner <brauner@kernel.org> wrote:
> > > > >
> > > > > On Tue, Aug 06, 2024 at 03:32:20PM GMT, Andrii Nakryiko wrote:
> > > > > > On Mon, Jul 29, 2024 at 10:20 PM <viro@kernel.org> wrote:
> > > > > > >
> > > > > > > From: Al Viro <viro@zeniv.linux.org.uk>
> > > > > > >
> > > > > > > Equivalent transformation. For one thing, it's easier to follow that way.
> > > > > > > For another, that simplifies the control flow in the vicinity of struct fd
> > > > > > > handling in there, which will allow a switch to CLASS(fd) and make the
> > > > > > > thing much easier to verify wrt leaks.
> > > > > > >
> > > > > > > Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> > > > > > > ---
> > > > > > > kernel/bpf/verifier.c | 342 +++++++++++++++++++++---------------------
> > > > > > > 1 file changed, 172 insertions(+), 170 deletions(-)
> > > > > > >
> > > > > >
> > > > > > This looks unnecessarily intrusive. I think it's best to extract the
> > > > > > logic of fetching and adding bpf_map by fd into a helper and that way
> > > > > > contain fdget + fdput logic nicely. Something like below, which I can
> > > > > > send to bpf-next.
> > > > > >
> > > > > > commit b5eec08241cc0263e560551de91eda73ccc5987d
> > > > > > Author: Andrii Nakryiko <andrii@kernel.org>
> > > > > > Date: Tue Aug 6 14:31:34 2024 -0700
> > > > > >
> > > > > > bpf: factor out fetching bpf_map from FD and adding it to used_maps list
> > > > > >
> > > > > > Factor out the logic to extract bpf_map instances from FD embedded in
> > > > > > bpf_insns, adding it to the list of used_maps (unless it's already
> > > > > > there, in which case we just reuse map's index). This simplifies the
> > > > > > logic in resolve_pseudo_ldimm64(), especially around `struct fd`
> > > > > > handling, as all that is now neatly contained in the helper and doesn't
> > > > > > leak into a dozen error handling paths.
> > > > > >
> > > > > > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> > > > > >
> > > > > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > > > > > index df3be12096cf..14e4ef687a59 100644
> > > > > > --- a/kernel/bpf/verifier.c
> > > > > > +++ b/kernel/bpf/verifier.c
> > > > > > @@ -18865,6 +18865,58 @@ static bool bpf_map_is_cgroup_storage(struct
> > > > > > bpf_map *map)
> > > > > > map->map_type == BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE);
> > > > > > }
> > > > > >
> > > > > > +/* Add map behind fd to used maps list, if it's not already there, and return
> > > > > > + * its index. Also set *reused to true if this map was already in the list of
> > > > > > + * used maps.
> > > > > > + * Returns <0 on error, or >= 0 index, on success.
> > > > > > + */
> > > > > > +static int add_used_map_from_fd(struct bpf_verifier_env *env, int fd,
> > > > > > bool *reused)
> > > > > > +{
> > > > > > + struct fd f = fdget(fd);
> > > > >
> > > > > Use CLASS(fd, f)(fd) and you can avoid all that fdput() stuff.
> > > >
> > > > That was the point of Al's next patch in the series, so I didn't want
> > > > to do it in this one that just refactored the logic of adding maps.
> > > > But I can fold that in and send it to bpf-next.
> > >
> > > +1.
> > >
> > > The bpf changes look ok and Andrii's approach is easier to grasp.
> > > It's better to route bpf conversion to CLASS(fd,..) via bpf-next,
> > > so it goes through bpf CI and our other testing.
> > >
> > > bpf patches don't seem to depend on newly added CLASS(fd_pos, ...
> > > and fderr, so pretty much independent from other patches.
> >
> > Ok, so CLASS(fd, f) won't work just yet because of peculiar
> > __bpf_map_get() contract: if it gets valid struct fd but it doesn't
> > contain a valid struct bpf_map, then __bpf_map_get() does fdput()
> > internally. In all other cases the caller has to do fdput() and
> > returned struct bpf_map's refcount has to be bumped by the caller
> > (__bpf_map_get() doesn't do that, I guess that's why it's
> > double-underscored).
> >
> > I think the reason it was done was just a convenience to not have to
> > get/put bpf_map for temporary uses (and instead rely on file's
> > reference keeping bpf_map alive), plus we have bpf_map_inc() and
> > bpf_map_inc_uref() variants, so in some cases we need to bump just
> > refcount, and in some both user and normal refcounts.
> >
> > So can't use CLASS(fd, ...) without some more clean up.
> >
> > Alexei, how about changing __bpf_map_get(struct fd f) to
> > __bpf_map_get_from_fd(int ufd), doing fdget/fdput internally, and
> > always returning bpf_map with (normal) refcount bumped (if successful,
> > of course). We can then split bpf_map_inc_with_uref() into just
> > bpf_map_inc() and bpf_map_inc_uref(), and callers will be able to do
> > extra uref-only increment, if necessary.
> >
> > I can do that as a pre-patch, there are about 15 callers, so not too
> > much work to clean this up. Let me know.
>
> Yeah. Let's kill __bpf_map_get(struct fd ..) altogether.
> This logic was added in 2014.
> fdget() had to be first and fdput() last to make sure
> the map won't disappear while sys_bpf command is running.
> All of the places can use bpf_map_get(), bpf_map_put() pair
> and rely on map->refcnt, but...
>
> - it's atomic64_inc(&map->refcnt); The cost is probably
> in the noise compared to all the work that map sys_bpf commands do.
>
agreed, not too worried about this
> - It also opens new fuzzing opportunity to do some map operation
> in one thread and close(map_fd) in the other, so map->usercnt can
> drop to zero and map_release_uref() cleanup can start while
> the other thread is still busy doing something like map_update_elem().
> It can be mitigated by doing bpf_map_get_with_uref(), but two
> atomic64_inc() is kinda too much.
>
yep, with_uref() is an overkill for most cases. I'd rather fix any
such bugs, if we have them.
> So let's remove __bpf_map_get() and replace all users with bpf_map_get(),
> but we may need to revisit that later.
Ok, I will probably send something next week.
^ permalink raw reply [flat|nested] 134+ messages in thread
* Re: [PATCH 17/39] bpf: resolve_pseudo_ldimm64(): take handling of a single ldimm64 insn into helper
2024-08-08 16:51 ` Alexei Starovoitov
2024-08-08 20:35 ` Andrii Nakryiko
@ 2024-08-10 3:29 ` Al Viro
2024-08-12 20:05 ` Andrii Nakryiko
1 sibling, 1 reply; 134+ messages in thread
From: Al Viro @ 2024-08-10 3:29 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Andrii Nakryiko, Christian Brauner, viro, bpf, Linux-Fsdevel,
Amir Goldstein, open list:CONTROL GROUP (CGROUP), kvm,
Network Development, Linus Torvalds
On Thu, Aug 08, 2024 at 09:51:34AM -0700, Alexei Starovoitov wrote:
> The bpf changes look ok and Andrii's approach is easier to grasp.
> It's better to route bpf conversion to CLASS(fd,..) via bpf-next,
> so it goes through bpf CI and our other testing.
>
> bpf patches don't seem to depend on newly added CLASS(fd_pos, ...
> and fderr, so pretty much independent from other patches.
Representation change and switch to accessors do matter, though.
OTOH, I can put just those into never-rebased branch (basically,
"introduce fd_file(), convert all accessors to it" +
"struct fd representation change" + possibly "add struct fd constructors,
get rid of __to_fd()", for completeness sake), so you could pull it.
Otherwise you'll get textual conflicts on all those f.file vs. fd_file(f)...
^ permalink raw reply [flat|nested] 134+ messages in thread
* Re: [PATCH 17/39] bpf: resolve_pseudo_ldimm64(): take handling of a single ldimm64 insn into helper
2024-08-10 3:29 ` Al Viro
@ 2024-08-12 20:05 ` Andrii Nakryiko
2024-08-13 2:06 ` Al Viro
0 siblings, 1 reply; 134+ messages in thread
From: Andrii Nakryiko @ 2024-08-12 20:05 UTC (permalink / raw)
To: Al Viro
Cc: Alexei Starovoitov, Christian Brauner, viro, bpf, Linux-Fsdevel,
Amir Goldstein, open list:CONTROL GROUP (CGROUP), kvm,
Network Development, Linus Torvalds
On Fri, Aug 9, 2024 at 8:29 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Thu, Aug 08, 2024 at 09:51:34AM -0700, Alexei Starovoitov wrote:
>
> > The bpf changes look ok and Andrii's approach is easier to grasp.
> > It's better to route bpf conversion to CLASS(fd,..) via bpf-next,
> > so it goes through bpf CI and our other testing.
> >
> > bpf patches don't seem to depend on newly added CLASS(fd_pos, ...
> > and fderr, so pretty much independent from other patches.
>
> Representation change and switch to accessors do matter, though.
> OTOH, I can put just those into never-rebased branch (basically,
> "introduce fd_file(), convert all accessors to it" +
> "struct fd representation change" + possibly "add struct fd constructors,
> get rid of __to_fd()", for completeness sake), so you could pull it.
> Otherwise you'll get textual conflicts on all those f.file vs. fd_file(f)...
Yep, makes sense. Let's do that, we can merge that branch into
bpf-next/master and I will follow up with my changes on top of that.
Let's just drop the do_one_ldimm64() extraction, and keep fdput(f)
logic, plus add fd_file() accessor changes. I'll then add a switch to
CLASS(fd) after a bit more BPF-specific clean ups. This code is pretty
sensitive, so I'd rather have all the non-trivial refactoring done
separately. Thanks!
^ permalink raw reply [flat|nested] 134+ messages in thread
* Re: [PATCH 17/39] bpf: resolve_pseudo_ldimm64(): take handling of a single ldimm64 insn into helper
2024-08-12 20:05 ` Andrii Nakryiko
@ 2024-08-13 2:06 ` Al Viro
2024-08-13 3:32 ` Andrii Nakryiko
0 siblings, 1 reply; 134+ messages in thread
From: Al Viro @ 2024-08-13 2:06 UTC (permalink / raw)
To: Andrii Nakryiko
Cc: Alexei Starovoitov, Christian Brauner, viro, bpf, Linux-Fsdevel,
Amir Goldstein, open list:CONTROL GROUP (CGROUP), kvm,
Network Development, Linus Torvalds
On Mon, Aug 12, 2024 at 01:05:19PM -0700, Andrii Nakryiko wrote:
> On Fri, Aug 9, 2024 at 8:29???PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > On Thu, Aug 08, 2024 at 09:51:34AM -0700, Alexei Starovoitov wrote:
> >
> > > The bpf changes look ok and Andrii's approach is easier to grasp.
> > > It's better to route bpf conversion to CLASS(fd,..) via bpf-next,
> > > so it goes through bpf CI and our other testing.
> > >
> > > bpf patches don't seem to depend on newly added CLASS(fd_pos, ...
> > > and fderr, so pretty much independent from other patches.
> >
> > Representation change and switch to accessors do matter, though.
> > OTOH, I can put just those into never-rebased branch (basically,
> > "introduce fd_file(), convert all accessors to it" +
> > "struct fd representation change" + possibly "add struct fd constructors,
> > get rid of __to_fd()", for completeness sake), so you could pull it.
> > Otherwise you'll get textual conflicts on all those f.file vs. fd_file(f)...
>
> Yep, makes sense. Let's do that, we can merge that branch into
> bpf-next/master and I will follow up with my changes on top of that.
>
> Let's just drop the do_one_ldimm64() extraction, and keep fdput(f)
> logic, plus add fd_file() accessor changes. I'll then add a switch to
> CLASS(fd) after a bit more BPF-specific clean ups. This code is pretty
> sensitive, so I'd rather have all the non-trivial refactoring done
> separately. Thanks!
Done (#stable-struct_fd); BTW, which tree do you want "convert __bpf_prog_get()
to CLASS(fd)" to go through?
^ permalink raw reply [flat|nested] 134+ messages in thread
* Re: [PATCH 17/39] bpf: resolve_pseudo_ldimm64(): take handling of a single ldimm64 insn into helper
2024-08-13 2:06 ` Al Viro
@ 2024-08-13 3:32 ` Andrii Nakryiko
0 siblings, 0 replies; 134+ messages in thread
From: Andrii Nakryiko @ 2024-08-13 3:32 UTC (permalink / raw)
To: Al Viro
Cc: Alexei Starovoitov, Christian Brauner, viro, bpf, Linux-Fsdevel,
Amir Goldstein, open list:CONTROL GROUP (CGROUP), kvm,
Network Development, Linus Torvalds
On Mon, Aug 12, 2024 at 7:06 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Mon, Aug 12, 2024 at 01:05:19PM -0700, Andrii Nakryiko wrote:
> > On Fri, Aug 9, 2024 at 8:29???PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> > >
> > > On Thu, Aug 08, 2024 at 09:51:34AM -0700, Alexei Starovoitov wrote:
> > >
> > > > The bpf changes look ok and Andrii's approach is easier to grasp.
> > > > It's better to route bpf conversion to CLASS(fd,..) via bpf-next,
> > > > so it goes through bpf CI and our other testing.
> > > >
> > > > bpf patches don't seem to depend on newly added CLASS(fd_pos, ...
> > > > and fderr, so pretty much independent from other patches.
> > >
> > > Representation change and switch to accessors do matter, though.
> > > OTOH, I can put just those into never-rebased branch (basically,
> > > "introduce fd_file(), convert all accessors to it" +
> > > "struct fd representation change" + possibly "add struct fd constructors,
> > > get rid of __to_fd()", for completeness sake), so you could pull it.
> > > Otherwise you'll get textual conflicts on all those f.file vs. fd_file(f)...
> >
> > Yep, makes sense. Let's do that, we can merge that branch into
> > bpf-next/master and I will follow up with my changes on top of that.
> >
> > Let's just drop the do_one_ldimm64() extraction, and keep fdput(f)
> > logic, plus add fd_file() accessor changes. I'll then add a switch to
> > CLASS(fd) after a bit more BPF-specific clean ups. This code is pretty
> > sensitive, so I'd rather have all the non-trivial refactoring done
> > separately. Thanks!
>
> Done (#stable-struct_fd);
great, thanks, I'll look at this tomorrow
> BTW, which tree do you want "convert __bpf_prog_get()
> to CLASS(fd)" to go through?
So we seem to have the following for BPF-related stuff:
[PATCH 16/39] convert __bpf_prog_get() to CLASS(fd, ...)
This looks to be ready to go in.
[PATCH 17/39] bpf: resolve_pseudo_ldimm64(): take handling of a single
ldimm64 insn into helper
This one I'd like to rework differently and land it through bpf-next.
[PATCH 18/39] bpf maps: switch to CLASS(fd, ...)
This one touches __bpf_map_get() which I'm going to remove or refactor
as part of the abovementioned refactoring, so there will be conflicts.
[PATCH 19/39] fdget_raw() users: switch to CLASS(fd_raw, ...)
This one touches a bunch of cases across multiple systems, including
BPF's kernel/bpf/bpf_inode_storage.c.
So how about this. We take #16 as is through bpf-next, change how #17
is done, take 18 mostly as is but adjust as necessary. As for #19, if
you could split out changes in bpf_inode_storage.c to a separate
patch, we can also apply it in bpf-next as one coherent set. I'll send
all that as one complete patch set for you to do the final review.
WDYT?
^ permalink raw reply [flat|nested] 134+ messages in thread
* [PATCH 18/39] bpf maps: switch to CLASS(fd, ...)
2024-07-30 5:15 ` [PATCH 01/39] memcg_write_event_control(): fix a user-triggerable oops viro
` (15 preceding siblings ...)
2024-07-30 5:16 ` [PATCH 17/39] bpf: resolve_pseudo_ldimm64(): take handling of a single ldimm64 insn into helper viro
@ 2024-07-30 5:16 ` viro
2024-08-07 10:34 ` Christian Brauner
2024-07-30 5:16 ` [PATCH 19/39] fdget_raw() users: switch to CLASS(fd_raw, ...) viro
` (21 subsequent siblings)
38 siblings, 1 reply; 134+ messages in thread
From: viro @ 2024-07-30 5:16 UTC (permalink / raw)
To: linux-fsdevel; +Cc: amir73il, bpf, brauner, cgroups, kvm, netdev, torvalds
From: Al Viro <viro@zeniv.linux.org.uk>
Calling conventions for __bpf_map_get() would be more convenient
if it left fpdut() on failure to callers. Makes for simpler logics
in the callers.
Among other things, the proof of memory safety no longer has to
rely upon file->private_data never being ERR_PTR(...) for bpffs files.
Original calling conventions made it impossible for the caller to tell
whether __bpf_map_get() has returned ERR_PTR(-EINVAL) because it has found
the file not be a bpf map one (in which case it would've done fdput())
or because it found that ERR_PTR(-EINVAL) in file->private_data of a
bpf map file (in which case fdput() would _not_ have been done).
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
include/linux/bpf.h | 9 +++-
kernel/bpf/map_in_map.c | 38 ++++---------
kernel/bpf/syscall.c | 117 ++++++++++++----------------------------
kernel/bpf/verifier.c | 19 +------
net/core/sock_map.c | 23 +++-----
5 files changed, 60 insertions(+), 146 deletions(-)
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 3b94ec161e8c..004d37e8b8df 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -2227,7 +2227,14 @@ void __bpf_obj_drop_impl(void *p, const struct btf_record *rec, bool percpu);
struct bpf_map *bpf_map_get(u32 ufd);
struct bpf_map *bpf_map_get_with_uref(u32 ufd);
-struct bpf_map *__bpf_map_get(struct fd f);
+
+struct bpf_map *bpf_file_to_map(struct file *f);
+static inline struct bpf_map *__bpf_map_get(struct fd f)
+{
+ if (fd_empty(f))
+ return ERR_PTR(-EBADF);
+ return bpf_file_to_map(fd_file(f));
+}
void bpf_map_inc(struct bpf_map *map);
void bpf_map_inc_with_uref(struct bpf_map *map);
struct bpf_map *__bpf_map_inc_not_zero(struct bpf_map *map, bool uref);
diff --git a/kernel/bpf/map_in_map.c b/kernel/bpf/map_in_map.c
index b4f18c85d7bc..645bd30bc9a9 100644
--- a/kernel/bpf/map_in_map.c
+++ b/kernel/bpf/map_in_map.c
@@ -11,24 +11,18 @@ struct bpf_map *bpf_map_meta_alloc(int inner_map_ufd)
{
struct bpf_map *inner_map, *inner_map_meta;
u32 inner_map_meta_size;
- struct fd f;
- int ret;
+ CLASS(fd, f)(inner_map_ufd);
- f = fdget(inner_map_ufd);
inner_map = __bpf_map_get(f);
if (IS_ERR(inner_map))
return inner_map;
/* Does not support >1 level map-in-map */
- if (inner_map->inner_map_meta) {
- ret = -EINVAL;
- goto put;
- }
+ if (inner_map->inner_map_meta)
+ return ERR_PTR(-EINVAL);
- if (!inner_map->ops->map_meta_equal) {
- ret = -ENOTSUPP;
- goto put;
- }
+ if (!inner_map->ops->map_meta_equal)
+ return ERR_PTR(-ENOTSUPP);
inner_map_meta_size = sizeof(*inner_map_meta);
/* In some cases verifier needs to access beyond just base map. */
@@ -36,10 +30,8 @@ struct bpf_map *bpf_map_meta_alloc(int inner_map_ufd)
inner_map_meta_size = sizeof(struct bpf_array);
inner_map_meta = kzalloc(inner_map_meta_size, GFP_USER);
- if (!inner_map_meta) {
- ret = -ENOMEM;
- goto put;
- }
+ if (!inner_map_meta)
+ return ERR_PTR(-ENOMEM);
inner_map_meta->map_type = inner_map->map_type;
inner_map_meta->key_size = inner_map->key_size;
@@ -53,8 +45,9 @@ struct bpf_map *bpf_map_meta_alloc(int inner_map_ufd)
* invalid/empty/valid, but ERR_PTR in case of errors. During
* equality NULL or IS_ERR is equivalent.
*/
- ret = PTR_ERR(inner_map_meta->record);
- goto free;
+ struct bpf_map *ret = ERR_CAST(inner_map_meta->record);
+ kfree(inner_map_meta);
+ return ret;
}
/* Note: We must use the same BTF, as we also used btf_record_dup above
* which relies on BTF being same for both maps, as some members like
@@ -77,14 +70,7 @@ struct bpf_map *bpf_map_meta_alloc(int inner_map_ufd)
inner_array_meta->elem_size = inner_array->elem_size;
inner_map_meta->bypass_spec_v1 = inner_map->bypass_spec_v1;
}
-
- fdput(f);
return inner_map_meta;
-free:
- kfree(inner_map_meta);
-put:
- fdput(f);
- return ERR_PTR(ret);
}
void bpf_map_meta_free(struct bpf_map *map_meta)
@@ -110,9 +96,8 @@ void *bpf_map_fd_get_ptr(struct bpf_map *map,
int ufd)
{
struct bpf_map *inner_map, *inner_map_meta;
- struct fd f;
+ CLASS(fd, f)(ufd);
- f = fdget(ufd);
inner_map = __bpf_map_get(f);
if (IS_ERR(inner_map))
return inner_map;
@@ -123,7 +108,6 @@ void *bpf_map_fd_get_ptr(struct bpf_map *map,
else
inner_map = ERR_PTR(-EINVAL);
- fdput(f);
return inner_map;
}
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index c5b252c0faa8..807042c643c8 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1418,19 +1418,11 @@ static int map_create(union bpf_attr *attr)
return err;
}
-/* if error is returned, fd is released.
- * On success caller should complete fd access with matching fdput()
- */
-struct bpf_map *__bpf_map_get(struct fd f)
+struct bpf_map *bpf_file_to_map(struct file *f)
{
- if (!fd_file(f))
- return ERR_PTR(-EBADF);
- if (fd_file(f)->f_op != &bpf_map_fops) {
- fdput(f);
+ if (unlikely(f->f_op != &bpf_map_fops))
return ERR_PTR(-EINVAL);
- }
-
- return fd_file(f)->private_data;
+ return f->private_data;
}
void bpf_map_inc(struct bpf_map *map)
@@ -1448,15 +1440,11 @@ EXPORT_SYMBOL_GPL(bpf_map_inc_with_uref);
struct bpf_map *bpf_map_get(u32 ufd)
{
- struct fd f = fdget(ufd);
- struct bpf_map *map;
-
- map = __bpf_map_get(f);
- if (IS_ERR(map))
- return map;
+ CLASS(fd, f)(ufd);
+ struct bpf_map *map = __bpf_map_get(f);
- bpf_map_inc(map);
- fdput(f);
+ if (!IS_ERR(map))
+ bpf_map_inc(map);
return map;
}
@@ -1464,15 +1452,11 @@ EXPORT_SYMBOL(bpf_map_get);
struct bpf_map *bpf_map_get_with_uref(u32 ufd)
{
- struct fd f = fdget(ufd);
- struct bpf_map *map;
-
- map = __bpf_map_get(f);
- if (IS_ERR(map))
- return map;
+ CLASS(fd, f)(ufd);
+ struct bpf_map *map = __bpf_map_get(f);
- bpf_map_inc_with_uref(map);
- fdput(f);
+ if (!IS_ERR(map))
+ bpf_map_inc_with_uref(map);
return map;
}
@@ -1537,11 +1521,9 @@ static int map_lookup_elem(union bpf_attr *attr)
{
void __user *ukey = u64_to_user_ptr(attr->key);
void __user *uvalue = u64_to_user_ptr(attr->value);
- int ufd = attr->map_fd;
struct bpf_map *map;
void *key, *value;
u32 value_size;
- struct fd f;
int err;
if (CHECK_ATTR(BPF_MAP_LOOKUP_ELEM))
@@ -1550,26 +1532,20 @@ static int map_lookup_elem(union bpf_attr *attr)
if (attr->flags & ~BPF_F_LOCK)
return -EINVAL;
- f = fdget(ufd);
+ CLASS(fd, f)(attr->map_fd);
map = __bpf_map_get(f);
if (IS_ERR(map))
return PTR_ERR(map);
- if (!(map_get_sys_perms(map, f) & FMODE_CAN_READ)) {
- err = -EPERM;
- goto err_put;
- }
+ if (!(map_get_sys_perms(map, f) & FMODE_CAN_READ))
+ return -EPERM;
if ((attr->flags & BPF_F_LOCK) &&
- !btf_record_has_field(map->record, BPF_SPIN_LOCK)) {
- err = -EINVAL;
- goto err_put;
- }
+ !btf_record_has_field(map->record, BPF_SPIN_LOCK))
+ return -EINVAL;
key = __bpf_copy_key(ukey, map->key_size);
- if (IS_ERR(key)) {
- err = PTR_ERR(key);
- goto err_put;
- }
+ if (IS_ERR(key))
+ return PTR_ERR(key);
value_size = bpf_map_value_size(map);
@@ -1600,8 +1576,6 @@ static int map_lookup_elem(union bpf_attr *attr)
kvfree(value);
free_key:
kvfree(key);
-err_put:
- fdput(f);
return err;
}
@@ -1612,17 +1586,15 @@ static int map_update_elem(union bpf_attr *attr, bpfptr_t uattr)
{
bpfptr_t ukey = make_bpfptr(attr->key, uattr.is_kernel);
bpfptr_t uvalue = make_bpfptr(attr->value, uattr.is_kernel);
- int ufd = attr->map_fd;
struct bpf_map *map;
void *key, *value;
u32 value_size;
- struct fd f;
int err;
if (CHECK_ATTR(BPF_MAP_UPDATE_ELEM))
return -EINVAL;
- f = fdget(ufd);
+ CLASS(fd, f)(attr->map_fd);
map = __bpf_map_get(f);
if (IS_ERR(map))
return PTR_ERR(map);
@@ -1660,7 +1632,6 @@ static int map_update_elem(union bpf_attr *attr, bpfptr_t uattr)
kvfree(key);
err_put:
bpf_map_write_active_dec(map);
- fdput(f);
return err;
}
@@ -1669,16 +1640,14 @@ static int map_update_elem(union bpf_attr *attr, bpfptr_t uattr)
static int map_delete_elem(union bpf_attr *attr, bpfptr_t uattr)
{
bpfptr_t ukey = make_bpfptr(attr->key, uattr.is_kernel);
- int ufd = attr->map_fd;
struct bpf_map *map;
- struct fd f;
void *key;
int err;
if (CHECK_ATTR(BPF_MAP_DELETE_ELEM))
return -EINVAL;
- f = fdget(ufd);
+ CLASS(fd, f)(attr->map_fd);
map = __bpf_map_get(f);
if (IS_ERR(map))
return PTR_ERR(map);
@@ -1715,7 +1684,6 @@ static int map_delete_elem(union bpf_attr *attr, bpfptr_t uattr)
kvfree(key);
err_put:
bpf_map_write_active_dec(map);
- fdput(f);
return err;
}
@@ -1726,30 +1694,24 @@ static int map_get_next_key(union bpf_attr *attr)
{
void __user *ukey = u64_to_user_ptr(attr->key);
void __user *unext_key = u64_to_user_ptr(attr->next_key);
- int ufd = attr->map_fd;
struct bpf_map *map;
void *key, *next_key;
- struct fd f;
int err;
if (CHECK_ATTR(BPF_MAP_GET_NEXT_KEY))
return -EINVAL;
- f = fdget(ufd);
+ CLASS(fd, f)(attr->map_fd);
map = __bpf_map_get(f);
if (IS_ERR(map))
return PTR_ERR(map);
- if (!(map_get_sys_perms(map, f) & FMODE_CAN_READ)) {
- err = -EPERM;
- goto err_put;
- }
+ if (!(map_get_sys_perms(map, f) & FMODE_CAN_READ))
+ return -EPERM;
if (ukey) {
key = __bpf_copy_key(ukey, map->key_size);
- if (IS_ERR(key)) {
- err = PTR_ERR(key);
- goto err_put;
- }
+ if (IS_ERR(key))
+ return PTR_ERR(key);
} else {
key = NULL;
}
@@ -1781,8 +1743,6 @@ static int map_get_next_key(union bpf_attr *attr)
kvfree(next_key);
free_key:
kvfree(key);
-err_put:
- fdput(f);
return err;
}
@@ -2011,11 +1971,9 @@ static int map_lookup_and_delete_elem(union bpf_attr *attr)
{
void __user *ukey = u64_to_user_ptr(attr->key);
void __user *uvalue = u64_to_user_ptr(attr->value);
- int ufd = attr->map_fd;
struct bpf_map *map;
void *key, *value;
u32 value_size;
- struct fd f;
int err;
if (CHECK_ATTR(BPF_MAP_LOOKUP_AND_DELETE_ELEM))
@@ -2024,7 +1982,7 @@ static int map_lookup_and_delete_elem(union bpf_attr *attr)
if (attr->flags & ~BPF_F_LOCK)
return -EINVAL;
- f = fdget(ufd);
+ CLASS(fd, f)(attr->map_fd);
map = __bpf_map_get(f);
if (IS_ERR(map))
return PTR_ERR(map);
@@ -2094,7 +2052,6 @@ static int map_lookup_and_delete_elem(union bpf_attr *attr)
kvfree(key);
err_put:
bpf_map_write_active_dec(map);
- fdput(f);
return err;
}
@@ -2102,27 +2059,22 @@ static int map_lookup_and_delete_elem(union bpf_attr *attr)
static int map_freeze(const union bpf_attr *attr)
{
- int err = 0, ufd = attr->map_fd;
+ int err = 0;
struct bpf_map *map;
- struct fd f;
if (CHECK_ATTR(BPF_MAP_FREEZE))
return -EINVAL;
- f = fdget(ufd);
+ CLASS(fd, f)(attr->map_fd);
map = __bpf_map_get(f);
if (IS_ERR(map))
return PTR_ERR(map);
- if (map->map_type == BPF_MAP_TYPE_STRUCT_OPS || !IS_ERR_OR_NULL(map->record)) {
- fdput(f);
+ if (map->map_type == BPF_MAP_TYPE_STRUCT_OPS || !IS_ERR_OR_NULL(map->record))
return -ENOTSUPP;
- }
- if (!(map_get_sys_perms(map, f) & FMODE_CAN_WRITE)) {
- fdput(f);
+ if (!(map_get_sys_perms(map, f) & FMODE_CAN_WRITE))
return -EPERM;
- }
mutex_lock(&map->freeze_mutex);
if (bpf_map_write_active(map)) {
@@ -2137,7 +2089,6 @@ static int map_freeze(const union bpf_attr *attr)
WRITE_ONCE(map->frozen, true);
err_put:
mutex_unlock(&map->freeze_mutex);
- fdput(f);
return err;
}
@@ -5178,14 +5129,13 @@ static int bpf_map_do_batch(const union bpf_attr *attr,
cmd == BPF_MAP_LOOKUP_AND_DELETE_BATCH;
bool has_write = cmd != BPF_MAP_LOOKUP_BATCH;
struct bpf_map *map;
- int err, ufd;
- struct fd f;
+ int err;
if (CHECK_ATTR(BPF_MAP_BATCH))
return -EINVAL;
- ufd = attr->batch.map_fd;
- f = fdget(ufd);
+ CLASS(fd, f)(attr->batch.map_fd);
+
map = __bpf_map_get(f);
if (IS_ERR(map))
return PTR_ERR(map);
@@ -5213,7 +5163,6 @@ static int bpf_map_do_batch(const union bpf_attr *attr,
maybe_wait_bpf_programs(map);
bpf_map_write_active_dec(map);
}
- fdput(f);
return err;
}
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index d54f61e12062..011c51a84158 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -18419,7 +18419,6 @@ static int do_one_ldimm64(struct bpf_verifier_env *env,
struct bpf_insn_aux_data *aux)
{
struct bpf_map *map;
- struct fd f;
u64 addr;
u32 fd;
int err;
@@ -18470,7 +18469,7 @@ static int do_one_ldimm64(struct bpf_verifier_env *env,
break;
}
- f = fdget(fd);
+ CLASS(fd, f)(fd);
map = __bpf_map_get(f);
if (IS_ERR(map)) {
verbose(env, "fd %d is not pointing to valid bpf_map\n", fd);
@@ -18478,10 +18477,8 @@ static int do_one_ldimm64(struct bpf_verifier_env *env,
}
err = check_map_prog_compatibility(env, map, env->prog);
- if (err) {
- fdput(f);
+ if (err)
return err;
- }
if (insn[0].src_reg == BPF_PSEUDO_MAP_FD ||
insn[0].src_reg == BPF_PSEUDO_MAP_IDX) {
@@ -18491,13 +18488,11 @@ static int do_one_ldimm64(struct bpf_verifier_env *env,
if (off >= BPF_MAX_VAR_OFF) {
verbose(env, "direct value offset of %u is not allowed\n", off);
- fdput(f);
return -EINVAL;
}
if (!map->ops->map_direct_value_addr) {
verbose(env, "no direct value access support for this map type\n");
- fdput(f);
return -EINVAL;
}
@@ -18505,7 +18500,6 @@ static int do_one_ldimm64(struct bpf_verifier_env *env,
if (err) {
verbose(env, "invalid access to map value pointer, value_size=%u off=%u\n",
map->value_size, off);
- fdput(f);
return err;
}
@@ -18520,7 +18514,6 @@ static int do_one_ldimm64(struct bpf_verifier_env *env,
for (int i = 0; i < env->used_map_cnt; i++) {
if (env->used_maps[i] == map) {
aux->map_index = i;
- fdput(f);
return 0;
}
}
@@ -18528,7 +18521,6 @@ static int do_one_ldimm64(struct bpf_verifier_env *env,
if (env->used_map_cnt >= MAX_USED_MAPS) {
verbose(env, "The total number of maps per program has reached the limit of %u\n",
MAX_USED_MAPS);
- fdput(f);
return -E2BIG;
}
@@ -18547,39 +18539,32 @@ static int do_one_ldimm64(struct bpf_verifier_env *env,
if (bpf_map_is_cgroup_storage(map) &&
bpf_cgroup_storage_assign(env->prog->aux, map)) {
verbose(env, "only one cgroup storage of each type is allowed\n");
- fdput(f);
return -EBUSY;
}
if (map->map_type == BPF_MAP_TYPE_ARENA) {
if (env->prog->aux->arena) {
verbose(env, "Only one arena per program\n");
- fdput(f);
return -EBUSY;
}
if (!env->allow_ptr_leaks || !env->bpf_capable) {
verbose(env, "CAP_BPF and CAP_PERFMON are required to use arena\n");
- fdput(f);
return -EPERM;
}
if (!env->prog->jit_requested) {
verbose(env, "JIT is required to use arena\n");
- fdput(f);
return -EOPNOTSUPP;
}
if (!bpf_jit_supports_arena()) {
verbose(env, "JIT doesn't support arena\n");
- fdput(f);
return -EOPNOTSUPP;
}
env->prog->aux->arena = (void *)map;
if (!bpf_arena_get_user_vm_start(env->prog->aux->arena)) {
verbose(env, "arena's user address must be set via map_extra or mmap()\n");
- fdput(f);
return -EINVAL;
}
}
- fdput(f);
return 0;
}
diff --git a/net/core/sock_map.c b/net/core/sock_map.c
index d3dbb92153f2..0f5f80f44d52 100644
--- a/net/core/sock_map.c
+++ b/net/core/sock_map.c
@@ -67,46 +67,39 @@ static struct bpf_map *sock_map_alloc(union bpf_attr *attr)
int sock_map_get_from_fd(const union bpf_attr *attr, struct bpf_prog *prog)
{
- u32 ufd = attr->target_fd;
struct bpf_map *map;
- struct fd f;
int ret;
if (attr->attach_flags || attr->replace_bpf_fd)
return -EINVAL;
- f = fdget(ufd);
+ CLASS(fd, f)(attr->target_fd);
map = __bpf_map_get(f);
if (IS_ERR(map))
return PTR_ERR(map);
mutex_lock(&sockmap_mutex);
ret = sock_map_prog_update(map, prog, NULL, NULL, attr->attach_type);
mutex_unlock(&sockmap_mutex);
- fdput(f);
return ret;
}
int sock_map_prog_detach(const union bpf_attr *attr, enum bpf_prog_type ptype)
{
- u32 ufd = attr->target_fd;
struct bpf_prog *prog;
struct bpf_map *map;
- struct fd f;
int ret;
if (attr->attach_flags || attr->replace_bpf_fd)
return -EINVAL;
- f = fdget(ufd);
+ CLASS(fd, f)(attr->target_fd);
map = __bpf_map_get(f);
if (IS_ERR(map))
return PTR_ERR(map);
prog = bpf_prog_get(attr->attach_bpf_fd);
- if (IS_ERR(prog)) {
- ret = PTR_ERR(prog);
- goto put_map;
- }
+ if (IS_ERR(prog))
+ return PTR_ERR(prog);
if (prog->type != ptype) {
ret = -EINVAL;
@@ -118,8 +111,6 @@ int sock_map_prog_detach(const union bpf_attr *attr, enum bpf_prog_type ptype)
mutex_unlock(&sockmap_mutex);
put_prog:
bpf_prog_put(prog);
-put_map:
- fdput(f);
return ret;
}
@@ -1550,18 +1541,17 @@ int sock_map_bpf_prog_query(const union bpf_attr *attr,
union bpf_attr __user *uattr)
{
__u32 __user *prog_ids = u64_to_user_ptr(attr->query.prog_ids);
- u32 prog_cnt = 0, flags = 0, ufd = attr->target_fd;
+ u32 prog_cnt = 0, flags = 0;
struct bpf_prog **pprog;
struct bpf_prog *prog;
struct bpf_map *map;
- struct fd f;
u32 id = 0;
int ret;
if (attr->query.query_flags)
return -EINVAL;
- f = fdget(ufd);
+ CLASS(fd, f)(attr->target_fd);
map = __bpf_map_get(f);
if (IS_ERR(map))
return PTR_ERR(map);
@@ -1593,7 +1583,6 @@ int sock_map_bpf_prog_query(const union bpf_attr *attr,
copy_to_user(&uattr->query.prog_cnt, &prog_cnt, sizeof(prog_cnt)))
ret = -EFAULT;
- fdput(f);
return ret;
}
--
2.39.2
^ permalink raw reply related [flat|nested] 134+ messages in thread* Re: [PATCH 18/39] bpf maps: switch to CLASS(fd, ...)
2024-07-30 5:16 ` [PATCH 18/39] bpf maps: switch to CLASS(fd, ...) viro
@ 2024-08-07 10:34 ` Christian Brauner
0 siblings, 0 replies; 134+ messages in thread
From: Christian Brauner @ 2024-08-07 10:34 UTC (permalink / raw)
To: viro; +Cc: linux-fsdevel, amir73il, bpf, cgroups, kvm, netdev, torvalds
On Tue, Jul 30, 2024 at 01:16:04AM GMT, viro@kernel.org wrote:
> From: Al Viro <viro@zeniv.linux.org.uk>
>
> Calling conventions for __bpf_map_get() would be more convenient
> if it left fpdut() on failure to callers. Makes for simpler logics
> in the callers.
>
> Among other things, the proof of memory safety no longer has to
> rely upon file->private_data never being ERR_PTR(...) for bpffs files.
> Original calling conventions made it impossible for the caller to tell
> whether __bpf_map_get() has returned ERR_PTR(-EINVAL) because it has found
> the file not be a bpf map one (in which case it would've done fdput())
> or because it found that ERR_PTR(-EINVAL) in file->private_data of a
> bpf map file (in which case fdput() would _not_ have been done).
>
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
Reviewed-by: Christian Brauner <brauner@kernel.org>
^ permalink raw reply [flat|nested] 134+ messages in thread
* [PATCH 19/39] fdget_raw() users: switch to CLASS(fd_raw, ...)
2024-07-30 5:15 ` [PATCH 01/39] memcg_write_event_control(): fix a user-triggerable oops viro
` (16 preceding siblings ...)
2024-07-30 5:16 ` [PATCH 18/39] bpf maps: switch to CLASS(fd, ...) viro
@ 2024-07-30 5:16 ` viro
2024-08-07 10:35 ` Christian Brauner
2024-07-30 5:16 ` [PATCH 20/39] introduce "fd_pos" class, convert fdget_pos() users to it viro
` (20 subsequent siblings)
38 siblings, 1 reply; 134+ messages in thread
From: viro @ 2024-07-30 5:16 UTC (permalink / raw)
To: linux-fsdevel; +Cc: amir73il, bpf, brauner, cgroups, kvm, netdev, torvalds
From: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
arch/arm/kernel/sys_oabi-compat.c | 10 +++-----
fs/fcntl.c | 42 +++++++++++++------------------
fs/namei.c | 13 +++-------
fs/open.c | 13 +++-------
fs/quota/quota.c | 12 +++------
fs/stat.c | 10 +++-----
fs/statfs.c | 12 ++++-----
kernel/bpf/bpf_inode_storage.c | 25 ++++++------------
kernel/cgroup/cgroup.c | 9 +++----
security/landlock/syscalls.c | 19 +++++---------
10 files changed, 58 insertions(+), 107 deletions(-)
diff --git a/arch/arm/kernel/sys_oabi-compat.c b/arch/arm/kernel/sys_oabi-compat.c
index f5781ff54a5c..2944721e82a2 100644
--- a/arch/arm/kernel/sys_oabi-compat.c
+++ b/arch/arm/kernel/sys_oabi-compat.c
@@ -235,12 +235,12 @@ asmlinkage long sys_oabi_fcntl64(unsigned int fd, unsigned int cmd,
unsigned long arg)
{
void __user *argp = (void __user *)arg;
- struct fd f = fdget_raw(fd);
+ CLASS(fd_raw, f)(fd);
struct flock64 flock;
- long err = -EBADF;
+ long err;
- if (!fd_file(f))
- goto out;
+ if (fd_empty(f))
+ return -EBADF;
switch (cmd) {
case F_GETLK64:
@@ -271,8 +271,6 @@ asmlinkage long sys_oabi_fcntl64(unsigned int fd, unsigned int cmd,
err = sys_fcntl64(fd, cmd, arg);
break;
}
- fdput(f);
-out:
return err;
}
diff --git a/fs/fcntl.c b/fs/fcntl.c
index 2b5616762354..f96328ee3853 100644
--- a/fs/fcntl.c
+++ b/fs/fcntl.c
@@ -476,24 +476,21 @@ static int check_fcntl_cmd(unsigned cmd)
SYSCALL_DEFINE3(fcntl, unsigned int, fd, unsigned int, cmd, unsigned long, arg)
{
- struct fd f = fdget_raw(fd);
- long err = -EBADF;
+ CLASS(fd_raw, f)(fd);
+ long err;
- if (!fd_file(f))
- goto out;
+ if (fd_empty(f))
+ return -EBADF;
if (unlikely(fd_file(f)->f_mode & FMODE_PATH)) {
if (!check_fcntl_cmd(cmd))
- goto out1;
+ return -EBADF;
}
err = security_file_fcntl(fd_file(f), cmd, arg);
if (!err)
err = do_fcntl(fd, cmd, arg, fd_file(f));
-out1:
- fdput(f);
-out:
return err;
}
@@ -502,21 +499,21 @@ SYSCALL_DEFINE3(fcntl64, unsigned int, fd, unsigned int, cmd,
unsigned long, arg)
{
void __user *argp = (void __user *)arg;
- struct fd f = fdget_raw(fd);
+ CLASS(fd_raw, f)(fd);
struct flock64 flock;
- long err = -EBADF;
+ long err;
- if (!fd_file(f))
- goto out;
+ if (fd_empty(f))
+ return -EBADF;
if (unlikely(fd_file(f)->f_mode & FMODE_PATH)) {
if (!check_fcntl_cmd(cmd))
- goto out1;
+ return -EBADF;
}
err = security_file_fcntl(fd_file(f), cmd, arg);
if (err)
- goto out1;
+ return err;
switch (cmd) {
case F_GETLK64:
@@ -541,9 +538,6 @@ SYSCALL_DEFINE3(fcntl64, unsigned int, fd, unsigned int, cmd,
err = do_fcntl(fd, cmd, arg, fd_file(f));
break;
}
-out1:
- fdput(f);
-out:
return err;
}
#endif
@@ -639,21 +633,21 @@ static int fixup_compat_flock(struct flock *flock)
static long do_compat_fcntl64(unsigned int fd, unsigned int cmd,
compat_ulong_t arg)
{
- struct fd f = fdget_raw(fd);
+ CLASS(fd_raw, f)(fd);
struct flock flock;
- long err = -EBADF;
+ long err;
- if (!fd_file(f))
- return err;
+ if (fd_empty(f))
+ return -EBADF;
if (unlikely(fd_file(f)->f_mode & FMODE_PATH)) {
if (!check_fcntl_cmd(cmd))
- goto out_put;
+ return -EBADF;
}
err = security_file_fcntl(fd_file(f), cmd, arg);
if (err)
- goto out_put;
+ return err;
switch (cmd) {
case F_GETLK:
@@ -696,8 +690,6 @@ static long do_compat_fcntl64(unsigned int fd, unsigned int cmd,
err = do_fcntl(fd, cmd, arg, fd_file(f));
break;
}
-out_put:
- fdput(f);
return err;
}
diff --git a/fs/namei.c b/fs/namei.c
index af86e3330594..9a1dcaf8be30 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2489,26 +2489,22 @@ static const char *path_init(struct nameidata *nd, unsigned flags)
}
} else {
/* Caller must check execute permissions on the starting path component */
- struct fd f = fdget_raw(nd->dfd);
+ CLASS(fd_raw, f)(nd->dfd);
struct dentry *dentry;
- if (!fd_file(f))
+ if (fd_empty(f))
return ERR_PTR(-EBADF);
if (flags & LOOKUP_LINKAT_EMPTY) {
if (fd_file(f)->f_cred != current_cred() &&
- !ns_capable(fd_file(f)->f_cred->user_ns, CAP_DAC_READ_SEARCH)) {
- fdput(f);
+ !ns_capable(fd_file(f)->f_cred->user_ns, CAP_DAC_READ_SEARCH))
return ERR_PTR(-ENOENT);
- }
}
dentry = fd_file(f)->f_path.dentry;
- if (*s && unlikely(!d_can_lookup(dentry))) {
- fdput(f);
+ if (*s && unlikely(!d_can_lookup(dentry)))
return ERR_PTR(-ENOTDIR);
- }
nd->path = fd_file(f)->f_path;
if (flags & LOOKUP_RCU) {
@@ -2518,7 +2514,6 @@ static const char *path_init(struct nameidata *nd, unsigned flags)
path_get(&nd->path);
nd->inode = nd->path.dentry->d_inode;
}
- fdput(f);
}
/* For scoped-lookups we need to set the root to the dirfd as well. */
diff --git a/fs/open.c b/fs/open.c
index a388828ccd22..624c173a0270 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -581,23 +581,18 @@ SYSCALL_DEFINE1(chdir, const char __user *, filename)
SYSCALL_DEFINE1(fchdir, unsigned int, fd)
{
- struct fd f = fdget_raw(fd);
+ CLASS(fd_raw, f)(fd);
int error;
- error = -EBADF;
- if (!fd_file(f))
- goto out;
+ if (fd_empty(f))
+ return -EBADF;
- error = -ENOTDIR;
if (!d_can_lookup(fd_file(f)->f_path.dentry))
- goto out_putf;
+ return -ENOTDIR;
error = file_permission(fd_file(f), MAY_EXEC | MAY_CHDIR);
if (!error)
set_fs_pwd(current->fs, &fd_file(f)->f_path);
-out_putf:
- fdput(f);
-out:
return error;
}
diff --git a/fs/quota/quota.c b/fs/quota/quota.c
index 290157bc7bec..7c2b75a44485 100644
--- a/fs/quota/quota.c
+++ b/fs/quota/quota.c
@@ -976,21 +976,19 @@ SYSCALL_DEFINE4(quotactl_fd, unsigned int, fd, unsigned int, cmd,
struct super_block *sb;
unsigned int cmds = cmd >> SUBCMDSHIFT;
unsigned int type = cmd & SUBCMDMASK;
- struct fd f;
+ CLASS(fd_raw, f)(fd);
int ret;
- f = fdget_raw(fd);
- if (!fd_file(f))
+ if (fd_empty(f))
return -EBADF;
- ret = -EINVAL;
if (type >= MAXQUOTAS)
- goto out;
+ return -EINVAL;
if (quotactl_cmd_write(cmds)) {
ret = mnt_want_write(fd_file(f)->f_path.mnt);
if (ret)
- goto out;
+ return ret;
}
sb = fd_file(f)->f_path.mnt->mnt_sb;
@@ -1008,7 +1006,5 @@ SYSCALL_DEFINE4(quotactl_fd, unsigned int, fd, unsigned int, cmd,
if (quotactl_cmd_write(cmds))
mnt_drop_write(fd_file(f)->f_path.mnt);
-out:
- fdput(f);
return ret;
}
diff --git a/fs/stat.c b/fs/stat.c
index 41e598376d7e..56d6ce2b2c79 100644
--- a/fs/stat.c
+++ b/fs/stat.c
@@ -220,15 +220,11 @@ EXPORT_SYMBOL(vfs_getattr);
*/
int vfs_fstat(int fd, struct kstat *stat)
{
- struct fd f;
- int error;
+ CLASS(fd_raw, f)(fd);
- f = fdget_raw(fd);
- if (!fd_file(f))
+ if (fd_empty(f))
return -EBADF;
- error = vfs_getattr(&fd_file(f)->f_path, stat, STATX_BASIC_STATS, 0);
- fdput(f);
- return error;
+ return vfs_getattr(&fd_file(f)->f_path, stat, STATX_BASIC_STATS, 0);
}
int getname_statx_lookup_flags(int flags)
diff --git a/fs/statfs.c b/fs/statfs.c
index 9c7bb27e7932..a45ac85e6048 100644
--- a/fs/statfs.c
+++ b/fs/statfs.c
@@ -114,13 +114,11 @@ int user_statfs(const char __user *pathname, struct kstatfs *st)
int fd_statfs(int fd, struct kstatfs *st)
{
- struct fd f = fdget_raw(fd);
- int error = -EBADF;
- if (fd_file(f)) {
- error = vfs_statfs(&fd_file(f)->f_path, st);
- fdput(f);
- }
- return error;
+ CLASS(fd_raw, f)(fd);
+
+ if (fd_empty(f))
+ return -EBADF;
+ return vfs_statfs(&fd_file(f)->f_path, st);
}
static int do_statfs_native(struct kstatfs *st, struct statfs __user *p)
diff --git a/kernel/bpf/bpf_inode_storage.c b/kernel/bpf/bpf_inode_storage.c
index 0a79aee6523d..a2c05db49ebc 100644
--- a/kernel/bpf/bpf_inode_storage.c
+++ b/kernel/bpf/bpf_inode_storage.c
@@ -78,13 +78,12 @@ void bpf_inode_storage_free(struct inode *inode)
static void *bpf_fd_inode_storage_lookup_elem(struct bpf_map *map, void *key)
{
struct bpf_local_storage_data *sdata;
- struct fd f = fdget_raw(*(int *)key);
+ CLASS(fd_raw, f)(*(int *)key);
- if (!fd_file(f))
+ if (fd_empty(f))
return ERR_PTR(-EBADF);
sdata = inode_storage_lookup(file_inode(fd_file(f)), map, true);
- fdput(f);
return sdata ? sdata->data : NULL;
}
@@ -92,19 +91,16 @@ static long bpf_fd_inode_storage_update_elem(struct bpf_map *map, void *key,
void *value, u64 map_flags)
{
struct bpf_local_storage_data *sdata;
- struct fd f = fdget_raw(*(int *)key);
+ CLASS(fd_raw, f)(*(int *)key);
- if (!fd_file(f))
+ if (fd_empty(f))
return -EBADF;
- if (!inode_storage_ptr(file_inode(fd_file(f)))) {
- fdput(f);
+ if (!inode_storage_ptr(file_inode(fd_file(f))))
return -EBADF;
- }
sdata = bpf_local_storage_update(file_inode(fd_file(f)),
(struct bpf_local_storage_map *)map,
value, map_flags, GFP_ATOMIC);
- fdput(f);
return PTR_ERR_OR_ZERO(sdata);
}
@@ -123,15 +119,10 @@ static int inode_storage_delete(struct inode *inode, struct bpf_map *map)
static long bpf_fd_inode_storage_delete_elem(struct bpf_map *map, void *key)
{
- struct fd f = fdget_raw(*(int *)key);
- int err;
-
- if (!fd_file(f))
+ CLASS(fd_raw, f)(*(int *)key);
+ if (fd_empty(f))
return -EBADF;
-
- err = inode_storage_delete(file_inode(fd_file(f)), map);
- fdput(f);
- return err;
+ return inode_storage_delete(file_inode(fd_file(f)), map);
}
/* *gfp_flags* is a hidden argument provided by the verifier */
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index b96489277f70..1244a8c8878e 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -6899,14 +6899,11 @@ EXPORT_SYMBOL_GPL(cgroup_get_from_path);
*/
struct cgroup *cgroup_v1v2_get_from_fd(int fd)
{
- struct cgroup *cgrp;
- struct fd f = fdget_raw(fd);
- if (!fd_file(f))
+ CLASS(fd_raw, f)(fd);
+ if (fd_empty(f))
return ERR_PTR(-EBADF);
- cgrp = cgroup_v1v2_get_from_file(fd_file(f));
- fdput(f);
- return cgrp;
+ return cgroup_v1v2_get_from_file(fd_file(f));
}
/**
diff --git a/security/landlock/syscalls.c b/security/landlock/syscalls.c
index 00b63971ab64..9689169ad5cc 100644
--- a/security/landlock/syscalls.c
+++ b/security/landlock/syscalls.c
@@ -269,15 +269,12 @@ static struct landlock_ruleset *get_ruleset_from_fd(const int fd,
*/
static int get_path_from_fd(const s32 fd, struct path *const path)
{
- struct fd f;
- int err = 0;
+ CLASS(fd_raw, f)(fd);
BUILD_BUG_ON(!__same_type(
fd, ((struct landlock_path_beneath_attr *)NULL)->parent_fd));
- /* Handles O_PATH. */
- f = fdget_raw(fd);
- if (!fd_file(f))
+ if (fd_empty(f))
return -EBADF;
/*
* Forbids ruleset FDs, internal filesystems (e.g. nsfs), including
@@ -288,16 +285,12 @@ static int get_path_from_fd(const s32 fd, struct path *const path)
(fd_file(f)->f_path.mnt->mnt_flags & MNT_INTERNAL) ||
(fd_file(f)->f_path.dentry->d_sb->s_flags & SB_NOUSER) ||
d_is_negative(fd_file(f)->f_path.dentry) ||
- IS_PRIVATE(d_backing_inode(fd_file(f)->f_path.dentry))) {
- err = -EBADFD;
- goto out_fdput;
- }
+ IS_PRIVATE(d_backing_inode(fd_file(f)->f_path.dentry)))
+ return -EBADFD;
+
*path = fd_file(f)->f_path;
path_get(path);
-
-out_fdput:
- fdput(f);
- return err;
+ return 0;
}
static int add_rule_path_beneath(struct landlock_ruleset *const ruleset,
--
2.39.2
^ permalink raw reply related [flat|nested] 134+ messages in thread* Re: [PATCH 19/39] fdget_raw() users: switch to CLASS(fd_raw, ...)
2024-07-30 5:16 ` [PATCH 19/39] fdget_raw() users: switch to CLASS(fd_raw, ...) viro
@ 2024-08-07 10:35 ` Christian Brauner
0 siblings, 0 replies; 134+ messages in thread
From: Christian Brauner @ 2024-08-07 10:35 UTC (permalink / raw)
To: viro; +Cc: linux-fsdevel, amir73il, bpf, cgroups, kvm, netdev, torvalds
On Tue, Jul 30, 2024 at 01:16:05AM GMT, viro@kernel.org wrote:
> From: Al Viro <viro@zeniv.linux.org.uk>
>
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
Reviewed-by: Christian Brauner <brauner@kernel.org>
^ permalink raw reply [flat|nested] 134+ messages in thread
* [PATCH 20/39] introduce "fd_pos" class, convert fdget_pos() users to it.
2024-07-30 5:15 ` [PATCH 01/39] memcg_write_event_control(): fix a user-triggerable oops viro
` (17 preceding siblings ...)
2024-07-30 5:16 ` [PATCH 19/39] fdget_raw() users: switch to CLASS(fd_raw, ...) viro
@ 2024-07-30 5:16 ` viro
2024-08-07 10:36 ` Christian Brauner
2024-07-30 5:16 ` [PATCH 21/39] o2hb_region_dev_store(): avoid goto around fdget()/fdput() viro
` (19 subsequent siblings)
38 siblings, 1 reply; 134+ messages in thread
From: viro @ 2024-07-30 5:16 UTC (permalink / raw)
To: linux-fsdevel; +Cc: amir73il, bpf, brauner, cgroups, kvm, netdev, torvalds
From: Al Viro <viro@zeniv.linux.org.uk>
fdget_pos() for constructor, fdput_pos() for cleanup, all users of
fd..._pos() converted trivially.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
arch/alpha/kernel/osf_sys.c | 5 ++---
fs/read_write.c | 34 +++++++++++++---------------------
fs/readdir.c | 28 ++++++++++------------------
include/linux/file.h | 1 +
4 files changed, 26 insertions(+), 42 deletions(-)
diff --git a/arch/alpha/kernel/osf_sys.c b/arch/alpha/kernel/osf_sys.c
index 56fea57f9642..09d82f0963d0 100644
--- a/arch/alpha/kernel/osf_sys.c
+++ b/arch/alpha/kernel/osf_sys.c
@@ -152,7 +152,7 @@ SYSCALL_DEFINE4(osf_getdirentries, unsigned int, fd,
long __user *, basep)
{
int error;
- struct fd arg = fdget_pos(fd);
+ CLASS(fd_pos, arg)(fd);
struct osf_dirent_callback buf = {
.ctx.actor = osf_filldir,
.dirent = dirent,
@@ -160,7 +160,7 @@ SYSCALL_DEFINE4(osf_getdirentries, unsigned int, fd,
.count = count
};
- if (!fd_file(arg))
+ if (fd_empty(arg))
return -EBADF;
error = iterate_dir(fd_file(arg), &buf.ctx);
@@ -169,7 +169,6 @@ SYSCALL_DEFINE4(osf_getdirentries, unsigned int, fd,
if (count != buf.count)
error = count - buf.count;
- fdput_pos(arg);
return error;
}
diff --git a/fs/read_write.c b/fs/read_write.c
index 59d6d0dee579..7e5a2e50e12b 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -293,8 +293,8 @@ EXPORT_SYMBOL(vfs_llseek);
static off_t ksys_lseek(unsigned int fd, off_t offset, unsigned int whence)
{
off_t retval;
- struct fd f = fdget_pos(fd);
- if (!fd_file(f))
+ CLASS(fd_pos, f)(fd);
+ if (fd_empty(f))
return -EBADF;
retval = -EINVAL;
@@ -304,7 +304,6 @@ static off_t ksys_lseek(unsigned int fd, off_t offset, unsigned int whence)
if (res != (loff_t)retval)
retval = -EOVERFLOW; /* LFS: should only happen on 32 bit platforms */
}
- fdput_pos(f);
return retval;
}
@@ -327,15 +326,14 @@ SYSCALL_DEFINE5(llseek, unsigned int, fd, unsigned long, offset_high,
unsigned int, whence)
{
int retval;
- struct fd f = fdget_pos(fd);
+ CLASS(fd_pos, f)(fd);
loff_t offset;
- if (!fd_file(f))
+ if (fd_empty(f))
return -EBADF;
- retval = -EINVAL;
if (whence > SEEK_MAX)
- goto out_putf;
+ return -EINVAL;
offset = vfs_llseek(fd_file(f), ((loff_t) offset_high << 32) | offset_low,
whence);
@@ -346,8 +344,6 @@ SYSCALL_DEFINE5(llseek, unsigned int, fd, unsigned long, offset_high,
if (!copy_to_user(result, &offset, sizeof(offset)))
retval = 0;
}
-out_putf:
- fdput_pos(f);
return retval;
}
#endif
@@ -607,10 +603,10 @@ static inline loff_t *file_ppos(struct file *file)
ssize_t ksys_read(unsigned int fd, char __user *buf, size_t count)
{
- struct fd f = fdget_pos(fd);
+ CLASS(fd_pos, f)(fd);
ssize_t ret = -EBADF;
- if (fd_file(f)) {
+ if (!fd_empty(f)) {
loff_t pos, *ppos = file_ppos(fd_file(f));
if (ppos) {
pos = *ppos;
@@ -619,7 +615,6 @@ ssize_t ksys_read(unsigned int fd, char __user *buf, size_t count)
ret = vfs_read(fd_file(f), buf, count, ppos);
if (ret >= 0 && ppos)
fd_file(f)->f_pos = pos;
- fdput_pos(f);
}
return ret;
}
@@ -631,10 +626,10 @@ SYSCALL_DEFINE3(read, unsigned int, fd, char __user *, buf, size_t, count)
ssize_t ksys_write(unsigned int fd, const char __user *buf, size_t count)
{
- struct fd f = fdget_pos(fd);
+ CLASS(fd_pos, f)(fd);
ssize_t ret = -EBADF;
- if (fd_file(f)) {
+ if (!fd_empty(f)) {
loff_t pos, *ppos = file_ppos(fd_file(f));
if (ppos) {
pos = *ppos;
@@ -643,7 +638,6 @@ ssize_t ksys_write(unsigned int fd, const char __user *buf, size_t count)
ret = vfs_write(fd_file(f), buf, count, ppos);
if (ret >= 0 && ppos)
fd_file(f)->f_pos = pos;
- fdput_pos(f);
}
return ret;
@@ -982,10 +976,10 @@ static ssize_t vfs_writev(struct file *file, const struct iovec __user *vec,
static ssize_t do_readv(unsigned long fd, const struct iovec __user *vec,
unsigned long vlen, rwf_t flags)
{
- struct fd f = fdget_pos(fd);
+ CLASS(fd_pos, f)(fd);
ssize_t ret = -EBADF;
- if (fd_file(f)) {
+ if (!fd_empty(f)) {
loff_t pos, *ppos = file_ppos(fd_file(f));
if (ppos) {
pos = *ppos;
@@ -994,7 +988,6 @@ static ssize_t do_readv(unsigned long fd, const struct iovec __user *vec,
ret = vfs_readv(fd_file(f), vec, vlen, ppos, flags);
if (ret >= 0 && ppos)
fd_file(f)->f_pos = pos;
- fdput_pos(f);
}
if (ret > 0)
@@ -1006,10 +999,10 @@ static ssize_t do_readv(unsigned long fd, const struct iovec __user *vec,
static ssize_t do_writev(unsigned long fd, const struct iovec __user *vec,
unsigned long vlen, rwf_t flags)
{
- struct fd f = fdget_pos(fd);
+ CLASS(fd_pos, f)(fd);
ssize_t ret = -EBADF;
- if (fd_file(f)) {
+ if (!fd_empty(f)) {
loff_t pos, *ppos = file_ppos(fd_file(f));
if (ppos) {
pos = *ppos;
@@ -1018,7 +1011,6 @@ static ssize_t do_writev(unsigned long fd, const struct iovec __user *vec,
ret = vfs_writev(fd_file(f), vec, vlen, ppos, flags);
if (ret >= 0 && ppos)
fd_file(f)->f_pos = pos;
- fdput_pos(f);
}
if (ret > 0)
diff --git a/fs/readdir.c b/fs/readdir.c
index 6d29cab8576e..0038efda417b 100644
--- a/fs/readdir.c
+++ b/fs/readdir.c
@@ -219,20 +219,19 @@ SYSCALL_DEFINE3(old_readdir, unsigned int, fd,
struct old_linux_dirent __user *, dirent, unsigned int, count)
{
int error;
- struct fd f = fdget_pos(fd);
+ CLASS(fd_pos, f)(fd);
struct readdir_callback buf = {
.ctx.actor = fillonedir,
.dirent = dirent
};
- if (!fd_file(f))
+ if (fd_empty(f))
return -EBADF;
error = iterate_dir(fd_file(f), &buf.ctx);
if (buf.result)
error = buf.result;
- fdput_pos(f);
return error;
}
@@ -309,7 +308,7 @@ static bool filldir(struct dir_context *ctx, const char *name, int namlen,
SYSCALL_DEFINE3(getdents, unsigned int, fd,
struct linux_dirent __user *, dirent, unsigned int, count)
{
- struct fd f;
+ CLASS(fd_pos, f)(fd);
struct getdents_callback buf = {
.ctx.actor = filldir,
.count = count,
@@ -317,8 +316,7 @@ SYSCALL_DEFINE3(getdents, unsigned int, fd,
};
int error;
- f = fdget_pos(fd);
- if (!fd_file(f))
+ if (fd_empty(f))
return -EBADF;
error = iterate_dir(fd_file(f), &buf.ctx);
@@ -333,7 +331,6 @@ SYSCALL_DEFINE3(getdents, unsigned int, fd,
else
error = count - buf.count;
}
- fdput_pos(f);
return error;
}
@@ -392,7 +389,7 @@ static bool filldir64(struct dir_context *ctx, const char *name, int namlen,
SYSCALL_DEFINE3(getdents64, unsigned int, fd,
struct linux_dirent64 __user *, dirent, unsigned int, count)
{
- struct fd f;
+ CLASS(fd_pos, f)(fd);
struct getdents_callback64 buf = {
.ctx.actor = filldir64,
.count = count,
@@ -400,8 +397,7 @@ SYSCALL_DEFINE3(getdents64, unsigned int, fd,
};
int error;
- f = fdget_pos(fd);
- if (!fd_file(f))
+ if (fd_empty(f))
return -EBADF;
error = iterate_dir(fd_file(f), &buf.ctx);
@@ -417,7 +413,6 @@ SYSCALL_DEFINE3(getdents64, unsigned int, fd,
else
error = count - buf.count;
}
- fdput_pos(f);
return error;
}
@@ -477,20 +472,19 @@ COMPAT_SYSCALL_DEFINE3(old_readdir, unsigned int, fd,
struct compat_old_linux_dirent __user *, dirent, unsigned int, count)
{
int error;
- struct fd f = fdget_pos(fd);
+ CLASS(fd_pos, f)(fd);
struct compat_readdir_callback buf = {
.ctx.actor = compat_fillonedir,
.dirent = dirent
};
- if (!fd_file(f))
+ if (fd_empty(f))
return -EBADF;
error = iterate_dir(fd_file(f), &buf.ctx);
if (buf.result)
error = buf.result;
- fdput_pos(f);
return error;
}
@@ -560,7 +554,7 @@ static bool compat_filldir(struct dir_context *ctx, const char *name, int namlen
COMPAT_SYSCALL_DEFINE3(getdents, unsigned int, fd,
struct compat_linux_dirent __user *, dirent, unsigned int, count)
{
- struct fd f;
+ CLASS(fd_pos, f)(fd);
struct compat_getdents_callback buf = {
.ctx.actor = compat_filldir,
.current_dir = dirent,
@@ -568,8 +562,7 @@ COMPAT_SYSCALL_DEFINE3(getdents, unsigned int, fd,
};
int error;
- f = fdget_pos(fd);
- if (!fd_file(f))
+ if (fd_empty(f))
return -EBADF;
error = iterate_dir(fd_file(f), &buf.ctx);
@@ -584,7 +577,6 @@ COMPAT_SYSCALL_DEFINE3(getdents, unsigned int, fd,
else
error = count - buf.count;
}
- fdput_pos(f);
return error;
}
#endif
diff --git a/include/linux/file.h b/include/linux/file.h
index d3165d7a8112..2a0f4e13e1af 100644
--- a/include/linux/file.h
+++ b/include/linux/file.h
@@ -108,6 +108,7 @@ static inline void fdput_pos(struct fd f)
DEFINE_CLASS(fd, struct fd, fdput(_T), fdget(fd), int fd)
DEFINE_CLASS(fd_raw, struct fd, fdput(_T), fdget_raw(fd), int fd)
+DEFINE_CLASS(fd_pos, struct fd, fdput_pos(_T), fdget_pos(fd), int fd)
extern int f_dupfd(unsigned int from, struct file *file, unsigned flags);
extern int replace_fd(unsigned fd, struct file *file, unsigned flags);
--
2.39.2
^ permalink raw reply related [flat|nested] 134+ messages in thread* Re: [PATCH 20/39] introduce "fd_pos" class, convert fdget_pos() users to it.
2024-07-30 5:16 ` [PATCH 20/39] introduce "fd_pos" class, convert fdget_pos() users to it viro
@ 2024-08-07 10:36 ` Christian Brauner
0 siblings, 0 replies; 134+ messages in thread
From: Christian Brauner @ 2024-08-07 10:36 UTC (permalink / raw)
To: viro; +Cc: linux-fsdevel, amir73il, bpf, cgroups, kvm, netdev, torvalds
On Tue, Jul 30, 2024 at 01:16:06AM GMT, viro@kernel.org wrote:
> From: Al Viro <viro@zeniv.linux.org.uk>
>
> fdget_pos() for constructor, fdput_pos() for cleanup, all users of
> fd..._pos() converted trivially.
>
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
Reviewed-by: Christian Brauner <brauner@kernel.org>
^ permalink raw reply [flat|nested] 134+ messages in thread
* [PATCH 21/39] o2hb_region_dev_store(): avoid goto around fdget()/fdput()
2024-07-30 5:15 ` [PATCH 01/39] memcg_write_event_control(): fix a user-triggerable oops viro
` (18 preceding siblings ...)
2024-07-30 5:16 ` [PATCH 20/39] introduce "fd_pos" class, convert fdget_pos() users to it viro
@ 2024-07-30 5:16 ` viro
2024-07-30 5:16 ` [PATCH 22/39] privcmd_ioeventfd_assign(): don't open-code eventfd_ctx_fdget() viro
` (18 subsequent siblings)
38 siblings, 0 replies; 134+ messages in thread
From: viro @ 2024-07-30 5:16 UTC (permalink / raw)
To: linux-fsdevel; +Cc: amir73il, bpf, brauner, cgroups, kvm, netdev, torvalds
From: Al Viro <viro@zeniv.linux.org.uk>
Preparation for CLASS(fd) conversion.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
fs/ocfs2/cluster/heartbeat.c | 11 +++++------
1 file changed, 5 insertions(+), 6 deletions(-)
diff --git a/fs/ocfs2/cluster/heartbeat.c b/fs/ocfs2/cluster/heartbeat.c
index 4b9f45d7049e..bc55340a60c3 100644
--- a/fs/ocfs2/cluster/heartbeat.c
+++ b/fs/ocfs2/cluster/heartbeat.c
@@ -1770,23 +1770,23 @@ static ssize_t o2hb_region_dev_store(struct config_item *item,
int live_threshold;
if (reg->hr_bdev_file)
- goto out;
+ return -EINVAL;
/* We can't heartbeat without having had our node number
* configured yet. */
if (o2nm_this_node() == O2NM_MAX_NODES)
- goto out;
+ return -EINVAL;
fd = simple_strtol(p, &p, 0);
if (!p || (*p && (*p != '\n')))
- goto out;
+ return -EINVAL;
if (fd < 0 || fd >= INT_MAX)
- goto out;
+ return -EINVAL;
f = fdget(fd);
if (fd_file(f) == NULL)
- goto out;
+ return -EINVAL;
if (reg->hr_blocks == 0 || reg->hr_start_block == 0 ||
reg->hr_block_bytes == 0)
@@ -1908,7 +1908,6 @@ static ssize_t o2hb_region_dev_store(struct config_item *item,
}
out2:
fdput(f);
-out:
return ret;
}
--
2.39.2
^ permalink raw reply related [flat|nested] 134+ messages in thread* [PATCH 22/39] privcmd_ioeventfd_assign(): don't open-code eventfd_ctx_fdget()
2024-07-30 5:15 ` [PATCH 01/39] memcg_write_event_control(): fix a user-triggerable oops viro
` (19 preceding siblings ...)
2024-07-30 5:16 ` [PATCH 21/39] o2hb_region_dev_store(): avoid goto around fdget()/fdput() viro
@ 2024-07-30 5:16 ` viro
2024-07-30 5:16 ` [PATCH 23/39] fdget(), trivial conversions viro
` (17 subsequent siblings)
38 siblings, 0 replies; 134+ messages in thread
From: viro @ 2024-07-30 5:16 UTC (permalink / raw)
To: linux-fsdevel; +Cc: amir73il, bpf, brauner, cgroups, kvm, netdev, torvalds
From: Al Viro <viro@zeniv.linux.org.uk>
just call it, same as privcmd_ioeventfd_deassign() does...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
drivers/xen/privcmd.c | 11 +----------
1 file changed, 1 insertion(+), 10 deletions(-)
diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
index 54e4f285c0f4..ba02b732fa49 100644
--- a/drivers/xen/privcmd.c
+++ b/drivers/xen/privcmd.c
@@ -1324,7 +1324,6 @@ static int privcmd_ioeventfd_assign(struct privcmd_ioeventfd *ioeventfd)
struct privcmd_kernel_ioeventfd *kioeventfd;
struct privcmd_kernel_ioreq *kioreq;
unsigned long flags;
- struct fd f;
int ret;
/* Check for range overflow */
@@ -1344,15 +1343,7 @@ static int privcmd_ioeventfd_assign(struct privcmd_ioeventfd *ioeventfd)
if (!kioeventfd)
return -ENOMEM;
- f = fdget(ioeventfd->event_fd);
- if (!fd_file(f)) {
- ret = -EBADF;
- goto error_kfree;
- }
-
- kioeventfd->eventfd = eventfd_ctx_fileget(fd_file(f));
- fdput(f);
-
+ kioeventfd->eventfd = eventfd_ctx_fdget(ioeventfd->event_fd);
if (IS_ERR(kioeventfd->eventfd)) {
ret = PTR_ERR(kioeventfd->eventfd);
goto error_kfree;
--
2.39.2
^ permalink raw reply related [flat|nested] 134+ messages in thread* [PATCH 23/39] fdget(), trivial conversions
2024-07-30 5:15 ` [PATCH 01/39] memcg_write_event_control(): fix a user-triggerable oops viro
` (20 preceding siblings ...)
2024-07-30 5:16 ` [PATCH 22/39] privcmd_ioeventfd_assign(): don't open-code eventfd_ctx_fdget() viro
@ 2024-07-30 5:16 ` viro
2024-08-07 10:37 ` Christian Brauner
2024-07-30 5:16 ` [PATCH 24/39] fdget(), more " viro
` (16 subsequent siblings)
38 siblings, 1 reply; 134+ messages in thread
From: viro @ 2024-07-30 5:16 UTC (permalink / raw)
To: linux-fsdevel; +Cc: amir73il, bpf, brauner, cgroups, kvm, netdev, torvalds
From: Al Viro <viro@zeniv.linux.org.uk>
fdget() is the first thing done in scope, all matching fdput() are
immediately followed by leaving the scope.
[conflict in fs/fhandle.c and a trival one in kernel/bpf/syscall.c]
[conflict in fs/xattr.c]
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
arch/powerpc/kvm/book3s_64_vio.c | 21 +++---------
arch/powerpc/kvm/powerpc.c | 24 ++++---------
arch/powerpc/platforms/cell/spu_syscalls.c | 6 ++--
arch/x86/kernel/cpu/sgx/main.c | 10 ++----
arch/x86/kvm/svm/sev.c | 39 ++++++++--------------
drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c | 23 ++++---------
drivers/gpu/drm/drm_syncobj.c | 9 ++---
drivers/media/rc/lirc_dev.c | 13 +++-----
fs/btrfs/ioctl.c | 5 ++-
fs/eventfd.c | 9 ++---
fs/eventpoll.c | 23 ++++---------
fs/fhandle.c | 5 ++-
fs/ioctl.c | 23 +++++--------
fs/kernel_read_file.c | 12 +++----
fs/notify/fanotify/fanotify_user.c | 15 +++------
fs/notify/inotify/inotify_user.c | 17 +++-------
fs/open.c | 36 +++++++++-----------
fs/read_write.c | 28 +++++-----------
fs/signalfd.c | 9 ++---
fs/sync.c | 29 ++++++----------
fs/xattr.c | 35 ++++++++-----------
io_uring/sqpoll.c | 29 +++++-----------
kernel/bpf/btf.c | 11 ++----
kernel/bpf/syscall.c | 10 ++----
kernel/bpf/token.c | 9 ++---
kernel/events/core.c | 14 +++-----
kernel/nsproxy.c | 5 ++-
kernel/pid.c | 7 ++--
kernel/sys.c | 15 +++------
kernel/watch_queue.c | 6 ++--
mm/fadvise.c | 10 ++----
mm/readahead.c | 17 +++-------
net/core/net_namespace.c | 10 +++---
security/landlock/syscalls.c | 26 +++++----------
virt/kvm/vfio.c | 8 ++---
35 files changed, 187 insertions(+), 381 deletions(-)
diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
index 34c0adb9fdbf..742aa58a7c7e 100644
--- a/arch/powerpc/kvm/book3s_64_vio.c
+++ b/arch/powerpc/kvm/book3s_64_vio.c
@@ -115,10 +115,9 @@ long kvm_spapr_tce_attach_iommu_group(struct kvm *kvm, int tablefd,
struct iommu_table_group *table_group;
long i;
struct kvmppc_spapr_tce_iommu_table *stit;
- struct fd f;
+ CLASS(fd, f)(tablefd);
- f = fdget(tablefd);
- if (!fd_file(f))
+ if (fd_empty(f))
return -EBADF;
rcu_read_lock();
@@ -130,16 +129,12 @@ long kvm_spapr_tce_attach_iommu_group(struct kvm *kvm, int tablefd,
}
rcu_read_unlock();
- if (!found) {
- fdput(f);
+ if (!found)
return -EINVAL;
- }
table_group = iommu_group_get_iommudata(grp);
- if (WARN_ON(!table_group)) {
- fdput(f);
+ if (WARN_ON(!table_group))
return -EFAULT;
- }
for (i = 0; i < IOMMU_TABLE_GROUP_MAX_TABLES; ++i) {
struct iommu_table *tbltmp = table_group->tables[i];
@@ -160,10 +155,8 @@ long kvm_spapr_tce_attach_iommu_group(struct kvm *kvm, int tablefd,
break;
}
}
- if (!tbl) {
- fdput(f);
+ if (!tbl)
return -EINVAL;
- }
rcu_read_lock();
list_for_each_entry_rcu(stit, &stt->iommu_tables, next) {
@@ -174,7 +167,6 @@ long kvm_spapr_tce_attach_iommu_group(struct kvm *kvm, int tablefd,
/* stit is being destroyed */
iommu_tce_table_put(tbl);
rcu_read_unlock();
- fdput(f);
return -ENOTTY;
}
/*
@@ -182,7 +174,6 @@ long kvm_spapr_tce_attach_iommu_group(struct kvm *kvm, int tablefd,
* its KVM reference counter and can return.
*/
rcu_read_unlock();
- fdput(f);
return 0;
}
rcu_read_unlock();
@@ -190,7 +181,6 @@ long kvm_spapr_tce_attach_iommu_group(struct kvm *kvm, int tablefd,
stit = kzalloc(sizeof(*stit), GFP_KERNEL);
if (!stit) {
iommu_tce_table_put(tbl);
- fdput(f);
return -ENOMEM;
}
@@ -199,7 +189,6 @@ long kvm_spapr_tce_attach_iommu_group(struct kvm *kvm, int tablefd,
list_add_rcu(&stit->next, &stt->iommu_tables);
- fdput(f);
return 0;
}
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index f14329989e9a..b3b37ea77849 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -1933,12 +1933,11 @@ static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu,
#endif
#ifdef CONFIG_KVM_MPIC
case KVM_CAP_IRQ_MPIC: {
- struct fd f;
+ CLASS(fd, f)(cap->args[0]);
struct kvm_device *dev;
r = -EBADF;
- f = fdget(cap->args[0]);
- if (!fd_file(f))
+ if (fd_empty(f))
break;
r = -EPERM;
@@ -1946,18 +1945,16 @@ static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu,
if (dev)
r = kvmppc_mpic_connect_vcpu(dev, vcpu, cap->args[1]);
- fdput(f);
break;
}
#endif
#ifdef CONFIG_KVM_XICS
case KVM_CAP_IRQ_XICS: {
- struct fd f;
+ CLASS(fd, f)(cap->args[0]);
struct kvm_device *dev;
r = -EBADF;
- f = fdget(cap->args[0]);
- if (!fd_file(f))
+ if (fd_empty(f))
break;
r = -EPERM;
@@ -1968,34 +1965,27 @@ static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu,
else
r = kvmppc_xics_connect_vcpu(dev, vcpu, cap->args[1]);
}
-
- fdput(f);
break;
}
#endif /* CONFIG_KVM_XICS */
#ifdef CONFIG_KVM_XIVE
case KVM_CAP_PPC_IRQ_XIVE: {
- struct fd f;
+ CLASS(fd, f)(cap->args[0]);
struct kvm_device *dev;
r = -EBADF;
- f = fdget(cap->args[0]);
- if (!fd_file(f))
+ if (fd_empty(f))
break;
r = -ENXIO;
- if (!xive_enabled()) {
- fdput(f);
+ if (!xive_enabled())
break;
- }
r = -EPERM;
dev = kvm_device_from_filp(fd_file(f));
if (dev)
r = kvmppc_xive_native_connect_vcpu(dev, vcpu,
cap->args[1]);
-
- fdput(f);
break;
}
#endif /* CONFIG_KVM_XIVE */
diff --git a/arch/powerpc/platforms/cell/spu_syscalls.c b/arch/powerpc/platforms/cell/spu_syscalls.c
index cd7d42fc12a6..da4fad7fc8bf 100644
--- a/arch/powerpc/platforms/cell/spu_syscalls.c
+++ b/arch/powerpc/platforms/cell/spu_syscalls.c
@@ -64,12 +64,10 @@ SYSCALL_DEFINE4(spu_create, const char __user *, name, unsigned int, flags,
return -ENOSYS;
if (flags & SPU_CREATE_AFFINITY_SPU) {
- struct fd neighbor = fdget(neighbor_fd);
+ CLASS(fd, neighbor)(neighbor_fd);
ret = -EBADF;
- if (fd_file(neighbor)) {
+ if (!fd_empty(neighbor))
ret = calls->create_thread(name, flags, mode, fd_file(neighbor));
- fdput(neighbor);
- }
} else
ret = calls->create_thread(name, flags, mode, NULL);
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index d01deb386395..d91284001162 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -893,19 +893,15 @@ static struct miscdevice sgx_dev_provision = {
int sgx_set_attribute(unsigned long *allowed_attributes,
unsigned int attribute_fd)
{
- struct fd f = fdget(attribute_fd);
+ CLASS(fd, f)(attribute_fd);
- if (!fd_file(f))
+ if (fd_empty(f))
return -EINVAL;
- if (fd_file(f)->f_op != &sgx_provision_fops) {
- fdput(f);
+ if (fd_file(f)->f_op != &sgx_provision_fops)
return -EINVAL;
- }
*allowed_attributes |= SGX_ATTR_PROVISIONKEY;
-
- fdput(f);
return 0;
}
EXPORT_SYMBOL_GPL(sgx_set_attribute);
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 0e38b5223263..65d6670a5969 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -530,17 +530,12 @@ static int sev_bind_asid(struct kvm *kvm, unsigned int handle, int *error)
static int __sev_issue_cmd(int fd, int id, void *data, int *error)
{
- struct fd f;
- int ret;
+ CLASS(fd, f)(fd);
- f = fdget(fd);
- if (!fd_file(f))
+ if (fd_empty(f))
return -EBADF;
- ret = sev_issue_cmd_external_user(fd_file(f), id, data, error);
-
- fdput(f);
- return ret;
+ return sev_issue_cmd_external_user(fd_file(f), id, data, error);
}
static int sev_issue_cmd(struct kvm *kvm, int id, void *data, int *error)
@@ -2073,23 +2068,21 @@ int sev_vm_move_enc_context_from(struct kvm *kvm, unsigned int source_fd)
{
struct kvm_sev_info *dst_sev = &to_kvm_svm(kvm)->sev_info;
struct kvm_sev_info *src_sev, *cg_cleanup_sev;
- struct fd f = fdget(source_fd);
+ CLASS(fd, f)(source_fd);
struct kvm *source_kvm;
bool charged = false;
int ret;
- if (!fd_file(f))
+ if (fd_empty(f))
return -EBADF;
- if (!file_is_kvm(fd_file(f))) {
- ret = -EBADF;
- goto out_fput;
- }
+ if (!file_is_kvm(fd_file(f)))
+ return -EBADF;
source_kvm = fd_file(f)->private_data;
ret = sev_lock_two_vms(kvm, source_kvm);
if (ret)
- goto out_fput;
+ return ret;
if (kvm->arch.vm_type != source_kvm->arch.vm_type ||
sev_guest(kvm) || !sev_guest(source_kvm)) {
@@ -2136,8 +2129,6 @@ int sev_vm_move_enc_context_from(struct kvm *kvm, unsigned int source_fd)
cg_cleanup_sev->misc_cg = NULL;
out_unlock:
sev_unlock_two_vms(kvm, source_kvm);
-out_fput:
- fdput(f);
return ret;
}
@@ -2796,23 +2787,21 @@ int sev_mem_enc_unregister_region(struct kvm *kvm,
int sev_vm_copy_enc_context_from(struct kvm *kvm, unsigned int source_fd)
{
- struct fd f = fdget(source_fd);
+ CLASS(fd, f)(source_fd);
struct kvm *source_kvm;
struct kvm_sev_info *source_sev, *mirror_sev;
int ret;
- if (!fd_file(f))
+ if (fd_empty(f))
return -EBADF;
- if (!file_is_kvm(fd_file(f))) {
- ret = -EBADF;
- goto e_source_fput;
- }
+ if (!file_is_kvm(fd_file(f)))
+ return -EBADF;
source_kvm = fd_file(f)->private_data;
ret = sev_lock_two_vms(kvm, source_kvm);
if (ret)
- goto e_source_fput;
+ return ret;
/*
* Mirrors of mirrors should work, but let's not get silly. Also
@@ -2855,8 +2844,6 @@ int sev_vm_copy_enc_context_from(struct kvm *kvm, unsigned int source_fd)
e_unlock:
sev_unlock_two_vms(kvm, source_kvm);
-e_source_fput:
- fdput(f);
return ret;
}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c
index a9298cb8d19a..570634654489 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sched.c
@@ -36,21 +36,19 @@ static int amdgpu_sched_process_priority_override(struct amdgpu_device *adev,
int fd,
int32_t priority)
{
- struct fd f = fdget(fd);
+ CLASS(fd, f)(fd);
struct amdgpu_fpriv *fpriv;
struct amdgpu_ctx_mgr *mgr;
struct amdgpu_ctx *ctx;
uint32_t id;
int r;
- if (!fd_file(f))
+ if (fd_empty(f))
return -EINVAL;
r = amdgpu_file_to_fpriv(fd_file(f), &fpriv);
- if (r) {
- fdput(f);
+ if (r)
return r;
- }
mgr = &fpriv->ctx_mgr;
mutex_lock(&mgr->lock);
@@ -58,7 +56,6 @@ static int amdgpu_sched_process_priority_override(struct amdgpu_device *adev,
amdgpu_ctx_priority_override(ctx, priority);
mutex_unlock(&mgr->lock);
- fdput(f);
return 0;
}
@@ -67,31 +64,25 @@ static int amdgpu_sched_context_priority_override(struct amdgpu_device *adev,
unsigned ctx_id,
int32_t priority)
{
- struct fd f = fdget(fd);
+ CLASS(fd, f)(fd);
struct amdgpu_fpriv *fpriv;
struct amdgpu_ctx *ctx;
int r;
- if (!fd_file(f))
+ if (fd_empty(f))
return -EINVAL;
r = amdgpu_file_to_fpriv(fd_file(f), &fpriv);
- if (r) {
- fdput(f);
+ if (r)
return r;
- }
ctx = amdgpu_ctx_get(fpriv, ctx_id);
- if (!ctx) {
- fdput(f);
+ if (!ctx)
return -EINVAL;
- }
amdgpu_ctx_priority_override(ctx, priority);
amdgpu_ctx_put(ctx);
- fdput(f);
-
return 0;
}
diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index 7fb31ca3b5fc..4eaebd69253b 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -712,16 +712,14 @@ static int drm_syncobj_fd_to_handle(struct drm_file *file_private,
int fd, u32 *handle)
{
struct drm_syncobj *syncobj;
- struct fd f = fdget(fd);
+ CLASS(fd, f)(fd);
int ret;
- if (!fd_file(f))
+ if (fd_empty(f))
return -EINVAL;
- if (fd_file(f)->f_op != &drm_syncobj_file_fops) {
- fdput(f);
+ if (fd_file(f)->f_op != &drm_syncobj_file_fops)
return -EINVAL;
- }
/* take a reference to put in the idr */
syncobj = fd_file(f)->private_data;
@@ -739,7 +737,6 @@ static int drm_syncobj_fd_to_handle(struct drm_file *file_private,
} else
drm_syncobj_put(syncobj);
- fdput(f);
return ret;
}
diff --git a/drivers/media/rc/lirc_dev.c b/drivers/media/rc/lirc_dev.c
index b8dfd530fab7..af6337489d21 100644
--- a/drivers/media/rc/lirc_dev.c
+++ b/drivers/media/rc/lirc_dev.c
@@ -816,28 +816,23 @@ void __exit lirc_dev_exit(void)
struct rc_dev *rc_dev_get_from_fd(int fd, bool write)
{
- struct fd f = fdget(fd);
+ CLASS(fd, f)(fd);
struct lirc_fh *fh;
struct rc_dev *dev;
- if (!fd_file(f))
+ if (fd_empty(f))
return ERR_PTR(-EBADF);
- if (fd_file(f)->f_op != &lirc_fops) {
- fdput(f);
+ if (fd_file(f)->f_op != &lirc_fops)
return ERR_PTR(-EINVAL);
- }
- if (write && !(fd_file(f)->f_mode & FMODE_WRITE)) {
- fdput(f);
+ if (write && !(fd_file(f)->f_mode & FMODE_WRITE))
return ERR_PTR(-EPERM);
- }
fh = fd_file(f)->private_data;
dev = fh->rc;
get_device(&dev->dev);
- fdput(f);
return dev;
}
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 32ddd3d31719..04bee97a1ca0 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -1310,9 +1310,9 @@ static noinline int __btrfs_ioctl_snap_create(struct file *file,
ret = btrfs_mksubvol(&file->f_path, idmap, name,
namelen, NULL, readonly, inherit);
} else {
- struct fd src = fdget(fd);
+ CLASS(fd, src)(fd);
struct inode *src_inode;
- if (!fd_file(src)) {
+ if (fd_empty(src)) {
ret = -EINVAL;
goto out_drop_write;
}
@@ -1343,7 +1343,6 @@ static noinline int __btrfs_ioctl_snap_create(struct file *file,
BTRFS_I(src_inode)->root,
readonly, inherit);
}
- fdput(src);
}
out_drop_write:
mnt_drop_write_file(file);
diff --git a/fs/eventfd.c b/fs/eventfd.c
index 22c934f3a080..76129bfcd663 100644
--- a/fs/eventfd.c
+++ b/fs/eventfd.c
@@ -347,13 +347,10 @@ EXPORT_SYMBOL_GPL(eventfd_fget);
*/
struct eventfd_ctx *eventfd_ctx_fdget(int fd)
{
- struct eventfd_ctx *ctx;
- struct fd f = fdget(fd);
- if (!fd_file(f))
+ CLASS(fd, f)(fd);
+ if (fd_empty(f))
return ERR_PTR(-EBADF);
- ctx = eventfd_ctx_fileget(fd_file(f));
- fdput(f);
- return ctx;
+ return eventfd_ctx_fileget(fd_file(f));
}
EXPORT_SYMBOL_GPL(eventfd_ctx_fdget);
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 28d1a754cf33..4609f7342005 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -2259,25 +2259,22 @@ int do_epoll_ctl(int epfd, int op, int fd, struct epoll_event *epds,
{
int error;
int full_check = 0;
- struct fd f, tf;
struct eventpoll *ep;
struct epitem *epi;
struct eventpoll *tep = NULL;
- error = -EBADF;
- f = fdget(epfd);
- if (!fd_file(f))
- goto error_return;
+ CLASS(fd, f)(epfd);
+ if (fd_empty(f))
+ return -EBADF;
/* Get the "struct file *" for the target file */
- tf = fdget(fd);
- if (!fd_file(tf))
- goto error_fput;
+ CLASS(fd, tf)(fd);
+ if (fd_empty(tf))
+ return -EBADF;
/* The target file descriptor must support poll */
- error = -EPERM;
if (!file_can_poll(fd_file(tf)))
- goto error_tgt_fput;
+ return -EPERM;
/* Check if EPOLLWAKEUP is allowed */
if (ep_op_has_event(op))
@@ -2396,12 +2393,6 @@ int do_epoll_ctl(int epfd, int op, int fd, struct epoll_event *epds,
loop_check_gen++;
mutex_unlock(&epnested_mutex);
}
-
- fdput(tf);
-error_fput:
- fdput(f);
-error_return:
-
return error;
}
diff --git a/fs/fhandle.c b/fs/fhandle.c
index 3f07b52874a8..5996fd47a6f7 100644
--- a/fs/fhandle.c
+++ b/fs/fhandle.c
@@ -124,12 +124,11 @@ static int get_path_from_fd(int fd, struct path *root)
path_get(root);
spin_unlock(&fs->lock);
} else {
- struct fd f = fdget(fd);
- if (!fd_file(f))
+ CLASS(fd, f)(fd);
+ if (fd_empty(f))
return -EBADF;
*root = fd_file(f)->f_path;
path_get(root);
- fdput(f);
}
return 0;
diff --git a/fs/ioctl.c b/fs/ioctl.c
index 6e0c954388d4..638a36be31c1 100644
--- a/fs/ioctl.c
+++ b/fs/ioctl.c
@@ -231,11 +231,11 @@ static int ioctl_fiemap(struct file *filp, struct fiemap __user *ufiemap)
static long ioctl_file_clone(struct file *dst_file, unsigned long srcfd,
u64 off, u64 olen, u64 destoff)
{
- struct fd src_file = fdget(srcfd);
+ CLASS(fd, src_file)(srcfd);
loff_t cloned;
int ret;
- if (!fd_file(src_file))
+ if (fd_empty(src_file))
return -EBADF;
cloned = vfs_clone_file_range(fd_file(src_file), off, dst_file, destoff,
olen, 0);
@@ -245,7 +245,6 @@ static long ioctl_file_clone(struct file *dst_file, unsigned long srcfd,
ret = -EINVAL;
else
ret = 0;
- fdput(src_file);
return ret;
}
@@ -892,22 +891,20 @@ static int do_vfs_ioctl(struct file *filp, unsigned int fd,
SYSCALL_DEFINE3(ioctl, unsigned int, fd, unsigned int, cmd, unsigned long, arg)
{
- struct fd f = fdget(fd);
+ CLASS(fd, f)(fd);
int error;
- if (!fd_file(f))
+ if (fd_empty(f))
return -EBADF;
error = security_file_ioctl(fd_file(f), cmd, arg);
if (error)
- goto out;
+ return error;
error = do_vfs_ioctl(fd_file(f), fd, cmd, arg);
if (error == -ENOIOCTLCMD)
error = vfs_ioctl(fd_file(f), cmd, arg);
-out:
- fdput(f);
return error;
}
@@ -950,15 +947,15 @@ EXPORT_SYMBOL(compat_ptr_ioctl);
COMPAT_SYSCALL_DEFINE3(ioctl, unsigned int, fd, unsigned int, cmd,
compat_ulong_t, arg)
{
- struct fd f = fdget(fd);
+ CLASS(fd, f)(fd);
int error;
- if (!fd_file(f))
+ if (fd_empty(f))
return -EBADF;
error = security_file_ioctl_compat(fd_file(f), cmd, arg);
if (error)
- goto out;
+ return error;
switch (cmd) {
/* FICLONE takes an int argument, so don't use compat_ptr() */
@@ -1009,10 +1006,6 @@ COMPAT_SYSCALL_DEFINE3(ioctl, unsigned int, fd, unsigned int, cmd,
error = -ENOTTY;
break;
}
-
- out:
- fdput(f);
-
return error;
}
#endif
diff --git a/fs/kernel_read_file.c b/fs/kernel_read_file.c
index 9ff37ae650ea..de32c95d823d 100644
--- a/fs/kernel_read_file.c
+++ b/fs/kernel_read_file.c
@@ -175,15 +175,11 @@ ssize_t kernel_read_file_from_fd(int fd, loff_t offset, void **buf,
size_t buf_size, size_t *file_size,
enum kernel_read_file_id id)
{
- struct fd f = fdget(fd);
- ssize_t ret = -EBADF;
+ CLASS(fd, f)(fd);
- if (!fd_file(f) || !(fd_file(f)->f_mode & FMODE_READ))
- goto out;
+ if (fd_empty(f) || !(fd_file(f)->f_mode & FMODE_READ))
+ return -EBADF;
- ret = kernel_read_file(fd_file(f), offset, buf, buf_size, file_size, id);
-out:
- fdput(f);
- return ret;
+ return kernel_read_file(fd_file(f), offset, buf, buf_size, file_size, id);
}
EXPORT_SYMBOL_GPL(kernel_read_file_from_fd);
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index 13454e5fd3fb..64b1adc75fe5 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -1003,22 +1003,17 @@ static int fanotify_find_path(int dfd, const char __user *filename,
dfd, filename, flags);
if (filename == NULL) {
- struct fd f = fdget(dfd);
+ CLASS(fd, f)(dfd);
- ret = -EBADF;
- if (!fd_file(f))
- goto out;
+ if (fd_empty(f))
+ return -EBADF;
- ret = -ENOTDIR;
if ((flags & FAN_MARK_ONLYDIR) &&
- !(S_ISDIR(file_inode(fd_file(f))->i_mode))) {
- fdput(f);
- goto out;
- }
+ !(S_ISDIR(file_inode(fd_file(f))->i_mode)))
+ return -ENOTDIR;
*path = fd_file(f)->f_path;
path_get(path);
- fdput(f);
} else {
unsigned int lookup_flags = 0;
diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c
index c7e451d5bd51..711830b73275 100644
--- a/fs/notify/inotify/inotify_user.c
+++ b/fs/notify/inotify/inotify_user.c
@@ -794,33 +794,26 @@ SYSCALL_DEFINE2(inotify_rm_watch, int, fd, __s32, wd)
{
struct fsnotify_group *group;
struct inotify_inode_mark *i_mark;
- struct fd f;
- int ret = -EINVAL;
+ CLASS(fd, f)(fd);
- f = fdget(fd);
- if (unlikely(!fd_file(f)))
+ if (fd_empty(f))
return -EBADF;
/* verify that this is indeed an inotify instance */
if (unlikely(fd_file(f)->f_op != &inotify_fops))
- goto out;
+ return -EINVAL;
group = fd_file(f)->private_data;
i_mark = inotify_idr_find(group, wd);
if (unlikely(!i_mark))
- goto out;
-
- ret = 0;
+ return -EINVAL;
fsnotify_destroy_mark(&i_mark->fsn_mark, group);
/* match ref taken by inotify_idr_find */
fsnotify_put_mark(&i_mark->fsn_mark);
-
-out:
- fdput(f);
- return ret;
+ return 0;
}
/*
diff --git a/fs/open.c b/fs/open.c
index 624c173a0270..6d7dd58ce095 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -350,14 +350,12 @@ EXPORT_SYMBOL_GPL(vfs_fallocate);
int ksys_fallocate(int fd, int mode, loff_t offset, loff_t len)
{
- struct fd f = fdget(fd);
- int error = -EBADF;
+ CLASS(fd, f)(fd);
- if (fd_file(f)) {
- error = vfs_fallocate(fd_file(f), mode, offset, len);
- fdput(f);
- }
- return error;
+ if (fd_empty(f))
+ return -EBADF;
+
+ return vfs_fallocate(fd_file(f), mode, offset, len);
}
SYSCALL_DEFINE4(fallocate, int, fd, int, mode, loff_t, offset, loff_t, len)
@@ -667,14 +665,12 @@ int vfs_fchmod(struct file *file, umode_t mode)
SYSCALL_DEFINE2(fchmod, unsigned int, fd, umode_t, mode)
{
- struct fd f = fdget(fd);
- int err = -EBADF;
+ CLASS(fd, f)(fd);
- if (fd_file(f)) {
- err = vfs_fchmod(fd_file(f), mode);
- fdput(f);
- }
- return err;
+ if (fd_empty(f))
+ return -EBADF;
+
+ return vfs_fchmod(fd_file(f), mode);
}
static int do_fchmodat(int dfd, const char __user *filename, umode_t mode,
@@ -861,14 +857,12 @@ int vfs_fchown(struct file *file, uid_t user, gid_t group)
int ksys_fchown(unsigned int fd, uid_t user, gid_t group)
{
- struct fd f = fdget(fd);
- int error = -EBADF;
+ CLASS(fd, f)(fd);
- if (fd_file(f)) {
- error = vfs_fchown(fd_file(f), user, group);
- fdput(f);
- }
- return error;
+ if (fd_empty(f))
+ return -EBADF;
+
+ return vfs_fchown(fd_file(f), user, group);
}
SYSCALL_DEFINE3(fchown, unsigned int, fd, uid_t, user, gid_t, group)
diff --git a/fs/read_write.c b/fs/read_write.c
index 7e5a2e50e12b..b3743e7dd93f 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1570,36 +1570,32 @@ SYSCALL_DEFINE6(copy_file_range, int, fd_in, loff_t __user *, off_in,
{
loff_t pos_in;
loff_t pos_out;
- struct fd f_in;
- struct fd f_out;
ssize_t ret = -EBADF;
- f_in = fdget(fd_in);
- if (!fd_file(f_in))
- goto out2;
+ CLASS(fd, f_in)(fd_in);
+ if (fd_empty(f_in))
+ return -EBADF;
- f_out = fdget(fd_out);
- if (!fd_file(f_out))
- goto out1;
+ CLASS(fd, f_out)(fd_out);
+ if (fd_empty(f_out))
+ return -EBADF;
- ret = -EFAULT;
if (off_in) {
if (copy_from_user(&pos_in, off_in, sizeof(loff_t)))
- goto out;
+ return -EFAULT;
} else {
pos_in = fd_file(f_in)->f_pos;
}
if (off_out) {
if (copy_from_user(&pos_out, off_out, sizeof(loff_t)))
- goto out;
+ return -EFAULT;
} else {
pos_out = fd_file(f_out)->f_pos;
}
- ret = -EINVAL;
if (flags != 0)
- goto out;
+ return -EINVAL;
ret = vfs_copy_file_range(fd_file(f_in), pos_in, fd_file(f_out), pos_out, len,
flags);
@@ -1621,12 +1617,6 @@ SYSCALL_DEFINE6(copy_file_range, int, fd_in, loff_t __user *, off_in,
fd_file(f_out)->f_pos = pos_out;
}
}
-
-out:
- fdput(f_out);
-out1:
- fdput(f_in);
-out2:
return ret;
}
diff --git a/fs/signalfd.c b/fs/signalfd.c
index 777e889ab0e8..b1844e0f58eb 100644
--- a/fs/signalfd.c
+++ b/fs/signalfd.c
@@ -288,20 +288,17 @@ static int do_signalfd4(int ufd, sigset_t *mask, int flags)
fd_install(ufd, file);
} else {
- struct fd f = fdget(ufd);
- if (!fd_file(f))
+ CLASS(fd, f)(ufd);
+ if (fd_empty(f))
return -EBADF;
ctx = fd_file(f)->private_data;
- if (fd_file(f)->f_op != &signalfd_fops) {
- fdput(f);
+ if (fd_file(f)->f_op != &signalfd_fops)
return -EINVAL;
- }
spin_lock_irq(¤t->sighand->siglock);
ctx->sigmask = *mask;
spin_unlock_irq(¤t->sighand->siglock);
wake_up(¤t->sighand->signalfd_wqh);
- fdput(f);
}
return ufd;
diff --git a/fs/sync.c b/fs/sync.c
index 67df255eb189..2955cd4c77a3 100644
--- a/fs/sync.c
+++ b/fs/sync.c
@@ -148,11 +148,11 @@ void emergency_sync(void)
*/
SYSCALL_DEFINE1(syncfs, int, fd)
{
- struct fd f = fdget(fd);
+ CLASS(fd, f)(fd);
struct super_block *sb;
int ret, ret2;
- if (!fd_file(f))
+ if (fd_empty(f))
return -EBADF;
sb = fd_file(f)->f_path.dentry->d_sb;
@@ -162,7 +162,6 @@ SYSCALL_DEFINE1(syncfs, int, fd)
ret2 = errseq_check_and_advance(&sb->s_wb_err, &fd_file(f)->f_sb_err);
- fdput(f);
return ret ? ret : ret2;
}
@@ -205,14 +204,12 @@ EXPORT_SYMBOL(vfs_fsync);
static int do_fsync(unsigned int fd, int datasync)
{
- struct fd f = fdget(fd);
- int ret = -EBADF;
+ CLASS(fd, f)(fd);
- if (fd_file(f)) {
- ret = vfs_fsync(fd_file(f), datasync);
- fdput(f);
- }
- return ret;
+ if (fd_empty(f))
+ return -EBADF;
+
+ return vfs_fsync(fd_file(f), datasync);
}
SYSCALL_DEFINE1(fsync, unsigned int, fd)
@@ -355,16 +352,12 @@ int sync_file_range(struct file *file, loff_t offset, loff_t nbytes,
int ksys_sync_file_range(int fd, loff_t offset, loff_t nbytes,
unsigned int flags)
{
- int ret;
- struct fd f;
+ CLASS(fd, f)(fd);
- ret = -EBADF;
- f = fdget(fd);
- if (fd_file(f))
- ret = sync_file_range(fd_file(f), offset, nbytes, flags);
+ if (fd_empty(f))
+ return -EBADF;
- fdput(f);
- return ret;
+ return sync_file_range(fd_file(f), offset, nbytes, flags);
}
SYSCALL_DEFINE4(sync_file_range, int, fd, loff_t, offset, loff_t, nbytes,
diff --git a/fs/xattr.c b/fs/xattr.c
index 05ec7e7d9e87..0fc813cb005c 100644
--- a/fs/xattr.c
+++ b/fs/xattr.c
@@ -697,9 +697,9 @@ SYSCALL_DEFINE5(fsetxattr, int, fd, const char __user *, name,
int error;
CLASS(fd, f)(fd);
- if (!fd_file(f))
- return -EBADF;
+ if (fd_empty(f))
+ return -EBADF;
audit_file(fd_file(f));
error = setxattr_copy(name, &ctx);
if (error)
@@ -809,16 +809,13 @@ SYSCALL_DEFINE4(lgetxattr, const char __user *, pathname,
SYSCALL_DEFINE4(fgetxattr, int, fd, const char __user *, name,
void __user *, value, size_t, size)
{
- struct fd f = fdget(fd);
- ssize_t error = -EBADF;
+ CLASS(fd, f)(fd);
- if (!fd_file(f))
- return error;
+ if (fd_empty(f))
+ return -EBADF;
audit_file(fd_file(f));
- error = getxattr(file_mnt_idmap(fd_file(f)), fd_file(f)->f_path.dentry,
+ return getxattr(file_mnt_idmap(fd_file(f)), fd_file(f)->f_path.dentry,
name, value, size);
- fdput(f);
- return error;
}
/*
@@ -885,15 +882,12 @@ SYSCALL_DEFINE3(llistxattr, const char __user *, pathname, char __user *, list,
SYSCALL_DEFINE3(flistxattr, int, fd, char __user *, list, size_t, size)
{
- struct fd f = fdget(fd);
- ssize_t error = -EBADF;
+ CLASS(fd, f)(fd);
- if (!fd_file(f))
- return error;
+ if (fd_empty(f))
+ return -EBADF;
audit_file(fd_file(f));
- error = listxattr(fd_file(f)->f_path.dentry, list, size);
- fdput(f);
- return error;
+ return listxattr(fd_file(f)->f_path.dentry, list, size);
}
/*
@@ -950,12 +944,12 @@ SYSCALL_DEFINE2(lremovexattr, const char __user *, pathname,
SYSCALL_DEFINE2(fremovexattr, int, fd, const char __user *, name)
{
- struct fd f = fdget(fd);
+ CLASS(fd, f)(fd);
char kname[XATTR_NAME_MAX + 1];
- int error = -EBADF;
+ int error;
- if (!fd_file(f))
- return error;
+ if (fd_empty(f))
+ return -EBADF;
audit_file(fd_file(f));
error = strncpy_from_user(kname, name, sizeof(kname));
@@ -970,7 +964,6 @@ SYSCALL_DEFINE2(fremovexattr, int, fd, const char __user *, name)
fd_file(f)->f_path.dentry, kname);
mnt_drop_write_file(fd_file(f));
}
- fdput(f);
return error;
}
diff --git a/io_uring/sqpoll.c b/io_uring/sqpoll.c
index ffa7d341bd95..26b1c14d5967 100644
--- a/io_uring/sqpoll.c
+++ b/io_uring/sqpoll.c
@@ -105,29 +105,21 @@ static struct io_sq_data *io_attach_sq_data(struct io_uring_params *p)
{
struct io_ring_ctx *ctx_attach;
struct io_sq_data *sqd;
- struct fd f;
+ CLASS(fd, f)(p->wq_fd);
- f = fdget(p->wq_fd);
- if (!fd_file(f))
+ if (fd_empty(f))
return ERR_PTR(-ENXIO);
- if (!io_is_uring_fops(fd_file(f))) {
- fdput(f);
+ if (!io_is_uring_fops(fd_file(f)))
return ERR_PTR(-EINVAL);
- }
ctx_attach = fd_file(f)->private_data;
sqd = ctx_attach->sq_data;
- if (!sqd) {
- fdput(f);
+ if (!sqd)
return ERR_PTR(-EINVAL);
- }
- if (sqd->task_tgid != current->tgid) {
- fdput(f);
+ if (sqd->task_tgid != current->tgid)
return ERR_PTR(-EPERM);
- }
refcount_inc(&sqd->refs);
- fdput(f);
return sqd;
}
@@ -415,16 +407,11 @@ __cold int io_sq_offload_create(struct io_ring_ctx *ctx,
/* Retain compatibility with failing for an invalid attach attempt */
if ((ctx->flags & (IORING_SETUP_ATTACH_WQ | IORING_SETUP_SQPOLL)) ==
IORING_SETUP_ATTACH_WQ) {
- struct fd f;
-
- f = fdget(p->wq_fd);
- if (!fd_file(f))
+ CLASS(fd, f)(p->wq_fd);
+ if (fd_empty(f))
return -ENXIO;
- if (!io_is_uring_fops(fd_file(f))) {
- fdput(f);
+ if (!io_is_uring_fops(fd_file(f)))
return -EINVAL;
- }
- fdput(f);
}
if (ctx->flags & IORING_SETUP_SQPOLL) {
struct task_struct *tsk;
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index 4de1e3dc2284..8afcffefbb56 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -7673,21 +7673,16 @@ int btf_new_fd(const union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size)
struct btf *btf_get_by_fd(int fd)
{
struct btf *btf;
- struct fd f;
+ CLASS(fd, f)(fd);
- f = fdget(fd);
-
- if (!fd_file(f))
+ if (fd_empty(f))
return ERR_PTR(-EBADF);
- if (fd_file(f)->f_op != &btf_fops) {
- fdput(f);
+ if (fd_file(f)->f_op != &btf_fops)
return ERR_PTR(-EINVAL);
- }
btf = fd_file(f)->private_data;
refcount_inc(&btf->refcnt);
- fdput(f);
return btf;
}
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 807042c643c8..4b39a1ea63e2 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -3197,20 +3197,16 @@ int bpf_link_new_fd(struct bpf_link *link)
struct bpf_link *bpf_link_get_from_fd(u32 ufd)
{
- struct fd f = fdget(ufd);
+ CLASS(fd, f)(ufd);
struct bpf_link *link;
- if (!fd_file(f))
+ if (fd_empty(f))
return ERR_PTR(-EBADF);
- if (fd_file(f)->f_op != &bpf_link_fops && fd_file(f)->f_op != &bpf_link_fops_poll) {
- fdput(f);
+ if (fd_file(f)->f_op != &bpf_link_fops && fd_file(f)->f_op != &bpf_link_fops_poll)
return ERR_PTR(-EINVAL);
- }
link = fd_file(f)->private_data;
bpf_link_inc(link);
- fdput(f);
-
return link;
}
EXPORT_SYMBOL(bpf_link_get_from_fd);
diff --git a/kernel/bpf/token.c b/kernel/bpf/token.c
index 9a1d356e79ed..9b92cb886d49 100644
--- a/kernel/bpf/token.c
+++ b/kernel/bpf/token.c
@@ -232,19 +232,16 @@ int bpf_token_create(union bpf_attr *attr)
struct bpf_token *bpf_token_get_from_fd(u32 ufd)
{
- struct fd f = fdget(ufd);
+ CLASS(fd, f)(ufd);
struct bpf_token *token;
- if (!fd_file(f))
+ if (fd_empty(f))
return ERR_PTR(-EBADF);
- if (fd_file(f)->f_op != &bpf_token_fops) {
- fdput(f);
+ if (fd_file(f)->f_op != &bpf_token_fops)
return ERR_PTR(-EINVAL);
- }
token = fd_file(f)->private_data;
bpf_token_inc(token);
- fdput(f);
return token;
}
diff --git a/kernel/events/core.c b/kernel/events/core.c
index dae815c30514..81c3c926e938 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -930,22 +930,20 @@ static inline int perf_cgroup_connect(int fd, struct perf_event *event,
{
struct perf_cgroup *cgrp;
struct cgroup_subsys_state *css;
- struct fd f = fdget(fd);
+ CLASS(fd, f)(fd);
int ret = 0;
- if (!fd_file(f))
+ if (fd_empty(f))
return -EBADF;
css = css_tryget_online_from_dir(fd_file(f)->f_path.dentry,
&perf_event_cgrp_subsys);
- if (IS_ERR(css)) {
- ret = PTR_ERR(css);
- goto out;
- }
+ if (IS_ERR(css))
+ return PTR_ERR(css);
ret = perf_cgroup_ensure_storage(event, css);
if (ret)
- goto out;
+ return ret;
cgrp = container_of(css, struct perf_cgroup, css);
event->cgrp = cgrp;
@@ -959,8 +957,6 @@ static inline int perf_cgroup_connect(int fd, struct perf_event *event,
perf_detach_cgroup(event);
ret = -EINVAL;
}
-out:
- fdput(f);
return ret;
}
diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
index dc952c3b05af..c9d97ed20122 100644
--- a/kernel/nsproxy.c
+++ b/kernel/nsproxy.c
@@ -545,12 +545,12 @@ static void commit_nsset(struct nsset *nsset)
SYSCALL_DEFINE2(setns, int, fd, int, flags)
{
- struct fd f = fdget(fd);
+ CLASS(fd, f)(fd);
struct ns_common *ns = NULL;
struct nsset nsset = {};
int err = 0;
- if (!fd_file(f))
+ if (fd_empty(f))
return -EBADF;
if (proc_ns_file(fd_file(f))) {
@@ -580,7 +580,6 @@ SYSCALL_DEFINE2(setns, int, fd, int, flags)
}
put_nsset(&nsset);
out:
- fdput(f);
return err;
}
diff --git a/kernel/pid.c b/kernel/pid.c
index 2715afb77eab..b5bbc1a8a6e4 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -536,11 +536,10 @@ EXPORT_SYMBOL_GPL(find_ge_pid);
struct pid *pidfd_get_pid(unsigned int fd, unsigned int *flags)
{
- struct fd f;
+ CLASS(fd, f)(fd);
struct pid *pid;
- f = fdget(fd);
- if (!fd_file(f))
+ if (fd_empty(f))
return ERR_PTR(-EBADF);
pid = pidfd_pid(fd_file(f));
@@ -548,8 +547,6 @@ struct pid *pidfd_get_pid(unsigned int fd, unsigned int *flags)
get_pid(pid);
*flags = fd_file(f)->f_flags;
}
-
- fdput(f);
return pid;
}
diff --git a/kernel/sys.c b/kernel/sys.c
index a4be1e568ff5..243d58916899 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1911,12 +1911,11 @@ SYSCALL_DEFINE1(umask, int, mask)
static int prctl_set_mm_exe_file(struct mm_struct *mm, unsigned int fd)
{
- struct fd exe;
+ CLASS(fd, exe)(fd);
struct inode *inode;
int err;
- exe = fdget(fd);
- if (!fd_file(exe))
+ if (fd_empty(exe))
return -EBADF;
inode = file_inode(fd_file(exe));
@@ -1926,18 +1925,14 @@ static int prctl_set_mm_exe_file(struct mm_struct *mm, unsigned int fd)
* sure that this one is executable as well, to avoid breaking an
* overall picture.
*/
- err = -EACCES;
if (!S_ISREG(inode->i_mode) || path_noexec(&fd_file(exe)->f_path))
- goto exit;
+ return -EACCES;
err = file_permission(fd_file(exe), MAY_EXEC);
if (err)
- goto exit;
+ return err;
- err = replace_mm_exe_file(mm, fd_file(exe));
-exit:
- fdput(exe);
- return err;
+ return replace_mm_exe_file(mm, fd_file(exe));
}
/*
diff --git a/kernel/watch_queue.c b/kernel/watch_queue.c
index d36242fd4936..1895fbc32bcb 100644
--- a/kernel/watch_queue.c
+++ b/kernel/watch_queue.c
@@ -663,16 +663,14 @@ struct watch_queue *get_watch_queue(int fd)
{
struct pipe_inode_info *pipe;
struct watch_queue *wqueue = ERR_PTR(-EINVAL);
- struct fd f;
+ CLASS(fd, f)(fd);
- f = fdget(fd);
- if (fd_file(f)) {
+ if (!fd_empty(f)) {
pipe = get_pipe_info(fd_file(f), false);
if (pipe && pipe->watch_queue) {
wqueue = pipe->watch_queue;
kref_get(&wqueue->usage);
}
- fdput(f);
}
return wqueue;
diff --git a/mm/fadvise.c b/mm/fadvise.c
index 532dee205c6e..588fe76c5a14 100644
--- a/mm/fadvise.c
+++ b/mm/fadvise.c
@@ -190,16 +190,12 @@ EXPORT_SYMBOL(vfs_fadvise);
int ksys_fadvise64_64(int fd, loff_t offset, loff_t len, int advice)
{
- struct fd f = fdget(fd);
- int ret;
+ CLASS(fd, f)(fd);
- if (!fd_file(f))
+ if (fd_empty(f))
return -EBADF;
- ret = vfs_fadvise(fd_file(f), offset, len, advice);
-
- fdput(f);
- return ret;
+ return vfs_fadvise(fd_file(f), offset, len, advice);
}
SYSCALL_DEFINE4(fadvise64_64, int, fd, loff_t, offset, loff_t, len, int, advice)
diff --git a/mm/readahead.c b/mm/readahead.c
index e83fe1c6e5ac..fa881c55646e 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -641,29 +641,22 @@ EXPORT_SYMBOL_GPL(page_cache_async_ra);
ssize_t ksys_readahead(int fd, loff_t offset, size_t count)
{
- ssize_t ret;
- struct fd f;
+ CLASS(fd, f)(fd);
- ret = -EBADF;
- f = fdget(fd);
- if (!fd_file(f) || !(fd_file(f)->f_mode & FMODE_READ))
- goto out;
+ if (fd_empty(f) || !(fd_file(f)->f_mode & FMODE_READ))
+ return -EBADF;
/*
* The readahead() syscall is intended to run only on files
* that can execute readahead. If readahead is not possible
* on this file, then we must return -EINVAL.
*/
- ret = -EINVAL;
if (!fd_file(f)->f_mapping || !fd_file(f)->f_mapping->a_ops ||
(!S_ISREG(file_inode(fd_file(f))->i_mode) &&
!S_ISBLK(file_inode(fd_file(f))->i_mode)))
- goto out;
+ return -EINVAL;
- ret = vfs_fadvise(fd_file(f), offset, count, POSIX_FADV_WILLNEED);
-out:
- fdput(f);
- return ret;
+ return vfs_fadvise(fd_file(f), offset, count, POSIX_FADV_WILLNEED);
}
SYSCALL_DEFINE3(readahead, int, fd, loff_t, offset, size_t, count)
diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index 18b7c8f31234..f4334aa5ef1f 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -708,20 +708,18 @@ EXPORT_SYMBOL_GPL(get_net_ns);
struct net *get_net_ns_by_fd(int fd)
{
- struct fd f = fdget(fd);
- struct net *net = ERR_PTR(-EINVAL);
+ CLASS(fd, f)(fd);
- if (!fd_file(f))
+ if (fd_empty(f))
return ERR_PTR(-EBADF);
if (proc_ns_file(fd_file(f))) {
struct ns_common *ns = get_proc_ns(file_inode(fd_file(f)));
if (ns->ops == &netns_operations)
- net = get_net(container_of(ns, struct net, ns));
+ return get_net(container_of(ns, struct net, ns));
}
- fdput(f);
- return net;
+ return ERR_PTR(-EINVAL);
}
EXPORT_SYMBOL_GPL(get_net_ns_by_fd);
#endif
diff --git a/security/landlock/syscalls.c b/security/landlock/syscalls.c
index 9689169ad5cc..c7edb2b590d5 100644
--- a/security/landlock/syscalls.c
+++ b/security/landlock/syscalls.c
@@ -234,31 +234,21 @@ SYSCALL_DEFINE3(landlock_create_ruleset,
static struct landlock_ruleset *get_ruleset_from_fd(const int fd,
const fmode_t mode)
{
- struct fd ruleset_f;
+ CLASS(fd, ruleset_f)(fd);
struct landlock_ruleset *ruleset;
- ruleset_f = fdget(fd);
- if (!fd_file(ruleset_f))
+ if (fd_empty(ruleset_f))
return ERR_PTR(-EBADF);
/* Checks FD type and access right. */
- if (fd_file(ruleset_f)->f_op != &ruleset_fops) {
- ruleset = ERR_PTR(-EBADFD);
- goto out_fdput;
- }
- if (!(fd_file(ruleset_f)->f_mode & mode)) {
- ruleset = ERR_PTR(-EPERM);
- goto out_fdput;
- }
+ if (fd_file(ruleset_f)->f_op != &ruleset_fops)
+ return ERR_PTR(-EBADFD);
+ if (!(fd_file(ruleset_f)->f_mode & mode))
+ return ERR_PTR(-EPERM);
ruleset = fd_file(ruleset_f)->private_data;
- if (WARN_ON_ONCE(ruleset->num_layers != 1)) {
- ruleset = ERR_PTR(-EINVAL);
- goto out_fdput;
- }
+ if (WARN_ON_ONCE(ruleset->num_layers != 1))
+ return ERR_PTR(-EINVAL);
landlock_get_ruleset(ruleset);
-
-out_fdput:
- fdput(ruleset_f);
return ruleset;
}
diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
index 388ae471d258..53262b8a7656 100644
--- a/virt/kvm/vfio.c
+++ b/virt/kvm/vfio.c
@@ -190,11 +190,10 @@ static int kvm_vfio_file_del(struct kvm_device *dev, unsigned int fd)
{
struct kvm_vfio *kv = dev->private;
struct kvm_vfio_file *kvf;
- struct fd f;
+ CLASS(fd, f)(fd);
int ret;
- f = fdget(fd);
- if (!fd_file(f))
+ if (fd_empty(f))
return -EBADF;
ret = -ENOENT;
@@ -220,9 +219,6 @@ static int kvm_vfio_file_del(struct kvm_device *dev, unsigned int fd)
kvm_vfio_update_coherency(dev);
mutex_unlock(&kv->lock);
-
- fdput(f);
-
return ret;
}
--
2.39.2
^ permalink raw reply related [flat|nested] 134+ messages in thread* Re: [PATCH 23/39] fdget(), trivial conversions
2024-07-30 5:16 ` [PATCH 23/39] fdget(), trivial conversions viro
@ 2024-08-07 10:37 ` Christian Brauner
0 siblings, 0 replies; 134+ messages in thread
From: Christian Brauner @ 2024-08-07 10:37 UTC (permalink / raw)
To: viro; +Cc: linux-fsdevel, amir73il, bpf, cgroups, kvm, netdev, torvalds
On Tue, Jul 30, 2024 at 01:16:09AM GMT, viro@kernel.org wrote:
> From: Al Viro <viro@zeniv.linux.org.uk>
>
> fdget() is the first thing done in scope, all matching fdput() are
> immediately followed by leaving the scope.
>
> [conflict in fs/fhandle.c and a trival one in kernel/bpf/syscall.c]
> [conflict in fs/xattr.c]
>
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
Reviewed-by: Christian Brauner <brauner@kernel.org>
^ permalink raw reply [flat|nested] 134+ messages in thread
* [PATCH 24/39] fdget(), more trivial conversions
2024-07-30 5:15 ` [PATCH 01/39] memcg_write_event_control(): fix a user-triggerable oops viro
` (21 preceding siblings ...)
2024-07-30 5:16 ` [PATCH 23/39] fdget(), trivial conversions viro
@ 2024-07-30 5:16 ` viro
2024-08-07 10:39 ` Christian Brauner
2024-07-30 5:16 ` [PATCH 25/39] convert do_preadv()/do_pwritev() viro
` (15 subsequent siblings)
38 siblings, 1 reply; 134+ messages in thread
From: viro @ 2024-07-30 5:16 UTC (permalink / raw)
To: linux-fsdevel; +Cc: amir73il, bpf, brauner, cgroups, kvm, netdev, torvalds
From: Al Viro <viro@zeniv.linux.org.uk>
all failure exits prior to fdget() leave the scope, all matching fdput()
are immediately followed by leaving the scope.
[trivial conflicts in fs/fsopen.c and kernel/bpf/syscall.c]
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
drivers/infiniband/core/ucma.c | 19 +++-----
drivers/vfio/group.c | 6 +--
fs/eventpoll.c | 15 ++----
fs/ext4/ioctl.c | 21 +++------
fs/f2fs/file.c | 15 ++----
fs/fsopen.c | 19 +++-----
fs/fuse/dev.c | 6 +--
fs/locks.c | 15 ++----
fs/namespace.c | 47 ++++++------------
fs/notify/fanotify/fanotify_user.c | 29 +++++-------
fs/notify/inotify/inotify_user.c | 21 +++------
fs/ocfs2/cluster/heartbeat.c | 13 ++---
fs/open.c | 12 ++---
fs/read_write.c | 71 ++++++++++------------------
fs/splice.c | 45 +++++++-----------
fs/utimes.c | 11 ++---
fs/xfs/xfs_exchrange.c | 10 ++--
fs/xfs/xfs_ioctl.c | 69 +++++++++------------------
ipc/mqueue.c | 76 +++++++++---------------------
kernel/bpf/syscall.c | 22 +++------
kernel/module/main.c | 11 ++---
kernel/pid.c | 13 ++---
kernel/signal.c | 29 ++++--------
kernel/taskstats.c | 18 +++----
security/integrity/ima/ima_main.c | 7 +--
security/loadpin/loadpin.c | 8 +---
virt/kvm/vfio.c | 6 +--
27 files changed, 207 insertions(+), 427 deletions(-)
diff --git a/drivers/infiniband/core/ucma.c b/drivers/infiniband/core/ucma.c
index dc57d07a1f45..cc95718fd24b 100644
--- a/drivers/infiniband/core/ucma.c
+++ b/drivers/infiniband/core/ucma.c
@@ -1615,7 +1615,6 @@ static ssize_t ucma_migrate_id(struct ucma_file *new_file,
struct ucma_event *uevent, *tmp;
struct ucma_context *ctx;
LIST_HEAD(event_list);
- struct fd f;
struct ucma_file *cur_file;
int ret = 0;
@@ -1623,21 +1622,17 @@ static ssize_t ucma_migrate_id(struct ucma_file *new_file,
return -EFAULT;
/* Get current fd to protect against it being closed */
- f = fdget(cmd.fd);
- if (!fd_file(f))
+ CLASS(fd, f)(cmd.fd);
+ if (fd_empty(f))
return -ENOENT;
- if (fd_file(f)->f_op != &ucma_fops) {
- ret = -EINVAL;
- goto file_put;
- }
+ if (fd_file(f)->f_op != &ucma_fops)
+ return -EINVAL;
cur_file = fd_file(f)->private_data;
/* Validate current fd and prevent destruction of id. */
ctx = ucma_get_ctx(cur_file, cmd.id);
- if (IS_ERR(ctx)) {
- ret = PTR_ERR(ctx);
- goto file_put;
- }
+ if (IS_ERR(ctx))
+ return PTR_ERR(ctx);
rdma_lock_handler(ctx->cm_id);
/*
@@ -1678,8 +1673,6 @@ static ssize_t ucma_migrate_id(struct ucma_file *new_file,
err_unlock:
rdma_unlock_handler(ctx->cm_id);
ucma_put_ctx(ctx);
-file_put:
- fdput(f);
return ret;
}
diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
index 95b336de8a17..49559605177e 100644
--- a/drivers/vfio/group.c
+++ b/drivers/vfio/group.c
@@ -104,15 +104,14 @@ static int vfio_group_ioctl_set_container(struct vfio_group *group,
{
struct vfio_container *container;
struct iommufd_ctx *iommufd;
- struct fd f;
int ret;
int fd;
if (get_user(fd, arg))
return -EFAULT;
- f = fdget(fd);
- if (!fd_file(f))
+ CLASS(fd, f)(fd);
+ if (fd_empty(f))
return -EBADF;
mutex_lock(&group->group_lock);
@@ -153,7 +152,6 @@ static int vfio_group_ioctl_set_container(struct vfio_group *group,
out_unlock:
mutex_unlock(&group->group_lock);
- fdput(f);
return ret;
}
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 4609f7342005..1e63f3b03ca5 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -2420,8 +2420,6 @@ SYSCALL_DEFINE4(epoll_ctl, int, epfd, int, op, int, fd,
static int do_epoll_wait(int epfd, struct epoll_event __user *events,
int maxevents, struct timespec64 *to)
{
- int error;
- struct fd f;
struct eventpoll *ep;
/* The maximum number of event must be greater than zero */
@@ -2433,17 +2431,16 @@ static int do_epoll_wait(int epfd, struct epoll_event __user *events,
return -EFAULT;
/* Get the "struct file *" for the eventpoll file */
- f = fdget(epfd);
- if (!fd_file(f))
+ CLASS(fd, f)(epfd);
+ if (fd_empty(f))
return -EBADF;
/*
* We have to check that the file structure underneath the fd
* the user passed to us _is_ an eventpoll file.
*/
- error = -EINVAL;
if (!is_file_epoll(fd_file(f)))
- goto error_fput;
+ return -EINVAL;
/*
* At this point it is safe to assume that the "private_data" contains
@@ -2452,11 +2449,7 @@ static int do_epoll_wait(int epfd, struct epoll_event __user *events,
ep = fd_file(f)->private_data;
/* Time to fish for events ... */
- error = ep_poll(ep, events, maxevents, to);
-
-error_fput:
- fdput(f);
- return error;
+ return ep_poll(ep, events, maxevents, to);
}
SYSCALL_DEFINE4(epoll_wait, int, epfd, struct epoll_event __user *, events,
diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c
index 1c77400bd88e..7b9ce71c1c81 100644
--- a/fs/ext4/ioctl.c
+++ b/fs/ext4/ioctl.c
@@ -1330,7 +1330,6 @@ static long __ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
case EXT4_IOC_MOVE_EXT: {
struct move_extent me;
- struct fd donor;
int err;
if (!(filp->f_mode & FMODE_READ) ||
@@ -1342,30 +1341,26 @@ static long __ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
return -EFAULT;
me.moved_len = 0;
- donor = fdget(me.donor_fd);
- if (!fd_file(donor))
+ CLASS(fd, donor)(me.donor_fd);
+ if (fd_empty(donor))
return -EBADF;
- if (!(fd_file(donor)->f_mode & FMODE_WRITE)) {
- err = -EBADF;
- goto mext_out;
- }
+ if (!(fd_file(donor)->f_mode & FMODE_WRITE))
+ return -EBADF;
if (ext4_has_feature_bigalloc(sb)) {
ext4_msg(sb, KERN_ERR,
"Online defrag not supported with bigalloc");
- err = -EOPNOTSUPP;
- goto mext_out;
+ return -EOPNOTSUPP;
} else if (IS_DAX(inode)) {
ext4_msg(sb, KERN_ERR,
"Online defrag not supported with DAX");
- err = -EOPNOTSUPP;
- goto mext_out;
+ return -EOPNOTSUPP;
}
err = mnt_want_write_file(filp);
if (err)
- goto mext_out;
+ return err;
err = ext4_move_extents(filp, fd_file(donor), me.orig_start,
me.donor_start, me.len, &me.moved_len);
@@ -1374,8 +1369,6 @@ static long __ext4_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
if (copy_to_user((struct move_extent __user *)arg,
&me, sizeof(me)))
err = -EFAULT;
-mext_out:
- fdput(donor);
return err;
}
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 903337f8d21a..4682d83ab6b4 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -3006,32 +3006,27 @@ static int f2fs_move_file_range(struct file *file_in, loff_t pos_in,
static int __f2fs_ioc_move_range(struct file *filp,
struct f2fs_move_range *range)
{
- struct fd dst;
int err;
if (!(filp->f_mode & FMODE_READ) ||
!(filp->f_mode & FMODE_WRITE))
return -EBADF;
- dst = fdget(range->dst_fd);
- if (!fd_file(dst))
+ CLASS(fd, dst)(range->dst_fd);
+ if (fd_empty(dst))
return -EBADF;
- if (!(fd_file(dst)->f_mode & FMODE_WRITE)) {
- err = -EBADF;
- goto err_out;
- }
+ if (!(fd_file(dst)->f_mode & FMODE_WRITE))
+ return -EBADF;
err = mnt_want_write_file(filp);
if (err)
- goto err_out;
+ return err;
err = f2fs_move_file_range(filp, range->pos_in, fd_file(dst),
range->pos_out, range->len);
mnt_drop_write_file(filp);
-err_out:
- fdput(dst);
return err;
}
diff --git a/fs/fsopen.c b/fs/fsopen.c
index ee92ca58429e..ff807662a41c 100644
--- a/fs/fsopen.c
+++ b/fs/fsopen.c
@@ -350,7 +350,6 @@ SYSCALL_DEFINE5(fsconfig,
int, aux)
{
struct fs_context *fc;
- struct fd f;
int ret;
int lookup_flags = 0;
@@ -393,12 +392,11 @@ SYSCALL_DEFINE5(fsconfig,
return -EOPNOTSUPP;
}
- f = fdget(fd);
- if (!fd_file(f))
+ CLASS(fd, f)(fd);
+ if (fd_empty(f))
return -EBADF;
- ret = -EINVAL;
if (fd_file(f)->f_op != &fscontext_fops)
- goto out_f;
+ return -EINVAL;
fc = fd_file(f)->private_data;
if (fc->ops == &legacy_fs_context_ops) {
@@ -408,17 +406,14 @@ SYSCALL_DEFINE5(fsconfig,
case FSCONFIG_SET_PATH_EMPTY:
case FSCONFIG_SET_FD:
case FSCONFIG_CMD_CREATE_EXCL:
- ret = -EOPNOTSUPP;
- goto out_f;
+ return -EOPNOTSUPP;
}
}
if (_key) {
param.key = strndup_user(_key, 256);
- if (IS_ERR(param.key)) {
- ret = PTR_ERR(param.key);
- goto out_f;
- }
+ if (IS_ERR(param.key))
+ return PTR_ERR(param.key);
}
switch (cmd) {
@@ -497,7 +492,5 @@ SYSCALL_DEFINE5(fsconfig,
}
out_key:
kfree(param.key);
-out_f:
- fdput(f);
return ret;
}
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 991b9ae8e7c9..27826116a4fb 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -2315,13 +2315,12 @@ static long fuse_dev_ioctl_clone(struct file *file, __u32 __user *argp)
int res;
int oldfd;
struct fuse_dev *fud = NULL;
- struct fd f;
if (get_user(oldfd, argp))
return -EFAULT;
- f = fdget(oldfd);
- if (!fd_file(f))
+ CLASS(fd, f)(oldfd);
+ if (fd_empty(f))
return -EINVAL;
/*
@@ -2338,7 +2337,6 @@ static long fuse_dev_ioctl_clone(struct file *file, __u32 __user *argp)
mutex_unlock(&fuse_mutex);
}
- fdput(f);
return res;
}
diff --git a/fs/locks.c b/fs/locks.c
index 239759a356e9..ea56dc70d445 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -2132,7 +2132,6 @@ SYSCALL_DEFINE2(flock, unsigned int, fd, unsigned int, cmd)
{
int can_sleep, error, type;
struct file_lock fl;
- struct fd f;
/*
* LOCK_MAND locks were broken for a long time in that they never
@@ -2151,19 +2150,18 @@ SYSCALL_DEFINE2(flock, unsigned int, fd, unsigned int, cmd)
if (type < 0)
return type;
- error = -EBADF;
- f = fdget(fd);
- if (!fd_file(f))
- return error;
+ CLASS(fd, f)(fd);
+ if (fd_empty(f))
+ return -EBADF;
if (type != F_UNLCK && !(fd_file(f)->f_mode & (FMODE_READ | FMODE_WRITE)))
- goto out_putf;
+ return -EBADF;
flock_make_lock(fd_file(f), &fl, type);
error = security_file_lock(fd_file(f), fl.c.flc_type);
if (error)
- goto out_putf;
+ return error;
can_sleep = !(cmd & LOCK_NB);
if (can_sleep)
@@ -2177,9 +2175,6 @@ SYSCALL_DEFINE2(flock, unsigned int, fd, unsigned int, cmd)
error = locks_lock_file_wait(fd_file(f), &fl);
locks_release_private(&fl);
- out_putf:
- fdput(f);
-
return error;
}
diff --git a/fs/namespace.c b/fs/namespace.c
index c46d48bb38cd..876ab546feba 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -4070,7 +4070,6 @@ SYSCALL_DEFINE3(fsmount, int, fs_fd, unsigned int, flags,
struct file *file;
struct path newmount;
struct mount *mnt;
- struct fd f;
unsigned int mnt_flags = 0;
long ret;
@@ -4098,19 +4097,18 @@ SYSCALL_DEFINE3(fsmount, int, fs_fd, unsigned int, flags,
return -EINVAL;
}
- f = fdget(fs_fd);
- if (!fd_file(f))
+ CLASS(fd, f)(fs_fd);
+ if (fd_empty(f))
return -EBADF;
- ret = -EINVAL;
if (fd_file(f)->f_op != &fscontext_fops)
- goto err_fsfd;
+ return -EINVAL;
fc = fd_file(f)->private_data;
ret = mutex_lock_interruptible(&fc->uapi_mutex);
if (ret < 0)
- goto err_fsfd;
+ return ret;
/* There must be a valid superblock or we can't mount it */
ret = -EINVAL;
@@ -4177,8 +4175,6 @@ SYSCALL_DEFINE3(fsmount, int, fs_fd, unsigned int, flags,
path_put(&newmount);
err_unlock:
mutex_unlock(&fc->uapi_mutex);
-err_fsfd:
- fdput(f);
return ret;
}
@@ -4629,10 +4625,8 @@ static int do_mount_setattr(struct path *path, struct mount_kattr *kattr)
static int build_mount_idmapped(const struct mount_attr *attr, size_t usize,
struct mount_kattr *kattr, unsigned int flags)
{
- int err = 0;
struct ns_common *ns;
struct user_namespace *mnt_userns;
- struct fd f;
if (!((attr->attr_set | attr->attr_clr) & MOUNT_ATTR_IDMAP))
return 0;
@@ -4648,20 +4642,16 @@ static int build_mount_idmapped(const struct mount_attr *attr, size_t usize,
if (attr->userns_fd > INT_MAX)
return -EINVAL;
- f = fdget(attr->userns_fd);
- if (!fd_file(f))
+ CLASS(fd, f)(attr->userns_fd);
+ if (fd_empty(f))
return -EBADF;
- if (!proc_ns_file(fd_file(f))) {
- err = -EINVAL;
- goto out_fput;
- }
+ if (!proc_ns_file(fd_file(f)))
+ return -EINVAL;
ns = get_proc_ns(file_inode(fd_file(f)));
- if (ns->ops->type != CLONE_NEWUSER) {
- err = -EINVAL;
- goto out_fput;
- }
+ if (ns->ops->type != CLONE_NEWUSER)
+ return -EINVAL;
/*
* The initial idmapping cannot be used to create an idmapped
@@ -4672,22 +4662,15 @@ static int build_mount_idmapped(const struct mount_attr *attr, size_t usize,
* result.
*/
mnt_userns = container_of(ns, struct user_namespace, ns);
- if (mnt_userns == &init_user_ns) {
- err = -EPERM;
- goto out_fput;
- }
+ if (mnt_userns == &init_user_ns)
+ return -EPERM;
/* We're not controlling the target namespace. */
- if (!ns_capable(mnt_userns, CAP_SYS_ADMIN)) {
- err = -EPERM;
- goto out_fput;
- }
+ if (!ns_capable(mnt_userns, CAP_SYS_ADMIN))
+ return -EPERM;
kattr->mnt_userns = get_user_ns(mnt_userns);
-
-out_fput:
- fdput(f);
- return err;
+ return 0;
}
static int build_mount_kattr(const struct mount_attr *attr, size_t usize,
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index 64b1adc75fe5..7b5abc1b8c8f 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -1677,7 +1677,6 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask,
struct inode *inode = NULL;
struct vfsmount *mnt = NULL;
struct fsnotify_group *group;
- struct fd f;
struct path path;
struct fan_fsid __fsid, *fsid = NULL;
u32 valid_mask = FANOTIFY_EVENTS | FANOTIFY_EVENT_FLAGS;
@@ -1747,14 +1746,13 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask,
umask = FANOTIFY_EVENT_FLAGS;
}
- f = fdget(fanotify_fd);
- if (unlikely(!fd_file(f)))
+ CLASS(fd, f)(fanotify_fd);
+ if (fd_empty(f))
return -EBADF;
/* verify that this is indeed an fanotify instance */
- ret = -EINVAL;
if (unlikely(fd_file(f)->f_op != &fanotify_fops))
- goto fput_and_out;
+ return -EINVAL;
group = fd_file(f)->private_data;
/*
@@ -1762,23 +1760,21 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask,
* marks. This also includes setting up such marks by a group that
* was initialized by an unprivileged user.
*/
- ret = -EPERM;
if ((!capable(CAP_SYS_ADMIN) ||
FAN_GROUP_FLAG(group, FANOTIFY_UNPRIV)) &&
mark_type != FAN_MARK_INODE)
- goto fput_and_out;
+ return -EPERM;
/*
* Permission events require minimum priority FAN_CLASS_CONTENT.
*/
- ret = -EINVAL;
if (mask & FANOTIFY_PERM_EVENTS &&
group->priority < FSNOTIFY_PRIO_CONTENT)
- goto fput_and_out;
+ return -EINVAL;
if (mask & FAN_FS_ERROR &&
mark_type != FAN_MARK_FILESYSTEM)
- goto fput_and_out;
+ return -EINVAL;
/*
* Evictable is only relevant for inode marks, because only inode object
@@ -1786,7 +1782,7 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask,
*/
if (flags & FAN_MARK_EVICTABLE &&
mark_type != FAN_MARK_INODE)
- goto fput_and_out;
+ return -EINVAL;
/*
* Events that do not carry enough information to report
@@ -1798,7 +1794,7 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask,
fid_mode = FAN_GROUP_FLAG(group, FANOTIFY_FID_BITS);
if (mask & ~(FANOTIFY_FD_EVENTS|FANOTIFY_EVENT_FLAGS) &&
(!fid_mode || mark_type == FAN_MARK_MOUNT))
- goto fput_and_out;
+ return -EINVAL;
/*
* FAN_RENAME uses special info type records to report the old and
@@ -1806,23 +1802,22 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask,
* useful and was not implemented.
*/
if (mask & FAN_RENAME && !(fid_mode & FAN_REPORT_NAME))
- goto fput_and_out;
+ return -EINVAL;
if (mark_cmd == FAN_MARK_FLUSH) {
- ret = 0;
if (mark_type == FAN_MARK_MOUNT)
fsnotify_clear_vfsmount_marks_by_group(group);
else if (mark_type == FAN_MARK_FILESYSTEM)
fsnotify_clear_sb_marks_by_group(group);
else
fsnotify_clear_inode_marks_by_group(group);
- goto fput_and_out;
+ return 0;
}
ret = fanotify_find_path(dfd, pathname, &path, flags,
(mask & ALL_FSNOTIFY_EVENTS), obj_type);
if (ret)
- goto fput_and_out;
+ return ret;
if (mark_cmd == FAN_MARK_ADD) {
ret = fanotify_events_supported(group, &path, mask, flags);
@@ -1901,8 +1896,6 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask,
path_put_and_out:
path_put(&path);
-fput_and_out:
- fdput(f);
return ret;
}
diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c
index 711830b73275..f46ec6afee3c 100644
--- a/fs/notify/inotify/inotify_user.c
+++ b/fs/notify/inotify/inotify_user.c
@@ -732,7 +732,6 @@ SYSCALL_DEFINE3(inotify_add_watch, int, fd, const char __user *, pathname,
struct fsnotify_group *group;
struct inode *inode;
struct path path;
- struct fd f;
int ret;
unsigned flags = 0;
@@ -752,21 +751,17 @@ SYSCALL_DEFINE3(inotify_add_watch, int, fd, const char __user *, pathname,
if (unlikely(!(mask & ALL_INOTIFY_BITS)))
return -EINVAL;
- f = fdget(fd);
- if (unlikely(!fd_file(f)))
+ CLASS(fd, f)(fd);
+ if (fd_empty(f))
return -EBADF;
/* IN_MASK_ADD and IN_MASK_CREATE don't make sense together */
- if (unlikely((mask & IN_MASK_ADD) && (mask & IN_MASK_CREATE))) {
- ret = -EINVAL;
- goto fput_and_out;
- }
+ if (unlikely((mask & IN_MASK_ADD) && (mask & IN_MASK_CREATE)))
+ return -EINVAL;
/* verify that this is indeed an inotify instance */
- if (unlikely(fd_file(f)->f_op != &inotify_fops)) {
- ret = -EINVAL;
- goto fput_and_out;
- }
+ if (unlikely(fd_file(f)->f_op != &inotify_fops))
+ return -EINVAL;
if (!(mask & IN_DONT_FOLLOW))
flags |= LOOKUP_FOLLOW;
@@ -776,7 +771,7 @@ SYSCALL_DEFINE3(inotify_add_watch, int, fd, const char __user *, pathname,
ret = inotify_find_inode(pathname, &path, flags,
(mask & IN_ALL_EVENTS));
if (ret)
- goto fput_and_out;
+ return ret;
/* inode held in place by reference to path; group by fget on fd */
inode = path.dentry->d_inode;
@@ -785,8 +780,6 @@ SYSCALL_DEFINE3(inotify_add_watch, int, fd, const char __user *, pathname,
/* create/update an inode mark */
ret = inotify_update_watch(group, inode, mask);
path_put(&path);
-fput_and_out:
- fdput(f);
return ret;
}
diff --git a/fs/ocfs2/cluster/heartbeat.c b/fs/ocfs2/cluster/heartbeat.c
index bc55340a60c3..4200a0341343 100644
--- a/fs/ocfs2/cluster/heartbeat.c
+++ b/fs/ocfs2/cluster/heartbeat.c
@@ -1765,7 +1765,6 @@ static ssize_t o2hb_region_dev_store(struct config_item *item,
long fd;
int sectsize;
char *p = (char *)page;
- struct fd f;
ssize_t ret = -EINVAL;
int live_threshold;
@@ -1784,23 +1783,23 @@ static ssize_t o2hb_region_dev_store(struct config_item *item,
if (fd < 0 || fd >= INT_MAX)
return -EINVAL;
- f = fdget(fd);
- if (fd_file(f) == NULL)
+ CLASS(fd, f)(fd);
+ if (fd_empty(f))
return -EINVAL;
if (reg->hr_blocks == 0 || reg->hr_start_block == 0 ||
reg->hr_block_bytes == 0)
- goto out2;
+ return -EINVAL;
if (!S_ISBLK(fd_file(f)->f_mapping->host->i_mode))
- goto out2;
+ return -EINVAL;
reg->hr_bdev_file = bdev_file_open_by_dev(fd_file(f)->f_mapping->host->i_rdev,
BLK_OPEN_WRITE | BLK_OPEN_READ, NULL, NULL);
if (IS_ERR(reg->hr_bdev_file)) {
ret = PTR_ERR(reg->hr_bdev_file);
reg->hr_bdev_file = NULL;
- goto out2;
+ return ret;
}
sectsize = bdev_logical_block_size(reg_bdev(reg));
@@ -1906,8 +1905,6 @@ static ssize_t o2hb_region_dev_store(struct config_item *item,
fput(reg->hr_bdev_file);
reg->hr_bdev_file = NULL;
}
-out2:
- fdput(f);
return ret;
}
diff --git a/fs/open.c b/fs/open.c
index 6d7dd58ce095..45884ff7dead 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -187,19 +187,13 @@ long do_ftruncate(struct file *file, loff_t length, int small)
long do_sys_ftruncate(unsigned int fd, loff_t length, int small)
{
- struct fd f;
- int error;
-
if (length < 0)
return -EINVAL;
- f = fdget(fd);
- if (!fd_file(f))
+ CLASS(fd, f)(fd);
+ if (fd_empty(f))
return -EBADF;
- error = do_ftruncate(fd_file(f), length, small);
-
- fdput(f);
- return error;
+ return do_ftruncate(fd_file(f), length, small);
}
SYSCALL_DEFINE2(ftruncate, unsigned int, fd, off_t, length)
diff --git a/fs/read_write.c b/fs/read_write.c
index b3743e7dd93f..789ea6ecf147 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -652,21 +652,17 @@ SYSCALL_DEFINE3(write, unsigned int, fd, const char __user *, buf,
ssize_t ksys_pread64(unsigned int fd, char __user *buf, size_t count,
loff_t pos)
{
- struct fd f;
- ssize_t ret = -EBADF;
-
if (pos < 0)
return -EINVAL;
- f = fdget(fd);
- if (fd_file(f)) {
- ret = -ESPIPE;
- if (fd_file(f)->f_mode & FMODE_PREAD)
- ret = vfs_read(fd_file(f), buf, count, &pos);
- fdput(f);
- }
+ CLASS(fd, f)(fd);
+ if (fd_empty(f))
+ return -EBADF;
- return ret;
+ if (fd_file(f)->f_mode & FMODE_PREAD)
+ return vfs_read(fd_file(f), buf, count, &pos);
+
+ return -ESPIPE;
}
SYSCALL_DEFINE4(pread64, unsigned int, fd, char __user *, buf,
@@ -686,21 +682,17 @@ COMPAT_SYSCALL_DEFINE5(pread64, unsigned int, fd, char __user *, buf,
ssize_t ksys_pwrite64(unsigned int fd, const char __user *buf,
size_t count, loff_t pos)
{
- struct fd f;
- ssize_t ret = -EBADF;
-
if (pos < 0)
return -EINVAL;
- f = fdget(fd);
- if (fd_file(f)) {
- ret = -ESPIPE;
- if (fd_file(f)->f_mode & FMODE_PWRITE)
- ret = vfs_write(fd_file(f), buf, count, &pos);
- fdput(f);
- }
+ CLASS(fd, f)(fd);
+ if (fd_empty(f))
+ return -EBADF;
- return ret;
+ if (fd_file(f)->f_mode & FMODE_PWRITE)
+ return vfs_write(fd_file(f), buf, count, &pos);
+
+ return -ESPIPE;
}
SYSCALL_DEFINE4(pwrite64, unsigned int, fd, const char __user *, buf,
@@ -1214,7 +1206,6 @@ COMPAT_SYSCALL_DEFINE6(pwritev2, compat_ulong_t, fd,
static ssize_t do_sendfile(int out_fd, int in_fd, loff_t *ppos,
size_t count, loff_t max)
{
- struct fd in, out;
struct inode *in_inode, *out_inode;
struct pipe_inode_info *opipe;
loff_t pos;
@@ -1225,35 +1216,32 @@ static ssize_t do_sendfile(int out_fd, int in_fd, loff_t *ppos,
/*
* Get input file, and verify that it is ok..
*/
- retval = -EBADF;
- in = fdget(in_fd);
- if (!fd_file(in))
- goto out;
+ CLASS(fd, in)(in_fd);
+ if (fd_empty(in))
+ return -EBADF;
if (!(fd_file(in)->f_mode & FMODE_READ))
- goto fput_in;
- retval = -ESPIPE;
+ return -EBADF;
if (!ppos) {
pos = fd_file(in)->f_pos;
} else {
pos = *ppos;
if (!(fd_file(in)->f_mode & FMODE_PREAD))
- goto fput_in;
+ return -ESPIPE;
}
retval = rw_verify_area(READ, fd_file(in), &pos, count);
if (retval < 0)
- goto fput_in;
+ return retval;
if (count > MAX_RW_COUNT)
count = MAX_RW_COUNT;
/*
* Get output file, and verify that it is ok..
*/
- retval = -EBADF;
- out = fdget(out_fd);
- if (!fd_file(out))
- goto fput_in;
+ CLASS(fd, out)(out_fd);
+ if (fd_empty(out))
+ return -EBADF;
if (!(fd_file(out)->f_mode & FMODE_WRITE))
- goto fput_out;
+ return -EBADF;
in_inode = file_inode(fd_file(in));
out_inode = file_inode(fd_file(out));
out_pos = fd_file(out)->f_pos;
@@ -1262,9 +1250,8 @@ static ssize_t do_sendfile(int out_fd, int in_fd, loff_t *ppos,
max = min(in_inode->i_sb->s_maxbytes, out_inode->i_sb->s_maxbytes);
if (unlikely(pos + count > max)) {
- retval = -EOVERFLOW;
if (pos >= max)
- goto fput_out;
+ return -EOVERFLOW;
count = max - pos;
}
@@ -1283,7 +1270,7 @@ static ssize_t do_sendfile(int out_fd, int in_fd, loff_t *ppos,
if (!opipe) {
retval = rw_verify_area(WRITE, fd_file(out), &out_pos, count);
if (retval < 0)
- goto fput_out;
+ return retval;
retval = do_splice_direct(fd_file(in), &pos, fd_file(out), &out_pos,
count, fl);
} else {
@@ -1309,12 +1296,6 @@ static ssize_t do_sendfile(int out_fd, int in_fd, loff_t *ppos,
inc_syscw(current);
if (pos > max)
retval = -EOVERFLOW;
-
-fput_out:
- fdput(out);
-fput_in:
- fdput(in);
-out:
return retval;
}
diff --git a/fs/splice.c b/fs/splice.c
index 29cd39d7f4a0..2898fa1e9e63 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -1622,27 +1622,22 @@ SYSCALL_DEFINE6(splice, int, fd_in, loff_t __user *, off_in,
int, fd_out, loff_t __user *, off_out,
size_t, len, unsigned int, flags)
{
- struct fd in, out;
- ssize_t error;
-
if (unlikely(!len))
return 0;
if (unlikely(flags & ~SPLICE_F_ALL))
return -EINVAL;
- error = -EBADF;
- in = fdget(fd_in);
- if (fd_file(in)) {
- out = fdget(fd_out);
- if (fd_file(out)) {
- error = __do_splice(fd_file(in), off_in, fd_file(out), off_out,
+ CLASS(fd, in)(fd_in);
+ if (fd_empty(in))
+ return -EBADF;
+
+ CLASS(fd, out)(fd_out);
+ if (fd_empty(out))
+ return -EBADF;
+
+ return __do_splice(fd_file(in), off_in, fd_file(out), off_out,
len, flags);
- fdput(out);
- }
- fdput(in);
- }
- return error;
}
/*
@@ -1992,25 +1987,19 @@ ssize_t do_tee(struct file *in, struct file *out, size_t len,
SYSCALL_DEFINE4(tee, int, fdin, int, fdout, size_t, len, unsigned int, flags)
{
- struct fd in, out;
- ssize_t error;
-
if (unlikely(flags & ~SPLICE_F_ALL))
return -EINVAL;
if (unlikely(!len))
return 0;
- error = -EBADF;
- in = fdget(fdin);
- if (fd_file(in)) {
- out = fdget(fdout);
- if (fd_file(out)) {
- error = do_tee(fd_file(in), fd_file(out), len, flags);
- fdput(out);
- }
- fdput(in);
- }
+ CLASS(fd, in)(fdin);
+ if (fd_empty(in))
+ return -EBADF;
- return error;
+ CLASS(fd, out)(fdout);
+ if (fd_empty(out))
+ return -EBADF;
+
+ return do_tee(fd_file(in), fd_file(out), len, flags);
}
diff --git a/fs/utimes.c b/fs/utimes.c
index 99b26f792b89..c7c7958e57b2 100644
--- a/fs/utimes.c
+++ b/fs/utimes.c
@@ -108,18 +108,13 @@ static int do_utimes_path(int dfd, const char __user *filename,
static int do_utimes_fd(int fd, struct timespec64 *times, int flags)
{
- struct fd f;
- int error;
-
if (flags)
return -EINVAL;
- f = fdget(fd);
- if (!fd_file(f))
+ CLASS(fd, f)(fd);
+ if (fd_empty(f))
return -EBADF;
- error = vfs_utimes(&fd_file(f)->f_path, times);
- fdput(f);
- return error;
+ return vfs_utimes(&fd_file(f)->f_path, times);
}
/*
diff --git a/fs/xfs/xfs_exchrange.c b/fs/xfs/xfs_exchrange.c
index 9790e0f45d14..35b9b58a4f6f 100644
--- a/fs/xfs/xfs_exchrange.c
+++ b/fs/xfs/xfs_exchrange.c
@@ -778,8 +778,6 @@ xfs_ioc_exchange_range(
.file2 = file,
};
struct xfs_exchange_range args;
- struct fd file1;
- int error;
if (copy_from_user(&args, argp, sizeof(args)))
return -EFAULT;
@@ -793,12 +791,10 @@ xfs_ioc_exchange_range(
fxr.length = args.length;
fxr.flags = args.flags;
- file1 = fdget(args.file1_fd);
- if (!fd_file(file1))
+ CLASS(fd, file1)(args.file1_fd);
+ if (fd_empty(file1))
return -EBADF;
fxr.file1 = fd_file(file1);
- error = xfs_exchange_range(&fxr);
- fdput(file1);
- return error;
+ return xfs_exchange_range(&fxr);
}
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index 95641817370b..1bb89f85af1b 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -1000,41 +1000,29 @@ xfs_ioc_swapext(
xfs_swapext_t *sxp)
{
xfs_inode_t *ip, *tip;
- struct fd f, tmp;
- int error = 0;
/* Pull information for the target fd */
- f = fdget((int)sxp->sx_fdtarget);
- if (!fd_file(f)) {
- error = -EINVAL;
- goto out;
- }
+ CLASS(fd, f)((int)sxp->sx_fdtarget);
+ if (fd_empty(f))
+ return -EINVAL;
if (!(fd_file(f)->f_mode & FMODE_WRITE) ||
!(fd_file(f)->f_mode & FMODE_READ) ||
- (fd_file(f)->f_flags & O_APPEND)) {
- error = -EBADF;
- goto out_put_file;
- }
+ (fd_file(f)->f_flags & O_APPEND))
+ return -EBADF;
- tmp = fdget((int)sxp->sx_fdtmp);
- if (!fd_file(tmp)) {
- error = -EINVAL;
- goto out_put_file;
- }
+ CLASS(fd, tmp)((int)sxp->sx_fdtmp);
+ if (fd_empty(tmp))
+ return -EINVAL;
if (!(fd_file(tmp)->f_mode & FMODE_WRITE) ||
!(fd_file(tmp)->f_mode & FMODE_READ) ||
- (fd_file(tmp)->f_flags & O_APPEND)) {
- error = -EBADF;
- goto out_put_tmp_file;
- }
+ (fd_file(tmp)->f_flags & O_APPEND))
+ return -EBADF;
if (IS_SWAPFILE(file_inode(fd_file(f))) ||
- IS_SWAPFILE(file_inode(fd_file(tmp)))) {
- error = -EINVAL;
- goto out_put_tmp_file;
- }
+ IS_SWAPFILE(file_inode(fd_file(tmp))))
+ return -EINVAL;
/*
* We need to ensure that the fds passed in point to XFS inodes
@@ -1042,37 +1030,22 @@ xfs_ioc_swapext(
* control over what the user passes us here.
*/
if (fd_file(f)->f_op != &xfs_file_operations ||
- fd_file(tmp)->f_op != &xfs_file_operations) {
- error = -EINVAL;
- goto out_put_tmp_file;
- }
+ fd_file(tmp)->f_op != &xfs_file_operations)
+ return -EINVAL;
ip = XFS_I(file_inode(fd_file(f)));
tip = XFS_I(file_inode(fd_file(tmp)));
- if (ip->i_mount != tip->i_mount) {
- error = -EINVAL;
- goto out_put_tmp_file;
- }
-
- if (ip->i_ino == tip->i_ino) {
- error = -EINVAL;
- goto out_put_tmp_file;
- }
+ if (ip->i_mount != tip->i_mount)
+ return -EINVAL;
- if (xfs_is_shutdown(ip->i_mount)) {
- error = -EIO;
- goto out_put_tmp_file;
- }
+ if (ip->i_ino == tip->i_ino)
+ return -EINVAL;
- error = xfs_swap_extents(ip, tip, sxp);
+ if (xfs_is_shutdown(ip->i_mount))
+ return -EIO;
- out_put_tmp_file:
- fdput(tmp);
- out_put_file:
- fdput(f);
- out:
- return error;
+ return xfs_swap_extents(ip, tip, sxp);
}
static int
diff --git a/ipc/mqueue.c b/ipc/mqueue.c
index 4f1dec518fae..35b4f8659904 100644
--- a/ipc/mqueue.c
+++ b/ipc/mqueue.c
@@ -1063,7 +1063,6 @@ static int do_mq_timedsend(mqd_t mqdes, const char __user *u_msg_ptr,
size_t msg_len, unsigned int msg_prio,
struct timespec64 *ts)
{
- struct fd f;
struct inode *inode;
struct ext_wait_queue wait;
struct ext_wait_queue *receiver;
@@ -1084,37 +1083,27 @@ static int do_mq_timedsend(mqd_t mqdes, const char __user *u_msg_ptr,
audit_mq_sendrecv(mqdes, msg_len, msg_prio, ts);
- f = fdget(mqdes);
- if (unlikely(!fd_file(f))) {
- ret = -EBADF;
- goto out;
- }
+ CLASS(fd, f)(mqdes);
+ if (fd_empty(f))
+ return -EBADF;
inode = file_inode(fd_file(f));
- if (unlikely(fd_file(f)->f_op != &mqueue_file_operations)) {
- ret = -EBADF;
- goto out_fput;
- }
+ if (unlikely(fd_file(f)->f_op != &mqueue_file_operations))
+ return -EBADF;
info = MQUEUE_I(inode);
audit_file(fd_file(f));
- if (unlikely(!(fd_file(f)->f_mode & FMODE_WRITE))) {
- ret = -EBADF;
- goto out_fput;
- }
+ if (unlikely(!(fd_file(f)->f_mode & FMODE_WRITE)))
+ return -EBADF;
- if (unlikely(msg_len > info->attr.mq_msgsize)) {
- ret = -EMSGSIZE;
- goto out_fput;
- }
+ if (unlikely(msg_len > info->attr.mq_msgsize))
+ return -EMSGSIZE;
/* First try to allocate memory, before doing anything with
* existing queues. */
msg_ptr = load_msg(u_msg_ptr, msg_len);
- if (IS_ERR(msg_ptr)) {
- ret = PTR_ERR(msg_ptr);
- goto out_fput;
- }
+ if (IS_ERR(msg_ptr))
+ return PTR_ERR(msg_ptr);
msg_ptr->m_ts = msg_len;
msg_ptr->m_type = msg_prio;
@@ -1172,9 +1161,6 @@ static int do_mq_timedsend(mqd_t mqdes, const char __user *u_msg_ptr,
out_free:
if (ret)
free_msg(msg_ptr);
-out_fput:
- fdput(f);
-out:
return ret;
}
@@ -1184,7 +1170,6 @@ static int do_mq_timedreceive(mqd_t mqdes, char __user *u_msg_ptr,
{
ssize_t ret;
struct msg_msg *msg_ptr;
- struct fd f;
struct inode *inode;
struct mqueue_inode_info *info;
struct ext_wait_queue wait;
@@ -1198,30 +1183,22 @@ static int do_mq_timedreceive(mqd_t mqdes, char __user *u_msg_ptr,
audit_mq_sendrecv(mqdes, msg_len, 0, ts);
- f = fdget(mqdes);
- if (unlikely(!fd_file(f))) {
- ret = -EBADF;
- goto out;
- }
+ CLASS(fd, f)(mqdes);
+ if (fd_empty(f))
+ return -EBADF;
inode = file_inode(fd_file(f));
- if (unlikely(fd_file(f)->f_op != &mqueue_file_operations)) {
- ret = -EBADF;
- goto out_fput;
- }
+ if (unlikely(fd_file(f)->f_op != &mqueue_file_operations))
+ return -EBADF;
info = MQUEUE_I(inode);
audit_file(fd_file(f));
- if (unlikely(!(fd_file(f)->f_mode & FMODE_READ))) {
- ret = -EBADF;
- goto out_fput;
- }
+ if (unlikely(!(fd_file(f)->f_mode & FMODE_READ)))
+ return -EBADF;
/* checks if buffer is big enough */
- if (unlikely(msg_len < info->attr.mq_msgsize)) {
- ret = -EMSGSIZE;
- goto out_fput;
- }
+ if (unlikely(msg_len < info->attr.mq_msgsize))
+ return -EMSGSIZE;
/*
* msg_insert really wants us to have a valid, spare node struct so
@@ -1275,9 +1252,6 @@ static int do_mq_timedreceive(mqd_t mqdes, char __user *u_msg_ptr,
}
free_msg(msg_ptr);
}
-out_fput:
- fdput(f);
-out:
return ret;
}
@@ -1437,21 +1411,18 @@ SYSCALL_DEFINE2(mq_notify, mqd_t, mqdes,
static int do_mq_getsetattr(int mqdes, struct mq_attr *new, struct mq_attr *old)
{
- struct fd f;
struct inode *inode;
struct mqueue_inode_info *info;
if (new && (new->mq_flags & (~O_NONBLOCK)))
return -EINVAL;
- f = fdget(mqdes);
- if (!fd_file(f))
+ CLASS(fd, f)(mqdes);
+ if (fd_empty(f))
return -EBADF;
- if (unlikely(fd_file(f)->f_op != &mqueue_file_operations)) {
- fdput(f);
+ if (unlikely(fd_file(f)->f_op != &mqueue_file_operations))
return -EBADF;
- }
inode = file_inode(fd_file(f));
info = MQUEUE_I(inode);
@@ -1475,7 +1446,6 @@ static int do_mq_getsetattr(int mqdes, struct mq_attr *new, struct mq_attr *old)
}
spin_unlock(&info->lock);
- fdput(f);
return 0;
}
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 4b39a1ea63e2..abb15cdbc5a3 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -4911,33 +4911,25 @@ static int bpf_link_get_info_by_fd(struct file *file,
static int bpf_obj_get_info_by_fd(const union bpf_attr *attr,
union bpf_attr __user *uattr)
{
- int ufd = attr->info.bpf_fd;
- struct fd f;
- int err;
-
if (CHECK_ATTR(BPF_OBJ_GET_INFO_BY_FD))
return -EINVAL;
- f = fdget(ufd);
- if (!fd_file(f))
+ CLASS(fd, f)(attr->info.bpf_fd);
+ if (fd_empty(f))
return -EBADFD;
if (fd_file(f)->f_op == &bpf_prog_fops)
- err = bpf_prog_get_info_by_fd(fd_file(f), fd_file(f)->private_data, attr,
+ return bpf_prog_get_info_by_fd(fd_file(f), fd_file(f)->private_data, attr,
uattr);
else if (fd_file(f)->f_op == &bpf_map_fops)
- err = bpf_map_get_info_by_fd(fd_file(f), fd_file(f)->private_data, attr,
+ return bpf_map_get_info_by_fd(fd_file(f), fd_file(f)->private_data, attr,
uattr);
else if (fd_file(f)->f_op == &btf_fops)
- err = bpf_btf_get_info_by_fd(fd_file(f), fd_file(f)->private_data, attr, uattr);
+ return bpf_btf_get_info_by_fd(fd_file(f), fd_file(f)->private_data, attr, uattr);
else if (fd_file(f)->f_op == &bpf_link_fops || fd_file(f)->f_op == &bpf_link_fops_poll)
- err = bpf_link_get_info_by_fd(fd_file(f), fd_file(f)->private_data,
+ return bpf_link_get_info_by_fd(fd_file(f), fd_file(f)->private_data,
attr, uattr);
- else
- err = -EINVAL;
-
- fdput(f);
- return err;
+ return -EINVAL;
}
#define BPF_BTF_LOAD_LAST_FIELD btf_token_fd
diff --git a/kernel/module/main.c b/kernel/module/main.c
index 93cc9fea9d9d..ea036fd737b5 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -3196,10 +3196,7 @@ static int idempotent_init_module(struct file *f, const char __user * uargs, int
SYSCALL_DEFINE3(finit_module, int, fd, const char __user *, uargs, int, flags)
{
- int err;
- struct fd f;
-
- err = may_init_module();
+ int err = may_init_module();
if (err)
return err;
@@ -3210,12 +3207,10 @@ SYSCALL_DEFINE3(finit_module, int, fd, const char __user *, uargs, int, flags)
|MODULE_INIT_COMPRESSED_FILE))
return -EINVAL;
- f = fdget(fd);
+ CLASS(fd, f)(fd);
if (fd_empty(f))
return -EBADF;
- err = idempotent_init_module(fd_file(f), uargs, flags);
- fdput(f);
- return err;
+ return idempotent_init_module(fd_file(f), uargs, flags);
}
/* Keep in sync with MODULE_FLAGS_BUF_SIZE !!! */
diff --git a/kernel/pid.c b/kernel/pid.c
index b5bbc1a8a6e4..115448e89c3e 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -744,23 +744,18 @@ SYSCALL_DEFINE3(pidfd_getfd, int, pidfd, int, fd,
unsigned int, flags)
{
struct pid *pid;
- struct fd f;
- int ret;
/* flags is currently unused - make sure it's unset */
if (flags)
return -EINVAL;
- f = fdget(pidfd);
- if (!fd_file(f))
+ CLASS(fd, f)(pidfd);
+ if (fd_empty(f))
return -EBADF;
pid = pidfd_pid(fd_file(f));
if (IS_ERR(pid))
- ret = PTR_ERR(pid);
- else
- ret = pidfd_getfd(pid, fd);
+ return PTR_ERR(pid);
- fdput(f);
- return ret;
+ return pidfd_getfd(pid, fd);
}
diff --git a/kernel/signal.c b/kernel/signal.c
index cc5d87cfa7c0..e555d22f8e42 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -3908,7 +3908,6 @@ SYSCALL_DEFINE4(pidfd_send_signal, int, pidfd, int, sig,
siginfo_t __user *, info, unsigned int, flags)
{
int ret;
- struct fd f;
struct pid *pid;
kernel_siginfo_t kinfo;
enum pid_type type;
@@ -3921,20 +3920,17 @@ SYSCALL_DEFINE4(pidfd_send_signal, int, pidfd, int, sig,
if (hweight32(flags & PIDFD_SEND_SIGNAL_FLAGS) > 1)
return -EINVAL;
- f = fdget(pidfd);
- if (!fd_file(f))
+ CLASS(fd, f)(pidfd);
+ if (fd_empty(f))
return -EBADF;
/* Is this a pidfd? */
pid = pidfd_to_pid(fd_file(f));
- if (IS_ERR(pid)) {
- ret = PTR_ERR(pid);
- goto err;
- }
+ if (IS_ERR(pid))
+ return PTR_ERR(pid);
- ret = -EINVAL;
if (!access_pidfd_pidns(pid))
- goto err;
+ return -EINVAL;
switch (flags) {
case 0:
@@ -3958,28 +3954,23 @@ SYSCALL_DEFINE4(pidfd_send_signal, int, pidfd, int, sig,
if (info) {
ret = copy_siginfo_from_user_any(&kinfo, info);
if (unlikely(ret))
- goto err;
+ return ret;
- ret = -EINVAL;
if (unlikely(sig != kinfo.si_signo))
- goto err;
+ return -EINVAL;
/* Only allow sending arbitrary signals to yourself. */
- ret = -EPERM;
if ((task_pid(current) != pid || type > PIDTYPE_TGID) &&
(kinfo.si_code >= 0 || kinfo.si_code == SI_TKILL))
- goto err;
+ return -EPERM;
} else {
prepare_kill_siginfo(sig, &kinfo, type);
}
if (type == PIDTYPE_PGID)
- ret = kill_pgrp_info(sig, &kinfo, pid);
+ return kill_pgrp_info(sig, &kinfo, pid);
else
- ret = kill_pid_info_type(sig, &kinfo, pid, type);
-err:
- fdput(f);
- return ret;
+ return kill_pid_info_type(sig, &kinfo, pid, type);
}
static int
diff --git a/kernel/taskstats.c b/kernel/taskstats.c
index 0700f40c53ac..0cd680ccc7e5 100644
--- a/kernel/taskstats.c
+++ b/kernel/taskstats.c
@@ -411,15 +411,14 @@ static int cgroupstats_user_cmd(struct sk_buff *skb, struct genl_info *info)
struct nlattr *na;
size_t size;
u32 fd;
- struct fd f;
na = info->attrs[CGROUPSTATS_CMD_ATTR_FD];
if (!na)
return -EINVAL;
fd = nla_get_u32(info->attrs[CGROUPSTATS_CMD_ATTR_FD]);
- f = fdget(fd);
- if (!fd_file(f))
+ CLASS(fd, f)(fd);
+ if (fd_empty(f))
return 0;
size = nla_total_size(sizeof(struct cgroupstats));
@@ -427,14 +426,13 @@ static int cgroupstats_user_cmd(struct sk_buff *skb, struct genl_info *info)
rc = prepare_reply(info, CGROUPSTATS_CMD_NEW, &rep_skb,
size);
if (rc < 0)
- goto err;
+ return rc;
na = nla_reserve(rep_skb, CGROUPSTATS_TYPE_CGROUP_STATS,
sizeof(struct cgroupstats));
if (na == NULL) {
nlmsg_free(rep_skb);
- rc = -EMSGSIZE;
- goto err;
+ return -EMSGSIZE;
}
stats = nla_data(na);
@@ -443,14 +441,10 @@ static int cgroupstats_user_cmd(struct sk_buff *skb, struct genl_info *info)
rc = cgroupstats_build(stats, fd_file(f)->f_path.dentry);
if (rc < 0) {
nlmsg_free(rep_skb);
- goto err;
+ return rc;
}
- rc = send_reply(rep_skb, info);
-
-err:
- fdput(f);
- return rc;
+ return send_reply(rep_skb, info);
}
static int cmd_attr_register_cpumask(struct genl_info *info)
diff --git a/security/integrity/ima/ima_main.c b/security/integrity/ima/ima_main.c
index e7c1d3ae33fe..e6b4cdd6e84c 100644
--- a/security/integrity/ima/ima_main.c
+++ b/security/integrity/ima/ima_main.c
@@ -1062,19 +1062,16 @@ int process_buffer_measurement(struct mnt_idmap *idmap,
*/
void ima_kexec_cmdline(int kernel_fd, const void *buf, int size)
{
- struct fd f;
-
if (!buf || !size)
return;
- f = fdget(kernel_fd);
- if (!fd_file(f))
+ CLASS(fd, f)(kernel_fd);
+ if (fd_empty(f))
return;
process_buffer_measurement(file_mnt_idmap(fd_file(f)), file_inode(fd_file(f)),
buf, size, "kexec-cmdline", KEXEC_CMDLINE, 0,
NULL, false, NULL, 0);
- fdput(f);
}
/**
diff --git a/security/loadpin/loadpin.c b/security/loadpin/loadpin.c
index 02144ec39f43..68252452b66c 100644
--- a/security/loadpin/loadpin.c
+++ b/security/loadpin/loadpin.c
@@ -283,7 +283,6 @@ enum loadpin_securityfs_interface_index {
static int read_trusted_verity_root_digests(unsigned int fd)
{
- struct fd f;
void *data;
int rc;
char *p, *d;
@@ -295,8 +294,8 @@ static int read_trusted_verity_root_digests(unsigned int fd)
if (!list_empty(&dm_verity_loadpin_trusted_root_digests))
return -EPERM;
- f = fdget(fd);
- if (!fd_file(f))
+ CLASS(fd, f)(fd);
+ if (fd_empty(f))
return -EINVAL;
data = kzalloc(SZ_4K, GFP_KERNEL);
@@ -359,7 +358,6 @@ static int read_trusted_verity_root_digests(unsigned int fd)
}
kfree(data);
- fdput(f);
return 0;
@@ -379,8 +377,6 @@ static int read_trusted_verity_root_digests(unsigned int fd)
/* disallow further attempts after reading a corrupt/invalid file */
deny_reading_verity_digests = true;
- fdput(f);
-
return rc;
}
diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
index 53262b8a7656..72aa1fdeb699 100644
--- a/virt/kvm/vfio.c
+++ b/virt/kvm/vfio.c
@@ -229,14 +229,13 @@ static int kvm_vfio_file_set_spapr_tce(struct kvm_device *dev,
struct kvm_vfio_spapr_tce param;
struct kvm_vfio *kv = dev->private;
struct kvm_vfio_file *kvf;
- struct fd f;
int ret;
if (copy_from_user(¶m, arg, sizeof(struct kvm_vfio_spapr_tce)))
return -EFAULT;
- f = fdget(param.groupfd);
- if (!fd_file(f))
+ CLASS(fd, f)(param.groupfd);
+ if (fd_empty(f))
return -EBADF;
ret = -ENOENT;
@@ -262,7 +261,6 @@ static int kvm_vfio_file_set_spapr_tce(struct kvm_device *dev,
err_fdput:
mutex_unlock(&kv->lock);
- fdput(f);
return ret;
}
#endif
--
2.39.2
^ permalink raw reply related [flat|nested] 134+ messages in thread* Re: [PATCH 24/39] fdget(), more trivial conversions
2024-07-30 5:16 ` [PATCH 24/39] fdget(), more " viro
@ 2024-08-07 10:39 ` Christian Brauner
0 siblings, 0 replies; 134+ messages in thread
From: Christian Brauner @ 2024-08-07 10:39 UTC (permalink / raw)
To: viro; +Cc: linux-fsdevel, amir73il, bpf, cgroups, kvm, netdev, torvalds
On Tue, Jul 30, 2024 at 01:16:10AM GMT, viro@kernel.org wrote:
> From: Al Viro <viro@zeniv.linux.org.uk>
>
> all failure exits prior to fdget() leave the scope, all matching fdput()
> are immediately followed by leaving the scope.
>
> [trivial conflicts in fs/fsopen.c and kernel/bpf/syscall.c]
>
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
Reviewed-by: Christian Brauner <brauner@kernel.org>
^ permalink raw reply [flat|nested] 134+ messages in thread
* [PATCH 25/39] convert do_preadv()/do_pwritev()
2024-07-30 5:15 ` [PATCH 01/39] memcg_write_event_control(): fix a user-triggerable oops viro
` (22 preceding siblings ...)
2024-07-30 5:16 ` [PATCH 24/39] fdget(), more " viro
@ 2024-07-30 5:16 ` viro
2024-08-07 10:39 ` Christian Brauner
2024-07-30 5:16 ` [PATCH 26/39] convert cachestat(2) viro
` (14 subsequent siblings)
38 siblings, 1 reply; 134+ messages in thread
From: viro @ 2024-07-30 5:16 UTC (permalink / raw)
To: linux-fsdevel; +Cc: amir73il, bpf, brauner, cgroups, kvm, netdev, torvalds
From: Al Viro <viro@zeniv.linux.org.uk>
fdput() can be transposed with add_{r,w}char() and inc_sysc{r,w}();
it's the same story as with do_readv()/do_writev(), only with
fdput() instead of fdput_pos().
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
fs/read_write.c | 12 ++++--------
1 file changed, 4 insertions(+), 8 deletions(-)
diff --git a/fs/read_write.c b/fs/read_write.c
index 789ea6ecf147..486ed16ab633 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -1020,18 +1020,16 @@ static inline loff_t pos_from_hilo(unsigned long high, unsigned long low)
static ssize_t do_preadv(unsigned long fd, const struct iovec __user *vec,
unsigned long vlen, loff_t pos, rwf_t flags)
{
- struct fd f;
ssize_t ret = -EBADF;
if (pos < 0)
return -EINVAL;
- f = fdget(fd);
- if (fd_file(f)) {
+ CLASS(fd, f)(fd);
+ if (!fd_empty(f)) {
ret = -ESPIPE;
if (fd_file(f)->f_mode & FMODE_PREAD)
ret = vfs_readv(fd_file(f), vec, vlen, &pos, flags);
- fdput(f);
}
if (ret > 0)
@@ -1043,18 +1041,16 @@ static ssize_t do_preadv(unsigned long fd, const struct iovec __user *vec,
static ssize_t do_pwritev(unsigned long fd, const struct iovec __user *vec,
unsigned long vlen, loff_t pos, rwf_t flags)
{
- struct fd f;
ssize_t ret = -EBADF;
if (pos < 0)
return -EINVAL;
- f = fdget(fd);
- if (fd_file(f)) {
+ CLASS(fd, f)(fd);
+ if (!fd_empty(f)) {
ret = -ESPIPE;
if (fd_file(f)->f_mode & FMODE_PWRITE)
ret = vfs_writev(fd_file(f), vec, vlen, &pos, flags);
- fdput(f);
}
if (ret > 0)
--
2.39.2
^ permalink raw reply related [flat|nested] 134+ messages in thread* Re: [PATCH 25/39] convert do_preadv()/do_pwritev()
2024-07-30 5:16 ` [PATCH 25/39] convert do_preadv()/do_pwritev() viro
@ 2024-08-07 10:39 ` Christian Brauner
0 siblings, 0 replies; 134+ messages in thread
From: Christian Brauner @ 2024-08-07 10:39 UTC (permalink / raw)
To: viro; +Cc: linux-fsdevel, amir73il, bpf, cgroups, kvm, netdev, torvalds
On Tue, Jul 30, 2024 at 01:16:11AM GMT, viro@kernel.org wrote:
> From: Al Viro <viro@zeniv.linux.org.uk>
>
> fdput() can be transposed with add_{r,w}char() and inc_sysc{r,w}();
> it's the same story as with do_readv()/do_writev(), only with
> fdput() instead of fdput_pos().
>
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
Reviewed-by: Christian Brauner <brauner@kernel.org>
^ permalink raw reply [flat|nested] 134+ messages in thread
* [PATCH 26/39] convert cachestat(2)
2024-07-30 5:15 ` [PATCH 01/39] memcg_write_event_control(): fix a user-triggerable oops viro
` (23 preceding siblings ...)
2024-07-30 5:16 ` [PATCH 25/39] convert do_preadv()/do_pwritev() viro
@ 2024-07-30 5:16 ` viro
2024-08-07 10:39 ` Christian Brauner
2024-07-30 5:16 ` [PATCH 27/39] switch spufs_calls_{get,put}() to CLASS() use viro
` (13 subsequent siblings)
38 siblings, 1 reply; 134+ messages in thread
From: viro @ 2024-07-30 5:16 UTC (permalink / raw)
To: linux-fsdevel; +Cc: amir73il, bpf, brauner, cgroups, kvm, netdev, torvalds
From: Al Viro <viro@zeniv.linux.org.uk>
fdput() can be transposed with copy_to_user()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
mm/filemap.c | 17 +++++------------
1 file changed, 5 insertions(+), 12 deletions(-)
diff --git a/mm/filemap.c b/mm/filemap.c
index 0b5cbd644fdd..9ef41935b0a7 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -4387,31 +4387,25 @@ SYSCALL_DEFINE4(cachestat, unsigned int, fd,
struct cachestat_range __user *, cstat_range,
struct cachestat __user *, cstat, unsigned int, flags)
{
- struct fd f = fdget(fd);
+ CLASS(fd, f)(fd);
struct address_space *mapping;
struct cachestat_range csr;
struct cachestat cs;
pgoff_t first_index, last_index;
- if (!fd_file(f))
+ if (fd_empty(f))
return -EBADF;
if (copy_from_user(&csr, cstat_range,
- sizeof(struct cachestat_range))) {
- fdput(f);
+ sizeof(struct cachestat_range)))
return -EFAULT;
- }
/* hugetlbfs is not supported */
- if (is_file_hugepages(fd_file(f))) {
- fdput(f);
+ if (is_file_hugepages(fd_file(f)))
return -EOPNOTSUPP;
- }
- if (flags != 0) {
- fdput(f);
+ if (flags != 0)
return -EINVAL;
- }
first_index = csr.off >> PAGE_SHIFT;
last_index =
@@ -4419,7 +4413,6 @@ SYSCALL_DEFINE4(cachestat, unsigned int, fd,
memset(&cs, 0, sizeof(struct cachestat));
mapping = fd_file(f)->f_mapping;
filemap_cachestat(mapping, first_index, last_index, &cs);
- fdput(f);
if (copy_to_user(cstat, &cs, sizeof(struct cachestat)))
return -EFAULT;
--
2.39.2
^ permalink raw reply related [flat|nested] 134+ messages in thread* Re: [PATCH 26/39] convert cachestat(2)
2024-07-30 5:16 ` [PATCH 26/39] convert cachestat(2) viro
@ 2024-08-07 10:39 ` Christian Brauner
0 siblings, 0 replies; 134+ messages in thread
From: Christian Brauner @ 2024-08-07 10:39 UTC (permalink / raw)
To: viro; +Cc: linux-fsdevel, amir73il, bpf, cgroups, kvm, netdev, torvalds
On Tue, Jul 30, 2024 at 01:16:12AM GMT, viro@kernel.org wrote:
> From: Al Viro <viro@zeniv.linux.org.uk>
>
> fdput() can be transposed with copy_to_user()
>
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
Reviewed-by: Christian Brauner <brauner@kernel.org>
^ permalink raw reply [flat|nested] 134+ messages in thread
* [PATCH 27/39] switch spufs_calls_{get,put}() to CLASS() use
2024-07-30 5:15 ` [PATCH 01/39] memcg_write_event_control(): fix a user-triggerable oops viro
` (24 preceding siblings ...)
2024-07-30 5:16 ` [PATCH 26/39] convert cachestat(2) viro
@ 2024-07-30 5:16 ` viro
2024-07-30 5:16 ` [PATCH 28/39] convert spu_run(2) viro
` (12 subsequent siblings)
38 siblings, 0 replies; 134+ messages in thread
From: viro @ 2024-07-30 5:16 UTC (permalink / raw)
To: linux-fsdevel; +Cc: amir73il, bpf, brauner, cgroups, kvm, netdev, torvalds
From: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
arch/powerpc/platforms/cell/spu_syscalls.c | 52 +++++++---------------
1 file changed, 17 insertions(+), 35 deletions(-)
diff --git a/arch/powerpc/platforms/cell/spu_syscalls.c b/arch/powerpc/platforms/cell/spu_syscalls.c
index da4fad7fc8bf..64a4c9eac6e0 100644
--- a/arch/powerpc/platforms/cell/spu_syscalls.c
+++ b/arch/powerpc/platforms/cell/spu_syscalls.c
@@ -36,6 +36,9 @@ static inline struct spufs_calls *spufs_calls_get(void)
static inline void spufs_calls_put(struct spufs_calls *calls)
{
+ if (!calls)
+ return;
+
BUG_ON(calls != spufs_calls);
/* we don't need to rcu this, as we hold a reference to the module */
@@ -53,35 +56,30 @@ static inline void spufs_calls_put(struct spufs_calls *calls) { }
#endif /* CONFIG_SPU_FS_MODULE */
+DEFINE_CLASS(spufs_calls, struct spufs_calls *, spufs_calls_put(_T), spufs_calls_get(), void)
+
SYSCALL_DEFINE4(spu_create, const char __user *, name, unsigned int, flags,
umode_t, mode, int, neighbor_fd)
{
- long ret;
- struct spufs_calls *calls;
-
- calls = spufs_calls_get();
+ CLASS(spufs_calls, calls)();
if (!calls)
return -ENOSYS;
if (flags & SPU_CREATE_AFFINITY_SPU) {
CLASS(fd, neighbor)(neighbor_fd);
- ret = -EBADF;
- if (!fd_empty(neighbor))
- ret = calls->create_thread(name, flags, mode, fd_file(neighbor));
- } else
- ret = calls->create_thread(name, flags, mode, NULL);
-
- spufs_calls_put(calls);
- return ret;
+ if (fd_empty(neighbor))
+ return -EBADF;
+ return calls->create_thread(name, flags, mode, fd_file(neighbor));
+ } else {
+ return calls->create_thread(name, flags, mode, NULL);
+ }
}
SYSCALL_DEFINE3(spu_run,int, fd, __u32 __user *, unpc, __u32 __user *, ustatus)
{
long ret;
struct fd arg;
- struct spufs_calls *calls;
-
- calls = spufs_calls_get();
+ CLASS(spufs_calls, calls)();
if (!calls)
return -ENOSYS;
@@ -91,42 +89,26 @@ SYSCALL_DEFINE3(spu_run,int, fd, __u32 __user *, unpc, __u32 __user *, ustatus)
ret = calls->spu_run(fd_file(arg), unpc, ustatus);
fdput(arg);
}
-
- spufs_calls_put(calls);
return ret;
}
#ifdef CONFIG_COREDUMP
int elf_coredump_extra_notes_size(void)
{
- struct spufs_calls *calls;
- int ret;
-
- calls = spufs_calls_get();
+ CLASS(spufs_calls, calls)();
if (!calls)
return 0;
- ret = calls->coredump_extra_notes_size();
-
- spufs_calls_put(calls);
-
- return ret;
+ return calls->coredump_extra_notes_size();
}
int elf_coredump_extra_notes_write(struct coredump_params *cprm)
{
- struct spufs_calls *calls;
- int ret;
-
- calls = spufs_calls_get();
+ CLASS(spufs_calls, calls)();
if (!calls)
return 0;
- ret = calls->coredump_extra_notes_write(cprm);
-
- spufs_calls_put(calls);
-
- return ret;
+ return calls->coredump_extra_notes_write(cprm);
}
#endif
--
2.39.2
^ permalink raw reply related [flat|nested] 134+ messages in thread* [PATCH 28/39] convert spu_run(2)
2024-07-30 5:15 ` [PATCH 01/39] memcg_write_event_control(): fix a user-triggerable oops viro
` (25 preceding siblings ...)
2024-07-30 5:16 ` [PATCH 27/39] switch spufs_calls_{get,put}() to CLASS() use viro
@ 2024-07-30 5:16 ` viro
2024-08-07 10:40 ` Christian Brauner
2024-07-30 5:16 ` [PATCH 29/39] convert media_request_get_by_fd() viro
` (11 subsequent siblings)
38 siblings, 1 reply; 134+ messages in thread
From: viro @ 2024-07-30 5:16 UTC (permalink / raw)
To: linux-fsdevel; +Cc: amir73il, bpf, brauner, cgroups, kvm, netdev, torvalds
From: Al Viro <viro@zeniv.linux.org.uk>
all failure exits prior to fdget() are returns, fdput() is immediately
followed by return.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
arch/powerpc/platforms/cell/spu_syscalls.c | 14 +++++---------
1 file changed, 5 insertions(+), 9 deletions(-)
diff --git a/arch/powerpc/platforms/cell/spu_syscalls.c b/arch/powerpc/platforms/cell/spu_syscalls.c
index 64a4c9eac6e0..000894e07b02 100644
--- a/arch/powerpc/platforms/cell/spu_syscalls.c
+++ b/arch/powerpc/platforms/cell/spu_syscalls.c
@@ -77,19 +77,15 @@ SYSCALL_DEFINE4(spu_create, const char __user *, name, unsigned int, flags,
SYSCALL_DEFINE3(spu_run,int, fd, __u32 __user *, unpc, __u32 __user *, ustatus)
{
- long ret;
- struct fd arg;
CLASS(spufs_calls, calls)();
if (!calls)
return -ENOSYS;
- ret = -EBADF;
- arg = fdget(fd);
- if (fd_file(arg)) {
- ret = calls->spu_run(fd_file(arg), unpc, ustatus);
- fdput(arg);
- }
- return ret;
+ CLASS(fd, arg)(fd);
+ if (fd_empty(arg))
+ return -EBADF;
+
+ return calls->spu_run(fd_file(arg), unpc, ustatus);
}
#ifdef CONFIG_COREDUMP
--
2.39.2
^ permalink raw reply related [flat|nested] 134+ messages in thread* Re: [PATCH 28/39] convert spu_run(2)
2024-07-30 5:16 ` [PATCH 28/39] convert spu_run(2) viro
@ 2024-08-07 10:40 ` Christian Brauner
0 siblings, 0 replies; 134+ messages in thread
From: Christian Brauner @ 2024-08-07 10:40 UTC (permalink / raw)
To: viro; +Cc: linux-fsdevel, amir73il, bpf, cgroups, kvm, netdev, torvalds
On Tue, Jul 30, 2024 at 01:16:14AM GMT, viro@kernel.org wrote:
> From: Al Viro <viro@zeniv.linux.org.uk>
>
> all failure exits prior to fdget() are returns, fdput() is immediately
> followed by return.
>
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
Reviewed-by: Christian Brauner <brauner@kernel.org>
^ permalink raw reply [flat|nested] 134+ messages in thread
* [PATCH 29/39] convert media_request_get_by_fd()
2024-07-30 5:15 ` [PATCH 01/39] memcg_write_event_control(): fix a user-triggerable oops viro
` (26 preceding siblings ...)
2024-07-30 5:16 ` [PATCH 28/39] convert spu_run(2) viro
@ 2024-07-30 5:16 ` viro
2024-08-07 10:40 ` Christian Brauner
2024-07-30 5:16 ` [PATCH 30/39] convert coda_parse_fd() viro
` (10 subsequent siblings)
38 siblings, 1 reply; 134+ messages in thread
From: viro @ 2024-07-30 5:16 UTC (permalink / raw)
To: linux-fsdevel; +Cc: amir73il, bpf, brauner, cgroups, kvm, netdev, torvalds
From: Al Viro <viro@zeniv.linux.org.uk>
the only thing done after fdput() (in failure cases) is a printk; safely
transposable with fdput()...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
drivers/media/mc/mc-request.c | 18 ++++++------------
1 file changed, 6 insertions(+), 12 deletions(-)
diff --git a/drivers/media/mc/mc-request.c b/drivers/media/mc/mc-request.c
index e064914c476e..df39c8c11e9a 100644
--- a/drivers/media/mc/mc-request.c
+++ b/drivers/media/mc/mc-request.c
@@ -246,22 +246,21 @@ static const struct file_operations request_fops = {
struct media_request *
media_request_get_by_fd(struct media_device *mdev, int request_fd)
{
- struct fd f;
struct media_request *req;
if (!mdev || !mdev->ops ||
!mdev->ops->req_validate || !mdev->ops->req_queue)
return ERR_PTR(-EBADR);
- f = fdget(request_fd);
- if (!fd_file(f))
- goto err_no_req_fd;
+ CLASS(fd, f)(request_fd);
+ if (fd_empty(f))
+ goto err;
if (fd_file(f)->f_op != &request_fops)
- goto err_fput;
+ goto err;
req = fd_file(f)->private_data;
if (req->mdev != mdev)
- goto err_fput;
+ goto err;
/*
* Note: as long as someone has an open filehandle of the request,
@@ -272,14 +271,9 @@ media_request_get_by_fd(struct media_device *mdev, int request_fd)
* before media_request_get() is called.
*/
media_request_get(req);
- fdput(f);
-
return req;
-err_fput:
- fdput(f);
-
-err_no_req_fd:
+err:
dev_dbg(mdev->dev, "cannot find request_fd %d\n", request_fd);
return ERR_PTR(-EINVAL);
}
--
2.39.2
^ permalink raw reply related [flat|nested] 134+ messages in thread* Re: [PATCH 29/39] convert media_request_get_by_fd()
2024-07-30 5:16 ` [PATCH 29/39] convert media_request_get_by_fd() viro
@ 2024-08-07 10:40 ` Christian Brauner
0 siblings, 0 replies; 134+ messages in thread
From: Christian Brauner @ 2024-08-07 10:40 UTC (permalink / raw)
To: viro; +Cc: linux-fsdevel, amir73il, bpf, cgroups, kvm, netdev, torvalds
On Tue, Jul 30, 2024 at 01:16:15AM GMT, viro@kernel.org wrote:
> From: Al Viro <viro@zeniv.linux.org.uk>
>
> the only thing done after fdput() (in failure cases) is a printk; safely
> transposable with fdput()...
>
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
Reviewed-by: Christian Brauner <brauner@kernel.org>
^ permalink raw reply [flat|nested] 134+ messages in thread
* [PATCH 30/39] convert coda_parse_fd()
2024-07-30 5:15 ` [PATCH 01/39] memcg_write_event_control(): fix a user-triggerable oops viro
` (27 preceding siblings ...)
2024-07-30 5:16 ` [PATCH 29/39] convert media_request_get_by_fd() viro
@ 2024-07-30 5:16 ` viro
2024-08-07 10:41 ` Christian Brauner
2024-07-30 5:16 ` [PATCH 31/39] convert cifs_ioctl_copychunk() viro
` (9 subsequent siblings)
38 siblings, 1 reply; 134+ messages in thread
From: viro @ 2024-07-30 5:16 UTC (permalink / raw)
To: linux-fsdevel; +Cc: amir73il, bpf, brauner, cgroups, kvm, netdev, torvalds
From: Al Viro <viro@zeniv.linux.org.uk>
fdput() is followed by invalf(), which is transposable with it.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
fs/coda/inode.c | 11 +++--------
1 file changed, 3 insertions(+), 8 deletions(-)
diff --git a/fs/coda/inode.c b/fs/coda/inode.c
index 7d56b6d1e4c3..293cf5e6dfeb 100644
--- a/fs/coda/inode.c
+++ b/fs/coda/inode.c
@@ -122,22 +122,17 @@ static const struct fs_parameter_spec coda_param_specs[] = {
static int coda_parse_fd(struct fs_context *fc, int fd)
{
struct coda_fs_context *ctx = fc->fs_private;
- struct fd f;
+ CLASS(fd, f)(fd);
struct inode *inode;
int idx;
- f = fdget(fd);
- if (!fd_file(f))
+ if (fd_empty(f))
return -EBADF;
inode = file_inode(fd_file(f));
- if (!S_ISCHR(inode->i_mode) || imajor(inode) != CODA_PSDEV_MAJOR) {
- fdput(f);
+ if (!S_ISCHR(inode->i_mode) || imajor(inode) != CODA_PSDEV_MAJOR)
return invalf(fc, "code: Not coda psdev");
- }
idx = iminor(inode);
- fdput(f);
-
if (idx < 0 || idx >= MAX_CODADEVS)
return invalf(fc, "coda: Bad minor number");
ctx->idx = idx;
--
2.39.2
^ permalink raw reply related [flat|nested] 134+ messages in thread* Re: [PATCH 30/39] convert coda_parse_fd()
2024-07-30 5:16 ` [PATCH 30/39] convert coda_parse_fd() viro
@ 2024-08-07 10:41 ` Christian Brauner
0 siblings, 0 replies; 134+ messages in thread
From: Christian Brauner @ 2024-08-07 10:41 UTC (permalink / raw)
To: viro; +Cc: linux-fsdevel, amir73il, bpf, cgroups, kvm, netdev, torvalds
On Tue, Jul 30, 2024 at 01:16:16AM GMT, viro@kernel.org wrote:
> From: Al Viro <viro@zeniv.linux.org.uk>
>
> fdput() is followed by invalf(), which is transposable with it.
>
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
Reviewed-by: Christian Brauner <brauner@kernel.org>
^ permalink raw reply [flat|nested] 134+ messages in thread
* [PATCH 31/39] convert cifs_ioctl_copychunk()
2024-07-30 5:15 ` [PATCH 01/39] memcg_write_event_control(): fix a user-triggerable oops viro
` (28 preceding siblings ...)
2024-07-30 5:16 ` [PATCH 30/39] convert coda_parse_fd() viro
@ 2024-07-30 5:16 ` viro
2024-08-07 10:41 ` Christian Brauner
2024-07-30 5:16 ` [PATCH 32/39] convert vfs_dedupe_file_range() viro
` (8 subsequent siblings)
38 siblings, 1 reply; 134+ messages in thread
From: viro @ 2024-07-30 5:16 UTC (permalink / raw)
To: linux-fsdevel; +Cc: amir73il, bpf, brauner, cgroups, kvm, netdev, torvalds
From: Al Viro <viro@zeniv.linux.org.uk>
fdput() moved past mnt_drop_file_write(); harmless, if somewhat cringeworthy.
Reordering could be avoided either by adding an explicit scope or by making
mnt_drop_file_write() called via __cleanup.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
fs/smb/client/ioctl.c | 11 ++++-------
1 file changed, 4 insertions(+), 7 deletions(-)
diff --git a/fs/smb/client/ioctl.c b/fs/smb/client/ioctl.c
index 94bf2e5014d9..6d9df3646df3 100644
--- a/fs/smb/client/ioctl.c
+++ b/fs/smb/client/ioctl.c
@@ -72,7 +72,6 @@ static long cifs_ioctl_copychunk(unsigned int xid, struct file *dst_file,
unsigned long srcfd)
{
int rc;
- struct fd src_file;
struct inode *src_inode;
cifs_dbg(FYI, "ioctl copychunk range\n");
@@ -89,8 +88,8 @@ static long cifs_ioctl_copychunk(unsigned int xid, struct file *dst_file,
return rc;
}
- src_file = fdget(srcfd);
- if (!fd_file(src_file)) {
+ CLASS(fd, src_file)(srcfd);
+ if (fd_empty(src_file)) {
rc = -EBADF;
goto out_drop_write;
}
@@ -98,20 +97,18 @@ static long cifs_ioctl_copychunk(unsigned int xid, struct file *dst_file,
if (fd_file(src_file)->f_op->unlocked_ioctl != cifs_ioctl) {
rc = -EBADF;
cifs_dbg(VFS, "src file seems to be from a different filesystem type\n");
- goto out_fput;
+ goto out_drop_write;
}
src_inode = file_inode(fd_file(src_file));
rc = -EINVAL;
if (S_ISDIR(src_inode->i_mode))
- goto out_fput;
+ goto out_drop_write;
rc = cifs_file_copychunk_range(xid, fd_file(src_file), 0, dst_file, 0,
src_inode->i_size, 0);
if (rc > 0)
rc = 0;
-out_fput:
- fdput(src_file);
out_drop_write:
mnt_drop_write_file(dst_file);
return rc;
--
2.39.2
^ permalink raw reply related [flat|nested] 134+ messages in thread* Re: [PATCH 31/39] convert cifs_ioctl_copychunk()
2024-07-30 5:16 ` [PATCH 31/39] convert cifs_ioctl_copychunk() viro
@ 2024-08-07 10:41 ` Christian Brauner
0 siblings, 0 replies; 134+ messages in thread
From: Christian Brauner @ 2024-08-07 10:41 UTC (permalink / raw)
To: viro; +Cc: linux-fsdevel, amir73il, bpf, cgroups, kvm, netdev, torvalds
On Tue, Jul 30, 2024 at 01:16:17AM GMT, viro@kernel.org wrote:
> From: Al Viro <viro@zeniv.linux.org.uk>
>
> fdput() moved past mnt_drop_file_write(); harmless, if somewhat cringeworthy.
> Reordering could be avoided either by adding an explicit scope or by making
> mnt_drop_file_write() called via __cleanup.
>
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
Reviewed-by: Christian Brauner <brauner@kernel.org>
^ permalink raw reply [flat|nested] 134+ messages in thread
* [PATCH 32/39] convert vfs_dedupe_file_range().
2024-07-30 5:15 ` [PATCH 01/39] memcg_write_event_control(): fix a user-triggerable oops viro
` (29 preceding siblings ...)
2024-07-30 5:16 ` [PATCH 31/39] convert cifs_ioctl_copychunk() viro
@ 2024-07-30 5:16 ` viro
2024-08-07 10:42 ` Christian Brauner
2024-07-30 5:16 ` [PATCH 33/39] convert do_select() viro
` (7 subsequent siblings)
38 siblings, 1 reply; 134+ messages in thread
From: viro @ 2024-07-30 5:16 UTC (permalink / raw)
To: linux-fsdevel; +Cc: amir73il, bpf, brauner, cgroups, kvm, netdev, torvalds
From: Al Viro <viro@zeniv.linux.org.uk>
fdput() is followed by checking fatal_signal_pending() (and aborting
the loop in such case). fdput() is transposable with that check.
Yes, it'll probably end up with slightly fatter code (call after the
check has returned false + call on the almost never taken out-of-line path
instead of one call before the check), but it's not worth bothering with
explicit extra scope there (or dragging the check into the loop condition,
for that matter).
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
fs/remap_range.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/fs/remap_range.c b/fs/remap_range.c
index 017d0d1ea6c9..26afbbbfb10c 100644
--- a/fs/remap_range.c
+++ b/fs/remap_range.c
@@ -536,7 +536,7 @@ int vfs_dedupe_file_range(struct file *file, struct file_dedupe_range *same)
}
for (i = 0, info = same->info; i < count; i++, info++) {
- struct fd dst_fd = fdget(info->dest_fd);
+ CLASS(fd, dst_fd)(info->dest_fd);
if (fd_empty(dst_fd)) {
info->status = -EBADF;
@@ -545,7 +545,7 @@ int vfs_dedupe_file_range(struct file *file, struct file_dedupe_range *same)
if (info->reserved) {
info->status = -EINVAL;
- goto next_fdput;
+ goto next_loop;
}
deduped = vfs_dedupe_file_range_one(file, off, fd_file(dst_fd),
@@ -558,8 +558,6 @@ int vfs_dedupe_file_range(struct file *file, struct file_dedupe_range *same)
else
info->bytes_deduped = len;
-next_fdput:
- fdput(dst_fd);
next_loop:
if (fatal_signal_pending(current))
break;
--
2.39.2
^ permalink raw reply related [flat|nested] 134+ messages in thread* Re: [PATCH 32/39] convert vfs_dedupe_file_range().
2024-07-30 5:16 ` [PATCH 32/39] convert vfs_dedupe_file_range() viro
@ 2024-08-07 10:42 ` Christian Brauner
0 siblings, 0 replies; 134+ messages in thread
From: Christian Brauner @ 2024-08-07 10:42 UTC (permalink / raw)
To: viro; +Cc: linux-fsdevel, amir73il, bpf, cgroups, kvm, netdev, torvalds
On Tue, Jul 30, 2024 at 01:16:18AM GMT, viro@kernel.org wrote:
> From: Al Viro <viro@zeniv.linux.org.uk>
>
> fdput() is followed by checking fatal_signal_pending() (and aborting
> the loop in such case). fdput() is transposable with that check.
> Yes, it'll probably end up with slightly fatter code (call after the
> check has returned false + call on the almost never taken out-of-line path
> instead of one call before the check), but it's not worth bothering with
> explicit extra scope there (or dragging the check into the loop condition,
> for that matter).
>
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
Reviewed-by: Christian Brauner <brauner@kernel.org>
^ permalink raw reply [flat|nested] 134+ messages in thread
* [PATCH 33/39] convert do_select()
2024-07-30 5:15 ` [PATCH 01/39] memcg_write_event_control(): fix a user-triggerable oops viro
` (30 preceding siblings ...)
2024-07-30 5:16 ` [PATCH 32/39] convert vfs_dedupe_file_range() viro
@ 2024-07-30 5:16 ` viro
2024-08-07 10:42 ` Christian Brauner
2024-07-30 5:16 ` [PATCH 34/39] do_pollfd(): convert to CLASS(fd) viro
` (6 subsequent siblings)
38 siblings, 1 reply; 134+ messages in thread
From: viro @ 2024-07-30 5:16 UTC (permalink / raw)
To: linux-fsdevel; +Cc: amir73il, bpf, brauner, cgroups, kvm, netdev, torvalds
From: Al Viro <viro@zeniv.linux.org.uk>
take the logics from fdget() to fdput() into an inlined helper - with existing
wait_key_set() subsumed into that.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
fs/select.c | 21 ++++++++++-----------
1 file changed, 10 insertions(+), 11 deletions(-)
diff --git a/fs/select.c b/fs/select.c
index 97e1009dde00..039f81c6f817 100644
--- a/fs/select.c
+++ b/fs/select.c
@@ -465,15 +465,22 @@ static int max_select_fd(unsigned long n, fd_set_bits *fds)
EPOLLNVAL)
#define POLLEX_SET (EPOLLPRI | EPOLLNVAL)
-static inline void wait_key_set(poll_table *wait, unsigned long in,
+static inline __poll_t select_poll_one(int fd, poll_table *wait, unsigned long in,
unsigned long out, unsigned long bit,
__poll_t ll_flag)
{
+ CLASS(fd, f)(fd);
+
+ if (fd_empty(f))
+ return EPOLLNVAL;
+
wait->_key = POLLEX_SET | ll_flag;
if (in & bit)
wait->_key |= POLLIN_SET;
if (out & bit)
wait->_key |= POLLOUT_SET;
+
+ return vfs_poll(fd_file(f), wait);
}
static noinline_for_stack int do_select(int n, fd_set_bits *fds, struct timespec64 *end_time)
@@ -525,20 +532,12 @@ static noinline_for_stack int do_select(int n, fd_set_bits *fds, struct timespec
}
for (j = 0; j < BITS_PER_LONG; ++j, ++i, bit <<= 1) {
- struct fd f;
if (i >= n)
break;
if (!(bit & all_bits))
continue;
- mask = EPOLLNVAL;
- f = fdget(i);
- if (fd_file(f)) {
- wait_key_set(wait, in, out, bit,
- busy_flag);
- mask = vfs_poll(fd_file(f), wait);
-
- fdput(f);
- }
+ mask = select_poll_one(i, wait, in, out, bit,
+ busy_flag);
if ((mask & POLLIN_SET) && (in & bit)) {
res_in |= bit;
retval++;
--
2.39.2
^ permalink raw reply related [flat|nested] 134+ messages in thread* Re: [PATCH 33/39] convert do_select()
2024-07-30 5:16 ` [PATCH 33/39] convert do_select() viro
@ 2024-08-07 10:42 ` Christian Brauner
0 siblings, 0 replies; 134+ messages in thread
From: Christian Brauner @ 2024-08-07 10:42 UTC (permalink / raw)
To: viro; +Cc: linux-fsdevel, amir73il, bpf, cgroups, kvm, netdev, torvalds
On Tue, Jul 30, 2024 at 01:16:19AM GMT, viro@kernel.org wrote:
> From: Al Viro <viro@zeniv.linux.org.uk>
>
> take the logics from fdget() to fdput() into an inlined helper - with existing
> wait_key_set() subsumed into that.
>
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
Reviewed-by: Christian Brauner <brauner@kernel.org>
^ permalink raw reply [flat|nested] 134+ messages in thread
* [PATCH 34/39] do_pollfd(): convert to CLASS(fd)
2024-07-30 5:15 ` [PATCH 01/39] memcg_write_event_control(): fix a user-triggerable oops viro
` (31 preceding siblings ...)
2024-07-30 5:16 ` [PATCH 33/39] convert do_select() viro
@ 2024-07-30 5:16 ` viro
2024-08-07 10:43 ` Christian Brauner
2024-07-30 5:16 ` [PATCH 35/39] convert bpf_token_create() viro
` (5 subsequent siblings)
38 siblings, 1 reply; 134+ messages in thread
From: viro @ 2024-07-30 5:16 UTC (permalink / raw)
To: linux-fsdevel; +Cc: amir73il, bpf, brauner, cgroups, kvm, netdev, torvalds
From: Al Viro <viro@zeniv.linux.org.uk>
lift setting ->revents into the caller, so that failure exits (including
the early one) would be plain returns.
We need the scope of our struct fd to end before the store to ->revents,
since that's shared with the failure exits prior to the point where we
can do fdget().
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
fs/select.c | 27 +++++++++++----------------
1 file changed, 11 insertions(+), 16 deletions(-)
diff --git a/fs/select.c b/fs/select.c
index 039f81c6f817..995b4d9491d3 100644
--- a/fs/select.c
+++ b/fs/select.c
@@ -856,15 +856,14 @@ static inline __poll_t do_pollfd(struct pollfd *pollfd, poll_table *pwait,
__poll_t busy_flag)
{
int fd = pollfd->fd;
- __poll_t mask = 0, filter;
- struct fd f;
+ __poll_t mask, filter;
if (fd < 0)
- goto out;
- mask = EPOLLNVAL;
- f = fdget(fd);
- if (!fd_file(f))
- goto out;
+ return 0;
+
+ CLASS(fd, f)(fd);
+ if (fd_empty(f))
+ return EPOLLNVAL;
/* userland u16 ->events contains POLL... bitmap */
filter = demangle_poll(pollfd->events) | EPOLLERR | EPOLLHUP;
@@ -872,13 +871,7 @@ static inline __poll_t do_pollfd(struct pollfd *pollfd, poll_table *pwait,
mask = vfs_poll(fd_file(f), pwait);
if (mask & busy_flag)
*can_busy_poll = true;
- mask &= filter; /* Mask out unneeded events. */
- fdput(f);
-
-out:
- /* ... and so does ->revents */
- pollfd->revents = mangle_poll(mask);
- return mask;
+ return mask & filter; /* Mask out unneeded events. */
}
static int do_poll(struct poll_list *list, struct poll_wqueues *wait,
@@ -910,6 +903,7 @@ static int do_poll(struct poll_list *list, struct poll_wqueues *wait,
pfd = walk->entries;
pfd_end = pfd + walk->len;
for (; pfd != pfd_end; pfd++) {
+ __poll_t mask;
/*
* Fish for events. If we found one, record it
* and kill poll_table->_qproc, so we don't
@@ -917,8 +911,9 @@ static int do_poll(struct poll_list *list, struct poll_wqueues *wait,
* this. They'll get immediately deregistered
* when we break out and return.
*/
- if (do_pollfd(pfd, pt, &can_busy_loop,
- busy_flag)) {
+ mask = do_pollfd(pfd, pt, &can_busy_loop, busy_flag);
+ pfd->revents = mangle_poll(mask);
+ if (mask) {
count++;
pt->_qproc = NULL;
/* found something, stop busy polling */
--
2.39.2
^ permalink raw reply related [flat|nested] 134+ messages in thread* Re: [PATCH 34/39] do_pollfd(): convert to CLASS(fd)
2024-07-30 5:16 ` [PATCH 34/39] do_pollfd(): convert to CLASS(fd) viro
@ 2024-08-07 10:43 ` Christian Brauner
0 siblings, 0 replies; 134+ messages in thread
From: Christian Brauner @ 2024-08-07 10:43 UTC (permalink / raw)
To: viro; +Cc: linux-fsdevel, amir73il, bpf, cgroups, kvm, netdev, torvalds
On Tue, Jul 30, 2024 at 01:16:20AM GMT, viro@kernel.org wrote:
> From: Al Viro <viro@zeniv.linux.org.uk>
>
> lift setting ->revents into the caller, so that failure exits (including
> the early one) would be plain returns.
>
> We need the scope of our struct fd to end before the store to ->revents,
> since that's shared with the failure exits prior to the point where we
> can do fdget().
>
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
Reviewed-by: Christian Brauner <brauner@kernel.org>
^ permalink raw reply [flat|nested] 134+ messages in thread
* [PATCH 35/39] convert bpf_token_create()
2024-07-30 5:15 ` [PATCH 01/39] memcg_write_event_control(): fix a user-triggerable oops viro
` (32 preceding siblings ...)
2024-07-30 5:16 ` [PATCH 34/39] do_pollfd(): convert to CLASS(fd) viro
@ 2024-07-30 5:16 ` viro
2024-08-06 22:42 ` Andrii Nakryiko
2024-08-07 10:44 ` Christian Brauner
2024-07-30 5:16 ` [PATCH 36/39] assorted variants of irqfd setup: convert to CLASS(fd) viro
` (4 subsequent siblings)
38 siblings, 2 replies; 134+ messages in thread
From: viro @ 2024-07-30 5:16 UTC (permalink / raw)
To: linux-fsdevel; +Cc: amir73il, bpf, brauner, cgroups, kvm, netdev, torvalds
From: Al Viro <viro@zeniv.linux.org.uk>
keep file reference through the entire thing, don't bother with
grabbing struct path reference (except, for now, around the LSM
call and that only until it gets constified) and while we are
at it, don't confuse the hell out of readers by random mix of
path.dentry->d_sb and path.mnt->mnt_sb uses - these two are equal,
so just put one of those into a local variable and use that.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
kernel/bpf/token.c | 69 +++++++++++++++++-----------------------------
1 file changed, 26 insertions(+), 43 deletions(-)
diff --git a/kernel/bpf/token.c b/kernel/bpf/token.c
index 9b92cb886d49..15da405d8302 100644
--- a/kernel/bpf/token.c
+++ b/kernel/bpf/token.c
@@ -116,67 +116,52 @@ int bpf_token_create(union bpf_attr *attr)
struct user_namespace *userns;
struct inode *inode;
struct file *file;
+ CLASS(fd, f)(attr->token_create.bpffs_fd);
struct path path;
- struct fd f;
+ struct super_block *sb;
umode_t mode;
int err, fd;
- f = fdget(attr->token_create.bpffs_fd);
- if (!fd_file(f))
+ if (fd_empty(f))
return -EBADF;
path = fd_file(f)->f_path;
- path_get(&path);
- fdput(f);
+ sb = path.dentry->d_sb;
- if (path.dentry != path.mnt->mnt_sb->s_root) {
- err = -EINVAL;
- goto out_path;
- }
- if (path.mnt->mnt_sb->s_op != &bpf_super_ops) {
- err = -EINVAL;
- goto out_path;
- }
+ if (path.dentry != sb->s_root)
+ return -EINVAL;
+ if (sb->s_op != &bpf_super_ops)
+ return -EINVAL;
err = path_permission(&path, MAY_ACCESS);
if (err)
- goto out_path;
+ return err;
- userns = path.dentry->d_sb->s_user_ns;
+ userns = sb->s_user_ns;
/*
* Enforce that creators of BPF tokens are in the same user
* namespace as the BPF FS instance. This makes reasoning about
* permissions a lot easier and we can always relax this later.
*/
- if (current_user_ns() != userns) {
- err = -EPERM;
- goto out_path;
- }
- if (!ns_capable(userns, CAP_BPF)) {
- err = -EPERM;
- goto out_path;
- }
+ if (current_user_ns() != userns)
+ return -EPERM;
+ if (!ns_capable(userns, CAP_BPF))
+ return -EPERM;
/* Creating BPF token in init_user_ns doesn't make much sense. */
- if (current_user_ns() == &init_user_ns) {
- err = -EOPNOTSUPP;
- goto out_path;
- }
+ if (current_user_ns() == &init_user_ns)
+ return -EOPNOTSUPP;
- mnt_opts = path.dentry->d_sb->s_fs_info;
+ mnt_opts = sb->s_fs_info;
if (mnt_opts->delegate_cmds == 0 &&
mnt_opts->delegate_maps == 0 &&
mnt_opts->delegate_progs == 0 &&
- mnt_opts->delegate_attachs == 0) {
- err = -ENOENT; /* no BPF token delegation is set up */
- goto out_path;
- }
+ mnt_opts->delegate_attachs == 0)
+ return -ENOENT; /* no BPF token delegation is set up */
mode = S_IFREG | ((S_IRUSR | S_IWUSR) & ~current_umask());
- inode = bpf_get_inode(path.mnt->mnt_sb, NULL, mode);
- if (IS_ERR(inode)) {
- err = PTR_ERR(inode);
- goto out_path;
- }
+ inode = bpf_get_inode(sb, NULL, mode);
+ if (IS_ERR(inode))
+ return PTR_ERR(inode);
inode->i_op = &bpf_token_iops;
inode->i_fop = &bpf_token_fops;
@@ -185,8 +170,7 @@ int bpf_token_create(union bpf_attr *attr)
file = alloc_file_pseudo(inode, path.mnt, BPF_TOKEN_INODE_NAME, O_RDWR, &bpf_token_fops);
if (IS_ERR(file)) {
iput(inode);
- err = PTR_ERR(file);
- goto out_path;
+ return PTR_ERR(file);
}
token = kzalloc(sizeof(*token), GFP_USER);
@@ -205,7 +189,9 @@ int bpf_token_create(union bpf_attr *attr)
token->allowed_progs = mnt_opts->delegate_progs;
token->allowed_attachs = mnt_opts->delegate_attachs;
- err = security_bpf_token_create(token, attr, &path);
+ path_get(&path); // kill it
+ err = security_bpf_token_create(token, attr, &path); // constify
+ path_put(&path); // kill it
if (err)
goto out_token;
@@ -218,15 +204,12 @@ int bpf_token_create(union bpf_attr *attr)
file->private_data = token;
fd_install(fd, file);
- path_put(&path);
return fd;
out_token:
bpf_token_free(token);
out_file:
fput(file);
-out_path:
- path_put(&path);
return err;
}
--
2.39.2
^ permalink raw reply related [flat|nested] 134+ messages in thread* Re: [PATCH 35/39] convert bpf_token_create()
2024-07-30 5:16 ` [PATCH 35/39] convert bpf_token_create() viro
@ 2024-08-06 22:42 ` Andrii Nakryiko
2024-08-10 3:46 ` Al Viro
2024-08-07 10:44 ` Christian Brauner
1 sibling, 1 reply; 134+ messages in thread
From: Andrii Nakryiko @ 2024-08-06 22:42 UTC (permalink / raw)
To: viro; +Cc: linux-fsdevel, amir73il, bpf, brauner, cgroups, kvm, netdev,
torvalds
On Mon, Jul 29, 2024 at 10:27 PM <viro@kernel.org> wrote:
>
> From: Al Viro <viro@zeniv.linux.org.uk>
>
> keep file reference through the entire thing, don't bother with
> grabbing struct path reference (except, for now, around the LSM
> call and that only until it gets constified) and while we are
> at it, don't confuse the hell out of readers by random mix of
> path.dentry->d_sb and path.mnt->mnt_sb uses - these two are equal,
> so just put one of those into a local variable and use that.
>
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
> kernel/bpf/token.c | 69 +++++++++++++++++-----------------------------
> 1 file changed, 26 insertions(+), 43 deletions(-)
>
LGTM overall (modulo // comments, but see below)
Acked-by: Andrii Nakryiko <andrii@kernel.org>
> diff --git a/kernel/bpf/token.c b/kernel/bpf/token.c
> index 9b92cb886d49..15da405d8302 100644
> --- a/kernel/bpf/token.c
> +++ b/kernel/bpf/token.c
> @@ -116,67 +116,52 @@ int bpf_token_create(union bpf_attr *attr)
[...]
> - err = security_bpf_token_create(token, attr, &path);
> + path_get(&path); // kill it
> + err = security_bpf_token_create(token, attr, &path); // constify
> + path_put(&path); // kill it
> if (err)
> goto out_token;
>
By constify you mean something like below?
commit 06a6442ca9cc441805881eea61fd57d7defadaca
Author: Andrii Nakryiko <andrii@kernel.org>
Date: Tue Aug 6 15:38:12 2024 -0700
security: constify struct path in bpf_token_create() LSM hook
There is no reason why struct path pointer shouldn't be const-qualified
when being passed into bpf_token_create() LSM hook. Add that const.
Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h
index 855db460e08b..462b55378241 100644
--- a/include/linux/lsm_hook_defs.h
+++ b/include/linux/lsm_hook_defs.h
@@ -431,7 +431,7 @@ LSM_HOOK(int, 0, bpf_prog_load, struct bpf_prog
*prog, union bpf_attr *attr,
struct bpf_token *token)
LSM_HOOK(void, LSM_RET_VOID, bpf_prog_free, struct bpf_prog *prog)
LSM_HOOK(int, 0, bpf_token_create, struct bpf_token *token, union
bpf_attr *attr,
- struct path *path)
+ const struct path *path)
LSM_HOOK(void, LSM_RET_VOID, bpf_token_free, struct bpf_token *token)
LSM_HOOK(int, 0, bpf_token_cmd, const struct bpf_token *token, enum
bpf_cmd cmd)
LSM_HOOK(int, 0, bpf_token_capable, const struct bpf_token *token, int cap)
diff --git a/include/linux/security.h b/include/linux/security.h
index 1390f1efb4f0..31523a2c71c4 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -2137,7 +2137,7 @@ extern int security_bpf_prog_load(struct
bpf_prog *prog, union bpf_attr *attr,
struct bpf_token *token);
extern void security_bpf_prog_free(struct bpf_prog *prog);
extern int security_bpf_token_create(struct bpf_token *token, union
bpf_attr *attr,
- struct path *path);
+ const struct path *path);
extern void security_bpf_token_free(struct bpf_token *token);
extern int security_bpf_token_cmd(const struct bpf_token *token, enum
bpf_cmd cmd);
extern int security_bpf_token_capable(const struct bpf_token *token, int cap);
@@ -2177,7 +2177,7 @@ static inline void security_bpf_prog_free(struct
bpf_prog *prog)
{ }
static inline int security_bpf_token_create(struct bpf_token *token,
union bpf_attr *attr,
- struct path *path)
+ const struct path *path)
{
return 0;
}
diff --git a/security/security.c b/security/security.c
index 8cee5b6c6e6d..d8d0b67ced25 100644
--- a/security/security.c
+++ b/security/security.c
@@ -5510,7 +5510,7 @@ int security_bpf_prog_load(struct bpf_prog
*prog, union bpf_attr *attr,
* Return: Returns 0 on success, error on failure.
*/
int security_bpf_token_create(struct bpf_token *token, union bpf_attr *attr,
- struct path *path)
+ const struct path *path)
{
return call_int_hook(bpf_token_create, token, attr, path);
}
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 55c78c318ccd..0eec141a8f37 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -6965,7 +6965,7 @@ static void selinux_bpf_prog_free(struct bpf_prog *prog)
}
static int selinux_bpf_token_create(struct bpf_token *token, union
bpf_attr *attr,
- struct path *path)
+ const struct path *path)
{
struct bpf_security_struct *bpfsec;
[...]
^ permalink raw reply related [flat|nested] 134+ messages in thread* Re: [PATCH 35/39] convert bpf_token_create()
2024-08-06 22:42 ` Andrii Nakryiko
@ 2024-08-10 3:46 ` Al Viro
2024-08-12 20:06 ` Andrii Nakryiko
0 siblings, 1 reply; 134+ messages in thread
From: Al Viro @ 2024-08-10 3:46 UTC (permalink / raw)
To: Andrii Nakryiko
Cc: viro, linux-fsdevel, amir73il, bpf, brauner, cgroups, kvm, netdev,
torvalds
On Tue, Aug 06, 2024 at 03:42:56PM -0700, Andrii Nakryiko wrote:
> By constify you mean something like below?
Yep. Should go through LSM folks, probably, and once it's in
those path_get() and path_put() around the call can go to hell,
along with 'path' itself (we can use file->f_path instead).
^ permalink raw reply [flat|nested] 134+ messages in thread
* Re: [PATCH 35/39] convert bpf_token_create()
2024-08-10 3:46 ` Al Viro
@ 2024-08-12 20:06 ` Andrii Nakryiko
0 siblings, 0 replies; 134+ messages in thread
From: Andrii Nakryiko @ 2024-08-12 20:06 UTC (permalink / raw)
To: Al Viro
Cc: viro, linux-fsdevel, amir73il, bpf, brauner, cgroups, kvm, netdev,
torvalds
On Fri, Aug 9, 2024 at 8:46 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Tue, Aug 06, 2024 at 03:42:56PM -0700, Andrii Nakryiko wrote:
>
> > By constify you mean something like below?
>
> Yep. Should go through LSM folks, probably, and once it's in
> those path_get() and path_put() around the call can go to hell,
> along with 'path' itself (we can use file->f_path instead).
Ok, cool. Let's then do that branch you proposed, and I can send
everything on top of it, removing path_get()+path_put() you add here.
^ permalink raw reply [flat|nested] 134+ messages in thread
* Re: [PATCH 35/39] convert bpf_token_create()
2024-07-30 5:16 ` [PATCH 35/39] convert bpf_token_create() viro
2024-08-06 22:42 ` Andrii Nakryiko
@ 2024-08-07 10:44 ` Christian Brauner
1 sibling, 0 replies; 134+ messages in thread
From: Christian Brauner @ 2024-08-07 10:44 UTC (permalink / raw)
To: viro; +Cc: linux-fsdevel, amir73il, bpf, cgroups, kvm, netdev, torvalds
On Tue, Jul 30, 2024 at 01:16:21AM GMT, viro@kernel.org wrote:
> From: Al Viro <viro@zeniv.linux.org.uk>
>
> keep file reference through the entire thing, don't bother with
> grabbing struct path reference (except, for now, around the LSM
> call and that only until it gets constified) and while we are
> at it, don't confuse the hell out of readers by random mix of
> path.dentry->d_sb and path.mnt->mnt_sb uses - these two are equal,
> so just put one of those into a local variable and use that.
>
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
Reviewed-by: Christian Brauner <brauner@kernel.org>
^ permalink raw reply [flat|nested] 134+ messages in thread
* [PATCH 36/39] assorted variants of irqfd setup: convert to CLASS(fd)
2024-07-30 5:15 ` [PATCH 01/39] memcg_write_event_control(): fix a user-triggerable oops viro
` (33 preceding siblings ...)
2024-07-30 5:16 ` [PATCH 35/39] convert bpf_token_create() viro
@ 2024-07-30 5:16 ` viro
2024-08-07 10:46 ` Christian Brauner
2024-07-30 5:16 ` [PATCH 37/39] memcg_write_event_control(): switch " viro
` (3 subsequent siblings)
38 siblings, 1 reply; 134+ messages in thread
From: viro @ 2024-07-30 5:16 UTC (permalink / raw)
To: linux-fsdevel; +Cc: amir73il, bpf, brauner, cgroups, kvm, netdev, torvalds
From: Al Viro <viro@zeniv.linux.org.uk>
in all of those failure exits prior to fdget() are plain returns and
the only thing done after fdput() is (on failure exits) a kfree(),
which can be done before fdput() just fine.
NOTE: in acrn_irqfd_assign() 'fail:' failure exit is wrong for
eventfd_ctx_fileget() failure (we only want fdput() there) and once
we stop doing that, it doesn't need to check if eventfd is NULL or
ERR_PTR(...) there.
NOTE: in privcmd we move fdget() up before the allocation - more
to the point, before the copy_from_user() attempt.
[trivial conflict in privcmd]
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
drivers/vfio/virqfd.c | 16 +++-------------
drivers/virt/acrn/irqfd.c | 13 ++++---------
drivers/xen/privcmd.c | 17 ++++-------------
virt/kvm/eventfd.c | 15 +++------------
4 files changed, 14 insertions(+), 47 deletions(-)
diff --git a/drivers/vfio/virqfd.c b/drivers/vfio/virqfd.c
index d22881245e89..aa2891f97508 100644
--- a/drivers/vfio/virqfd.c
+++ b/drivers/vfio/virqfd.c
@@ -113,7 +113,6 @@ int vfio_virqfd_enable(void *opaque,
void (*thread)(void *, void *),
void *data, struct virqfd **pvirqfd, int fd)
{
- struct fd irqfd;
struct eventfd_ctx *ctx;
struct virqfd *virqfd;
int ret = 0;
@@ -133,8 +132,8 @@ int vfio_virqfd_enable(void *opaque,
INIT_WORK(&virqfd->inject, virqfd_inject);
INIT_WORK(&virqfd->flush_inject, virqfd_flush_inject);
- irqfd = fdget(fd);
- if (!fd_file(irqfd)) {
+ CLASS(fd, irqfd)(fd);
+ if (fd_empty(irqfd)) {
ret = -EBADF;
goto err_fd;
}
@@ -142,7 +141,7 @@ int vfio_virqfd_enable(void *opaque,
ctx = eventfd_ctx_fileget(fd_file(irqfd));
if (IS_ERR(ctx)) {
ret = PTR_ERR(ctx);
- goto err_ctx;
+ goto err_fd;
}
virqfd->eventfd = ctx;
@@ -181,18 +180,9 @@ int vfio_virqfd_enable(void *opaque,
if ((!handler || handler(opaque, data)) && thread)
schedule_work(&virqfd->inject);
}
-
- /*
- * Do not drop the file until the irqfd is fully initialized,
- * otherwise we might race against the EPOLLHUP.
- */
- fdput(irqfd);
-
return 0;
err_busy:
eventfd_ctx_put(ctx);
-err_ctx:
- fdput(irqfd);
err_fd:
kfree(virqfd);
diff --git a/drivers/virt/acrn/irqfd.c b/drivers/virt/acrn/irqfd.c
index 9994d818bb7e..b7da24ca1475 100644
--- a/drivers/virt/acrn/irqfd.c
+++ b/drivers/virt/acrn/irqfd.c
@@ -112,7 +112,6 @@ static int acrn_irqfd_assign(struct acrn_vm *vm, struct acrn_irqfd *args)
struct eventfd_ctx *eventfd = NULL;
struct hsm_irqfd *irqfd, *tmp;
__poll_t events;
- struct fd f;
int ret = 0;
irqfd = kzalloc(sizeof(*irqfd), GFP_KERNEL);
@@ -124,8 +123,8 @@ static int acrn_irqfd_assign(struct acrn_vm *vm, struct acrn_irqfd *args)
INIT_LIST_HEAD(&irqfd->list);
INIT_WORK(&irqfd->shutdown, hsm_irqfd_shutdown_work);
- f = fdget(args->fd);
- if (!fd_file(f)) {
+ CLASS(fd, f)(args->fd);
+ if (fd_empty(f)) {
ret = -EBADF;
goto out;
}
@@ -133,7 +132,7 @@ static int acrn_irqfd_assign(struct acrn_vm *vm, struct acrn_irqfd *args)
eventfd = eventfd_ctx_fileget(fd_file(f));
if (IS_ERR(eventfd)) {
ret = PTR_ERR(eventfd);
- goto fail;
+ goto out;
}
irqfd->eventfd = eventfd;
@@ -162,13 +161,9 @@ static int acrn_irqfd_assign(struct acrn_vm *vm, struct acrn_irqfd *args)
if (events & EPOLLIN)
acrn_irqfd_inject(irqfd);
- fdput(f);
return 0;
fail:
- if (eventfd && !IS_ERR(eventfd))
- eventfd_ctx_put(eventfd);
-
- fdput(f);
+ eventfd_ctx_put(eventfd);
out:
kfree(irqfd);
return ret;
diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
index ba02b732fa49..8a5bdf1f3050 100644
--- a/drivers/xen/privcmd.c
+++ b/drivers/xen/privcmd.c
@@ -939,10 +939,11 @@ static int privcmd_irqfd_assign(struct privcmd_irqfd *irqfd)
struct privcmd_kernel_irqfd *kirqfd, *tmp;
unsigned long flags;
__poll_t events;
- struct fd f;
void *dm_op;
int ret, idx;
+ CLASS(fd, f)(irqfd->fd);
+
kirqfd = kzalloc(sizeof(*kirqfd) + irqfd->size, GFP_KERNEL);
if (!kirqfd)
return -ENOMEM;
@@ -958,8 +959,7 @@ static int privcmd_irqfd_assign(struct privcmd_irqfd *irqfd)
kirqfd->dom = irqfd->dom;
INIT_WORK(&kirqfd->shutdown, irqfd_shutdown);
- f = fdget(irqfd->fd);
- if (!fd_file(f)) {
+ if (fd_empty(f)) {
ret = -EBADF;
goto error_kfree;
}
@@ -967,7 +967,7 @@ static int privcmd_irqfd_assign(struct privcmd_irqfd *irqfd)
kirqfd->eventfd = eventfd_ctx_fileget(fd_file(f));
if (IS_ERR(kirqfd->eventfd)) {
ret = PTR_ERR(kirqfd->eventfd);
- goto error_fd_put;
+ goto error_kfree;
}
/*
@@ -1000,20 +1000,11 @@ static int privcmd_irqfd_assign(struct privcmd_irqfd *irqfd)
irqfd_inject(kirqfd);
srcu_read_unlock(&irqfds_srcu, idx);
-
- /*
- * Do not drop the file until the kirqfd is fully initialized, otherwise
- * we might race against the EPOLLHUP.
- */
- fdput(f);
return 0;
error_eventfd:
eventfd_ctx_put(kirqfd->eventfd);
-error_fd_put:
- fdput(f);
-
error_kfree:
kfree(kirqfd);
return ret;
diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index 65efb3735e79..70bc0d1f5f6a 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -303,7 +303,6 @@ static int
kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
{
struct kvm_kernel_irqfd *irqfd, *tmp;
- struct fd f;
struct eventfd_ctx *eventfd = NULL, *resamplefd = NULL;
int ret;
__poll_t events;
@@ -326,8 +325,8 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
INIT_WORK(&irqfd->shutdown, irqfd_shutdown);
seqcount_spinlock_init(&irqfd->irq_entry_sc, &kvm->irqfds.lock);
- f = fdget(args->fd);
- if (!fd_file(f)) {
+ CLASS(fd, f)(args->fd);
+ if (fd_empty(f)) {
ret = -EBADF;
goto out;
}
@@ -335,7 +334,7 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
eventfd = eventfd_ctx_fileget(fd_file(f));
if (IS_ERR(eventfd)) {
ret = PTR_ERR(eventfd);
- goto fail;
+ goto out;
}
irqfd->eventfd = eventfd;
@@ -439,12 +438,6 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
#endif
srcu_read_unlock(&kvm->irq_srcu, idx);
-
- /*
- * do not drop the file until the irqfd is fully initialized, otherwise
- * we might race against the EPOLLHUP
- */
- fdput(f);
return 0;
fail:
@@ -457,8 +450,6 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
if (eventfd && !IS_ERR(eventfd))
eventfd_ctx_put(eventfd);
- fdput(f);
-
out:
kfree(irqfd);
return ret;
--
2.39.2
^ permalink raw reply related [flat|nested] 134+ messages in thread* Re: [PATCH 36/39] assorted variants of irqfd setup: convert to CLASS(fd)
2024-07-30 5:16 ` [PATCH 36/39] assorted variants of irqfd setup: convert to CLASS(fd) viro
@ 2024-08-07 10:46 ` Christian Brauner
2024-08-10 3:53 ` Al Viro
0 siblings, 1 reply; 134+ messages in thread
From: Christian Brauner @ 2024-08-07 10:46 UTC (permalink / raw)
To: viro; +Cc: linux-fsdevel, amir73il, bpf, cgroups, kvm, netdev, torvalds
On Tue, Jul 30, 2024 at 01:16:22AM GMT, viro@kernel.org wrote:
> From: Al Viro <viro@zeniv.linux.org.uk>
>
> in all of those failure exits prior to fdget() are plain returns and
> the only thing done after fdput() is (on failure exits) a kfree(),
They could also be converted to:
struct virqfd *virqfd __free(kfree) = NULL;
and then direct returns are usable.
> which can be done before fdput() just fine.
>
> NOTE: in acrn_irqfd_assign() 'fail:' failure exit is wrong for
> eventfd_ctx_fileget() failure (we only want fdput() there) and once
> we stop doing that, it doesn't need to check if eventfd is NULL or
> ERR_PTR(...) there.
>
> NOTE: in privcmd we move fdget() up before the allocation - more
> to the point, before the copy_from_user() attempt.
>
> [trivial conflict in privcmd]
>
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
Reviewed-by: Christian Brauner <brauner@kernel.org>
^ permalink raw reply [flat|nested] 134+ messages in thread
* Re: [PATCH 36/39] assorted variants of irqfd setup: convert to CLASS(fd)
2024-08-07 10:46 ` Christian Brauner
@ 2024-08-10 3:53 ` Al Viro
0 siblings, 0 replies; 134+ messages in thread
From: Al Viro @ 2024-08-10 3:53 UTC (permalink / raw)
To: Christian Brauner
Cc: viro, linux-fsdevel, amir73il, bpf, cgroups, kvm, netdev,
torvalds
On Wed, Aug 07, 2024 at 12:46:35PM +0200, Christian Brauner wrote:
> On Tue, Jul 30, 2024 at 01:16:22AM GMT, viro@kernel.org wrote:
> > From: Al Viro <viro@zeniv.linux.org.uk>
> >
> > in all of those failure exits prior to fdget() are plain returns and
> > the only thing done after fdput() is (on failure exits) a kfree(),
>
> They could also be converted to:
>
> struct virqfd *virqfd __free(kfree) = NULL;
>
> and then direct returns are usable.
No. They could be converted, but kfree() is *not* the right
destructor for that thing. This is a good example of the
reasons why __cleanup should not be used blindly - this
"oh, we'll just return, this object will be taken care of
automagically" would better really take care of the object.
Look for goto error_eventfd; in there...
^ permalink raw reply [flat|nested] 134+ messages in thread
* [PATCH 37/39] memcg_write_event_control(): switch to CLASS(fd)
2024-07-30 5:15 ` [PATCH 01/39] memcg_write_event_control(): fix a user-triggerable oops viro
` (34 preceding siblings ...)
2024-07-30 5:16 ` [PATCH 36/39] assorted variants of irqfd setup: convert to CLASS(fd) viro
@ 2024-07-30 5:16 ` viro
2024-08-07 10:47 ` Christian Brauner
2024-07-30 5:16 ` [PATCH 38/39] css_set_fork(): switch to CLASS(fd_raw, ...) viro
` (2 subsequent siblings)
38 siblings, 1 reply; 134+ messages in thread
From: viro @ 2024-07-30 5:16 UTC (permalink / raw)
To: linux-fsdevel; +Cc: amir73il, bpf, brauner, cgroups, kvm, netdev, torvalds
From: Al Viro <viro@zeniv.linux.org.uk>
some reordering required - take both fdget() to the point before
the allocations, with matching move of fdput() to the very end
of failure exit(s); after that it converts trivially.
simplify the cleanups that involve css_put(), while we are at it...
[stuff moved in mainline]
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
mm/memcontrol-v1.c | 44 +++++++++++++++-----------------------------
1 file changed, 15 insertions(+), 29 deletions(-)
diff --git a/mm/memcontrol-v1.c b/mm/memcontrol-v1.c
index 9725c731fb21..478bb3130c1b 100644
--- a/mm/memcontrol-v1.c
+++ b/mm/memcontrol-v1.c
@@ -1824,8 +1824,6 @@ static ssize_t memcg_write_event_control(struct kernfs_open_file *of,
struct mem_cgroup_event *event;
struct cgroup_subsys_state *cfile_css;
unsigned int efd, cfd;
- struct fd efile;
- struct fd cfile;
struct dentry *cdentry;
const char *name;
char *endp;
@@ -1849,6 +1847,12 @@ static ssize_t memcg_write_event_control(struct kernfs_open_file *of,
else
return -EINVAL;
+ CLASS(fd, efile)(efd);
+ if (fd_empty(efile))
+ return -EBADF;
+
+ CLASS(fd, cfile)(cfd);
+
event = kzalloc(sizeof(*event), GFP_KERNEL);
if (!event)
return -ENOMEM;
@@ -1859,20 +1863,13 @@ static ssize_t memcg_write_event_control(struct kernfs_open_file *of,
init_waitqueue_func_entry(&event->wait, memcg_event_wake);
INIT_WORK(&event->remove, memcg_event_remove);
- efile = fdget(efd);
- if (!fd_file(efile)) {
- ret = -EBADF;
- goto out_kfree;
- }
-
event->eventfd = eventfd_ctx_fileget(fd_file(efile));
if (IS_ERR(event->eventfd)) {
ret = PTR_ERR(event->eventfd);
- goto out_put_efile;
+ goto out_kfree;
}
- cfile = fdget(cfd);
- if (!fd_file(cfile)) {
+ if (fd_empty(cfile)) {
ret = -EBADF;
goto out_put_eventfd;
}
@@ -1881,7 +1878,7 @@ static ssize_t memcg_write_event_control(struct kernfs_open_file *of,
/* AV: shouldn't we check that it's been opened for read instead? */
ret = file_permission(fd_file(cfile), MAY_READ);
if (ret < 0)
- goto out_put_cfile;
+ goto out_put_eventfd;
/*
* The control file must be a regular cgroup1 file. As a regular cgroup
@@ -1890,7 +1887,7 @@ static ssize_t memcg_write_event_control(struct kernfs_open_file *of,
cdentry = fd_file(cfile)->f_path.dentry;
if (cdentry->d_sb->s_type != &cgroup_fs_type || !d_is_reg(cdentry)) {
ret = -EINVAL;
- goto out_put_cfile;
+ goto out_put_eventfd;
}
/*
@@ -1917,7 +1914,7 @@ static ssize_t memcg_write_event_control(struct kernfs_open_file *of,
event->unregister_event = memsw_cgroup_usage_unregister_event;
} else {
ret = -EINVAL;
- goto out_put_cfile;
+ goto out_put_eventfd;
}
/*
@@ -1929,11 +1926,9 @@ static ssize_t memcg_write_event_control(struct kernfs_open_file *of,
&memory_cgrp_subsys);
ret = -EINVAL;
if (IS_ERR(cfile_css))
- goto out_put_cfile;
- if (cfile_css != css) {
- css_put(cfile_css);
- goto out_put_cfile;
- }
+ goto out_put_eventfd;
+ if (cfile_css != css)
+ goto out_put_css;
ret = event->register_event(memcg, event->eventfd, buf);
if (ret)
@@ -1944,23 +1939,14 @@ static ssize_t memcg_write_event_control(struct kernfs_open_file *of,
spin_lock_irq(&memcg->event_list_lock);
list_add(&event->list, &memcg->event_list);
spin_unlock_irq(&memcg->event_list_lock);
-
- fdput(cfile);
- fdput(efile);
-
return nbytes;
out_put_css:
- css_put(css);
-out_put_cfile:
- fdput(cfile);
+ css_put(cfile_css);
out_put_eventfd:
eventfd_ctx_put(event->eventfd);
-out_put_efile:
- fdput(efile);
out_kfree:
kfree(event);
-
return ret;
}
--
2.39.2
^ permalink raw reply related [flat|nested] 134+ messages in thread* Re: [PATCH 37/39] memcg_write_event_control(): switch to CLASS(fd)
2024-07-30 5:16 ` [PATCH 37/39] memcg_write_event_control(): switch " viro
@ 2024-08-07 10:47 ` Christian Brauner
0 siblings, 0 replies; 134+ messages in thread
From: Christian Brauner @ 2024-08-07 10:47 UTC (permalink / raw)
To: viro; +Cc: linux-fsdevel, amir73il, bpf, cgroups, kvm, netdev, torvalds
On Tue, Jul 30, 2024 at 01:16:23AM GMT, viro@kernel.org wrote:
> From: Al Viro <viro@zeniv.linux.org.uk>
>
> some reordering required - take both fdget() to the point before
> the allocations, with matching move of fdput() to the very end
> of failure exit(s); after that it converts trivially.
>
> simplify the cleanups that involve css_put(), while we are at it...
>
> [stuff moved in mainline]
>
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
Reviewed-by: Christian Brauner <brauner@kernel.org>
^ permalink raw reply [flat|nested] 134+ messages in thread
* [PATCH 38/39] css_set_fork(): switch to CLASS(fd_raw, ...)
2024-07-30 5:15 ` [PATCH 01/39] memcg_write_event_control(): fix a user-triggerable oops viro
` (35 preceding siblings ...)
2024-07-30 5:16 ` [PATCH 37/39] memcg_write_event_control(): switch " viro
@ 2024-07-30 5:16 ` viro
2024-08-07 10:47 ` Christian Brauner
2024-07-30 5:16 ` [PATCH 39/39] deal with the last remaing boolean uses of fd_file() viro
2024-07-30 7:13 ` [PATCH 01/39] memcg_write_event_control(): fix a user-triggerable oops Michal Hocko
38 siblings, 1 reply; 134+ messages in thread
From: viro @ 2024-07-30 5:16 UTC (permalink / raw)
To: linux-fsdevel; +Cc: amir73il, bpf, brauner, cgroups, kvm, netdev, torvalds
From: Al Viro <viro@zeniv.linux.org.uk>
reference acquired there by fget_raw() is not stashed anywhere -
we could as well borrow instead.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
kernel/cgroup/cgroup.c | 12 ++++--------
1 file changed, 4 insertions(+), 8 deletions(-)
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 1244a8c8878e..0838301e4f65 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -6409,7 +6409,6 @@ static int cgroup_css_set_fork(struct kernel_clone_args *kargs)
struct cgroup *dst_cgrp = NULL;
struct css_set *cset;
struct super_block *sb;
- struct file *f;
if (kargs->flags & CLONE_INTO_CGROUP)
cgroup_lock();
@@ -6426,14 +6425,14 @@ static int cgroup_css_set_fork(struct kernel_clone_args *kargs)
return 0;
}
- f = fget_raw(kargs->cgroup);
- if (!f) {
+ CLASS(fd_raw, f)(kargs->cgroup);
+ if (fd_empty(f)) {
ret = -EBADF;
goto err;
}
- sb = f->f_path.dentry->d_sb;
+ sb = fd_file(f)->f_path.dentry->d_sb;
- dst_cgrp = cgroup_get_from_file(f);
+ dst_cgrp = cgroup_get_from_file(fd_file(f));
if (IS_ERR(dst_cgrp)) {
ret = PTR_ERR(dst_cgrp);
dst_cgrp = NULL;
@@ -6481,15 +6480,12 @@ static int cgroup_css_set_fork(struct kernel_clone_args *kargs)
}
put_css_set(cset);
- fput(f);
kargs->cgrp = dst_cgrp;
return ret;
err:
cgroup_threadgroup_change_end(current);
cgroup_unlock();
- if (f)
- fput(f);
if (dst_cgrp)
cgroup_put(dst_cgrp);
put_css_set(cset);
--
2.39.2
^ permalink raw reply related [flat|nested] 134+ messages in thread* Re: [PATCH 38/39] css_set_fork(): switch to CLASS(fd_raw, ...)
2024-07-30 5:16 ` [PATCH 38/39] css_set_fork(): switch to CLASS(fd_raw, ...) viro
@ 2024-08-07 10:47 ` Christian Brauner
0 siblings, 0 replies; 134+ messages in thread
From: Christian Brauner @ 2024-08-07 10:47 UTC (permalink / raw)
To: viro; +Cc: linux-fsdevel, amir73il, bpf, cgroups, kvm, netdev, torvalds
On Tue, Jul 30, 2024 at 01:16:24AM GMT, viro@kernel.org wrote:
> From: Al Viro <viro@zeniv.linux.org.uk>
>
> reference acquired there by fget_raw() is not stashed anywhere -
> we could as well borrow instead.
>
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
Reviewed-by: Christian Brauner <brauner@kernel.org>
^ permalink raw reply [flat|nested] 134+ messages in thread
* [PATCH 39/39] deal with the last remaing boolean uses of fd_file()
2024-07-30 5:15 ` [PATCH 01/39] memcg_write_event_control(): fix a user-triggerable oops viro
` (36 preceding siblings ...)
2024-07-30 5:16 ` [PATCH 38/39] css_set_fork(): switch to CLASS(fd_raw, ...) viro
@ 2024-07-30 5:16 ` viro
2024-08-07 10:48 ` Christian Brauner
2024-07-30 7:13 ` [PATCH 01/39] memcg_write_event_control(): fix a user-triggerable oops Michal Hocko
38 siblings, 1 reply; 134+ messages in thread
From: viro @ 2024-07-30 5:16 UTC (permalink / raw)
To: linux-fsdevel; +Cc: amir73il, bpf, brauner, cgroups, kvm, netdev, torvalds
From: Al Viro <viro@zeniv.linux.org.uk>
[one added in fs/stat.c]
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
drivers/infiniband/core/uverbs_cmd.c | 8 +++-----
fs/stat.c | 2 +-
include/linux/cleanup.h | 2 +-
sound/core/pcm_native.c | 2 +-
4 files changed, 6 insertions(+), 8 deletions(-)
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index a4cce360df21..66b02fbf077a 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -584,7 +584,7 @@ static int ib_uverbs_open_xrcd(struct uverbs_attr_bundle *attrs)
if (cmd.fd != -1) {
/* search for file descriptor */
f = fdget(cmd.fd);
- if (!fd_file(f)) {
+ if (fd_empty(f)) {
ret = -EBADF;
goto err_tree_mutex_unlock;
}
@@ -632,8 +632,7 @@ static int ib_uverbs_open_xrcd(struct uverbs_attr_bundle *attrs)
atomic_inc(&xrcd->usecnt);
}
- if (fd_file(f))
- fdput(f);
+ fdput(f);
mutex_unlock(&ibudev->xrcd_tree_mutex);
uobj_finalize_uobj_create(&obj->uobject, attrs);
@@ -648,8 +647,7 @@ static int ib_uverbs_open_xrcd(struct uverbs_attr_bundle *attrs)
uobj_alloc_abort(&obj->uobject, attrs);
err_tree_mutex_unlock:
- if (fd_file(f))
- fdput(f);
+ fdput(f);
mutex_unlock(&ibudev->xrcd_tree_mutex);
diff --git a/fs/stat.c b/fs/stat.c
index 56d6ce2b2c79..b7be119b070f 100644
--- a/fs/stat.c
+++ b/fs/stat.c
@@ -273,7 +273,7 @@ static int vfs_statx_fd(int fd, int flags, struct kstat *stat,
u32 request_mask)
{
CLASS(fd_raw, f)(fd);
- if (!fd_file(f))
+ if (fd_empty(f))
return -EBADF;
return vfs_statx_path(&fd_file(f)->f_path, flags, stat, request_mask);
}
diff --git a/include/linux/cleanup.h b/include/linux/cleanup.h
index a3d3e888cf1f..72615212b911 100644
--- a/include/linux/cleanup.h
+++ b/include/linux/cleanup.h
@@ -98,7 +98,7 @@ const volatile void * __must_check_fn(const volatile void *val)
* DEFINE_CLASS(fdget, struct fd, fdput(_T), fdget(fd), int fd)
*
* CLASS(fdget, f)(fd);
- * if (!fd_file(f))
+ * if (fd_empty(f))
* return -EBADF;
*
* // use 'f' without concern
diff --git a/sound/core/pcm_native.c b/sound/core/pcm_native.c
index cbb9c972cb93..320a7637b662 100644
--- a/sound/core/pcm_native.c
+++ b/sound/core/pcm_native.c
@@ -2250,7 +2250,7 @@ static int snd_pcm_link(struct snd_pcm_substream *substream, int fd)
bool nonatomic = substream->pcm->nonatomic;
CLASS(fd, f)(fd);
- if (!fd_file(f))
+ if (fd_empty(f))
return -EBADFD;
if (!is_pcm_file(fd_file(f)))
return -EBADFD;
--
2.39.2
^ permalink raw reply related [flat|nested] 134+ messages in thread* Re: [PATCH 39/39] deal with the last remaing boolean uses of fd_file()
2024-07-30 5:16 ` [PATCH 39/39] deal with the last remaing boolean uses of fd_file() viro
@ 2024-08-07 10:48 ` Christian Brauner
0 siblings, 0 replies; 134+ messages in thread
From: Christian Brauner @ 2024-08-07 10:48 UTC (permalink / raw)
To: viro; +Cc: linux-fsdevel, amir73il, bpf, cgroups, kvm, netdev, torvalds
On Tue, Jul 30, 2024 at 01:16:25AM GMT, viro@kernel.org wrote:
> From: Al Viro <viro@zeniv.linux.org.uk>
>
> [one added in fs/stat.c]
>
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
Reviewed-by: Christian Brauner <brauner@kernel.org>
^ permalink raw reply [flat|nested] 134+ messages in thread
* Re: [PATCH 01/39] memcg_write_event_control(): fix a user-triggerable oops
2024-07-30 5:15 ` [PATCH 01/39] memcg_write_event_control(): fix a user-triggerable oops viro
` (37 preceding siblings ...)
2024-07-30 5:16 ` [PATCH 39/39] deal with the last remaing boolean uses of fd_file() viro
@ 2024-07-30 7:13 ` Michal Hocko
2024-07-30 7:18 ` Al Viro
38 siblings, 1 reply; 134+ messages in thread
From: Michal Hocko @ 2024-07-30 7:13 UTC (permalink / raw)
To: viro; +Cc: linux-fsdevel, amir73il, bpf, brauner, cgroups, kvm, netdev,
torvalds
On Tue 30-07-24 01:15:47, viro@kernel.org wrote:
> From: Al Viro <viro@zeniv.linux.org.uk>
>
> we are *not* guaranteed that anything past the terminating NUL
> is mapped (let alone initialized with anything sane).
>
> [the sucker got moved in mainline]
>
You could have preserved
Fixes: 0dea116876ee ("cgroup: implement eventfd-based generic API for notifications")
Cc: stable
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
and
Acked-by: Michal Hocko <mhocko@suse.com>
> ---
> mm/memcontrol-v1.c | 7 +++++--
> 1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/mm/memcontrol-v1.c b/mm/memcontrol-v1.c
> index 2aeea4d8bf8e..417c96f2da28 100644
> --- a/mm/memcontrol-v1.c
> +++ b/mm/memcontrol-v1.c
> @@ -1842,9 +1842,12 @@ static ssize_t memcg_write_event_control(struct kernfs_open_file *of,
> buf = endp + 1;
>
> cfd = simple_strtoul(buf, &endp, 10);
> - if ((*endp != ' ') && (*endp != '\0'))
> + if (*endp == '\0')
> + buf = endp;
> + else if (*endp == ' ')
> + buf = endp + 1;
> + else
> return -EINVAL;
> - buf = endp + 1;
>
> event = kzalloc(sizeof(*event), GFP_KERNEL);
> if (!event)
> --
> 2.39.2
>
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 134+ messages in thread* Re: [PATCH 01/39] memcg_write_event_control(): fix a user-triggerable oops
2024-07-30 7:13 ` [PATCH 01/39] memcg_write_event_control(): fix a user-triggerable oops Michal Hocko
@ 2024-07-30 7:18 ` Al Viro
2024-07-30 7:37 ` Michal Hocko
0 siblings, 1 reply; 134+ messages in thread
From: Al Viro @ 2024-07-30 7:18 UTC (permalink / raw)
To: Michal Hocko
Cc: viro, linux-fsdevel, amir73il, bpf, brauner, cgroups, kvm, netdev,
torvalds
On Tue, Jul 30, 2024 at 09:13:38AM +0200, Michal Hocko wrote:
> On Tue 30-07-24 01:15:47, viro@kernel.org wrote:
> > From: Al Viro <viro@zeniv.linux.org.uk>
> >
> > we are *not* guaranteed that anything past the terminating NUL
> > is mapped (let alone initialized with anything sane).
> >
> > [the sucker got moved in mainline]
> >
>
> You could have preserved
> Fixes: 0dea116876ee ("cgroup: implement eventfd-based generic API for notifications")
> Cc: stable
>
> > Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
>
> and
> Acked-by: Michal Hocko <mhocko@suse.com>
Will do; FWIW, I think it would be better off going via the
cgroup tree - it's completely orthogonal to the rest of the
series, the only relation being "got caught during the same
audit"...
^ permalink raw reply [flat|nested] 134+ messages in thread* Re: [PATCH 01/39] memcg_write_event_control(): fix a user-triggerable oops
2024-07-30 7:18 ` Al Viro
@ 2024-07-30 7:37 ` Michal Hocko
0 siblings, 0 replies; 134+ messages in thread
From: Michal Hocko @ 2024-07-30 7:37 UTC (permalink / raw)
To: Al Viro, Andrew Morton
Cc: viro, linux-fsdevel, amir73il, bpf, brauner, cgroups, kvm, netdev,
torvalds
I think it can go through Andrew as well. The patch is
https://lore.kernel.org/all/20240730051625.14349-1-viro@kernel.org/T/#u
On Tue 30-07-24 08:18:51, Al Viro wrote:
> On Tue, Jul 30, 2024 at 09:13:38AM +0200, Michal Hocko wrote:
> > On Tue 30-07-24 01:15:47, viro@kernel.org wrote:
> > > From: Al Viro <viro@zeniv.linux.org.uk>
> > >
> > > we are *not* guaranteed that anything past the terminating NUL
> > > is mapped (let alone initialized with anything sane).
> > >
> > > [the sucker got moved in mainline]
> > >
> >
> > You could have preserved
> > Fixes: 0dea116876ee ("cgroup: implement eventfd-based generic API for notifications")
> > Cc: stable
> >
> > > Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> >
> > and
> > Acked-by: Michal Hocko <mhocko@suse.com>
>
> Will do; FWIW, I think it would be better off going via the
> cgroup tree - it's completely orthogonal to the rest of the
> series, the only relation being "got caught during the same
> audit"...
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 134+ messages in thread